You can make X lower by reducing the frequency (i.e. making each cycle longer).
But apart from that, the main reason big chips would clock slower is power, not timing. If you have a lot of transistors all switching at the high voltage that a high frequency requires, you get molten metal and the magic smoke leaves.
Big chips aren't one big stage where light travels from one side to the other. But they are giant weaves of heating elements that can't all run fast all of the time.
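To put rough numbers on the power argument, here is a minimal sketch using the textbook CMOS switching-power estimate P = a * C * V^2 * f. The activity factor, capacitance, voltages, and frequencies below are made-up illustrative values, not measurements of any real chip.

```python
def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """Textbook CMOS switching-power estimate: P = a * C * V^2 * f (watts)."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

# Hypothetical chip: 10% of 100 nF of switched capacitance toggles per cycle.
base = dynamic_power(0.1, 1e-7, 1.0, 3e9)  # 3 GHz at 1.0 V
fast = dynamic_power(0.1, 1e-7, 1.2, 4e9)  # 4 GHz, assuming it needs 1.2 V

# Ratio = (1.2^2) * (4/3): a 33% clock bump costs nearly twice the power.
ratio = fast / base
print(f"{base:.1f} W -> {fast:.1f} W (x{ratio:.2f})")
```

The voltage term is squared, so the voltage bump needed to hold a higher clock hurts more than the clock itself. Multiply that over billions of transistors and you see why not everything can run flat out at once.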
cache latency is definitely part of what limits core clock. you're not going to have a good time if your L1 latency is, say, 10 clocks. not to mention that register files are not much different from SRAM (and therefore cache-like).
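a rough sketch of why that bites: if the SRAM array takes a roughly fixed amount of wall-clock time to read, a faster clock just means the same access spans more cycles. the 0.9 ns access time here is a hypothetical number picked for illustration, not a figure for any real L1.

```python
import math

def latency_in_cycles(access_time_ns, freq_ghz):
    """Cycles needed to cover a fixed access time, rounded up to whole clocks."""
    return math.ceil(access_time_ns * freq_ghz)

# Assumed fixed SRAM access time of 0.9 ns (illustrative only).
cycles = {f: latency_in_cycles(0.9, f) for f in (3.0, 4.0, 5.0)}
for freq, n in cycles.items():
    print(f"{freq:.0f} GHz: {n} cycles")
```

so either the clock stays low enough to keep the cycle count tolerable, or the cache gets smaller and faster, or the pipeline eats the extra load-to-use latency.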