That may be so, although a VM on AWS is measurably slower than the same program running on my Xeon from 2013 or so.
But, more generally, the problem is that tech stacks are an awful amalgamation of uncountable pieces which, taken by themselves, "don't add much overhead". But when you add them all together, you end up with a terribly laggy affair; see the other commenter's description of a web page with 30 dishes taking forever to load.
"But when you add them all together, you end up with a terribly laggy affair"
I think this is the key.
People fall in love with simple systems which solves a problem. Being totally in honey moon they want to use those systems for additional use cases. So they build stuff around it, atop of it, abstract the original system to allow growth for more complexity.
Then the system becomes too complex, so people might come up with a new idea. So they create a new simple system just to fix a part of the whole stack and allow migration from the old one. The party starts all over again.
But it's like hard drive fragmentation - at some point there are so many layers that it's basically impossible to recombine layers again. Everybody fears complexity already built.
My experience has been that HTML and making changes to HTML has huge amounts of overhead all by itself.
I don't dispute that very bad javascript can cause problems, but I don't think it's the virtualization layers or the specific language that are responsible for more than a sliver of that in the vast majority of use cases.
And the pile of build and distribution tools shouldn't hurt the user at all.
When half the layers add up to 10% of the problem, and the other half of the layers add up to 90% of the problem, I don't blame layers in general. If you remove the parts that legitimately are just bad by themselves, you solve most of the problems.
If you have a dozen layers and they each add 5% slowdown, okay that makes a CPU half as fast. That amount of slowdown is nothing for UI responsiveness. A modern core clocked at 300MHz would blow the pants off the 600MHz core that's responding instantly in the video, and then it would clock 10x higher when you turn off the artificial limiter. Those slight slowdowns of layered abstractions are not the real problem.
(Edit: And note that's not for a dozen layers total but for a dozen additional layers on top of what the NT code had.)
Unfortunately, at this point some of the slow layers are in hardware, or immediately adjacent to it. For example, AFAIR[0], the time between your keyboard registering a press, and the corresponding event reaching the application, is already counted in milliseconds and can become perceptible. Even assuming the app processes it instantly, that's just half of the I/O loop. The other half, that is changing something on display, involves digging through GUI abstraction layers, compositor, possibly waiting on GPU a bit, and then... these days, displays themselves tend to introduce single-digit millisecond lags due to the time it takes to flip a pixel, and buffering added to mask it.
These are things we're unlikely to get back (and by themselves already make typical PC stacks unable to deliver smooth hand-writing experience - Microsoft Research had a good demo some time ago, showing that you need to get to single-digit milliseconds on the round-trip between touch event and display update, for it to feel like manipulating a physical thing vs. having it attached to your finger on a rubber band). Win2K on a hardware from that era is going to remain snappier than modern computers for that reason alone. But that only underscores the need to make the userspace software part leaner.
--
[0] - Source: I'll need to find that blog that's regularly on HN, whose author did measurements on this.
At this point there are so many layers that it would be hard to figure out the common problems without doing some serious work profiling a whole bunch of applications
But, more generally, the problem is that tech stacks are an awful amalgamation of uncountable pieces which, taken by themselves, "don't add much overhead". But when you add them all together, you end up with a terribly laggy affair; see the other commenter's description of a web page with 30 dishes taking forever to load.