
Also, HT is not such a great performance win - on a few different 4-core/8-thread machines I had access to, loading all 8 threads to "100% CPU" (whatever that means) usually delivers only 20-30% faster computation than with HT off (4-core/4-thread) - which is in line with your 30% number.

And that's an improvement - some 15 years ago, with similar computational loads, most of my tests ran 10-20% faster with HT off (using 2 cores / 2 threads) than with HT on (using 2 cores / 4 threads) - there just wasn't enough cache to support that many threads.



A 20-30% increase is a BIG increase for a hardware feature, though. The cost of hyperthreading in transistors mostly amounts to the larger total register set. The whole point is the rest of the decode/dispatch/execute/retire pipeline is all shared.


How is 20%-30% not a great performance win? If I tell you today there's this One Simple Trick that you can do on your computer to instantly gain access to 20%-30% more performance, would you do it in a heartbeat?


What do you think is a good performance improvement then?


(and to the two other responses)

If your workload is already well parallelized, then yes, 20% is quite significant. However, working to parallelize properly over 8 threads rather than 4 has its own costs.

The thing that bothers me most is that 800% CPU and 500% CPU on this processor are roughly equivalent, at about 5x100% CPU, which makes everything very hard to reason about when planning capacity.
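To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes a 4-core/8-thread part, a flat 25% HT gain (the middle of the 20-30% range mentioned upthread), a scheduler that fills physical cores before their sibling threads, and a linear model. The function names are made up for illustration, and real workloads will not behave this neatly.

```python
def effective_cores(busy_threads, physical_cores=4, ht_gain=0.25):
    """Rough core-equivalents of work done by `busy_threads` busy logical CPUs.

    Assumption: the first `physical_cores` busy threads each get a whole core;
    every additional (sibling) thread only adds `ht_gain` of a core.
    """
    if not 0 <= busy_threads <= 2 * physical_cores:
        raise ValueError("busy_threads must be between 0 and 2 * physical_cores")
    full = min(busy_threads, physical_cores)   # threads that own a core
    siblings = busy_threads - full             # threads sharing a core via HT
    return full + siblings * ht_gain


def real_utilization(busy_threads, physical_cores=4, ht_gain=0.25):
    """Fraction of the machine's peak throughput actually in use."""
    peak = effective_cores(2 * physical_cores, physical_cores, ht_gain)
    return effective_cores(busy_threads, physical_cores, ht_gain) / peak


if __name__ == "__main__":
    # Reported "800% CPU" is about 5 core-equivalents under these assumptions.
    print(effective_cores(8))
    # Reported "50% CPU" (4 of 8 threads busy) is already most of the machine.
    print(real_utilization(4))
```

Under these assumptions a box that a monitoring tool shows at 50% is already at about 80% of what it can deliver, which is the gap that makes the reported percentages misleading for capacity planning.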


I think you’re misunderstanding what HT is. It’s not true parallelism, it’s just hiding latency by providing some extra superscalar parallelism. You can’t expect it to give you actual linear improvements in performance because it’s just an illusion.


I understand that very well. But none of the standard tools that manage CPU understand that, and most people don't either.

If I had a nickel for every time I had to explain why "You are at 50% CPU now, but you can't actually run twice as many processes on this machine and get the same runtime", I'd be able to buy a large Frappuccino or two at Starbucks.

Perhaps I'm uninformed though - is there a tool like htop, which would give me an idea of how close I am to maxing out a CPU?


No, there isn’t. But if you understand it, I don’t get why you think 20% isn’t a good performance boost, especially considering the rate of return for power and area in silicon.


Because many people believe it is a 100% improvement, plan/budget accordingly, and then look for help.

As far as silicon/power goes, it is nice, but IIRC (I am not involved in purchasing anymore) it used to cost over 50% more in USD for those 20% in performance, back when non-HT parts were common.
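As a back-of-the-envelope check on that price point, a small sketch, taking the recalled figures at face value (roughly a 50% price premium for roughly 20% more throughput). The function name is hypothetical and the numbers are from memory, not from a price list.

```python
def perf_per_dollar_ratio(perf_gain, price_premium):
    """Performance per dollar of the HT part relative to the non-HT part.

    perf_gain and price_premium are fractions, e.g. 0.20 for +20%.
    A result below 1.0 means the HT part is worse value per dollar.
    """
    return (1.0 + perf_gain) / (1.0 + price_premium)


if __name__ == "__main__":
    # +20% performance for +50% price
    print(perf_per_dollar_ratio(0.20, 0.50))
```

With those inputs the ratio comes out around 0.8, i.e. about 20% less computation per dollar than the non-HT part.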


What a strange way to measure the benefits of a performance optimization: "how people will perceive it and then ask me for help".


You ignored the price issue, which was measurable and real, but also:

It (used to be) my job. Does "because people fall for deceptive marketing, waste money, and then waste my time trying to salvage their reputation" sound better?


> loading all 8 threads to "100% CPU" (whatever that means)

What application?


Lots of numerical computations and simulations.



