Hacker News

User-level threads do not solve the problem of synchronization and data movement. With a “thread-per-core” model, by contrast, you eliminate most of the need to synchronize between CPU cores, and therefore eliminate the overhead of acquiring and releasing locks, letting the out-of-order CPU cores run at full speed. Furthermore, the “thread-per-core” model ensures that memory accesses are always CPU-local, which eliminates the need to move data between CPU caches and keeps you off the expensive CPU cache coherence protocol (which makes scaling to multiple cores more difficult).

That said, I am not claiming thread-per-core is _the solution_ either; I’m just saying that if you can partition data at the application level, you can make things run plenty fast. Of course, you’re also exposing yourself to other issues, like “hot shards”, where some CPUs get disproportionately more work than others. Still, as we scale to more and more cores, it seems inevitable that we must partition our systems at some level.
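To make the partitioning idea concrete, here is a minimal sketch (names like `num_shards` and `shard_for` are illustrative, not from any library): a stable hash pins every key to one shard, and if each shard is only ever touched by the thread that owns it, no locks are needed on the data path.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Illustrative sketch of application-level partitioning: each key maps
// to exactly one shard, and each shard is owned by exactly one thread,
// so shard-local data never needs cross-core synchronization.
constexpr std::size_t num_shards = 4;

std::size_t shard_for(const std::string& key) {
    // A deterministic hash pins a given key to the same shard (and
    // thus the same core) on every lookup.
    return std::hash<std::string>{}(key) % num_shards;
}
```

The "hot shard" problem mentioned above shows up exactly here: if the key distribution is skewed, one shard's owning thread ends up with most of the work while the others sit idle.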



It's pretty much common knowledge that if you want linear performance scaling on a multiprocessor, each workload has to be effectively independent of the others. Synchronization and communication are expensive: they only take away from per-core performance and never make a given core faster. So yes, sharding is a critical part of this "thread per core" architecture.


To be fair, concurrent writes do not scale, but a single writer with multiple readers works just fine, so you only need a hard partition of writers and can allow all threads (or the appropriate NUMA subset) to access any data.
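A minimal sketch of that single-writer/multiple-reader pattern (the `Counter` type here is made up for illustration): one designated thread publishes updates with release stores, and any thread may observe them with acquire loads, so no lock is needed because writes never race with each other.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative single-writer / multiple-reader cell. Only one thread
// ever calls write(); any thread may call read(). Release/acquire
// ordering makes the writer's update visible to readers without a lock.
struct Counter {
    std::atomic<std::uint64_t> value{0};

    void write(std::uint64_t v) {            // writer thread only
        value.store(v, std::memory_order_release);
    }
    std::uint64_t read() const {             // any thread
        return value.load(std::memory_order_acquire);
    }
};
```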


Binding user-level threads to hardware cores without preemption is completely equivalent to thread-per-core + coroutines, except that user-level threads allow deep stacks (with all the benefits and issues that implies).

In fact, a well-designed generic executor can be completely oblivious to whether it is scheduling plain closures, async functions, or fibers. See, for example, Boost.Asio.
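A toy sketch of that obliviousness (this `Executor` is a made-up stand-in, not Boost.Asio's API): once every task is type-erased into a `std::function<void()>`, a plain closure, a coroutine's resume step, or a fiber-switch routine all look identical to the scheduler.

```cpp
#include <functional>
#include <queue>
#include <utility>

// Illustrative executor that only ever sees type-erased callables, so
// it cannot tell (and does not care) what kind of task it is running.
class Executor {
    std::queue<std::function<void()>> tasks_;
public:
    void post(std::function<void()> f) { tasks_.push(std::move(f)); }

    void run() {                  // drain the queue on the calling thread
        while (!tasks_.empty()) {
            auto f = std::move(tasks_.front());
            tasks_.pop();
            f();
        }
    }
};
```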



