Hacker Timesnew | past | comments | ask | show | jobs | submitlogin

This is a very cool page. I love little simulations like this for building intuition for systems problems.

Practical systems deal with this by not caring strongly about overflow (caches), by running at low enough utilizations that overflow is very unlikely given their item counts (e.g. Dynamo), by using explicit partitioning rather than consistent hashing (e.g. DynamoDB), by being able to take advantage of multi-tenancy to drive up per-physical-node utilization even in the case of low per-logical-node utilization, or by using some additional algorithmic sophistication (e.g. Chen et al https://arxiv.org/pdf/1908.08762).

In practice, this kind of overflow is a big deal for systems that deal with relatively small numbers of large objects, and are not as big a deal for systems that deal with large numbers of small objects. Try out the numbers in the page's "Handy Calculator" to see how that plays out.

It's also worth mentioning that this isn't unique to consistent hashing, but is a problem with random load balancing more generally. "Pick a random server and send traffic to it" is an OK load balancing strategy when requests are small and servers are large, but a terrible one when requests become relatively large or expensive. In the general load balancing/placement problem this is easier than the storage case, because you don't need to find requests again after dispatching them. That makes simple algorithms like best-of-2 and best-of-k applicable.



This is my post from 2018 (I didn't submit it to HN), and it could definitely use a "here's what practical systems do" update! I'll put it on the TODO list...

Your point about systems dealing with a relatively small number of large objects vs. small objects also makes sense: this is essentially the "cost" of an overflow (4kb spills once in a blue moon? Oh well, handle that as a special case. 4TB spills once in a blue moon? The system might crash). This is more obvious, as you also point out, in load balancing.

One aspect I found very counter-intuitive: before this investigation, I would've guessed that having a large number of large bins makes overflow increasingly unlikely. This is only partially true: more bins is obviously good, but larger bins are actually more sensitive to changes in load factor!

Overall, I think you are right that this is not really a concern in modern systems today. Compared to Dynamo, I still think Vimeo's solution (linked at the bottom of the post) is both intuitive and low-complexity. But regardless, more of an interesting mathematical diversion than a practical systems concern these days.


Yeah there's a nice load balancing paper called the power of two choices that shows sending two requests gives surprisingly good results.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: