They are working on this. All the major clouds have initiatives for short-term requests/reservations. It's just not a feature that was ever of much use pre-GenAI. How often do you need to request 1000 CPU nodes for 48 hours in a single zone?
But over the next few months, AWS and others are working to release queueing services that let you temporarily provision a chunk of compute, probably with upfront payment and at a high price (perhaps above the on-demand rate).
Secondly, there is a fundamental question of resource sharing here. Even with this project by Evan and AI Grant (the second such cluster created by AI Grant, btw), the question will arise: if one team has enough money to provision the entire cluster forever, why not do it? What exactly are the parameters of fair use? In networking, we have algorithms for bandwidth sharing (TCP fairness, etc.) that encode sharing mechanisms, but they don't work for these kinds of chunky workloads either.
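To make the "chunky workloads" point concrete, here is a minimal sketch of max-min fair sharing (the "water-filling" notion of fairness commonly used to reason about divisible resources like bandwidth) applied to a hypothetical 1024-node cluster. The function names, team demands, and the all-or-nothing `runnable` check are illustrative assumptions, not how AI Grant or any cloud provider actually allocates capacity.

```python
def max_min_fair(capacity: float, demands: list[float]) -> list[float]:
    """Max-min fair ("water-filling") split of a divisible resource."""
    alloc = [0.0] * len(demands)
    remaining = capacity
    # Unsatisfied users, smallest demand first.
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    while active:
        share = remaining / len(active)
        i = active[0]
        if demands[i] <= share:
            # Smallest demand fits in an equal share: satisfy it fully.
            alloc[i] = demands[i]
            remaining -= demands[i]
            active.pop(0)
        else:
            # Nobody left fits: split what remains equally.
            for j in active:
                alloc[j] = share
            break
    return alloc


def runnable(alloc: list[float], demands: list[float]) -> list[bool]:
    # Assumption: a gang-scheduled training job only runs if it gets
    # its full node count at once; a partial allocation is useless.
    return [a >= d for a, d in zip(alloc, demands)]


if __name__ == "__main__":
    demands = [1000, 1000, 16]  # nodes wanted by three hypothetical teams
    alloc = max_min_fair(1024, demands)
    print(alloc)                      # [504.0, 504.0, 16]
    print(runnable(alloc, demands))   # [False, False, True]
```

The split is "fair" in the bandwidth sense and hands out all 1024 nodes, yet neither big job can start, so 1008 nodes do no useful work. That is the gap between fairness for divisible flows and fairness for all-or-nothing requests.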