> A whole enterprise A100 is a compromise position for them; they want an SXM cluster of H100s.
For a lot of use-cases you need at lest two A100s with a very fast interconnect, potentially many more. This isn't even about scaling with requests but about running one single LLM instance.
Sure you will find all of ways how people managed to runt his or that on smaller platforms, problem is that quite often doesn't scale to what is needed in production for a lot of subtle and less subtle reasons.
For a lot of use-cases you need at lest two A100s with a very fast interconnect, potentially many more. This isn't even about scaling with requests but about running one single LLM instance.
Sure you will find all of ways how people managed to runt his or that on smaller platforms, problem is that quite often doesn't scale to what is needed in production for a lot of subtle and less subtle reasons.