My understanding from reading this is that a 3090 GPU gives a 2x speedup over a decent modern CPU. Is that really the case, or am I reading it wrong? My initial thought was that it would be far higher. Is this typical of inference for these kinds of models? If so, why do we need such expensive hardware? Please excuse my lack of knowledge :)
I think it was a 2x total speedup vs. the previous version, which already used the GPU for “most” things, so the real GPU-over-CPU speedup is more like 2/(1-most), which could be a lot.
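To put rough numbers on that, here is a minimal sketch under assumptions that are mine, not the article's: "most" is taken to be the fraction `most` of the original CPU-only work that the previous version already ran on the GPU, and the GPU is assumed to speed up every part of the work by the same factor `s`. Normalising the CPU-only time to 1, the previous version takes `(1 - most) + most / s` and the new all-GPU version takes `1 / s`. Setting their ratio to the reported 2x and solving gives `s = (2 - most) / (1 - most)`. That is within a factor of two of the back-of-the-envelope 2/(1-most) figure and blows up the same way as `most` approaches 1. The function name `implied_gpu_speedup` is hypothetical.

```python
def implied_gpu_speedup(most: float, total_speedup: float = 2.0) -> float:
    """Estimate the GPU-over-CPU speedup implied by a reported total speedup.

    Assumptions (illustrative, not from the article):
      - `most` is the fraction of CPU-only work already on the GPU
        in the previous version (0 <= most < 1).
      - The GPU accelerates all work by one uniform factor s.

    Previous version time: (1 - most) + most / s
    New version time:      1 / s
    Ratio = total_speedup  =>  s = (total_speedup - most) / (1 - most)
    """
    if not 0.0 <= most < 1.0:
        raise ValueError("most must be in [0, 1)")
    return (total_speedup - most) / (1.0 - most)


if __name__ == "__main__":
    # Show how quickly the implied speedup grows as "most" gets closer to 1.
    for most in (0.0, 0.5, 0.9, 0.99):
        print(f"most={most:.2f} -> implied GPU speedup ~{implied_gpu_speedup(most):.1f}x")
```

So if "most" was around 90% of the work, a 2x total improvement would correspond to roughly an 11x GPU-over-CPU speedup under these assumptions, and the figure climbs fast from there.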