The difference in the cost of compute between 2026 and 2036 won’t be nearly as large as the difference in the cost of compute between 2016 and 2026. Even in 2016 the slowdown in improvements was noticeable.
We might see a one-time bump in inference when we move off GPUs onto more limited and efficient dedicated hardware, but the sustained fast pace of improvement is far behind us.
I'm predicting that, now that there is a clear use case for this tech, work on specialized hardware, software, models, etc. will accelerate (and already has), and that all of it will run much more efficiently in 10 years. So the real token costs will be a fraction of what they are now.
You can run models on FPGAs and get massive cost, speed, and throughput gains (like 10x). The reason people don’t do it is that other improvements (algorithmic) mean nobody really thinks locking into a model makes sense…yet. Would I want to use GPT-4o for anything today at 1/10th the price? That would be $0.40 per input, $1.50 per output (presumably per million tokens). Gemma-4 31b is much more capable and cheaper. So an FPGA version of the model is just not worth it today.
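To put the discounted prices in per-request terms, here is a rough sketch. It assumes the $0.40 and $1.50 figures are quoted per million tokens (the comment doesn't say), and the request size is made up for illustration.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request, with prices quoted per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Prices from the comment, ASSUMED to be dollars per million tokens.
DISCOUNTED_INPUT = 0.40
DISCOUNTED_OUTPUT = 1.50

# Hypothetical request size: 2,000 input tokens, 500 output tokens.
cost = request_cost(2_000, 500, DISCOUNTED_INPUT, DISCOUNTED_OUTPUT)
print(f"${cost:.5f} per request")
```

Swap in whatever a competing model actually charges to see how small the gap has to be before a frozen, older model stops being worth it.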
But if progress begins to slow down, then the economics work. Maybe Gemma 4 is a good example. It feels really generally useful. Getting it at 1/10th the cost feels like it could be competitive in 2 years.
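A back-of-the-envelope way to see when "the economics work": the fixed cost of building dedicated hardware for one model has to be paid back by per-token savings before the model goes stale. Every number below is a made-up placeholder, not real pricing; only the 10x gain comes from the thread.

```python
def break_even_tokens(fixed_cost, gpu_cost_per_m, dedicated_cost_per_m):
    """Tokens that must be served before the dedicated hardware pays for itself."""
    saving_per_m = gpu_cost_per_m - dedicated_cost_per_m
    if saving_per_m <= 0:
        raise ValueError("dedicated hardware must be cheaper per token")
    return fixed_cost / saving_per_m * 1_000_000


def months_to_break_even(fixed_cost, gpu_cost_per_m, dedicated_cost_per_m, tokens_per_month):
    """Months of serving needed to recover the fixed cost at a steady volume."""
    tokens = break_even_tokens(fixed_cost, gpu_cost_per_m, dedicated_cost_per_m)
    return tokens / tokens_per_month


# All values are hypothetical placeholders.
FIXED_COST = 5_000_000        # one-off design and build cost, dollars
GPU_COST_PER_M = 1.00         # serving cost on GPUs, dollars per million tokens
DEDICATED_COST_PER_M = 0.10   # 10x cheaper, per the claim in the thread
TOKENS_PER_MONTH = 500e9      # steady serving volume

months = months_to_break_even(
    FIXED_COST, GPU_COST_PER_M, DEDICATED_COST_PER_M, TOKENS_PER_MONTH
)
print(f"break-even after about {months:.1f} months")
```

If a better and cheaper model shows up before the break-even point, the locked-in hardware never pays off, which is why slower model progress is what makes the bet viable.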
The FPGA would be for prototyping. The real progress comes from ASICs ... exactly as we saw with Bitcoin mining. This GPU-based approach will eventually give way to bespoke circuits once everyone picks a favorite model.
Yeah, I went shopping for a new computer a couple of years ago (to replace a 7-year-old computer) and... the specs for what was for sale were the same as what I bought 7 years prior, and the price wasn't much lower.
I would much rather buy a 2026 computer than a 2019 computer. Two generations of Nvidia GPUs, Apple M series chips, the X3D AMD chips, and PCIe 5 SSDs are all major upgrades.
It’s just that the pace of new stuff is slowing down, and many people are operating under the assumption that this wave will ride on forever.