AI is not cheap to run no matter where it is running. The price we get charged today for AI is a loss-leader. The actual cost is much higher, so much higher that the average paying user today would balk at what it actually costs to run. These AI companies are trying to get people hooked on their product, to get it integrated into every business and workflow that they can, then start raising prices.
Even if you live somewhere where it does, that is not remotely "almost free", and lots of places the payback period is more in the range of 10-15 years even with subsidies.
The power efficiency alone is a strong enough pressure to use centralized model providers.
My 3090 running 24b or 32b models is fun, but I know I'm paying way more per token in electricity, on top of lower quality tokens.
It's fun to run them locally, but for anything actually useful it's cheaper to just pay API prices currently.