turns out the schizos were right. most of OpenAI's *real* investment money comes from Gulf countries. without that money flow they can't sustain the cash burn anymore.
Faster is not always better. I still remember when VS Code switched to ripgrep I had to change how I used it: before that I could just open VS Code on any folder and work in it, even if the folder contained millions of small text files. That worked fine, but once rg was picked it happily used all of my CPU cores scanning files, and the machine was unusable for a while.
To be honest I hate all the new Rust replacement tools; they introduce new behavior just for the sake of it, and it's annoying.
Tensor core performance is inversely proportional to precision across all generations (i.e., halving the precision doubles the OPS). 8-bit precision gives you the same improvement ratio. A100 supported INT4 but H100 dropped it, if I remember correctly.
So FP4/INT4 will likely see the same ~30% OPS/W improvement. You could get a separate improvement by reducing precision further, but going to 1-bit for a 4x jump feels unlikely for now.
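To make that scaling rule concrete, here's a tiny sketch; the 16-bit baseline number is arbitrary (only the ratios matter), and it assumes the "halve the bits, double the OPS" rule holds all the way down:

```python
# Illustrative only: peak throughput if OPS scale inversely with precision.
# The baseline value is arbitrary; the point is the ratios between bit widths.

BASELINE_BITS = 16
BASELINE_OPS = 1000  # hypothetical peak at 16-bit, arbitrary units

def peak_ops(bits: int) -> float:
    """Peak throughput assuming halving the bit width doubles the OPS."""
    return BASELINE_OPS * (BASELINE_BITS / bits)

for bits in (16, 8, 4, 1):
    ratio = peak_ops(bits) / BASELINE_OPS
    print(f"{bits:>2}-bit: {peak_ops(bits):6.0f} units ({ratio:.0f}x over 16-bit)")
```

Under that assumption, 4-bit is 4x over 16-bit, and 1-bit would be another 4x on top of that, which is the jump that feels unlikely for now.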
the problem is that the price point increases sharply every time.
gemini 2 flash lite was $0.3 per 1Mtok output, gemini 2.5 flash lite is $0.4 per 1Mtok output; now guess the pricing for gemini 3 flash lite.
yes, you guessed it right: $1.5 per 1Mtok output. you could have easily guessed that because google did the same thing before: gemini 2 flash was $0.4, then 2.5 flash jumped to $2.5.
and that is only the base price; in reality the newer models are all thinking models, so they burn even more tokens for the same task.
at some point it stops being viable to use the gemini api for anything.
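as a rough sketch of why the base price understates the real increase, something like this (the prices are the ones quoted above; the tokens-per-task figure and the thinking multipliers are just guesses, plug in your own workload):

```python
# Rough cost-per-task sketch. Prices are $ per 1M output tokens from the
# comment above; token counts and thinking multipliers are hypothetical.

PRICES_PER_MTOK = {
    "gemini 2 flash lite":   0.30,
    "gemini 2.5 flash lite": 0.40,
    "gemini 3 flash lite":   1.50,
}

TOKENS_PER_TASK = 2_000       # hypothetical average output tokens per task
THINKING_MULTIPLIER = {       # hypothetical extra tokens spent on reasoning
    "gemini 2 flash lite":   1.0,
    "gemini 2.5 flash lite": 2.0,
    "gemini 3 flash lite":   3.0,
}

for model, price in PRICES_PER_MTOK.items():
    tokens = TOKENS_PER_TASK * THINKING_MULTIPLIER[model]
    cost = tokens / 1_000_000 * price
    print(f"{model}: ${cost:.4f} per task")
```

with even modest thinking overhead, the per-task cost grows much faster than the headline price does.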