
~3.5x more expensive to run my benchmarks[0].

[0]: https://aibenchy.com/compare/openai-gpt-5-4-medium/openai-gp...



Sure, but it did better on the test, which matches OpenAI's claim. More bang for more buck.

Interestingly, using your tests as a comparison, 5.5 low beats 5.4 medium at 82% of the cost.[0]

[0]: https://aibenchy.com/compare/openai-gpt-5-4-medium/openai-gp...


5.5 is definitely smarter.

And yes, 5.5 low seems to be as good as 5.4 medium, but a lot cheaper. In my experience that also holds up in real-life n8n agentic usage, so using 5.5 low is the best way to go atm.


Interesting test results. GLM-5 is ranked higher than GLM-5.1.


I think it's because the tests target general intelligence, while 5.1 is likely aimed more at agentic coding and tool usage, so it lost a bit of intelligence in exchange for consistency and a tighter focus on the coding use case.



