~3.5x more expensive to run my benchmarks.[0]

[0]: https://aibenchy.com/compare/...

iosjunkie · 2026-05-08T16:29:59 1778257799

Sure, but it did better on the test, which matches OpenAI's claim. More bang for more buck.

Interestingly, using your tests as a comparison, 5.5 low beats 5.4 medium at 82% of the cost.[0]

[0]: https://aibenchy.com/compare/openai-gpt-5-4-medium/openai-gp...

XCSme · 2026-05-13T13:18:40 1778678320

5.5 is definitely smarter.

And yes, 5.5 low seems to be as good as 5.4 medium, but a lot cheaper. In my experience that also holds up in real-life n8n agentic usage, so using 5.5 low is the best way to go atm.

polski-g · 2026-05-13T13:00:20 1778677220

Interesting test results. GLM-5 is ranked better than GLM-5.1.

XCSme · 2026-05-13T13:19:24 1778678364

I think it's because the tests are aimed at general intelligence, whereas 5.1 is likely aimed more at agentic coding and tool usage, so it lost a bit of intelligence in exchange for consistency and a stronger focus on the coding use case.