And yes, 5.5 low seems to be as good as 5.4 medium while being a lot cheaper. In my experience that also holds up in real-world agentic n8n workflows, so 5.5 low is the best option atm.
I think it's because the tests measure general intelligence more, whereas 5.1 is likely aimed at agentic coding and tool usage, so it traded a bit of intelligence for consistency and a sharper focus on the coding use case.
[0]: https://aibenchy.com/compare/openai-gpt-5-4-medium/openai-gp...