4.5/4.6 were roughly the same in our testing. Opus 4.7 is smarter, but it's diff...

__s · 2026-05-28T19:06:06 1779995166

"personality issues": I could tell that Opus 4.7 takes instructions more literally, which I appreciated once I calibrated my phrasing to be more precise (I often ask it to investigate issues, and pre-4.7 it'd start making code changes instead of just giving a write-up). But I can see contexts where its handling of vague prompts would've just been worse.

swingboy · 2026-05-29T01:16:51 1780017411

Looking forward to the results. Thanks for your work.

gertlabs · 2026-05-29T03:58:50 1780027130

Appreciate that! Results are live: https://gertlabs.com/rankings

Opus 4.8 is the first tangible improvement since Opus 4.5. And it doesn't seem to have the personality problems of the last release -- I've been enjoying using it.

swingboy · 2026-05-29T11:07:55 1780052875

Nice! Looks like it’s topping the two coding ones. I noticed it is absent from the Social Intelligence board though?

gertlabs · 2026-05-29T14:28:37 1780064917

That'll populate over the next couple of weeks -- those are the live games on the spectate tab, which take a while to generate statistically worthwhile data. I'm curious how it does. From using it all day, I can say Opus 4.8 is my new favorite model, hands down.