> It's the very first LLM that I feel like I can wrangle into solving tedious, but straightforward, problems correctly. It still makes a ton of mistakes and needs to be very rigidly guided, but it does a pretty good job of tracing its own reasoning and correcting itself in a way that the other models do not.
I swear that people have said the same thing with effectively every new model that came out in the last six months.
I think it's because people walk every model up to its limits and become very aware of the tasks they can't make it do. They put a lot of work into simplifying the problem and understanding the limitations at that boundary. Then an improved model comes out, they immediately test that barrier, and they make swift progress. They also notice that the new model is natively doing tricks they used to do manually.
The reality is likely that everyone is hitting similar barriers, and the solutions are somewhat generalizable and get folded into the training of new models.
Eventually people reach the new limits and the cycle repeats.
> I swear that people have said the same thing with effectively every new model
That is definitely true, and at the same time, we can measure progress by who is making that claim. When Timothy Gowers, a Fields Medalist, says that models are now capable of "producing a piece of PhD-level research in an hour or so, with no serious mathematical input from me," we can be pretty confident that we are getting into seriously interesting territory.