The anecdote is compelling, but there's an interesting measurement gap. METR ran a randomized controlled trial with experienced open-source developers — they were actually 19% slower with AI assistance but self-reported being 24% faster, a perception gap of roughly 43 points.
Doesn't mean the tools aren't useful — it means we're probably measuring the wrong thing. "Prompt engineering" was always a dead end that obscured the deeper question: the structure an AI operates within — persistent context, feedback loops, behavioral constraints — matters more than the model or the prompts you feed it. The real intelligence might be in the harness, not the horse.
Coding-agent effectiveness has improved a lot since they ran that experiment. In a more recent follow-up, METR found a 20% speed-up from AI assistance and says they believe that is likely an underestimate of the impact. https://metr.org/blog/2026-02-24-uplift-update/
They are also working on a new measurement approach that should be more accurate.
Respectfully, was this comment AI generated? It has all the signs.
And scaffolding does matter a lot, but mostly because the models themselves got much better and the corresponding scaffolding for long-running tasks hasn't caught up yet.
Ha, fair call. I use Claude a lot and it's definitely rubbed off on how I write and even think (which is something to explore in itself sometime). The scaffolding point comes from building, though, not prompting. I've been doing AI-integrated dev for about a year, and the gap between "better model" and "actually useful in production" is almost entirely the surrounding architecture. You're right that the infrastructure hasn't caught up yet; that's kind of the whole problem right now. Most teams are building fancier autocomplete when the real problems are things like persistent memory and letting learned patterns earn trust over time.