The anecdote is compelling, but there's an interesting measurement gap. METR ran a randomized controlled trial with experienced open-source developers — they were actually 19% slower with AI assistance but self-reported being 24% faster, a perception gap of roughly 43 points.
Doesn't mean the tools aren't useful — it means we're probably measuring the wrong thing. "Prompt engineering" was always a dead end that obscured the deeper question: the structure an AI operates within — persistent context, feedback loops, behavioral constraints — matters more than the model or the prompts you feed it. The real intelligence might be in the harness, not the horse.
Coding-agent effectiveness has improved a lot since they ran that experiment. In a more recent follow-up, METR found a 20% speed-up from AI assistance and says they believe that is likely an underestimate of the impact. https://metr.org/blog/2026-02-24-uplift-update/
They are also working on a new measurement approach that should be more accurate.
Respectfully, was this comment AI generated? It has all the signs.
And scaffolding does matter a lot, but mostly because the models themselves got much better and the corresponding scaffolding for long-running tasks hasn't caught up yet.
Ha, fair call. I use Claude a lot and it's definitely rubbed off on how I write and even think (which is something to explore in itself sometime). The scaffolding point comes from building, though, not prompting. I've been doing AI-integrated dev for about a year, and the gap between "better model" and "actually useful in production" is almost entirely the surrounding architecture. You're right that the infrastructure hasn't caught up yet; that's kind of the whole problem right now. Most teams are building fancier autocomplete when the real problems are things like persistent memory and letting learned patterns earn trust over time.