"In my experience, AI still drifts from what I meant it to do on anything bigger...

eithed · 2026-05-29T17:52:15 1780077135

Depends. I'm using Sonnet here, and generally it should be as you say: the plan would produce the result.

Still, Claude will sneak things in. In a recent plan, for example, I had defined in the acceptance criteria what colours the statuses should be: green for live, blue for sold, grey for anything else. It changed this to: green for live, orange for in progress, blue for sold, red for in demolition, etc. When pressed on why it did this, it was unable to explain. This was a plan where the AC were explicitly provided from the task in Given/When/Then format and were to be adhered to strictly. I caught this during planning, but I shouldn't need to be doing that.
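One way to make this kind of drift fail loudly is to pin the AC in an executable check that the agent has to keep passing. A minimal Python sketch, assuming statuses are plain strings; `STATUS_COLOURS`, `DEFAULT_COLOUR`, `colour_for_status` and `check_acceptance_criteria` are made-up names for illustration, not taken from the plan described above:

```python
# Hypothetical names throughout; only the colour rules come from the AC:
# green for live, blue for sold, grey for anything else.
STATUS_COLOURS = {"live": "green", "sold": "blue"}
DEFAULT_COLOUR = "grey"


def colour_for_status(status: str) -> str:
    """Return the AC colour for a status, falling back to grey."""
    return STATUS_COLOURS.get(status, DEFAULT_COLOUR)


def check_acceptance_criteria() -> None:
    """Fail if the mapping drifts from the three rules in the AC."""
    # Given a live status, then the colour is green.
    assert colour_for_status("live") == "green"
    # Given a sold status, then the colour is blue.
    assert colour_for_status("sold") == "blue"
    # Given any other status, then the colour is grey.
    for other in ("in progress", "in demolition", "unknown"):
        assert colour_for_status(other) == "grey"
    # Guard against extra statuses being given their own colours.
    assert set(STATUS_COLOURS) == {"live", "sold"}


check_acceptance_criteria()
```

The last assertion is the one that would catch the change described above: adding an orange or red entry to the mapping makes the check fail, even if the three original rules still hold.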

Even with standard prompts where I tell it "Change this label from X to Y", it ended up reordering the tabs, which had nothing to do with the ask. Again I could not get it to explain why; it was so abrupt. And this was in a fresh context, with nothing polluting what I expected it to do.

I've also noticed different behaviour regarding skills: today and yesterday it was not following skill guidance at all. For example, with the skill-writing skill, I had to explicitly tell it to test skills after writing them, when that is behaviour expected by default. It was similar with other skills: I knew it should have done something per the skill guidelines, and it did not do it at all. This is new behaviour that I had not seen a week ago.

jeremyjh · 2026-05-28T22:23:46 1780007026

There are certainly domains where AI is not so effective, but I would agree that, at least in web development, if you can't get effective results from agents at this point, it is a skill issue. That skill can be learned, if you recognize that learning is part of the solution. I do think prior experience in product design, specifications and business analysis, as well as engineering leadership, are all extremely helpful. It's about putting the agent in a box so small that it really can't screw up, but it's also about being able to review design and code rigorously: to see around corners, anticipate possible weaknesses, etc. There is really nothing I have to do when working with an agent that I haven't already been doing for decades, but it seems to me that a lot of developers have never found a single bug while reviewing someone else's code.