I wonder how much of this is simply needing to adapt one's workflows to models a...

8note · 2026-04-06T16:28:41 1775492921

I've noticed a strong degradation as it's started doing more skill-like things and writing more one-off Python scripts rather than using tools.

The agent has a set of scripts that are well tested, but instead it chooses to write a new bespoke script every time it needs to do something. As a result, it writes the same bugs over and over again, plus unique new bugs every time.

SkyPuncher · 2026-04-06T16:43:01 1775493781

I'm going absolutely insane with this. Nearly all of my "agent engineering" effort is now figuring out how to keep Opus from YOLO'ing its own implementation of everything.

I've lost track of the number of times it's started a task by building its own tools. I remind it that it has a tool for doing that exact task, then it proceeds to build its own tools anyway.

This wasn't happening 2 months ago.

giwook · 2026-04-06T18:06:49 1775498809

Can you just tell it not to do that? Maybe you have to remind it every so often once context starts filling up.

SkyPuncher · 2026-04-07T16:30:20 1775579420

It just doesn't listen. Literally a conversation that I just had:

* Me: "Have Sonnet background agent do X"

* Opus: "Agent failed, I'll do it myself"

* Me: "No, have a background agent do it"

* Opus: Proceeds to do it in the foreground

* Flips keyboard

This has completely broken my workflows. I'm stuck waiting for Opus to monitor a basic task and destroy my context.

germandiago · 2026-04-07T06:19:55 1775542795

> I wonder how much of this is simply needing to adapt one's workflows to models as they evolve and how much of this is actual degradation of the model,

I also wonder how much people are willing to adapt to unreliability for the sake of laziness instead of, at some point, taking the lead and solving the problem themselves when they have the knowledge and reliable resources.

It seems to me, the way you phrase it, that anything a human comes up with when coding must go through an LLM. There are times it helps and there are tasks it performs, but I have also quite often found tasks where, had I done them myself in the first place, I would have skipped a lot of confusion, back-and-forth, and wasted time, and would have ended up with a better-coded, simpler solution.

giwook · 2026-04-07T15:10:31 1775574631

> It seems to me, the way you phrase it, that anything a human comes up with when coding must go through an LLM.

This seems like a creative interpretation. I never said anything of the sort.