There is not much explanation of WHY this is better than doing the opposite (start coding and see how it goes), or of how this would apply to Codex models.
I do exactly the same; I even developed my own workflows with Pi agent, which works really well. Here is why:
- Claude needs a lot more steering than other models: it's too eager to do stuff, does stupid things, and writes terrible code without feedback.
- Claude is very good at following a plan; you can even use a much cheaper model if the plan is good. For example, I list every single file that needs edits, with a short explanation for each.
- By the end of planning, I have a clear picture in my head of exactly what the feature will look like, and I can be pretty sure the end result will be good enough (given that the model is good at following the plan).
A lot of things don't need planning at all. Simple fixes, refactoring, simple scripts, packaging, etc. Just keep it simple.
You need to be very specific about what to build and how to build it: what tools to use, what architecture it should follow, and what libraries and frameworks it should include. You need to be a programmer to do this properly, and it still takes a lot of practice to get it right.
This is why I am a big fan of self-hosting, owning your data and using your own agent. pi is a really good example: you can have your own tooling and switch between any SOTA models from a single interface. Very nice!
This is exactly the right instinct. When you own the agent harness, you decide what's visible. I've been building my own tooling on top of Playwright for similar reasons — the feedback loop between 'what did the agent just do' and 'should I let it continue' is the core UX of any agent, not a detail to be abstracted away. Hiding it breaks the only trust mechanism the user has.
My non-technical friend, who never learned coding, doesn't know Linux and has zero sysadmin experience, does this; he can do anything and doesn't even know what Claude is doing. He recently learned some concepts like Docker and SSH, but that's basically it.
I find templates atrocious to use for component fragments like this, that's why I wrote a Python component library when I started using Django with HTMX. Order of magnitude more pleasant to use, works with _every_ Python web framework not just Django: https://compone.kissgyorgy.me/
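To illustrate the general idea of components as plain Python instead of template fragments, here is a minimal sketch. It is NOT compone's actual API; the helper names `attrs`, `button` and `todo_item` are hypothetical, and the only real pieces are the standard-library `html.escape` and the HTMX attributes `hx-delete` and `hx-target`. Each component is just a function that takes arguments and returns an HTML string.

```python
from html import escape


def attrs(**kwargs: str) -> str:
    """Render keyword arguments as HTML attributes; underscores become dashes (hx_post -> hx-post)."""
    parts = [f'{k.replace("_", "-")}="{escape(v)}"' for k, v in kwargs.items()]
    return (" " + " ".join(parts)) if parts else ""


def button(label: str, **kwargs: str) -> str:
    """Hypothetical reusable component: typed arguments in, escaped HTML out."""
    return f"<button{attrs(**kwargs)}>{escape(label)}</button>"


def todo_item(text: str, item_id: int) -> str:
    """Components compose by calling each other; no template include/with syntax needed."""
    delete = button("Delete", hx_delete=f"/todos/{item_id}", hx_target="closest li")
    return f"<li>{escape(text)} {delete}</li>"


if __name__ == "__main__":
    print(todo_item("Buy milk & eggs", 3))
```

Because the output is an ordinary string, any Python web framework can return it from a view, which is the framework-independence the comment describes.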
Imagine if Cargo was not first-party, but a third-party tool belonging to a vc startup with zero revenue.
Then that startup makes rustup, rustfmt and rust-analyzer.
Great, but I would be more comfortable with the ecosystem if at least the rust-analyzer and rustfmt parts had competitive alternatives.
The biggest difference is state, but that's also fairly easy to handle from a CLI: the tool just has to store it on disk, not in process memory.
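A minimal sketch of that pattern, assuming a JSON file as the store: every invocation loads state from disk, applies one command, writes the state back and exits, so nothing has to live in a long-running process. The file location, the `add`/`list` commands and the function names are all hypothetical.

```python
import json
import sys
import tempfile
from pathlib import Path

# Hypothetical state file location; a real tool might use a per-user config or cache dir.
DEFAULT_STATE = Path(tempfile.gettempdir()) / "example-cli-state.json"


def load_state(path: Path) -> dict:
    """Read state from disk; start empty if the file does not exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename it over the original so a crash can't leave half a file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def run(args: list[str], path: Path = DEFAULT_STATE) -> str:
    """One CLI invocation: load state, apply a single command, persist if changed."""
    state = load_state(path)
    history = state.setdefault("history", [])
    if args and args[0] == "add":
        history.append(" ".join(args[1:]))
        save_state(path, state)
        return f"added item #{len(history)}"
    if args and args[0] == "list":
        return "\n".join(history)
    return "usage: add <text> | list"


if __name__ == "__main__":
    print(run(sys.argv[1:]))
```

Running `add` twice as two separate processes and then `list` shows both items, because the only state shared between invocations is the file.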