Hacker News | new | past | comments | ask | show | jobs | submit | kissgyorgy's comments

A very good example of this is playwright-cli vs Playwright MCP: https://github.com/microsoft/playwright-cli

The biggest difference is state, but that's also kind of easy from a CLI: the tool just has to store it on disk, not in process memory.
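A minimal sketch of that idea (the file location and state shape are just an illustration): a CLI tool that keeps its session state in a JSON file between invocations, so each new process picks up where the last one left off.

```python
import json
from pathlib import Path

# Hypothetical state file location; a real tool would use a proper config dir.
STATE_FILE = Path("/tmp/mycli-state.json")

def load_state() -> dict:
    # Each CLI invocation starts by reading the previous state from disk.
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {}

def save_state(state: dict) -> None:
    # Persist the state so the next invocation (a new process) can resume it.
    STATE_FILE.write_text(json.dumps(state))

state = load_state()
state["invocations"] = state.get("invocations", 0) + 1
save_state(state)
```

Nothing here is long-lived: the process exits after every command, and the disk file is the only thing carrying state across runs.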


There is not a lot of explanation of WHY this is better than doing the opposite (just start coding and see how it goes), or how this would apply to Codex models.

I do exactly the same thing; I even developed my own workflows with the Pi agent, which works really well. Here is the reason:

- Claude needs a lot more steering than other models; it's too eager to do stuff, does stupid things, and writes terrible code without feedback.

- Claude is very good at following a plan; you can even use a much cheaper model if you have a good one. For example, I list every single file that needs edits, with a short explanation.

- At the end of the planning, I have a clear picture in my head of exactly how the feature will look, and I can be pretty sure the end result will be good enough (given that the model is good at following the plan).

A lot of things don't need planning at all. Simple fixes, refactoring, simple scripts, packaging, etc. Just keep it simple.


You need to be very specific about what to build and how to build it: what tools to use, what architecture it should have, what libraries and frameworks it should include. You need to be a programmer to be able to do this properly, and it still takes a lot of practice to get it right.


This is why I am a big fan of self-hosting, owning your data and using your own Agent. pi is a really good example. You can have your own tooling and can switch any SOTA model in a single interface. Very nice!

https://lucumr.pocoo.org/2026/1/31/pi/


This is exactly the right instinct. When you own the agent harness, you decide what's visible. I've been building my own tooling on top of Playwright for similar reasons — the feedback loop between 'what did the agent just do' and 'should I let it continue' is the core UX of any agent, not a detail to be abstracted away. Hiding it breaks the only trust mechanism the user has.


My non-technical friend (never learned coding, doesn't know Linux, zero sysadmin experience) does this, and he can do anything without even knowing what Claude is doing. He learned some concepts recently, like Docker and SSH, but that's basically it.


I find templates atrocious to use for component fragments like this, which is why I wrote a Python component library when I started using Django with HTMX. It's an order of magnitude more pleasant to use and works with _every_ Python web framework, not just Django: https://compone.kissgyorgy.me/
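As a generic illustration of the idea (this is NOT compone's actual API, just a hand-rolled sketch): a "component" can simply be a Python function returning an HTML fragment, which composes with plain Python instead of template syntax and is therefore framework-agnostic.

```python
from html import escape

def Button(label: str, url: str) -> str:
    # A component is just a function returning an HTML string,
    # so it can carry HTMX attributes like any other markup.
    return f'<button hx-get="{escape(url)}" hx-swap="outerHTML">{escape(label)}</button>'

def TodoItem(text: str, done: bool) -> str:
    # Components nest by plain function calls; no template language needed.
    mark = "x" if done else " "
    return f"<li>[{mark}] {escape(text)} {Button('Toggle', '/toggle')}</li>"

def TodoList(items: list[tuple[str, bool]]) -> str:
    # Composition and loops are ordinary Python expressions.
    return "<ul>" + "".join(TodoItem(text, done) for text, done in items) + "</ul>"
```

Any view in any Python web framework can return such a string directly as an HTMX fragment response.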


It's just simple validation with some error logging. It should be done the same way as for humans or any other input that goes into your system.

An LLM provides inputs to your system like any human would, so you have to validate them. Something like pydantic or Django forms is good for this.


I agree. Agentic use isn't always necessary. Most of the time it makes more sense to treat LLMs like a dumb, unauthenticated human user.


But hey! At least these four AI components made it in, so the important stuff is okay...


I simply forbid dangerous commands, or force Claude Code to ask for permission to run them. Here are my command validation rules:

    (
        r"\bbfs.*-exec",
        decision("deny", reason="NEVER run commands with bfs"),
    ),
    (
        r"\bbfs.*-delete",
        decision("deny", reason="NEVER delete files with bfs."),
    ),
    (
        r"\bsudo\b",
        decision("ask"),
    ),
    (
        r"\brm.*--no-preserve-root",
        decision("deny"),
    ),
    (
        r"\brm.*(-[rRf]+|--recursive|--force)",
        decision("ask"),
    ),

find and bfs with -exec are forbidden, because when the model notices it can't delete, it works around the restriction with very creative solutions :)


This feels a lot like trying to sanitize database inputs instead of using prepared statements.


What's the equivalent of prepared statements when using AI agents?


Don't have the AI run the commands. You read them, consider them, and then run them yourself.


Why is that a good thing?


I don't think that the whole ecosystem should be dominated by a single VC backed startup.

I want my tools to be interchangeable and to play well with other choices.

Having multiple big players helps with that.


Maybe I'm wrong on this, but I'd rather have one tool that everyone else is using. Cargo in the Rust ecosystem works really well; everyone loves it.


Imagine if Cargo were not first-party, but a third-party tool belonging to a VC startup with zero revenue.

Then that startup makes rustup, rustfmt and rust-analyzer. Great, but I would be more comfortable with the ecosystem if at least the rust-analyzer and rustfmt parts had competitive alternatives.

