the article conflates two different things: what's easy for an LLM to write vs. what's safe to run. python wins on the first axis (smaller context, more training data, faster iteration). go wins on the second (compile errors caught early, explicit error handling). for a long-running autonomous agent that can't be supervised, you probably want the second. for a human-in-the-loop workflow where you can just rerun, the first.
tradeoff worth naming: you avoid the autodiff graph overhead (hence the speedup), but any architecture change means rewriting every gradient by hand. fine for a pedagogical project, but that's exactly why autodiff exists.
reproducibility isn't really the goal imo. more like a decision audit trail -- same reason code comments have value even though you can't regenerate the code from them. six months later when you're debugging you want to know 'why did we choose this approach' not 'replay the exact conversation.'
different threat model. cloudflare blocks automation that pretends to be human -- scraping, fake clicks, account stuffing. webmcp is a site explicitly publishing 'here are the actions i sanction.' you can block selenium on login and expose a webmcp flight search endpoint at the same time. one's unauthorized access, the other's a published api.
the antml: namespace prefix is doing extra work here too -- even if user input contains invoke tags, they won't collide with tool calls because the namespace differs. not just xml for structure but namespaced xml for isolation.
the output format (ascii/json/markdown) is one piece, but the other side is input schema. mcp declares what args are valid and their types upfront, so the model can't hallucinate a flag that doesn't exist. cli tools don't expose that contract unless you parse --help output, which is fragile.
So far, cli --help seems to work quite well. I'm optimizing the cli to interact with the agent, e.g., commands that describe exactly what output is expected for the cli DSL, error messages that contain DSL examples that exactly describe the agent how to fix bugs, etc. Overall i think the DSL is more token efficient that a similar JSON, and easier to review for humans.
fair point on token efficiency -- dsls are usually tighter than json. where i see mcp still winning is tool discovery: the client learns what tools exist and what args they take without having to try calling them first or knowing the commands upfront. with cli you have to already know the tool exists.
yeah gh cli in particular is lean. though `gh pr view --json body,comments` can still flood context fast. the real win here is gatekeeping what hits context at all, regardless of source.
reply