Hacker News | new | past | comments | ask | show | jobs | submit | saltpath's comments

Read-only by design is a smart constraint for agent tooling — eliminates a whole class of "oops the LLM dropped my table" failure modes. Curious about a couple things: how do you handle schema introspection? Do the tools auto-discover tables/columns or is there a config step? And for the query tools, is there any cost/complexity guardrail (e.g. preventing a full sequential scan on a 500M row table)?

No config step, the tools discover everything from pg_catalog at call time. list_schemas → list_tables → describe_table is the typical agent workflow, and there's a query_guide prompt baked in that suggests that progression.
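A minimal sketch of what call-time discovery against pg_catalog can look like. These are illustrative queries, not the tool's actual internals, and the tool names (list_schemas, list_tables, describe_table) are just the progression described above:

```python
# Illustrative pg_catalog queries for the list_schemas -> list_tables ->
# describe_table progression (a sketch, not the tool's real implementation).

LIST_SCHEMAS = """
SELECT nspname
FROM pg_catalog.pg_namespace
WHERE nspname NOT LIKE 'pg_%' AND nspname <> 'information_schema'
ORDER BY nspname;
"""

LIST_TABLES = """
SELECT c.relname
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = %(schema)s AND c.relkind = 'r'
ORDER BY c.relname;
"""

DESCRIBE_TABLE = """
SELECT a.attname, pg_catalog.format_type(a.atttypid, a.atttypmod) AS type
FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = %(schema)s AND c.relname = %(table)s
  AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;
"""
```

Because everything comes from the system catalogs at call time, new tables show up immediately with no config refresh.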

On query guardrails: every query runs in a readonly transaction and results are capped at 500 rows via a wrapping SELECT * FROM (...) sub LIMIT 500. There's also explain_query which returns the plan without executing, so agents can check before running something expensive. That said, there's no cost-based gate that blocks a bad plan automatically; that's an interesting idea worth exploring.
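The row-cap wrapping is simple enough to sketch. This is an assumed shape (the function name and cap constant are mine), showing only the string rewrite; in the server it would run inside a read-only transaction:

```python
ROW_CAP = 500  # the cap mentioned above; constant name is an assumption

def cap_query(sql: str, limit: int = ROW_CAP) -> str:
    """Wrap an arbitrary SELECT so it can never return more than `limit` rows.
    The trailing semicolon is stripped so the statement nests cleanly."""
    inner = sql.strip().rstrip(";")
    return f"SELECT * FROM ({inner}) sub LIMIT {limit}"

# Sketch only: the real tool would execute this after something like
# BEGIN TRANSACTION READ ONLY, so writes fail at the transaction level
# even if the cap wrapping were somehow bypassed.
```

The nice property is that the cap is enforced in SQL, not by truncating a result set client-side, so the server never materializes a huge response in the first place.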


The parallel execution model makes sense for independent tickets but I'm wondering what happens when agent A is halfway through a PR touching shared/utils.py and agent B gets assigned a ticket that needs the same file. Does the orchestrator do any upfront dependency analysis to detect that, or do you just let them both run and deal with the conflict at merge time?

It's generally not worth worrying about beyond a very high level; let them fight it out, as long as your test suite is good enough and your orchestrator is at least moderately prepared to handle retries.

The token cost difference is the metric nobody's capturing. 5K vs 210K tokens for the same JWT forgery isn't just efficiency — it's the blast surface. A contained agent leaves a narrow call trace. A thrashing one touches five APIs, retries three times, leaks context in every hop. If your proxy logs the full call chain with timestamps and response sizes per hop, that cost delta becomes a measurable risk signal, not just a billing line. The hard part isn't the instrumentation, it's getting teams to route agent traffic through anything they don't own.
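A per-hop record like the one described (timestamps, response sizes, tokens) is enough to turn the cost delta into a signal. This is an illustrative schema, not any particular proxy's format:

```python
from dataclasses import dataclass

@dataclass
class HopRecord:
    # One entry per outbound call in the agent's chain
    # (illustrative schema, not a real proxy's log format).
    ts: float           # wall-clock timestamp of the call
    endpoint: str
    request_bytes: int
    response_bytes: int
    tokens: int         # tokens spent on this hop, if the provider reports them

def chain_cost(records: list[HopRecord]) -> dict:
    """Aggregate a call chain into the risk signal described above:
    hop count plus total tokens, so a thrashing agent stands out
    against a contained one."""
    return {
        "hops": len(records),
        "total_tokens": sum(r.tokens for r in records),
        "total_response_bytes": sum(r.response_bytes for r in records),
    }
```

With this in place, "5K tokens in 2 hops" vs "210K tokens across 15 hops" for the same outcome is a one-line comparison rather than a forensic exercise.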


The blast radius framing is right, but the tooling gap goes beyond debugging: it's about third-party verifiability. A regulator or auditor can't trust a log produced by the same operator who runs the agent.

Spent the last few months on this specific problem: a chain hash per outbound call plus an external timestamp, so anyone can independently verify what the agent called, when, and what it got back. It works across providers, which matters when you're chaining Claude -> Mistral -> an internal endpoint.
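The hash-chaining part is standard and easy to sketch. This is a generic illustration of the idea, not the linked service's actual format:

```python
import hashlib
import json

def chain_entry(prev_hash: str, call: dict) -> str:
    """Hash-chain one outbound call onto the previous entry.
    Tampering with any earlier record changes every later hash.
    (Generic sketch; the real service's record format isn't shown here.)"""
    payload = json.dumps(call, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()

genesis = "0" * 64
h1 = chain_entry(genesis, {"endpoint": "claude", "req": "...", "resp": "..."})
h2 = chain_entry(h1, {"endpoint": "mistral", "req": "...", "resp": "..."})
# h2 now commits to both calls in order. Getting an external timestamp on h2
# (e.g. from an RFC 3161 TSA) fixes the whole chain in time without trusting
# the operator's clock.
```

The external timestamp is the piece that makes it third-party verifiable: the chain alone proves ordering and integrity, but only relative to whoever holds the head.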

Early days but if useful for the nist response: https://arkforge.tech/trust/v1/proof/prf_20260310_182226_cbc...


The "invisible" goal is harder than it sounds in air-gapped setups. We run AKS for a public sector client — private API server, no public egress, Azure Firewall with explicit allowlists. K8s is the right call, but invisible it is not. Podman for builds works fine until someone adds a base image that isn't mirrored locally. Then you get a silent pull failure at 2am.

Most tooling just assumes outbound connectivity. Helm charts, operators, even some CNI plugins phone home somewhere at install. You don't find out until it breaks in prod.

Not disagreeing with the direction, just that "invisible infrastructure" means something different when egress is locked down by policy, not convention.


There's a related but distinct problem downstream: once the agent is running in production, verification debt shifts from code to execution. Internal logs of what the agent called and what it received are mutable — if a provider disputes delivery or compliance requires an audit trail, "we have logs" is a weak defense. The deterministic verification (tests, linters, CI) handles the code side. The execution side is a different problem: you need immutable witnesses at call time, before the agent proceeds, not post-hoc reconstructions.


CI pass/fail captures regression, but there's a layer beneath it that benchmarks can't touch: what exactly did the agent submit to each external API, and can you prove it after the fact? In the benchmark context this doesn't matter, since everything runs locally. In production it does. The agent calls a third-party service at 2am, the service claims it returned an error, your agent retried and billed you twice. Your logs say one thing, their logs say another. The integrity problem isn't just "did the code work"; it's "what was the exact request/response pair, timestamped, by whom, provably." CI solves the first. Something else has to solve the second.


The concrete scenario is AI agent execution. Agent calls a third-party API at 3am — you have your logs, they have theirs, and if there's a dispute you're comparing two mutable records. Sealing both the request hash and response hash with an RFC 3161 timestamp before the agent proceeds gives you a neutral third-party witness. Not zero-knowledge — content is visible — but for audit and dispute resolution it covers 90% of real cases and is deployable today without ZKP overhead.
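Only the hashing step is sketched here; in production the final digest would go to an RFC 3161 TSA (e.g. via a library like rfc3161ng) and the signed timestamp token would be stored before the agent proceeds:

```python
import hashlib

def seal_digest(request_body: bytes, response_body: bytes) -> bytes:
    """Single digest committing to the exact request/response pair.
    This is the value you'd submit to an RFC 3161 timestamp authority;
    the TSA interaction itself is omitted from this sketch."""
    req_h = hashlib.sha256(request_body).digest()
    resp_h = hashlib.sha256(response_body).digest()
    return hashlib.sha256(req_h + resp_h).digest()
```

In a later dispute, either party can rehash the raw request and response they hold; if the recomputed digest matches the timestamped one, the TSA's signature proves the pair existed unaltered at that time, without the TSA ever seeing the content.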

