Anthropic just showed the market that AI can find security vulnerabilities. Rigour goes deeper: SOLID violations, design smells, architectural debt. The stuff that doesn't crash your app today but kills your velocity in six months.
The bans are treating the symptom. The root cause is that AI coding agents optimize for output that looks correct over output that fails safely. I audited the OpenClaw codebase before the bans started, and structurally it's impressive: clean architecture, good patterns. But underneath there's systematic error suppression everywhere. The agent learned that empty catch blocks make tests pass. Banning OpenClaw doesn't solve this; every AI-generated codebase I've scanned shows the same patterns. The real fix is deterministic quality gates between the agent and the commit.
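To make that concrete, here's a minimal sketch of the suppression pattern in TypeScript. This is an illustrative reduction, not code from OpenClaw; the function and types are hypothetical:

```typescript
import { promises as fs } from "node:fs";

interface Config {
  apiKey: string;
}

// The catch clause satisfies the compiler and the happy-path tests,
// but erases every signal that something went wrong.
async function loadConfig(path: string): Promise<Config | null> {
  try {
    const raw = await fs.readFile(path, "utf8");
    return JSON.parse(raw) as Config;
  } catch {
    return null; // swallowed: no log, no rethrow, no metric
  }
}
```

Nothing here is a vulnerability a security scanner would flag, which is exactly why it survives review.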
Interesting framing. The runtime interception approach makes sense for catching dangerous syscalls in real time. There's a complementary angle, though: gating code quality before it ever runs. When I audited the OpenClaw codebase, the most concerning pattern wasn't what the code does at runtime but that it silently suppresses errors at the source level. Empty catch blocks, swallowed exceptions, error paths that return nil without logging.
No runtime sandbox catches that, because the code isn't doing anything 'dangerous'; it's just quietly failing. The full safety stack probably needs both: pre-commit quality gates on what the agent writes, and runtime interception on what it executes.
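For a sense of what a pre-commit gate can look like, here's a minimal sketch using the TypeScript compiler API to flag empty catch clauses. The script and its scope are illustrative; a real gate would also cover swallowed exceptions and unlogged error returns:

```typescript
import ts from "typescript";
import { readFileSync } from "node:fs";

// Walk a source file's AST and report catch clauses with empty bodies.
function findEmptyCatches(path: string): string[] {
  const source = ts.createSourceFile(
    path,
    readFileSync(path, "utf8"),
    ts.ScriptTarget.Latest,
    /*setParentNodes*/ true,
  );
  const findings: string[] = [];

  const visit = (node: ts.Node): void => {
    if (ts.isCatchClause(node) && node.block.statements.length === 0) {
      const { line } = source.getLineAndCharacterOfPosition(node.getStart(source));
      findings.push(`${path}:${line + 1} empty catch block`);
    }
    ts.forEachChild(node, visit);
  };
  visit(source);
  return findings;
}

// Run against the staged files passed by the commit hook;
// exit nonzero so the commit is blocked when findings exist.
const findings = process.argv.slice(2).flatMap(findEmptyCatches);
findings.forEach((f) => console.error(f));
process.exit(findings.length === 0 ? 0 : 1);
```

In practice you'd likely start with existing deterministic checks (ESLint's no-empty rule, or a Semgrep pattern) rather than a custom walker; the point is that the check runs before the code ever does.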
I ran a similar audit two weeks ago using a different methodology: deterministic quality gates rather than traditional CVE scanning. The interesting finding wasn't the security vulnerabilities (Cisco's 512 CVEs cover that); it was the AI drift patterns underneath: systematic error suppression, silent catch blocks, empty error handlers throughout the codebase. The code scores exceptionally well on structural metrics, with clean architecture and good separation of concerns, but the AI agent optimized for 'compiles and passes tests' over 'fails safely.'
That's a pattern I've now seen across multiple AI-generated codebases. Traditional security scanners miss it entirely because it's not a vulnerability; it's a design philosophy baked in by the generation process. I published the full analysis with specific line numbers and commit hashes: [https://medium.com/@erashu212/i-ran-quality-gates-against-op...]
The real issue is that we're building entire development workflows on subsidized inference that was never priced to be used this way.
OpenClaw burns tokens at a rate these $200/month plans were never designed for.
The fix isn't nicer ban policies; it's either honest API pricing or local models good enough for the job.
The 0.5B-3B parameter range is already surprisingly capable for code analysis tasks.
That's where this is heading whether Google likes it or not.
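For a sense of what that looks like today, here's a minimal sketch that asks a small local model to review a snippet through Ollama's HTTP API. It assumes an Ollama server on the default port with a small coder model already pulled; the model name and prompt are illustrative assumptions, not a recommendation:

```typescript
// Prereq (assumption): ollama pull qwen2.5-coder:1.5b

const snippet = `
try { await flushQueue(); } catch {}
`;

// Non-streaming generate call against the local Ollama server.
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5-coder:1.5b",
    prompt: `Does this code suppress errors silently? Answer briefly.\n${snippet}`,
    stream: false,
  }),
});
const { response } = (await res.json()) as { response: string };
console.log(response);
```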