I find that adversarial multi agent setups eventually fall down because one side or the other always manages to convince the other side to give up given enough time.
I’ve tried all sorts of things to keep Claude from cheating, but the only one that works is to restrict access to the tests files, which obviously isn’t a real solution.
We recently had an “AI week” at work and I spent $1000 in tokens trying out different iterations of this.
My setup always has an adversarial loop, when speccing, when planning, when building. It never finds everything though, even obvious things. So I have copilot check at the end and it finds things. But for whatever reason, maybe magic prompting, I don’t know, but I have a Claude routine that checks all commits of the last 24 hours and it has fresh PRs every morning. Legit ones too.
I’ve tried all sorts of things to keep Claude from cheating, but the only one that works is to restrict access to the tests files, which obviously isn’t a real solution.
We recently had an “AI week” at work and I spent $1000 in tokens trying out different iterations of this.