Hacker Times
We Benchmarked Frontier LLMs on Defensive Security. The Results Surprised Us (cotool.ai)
6 points by logancarmody 4 months ago | 2 comments


Interesting. I wonder if Gemini 3 reverses that performance trend, or if the agent harness lent itself better to OpenAI / Anthropic models than to Google's.

Would like to see this on more open-source agent harnesses and tools.


agreed - would be interesting



