We Benchmarked Frontier LLMs on Defensive Security. The Results Surprised Us | Hacker News


We Benchmarked Frontier LLMs on Defensive Security. The Results Surprised Us (cotool.ai)
6 points by logancarmody 4 months ago | 2 comments

mmpollard 4 months ago

Interesting. I wonder if Gemini 3 reverses that performance trend, or if the agent harness lent itself to OpenAI / Anthropic more than to Google.

Would like to see this on more open-source agent harnesses and tools.

hunterwalk 4 months ago

agreed - would be interesting
