Submissions from swebench.com

		MiniMax M2.5 is beating Claude Opus 4.6 and MiniMax is 17x-20x cheaper (swebench.com)
		6 points by thelinuxkid 32 days ago | past | 9 comments
		Show HN: Randomly switching between LMs at every step boosts SWE-bench score (swebench.com)
		5 points by lieret 7 months ago | past | 1 comment
		Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds (swebench.com)
		2 points by lieret 8 months ago | past
		New leader on swe-bench multimodal (swebench.com)
		3 points by katrin777 9 months ago | past
		SWE-bench just published an updated list of top AI Agents (swebench.com)
		4 points by laxyz 9 months ago | past
		SWE-bench (swebench.com)
		1 point by katrin777 10 months ago | past
		Refact.ai is the new open-source SOTA on SWE-bench Verified and Lite (swebench.com)
		3 points by bystrakowa 10 months ago | past
		New #1 SOTA on Swe-bench is using Claude 3.7 and O1 (swebench.com)
		3 points by knes 12 months ago | past
		Gru.ai Got 35.67% on SWEbench (swebench.com)
		2 points by BabelCLoud on Aug 15, 2024 | past
		Amazon Q Developer Agent is now SOTA on SWE-bench (swebench.com)
		4 points by brendanfalk on May 14, 2024 | past
		SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
		1 point by goranmoomin on March 13, 2024 | past
		Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
		1 point by throw2321 on Nov 8, 2023 | past
		SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
		2 points by cjsaltlake on Oct 13, 2023 | past
		SWE-Bench Can Language Models Resolve Real-World GitHub Issues? (swebench.com)
		3 points by EvgeniyZh on Oct 10, 2023 | past