| | MiniMax M2.5 is beating Claude Opus 4.6 and MiniMax is 17x-20x cheaper (swebench.com) |
| 6 points by thelinuxkid 32 days ago | past | 9 comments |
|
| | Show HN: Randomly switching between LMs at every step boosts SWE-bench score (swebench.com) |
| 5 points by lieret 7 months ago | past | 1 comment |
|
| | Show HN: New SWE-bench leaderboard compares LMs without fancy agent scaffolds (swebench.com) |
| 2 points by lieret 8 months ago | past |
|
| | New leader on SWE-bench Multimodal (swebench.com) |
| 3 points by katrin777 9 months ago | past |
|
| | SWE-bench just published an updated list of top AI Agents (swebench.com) |
| 4 points by laxyz 9 months ago | past |
|
| | SWE-bench (swebench.com) |
| 1 point by katrin777 10 months ago | past |
|
| | Refact.ai is the new open-source SOTA on SWE-bench Verified and Lite (swebench.com) |
| 3 points by bystrakowa 10 months ago | past |
|
| | New #1 SOTA on SWE-bench is using Claude 3.7 and o1 (swebench.com) |
| 3 points by knes 12 months ago | past |
|
| | Gru.ai Got 35.67% on SWEbench (swebench.com) |
| 2 points by BabelCLoud on Aug 15, 2024 | past |
|
| | Amazon Q Developer Agent is now SOTA on SWE-bench (swebench.com) |
| 4 points by brendanfalk on May 14, 2024 | past |
|
| | SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com) |
| 1 point by goranmoomin on March 13, 2024 | past |
|
| | Can Language Models Resolve Real-World GitHub Issues? (swebench.com) |
| 1 point by throw2321 on Nov 8, 2023 | past |
|
| | SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com) |
| 2 points by cjsaltlake on Oct 13, 2023 | past |
|
| | SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? (swebench.com) |
| 3 points by EvgeniyZh on Oct 10, 2023 | past |
|