| | Codex Daily Benchmarks for Degradation Tracking (Marginlab.ai) (marginlab.ai) |
| 1 point by wendgeabos 67 days ago |
|
| | Claude Code daily benchmarks for degradation tracking (marginlab.ai) |
| 760 points by qwesr123 67 days ago | 354 comments |
|
| | No one is evaluating AI coding agents in the way they are used (marginlab.ai) |
| 1 point by qwesr123 83 days ago |
|
| | Claude Code Daily Degradation Tracker (marginlab.ai) |
| 3 points by qwesr123 87 days ago | 3 comments |
|
| | Anatomy of a Coding Agent: A step-by-step illustration (marginlab.ai) |
| 3 points by qwesr123 3 months ago |
|
| | How are coding assistants evaluated? SWE-Bench Pro Explorer (marginlab.ai) |
| 2 points by qwesr123 3 months ago |
|
| | SWE-Bench: The $500B Benchmark (marginlab.ai) |
| 5 points by qwesr123 3 months ago |
|