
Even the maths benchmarks only checked the final numerical answer against the ground truth, which means an LLM can output a lot of nonsense, guess the correct answer, and still pass.
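
To make that concrete, here's a minimal sketch of what answer-only grading looks like. This is not any specific benchmark's harness; `final_answer`, `grade_answer_only`, and the two sample responses are made up for illustration, and "take the last number in the output" is an assumed extraction rule.

```python
import re
from typing import Optional


def final_answer(response: str) -> Optional[str]:
    """Return the last number appearing in the response, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    return numbers[-1] if numbers else None


def grade_answer_only(response: str, ground_truth: str) -> bool:
    """Pass iff the last number in the response equals the ground truth.

    Nothing here inspects the reasoning that precedes the final number.
    """
    answer = final_answer(response)
    return answer is not None and float(answer) == float(ground_truth)


# Hypothetical responses to "What is half of 12 * 12?"
sound = "12 * 12 = 144, and 144 / 2 = 72. Answer: 72"
nonsense = "12 * 12 = 100, and half of 100 is 90, so roughly... Answer: 72"

print(grade_answer_only(sound, "72"))
print(grade_answer_only(nonsense, "72"))
```

Both calls pass, since the grader never looks at anything but the last number. Catching the second response would need some check on the intermediate steps, which this kind of grader doesn't do.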

