
I'm not convinced it's even a marginally useful metric for measuring LLM performance.


The question I quoted about hummingbird anatomy? That is the point. That is why this evaluation explicitly decides not to go down that route.

Nobody (well, not me anyway) wants to convince you that it is useful. That is the kind of question the authors of this evaluation looked at. They also felt what you are feeling, and decided to do something which doesn’t require that kind of deep, specialist knowledge. And that is what they describe in the paper’s title as “PhD knowledge not required”.



