
Ironically, scaling limits and evidence that quality vastly outweighs quantity suggest that all that web data is much less useful than buying and scanning a book. Most work with the Common Crawl data, for example, has ended up focusing on filtering out vast amounts of it as mostly useless for training purposes.

There was a hot minute in 2023 where it looked like we could just scale data and compute to the moon. Shockingly, it turns out there are limits to that approach.


