Yeah I'm not sure what the exact context of the statement is. I am absolutely ce...

necovek · 2026-05-21T19:10:59 1779390659

Let's stick to comparing language skills to language skills: at least in my experience with my two kids, they learn word formation patterns before they turn 2. That's easy to notice, because you see them make mistakes on exceptions (applying the regular pattern where an irregular form is expected).

LLMs needed how much training data to be able to do so?

FWIW, I still see LLMs make up wrong words that follow no grammatical pattern, especially in Serbian, where there is less training data.

Serbian is pretty complex, though: https://www.languagegrowth.com/en/blog/serbian-grammar-basic... This made it even more surprising to see the kids pick up these patterns so early, when their vocabulary is probably not even 2,000 words yet.

staticman2 · 2026-05-21T12:31:41 1779366701

Hinton says things like:

"...we're optimized for having not many experiences. You only live for about a billion seconds—that's assuming you don't learn anything after you're 30, which is pretty much true. So you live for about a billion seconds and you've got a 100 trillion connections. So [you've] got crazily more parameters than you have experiences. So our brains [are] optimized for making the best use of not very many experiences."

necovek · 2026-05-21T19:06:21 1779390381

A billion seconds is around 32 years, so I'd say we live for about two billion seconds.

But that's a good way to look at it: in 2B seconds, how many experiences can we get?
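A quick back-of-the-envelope check of the numbers above, as a sketch assuming a 365.25-day year. The 100 trillion connections figure comes from the Hinton quote; the function names are just illustrative:

```python
# Back-of-the-envelope check of the lifetime arithmetic in this thread.
# Assumption: a 365.25-day (Julian) year.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 s


def seconds_to_years(seconds: float) -> float:
    """Convert a duration in seconds to years."""
    return seconds / SECONDS_PER_YEAR


def connections_per_second(connections: float, seconds: float) -> float:
    """Hinton's ratio: synaptic connections per second of lived experience."""
    return connections / seconds


CONNECTIONS = 100e12  # "a 100 trillion connections", per the Hinton quote

for lifetime_s in (1e9, 2e9):
    years = seconds_to_years(lifetime_s)
    ratio = connections_per_second(CONNECTIONS, lifetime_s)
    print(f"{lifetime_s:.0e} s = {years:.1f} years, "
          f"{ratio:,.0f} connections per second of experience")
```

Doubling the lifetime only halves the ratio, so either way there are vastly more "parameters" than seconds of experience, which is Hinton's point.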

KalMann · 2026-05-21T18:49:18 1779389358

I think this is a disingenuous comparison. When we read a book, we can estimate the amount of data we're taking in based on the character count (each character being represented by some fixed number of bits).

What you're suggesting, on the other hand, is something akin to counting the number of pixels on each page we look at. That's an absurd overestimate of the amount of data a person reading is actually taking in.
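To put rough numbers on the two ways of counting, here is a sketch for a single page. Every page parameter is a hypothetical value chosen only for illustration (about 2,000 characters at a fixed 8 bits each, versus a US Letter page scanned at 150 DPI as 8-bit grayscale):

```python
# Rough comparison of "text bits" vs "pixel bits" for one printed page.
# All page parameters below are hypothetical, chosen only for illustration.


def text_bits(chars: int, bits_per_char: int = 8) -> int:
    """Estimate from character count at a fixed number of bits per character."""
    return chars * bits_per_char


def pixel_bits(width_px: int, height_px: int, bits_per_px: int = 8) -> int:
    """Raw size of the page as an uncompressed grayscale image."""
    return width_px * height_px * bits_per_px


CHARS_ON_PAGE = 2_000           # hypothetical: a dense paperback page
PAGE_W, PAGE_H = 1_275, 1_650   # hypothetical: 8.5 x 11 in at 150 DPI

t = text_bits(CHARS_ON_PAGE)
p = pixel_bits(PAGE_W, PAGE_H)
print(f"text:   {t:,} bits")
print(f"pixels: {p:,} bits")
print(f"ratio:  {p / t:,.0f}x")
```

The exact ratio depends entirely on the assumed resolution and character count, but under these assumptions counting pixels inflates the estimate by roughly three orders of magnitude.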

necovek · 2026-05-21T19:14:44 1779390884

I believe there is a point here: we simultaneously ingest words and glyph shapes, and learn the acceptable variations between glyphs (e.g. serif vs sans-serif, large vs small x-height, curlier, more elegant, or playful letters...). All of these contribute to our multi-faceted learning, but ultimately we do seem to need less data to learn (compare how long it takes us to learn to recognize letters with what ML-based OCR needs).