We have had LLMs for much longer than 3 years.

Nevermark · 2026-05-20T21:48:06 1779313686

It took humans thousands of years, then hundreds of years, to come to terms with very basic concepts about numbers.

It's amazing to me when people talk about recombining things, or following up on things, as somehow lesser work.

People can't separate themselves from the perspective they were given when they learned the concepts, a perspective that those who developed the concepts didn't have, because it didn't exist yet.

Simple things are hard, or everything simple would have been done hundreds of years ago, and that is certainly not the case. Seeing something others have not noticed is very hard, when we don't have the concepts that the "invisible" things right in front of us will teach us.

adi_kurian · 2026-05-20T22:25:40 1779315940

Anyone in the arts is aware that creativity is not the new, it is the repackaging of what already exists into something that is itself new.

RajT88 · 2026-05-20T22:52:51 1779317571

Except for "Being John Malkovich". That movie was way out there on its own.

fragmede · 2026-05-20T23:36:45 1779320205

It's "just" a Man-vs-Self story, of the ~7 story archetypes out there.

RajT88 · 2026-05-21T15:11:43 1779376303

You need to rewatch that movie.

godelski · 2026-05-21T00:08:08 1779322088

It's why the invention of teaching has been so important. Took a long time for humans to develop calculus. A long time to then refine it and make it much more useful. But then in a year or two an average person can learn what took hundreds of years to invent. It's crazy to equate these tasks as being the same. Even incremental innovation is difficult. You have to see something billions of people haven't. But there's also paradigm shifts and well... if you're not considered crazy at first then did you really shift a paradigm?

Nevermark · 2026-05-21T02:31:48 1779330708

And yet it is still taught in less than optimal form, lacking algebraic closure in ways that are completely unnecessary.

It isn't a secret, but the percentage of people who don't know that, plus the percentage of mathematicians who vaguely or more directly know that, but habitually use the broken, more difficult (i.e. less algebraic) notation is ... virtually everyone.

I am not trying to pick on calculus, this is everywhere. Important and useful concepts are right in front of all of us, that we don't see even in the context of what we are relatively fluent with.

Because we learn quickly, where we have (almost always inherited) the right preparatory perspectives (earned over lifetimes by others), we vastly overrate our ability to reason independently.

bananaflag · 2026-05-21T07:59:14 1779350354

What is that algebraic calculus you are hinting at?

godelski · 2026-05-21T18:45:35 1779389135

Were I to guess, they're talking about the different derivatives. Here's at least something that might introduce you to some of the shortcuts people take, but it's far from complete [0] (you can probably find more if you search for things like how physicists use the derivative wrong; I make this critique as someone with a degree in physics too).

I often say that math is taught through a game of telephone. It's a fantastic example of the problem with "I just care that it works" type of attitudes. The problem is that if that's your actual belief then you wouldn't be saying that, because you'd need to dig deeper. Caring about it working is exactly the reason people do dig deeper and bring up issues. The reason things fall apart less in math is because the language was specifically invented to make miscommunication difficult. That's why it's overly pedantic. That's why we use formal languages rather than natural ones. So we should rephrase "I just care that it works" as what it actually is: "I just care that it works for this exact case." It makes it easier to see the problem. If you don't know the subject in more detail then you can't actually know if it breaks in that use case. The broken parts are completely invisible to you! Which undermines your own stated goal.

This goes for a lot more than math. But with math being a formal language, it's just easier to point things out and how people misunderstand. If you're an expert in any field you've probably seen this same phenomenon in that domain though. People's overconfidence and their refusal to get deeper knowledge actually just undermine their whole goal. I'd honestly call this a form of Murray Gell-Mann Amnesia.

[0] https://m.youtube.com/watch?v=oIhdrMh3UJw

danielmarkbruce · 2026-05-20T21:43:07 1779313387

No, we haven't, for any reasonable definition of L.

wavemode · 2026-05-20T22:24:49 1779315889

OpenAI themselves must not have a "reasonable definition of L", then. Their own papers and press releases refer to GPT-2 (from 2019) as a "large language model".

https://openai.com/index/better-language-models/

danielmarkbruce · 2026-05-20T22:51:28 1779317488

Yes, and 1.5 billion parameters meets no reasonable current definition of large. It would be considered a tiny language model. OpenAI themselves refer to their small/fast models as small models all over their documentation.

wavemode · 2026-05-21T02:00:10 1779328810

The term doesn't change its meaning because something new comes along.

The point of the term "large" is to highlight the massive parameter count (compared to traditional statistical models, where having 1.5 billion parameters was basically unheard of). It leads to the "double descent" phenomenon that allows them to generalize in ways traditional statistical models can't.

The idea that the "large" descriptor was just a subjective exclamation, like "oh wow this model is pretty large ain't it", is revisionism.

danielmarkbruce · 2026-05-21T03:25:53 1779333953

Yes, it does. That's why OpenAI refers to its small models as small. They are just so different. The capabilities have changed dramatically. The use cases are wildly different. The architectures are quite different. Even the core idea of attention is different. Training them is materially different. Serving them is materially different. A 1.5 billion parameter model from 2019 is so different from today's LLMs that they really don't have much in common. What we have now is quite similar to what we had a couple years ago though.

bbor · 2026-05-21T14:14:47 1779372887

  The term doesn't change its meaning because something new comes along.

...you're gonna flip when you hear about how language works :)

Yizahi · 2026-05-20T23:13:43 1779318823

Sure we have, since Fei-Fei Li and team created that annotated dataset, which made it possible to train the first LLMs. So LLMs have been here for more than a decade already.

danielmarkbruce · 2026-05-20T23:28:51 1779319731

You are confused by what the L and L mean in LLM, or which data set she created, or both, or in general.

Yizahi · 2026-05-21T08:56:15 1779353775

Or it is you who are confused. And I want to remind you that you can't retcon historical word use.

danielmarkbruce · 2026-05-21T16:55:26 1779382526

Fei-Fei was annotating images... the second L in LLM is for "language". The first language models named LLM at the time were trained on language data, with an objective function of predicting the next token. It had nothing to do with the ImageNet data. ImageNet data was used in... vision models.

The "Attention Is All You Need" paper didn't ever use the term LLM or large language model because the phrase didn't exist in industry.

Why comment on a field you know nothing about?

asdfasgasdgasdg · 2026-05-20T23:50:21 1779321021

When people say this what they mean is that we've had plausibly useful LLMs for around three years, and I would say that is basically true. The stuff before 2023 could barely be classified above the level of an interesting toy.

nextaccountic · 2026-05-20T22:22:59 1779315779

Fine, 8 years? That's not a long time