No, we haven't, for any reasonable definition of L.

wavemode · 2026-05-20T22:24:49 1779315889

OpenAI themselves must not have a "reasonable definition of L", then. Their own papers and press releases refer to GPT-2 (from 2019) as a "large language model".

https://openai.com/index/better-language-models/

danielmarkbruce · 2026-05-20T22:51:28 1779317488

Yes, and 1.5 billion parameters meets no reasonable current definition of large. It would be considered a tiny language model. OpenAI themselves refer to their small/fast models as small models all over their documentation.

wavemode · 2026-05-21T02:00:10 1779328810

The term doesn't change its meaning because something new comes along.

The point of the term "large" is to highlight the massive parameter count (compared to traditional statistical models, where having 1.5 billion parameters was basically unheard of). It leads to the "double descent" phenomenon that allows them to generalize in ways traditional statistical models can't.

The idea that the "large" descriptor was just a subjective exclamation, like "oh wow this model is pretty large ain't it", is revisionism.

danielmarkbruce · 2026-05-21T03:25:53 1779333953

Yes, it does. That's why OpenAI refers to its small models as small. They are just so different. The capabilities have changed dramatically. The use cases are wildly different. The architectures are quite different. Even the core idea of attention is different. Training them is materially different. Serving them is materially different. A 1.5 billion parameter model from 2019 is so different from today's LLMs that they really don't have much in common. What we have now is quite similar to what we had a couple of years ago, though.

bbor · 2026-05-21T14:14:47 1779372887

  The term doesn't change its meaning because something new comes along.

...you're gonna flip when you hear about how language works :)

Yizahi · 2026-05-20T23:13:43 1779318823

Sure we do, since Fei-Fei Li and team created that annotated dataset, which made it possible to train the first LLMs. So LLMs have been here for more than a decade already.

danielmarkbruce · 2026-05-20T23:28:51 1779319731

You are confused by what the L and L mean in LLM, or which data set she created, or both, or in general.

Yizahi · 2026-05-21T08:56:15 1779353775

Or it is you who are confused. And I want to remind you that you can't retcon historical word use.

danielmarkbruce · 2026-05-21T16:55:26 1779382526

Fei-Fei was annotating images... the second L in LLM is for "language". The first language models named LLM at the time were trained on language data, with an objective function of predicting the next token. It had nothing to do with the ImageNet data. ImageNet data was used in... vision models.

The "Attention Is All You Need" paper never used the term LLM or large language model because the phrase didn't exist in industry.

Why comment on a field you know nothing about?