
No, we haven't, for any reasonable definition of L.



OpenAI themselves must not have a "reasonable definition of L", then. Their own papers and press releases refer to GPT-2 (from 2019) as a "large language model".

https://openai.com/index/better-language-models/


Yes, and 1.5 billion parameters meets no reasonable current definition of large. It would be considered a tiny language model. OpenAI themselves refer to their small/fast models as small models all over their documentation.

The term doesn't change its meaning because something new comes along.

The point of the term "large" is to highlight the massive parameter count (compared to traditional statistical models, where having 1.5 billion parameters was basically unheard of). It leads to the "double descent" phenomenon that allows them to generalize in ways traditional statistical models can't.

The idea that the "large" descriptor was just a subjective exclamation, like "oh wow this model is pretty large ain't it", is revisionism.


Yes, it does. That's why OpenAI refers to its small models as small. They are just so different. The capabilities have changed dramatically. The use cases are wildly different. The architectures are quite different. Even the core idea of attention is different. Training them is materially different. Serving them is materially different. A 1.5-billion-parameter model from 2019 is so different from today's LLMs that they really don't have much in common. What we have now is quite similar to what we had a couple of years ago, though.

  The term doesn't change its meaning because something new comes along.
...you're gonna flip when you hear about how language works :)

Sure we do, since Fei-Fei Li and team created that annotated dataset, which made it possible to train the first LLMs. So LLMs have been around for more than a decade already.

You are confused about what the two Ls in LLM mean, or which dataset she created, or both, or in general.

Or it is you who are confused. And I want to remind you that you can't retcon historical word use.

Fei-Fei was annotating images... the second L in LLM is for "language". The first language models called LLMs at the time were trained on language data, with an objective function of predicting the next token. They had nothing to do with the ImageNet data. ImageNet data was used in... vision models.

The "Attention Is All You Need" paper never used the term LLM or "large language model" because the phrase didn't exist in industry.

Why comment on a field you know nothing about?



