A (overly) simplified explanation:

- 7B means 7 billion parameters.

- 8K length means the model can handle up to about 8K (8,192) tokens of input and output combined.

- 1.5T tokens means the training set has 1.5 trillion tokens.

Q: What's a parameter?

A: A parameter is an adjustable number inside the model. The more parameters your model has, the more complex the relationships it can represent. For example, let's say you have a function f(x). This is a 2-parameter model:

f(x) = ax + b

This is a 4-parameter model:

f(x) = ax^3 + bx^2 + cx + d

As you can see, as the number of parameters grows, the function can represent more complex relationships between x and f(x).
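To make the comparison concrete, here is a minimal sketch of the two models above in plain Python. The function names (linear, cubic) and the sample parameter values are just picked for illustration. It shows that the 4-parameter model can imitate the 2-parameter one, and can also produce a curve that no straight line can.

```python
def linear(x, a, b):
    """2-parameter model: f(x) = ax + b."""
    return a * x + b


def cubic(x, a, b, c, d):
    """4-parameter model: f(x) = ax^3 + bx^2 + cx + d."""
    return a * x**3 + b * x**2 + c * x + d


xs = [-2, -1, 0, 1, 2]

# With a = b = 0 the cubic collapses to the straight line 3x + 1,
# so the bigger model can do everything the smaller one can.
line_values = [linear(x, 3, 1) for x in xs]
cubic_as_line = [cubic(x, 0, 0, 3, 1) for x in xs]

# With a non-zero cubic term the output goes up, down, and up again,
# which a straight line cannot do.
curve_values = [cubic(x, 1, 0, -3, 0) for x in xs]

print(line_values)    # [-5, -2, 1, 4, 7]
print(cubic_as_line)  # [-5, -2, 1, 4, 7]
print(curve_values)   # [-2, 2, 0, -2, 2]
```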

Q: What's a token?

A: A token is a unit used to encode text, a bit like a character in ASCII or Unicode. Unlike Unicode, a tokenizer usually gives common combinations of letters their own tokens. For example, "the" is a single token for the GPT-3 tokenizer, but "eht" is two tokens (e and ht).
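Here is a toy sketch of the idea. The vocabulary below is made up for this example, and the greedy longest-match rule is a simplified stand-in: the real GPT-3 tokenizer uses byte pair encoding with a much larger learned vocabulary. It is only meant to show why a common string can cost one token while a rare one costs several.

```python
# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = {"the", "ht", "t", "h", "e"}


def tokenize(text, vocab=VOCAB):
    """Split text into vocabulary entries, always taking the longest match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until one is in the vocabulary.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # No vocabulary entry starts at this position.
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


print(tokenize("the"))  # ['the']
print(tokenize("eht"))  # ['e', 'ht']
```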

* Note that the number of parameters is more like an "upper limit" on the model's capabilities. If your a, b, c, d are just random shit, it's still a 4-parameter model, but it's useless. The whole concept of "training" is just "finding the best parameters".
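As a rough sketch of what "finding the best parameters" looks like, here is the 2-parameter model f(x) = ax + b trained with plain gradient descent on a mean squared error. The data, the learning rate, and the step count are assumptions chosen for this example, and real LLM training is far more involved, but the core loop is the same idea: start with arbitrary parameters and nudge them to reduce the error.

```python
def train(xs, ys, steps=2000, lr=0.05):
    """Fit f(x) = a*x + b to the data by gradient descent on mean squared error."""
    a, b = 0.0, 0.0  # arbitrary starting parameters
    n = len(xs)
    for _ in range(steps):
        # How far off the current parameters are on each example.
        errors = [(a * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to a and b.
        grad_a = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Step each parameter a little in the direction that lowers the error.
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Toy "training set" generated from f(x) = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]

a, b = train(xs, ys)
print(a, b)  # values very close to 2 and 1
```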