- 7B means 7 billion parameters.
- 8K length means the model's input/output can be at most 8K tokens long (its context length).
- 1.5T tokens means the training set contains 1.5T tokens.
Q: What's a parameter?
A: The more parameters your model has, the more complex a relationship it can represent. For example, let's say you have a function f(x). This is a 2-parameter model:
f(x) = ax + b
This is a 4-parameter model:
f(x) = ax^3 + bx^2 + cx + d
As you can see, as the number of parameters grows, the function can represent a more complex relationship between f(x) and x.
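To make the two formulas concrete, here is a minimal Python sketch. The function names `linear` and `cubic` and the sample coefficients are just made up for illustration. It shows that the 4-parameter model can do everything the 2-parameter one can (set a and b to zero), plus shapes a straight line can never produce.

```python
def linear(x, a, b):
    """2-parameter model: f(x) = ax + b."""
    return a * x + b


def cubic(x, a, b, c, d):
    """4-parameter model: f(x) = ax^3 + bx^2 + cx + d."""
    return a * x**3 + b * x**2 + c * x + d


xs = [-2, -1, 0, 1, 2]

# The 4-parameter model can imitate the 2-parameter one by zeroing a and b.
line_values = [linear(x, 3, 1) for x in xs]
cubic_as_line = [cubic(x, 0, 0, 3, 1) for x in xs]

# It can also produce an S-shaped curve that goes up, down, then up again,
# which no choice of a and b in the linear model can match.
s_curve = [cubic(x, 1, 0, -3, 0) for x in xs]

print(line_values)    # [-5, -2, 1, 4, 7]
print(cubic_as_line)  # [-5, -2, 1, 4, 7]
print(s_curve)        # [-2, 2, 0, -2, 2]
```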
Q: What's a token?
A: A token is a unit for encoding text, a bit like a character code in ASCII or Unicode. Unlike Unicode, a tokenizer usually gives common letter combinations their own token. For example, "the" is a single token for the GPT-3 tokenizer, but "eht" is two tokens (e and ht).
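Here is a toy sketch of the idea. It is not how the GPT-3 tokenizer actually works (that one uses a learned byte-pair-encoding vocabulary); this is a simple greedy longest-match splitter, and the tiny `VOCAB` set is hypothetical, chosen only so that the "the" vs "eht" example comes out the same way.

```python
# Hypothetical mini-vocabulary; real tokenizers learn tens of thousands of entries.
VOCAB = {"the", "ht", "e", "t", "h"}


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split of text into vocabulary entries."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens


print(tokenize("the"))  # ['the']
print(tokenize("eht"))  # ['e', 'ht']
```

The common combination gets one token, while the rare one has to be spelled out from smaller pieces.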
* Note that the number of parameters is more like an "upper limit" on the model's capabilities. If your a, b, c, d are just random junk, it's still a 4-parameter model, but it's useless. The whole point of "training" is "finding the best parameters".
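A rough sketch of what "finding the best parameters" can look like for the 2-parameter model f(x) = ax + b. Everything here is an assumption for illustration: the data is generated from y = 2x + 1, the starting values are random, and plain gradient descent on mean squared error stands in for whatever a real training setup uses.

```python
import random


def mse(a, b, data):
    """Mean squared error of f(x) = ax + b on (x, y) pairs."""
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)


# Made-up training data that follows y = 2x + 1 exactly.
data = [(x, 2 * x + 1) for x in range(-5, 6)]

# Start from random parameters: still a 2-parameter model, but a useless one.
rng = random.Random(0)
a, b = rng.uniform(-10, 10), rng.uniform(-10, 10)
initial_error = mse(a, b, data)

# "Training": repeatedly nudge a and b in the direction that lowers the error.
learning_rate = 0.01
for _ in range(2000):
    grad_a = sum(2 * (a * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (a * x + b - y) for x, y in data) / len(data)
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

trained_error = mse(a, b, data)
print(round(a, 3), round(b, 3))  # close to 2 and 1
```

The model has the same two parameters before and after the loop; only their values changed, and that is the whole difference between the useless version and the useful one.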