layer8 on Nov 15, 2024 | on: Something weird is happening with LLMs and chess

Going from tokens to bytes explodes the model size. I can’t find the reference at the moment, but reducing the average token size induces a corresponding quadratic increase in the width (size of each layer) of the model. This doesn’t just affect inference speed, but also training speed.
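
A rough sketch of the sequence-length side of this trade-off (it does not test the width claim, which is given without a reference). It assumes a hypothetical tokenizer that averages 4 bytes per token. It also relies on the standard property that full self-attention scores every position against every other, so the number of scores grows with the square of the sequence length. The function names and the sample move list are invented for illustration.

```python
import math


def sequence_length(text: str, avg_bytes_per_token: float) -> int:
    """Approximate how many tokens cover `text` for a hypothetical tokenizer
    whose tokens average `avg_bytes_per_token` bytes (1.0 means byte-level)."""
    n_bytes = len(text.encode("utf-8"))
    return math.ceil(n_bytes / avg_bytes_per_token)


def attention_score_count(seq_len: int) -> int:
    """Number of pairwise attention scores in one full self-attention head."""
    return seq_len * seq_len


# Illustrative input only: the opening moves of a chess game in PGN-like text.
moves = "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6"

byte_len = sequence_length(moves, 1.0)    # byte-level: one position per byte
token_len = sequence_length(moves, 4.0)   # assumed 4 bytes per token

ratio = attention_score_count(byte_len) / attention_score_count(token_len)
print(f"byte-level positions: {byte_len}")
print(f"token-level positions (assumed): {token_len}")
print(f"attention scores grow by a factor of about {ratio:.1f}")
```

Under these assumptions, shrinking the average token from 4 bytes to 1 byte makes the sequence roughly 4 times longer and the attention-score count roughly 16 times larger, and that cost is paid in both training and inference.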