The point from Yann LeCun I find interesting is the negative-space argument: as training data and parameter counts increase, the "best" next tokens represent a smaller and smaller slice of what the model can produce. His contention is that this means more hallucinations, more places to get stuck on a less-than-best next token, and so on. It's interesting to think about this as the opposite of how scaling laws are typically presented. A lot of smart people are stabbing around in the dark right now, and only time (and gazillions in GPUs) will tell.