Having slower memory may not actually lead to lower effective memory bandwidth. The CUDA cores can be broken up into compute complexes, each with a larger block of memory directly attached to the cores. These blocks could be filled with reads from the bulk system memory: you start executing, then page the next batch of data in while compute is working on the current one. LLMs don't do much random memory access, so you can sequence your accesses in blocks.
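To put rough numbers on the overlap idea, here is a minimal timing sketch in Python. It assumes every block takes the same time to load and the same time to compute, and that exactly one load can run alongside one compute step (plain double buffering). The function names and the figures in the usage note are made up for illustration, not measurements of any real chip.

```python
def serial_time(n_blocks, load_s, compute_s):
    """Total time when each block is loaded, then computed, with no overlap."""
    return n_blocks * (load_s + compute_s)


def pipelined_time(n_blocks, load_s, compute_s):
    """Total time when block i+1 is loaded while block i is being computed.

    Assumption: one load and one compute can run concurrently.
    The first load and the last compute cannot be hidden; every step in
    between costs whichever of the two is slower.
    """
    if n_blocks == 0:
        return 0
    return load_s + (n_blocks - 1) * max(load_s, compute_s) + compute_s
```

With 4 blocks, a load time of 1 and a compute time of 2 per block, `serial_time` gives 12 and `pipelined_time` gives 9, so most of the load cost disappears behind compute. If the load is the slower side (say 3 versus 2), the pipelined total is 14 against 20 serial: the memory is now the bottleneck, but you still only pay for it once per step rather than on top of compute.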
If these chips become popular, I am sure you will see LLM architectures taking advantage of the parallelism.