

gravypod 7 days ago | on: Nvidia is proposing a beast of a CPU system for Wi...

Having slower memory may not actually lead to lower memory bandwidth. The CUDA cores can be broken up into compute complexes with larger blocks of memory directly attached to the cores. These could be filled with read operations from the bulk system memory. You can start executing and then page the next batch of data in while compute is working. For LLMs you don't have much random memory access, so you can sequence your accesses in blocks. If these chips become popular, I am sure you will see LLM architectures taking advantage of the parallelism.
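
To make the "page the next batch in while compute is working" idea concrete, here is a rough timing sketch of double buffering. It is only a toy model under stated assumptions: two local buffers, fixed per-block load and compute times, and a single load channel. The function names and the numbers are hypothetical and do not describe any real Nvidia part.

```python
def serial_time(n_blocks, load_s, compute_s):
    """Load a block, compute on it, then load the next one (no overlap)."""
    return n_blocks * (load_s + compute_s)


def double_buffered_time(n_blocks, load_s, compute_s):
    """Toy model with two buffers: block i+1 loads while block i computes.

    Assumptions (hypothetical): one load channel, one compute unit,
    constant load and compute time per block.
    """
    load_end = []     # time at which block i has finished loading
    compute_end = []  # time at which block i has finished computing
    for i in range(n_blocks):
        # Loads are sequential on the single load channel.
        prev_load = load_end[i - 1] if i >= 1 else 0
        # Block i reuses the buffer of block i-2, so that compute must be done.
        buffer_free = compute_end[i - 2] if i >= 2 else 0
        load_end.append(max(prev_load, buffer_free) + load_s)
        # Compute needs its data loaded and the previous compute finished.
        prev_compute = compute_end[i - 1] if i >= 1 else 0
        compute_end.append(max(load_end[i], prev_compute) + compute_s)
    return compute_end[-1] if compute_end else 0


if __name__ == "__main__":
    # Compute-bound case: loads hide behind compute.
    print(serial_time(4, 1, 3), double_buffered_time(4, 1, 3))
    # Load-bound case: total time is still set by the memory side.
    print(serial_time(4, 3, 1), double_buffered_time(4, 3, 1))
```

With 4 blocks, a load time of 1 and a compute time of 3, the serial schedule takes 16 time units and the double-buffered one takes 13, since only the first load is exposed. If the numbers are flipped (load 3, compute 1), overlap still helps (13 versus 16), but the total is now dominated by the loads, so the prefetch trick only hides slow memory when there is enough compute per block to cover it.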

cthalupa 7 days ago

> The cuda cores can be broken up into compute complexes which larger blocks of memory directly attached to the cores.

Perhaps in theory, but for the GB10 stuff the memory is all on the CPU die and connected to the GPU die via NVLink-C2C.
