Hacker Times

128GB of unified memory is a dream come true for local LLMs. VRAM has been the ultimate bottleneck for developers.



The competitor to this NVIDIA CPU will not be the now-old AMD Strix Halo but its recently launched successor, which supports up to 192 GB of unified memory. So 128 GB is no longer SOTA.
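To put the 128 GB vs 192 GB difference in model-size terms: the largest model whose weights fit is roughly usable memory divided by bytes per weight. This is a minimal sketch; the 20% reserve for the OS, KV cache and activations is a hypothetical figure, and real quantized formats add some per-block overhead on top of the nominal bits per weight.

```python
def max_params_billions(memory_gb: float, bits_per_weight: float,
                        overhead_fraction: float = 0.2) -> float:
    """Rough upper bound on model size (billions of parameters) whose weights fit.

    overhead_fraction is a hypothetical reserve for the OS, KV cache and
    activations; it is an assumption, not a measured value.
    """
    usable_bytes = memory_gb * 1e9 * (1.0 - overhead_fraction)
    bytes_per_weight = bits_per_weight / 8.0
    return usable_bytes / bytes_per_weight / 1e9


for mem in (128, 192):
    for bits in (16, 8, 4):
        print(f"{mem} GB, {bits}-bit weights: ~{max_params_billions(mem, bits):.0f}B params")
```

Whatever the reserve, the capacity ceiling scales linearly with memory, so 192 GB fits about 1.5x the parameters of 128 GB at the same quantization.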

While this NVIDIA system is inferior in memory capacity, its main advantage is that the top models will have a bigger GPU, with 6144 or 5120 FP32 execution units versus 2560 for the AMD GPU. (On the CPU side, the AMD CPU has better multi-threaded performance than the NVIDIA CPU for legacy programs, and much better multi-threaded performance for applications that use AVX-512.)

However, these top models with big GPUs will also be much more expensive than the competing AMD system, and also much more expensive than a laptop or mini-PC with an equivalent discrete NVIDIA GPU (although a discrete GPU has direct access only to a much smaller, if faster, memory).
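The FP32 unit counts above translate into theoretical peak throughput as units x 2 FLOPs (one fused multiply-add) x clock. A minimal sketch: the 2.5 GHz clock is a hypothetical value applied to all three GPUs so only the unit counts differ, and architecture-specific features such as dual-issue are ignored.

```python
def peak_fp32_tflops(fp32_units: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes each FP32 unit retires one fused multiply-add (counted as 2 FLOPs)
    per clock cycle; dual-issue and similar features are ignored.
    """
    gflops = fp32_units * 2 * clock_ghz
    return gflops / 1000.0


CLOCK_GHZ = 2.5  # hypothetical clock, same for all GPUs for a like-for-like comparison

for units in (6144, 5120, 2560):
    print(f"{units} FP32 units @ {CLOCK_GHZ} GHz: {peak_fp32_tflops(units, CLOCK_GHZ):.1f} TFLOPS")
```

At equal clocks the 6144-unit GPU has 2.4x the peak FP32 throughput of the 2560-unit one; real shipping clocks will differ.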


I don’t think there is much improvement in compute in the new Strix Halo revision. The next one supposedly adds RDNA 4 cores or similar, and more memory channels.

There is no improvement in the CPU or GPU, except for minor increases in the clock frequency.

The memory interface is a little faster, but the biggest improvement is a 50% increase in memory capacity, over both the old Strix Halo and NVIDIA Spark.

However, even the Strix Halo CPU was better than the NVIDIA/MediaTek CPU.

NVIDIA's only advantage (in its more expensive variants) is a GPU equivalent to an RTX 5070.

It remains to be seen what the NVIDIA Spark models with big GPUs will cost. Rumors say prices start at around $3000, while the upper limit, for 128 GB of DRAM and the uncut GPU, is still unknown.

It also remains to be seen whether the variants with the biggest GPU can use it effectively, given a memory bandwidth that is rather low for such a big GPU.
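One way to see why bandwidth can starve a big GPU is a roofline-style check. During token-by-token decoding each weight is read once per step and used for about one multiply-add (2 FLOPs) per sequence in the batch, so arithmetic intensity is about 2 x batch / bytes-per-weight FLOPs per byte. The GPU is compute-limited only above the ridge point, peak FLOPS / bandwidth. A minimal sketch; the 30 TFLOPS and 300 GB/s figures are hypothetical, and KV-cache traffic is ignored.

```python
def ridge_point(peak_flops: float, bandwidth_bytes_per_s: float) -> float:
    """FLOPs per byte needed before compute, rather than memory, becomes the limit."""
    return peak_flops / bandwidth_bytes_per_s


def decode_intensity(bits_per_weight: float, batch_size: int = 1) -> float:
    """Approximate FLOPs per byte of weights read per decoding step.

    Each weight is read once and used for one multiply-add (2 FLOPs) per
    sequence in the batch; KV-cache reads are ignored.
    """
    return 2.0 * batch_size / (bits_per_weight / 8.0)


def compute_utilization(peak_flops: float, bandwidth: float,
                        bits_per_weight: float, batch_size: int = 1) -> float:
    """Upper bound on the fraction of peak compute usable while decoding."""
    ridge = ridge_point(peak_flops, bandwidth)
    return min(1.0, decode_intensity(bits_per_weight, batch_size) / ridge)


PEAK_FLOPS = 30e12  # hypothetical GPU peak
BANDWIDTH = 300e9   # hypothetical memory bandwidth in bytes/s

for batch in (1, 8, 32):
    u = compute_utilization(PEAK_FLOPS, BANDWIDTH, bits_per_weight=4, batch_size=batch)
    print(f"batch {batch:>2}: at most {u:.0%} of peak compute usable")
```

With these assumed numbers, single-user decoding of 4-bit weights can use only a few percent of the GPU; batching many requests, or prompt processing (where each weight is reused across many tokens), is where a bigger GPU pays off.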


I have a 128 GB LPDDR5X machine. It's a great workstation laptop (which is why I got it), but the memory bandwidth is just awful if you want to use it for AI. An old Epyc CPU will fare better, both in being able to run larger full-size models and in memory bandwidth, and that's not a recommendation to go that route either, as it's still not worth it.
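The bandwidth gap comes down to bus width: theoretical peak bandwidth is bytes per transfer x transfers per second, and for single-user decoding the token rate is capped at roughly bandwidth / model size in bytes. A minimal sketch; both memory configurations and the 70B 4-bit model are hypothetical examples, not the specs of the machines mentioned here.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * mega_transfers_per_s * 1e6 / 1e9


def max_decode_tokens_per_s(bandwidth_gbs: float, params_billions: float,
                            bits_per_weight: float) -> float:
    """Upper bound for batch-1 decoding: every weight is streamed from memory once per token."""
    model_gb = params_billions * bits_per_weight / 8
    return bandwidth_gbs / model_gb


# Hypothetical configurations: (total bus width in bits, MT/s)
configs = {
    "128-bit LPDDR5X-7500 (hypothetical laptop)": (128, 7500),
    "8-channel DDR4-3200 (hypothetical older server)": (8 * 64, 3200),
}

for name, (width, mts) in configs.items():
    bw = peak_bandwidth_gbs(width, mts)
    tps = max_decode_tokens_per_s(bw, params_billions=70, bits_per_weight=4)
    print(f"{name}: {bw:.1f} GB/s, <= {tps:.1f} tokens/s for a 70B 4-bit model")
```

Real throughput lands below these ceilings, but the ratio between platforms tracks their bus width times transfer rate.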

It could help with exploding external LLM costs. It will be interesting to see how adoption goes, which will mainly depend on the price.

This is what makes it interesting to me as well.


