> You could run it on a cluster of nodes

Not sure this is an MBP either.

bigyabai · 2026-04-28T07:00:56 1777359656

Not even a cluster of Mac Pros could run a dense 5T parameter model with RDMA, to my knowledge.
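Here is a back-of-envelope check on the dense case. The sketch below counts only the bytes needed to hold 5T weights at a few precisions and how many nodes that implies. The 192 GB of usable memory per node is a hypothetical figure picked for illustration, not a spec for any particular Mac. The calculation also ignores KV cache, activations and the RDMA interconnect entirely.

```python
import math

def weight_bytes(params: float, bits_per_param: float) -> float:
    """Bytes needed to hold the weights alone (no KV cache, activations or OS overhead)."""
    return params * bits_per_param / 8

def min_nodes(params: float, bits_per_param: float, mem_per_node: float) -> int:
    """Fewest nodes whose combined memory holds the weights, assuming a perfectly even split."""
    return math.ceil(weight_bytes(params, bits_per_param) / mem_per_node)

PARAMS = 5e12         # the 5T figure from the thread
MEM_PER_NODE = 192e9  # hypothetical usable memory per node, in bytes

for bits in (16, 8, 4):
    total_tb = weight_bytes(PARAMS, bits) / 1e12
    print(f"{bits:>2}-bit: {total_tb:.1f} TB of weights, at least "
          f"{min_nodes(PARAMS, bits, MEM_PER_NODE)} nodes")
```

Under these assumptions even 4-bit weights come to 2.5 TB, which means at least 14 such nodes before anything else is counted. A dense model also touches every one of those bytes for each decoded token.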

zozbot234 · 2026-04-28T08:17:17 1777364237

SOTA models are reportedly MoE, not dense.

bigyabai · 2026-04-28T17:37:08 1777397828

A 5T MoE model is still bottlenecked by streaming weights from SSD, in addition to compute bottlenecks during prefill and decode.
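To put rough numbers on the SSD side of that bottleneck, here is a minimal sketch. It assumes decode has to re-read from SSD every active weight that isn't already resident in RAM. The active-parameter count, bit width and 7 GB/s bandwidth in the example are hypothetical, since the thread doesn't give them.

```python
def decode_tokens_per_s_bound(active_params: float, bits_per_param: float,
                              ssd_bytes_per_s: float, resident_fraction: float = 0.0) -> float:
    """Rough ceiling on single-stream decode speed when every active weight that is
    not already resident in RAM has to be read from SSD for each new token.

    Ignores compute time and any overlap of reads with computation, so it only
    captures the SSD side of the bottleneck."""
    if not 0.0 <= resident_fraction <= 1.0:
        raise ValueError("resident_fraction must be in [0, 1]")
    # Bytes that must come off the SSD for one token.
    streamed = active_params * bits_per_param / 8 * (1.0 - resident_fraction)
    return float("inf") if streamed == 0 else ssd_bytes_per_s / streamed

# Hypothetical numbers: 100B active parameters per token, 4-bit weights,
# one SSD reading 7 GB/s, nothing cached in RAM.
print(f"{decode_tokens_per_s_bound(100e9, 4, 7e9):.2f} tok/s")
```

With those made-up figures, a single SSD caps decode at roughly 0.14 tokens per second. The model says nothing about the prefill and decode compute costs the comment also mentions.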

zozbot234 · 2026-04-28T21:24:00 1777411440

True, but a cluster built on pipeline parallelism can naturally stream from multiple SSDs in parallel. That probably makes offload somewhat more effective. And you also have RAM caching available as a natural possibility.
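The following toy model illustrates the pipeline-parallel point. It assumes layers are split evenly across nodes, each node streams its share from its own SSD, and compute and network hops are free. The node count, bandwidth and bytes per token are hypothetical. The model separates two effects: a single token still walks the stages in order, while several independent sequences in flight keep every SSD busy at once.

```python
from dataclasses import dataclass

@dataclass
class PipelineCluster:
    nodes: int               # pipeline stages, one per node (hypothetical)
    ssd_bytes_per_s: float   # read bandwidth of each node's own SSD (hypothetical)

    def stage_time_s(self, streamed_bytes_per_token: float) -> float:
        """Time for one stage to read its even share of the per-token weights."""
        return streamed_bytes_per_token / self.nodes / self.ssd_bytes_per_s

    def single_stream_tokens_per_s(self, streamed_bytes_per_token: float) -> float:
        # One token visits every stage in order, so stage times add up:
        # nodes * (B / nodes / b) == B / b, i.e. no faster than a single SSD.
        return 1.0 / (self.nodes * self.stage_time_s(streamed_bytes_per_token))

    def pipelined_tokens_per_s(self, streamed_bytes_per_token: float) -> float:
        # With at least `nodes` independent sequences in flight, every stage reads
        # from its own SSD at the same time, so one stage sets the pace.
        return 1.0 / self.stage_time_s(streamed_bytes_per_token)

cluster = PipelineCluster(nodes=8, ssd_bytes_per_s=7e9)
per_token = 50e9  # hypothetical: bytes per token that miss RAM and come off SSD
print(f"single stream: {cluster.single_stream_tokens_per_s(per_token):.2f} tok/s")
print(f"pipelined:     {cluster.pipelined_tokens_per_s(per_token):.2f} tok/s")
```

With these numbers, a single stream still gets about 0.14 tokens per second, the same as one SSD. A full pipeline reaches about 1.12, so in this model the parallel reads mostly buy aggregate throughput. For MoE there is a further catch the model skips: a stage only learns which experts it needs once routing has run, which limits how far ahead it can prefetch.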

bigyabai · 2026-04-28T21:34:10 1777412050

You won't be RAM caching much of anything with experts that are 220b parameters worth of layers.
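Whether RAM caching helps comes down to how large each node's expert shard is relative to its spare RAM. This sketch takes the comment's 220B figure at face value as the expert parameters to be cached. It also assumes an even split across nodes and uniform routing, so the hit rate is roughly the cached fraction. The 32 GB of spare RAM per node is hypothetical.

```python
def cached_fraction(expert_params: float, bits_per_param: float,
                    nodes: int, ram_for_cache_per_node: float) -> float:
    """Fraction of each node's expert shard that fits in its RAM cache.

    Assumes experts are split evenly across nodes and routing is uniform, so the
    cache hit rate is roughly this fraction."""
    shard_bytes = expert_params * bits_per_param / 8 / nodes
    return min(1.0, ram_for_cache_per_node / shard_bytes)

EXPERT_PARAMS = 220e9   # figure from the comment, taken at face value
RAM_PER_NODE = 32e9     # hypothetical RAM left over for caching, in bytes

for nodes in (1, 8):
    for bits in (16, 4):
        f = cached_fraction(EXPERT_PARAMS, bits, nodes, RAM_PER_NODE)
        print(f"{nodes} node(s), {bits:>2}-bit: {f:.0%} of the expert shard cached")
```

With these figures, a single node caches about 7% of the experts at 16-bit and about 29% at 4-bit.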