
> You could run it on a cluster of nodes

Not sure this is an MBP either.

Not even a cluster of Mac Pros could run a dense 5T parameter model with RDMA, to my knowledge.

SOTA models are reportedly MoE, not dense.

A 5T MoE model is still bottlenecked by streaming weights from SSD, in addition to compute bottlenecks during prefill and decode.

True, but a cluster built on pipeline parallelism can naturally stream from multiple SSDs in parallel, which probably makes offload somewhat more effective. RAM caching is also a natural possibility.

You won't be RAM caching much of anything when the experts amount to 220B parameters' worth of layers.
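
For a rough sense of the numbers being argued over, here is a back-of-envelope sketch of SSD-streamed decode throughput. Everything except the 5T and 220B figures from the thread is an assumption picked for illustration: 1 byte per parameter (8-bit weights), 6 GB/s of sustained read per SSD, one SSD per node, 4 nodes, and the reading that roughly 220B parameters have to be touched per token. The function names are hypothetical, and the result is an optimistic upper bound: it assumes the SSD reads on all pipeline stages overlap perfectly and ignores compute, interconnect, and routing latency.

```python
def weight_bytes(params: float, bytes_per_param: float) -> float:
    """Size of a set of weights in bytes."""
    return params * bytes_per_param


def streamed_decode_tokens_per_s(
    active_params: float,
    bytes_per_param: float,
    ssd_bytes_per_s: float,
    num_nodes: int,
    cache_hit_fraction: float = 0.0,
) -> float:
    """Upper bound on decode tokens/s when active weights stream from SSD.

    Assumes one SSD per node, reads on all nodes fully overlapped, and
    that cache_hit_fraction of the needed bytes is already in RAM.
    """
    if not 0.0 <= cache_hit_fraction < 1.0:
        raise ValueError("cache_hit_fraction must be in [0, 1)")
    # Bytes that still have to come off SSD for each generated token.
    bytes_per_token = weight_bytes(active_params, bytes_per_param) * (
        1.0 - cache_hit_fraction
    )
    # Aggregate read bandwidth across the cluster (assumed to scale linearly).
    aggregate_bytes_per_s = ssd_bytes_per_s * num_nodes
    return aggregate_bytes_per_s / bytes_per_token


if __name__ == "__main__":
    # Illustrative inputs only; see the assumptions stated above.
    total = weight_bytes(5e12, 1.0)
    rate = streamed_decode_tokens_per_s(220e9, 1.0, 6e9, num_nodes=4)
    print(f"total weights: {total / 1e12:.1f} TB")
    print(f"upper bound: {rate:.3f} tokens/s")
```

Under these assumptions, adding nodes or raising the RAM cache hit rate scales the bound linearly, but the bound stays far below interactive speeds unless the per-token working set is much smaller than 220B parameters.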


