Hacker Times
nl | 4 days ago | on: Microsoft and OpenAI end their exclusive and reven...
> You could run it on a cluster of nodes
Not sure this is an MBP either.
bigyabai | 4 days ago
Not even a cluster of Mac Pros could run a dense 5T parameter model with RDMA, to my knowledge.
zozbot234 | 4 days ago
SOTA models are reportedly MoE, not dense.
bigyabai | 4 days ago
A 5T MoE model is still bottlenecked by streaming weights from SSD, in addition to compute bottlenecks during prefill and decode.
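As a rough sanity check on the streaming bottleneck above, here is a minimal back-of-envelope sketch; the active-parameter count, quantization, and NVMe bandwidth figures are all illustrative assumptions, not specs of any real model or drive:

```python
# Back-of-envelope sketch of the SSD-streaming bottleneck during decode.
# Every figure below is an assumption for illustration, not a measurement.

TOTAL_PARAMS = 5e12        # hypothetical 5T-parameter MoE model
ACTIVE_PARAMS = 250e9      # assumed params activated per token by MoE routing
BYTES_PER_PARAM = 1        # assume 8-bit quantized weights
SSD_BPS = 7e9              # assumed NVMe sequential read bandwidth, bytes/s

# If the active experts for each token must be read from SSD,
# decode throughput is capped by read bandwidth:
bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM
tokens_per_sec = SSD_BPS / bytes_per_token
print(f"~{tokens_per_sec:.3f} tokens/s upper bound")
```

Even with these generous assumptions the ceiling lands well under one token per second, which is the point being made about streaming weights from disk.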
zozbot234 | 4 days ago
True, but a cluster built on pipeline parallelism can naturally stream from multiple SSDs in parallel, which probably makes offload somewhat more effective. RAM caching is also naturally available.
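The multiple-SSDs argument above can be sketched the same way: under pipeline parallelism each stage holds a slice of the layers and streams only that slice from its own local disk, so aggregate read bandwidth scales with node count. The node count and bandwidth numbers are again illustrative assumptions:

```python
# Sketch: pipeline parallelism multiplies usable SSD read bandwidth,
# since each stage streams only its own layer slice from a local drive.
# All numbers are illustrative assumptions.

NODES = 8                       # hypothetical pipeline stages, one SSD each
SSD_BPS = 7e9                   # assumed per-node NVMe read bandwidth, bytes/s
ACTIVE_BYTES_PER_TOKEN = 250e9  # assumed active MoE params/token at 1 byte each

aggregate_bw = NODES * SSD_BPS
tokens_per_sec = aggregate_bw / ACTIVE_BYTES_PER_TOKEN
print(f"{tokens_per_sec:.3f} tokens/s with {NODES} nodes")
```

The linear scaling is the optimistic case: it holds only if routing keeps each stage's expert reads on its local disk, so this is an upper bound on the benefit, not a guarantee.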
bigyabai | 4 days ago
You won't be caching much of anything in RAM when a single expert spans 220B parameters' worth of layers.
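The RAM-caching objection reduces to one comparison: does a single expert even fit in memory? The 220B figure comes from the comment above; the byte width and RAM size are illustrative assumptions (192 GB stands in for a high-end unified-memory Mac):

```python
# Sketch of why RAM caching breaks down at this expert size.
# 220B comes from the thread; the other figures are assumptions.

EXPERT_PARAMS = 220e9      # per-expert parameter count claimed upthread
BYTES_PER_PARAM = 1        # assume 8-bit quantized weights
RAM_BYTES = 192e9          # assumed unified memory on a high-end Mac

expert_bytes = EXPERT_PARAMS * BYTES_PER_PARAM
fits = expert_bytes <= RAM_BYTES
print(f"expert needs {expert_bytes / 1e9:.0f} GB; fits in RAM: {fits}")
# → expert needs 220 GB; fits in RAM: False
```

At this size a single expert overflows the assumed RAM before caching a second one is even on the table, which is the crux of the objection.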