
dmichulke 47 days ago | on: Running local models on an M4 with 24GB memory

Forgive my ignorance but aren't they already on huggingface?

I assumed turboquant optimizations are already everywhere - in llama.cpp, or in the quantization machinery of unsloth and the like.

rapatel0 41 days ago

I forked it to also add rotorquant. This is a specific optimization that uses Clifford rotors instead of a static, compile-time random permutation to store the activations. It reduces the space and parameter count needed for storage.
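
To give a rough idea of what "rotate with a rotor, then quantize" can look like, here is a minimal sketch. It is not the rotorquant code and makes several assumptions of its own: activations are split into blocks of 3, the rotor is stored as a unit quaternion (which is how a 3D Clifford rotor is usually represented), and the quantizer is a plain symmetric uniform one. All function names are made up for the example. The storage point is that a 3D rotor needs 4 numbers, where a dense 3x3 rotation matrix needs 9.

```python
import math

def make_rotor(axis, angle):
    """Unit rotor for a rotation by `angle` about `axis`, stored as (w, x, y, z)."""
    n = math.sqrt(sum(a * a for a in axis))
    s = math.sin(angle / 2) / n
    return (math.cos(angle / 2), axis[0] * s, axis[1] * s, axis[2] * s)

def reverse(r):
    """Reverse of a unit rotor; this is also its inverse."""
    w, x, y, z = r
    return (w, -x, -y, -z)

def mul(a, b):
    """Product of two rotors in quaternion form (Hamilton product)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def rotate(r, v):
    """Sandwich product r * v * reverse(r) applied to a 3-vector v."""
    q = mul(mul(r, (0.0, v[0], v[1], v[2])), reverse(r))
    return [q[1], q[2], q[3]]

def quantize(xs, bits=8):
    """Symmetric uniform quantizer: returns integer codes and the step size."""
    scale = max(abs(x) for x in xs) or 1.0
    levels = 2 ** (bits - 1) - 1
    step = scale / levels
    return [round(x / step) for x in xs], step

def dequantize(codes, step):
    return [c * step for c in codes]

def roundtrip(vec, rotor, bits=8):
    """Rotate each block of 3, quantize, dequantize, then rotate back."""
    if len(vec) % 3 != 0:
        raise ValueError("length must be a multiple of 3 in this sketch")
    rotated = []
    for i in range(0, len(vec), 3):
        rotated.extend(rotate(rotor, vec[i:i + 3]))
    codes, step = quantize(rotated, bits)
    restored = dequantize(codes, step)
    inverse = reverse(rotor)
    out = []
    for i in range(0, len(vec), 3):
        out.extend(rotate(inverse, restored[i:i + 3]))
    return out

if __name__ == "__main__":
    rotor = make_rotor((1.0, 2.0, 3.0), 0.7)
    activations = [0.9, -0.1, 0.05, 0.3, 0.3, -0.8]
    print(roundtrip(activations, rotor))
```

Because the rotor is a unit element, the rotation preserves vector norms, so the inverse rotation after dequantization only carries the quantization error back and does not amplify it.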

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact