Even 4-bit is fine.

To be more precise, it's not that there's no decrease in quality; it's that the RAM savings let you fit a much better model. For example, with LLaMA, if you start with 70B and quantize it progressively harder, you still get considerably better performance at 3-bit than from LLaMA 33B running at 8-bit.
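The RAM argument is easy to check with back-of-the-envelope arithmetic: weight memory is roughly parameter count times bits per weight divided by 8. The sketch below counts weights only. It deliberately ignores the KV cache, activations and per-block quantization metadata (scales, zero-points), so real usage will be somewhat higher. The function name is just for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough memory for model weights alone, in GB (10^9 bytes).

    Assumption: ignores KV cache, activations and quantization
    metadata (scales/zero-points), so actual usage is higher.
    """
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters x bytes per parameter = gigabytes
    return params_billions * bytes_per_weight


for name, params, bits in [
    ("70B @ 3-bit", 70, 3),
    ("70B @ 4-bit", 70, 4),
    ("33B @ 8-bit", 33, 8),
]:
    print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB")
```

By this estimate, 70B at 3 bits per weight needs about 26 GB for its weights, less than the 33 GB that 33B needs at 8 bits, while 70B at 4 bits (35 GB) slightly exceeds it. The arithmetic only covers the memory side; the quality comparison is the commenter's observation.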



True. The only problem with lower quantization, though, is that the model fails to understand long prompts.
