Even 4-bit is fine. To be more precise, it's not that there's no decrease in qua... | Hacker News


int_19h on March 25, 2025 | on: Qwen2.5-VL-32B: Smarter and Lighter

Even 4-bit is fine.

To be more precise, it's not that there's no decrease in quality; it's that with the RAM savings you can fit a much better model. E.g. with LLaMA, if you start with 70B and quantize it more and more aggressively, you'll still get considerably better performance at 3-bit than LLaMA 33B running at 8-bit.
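
For a rough sense of the RAM side of that tradeoff, here is a back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bits per weight, and it ignores the KV cache, activations, and the per-group scales that real quantization formats store (so actual files come out somewhat larger). The helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Assumption: every weight takes exactly `bits_per_weight` bits.
    Quantization metadata, KV cache, and runtime overhead are not counted.
    """
    # billions of params * bits / 8 bits-per-byte = billions of bytes = GB
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    configs = [
        ("70B @ 8-bit", 70, 8),
        ("70B @ 4-bit", 70, 4),
        ("70B @ 3-bit", 70, 3),
        ("33B @ 8-bit", 33, 8),
    ]
    for label, params, bits in configs:
        print(f"{label}: ~{weight_memory_gb(params, bits):.2f} GB of weights")
```

Under these assumptions the 70B model at 3-bit needs less weight memory than the 33B model at 8-bit, which is the comparison the comment is making. It says nothing about quality; that part comes from benchmarks, not arithmetic.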

elorant on March 25, 2025

True. The only problem with lower-bit quantization, though, is that the model fails to understand long prompts.
