ONNX doesn't support the same level of quantization as GGML. So basically GGML w... | Hacker News


ianpurton on June 13, 2023 | on: Llama.cpp: Full CUDA GPU Acceleration

ONNX doesn't support the same level of quantization as GGML, so in practice GGML models can run on hardware with less memory.
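A rough back-of-the-envelope sketch of why lower-bit quantization lowers the memory floor. It counts model weights only (no KV cache, activations or runtime overhead). The bit widths are illustrative assumptions, not GGML's or ONNX's exact formats. In particular, the "4-bit, 32-weight blocks" entry assumes 4-bit codes plus one fp16 scale per block of 32 weights, i.e. 4.5 bits per weight.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Illustrative bit widths (assumptions, not exact library formats).
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "int8": 8.0,
    # 4-bit codes plus one fp16 scale shared by each block of 32 weights
    "4-bit, 32-weight blocks": 4 + 16 / 32,
}

if __name__ == "__main__":
    n = 7e9  # a 7B-parameter model, chosen only for illustration
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name:>24}: ~{weight_memory_gib(n, bpw):.2f} GiB of weights")
```

The same model needs roughly a quarter of the fp16 footprint at ~4.5 bits per weight, which is the gap that decides whether it fits on a given GPU or in a given amount of RAM.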

regularfry on June 13, 2023

Or alternatively, bigger models with the same memory (just quantised harder).
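Turning the arithmetic around gives the largest model whose weights fit in a fixed memory budget. This is a rough sketch. The 24 GiB budget, the 2 GiB reserved for everything that isn't weights, and the bit widths are all hypothetical numbers for illustration.

```python
def max_params(budget_gib: float, bits_per_weight: float, overhead_gib: float = 0.0) -> float:
    """Largest parameter count whose weights fit after reserving overhead_gib."""
    usable_bytes = max(budget_gib - overhead_gib, 0.0) * 2**30
    return usable_bytes * 8 / bits_per_weight


if __name__ == "__main__":
    # Hypothetical 24 GiB card with 2 GiB reserved for KV cache, activations, etc.
    for bpw in (16, 8, 4.5, 3.5):
        billions = max_params(24, bpw, overhead_gib=2) / 1e9
        print(f"{bpw:>4} bits/weight: ~{billions:.1f}B params")
```

Halving the bits per weight roughly doubles the parameter count that fits, which is the "bigger model, same memory" trade-off (at some cost in output quality as quantisation gets more aggressive).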
