
Is this something that will show up in Ollama any time soon to increase context size of local models?


KV quantization has long been available in llama.cpp


Yes, but the optimisation described has not, right?



