Hacker Timesnew | past | comments | ask | show | jobs | submitlogin

> With a fine tuned 30B quantisized model you can serve a large portion of requests with around 32GB of RAM. Free users will likely only get these kinds of models.

These are exactly the kinds of models that you can easily run locally by repurposing existing hardware. Depending on how much you're willing to wait for the answer, running local even gives you strictly better outcomes for simple Q&A queries.

(Long-context and agentic use cases are admittedly much harder to fit under that model, since non-AI uses for the high-end hardware you'd realistically need for those are rather more limited, and they're hit by the ongoing hardware shortage.)



For programmers maybe. I do this too. But think about all the regular users out there. Your dad and your mum, maybe even your grandparents. This is a huge marked too and for that we can use these special chips at scale.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: