
Since I couldn't find it in your list, I'd like to plug my own macOS (and iOS) app: Private LLM[1]. Unlike almost every other app in the space, it isn't based on llama.cpp (we use mlc-llm) and doesn't use naive RTN quantized models (we use OmniQuant). The app also has deep integrations with macOS and iOS (Shortcuts, Siri, macOS Services, etc.).

Incidentally, it currently runs Mixtral 8x7B Instruct[2] and Mistral[3] models faster than any other macOS app. The comparison videos are against Ollama, but the result generalizes, since almost every other macOS app I've seen uses llama.cpp for inference. :)

nb: Mixtral 8x7B Instruct requires an Apple Silicon Mac with at least 32GB of RAM.

[1]: https://privatellm.app/

[2]: https://www.youtube.com/watch?v=CdbxM3rkxtc

[3]: https://www.youtube.com/watch?v=UIKOjE9NJU4



What's the performance like in tokens/s?


You can see ms/token in a tiny font at the top of the screen once text generation completes, in both of the videos I linked to. Performance will vary by machine. On my 64GB M2 Max Mac Studio, I get ~47 tokens/s (21.06ms/token) with Mistral Instruct v0.2 and ~33 tokens/s (30.14ms/token) with Mixtral Instruct v0.1.
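If you want to convert the ms/token readout from the videos into tokens/s yourself, it's just 1000 divided by the latency. A minimal sketch (the function name is mine, not something from the app):

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput in tokens/s."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


# The two latencies quoted above.
for name, ms in [("Mistral Instruct v0.2", 21.06), ("Mixtral Instruct v0.1", 30.14)]:
    print(f"{name}: {tokens_per_second(ms):.1f} tokens/s")
```

That gives roughly 47.5 and 33.2 tokens/s, which is where the rounded figures come from.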


Interesting! What's the prompt eval processing speed like compared to llama.cpp and kin?


I haven't run any specific low-level benchmarks lately, but chunked prefilling and the TVM auto-tuned Metal kernels from mlc-llm seemed to make a big difference the last time I checked. Also, compared to stock mlc-llm, I use a newer version of Metal (3.0) and have a few modifications that give models a slightly smaller memory and disk footprint and slightly faster execution. That's possible because, unlike the mlc-llm folks, I only care about compatibility with Apple platforms. They support so much more than that in their upstream project.


thanks, I'll give it a crack


MacGPT is way handy because of a global keyboard shortcut which opens a Spotlight-like prompt. I would love to have a local equivalent.



