Posted by frontsideair 9/8/2025

Experimenting with Local LLMs on macOS (blog.6nok.org)
388 points | 262 comments
KolmogorovComp 9/8/2025|
The really tough spot is finding a good model for your use case. I have a 16GB MacBook and have been paralyzed by the many options. I've settled on a quantised 14B Qwen for now, but no idea if this is a good choice.
frontsideair 9/9/2025|
14B Qwen was a good choice, but it has become a bit outdated, and it seems the new 4B version somehow surpassed it in benchmarks.

It's a balancing game: how slow a token generation speed can you tolerate? Would you rather get an answer quickly, or wait a few seconds (or sometimes minutes) for reasoning?

For quick answers, Gemma 3 12B is still good. GPT-OSS 20B is pretty quick when reasoning is set to low, which usually doesn't think longer than one sentence. I haven't gotten much use out of Qwen3 4B Thinking (2507) but at least it's fast while reasoning.
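
For what it's worth, here's a minimal sketch of how I flip that reasoning level when talking to a local OpenAI-compatible server. The port is LM Studio's default and the "Reasoning: low" system prompt is the gpt-oss convention as I understand it, so adjust both for your setup:

    # Minimal sketch: ask a local gpt-oss-20b for a quick answer with low reasoning.
    # Assumptions: an OpenAI-compatible server on localhost:1234 (LM Studio default)
    # and a model that honors a "Reasoning: low" system prompt (gpt-oss convention).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="openai/gpt-oss-20b",  # use whatever name your server exposes
        messages=[
            {"role": "system", "content": "Reasoning: low"},
            {"role": "user", "content": "Two-sentence summary: tradeoffs of 4-bit quantization?"},
        ],
    )
    print(resp.choices[0].message.content)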

lawxls 9/8/2025||
What is the best local model for Cursor-style autocomplete/code suggestions? And is there an extension for VS Code that can integrate a local model for that?
kergonath 9/8/2025|
I have been playing with the continue.dev extension for VSCodium. I got it to work with Ollama and the Mistral models (Codestral, Devstral, and Mistral Small). I haven't gone much further than experimenting yet, but it looks promising: entirely local and mostly open source. And even then, it's much further than I got with most other tools I tried.
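
If it helps anyone setting this up, a quick sanity check before pointing continue.dev at it is to hit Ollama's local API directly. This sketch assumes the default port 11434 and that you've already run "ollama pull codestral":

    # Quick smoke test that Ollama is serving a code model locally.
    # Assumptions: Ollama on its default port 11434, codestral already pulled.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "codestral",
            "prompt": "# Python function that reverses a string\ndef ",
            "stream": False,
        },
        timeout=120,
    )
    print(resp.json()["response"])

If that returns a completion, pointing continue.dev at the same model name should be the only remaining step.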
jokoon 9/8/2025||
I am still looking for a local image captioner; any suggestions for the three easiest ones to use?
DrAwdeOccarim 9/8/2025|
Mistral Small 3.2 (Q4_K_M) and Gemma 3 12B (4-bit) are amazing. I run both in LM Studio on a MacBook Pro M3 Pro with 36GB of RAM.
jokoon 9/9/2025||
can I call it from the command line?
DrAwdeOccarim 9/9/2025||
Yes. LM Studio acts like an OpenAI-compatible endpoint when you turn the server on.
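
Something like this works from the command line; a rough sketch assuming the server is on LM Studio's default port 1234 and a vision-capable model like Gemma 3 12B is loaded (the model name is whatever your server lists):

    # Caption an image by calling LM Studio's OpenAI-compatible server from a script.
    # Assumptions: server on localhost:1234, a vision-capable model already loaded.
    import base64, sys
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    with open(sys.argv[1], "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gemma-3-12b-it",  # use the name your server reports
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Write a one-sentence caption for this image."},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64," + image_b64}},
            ],
        }],
    )
    print(resp.choices[0].message.content)

Save it as caption.py and run "python caption.py photo.jpg".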
OvidStavrica 9/8/2025||
By far, the easiest (open source/Mac) is Pico AI Server with Witsy as a front end:

https://picogpt.app/

https://apps.apple.com/us/app/pico-ai-server-llm-vlm-mlx/id6...

Witsy:

https://github.com/nbonamy/witsy

...and you really want at least 48GB of RAM to run >24B models.
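
As a rough back-of-the-envelope (my own numbers): at 4-bit quantization the weights alone are roughly half a gigabyte per billion parameters, before you add the KV cache, the app, and the OS, which is why 24B+ models get tight on smaller machines:

    # Very rough estimate of weight memory for a quantized model.
    # Ignores KV cache, context length, and OS/app overhead, so real needs are higher.
    def weights_gb(params_billion, bits_per_weight=4):
        return params_billion * bits_per_weight / 8

    for params in (12, 24, 32, 70):
        print(f"{params}B @ 4-bit ~ {weights_gb(params):.0f} GB of weights")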

cchance 9/9/2025||
The #1 thing they need to do is open up the ANE for developers to access properly.
coldtea 9/9/2025||
>I also use them for brain-dumping. I find it hard to keep a journal, because I find it boring, but when you’re pretending to be writing to someone, it’s easier. If you have friends, that’s much better, but some topics are too personal and a friend may not be available at 4 AM. I mostly ignore its responses, because it’s for me to unload, not to listen to a machine spew slop. I suggest you do the same, because we’re anthropomorphization machines and I’d rather not experience AI psychosis. It’s better if you don’t give it a chance to convince you it’s real. I could use a system prompt so it doesn’t follow up with dumb questions (or “YoU’Re AbSoLuTeLy CoRrEcT”s), but I never bothered as I already don’t read it.

Reads like someone starting in on their daily drinks, already using them for "company" and fun, and saying "I'm not an alcoholic, I can quit anytime."

anArbitraryOne 9/9/2025||
I still don't think macOS is such a great idea.
jus3sixty 9/8/2025||
An awful lot of Monday-morning-quarterback CEOs are here running their mouths about what Tim Cook should do or what they would do. Chill out with the extremely confident ignorance. Tim Cook brought Apple to a billion dollars in free cash; he doesn't need to ride the hype train.

Also, let's not forget they are first and foremost designers of hardware, and the arms race is only getting started.

j45 9/8/2025|
Not sure I can think of anything that is more performant per watt for LLMs than Apple Silicon.
saagarjha 9/9/2025||
A datacenter GPU is going to be an order of magnitude more efficient.
a-dub 9/8/2025||
ollama is another good choice for this purpose. it's essentially a wrapper around llama.cpp that adds easy downloading and management of running instances. it's great! also works on linux!
frontsideair 9/9/2025|
Ollama adding a paid cloud version made me postpone this post for a few weeks at least. I don't object to them making money, but it's hard to recommend a tool for local usage when the first instruction is to go to settings and enable airplane mode.

Luckily llama.cpp has come a long way and is at a point where I can easily recommend it as the open source option instead.
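
If anyone wants a starting point, here's a minimal sketch using the llama-cpp-python bindings (a separate project that wraps llama.cpp); the repo and filename below are just examples, pick whatever GGUF and quant fit your machine:

    # Minimal local chat with llama.cpp via the llama-cpp-python bindings.
    # Assumption: the Hugging Face repo/filename below are examples; substitute your own GGUF.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="Qwen/Qwen3-4B-Instruct-2507-GGUF",  # example repo, downloaded on first run
        filename="*Q4_K_M.gguf",                     # pick a quant that fits in memory
        n_ctx=8192,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "One tip for running local models on a Mac?"}]
    )
    print(out["choices"][0]["message"]["content"])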
