Posted by frontsideair 5 days ago
It's a balancing game: how slow a token generation speed can you tolerate? Would you rather get an answer quickly, or wait a few seconds (or sometimes minutes) for reasoning?
For quick answers, Gemma 3 12B is still good. GPT-OSS 20B is pretty quick when reasoning is set to low, in which case it usually doesn't think for longer than a sentence. I haven't gotten much use out of Qwen3 4B Thinking (2507), but at least it's fast while reasoning.
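For what it's worth, here's a minimal sketch of how you can hit one of these models from a script, assuming a local OpenAI-compatible server (llama.cpp's llama-server and Pico AI Server both expose one). The port, model id, and prompt are placeholders for whatever your setup reports; GPT-OSS takes its reasoning level from the system prompt:

    # Minimal sketch, assuming an OpenAI-compatible server on localhost:8080.
    # Base URL and model id are placeholders -- adjust for your own setup.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="gpt-oss-20b",  # whatever id your server lists
        messages=[
            # gpt-oss reads its reasoning level from the system prompt;
            # "low" usually keeps the thinking to a sentence or so.
            {"role": "system", "content": "Reasoning: low"},
            {"role": "user", "content": "Summarize TCP slow start in two sentences."},
        ],
    )
    print(resp.choices[0].message.content)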
Pico AI Server:
https://apps.apple.com/us/app/pico-ai-server-llm-vlm-mlx/id6...
Witsy:
https://github.com/nbonamy/witsy
...and you really want at least 48 GB of RAM to run >24B models.
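The back-of-the-envelope math supports that: weights alone for a 4-bit quant run about half a byte per parameter, before the KV cache and before macOS and your other apps take their cut of unified memory. A rough sketch (illustrative numbers, not a sizing tool):

    # Approximate weight memory for a quantized model: params * bits / 8.
    def weight_gb(params_b: float, bits: float) -> float:
        return params_b * 1e9 * bits / 8 / 1e9

    for params in (12, 24, 32, 70):
        print(f"{params}B @ 4-bit ~ {weight_gb(params, 4):5.1f} GB weights "
              f"(plus KV cache and OS overhead)")

A 32B model at 4-bit already wants ~16 GB for weights alone, and the GPU can only claim part of a Mac's unified RAM, so 48 GB total is about where >24B models stop being a squeeze.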
Reads like someone starting on their daily drinks, already using them for "company" and fun, and saying "I'm not an alcoholic, I can quit anytime."
Luckily llama.cpp has come a long way and is at a point where I can easily recommend it as the open-source option instead.
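And if you'd rather drive llama.cpp from Python than from the CLI, the llama-cpp-python bindings make it a few lines; a minimal sketch, with a placeholder GGUF path:

    # Minimal llama-cpp-python sketch; the model path is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/gemma-3-12b-it-Q4_K_M.gguf",  # placeholder
        n_ctx=8192,       # context window
        n_gpu_layers=-1,  # offload everything to the GPU (Metal on macOS)
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain mmap in one paragraph."}]
    )
    print(out["choices"][0]["message"]["content"])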
Also, let’s not forget they are first and foremost designers of hardware, and the arms race is only getting started.