Running local models is good now

Posted by jfb 11 hours ago

Running local models is good now(vickiboykis.com)

984 points | 419 commentspage 7

anax32 10 hours ago|

I've just made a milestone on my project, moving away from AWS (budget) to self-hosted and the local models are so much faster than in the past. Beyond LLMs, having embeddings, image, video, audio gen available is crazy.

Running locally is the bar; it's hard to make these things a service which scales.

nikagrawal121 6 hours ago||

I tried for my legal AI application that I'm building and it was able to do majority of the tasks. I used gemma4:26B

prlin 9 hours ago||

If you wanted to do some research or learn about post training and agent harnesses, is that a good option with these local models? What hardware is recommended, or easiest to go with a Mac Studio with 64GB+ RAM?

wrxd 9 hours ago||

I wonder how much local models hallucinate. I am getting almost daily an "Honest answers: I made that up." reply from Claude Opus when I challenge some silly thing it's trying to do.

malkosta 9 hours ago||

The problem with QWEN is that it just can't edit files reliably, I had to hack Pi all over to reduce the pain, but still far from perfect...does Gemma 4 strugle on this?

stared 10 hours ago||

I really recommend Qwen3.6 27B.

Make some tests, and its 8 bit version runs at 30tok/s when using llama.cpp with MTP and run on Macbook Max M5. I have 128 GB, but but 64 GB is well enough. https://github.com/stared/benching-local-llms-on-apple-silic...

When using benchmarks, it gives more-or-less the level of SotA mid-late 2025.

iagooar 9 hours ago||

I run the exact same model, on the exact same hardware - amazing results. Pair it with good search skills (Tavily, Brave, Exa) and you have a near-SOTA model on your desk.

wizzledonker 10 hours ago||

Did you mean 2025?

stared 10 hours ago||

Yes, fixed

bthornbury 8 hours ago||

the qwopus 27b model is good for grunt work style tasks, even across multiple files. Piping a bunch of things through, small factoring changes, stuff that just takes time to type out.

I wouldn't rely on it for large stuff like codex though. I haven't tried out deepseek/kimi, if we could run those locally it would be great.

daniban 9 hours ago||

With Apple silicon and now the RTX Spark there are real discussions whether local AI is the future. The only problem is Western open source models are so far behind. I genuinely feel there's a push to fix this. Gemma is getting more frequent releases and Nvdia is quietly creating very cool small models. I hope both the hardware and models catch up and local really does emerge.

ridruejo 8 hours ago||

Local models are one of the main drivers for our installer / Desktop app for OpenClaw https://holaclaw.ai (disclaimer I am one of the founders). The smaller models are really only suitable for the most basic tasks, but if you have 32gb-64gb you can get real work done (ie complex web workflows) without third party hosted models

ibizaman 10 hours ago|

Tangential but reading on mobile, the font size in the code snippets are all over the place. I actually have the same issue on my blog. Anyone knows why?

More comments...