Running local models is good now

Posted by jfb 7 hours ago

Running local models is good now (vickiboykis.com)

751 points | 341 comments | page 4

abalashov 4 hours ago

And if you want to dial in a setting in between: I've switched to Kimi K2.6 (now K2.7) and DeepSeek through OpenRouter and Reasonix for pretty much everything, with no discernible loss of analytical quality or utility.

However, like many commenters, I don't really believe in vibe-coding, long-horizon one-shot agentic coding, etc., and do not use LLMs for huge generation tasks that involve designing things end-to-end.

I also have an MBP with 128 GB of unified memory and do quite a bit with Qwen3.6-35B-A3B. No, it's not as smart as the aforementioned models, to say nothing of frontier ones, but many people seem pleasantly shocked by the number of banal tasks that do not require these.

b3ing 4 hours ago

They are OK for simple stuff: coding is weak, chat is alright, writing is OK. But I had many of them write stories from ideas, and they kept using the same names regardless of what the story was about. I can't complain, it's free. Can't wait till they get even better. For local image generation they are good but slow; just create a bunch in the background while you do other things, otherwise it's like waiting on a 14.4k modem.

huydotnet 5 hours ago

I love that local LLMs are being discussed more often on HN recently. But as for the post, I find it strange that the author claims to have worked with local models from day 1, yet wrote a post that still links to Qwen2.5 and Qwen3 in mid-June 2026.

0xbadcafebee 2 hours ago

Local models have been good for a while. But this being the HN echo chamber, people here think that local models can only be used for coding, and are expecting Opus 4.8 on their iPhone. Turns out AI can be used for things other than just coding. Even tiny models (<4B parameters) can do tons of useful things on local devices. Search, index, summarization, troubleshooting, crafting documents/formatting, image analysis, transcription, object identification, robot navigation, text-to-speech, speech-to-text, browser/window control, MCP/tool calls, and much more.

Larger models just do more complex reasoning. But if you want them to be really good, you need a beefy Mac. They have the best combination of memory bandwidth and RAM to allow medium-sized models to run at speed. GPUs have less memory but more bandwidth, and AMD iGPUs have more memory but less bandwidth. The Mac is the best compromise on the market today.

Once you do have a beefy Mac, you want to run a dense model. This gives you the best possible result with the system you have. You can go MoE for faster results, use cutting-edge inference techniques, parameter tweaks, etc. But a basic dense model (at Q6 quant) on a big-ass Mac will serve 90% of your coding needs.
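A rough back-of-envelope sketch of why memory bandwidth matters so much here, and why MoE is faster: during decoding, generating a token requires reading roughly all of the active weights once, so bandwidth divided by the size of those weights gives an upper bound on tokens per second. The function names and every number below (400 GB/s, about 6.5 bits per weight for a Q6-style quant, 3B active parameters) are illustrative assumptions, not measurements of any particular machine or model.

```python
def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size in bytes of the weights read for each generated token."""
    return params_billion * 1e9 * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billion: float,
                          bits_per_weight: float) -> float:
    """Upper bound on decode speed if memory bandwidth were the only limit."""
    bytes_per_token = weight_bytes(active_params_billion, bits_per_weight)
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Hypothetical machine and models; swap in your own figures.
    bandwidth = 400.0  # GB/s (assumed)
    bits = 6.5         # rough bits per weight for a Q6-style quant (assumed)

    dense = max_tokens_per_second(bandwidth, 27.0, bits)  # dense 27B: all weights active
    moe = max_tokens_per_second(bandwidth, 3.0, bits)     # MoE with ~3B active parameters

    print(f"dense 27B:      at most {dense:.1f} tok/s")
    print(f"MoE, 3B active: at most {moe:.1f} tok/s")
```

Real throughput will land below this bound, since it ignores KV-cache reads, compute limits, and prompt processing. The MoE model still has to fit entirely in memory; only the per-token reads shrink.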

wxw 6 hours ago

> “if we are constrained by performance and price, what architectural tradeoffs do we need to make?” a question that so far has not really been asked in the mad token gold rush.

To be fair, I think the labs are also interested in this (e.g., OpenAI's parameter golf). But the incentives are tricky. When the subsidies and tokenmaxxing era ends, local models will be essential.

valisvalis 5 hours ago

There are good use cases for them for sure. The Gemma 4 Good hackathon a while ago showed how local models can solve problems in health and education in areas with low connectivity or limited infrastructure.

jszymborski 4 hours ago

I run local models and they work fine for me, but specifically for use in coding harnesses, I'm having a hard time. Tools tend to end up in the same loop, trying to `ls` the same folder or `grep` the same file, over and over and eating up the whole context. Super hard to get it to do anything but that. Any tips?
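One possible mitigation, if the harness lets you intercept tool calls, is a small guard that notices when the exact same call keeps coming back. This is only a sketch: `RepeatGuard`, `max_repeats`, and the idea of replacing the tool result with a nudge are hypothetical and not part of any specific harness.

```python
import json
from collections import Counter


class RepeatGuard:
    """Flags a tool call once the same name and arguments have been seen too often."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def check(self, tool_name: str, arguments: dict) -> bool:
        """Return True if the call may run, False if it looks like a loop."""
        # Serialize with sorted keys so argument order does not affect the match.
        key = (tool_name, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] <= self.max_repeats


if __name__ == "__main__":
    guard = RepeatGuard(max_repeats=2)
    for _ in range(3):
        allowed = guard.check("ls", {"path": "."})
        print("run tool" if allowed else "skip and tell the model to try something else")
```

When `check` returns False, the harness could skip the call and hand the model a short message saying the result is unchanged, instead of spending more context on the same `ls` or `grep` output.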

aliljet 6 hours ago

The problem here is always the cost-benefit. For $200/mo, you're receiving subsidized best-of-breed access. There's no model competing at that price anywhere. If a 27B param model is what you choose, show me your hardware! I would love to be wrong...

rsolva 5 hours ago

But for how long? The subsidized phase is probably short, and then what? I run Qwen 3.5 27B dense on my old AMD RX7900XTX at about 45 t/s and barely use my Claude Code subscription anymore.

MrKoby07 3 hours ago

I think a lot of people just don't have specs like that, making it still painful.

jlengrand 3 hours ago

Just wanna say it's always fun and nostalgic to see authors pass by here who I was reading back when I started my career. I was reading Vicki's blog way back, and even remember learning some email parsing in Python from her over 10 years ago. TY!
