Running local models is good now

Posted by jfb 12 hours ago

Running local models is good now (vickiboykis.com)

1025 points | 434 comments | page 8

bthornbury 9 hours ago

The Qwopus 27B model is good for grunt-work-style tasks, even across multiple files: piping a bunch of things through, small refactoring changes, stuff that just takes time to type out.

I wouldn't rely on it for large tasks the way I would on Codex, though. I haven't tried DeepSeek or Kimi; if we could run those locally, it would be great.

daniban 10 hours ago

With Apple silicon and now the RTX Spark, there are real discussions about whether local AI is the future. The only problem is that Western open-source models are so far behind. I genuinely feel there's a push to fix this: Gemma is getting more frequent releases, and Nvidia is quietly creating very cool small models. I hope both the hardware and the models catch up and local really does emerge.

ibizaman 11 hours ago

Tangential, but reading on mobile, the font sizes in the code snippets are all over the place. I actually have the same issue on my blog. Does anyone know why?

ridruejo 9 hours ago

Local models are one of the main drivers for our installer/desktop app for OpenClaw, https://holaclaw.ai (disclaimer: I am one of the founders). The smaller models are really only suitable for the most basic tasks, but if you have 32-64GB you can get real work done (i.e. complex web workflows) without third-party hosted models.

0xbadcafebee 7 hours ago

Local models have been good for a while. But this being the HN echo chamber, people here think that local models can only be used for coding, and expect Opus 4.8 on their iPhone. It turns out AI can be used for things other than coding. Even tiny models (<4B parameters) can do tons of useful things on local devices: search, indexing, summarization, troubleshooting, crafting and formatting documents, image analysis, transcription, object identification, robot navigation, text-to-speech, speech-to-text, browser/window control, MCP/tool calls, and much more.

Larger models just do more complex reasoning. But if you want them to be really good, you need a beefy Mac. They have the best combination of memory bandwidth and RAM to allow medium-sized models to run at speed. GPUs have less memory but more bandwidth, and AMD iGPUs have more memory but less bandwidth. The Mac is the best compromise on the market today.
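A common rule of thumb behind the bandwidth-vs-memory trade-off: when generating one token at a time, a dense model has to stream roughly all of its weights from memory for every token, so decode speed is capped at about memory bandwidth divided by weight size, and only if the weights fit at all. The sketch below assumes that simplified model (it ignores KV-cache reads, compute limits, CPU offloading, and real-world efficiency). The machine specs and the names `Machine` and `max_decode_tokens_per_s` are hypothetical, illustrative round numbers and helpers, not measurements of any specific product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    name: str
    memory_gb: float       # memory available for weights (illustrative)
    bandwidth_gb_s: float  # peak memory bandwidth (illustrative)


def max_decode_tokens_per_s(machine: Machine, weights_gb: float) -> Optional[float]:
    """Rough upper bound on single-stream decode speed for a dense model.

    Each generated token reads (roughly) every weight once, so speed is capped
    at bandwidth / weight size. Returns None if the weights don't fit in memory
    (offloading is ignored). KV-cache reads and compute limits are ignored too.
    """
    if weights_gb > machine.memory_gb:
        return None
    return machine.bandwidth_gb_s / weights_gb


# Hypothetical machines standing in for the three categories in the comment.
MACHINES = [
    Machine("Mac (unified memory)", memory_gb=128, bandwidth_gb_s=800),
    Machine("discrete GPU", memory_gb=24, bandwidth_gb_s=1000),
    Machine("AMD iGPU", memory_gb=96, bandwidth_gb_s=256),
]

for weights_gb in (20, 60):
    for m in MACHINES:
        tps = max_decode_tokens_per_s(m, weights_gb)
        result = "does not fit" if tps is None else f"<= {tps:.0f} tok/s"
        print(f"{weights_gb} GB of weights on {m.name}: {result}")
```

With these made-up numbers, a 20 GB model has the highest ceiling on the discrete GPU, while a 60 GB model only fits on the Mac and the iGPU, where the Mac's higher bandwidth wins. That is the compromise the comment describes.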

Once you do have a beefy Mac, you want to run a dense model. This gives you the best possible result with the system you have. You can go MoE for faster results, use cutting-edge inference techniques, tweak parameters, etc. But a basic dense model (at a Q6 quant) on a big-ass Mac will serve 90% of your coding needs.
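To put the Q6 and MoE remarks in numbers: weight size is roughly parameter count times bits per weight divided by 8, and under a bandwidth-bound assumption a mixture-of-experts (MoE) model only has to read its active parameters for each token. That is why it can decode faster even though all of its experts must stay resident in memory. In this sketch, the ~6.5 bits per weight for a Q6 quant, the 30B-total/3B-active MoE, the 800 GB/s bandwidth, and the helper names are all assumptions chosen for illustration, not specs of a particular model, format, or machine.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


def decode_ceiling(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Bandwidth-bound upper limit on tokens/s (ignores KV cache and compute)."""
    return bandwidth_gb_s / gb_read_per_token


# Assumptions for illustration only.
Q6_BITS = 6.5         # approximate average bits/weight of a 6-bit quant incl. scales
BANDWIDTH_GB_S = 800  # hypothetical machine

# Dense: every one of the 27B weights is read for each token.
dense = weights_gb(27, Q6_BITS)
print(f"dense 27B: {dense:.1f} GB in memory, "
      f"<= {decode_ceiling(BANDWIDTH_GB_S, dense):.0f} tok/s")

# Hypothetical MoE: 30B weights resident, but only ~3B active per token.
moe_resident = weights_gb(30, Q6_BITS)
moe_active = weights_gb(3, Q6_BITS)
print(f"MoE 30B (3B active): {moe_resident:.1f} GB in memory, "
      f"<= {decode_ceiling(BANDWIDTH_GB_S, moe_active):.0f} tok/s")
```

Under these assumptions the MoE needs slightly more memory than the dense 27B model but has a far higher speed ceiling, while the dense model uses all 27B of its parameters on every token.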

xienze 11 hours ago

The big caveat here is that these local models require you to invest some time tweaking your harness, AGENTS.md, and skills in order to get things roughly to the level you'd expect. But something like Qwen3.6-27B with web search capabilities and a good set of skills really is impressive! Especially considering that you can go wild and not worry about token costs.

The other thing that people tend to gloss over is that you really do need to spend some $$$ on decent hardware. Yeah, you CAN run some 4-bit quant with a heavily quantized cache on your 16GB card, but it's not going to be a great experience (I think this is where a lot of the "if you think it's gonna be any good, you're going to be disappointed" stuff comes from). Yes, it's a lot of $$$ upfront, but nobody knows when hardware prices are going to come back to reality. There are a lot of hopes and dreams that any minute now an H100 will be worth pennies because "that's how it's always been" w.r.t. computer hardware, but we are living in interesting times. So you can't just make the tired old assumption that a card will be worth next to nothing three years from now, and that a Claude subscription over those three years will therefore work out dramatically cheaper. We STILL have basically anything with >=24GB VRAM appreciating in value, which is absolutely wild. What I'm saying is that, going forward, the depreciation curve may well be a lot flatter and slower than it used to be.
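The subscription-vs-hardware comparison is simple arithmetic once resale value is included: what the card really costs you is its purchase price minus what you can sell it for later. The sketch below uses made-up placeholder prices (a $5,000 card, a $100/month subscription, 36 months), and the helper names are hypothetical. It ignores electricity, the capability gap between local and hosted models, and the time value of money; it only shows how much the resale assumption can flip the answer.

```python
def subscription_cost(monthly: float, months: int) -> float:
    """Total paid for a hosted-model subscription over the period."""
    return monthly * months


def hardware_net_cost(purchase: float, resale_fraction: float) -> float:
    """Price paid minus resale value at the end of the period (power ignored)."""
    return purchase * (1 - resale_fraction)


# Hypothetical placeholder numbers; substitute your own.
MONTHS = 36
sub_total = subscription_cost(monthly=100, months=MONTHS)
CARD_PRICE = 5000

# Same card, three different assumptions about what it is worth in 3 years.
for resale_fraction in (0.1, 0.5, 0.9):
    card_total = hardware_net_cost(CARD_PRICE, resale_fraction)
    cheaper = "card" if card_total < sub_total else "subscription"
    print(f"resale at {resale_fraction:.0%}: card {card_total:,.0f} vs "
          f"subscription {sub_total:,.0f} -> {cheaper} is cheaper")
```

With these placeholders the subscription wins if the card keeps only 10% of its value, and the card wins in the 50% and 90% cases (break-even here is 28% resale), which is the comment's point about depreciation.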

osigurdson 8 hours ago

Running AI on timesharing mainframes does seem like an odd final state for the world.

fl4regun 10 hours ago

In my experience, with a system of 32GB RAM and 24GB VRAM, no, they aren't that good.

wasimxyz 11 hours ago

https://canirun.ai

drchaim 10 hours ago

I really want to try local models, but I don't have the hardware yet. I'm probably the only one here still using a 2020 M1 Mac mini with 8GB. :/

tennfown 9 hours ago

I have some decent specs, but I’m stuck with an AMD graphics card, which I’ve been told is a non-starter.
