A guide to local coding models

Posted by mpweiher 12/21/2025

A guide to local coding models(www.aiforswes.com)

607 points | 351 commentspage 5

Bukhmanizer 12/22/2025|

Are people really so naive to think that the price/quality of proprietary models is going to stay the same forever? I would guess sometime in the next 2-3 years all of the major AI companies are going to increase the price/enshittify their models to the point where running local models is really going to be worth it.

stuaxo 12/22/2025||

Is the conclusion the same if you have a computer that is just for the LLM, and a separate one that runs your dev tools ?

tempodox 12/22/2025||

> You might need to install Node Package Manager for this.

How anyone in this day and age can still recommend this is beyond me.

KronisLV 12/22/2025||

My experience: even for the run of the mill stuff, local models are often insufficient, and where they would be sufficient, there is a lack of viable software.

For example, simple tasks CAN be handled by Devstral 24B or Qwen3 30B A3B, but often they fail at tool use (especially quantized versions) and you often find yourself wanting something bigger, where the speed falls a bunch. Even something like zAI GLM 4.6 (through Cerebras, as an example of a bigger cloud model) is not good enough for doing certain kinds of refactoring or writing certain kinds of scripts.

So either you use local smaller models that are hit or miss, or you need a LOT of expensive hardware locally, or you just pay for Claude Code, or OpenAI Codex, or Google Gemini, or something like that. Even Cerebras Code that gives me a lot of tokens per day isn't enough for all tasks, so you most likely will need a mix - but running stuff locally can sometimes decrease the costs.

For autocomplete, the one thing where local models would be a nearly perfect fit, there just isn't good software: Continue.dev autocomplete sucks and is buggy (Ollama), there don't seem to be good enough VSC plugins to replace Copilot (e.g. with those smart edits, when you change one thing in a file but have similar changes needed like 10, 25 and 50 lines down) and many aren't even trying - KiloCode had some vendor locked garbage with no Ollama support, Cline and RooCode aren't even trying to support autocomplete.

And not every model out there (like Qwen3) supports FIM properly, so for a bit I had to use Qwen2.5 Coder, meh. Then when you have some plugins coming out, they're all pretty new and you also don't know what supply chain risks you're dealing with. It's the one use case where they could be good, but... they just aren't.

For all of the billions going into AI, someone should have paid a team of devs to create something that is both open (any provider) and doesn't fucking suck. Ollama is cool for the ease of use. Cline/RooCode/KiloCode are cool for chat and agentic development. OpenCode is a bit hit or miss in my experience (copied lines getting pasted individually), but I appreciate the thought. The rest is lacking.

evanreichard 12/23/2025|

Have you tried llama.vscode [0]? I use the vim equivalent, llama.vim [1] with Qwen3 Coder 30B and personally feel that it's better than Copilot. I have hot keys that allow me to quickly switch between the two and find myself always going back to local.

[0] https://github.com/ggml-org/llama.vscode

[1] https://github.com/ggml-org/llama.vim

flowinghorse 12/22/2025||

Local models less than 2b are good enough for code auto completion. Even you don't have 128G memory.

dfischer96 12/22/2025||

Nice guide! I want to point out opencode CLI, which is far superior to Qwen CLI in my opinion.

freeone3000 12/21/2025||

What are you doing with these models that you’re going above free tier on copilot?

satvikpendem 12/21/2025|

Some just like privacy and working without internet, I for example travel regularly on the train and like to have my laptop when there's not always good WiFi.

avhception 12/22/2025||

I tried local models for general-purpose LLM tasks on my Radeon 7800 XT (20GB VRAM), and was disappointed.

But I keep thinking: It should be possible to run some kind of supercharged tab completion on there, no? I'm spending most of my time writing Ansible or in the shell, and I have a feeling that even a small local model should give me vastly more useful completion options...

dackdel 12/22/2025||

no one using exo?

redrove 12/22/2025|

https://github.com/exo-explore/exo

I keep hearing about it but unfortunately I myself only have one mac and nvidia GPUs and those can’t cluster together :/

ikidd 12/22/2025|

So I can't see bothering with this when I pumped 260M tokens through running in Auto mode on a $20/mo Cursor plan. It was my first month of a paid subscription, if that means anything. Maybe someone can explain how this works for them?

Frankly, I don't understand it at all, and I'm waiting for the other shoe to drop.

lelanthran 12/22/2025|

> So I can't see bothering with this when I pumped 260M tokens through running in Auto mode on a $20/mo Cursor plan. It was my first month of a paid subscription, if that means anything. Maybe someone can explain how this works for them?

They're running at a loss and covering up the losses using VC?

> Frankly, I don't understand it at all, and I'm waiting for the other shoe to drop.

I think that the providers are going to wait until there are a significant number of users that simply cannot function in any way without the subscription, and then jack up the prices.

After all, I can all but guarantee that even the senior devs at most places now won't be able to function if every single tool or IDE provided by a corporate (like VSCode) was yanked from them.

Myself, you can scrub my main dev desktop of every corporate offering, and I might not even notice (emacs or neovim, plugins like Slime, Lsp plugins, etc) is what I am using daily, along with programming languages.

More comments...