Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

Posted by cloudking 6 hours ago

Has anyone here fully swapped Claude/GPT for a local model as their main coding tool, not just for side experiments? If so, please share your setup and performance (e.g. tok/s).

324 points | 203 comments | page 3

moezd 2 hours ago|

Not yet. Without pure Apple game or decent GPUs, even with a lot of RAM and threads, all you get is about 30-50 tokens/second, and that's with thinking turned off. Without these optimizations your model will have a field day with your MCPs, skills and agent descriptions, and you will watch the paint dry before seeing the first output token. Local model serving means you have to fight for every token in your context window, which is quite the opposite of what Claude/GPT/Copilot are pushing the industry towards.

ndom91 2 hours ago||

Not 100%, I still fall back to Claude for most day-job stuff. But I've been trying to use Qwen 3.6 and Gemma 4 on my Framework Desktop mainboard (Strix Halo) as much as possible.

I've been working on an ops-style tool for local LLM inference: proxying, API keys, request logging, model rewriting and much, much more.

https://github.com/ndom91/llama-dash

grmnygrmny2 2 hours ago||

Just sharing my $0.02 here - I have ethical objections to using OpenAI or Anthropic products so I was a reluctant adopter of LLMs at all. Local models address most, though not all, of my moral objections so I’ve been using them for work and personal projects for about a month.

The hardware I have (32GB Macs and a gaming PC with a 10GB 3080) can only get me to Qwen3.6-35B-A3B at various quants but that’s enough (200-400 tok/s prompt processing, 20-30 tok/s token generation).

It’s taken some time to learn how to best utilize it - some things take a bit of babysitting or direction - but it’s quite useful. Not having ever used CC I can’t compare but it’s been a great assistant or pair programmer for everything from embedded C++ to Vue. I wish I could run 27B as there have been moments when this model feels like it just can’t quite figure something out but those moments are quite rare. For a lot of tasks it’s a huge time saver and has proved super capable at digging into and fixing bugs given pretty vague instructions.

I’m using Pi as my harness.
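To put the prompt-processing (PP) and token-generation (TG) speeds quoted above in perspective, here is a rough back-of-the-envelope sketch. The 20,000-token prompt and 500-token reply are illustrative assumptions, not numbers from this thread, and the estimate ignores prompt caching, which can cut the prefill time considerably on follow-up turns.

```python
def turn_latency_seconds(prompt_tokens, output_tokens, pp_tok_s, tg_tok_s):
    """Return (time_to_first_token, total_time) in seconds for one request.

    Simplified model: the whole prompt is prefilled at pp_tok_s, then the
    reply is decoded at tg_tok_s. No caching, no batching.
    """
    if pp_tok_s <= 0 or tg_tok_s <= 0:
        raise ValueError("speeds must be positive")
    prefill = prompt_tokens / pp_tok_s   # wait before the first output token
    decode = output_tokens / tg_tok_s    # time spent streaming the reply
    return prefill, prefill + decode


if __name__ == "__main__":
    # Low and high ends of the quoted range; prompt/reply sizes are assumed.
    for pp, tg in [(200, 20), (400, 30)]:
        ttft, total = turn_latency_seconds(20_000, 500, pp, tg)
        print(f"PP={pp} TG={tg}: first token after {ttft:.0f}s, done after {total:.0f}s")
```

Under these assumptions a cold 20,000-token prompt takes 50 to 100 seconds before the first output token appears, which is why trimming the context matters so much on this class of hardware.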

bijowo1676 2 hours ago||

One of the interesting setups I saw uses expensive frontier models to write and update markdown for your app (specs, product requirements, architecture, etc.)

but then uses a cheap/local model to implement the specs.

Markdown compresses information more effectively and fits the context window more easily than hundreds of source code files,

but this requires second and third passes to smooth out the rough edges.

Has anyone tried that?

acc_297 4 hours ago||

I've been wondering lately if it would help to take a medium-sized model and, either in the cloud or in some local setup, actually do Reinforcement Learning from Human Feedback (RLHF) on every prompt as a chore. I don't know if trying to manually finetune a model to your usage habits would ruin it or help. Ideally, if you were diligent, you could get rid of some of the tics that make models for the general public difficult to work with, e.g. being overly sycophantic, overly verbose, or having an annoying tendency to explain via analogies.

But perhaps one individual's prompt feedback just isn't ever going to be enough; I'm not sure how much you need. (I know people working at big companies that have purchased in-house agents fine-tuned on internal documents etc., and apparently these end up with bizarre behaviours and are not necessarily more helpful than the standard models.)

I'd like to be able to essentially edit every response given by an agent and then finetune on the difference between what it produced and how I edited the text. Personally I would just remove a lot of the adjectives and try to distill the responses to core responses, but I worry, based on some of the work done by Owain Evans and other alignment researchers, that this can sometimes push agents into tricky-to-predict tendencies.
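A minimal sketch of how the "edit every response, then finetune on the difference" idea could be recorded as data. The `chosen`/`rejected` field names follow a convention common in preference-tuning datasets and are an assumption here, as are the function names and the JSONL layout. Whether a single person's edits are enough to train on is the open question raised above, and this does not answer it.

```python
import difflib
import json


def make_preference_record(prompt, model_response, edited_response):
    """Pair a model response with its hand-edited version.

    The edited text is treated as the preferred ("chosen") output and the
    original as the "rejected" one. The line diff is kept for later review.
    """
    diff = list(difflib.unified_diff(
        model_response.splitlines(),
        edited_response.splitlines(),
        fromfile="model",
        tofile="edited",
        lineterm="",
    ))
    return {
        "prompt": prompt,
        "rejected": model_response,
        "chosen": edited_response,
        "diff": diff,
    }


def append_record(path, record):
    """Append one record as a line of JSON (JSONL) to a hypothetical log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

For example, `make_preference_record("How do I cache this?", "Great question!\nUse a dict.", "Use a dict.")` yields a record whose diff shows the removed opening line, which is the kind of signal the comment describes.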

htrp 2 hours ago||

Cursor is doing that (I think with Fireworks as their provider)

https://cursor.com/blog/real-time-rl-for-composer

rolisz 4 hours ago||

I'm interested in trying something similar. I was thinking to do this for my OpenClaw agent.

About Owain Evans's work: I think he did SFT. On Twitter someone was saying that RL is not as susceptible to what he showed. I'd like to try that.

bravetraveler 2 hours ago||

I'm largely 'all natural'; what little LLM usage I have is local. On a 128GB Strix system, a not-super-dense Qwen or Gemma variant will get 50-80 tok/s output. I'm not subscribing to Claude/GPT/etc. even in the unlikely event these are the last local models released; it's simply not needed.

nfrankel 4 hours ago||

I tried. It works in theory: https://blog.frankel.ch/tokensparsamkeit-coding-assistants/#...

Results depend on the model, of course, and your computer is the limit. Mine wasn't up to the task, unfortunately.

K0balt 4 hours ago||

Pretty good results with Qwen 3.6 27B dense. I’d say it’s about equal to (Claude) Haiku 4.5, maybe Sonnet, depending on the task.

kadoban 4 hours ago||

What tool do you use to drive things for you, out of curiosity?

kandros 4 hours ago||

I’d rather ask my butcher than Haiku for coding tasks

papichulo4 3 hours ago||

Agreed on this. Anthropic has now changed the verbiage on the definitions of the models under `/model` to say that Opus is for everyday usage, and Sonnet is for routine tasks.

There's apparently a reason Sonnet and Haiku have been left in previous version #s.

Still encouraging, though, that things are catching up. We can't expect $20k local setups to match $20bn compute clusters.

cheekygeeky 3 hours ago||

Our software dev (smartest guy I ever met) is using OpenCode and Tmux with open source models. He says DeepSeek is his model of choice for coding (he calls it "pretty GOOD"). He's running two 3090s on an i9 with 128GB RAM. https://www.msn.com/en-us/news/technology/china-s-open-deeps...

blurbleblurble 4 hours ago|

My experience is that it's not the models themselves that are limiting right now, it's the clunky alternative harnesses with weird missing features making for bad ergonomics around stuff like queue management, interruption, subagents, goals, etc.

coder543 2 hours ago||

I agree completely.

It's also annoying that OpenCode doesn't even try to support local LLMs properly.

Getting OpenCode to work is possible, but extremely manual and clunky to configure. I have written a script to automate converting my llama-server configs into an OpenCode config, and that helps, but it's not ideal.

I have seriously considered writing Yet Another Coding Harness in my free time. I have some ideas for what would make it nice.
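For anyone wondering what such a conversion script might involve, here is a rough sketch, not the script mentioned above. The input format (a list of dicts with `alias`, `host` and `port`) is hypothetical, and the output key names (`provider`, `npm`, `options.baseURL`, `models`) are my understanding of OpenCode's custom OpenAI-compatible provider layout, so treat them as assumptions and check them against the OpenCode docs before relying on this.

```python
import json


def opencode_config(servers):
    """Build an OpenCode-style config dict from llama-server endpoints.

    servers: list of dicts with hypothetical keys 'alias', 'host', 'port',
    one per running llama-server instance. Key names in the output are an
    assumption about OpenCode's config schema.
    """
    providers = {}
    for s in servers:
        provider_id = f"llama-{s['alias']}"
        providers[provider_id] = {
            # Generic OpenAI-compatible adapter; llama-server serves /v1.
            "npm": "@ai-sdk/openai-compatible",
            "name": f"llama-server ({s['alias']})",
            "options": {"baseURL": f"http://{s['host']}:{s['port']}/v1"},
            "models": {s["alias"]: {"name": s["alias"]}},
        }
    return {"$schema": "https://opencode.ai/config.json", "provider": providers}


if __name__ == "__main__":
    example = [{"alias": "qwen-coder", "host": "127.0.0.1", "port": 8080}]
    print(json.dumps(opencode_config(example), indent=2))
```

Each server becomes its own provider entry here purely for simplicity; grouping several models under one provider would work just as well if they share an endpoint.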

horsawlarway 3 hours ago|||

Pi is decent.

I've used the CLI agents for Claude, Cursor, and Pi, plus several custom harnesses I've written myself from time to time as experiments (and I guess technically gastown, if we're calling that a harness).

Pi is... just fine.

It does what I need it to, has a decent selection of tooling out of the box, integrates nicely with other tools, and generally gets out of my way enough that I don't think about it much anymore.

If you can run ~30b models at decent speeds, I think most folks would be pleasantly surprised at how capable they are with pi.

Tack on some of the extensions (e.g. https://pi.dev/packages/pi-mcp-adapter?name=mcp and https://pi.dev/packages/pi-web-access?name=search) and I get web tooling (e.g. Perplexity search) and access to MCPs to do things like drive Chrome (https://browsermcp.io/) or Firefox (https://github.com/mozilla/firefox-devtools-mcp).

It's fine. Is it as good as a subsidized top tier model? Nope. Is it free and still very capable? Yup.

And personally, I've been having a LOT of fun with the pi sdk (https://pi.dev/docs/latest/sdk)

Which is something that all the other providers charge you API access rates for (e.g. thousands a month).

Insanity 4 hours ago||

Heard good things about pi.dev but haven’t tried it. It might take care of some of those missing features you mentioned.

bityard 3 hours ago||

pi.dev is more like an agent developer kit. It's basically a substrate upon which you spend hours/days/weeks building your own agents or coding framework. It's pretty much the neovim to claude's vscode.

horsawlarway 3 hours ago||

I mean - the base experience is just fine, with perfectly reasonable built in tools for file access and editing, plus bash.

But yes - it expands a lot if you're willing to play with it.

I'd actually say the vscode comparison is wrong, because vscode is very much "bring your own extension" in the same way that Pi is, while Claude has much more of a "visual studio" vibe: it's thick, it's opinionated, and it's absolutely not something you can really customize, but it can feel slick for supported workflows.
