Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

Posted by cloudking 11 hours ago

Has anyone here fully swapped Claude/GPT for a local model as their main coding tool, not just for side experiments? If so, please share your setup and performance (e.g. tok/s).

568 points | 288 comments | page 5

qu0b 4 hours ago

I'm using DeepSeek V4 on two RTX 6000 Pros and it's working great. Opus is so slow that I get DeepSeek to do most of the work, and Opus is only used to validate and help plan.

ndom91 6 hours ago

Not 100%; I still fall back to Claude for most day-job stuff. But I've been trying to use Qwen 3.6 and Gemma 4 on my Framework Desktop mainboard (Strix Halo) as much as possible.

I've been working on an ops-style tool for local LLM inference: proxying, API keys, request logging, model rewriting, and much, much more.

https://github.com/ndom91/llama-dash
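For readers wondering what "model rewriting" in a proxy might look like, here is a minimal sketch. It is not taken from llama-dash; the alias table, the model names in it, and the `rewrite_model` helper are all hypothetical, and it assumes an OpenAI-style JSON request body with a top-level "model" field.

```python
import json

# Hypothetical alias table (an assumption for illustration):
# client-facing model name -> name of the model actually served locally.
MODEL_ALIASES = {
    "gpt-4o": "qwen-local",
    "claude-sonnet": "gemma-local",
}


def rewrite_model(body: bytes, aliases: dict[str, str] = MODEL_ALIASES) -> bytes:
    """Return the JSON request body with its "model" field mapped through aliases.

    Bodies whose model is missing or not in the alias table pass through
    with the same content (re-serialized).
    """
    payload = json.loads(body)
    requested = payload.get("model")
    if isinstance(requested, str) and requested in aliases:
        payload["model"] = aliases[requested]
    return json.dumps(payload).encode("utf-8")
```

A proxy would presumably call something like this on each incoming request before forwarding it to the local inference server, so that clients hard-coded to a hosted model name keep working unchanged.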

anubhav200 8 hours ago

Yes: llama.cpp, Qwen 27B and 35B, and Claude Code. Llama-cpp-manager for managing llama.cpp configs (https://github.com/anubhavgupta/llama-cpp-manager).

agentbc9000 4 hours ago

Kimi K2.7 is very good. I have been testing it and it's very, very good: Fable 5 level of goodness.

bentt 3 hours ago

Say more!

salutonmundo 3 hours ago

it's called your damn brain.

zaptheimpaler 7 hours ago

I tried gemma-4-26B-A4B just to see if it could help me read/sort my emails on a relatively underpowered setup (16GB VRAM + 32GB RAM), and it's not going well. The model burns 24K tokens just searching for the right tool and then dumps the email contents into context. I tried to get it to use code-mode to save context, but the code-mode implementation can't save files, so it was useless. I'm going to try switching to "ssh-mode" into my devbox container. Still relatively new to this, so I'm probably doing something wrong.

anana_ 7 hours ago

Perhaps try a different model? Just from anecdotal experience, I find that the Gemma models smaller than 31B do not tool call as often as they should.

Some of the benchmarks appear to back this up [0].

Of course, a lot depends on how you are using it (inference parameters, harness, prompting, etc.), but the model is quite important too.

[0]: https://artificialanalysis.ai/models/open-source/small?model...

BiraIgnacio 8 hours ago

I tried for a bit with llama.cpp + Qwen + Mac Pro, but the results were very poor (both quality and speed).

I considered investing in better hardware, but after doing the math, it is cheaper for me to pay for DeepSeek (yeah, I know not everyone can do that).
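For anyone wanting to do the same math, a rough break-even sketch might look like the following. The `breakeven_months` helper and every number in the example are made-up placeholders, not figures from this thread; it also ignores resale value, hardware depreciation, and any quality difference between models.

```python
def breakeven_months(
    hardware_cost: float,
    monthly_api_cost: float,
    monthly_power_cost: float = 0.0,
) -> float:
    """Months until buying hardware beats paying for a hosted API.

    Returns infinity when running locally saves nothing per month.
    """
    monthly_saving = monthly_api_cost - monthly_power_cost
    if monthly_saving <= 0:
        return float("inf")
    return hardware_cost / monthly_saving


# Placeholder inputs (assumptions): 4000 for hardware, 50/month on API
# usage, 10/month on electricity.
print(breakeven_months(4000, 50, 10))  # 100.0
```

With inputs like these the hardware takes years to pay for itself, which is the kind of result that makes a cheap hosted API the easier choice.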

NetOpWibby 8 hours ago

I'm looking forward to having Claude Fable at home. THAT is when I'll THINK about replacing Claude (who knows what their next models will be capable of; Fable was damn good for the three days I had it).

trueno 7 hours ago

We keep moving the goalposts on when we're gonna be happy with local. First it was Sonnet at home as the "good enough" bar, then Opus, now it's the mysterious leading model that runs on infrastructure we can't feasibly have at home.

627467 5 hours ago

So, everyone has a different context, but how free is "free" when running these local models? Like, what about having a power-hungry machine always on in the cupboard?
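To put a rough number on the always-on question, one could estimate the electricity cost like this. The `monthly_power_cost` helper, the 100 W average draw, and the 0.30 per kWh price are illustrative assumptions only; real draw varies a lot between idle and inference load.

```python
def monthly_power_cost(
    avg_watts: float,
    price_per_kwh: float,
    hours_per_day: float = 24.0,
    days: float = 30.0,
) -> float:
    """Electricity cost of a machine drawing avg_watts on average."""
    kwh_used = avg_watts / 1000.0 * hours_per_day * days
    return kwh_used * price_per_kwh


# Assumed inputs: 100 W average draw, 0.30 per kWh, always on.
# That is 72 kWh over 30 days, so roughly 21.6 per month.
cost = monthly_power_cost(100, 0.30)
print(f"{cost:.2f}")
```

So "free" here still means a recurring bill; how big depends mostly on the average draw and the local electricity price.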

How much does this wear out the hardware?

Also, if privacy is the main reason for running local models, why not use venice.ai or an equivalent?

boringg 8 hours ago

Will the AI labs always make sure there is at least a year's worth of differential? I guess the underlying business premise is that each new release has a step function change that prevents this kind of behaviour.

snoman 4 hours ago

If the government is going to gate access to frontier models from here on out, even if new releases are a step function change… which they’re not… then it may be even more comparable to what’s available with a subscription.
