Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

Posted by cloudking 20 hours ago

Has anyone here fully swapped Claude/GPT for a local model as their main coding tool, not just for side experiments? If so, please share your setup and performance (e.g. tok/s).

1054 points | 457 comments

hacker_homie 12 hours ago|

I run Qwen3.6 on an AMD AI Max laptop and get about 6-10 tok/s; it's slow enough that I can follow along. It has issues with design and large piles of code. Otherwise it's a good programming buddy.

major505 16 hours ago||

Yes. I use Qwen on my MacBook M1 (16 GB) daily, running inside Ollama. It works well. It is not particularly fast, and I need to create a custom image that sets the model's temperature to zero from the start, so it doesn't get overly creative with its bullshit, but it works reasonably well.
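For anyone wanting to reproduce the temperature-zero setup: in Ollama this is typically done with a Modelfile rather than an image. A minimal sketch, assuming a placeholder base tag `qwen-base` and a made-up custom name `qwen-coder-t0` (the comment names neither); the `FROM` and `PARAMETER temperature` directives and `ollama create -f` are standard Ollama.

```python
def render_modelfile(base_model: str, temperature: float = 0.0) -> str:
    """Return Ollama Modelfile text that pins the sampling temperature."""
    # FROM selects the base model; PARAMETER sets a default sampling option.
    return f"FROM {base_model}\nPARAMETER temperature {temperature:g}\n"


if __name__ == "__main__":
    # "qwen-base" is a placeholder tag, not one taken from the comment above.
    print(render_modelfile("qwen-base"), end="")
    # Save the output as "Modelfile", then build the custom model with:
    #   ollama create qwen-coder-t0 -f Modelfile
```

With the temperature baked into the model definition, every client that talks to the custom model gets the same default without per-request options.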

Der_Einzige 15 hours ago|

Secretly, the problems many people have with agentic coding are related to a poor choice of sampling settings, but the world will wait several more years before this is well understood. top_p and top_k are garbage, but they are intentionally kept because subsequent methods enable coherent high-temperature sampling, which is an absolute no-go for alignment/safety reasons.

The secret to actually good agentic outputs even with small models? llama.cpp supports a little-known sampler called "top-n sigma". You should use that, set it to 1, and set temperature to literally whatever you want (it could be infinity), and your model will just magically work up to your maximum context window. That's because long-context generation is a sampling problem.
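For readers who haven't seen it, here is roughly what a top-n sigma filter does, as a sketch of the general idea and not llama.cpp's actual code. It keeps only tokens whose raw logit is within n standard deviations of the maximum, then samples from what is left. Using the population standard deviation and filtering before temperature scaling are assumptions of this sketch; the function names are made up for illustration.

```python
import math
import random


def top_n_sigma_filter(logits, n=1.0):
    """Return indices of tokens whose logit is within n std devs of the max."""
    mean = sum(logits) / len(logits)
    # Population standard deviation of the raw logits (an assumption here).
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * sigma
    return [i for i, x in enumerate(logits) if x >= threshold]


def sample_top_n_sigma(logits, n=1.0, temperature=1.0, rng=random):
    """Sample one token index after top-n-sigma filtering."""
    kept = top_n_sigma_filter(logits, n)
    # Temperature only reshapes probabilities among the kept tokens.
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Because the cutoff is computed on the raw logits, the set of candidate tokens does not change with temperature. At infinite temperature the sampler simply picks uniformly among the kept tokens, which is the property the comment is pointing at.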

platevoltage 11 hours ago||

I run very small models locally for code completion and writing boilerplate. I still use Claude in a web browser on occasion since it's free, but the second that goes away, I'll be done with it. They get none of my money.

epolanski 12 hours ago||

Not with a local one, but I moved to DeepSeek v4.

That said, I plan to move to local ones once I get my hands on a 256+ GB MacBook.

Local inference is good enough to help me with my daily job, and doesn't turn me into an assistant to the LLM.

jay_kyburz 12 hours ago||

Can anybody let me know how just chatting with Qwen3.6 on a Strix Halo 128GB is?

If I give it a page of context, can it write a linked list or identify a bad line of CSS?

Is there anywhere online I can chat with a model I could be running at home to see how good it is?

thrownaway561 15 hours ago||

I just use DeepSeekV4 Fast... It's cheap as hell. Currently my monthly usage has been

67M output, 51M input

Total: $0.83.

I honestly don't understand why people just don't use DeepSeek.

ThomasGlanzmann 14 hours ago||

I do the same: deepseekv4 fast for 90% of tasks, and if it can't lift it, I use deepseekv4 pro. I use crush as my coding agent but removed the blocked commands because I also do a lot of system administration. Love it. I've spent 8 USD in 7 weeks and use it quite extensively for all sorts of things: programming, system administration, Google search replacement, investments, you name it.

codemk8 12 hours ago||

You mean deepseek-v4-flash, right? Same here. I use it for my Hermes agent. It's so cheap that I sometimes feel "guilty". I even put in more money than I needed just to make sure they do not go out of business.

ThomasGlanzmann 6 hours ago||

Yes, I do mean deepseek-v4-flash.

jeffrallen 16 hours ago||

I use Qwen 3.6 on a remote GPU that my work offers. Works fine. Slow and steady, works hard, gets the job done. Probably better at diagnosing than making new code, but whatever.

gigatexal 17 hours ago||

I tried to. I just couldn't get over how loud it made my otherwise whisper-quiet M3 Max MacBook Pro 14, given the performance. The sweet spot has been adapting Claude Code to use the Chinese models. DeepSeek V4 Pro is very, very good. But I am such a casual local user of AI that my $20/month Claude subscription is enough, and I find myself using that more and more.

dude250711 18 hours ago||

Yes, running a local model on a natural wetware substrate here.

Recommended setup: plenty of nutrients, some caffeine and a quiet environment.

Performance - not currently measured in tokens: roughly average.

jasongill 18 hours ago||

I have been running this stack since well before Claude Code became popular. It works OK but I've found it to be very slow; and despite having a big context window, it seems to lose track of what it's working on and goes down a rabbit hole (or just wastes tokens trying to use the web browser) for hours and is hard to get back on track. I even tried spinning up two sub-agents but even after years of trying to prompt them, they are almost useless in terms of coding ability, so that is looking to be a waste of spending at least so far but maybe the model will improve as time goes on.

bananadonkey 14 hours ago||

My sub agent has been looping for almost 10 years at this point and has so far written 0 lines of code. Definitely won't be investing in another...

HPsquared 18 hours ago||

I personally get about 50 tokens per hour.

syngrog66 11 hours ago|

pre-replaced it with a combo of my brain, vim, an assortment of other CLI/TUI tools, etc.
