Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

Posted by cloudking 15 hours ago

Has anyone here fully swapped Claude/GPT for a local model as their main coding tool, not just for side experiments? If so, please share your setup and performance (e.g tok/s)

774 points | 370 commentspage 8

shironnnn_ 10 hours ago|

I use SpecKit to create a very detailed plan with a high amount of specificity using paid Claude plan.

Then I give it to local LLM (eg: Qwen / Gemma 4) via CLI. This is possible through usage of llm-mlx on Mac (or ollama on any machine given sufficient on hardware) which serve OpenAPI endpoints compatible for Aider (CLI) or Visual Studio Code to vibe along with the agentic coding assistant.

The paid products have an advantage but are not necessary if you don't mind to be more-involved with the process and have low expectations.

mark_l_watson 10 hours ago||

I would like to say I run 100% local, but I use Opus + Gemini Pro cumulatively for 3 or 4 hours a week. I also like to use DeepSeek v4 flash with OpenCode for small quick tasks.

I did just publish a free to read online book "The Rise of Local Coding Agents" [1] where I document my setup that I enjoy using. I use little-coder (built on pi) and have good results for small Python and TypeScript applications. I struggle getting good results with Common Lisp and Clojure.

For me, the problem with all local LLM-basic coding agents is slow runtime.

[1] https://leanpub.com/read/local-coding-agents

ecshafer 12 hours ago||

I work with a few models on servers, so not local, but self hosted with ollama. gemma-4, glm 4.7 flash, and qwen 3.6. glm is the best at coding agentically. But I still don't think any of them reach the levels of gpt 5.5 or opus 4.8.

wuschel 12 hours ago||

I would like to know whether someone was able to use lower tier models for activities other than coding e.g. a limited version of a personal note manager - and what the hardware requirements in RAM for these models were.

anuramat 9 hours ago||

I wonder what languages people are using; I imagine smaller models would be decent at bash/python but significantly worse at something like rust

agentbc9000 7 hours ago||

Kimi K2.7 is very good - i have been testing it and its very very good, Fable 5 level of goodness.

bentt 7 hours ago|

Say more!

fortyseven 12 hours ago||

I use Pi and Qwen 3.6 27b locally on a 4090 for all my personal projects. I still use Claude for day job work since they pay for it, and my employer expects me to use it. I rarely touch it otherwise.

redox99 11 hours ago||

Models that you can run at home (Like Qwen 35B) aren't remotely close to Opus or GPT 5.5. Not even close. The only open models that are in that neighbor are around 1T params, so forget about running at home.

It's kind of like driving a shitbox. It can often drive you from A to B, and some people will try to convince you it's fine. It's not.

There's no logical reason other than absolutely requiring the privacy, doing it for fun, or niche use cases like airplanes and so on. If you can't spend the insanely subsidized $20 for codex, you can use an API for chinese models which will run circles around these tiny models.

catapart 8 hours ago||

tough ask, but since we're here: has anyone done this with 16GB of VRAM? I've been getting projects finished with LM Studio, but it definitely could stand to be more efficient. lots of time wasted with trying to get models to understand a problem with so few tokens.

Rzor 2 hours ago|

RX 9060 XT 16GB here on google/gemma-4-26b-a4b-qat using LM Studio. Context 65k, 23 layers on the GPU, 7 on the CPU, model in memory, mmapped. I'm getting 23-33 tks. Started experimenting 3 days ago (with gemma-4-e4b), don't know what half those settings mean, but 26B, even quantified, feels significantly better at a few small projects I asked it to create ("create a image converter using ffmpeg in bash", "create a canvas animation with real physics, no libraries"[1]).

It's faster than I can read, but it feels slow as hell. I think 40-50 tks is probably much more comfortable and I hope I can reach that when trying this on llamacpp soon enough.

[0] - https://pastes.io/9gaARxE8

[1] - https://jsfiddle.net/pou4nbh9/1/

Model: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-gg...

hegdeezy 11 hours ago|

I have tried locally but I find that the implicit breakeven is somewhere around 1 year of use given the high power costs where I live. Not really worth it but maybe if I move some day!

More comments...