Posted by cloudking 15 hours ago
Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?
Then I give it to local LLM (eg: Qwen / Gemma 4) via CLI. This is possible through usage of llm-mlx on Mac (or ollama on any machine given sufficient on hardware) which serve OpenAPI endpoints compatible for Aider (CLI) or Visual Studio Code to vibe along with the agentic coding assistant.
The paid products have an advantage but are not necessary if you don't mind to be more-involved with the process and have low expectations.
I did just publish a free to read online book "The Rise of Local Coding Agents" [1] where I document my setup that I enjoy using. I use little-coder (built on pi) and have good results for small Python and TypeScript applications. I struggle getting good results with Common Lisp and Clojure.
For me, the problem with all local LLM-basic coding agents is slow runtime.
It's kind of like driving a shitbox. It can often drive you from A to B, and some people will try to convince you it's fine. It's not.
There's no logical reason other than absolutely requiring the privacy, doing it for fun, or niche use cases like airplanes and so on. If you can't spend the insanely subsidized $20 for codex, you can use an API for chinese models which will run circles around these tiny models.
It's faster than I can read, but it feels slow as hell. I think 40-50 tks is probably much more comfortable and I hope I can reach that when trying this on llamacpp soon enough.
[0] - https://pastes.io/9gaARxE8
[1] - https://jsfiddle.net/pou4nbh9/1/
Model: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-gg...