
Posted by mpweiher 1 day ago

A guide to local coding models (www.aiforswes.com)
583 points | 342 comments
jszymborski 1 day ago|
I just got an RTX 5090, so I thought I'd see what all the fuss was about with these AI coding tools. I've previously copy-pasted back and forth from Claude, but never used the instruct models.

So I fired up Cline with gpt-oss-120b, asked it to tell me what a specific function does, and proceeded to watch it run `cat README.md` over and over again.

I'm sure it's better with the Qwen Coder models, but it was a pretty funny first look.

kelvie 1 day ago|
gpt-oss-120b doesn't fit on a 5090 without offloading or crazy quants -- or did you mean you ran it via openrouter or something?
jszymborski 23 hours ago|||
I'm running the MXFP4 [0] quants at around 10-13 tok/s. It is actually really good. I'm starting to think it's a problem with Cline, since I just tried it with Qwen3 and the same thing happened. It turns out Cline _hates_ empty files in my projects, although they aren't required for this to happen.

[0] https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-...

kube-system 1 day ago|||
Sounds like a crazy quant. IME 2-bit quants are pretty dumb.
fny 1 day ago||
My takeaway is that the clock is ticking on the AI monopoly of Claude, Codex, et al. If a local setup can do 90% of what Claude can do today, what do things look like in 5 years?
maranas 1 day ago||
I think they have already realized this, which is why they are moving toward tool use instead of text generation. It also explains why there are no more free APIs nowadays (even for search).
ukuina 1 day ago||
Exactly, imagine what Claude can do in five years!
rester324 1 day ago||
Another 10% on top of what we have now, and the same things that the local models of that time will be able to do?
brainless 1 day ago||
I do not spend $100/month. I pay for one Claude Pro subscription and then a (much cheaper) z.ai Coding Plan, which is like one fifth the cost.

I use Claude for all my planning and to create task documents, then hand them over to GLM 4.6. It has been my workhorse as a bootstrapped founder (building nocodo; think Lovable for AI agents).

alok-g 1 day ago|
I have heard about this approach elsewhere too. Could you please provide some more details on the setup steps and usage approach? I would like to replicate it. Thanks.
brainless 1 day ago|||
I simply ask Claude Sonnet, via Claude Code, to use opencode. That's it! Example:

  We need to clean up code lint and format errors across multiple files. Check which files are affected using cargo commands. Please use opencode, a coding agent that is installed. Use `opencode run <prompt>` to pass in a per-file prompt to opencode, wait for it to finish, check and ask again if needed, then move to next file. Do not work on files yourself.
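The loop that prompt describes could be sketched as follows. This is a minimal sketch: `opencode run <prompt>` is real per the comment above, but the prompt wording, function names, and wiring here are my own assumptions.

```python
import subprocess


def lint_prompt(path: str) -> str:
    """Build the per-file prompt the orchestrating agent hands to opencode."""
    return (
        f"Fix all lint and format errors in {path} only. "
        "Run cargo commands to verify before finishing, then stop."
    )


def delegate(files):
    # Hand each scoped prompt to the opencode CLI and wait for it to
    # finish before moving to the next file (hypothetical wiring).
    for path in files:
        subprocess.run(["opencode", "run", lint_prompt(path)], check=True)
```

The point of the per-file scoping is that each sub-agent call starts with a small, focused prompt rather than the whole repo's error output.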
baconner 1 day ago|||
There are a few decent approaches I've tried for pairing a planning/reviewer model (e.g. Claude, Codex, Gemini) with an execution model (e.g. GLM 4.6, Flash models, etc.). All three of these let you live in a single coding CLI but easily swap in different models for different tasks.

- claude code router - basically lets you swap other models into the real Claude Code CLI, with triggers for when to use which one (e.g. plan mode uses real Claude; non-plan mode, or certain keywords, use GLM)

- opencode - this is what I'm mostly using now. Similar to CCR, but I find it a lot more reliable with alt models. Thinking tasks go to Claude, Gemini, or Codex, and lesser execution tasks go to GLM 4.6 (on Cerebras).

- sub-agent MCP - another cool way is to use an MCP (or a skill, or a custom /command) that runs another agent CLI for certain tasks. The MCP approach is neat because your thinker agent (say, Claude) can decide when to call the execution agents, or when to call in another smart model for a review of its own thinking, instead of it being an explicit choice from you. So you end up with the MCP plus an AGENTS.md that instructs it to aggressively use the sub-agent MCP for basic execution tasks, reviews, and so on.

I also find that with this setup, just being able to tap an alt model when one is stuck, or to get a review from an alt model, helps keep things unstuck and moving.
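The routing idea behind all three options can be sketched in a few lines. The trigger words and model names below are illustrative (borrowed from the comment), not any tool's real configuration format:

```python
def pick_model(task: str, plan_mode: bool = False) -> str:
    """Toy router in the spirit of claude code router: thinking/review
    work goes to a frontier model, routine execution to a cheaper one."""
    heavy = ("plan", "review", "architecture", "debug")
    if plan_mode or any(word in task.lower() for word in heavy):
        return "claude-sonnet"  # thinking/review tier
    return "glm-4.6"            # execution tier
```

So `pick_model("Plan the refactor")` routes to the expensive model, while `pick_model("rename a variable")` stays on the cheap one.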

KronisLV 23 hours ago||
RooCode and KiloCode also have an Orchestrator mode that can create sub-tasks, and you can specify which model to use for what. Since sub-tasks report their results back after finishing (implement X, fix Y), the context of the more expensive model doesn't get as polluted. Probably one of the most user-friendly ways to do this.

A simpler approach, without subtasks, would be to use the smart model for Ask/Plan mode and the dumb but cheap one for Code mode, so the smart model can review the results as well and suggest improvements or fixes.
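The context-hygiene point can be sketched like this. Everything here is hypothetical stand-in code, not RooCode/KiloCode's actual API; it just shows why the orchestrator's context stays small:

```python
def run_subtask(description: str) -> str:
    # Stand-in for a cheap worker model doing the task; it returns a
    # short result summary rather than its full transcript.
    return f"done: {description}"


def orchestrate(subtasks):
    """Orchestrator pattern: the expensive model only ever sees one-line
    result summaries, so its context doesn't fill with worker output."""
    parent_context = []
    for task in subtasks:
        parent_context.append(run_subtask(task))  # summary only
    return parent_context
```

The parent pays expensive-model prices only on the short summaries, not on every file the workers read and wrote.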

bilater 23 hours ago||
If you are using local models for coding, you are midwitting this. Your code should be worth more than a subscription.

The only legit use case for local models is privacy.

I don't know why anyone would want to code with an intern level model when they can get a senior engineer level model for a couple of bucks more.

It DOESN'T MATTER if you're writing a simple hello world function or building out a complex feature. Just use the f*ing best model.

rester324 3 hours ago||
Is it some kind of mental problem that makes you want to tell people what to do and how to spend their money? Pretty jerky attitude, IMO.
pcl 22 hours ago|||
Or if you want to do development work while offline.
bilater 22 hours ago||
Good to have fallbacks, but in reality most people (at least in the West) will have internet 99% of the time.
pcl 20 hours ago||
Sure, but I am not one of them. I find myself wanting to code on trains and planes pretty often, and so local toolchains are always attractive for me.
groguzt 22 hours ago|||
"senior engineer level model" is the biggest cope I've ever seen
beeboop0 20 hours ago|||
[dead]
jgalt212 23 hours ago||
I will use a local coding model for our proprietary / trade secrets internal code when Google uses Claude for its internal code and Microsoft starts using Gemini for internal code.

The flip side of this coin is I'd be very excited if Jane Street or DE Shaw were running their trading models through Claude. Then I'd have access to billions of dollars of secrets.

Aurornis 22 hours ago||
> I'd be very excited if Jane Street or DE Shaw were running their trading models through Claude. Then I'd have access to billions of dollars of secrets.

Using Claude for inference does not mean the codebase gets pulled into their training set.

This is a tired myth that muddies up every conversation about LLMs.

jgalt212 22 hours ago|||
> This is a tired myth that muddies up every conversation about LLMs

Many copyright holders, and the courts, would beg to differ.

bilater 22 hours ago|||
lol yeah, it's weird to me that even people on HN can't wrap their heads around stateless calls.
jgalt212 21 hours ago||
Unless you control both the client and the server, you cannot prove a call is stateless.
NumberCruncher 1 day ago||
I am freelancing on the side and charge 100€ per hour. Spending roughly 100€ per month on AI subscriptions has a higher ROI for me personally than spending time reading this article and this thread. Sometimes we forget that time is money...
mungoman2 1 day ago||
The money argument is IMHO not super strong here, as that Mac depreciates more per month than the subscription they want to avoid.

There may be other reasons to go local, but I would say that the proposed way is not cost effective.

There's also a fairly large risk that this hardware may be sufficient now but will be too small before long. So there is a large financial risk built into this approach.

The article proposes using smaller/less capable models locally. But this argument also applies to online tools: if we use less capable models, even the $20/mo subscriptions won't hit their limits.
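The depreciation arithmetic is worth spelling out. The numbers below are purely illustrative assumptions (a $4,000 machine written off over 3 years), not figures from the article:

```python
def monthly_cost(hardware_price: float, lifetime_months: int) -> float:
    """Straight-line depreciation of a local rig, per month."""
    return hardware_price / lifetime_months


# A $4,000 machine over 36 months is about $111/month -- several
# $20/month subscriptions' worth, before electricity.
print(monthly_cost(4000, 36))
```

Even halving the hardware price or doubling its useful life still leaves the local rig costing more per month than one entry-level subscription.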

ardme 1 day ago||
Isn't the math better if you buy Nvidia stock with what you'd pay for all the hardware and then just pay $20 a month for Codex, given the annual returns?
phainopepla2 1 day ago|
If you can see into the future and know the stock price, then sure.
Muromec 1 day ago||
The line only ever goes up, until we all cry and find a new false messiah. Or die
Roark66 1 day ago||
I found the winning combination is to use all of them in this way:

- First, you need a vendor-agnostic tool like opencode (I had to add my own vendors, as it didn't support them properly out of the box).

- Second, you set up agents with different models. I use:

- Architecture and planning: Opus, Sonnet, GPT 5.2, Gemini 3, depending on specifics (I found GPT better at troubleshooting, Sonnet better at pure code planning, Opus better at DevOps, and Gemini the best for single-shot stuff).

- Execution of said plans: Qwen 2.5 Coder 30B (yes, it's even better in my use cases than Qwen3, despite benchmarks), Sonnet (only when absolutely necessary), and Qwen3-235B (between Qwen 2.5 and Sonnet).

- Verification: Gemini 3 Flash, Qwen3-480B, etc.

The biggest saving comes from keeping the context small and, where many turns are required, going for smaller models. For example, a single 30-minute troubleshooting session with Gemini 3 can cost $15 if you run it "normally", or $2 if you use the agents and wipe context after most turns (possible thanks to tracking progress in a plan file).
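The context-wiping trick can be sketched as below. The model call is a hypothetical stand-in; the point is the shape of the loop, where the plan file, not the conversation history, carries state between turns:

```python
def run_step(step: str) -> str:
    # Stand-in for one short model call; only the current step is in
    # context, so each call starts fresh and stays cheap.
    return f"[done] {step}"


def work_through_plan(plan_text: str) -> list:
    """Context-wiping pattern: progress lives in a plan file, so every
    turn begins with an empty context instead of a long transcript."""
    results = []
    for line in plan_text.splitlines():
        step = line.strip("- ").strip()
        if step:
            results.append(run_step(step))
    return results
```

Because cost scales with tokens processed per turn, N short fresh-context calls can come out far cheaper than one call dragging the full session history.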

altx 1 day ago||
It's interesting to notice that here https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com... we default to measuring LLM coding performance by how long a human task (~5h) a model can complete at a 50% success rate (falling back to 80% for the second chart, ~0.5h), while here it seems that for actual coding we really care about the last 90-100% of the costly model's performance.
threethirtytwo 1 day ago|
I hope hardware becomes so cheap local models become the standard.
layer8 21 hours ago||
I hope that as well, but if cloud AI keeps buying up most of the world’s GPU and RAM production, it might not come to that.
rynn 1 day ago||
It will be like the rest of computing, some things will move to the edge and others stay on the cloud.

Best choice will depend on use cases.

lelanthran 1 day ago|||
> It will be like the rest of computing, some things will move to the edge and others stay on the cloud.

It will become like cloud computing - some people will have a cloud bill of $10k/month to host their apps, while others will run their app on a $15/month VPS.

Yes, the cost discrepancy will be as big as the current one we see in cloud services.

Terr_ 1 day ago|||
I think the long term will depend on the legal/rent-seeking side.

Imagine having the hardware capacity to run things locally, but not the necessary compliance infrastructure to ensure that you aren't committing a felony under the Copyright Technofeudalism Act of 2030.
