Posted by mpweiher 1 day ago
So I fired up Cline with gpt-oss-120b, asked it to tell me what a specific function does, and proceeded to watch it run `cat README.md` over and over again.
I'm sure it's better with other models, like the Qwen Coder ones, but it was a pretty funny first look.
[0] https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-...
I use Claude for all my planning, create task documents and hand over to GLM 4.6. It has been my workhorse as a bootstrapped founder (building nocodo, think Lovable for AI agents).
We need to clean up code lint and format errors across multiple files. Check which files are affected using cargo commands. Please use opencode, a coding agent that is installed. Use `opencode run <prompt>` to pass in a per-file prompt to opencode, wait for it to finish, check and ask again if needed, then move to next file. Do not work on files yourself.
- claude code router - basically allows you to swap in other models using the real claude code cli and set up some triggers for when to use which one (eg. plan mode use real claude, non plan or with keywords use glm)
- opencode - this is what I'm mostly using now. similar to ccr but i find it a lot more reliable against alt models. thinking tasks go to claude, gemini, codex and lesser execution tasks go to glm 4.6 (on Cerebras).
- sub-agent mcp - Another cool way is to use an mcp (or a skill or custom /command) that runs another agent cli for certain tasks. The mcp approach is neat because then your thinker agent like claude can decide when to call the execution agents, when to call in another smart model for a review of its own thinking, etc instead of it being an explicit choice from you. So you end up with the mcp + an AGENTS.md that instructs it to aggressively use the sub-agent mcp when it's a basic execution task, review, ...
I also find that with this setup just being able to tap in an alt model when one is stuck, or get review from an alt model can help keep things unstuck and moving.
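For anyone who wants to script the per-file hand-off rather than leave it to the thinker agent, here is a rough Python sketch. The only thing taken from the thread is the `opencode run <prompt>` form; the prompt wording, the `delegate` and `build_command` names, and the idea that a non-zero exit code means "ask again" are my assumptions, not documented opencode behavior.

```python
import subprocess


def build_command(path: str) -> list[str]:
    """Build one `opencode run <prompt>` invocation for a single file.

    The prompt text is a hypothetical example, not a recommended prompt.
    """
    prompt = f"Fix lint and format errors in {path}. Do not touch other files."
    return ["opencode", "run", prompt]


def delegate(paths, runner=subprocess.run):
    """Run the execution agent once per file, waiting for each to finish.

    Returns a dict mapping each path to the process exit code. `runner` is
    injectable so the loop can be exercised without opencode installed.
    """
    results = {}
    for path in paths:
        completed = runner(build_command(path), capture_output=True, text=True)
        results[path] = completed.returncode
    return results
```

The caller (you, or the thinker model) would then inspect the results and re-run the files that still fail whatever check you use, e.g. the cargo commands mentioned in the prompt above.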
A simpler approach without subtasks would be to just use the smart model for Ask/Plan/whatever mode and the dumb but cheap one for the Code one, so the smart model can review the results as well and suggest improvements or fixes.
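A minimal sketch of that mode-based split, in Python. The model labels and the mode names are placeholders I picked for illustration; real tools name their modes and model identifiers differently.

```python
# Placeholder labels, not real model identifiers for any particular tool.
SMART_MODEL = "claude"
CHEAP_MODEL = "glm-4.6"

# Assumption: thinking-style modes go to the smart model, everything else
# (the actual code-writing mode) goes to the cheap one.
SMART_MODES = {"ask", "plan", "review"}


def pick_model(mode: str) -> str:
    """Return the model label to use for a given editor/agent mode."""
    return SMART_MODEL if mode.strip().lower() in SMART_MODES else CHEAP_MODEL
```

With this, a plan, code, review cycle would hit the smart model twice and the cheap model once, which is the point: the expensive model only reads and judges, the cheap one does the typing.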
The only legit use case for local models is privacy.
I don't know why anyone would want to code with an intern level model when they can get a senior engineer level model for a couple of bucks more.
It DOESN'T MATTER if you're writing a simple hello world function or building out a complex feature. Just use the f*ing best model.
The flip side of this coin is I'd be very excited if Jane Street or DE Shaw were running their trading models through Claude. Then I'd have access to billions of dollars of secrets.
Using Claude for inference does not mean the codebase gets pulled into their training set.
This is a tired myth that muddies up every conversation about LLMs.
Many copyright holders, and the courts, would beg to differ.
There may be other reasons to go local, but I would say that the proposed way is not cost effective.
There's also a fairly large risk that this HW may be sufficient now, but will be too small in not too long. So there is a large financial risk built into this approach.
The article proposes using smaller/less capable models locally. But this argument also applies to online tools! If we use less capable tools even the $20/mo subscriptions won't hit their limit.
The biggest saving comes from keeping the context small and, where many turns are required, going for smaller models. For example, a single 30min troubleshooting session with Gemini 3 can cost $15 if you run it "normally", or $2 if you use the agents and wipe context after most turns (which is possible if you track progress in a plan file).
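To see why wiping context matters so much, here is a back-of-the-envelope sketch in Python. It assumes every turn resends the whole conversation so far as input, ignores output tokens and any prompt-caching discount, and uses made-up token counts; it is not a reconstruction of the $15 vs $2 figures above.

```python
def tokens_with_growing_context(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when turn k resends everything from turns 1..k."""
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


def tokens_with_wiped_context(turns: int, tokens_per_turn: int, plan_tokens: int) -> int:
    """Total input tokens when each turn starts fresh from a plan file."""
    return turns * (plan_tokens + tokens_per_turn)


# Illustrative numbers only: 40 turns, 5,000 new tokens per turn,
# and a 2,000-token plan file reloaded at the start of every turn.
growing = tokens_with_growing_context(40, 5_000)
wiped = tokens_with_wiped_context(40, 5_000, 2_000)
print(growing, wiped)  # 4100000 280000
```

The growing-context total scales with the square of the number of turns, while the wiped-context total scales linearly, so the gap widens the longer the session runs.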
Best choice will depend on use cases.
It will become like cloud computing - some people will have a cloud bill of $10k/m to host their apps, other people would run their app on a $15/m VPS.
Yes, the cost discrepancy will be as big as the current one we see in cloud services.
Imagine having the hardware capacity to run things locally, but not the necessary compliance infrastructure to ensure that you aren't committing a felony under the Copyright Technofeudalism Act of 2030.