Top
Best
New

Posted by adchurch 1 day ago

Show HN: Smart model routing directly in Claude, Codex and Cursor(github.com)
We built a model router that plugs into coding agents (e.g. Claude Code, Codex, Cursor, etc.) and intelligently sends requests to the best model to serve them. Here's a quick demo of running it locally: https://www.youtube.com/watch?v=isKhAyivtfM.

At Weave, we write most of our code with AI, and it's been getting more expensive. This came to a head when Opus 4.7 was released and, thanks to its tokenizer changes, our costs shot up. We knew we didn't need Opus for everything but we didn't want to lose out on the intelligence for the cases where you really need it. So we decided to build a model router to handle this for us.

The Weave Router acts as an Anthropic/OpenAI endpoint specifically for coding agents. It looks at every inference request and intelligently (more on that in a sec) decides what model to send it to, handling all the translations required along the way. So it can use faster/cheaper models (e.g. DeepSeek v4, GLM 5.2, Kimi K2.6) when possible, and frontier models (Opus 4.8 & GPT 5.5 (& Fable whenever it's back)) when necessary.

How do we know what model to route to? We trained an RL model on tens of thousands (so far!) of agent traces. We reward the routing model when it selects an LLM that successfully completes the given task.

Here's an example: if you ask the router to plan a complex change, it will (probably) route that request to Opus 4.8. Subagents exploring the codebase to gather context will be routed to more suitable models (e.g. DeepSeek V4 Flash). Then when you have the plan ready to implement, it will be (most likely) be handed to a quicker model (e.g. GLM 5.2) to carry it out.

We've been using this internally for the last month or so. We've saved 40% on tokens vs. what we otherwise would have paid, with no noticeable differences in quality or velocity.

The router is source-available under Elastic License 2.0, so you can self-host it. Or if you prefer, you can also use our hosted version: weaverouter.com.

I'll be here to answer any questions you may have!

198 points | 108 commentspage 3
asdev 1 day ago|
Large model companies will likely build this and make it better. It'll also be cheaper overall since they'll be subsidizing token cost if you use them directly vs third party router paying API costs
adchurch 1 day ago|
I would argue they do not have a good incentive to build this and make it better. Why would Anthropic route Claude Code traffic to DeepSeek (at 20% of the cost)?
asdev 1 day ago||
They'll route traffic to Haiku or one of their cheaper models, not third parties. Overall cost will end up being cheaper than whatever you are doing
adchurch 6 hours ago||
We welcome the competition :)
pradeep1177 1 day ago||
So, how are you handling read/write caching? I mean, if I keep routing the next prompt based on the task weights? How about if I'm sending every 5th query to opus, which do expensive write cache?
adchurch 1 day ago|
We consider the cost of missing the cache when making each routing decision after the initial one. Discussed in a bit more depth here: https://news.ycombinator.com/item?id=48689448
k9294 1 day ago||
What about request caching? If you swap to a cheaper model mid execution it might cost more that to make multiple requests to the already cached provider?
adchurch 1 day ago|
Yep 100%, mentioned this in another thread (https://news.ycombinator.com/item?id=48689448) but tl;dr we build the router to be cache aware
alansaber 1 day ago||
"We reward the routing model when it selects an LLM that achieves the task successfully" sounds pretty oversimplified
adchurch 1 day ago|
Indeed it is :) I skipped over talking about all the RL machinery, network design, reward function design, state representations, etc. because really the intuition is that we tell the model when it accomplishes its goal, and then it learns over time how to get better at making the right decisions in order to accomplish its goal.

Happy to talk about this in some more depth if there's anything specific you're curious about!

zcw100 1 day ago||
Can't really win can ya? Scarce? They're driving up prices! Plentiful? It's all a big bubble!
thandv 1 day ago||
This might be a stupid question, but can a extra added local llm help with the caching problem?
adchurch 1 day ago|
We haven't experimented with routing to local LLMs much. Technically they benefit from the cache too although it's more a question of latency than cost. But tbh I haven't seen great results in the wild from working with local LLMs for coding - curious if you've had any success with them?
thandv 5 hours ago||
I generally used them for token saving purposes, just using them for repetitive tasks, gated and supervised by claude. So its planned and verified by better models, but implementation falls on local ones. It has been pretty effective for me, as long as I spend a bit more initially on splitting complex tasks further down
gautam_io 1 day ago||
This is cool!

Will this use my Claude Pro/Max subscription? Or will it always use the API billing "pay as you go"?

adchurch 1 day ago|
Yep it uses the Claude sub if possible and falls back to API billing only if you don't have a Claude sub or it's out of usage! Same deal for Codex
reliablereason 1 day ago||
Wont this kill the kv cache?

Also i am pretty sure neither open ai or anthropic leets you seed the agents own tokens.

adchurch 1 day ago|
Very important consideration, addressed it in another thread (https://news.ycombinator.com/item?id=48689448). tl;dr we built this to be cache aware for exactly this reason
suyash 1 day ago||
I would rather just use OpenCode - leverage AI models, even can host locally or paid ones with ease.
adchurch 1 day ago|
We integrate with OpenCode too! OpenCode provides the harness, then the router selects the right model for the task.

We haven't yet set up local model routing though, that's really interesting - have you had any success using local models for coding tasks? Tbh I haven't heard many success stories from using local models yet

treexs 1 day ago|
Ahh been working on the same thing for a while now but haven't launched yet
gopher_space 1 day ago||
A lot of people are working on the same thing because nobody's come up with a definition of "thing" that people agree on yet. Your project would be valuable just for adding another point of view to the conversaion.
adchurch 1 day ago||
Cool, interested to see your approach when you do launch! I think it's a really interesting problem
More comments...