Show HN: Smart model routing directly in Claude, Codex and Cursor

Posted by adchurch 1 day ago

Show HN: Smart model routing directly in Claude, Codex and Cursor(github.com)

We built a model router that plugs into coding agents (e.g. Claude Code, Codex, Cursor, etc.) and intelligently sends requests to the best model to serve them. Here's a quick demo of running it locally: https://www.youtube.com/watch?v=isKhAyivtfM.

At Weave, we write most of our code with AI, and it's been getting more expensive. This came to a head when Opus 4.7 was released and, thanks to its tokenizer changes, our costs shot up. We knew we didn't need Opus for everything but we didn't want to lose out on the intelligence for the cases where you really need it. So we decided to build a model router to handle this for us.

The Weave Router acts as an Anthropic/OpenAI endpoint specifically for coding agents. It looks at every inference request and intelligently (more on that in a sec) decides what model to send it to, handling all the translations required along the way. So it can use faster/cheaper models (e.g. DeepSeek v4, GLM 5.2, Kimi K2.6) when possible, and frontier models (Opus 4.8 & GPT 5.5 (& Fable whenever it's back)) when necessary.

How do we know what model to route to? We trained an RL model on tens of thousands (so far!) of agent traces. We reward the routing model when it selects an LLM that successfully completes the given task.

Here's an example: if you ask the router to plan a complex change, it will (probably) route that request to Opus 4.8. Subagents exploring the codebase to gather context will be routed to more suitable models (e.g. DeepSeek V4 Flash). Then when you have the plan ready to implement, it will be (most likely) be handed to a quicker model (e.g. GLM 5.2) to carry it out.

We've been using this internally for the last month or so. We've saved 40% on tokens vs. what we otherwise would have paid, with no noticeable differences in quality or velocity.

The router is source-available under Elastic License 2.0, so you can self-host it. Or if you prefer, you can also use our hosted version: weaverouter.com.

I'll be here to answer any questions you may have!

198 points | 108 commentspage 2

pradeep1177 6 hours ago|

I generally believe the proxy route is best to understand any harness. I been building some thing similar.

nativeit 23 hours ago||

This would have been neat back when I could afford enough tokens to even set it up properly. Now I’ve had to increase my GH Copilot subscription just to cover the bare minimum updates to a few websites every month, and I no longer do any test driving, or even recreational coding projects. I don’t have hundreds of dollars a month to plow into these products, so I’m rationing use, looking for better local options, and being much more discerning about where these tools actually save time. Precarious time to be alive…

ValentineC 21 hours ago||

> Now I’ve had to increase my GH Copilot subscription

Maybe you should move away from a subscription that started charging by the token instead of by the request?

dools 18 hours ago|||

I've been building a reasonably complicated project over the past week using deepseek v4 pro almost exclusively (a couple of k2.7 and 1 session with gpt5.5 to re-assess some architectural questions). Deepseek is super capable though if you're a coder. I don't even write "code" but I can tell when it's doing something dumb and tell it how to do it better, but other than that I'm not micro managing it or using it "just for auto complete" or whatever.

And it is SO fucking cheap.

adchurch 5 hours ago||

Yes the open source models are very good, that’s a big part of what makes this router save so much money in practice! There definitely are some things they still don’t handle well though where you do want a frontier model

newaccountman2 23 hours ago||

try OpenCode

lubujackson 1 day ago||

I notice Cursor already does something similar. Even if I have Opus 4.8 selected, it will trigger subagents using Composer 2.5. I like using Auto personally because it is pretty effective and deeply discounted, but at work I YOLO Opus high.

I imagine a solution like this will eventually be an enterprise-forced solution because there is no reason right now for individual developers to be selective about model pricing. Even more important is non-tech users who do stuff through MCPs like "give me a full overview of all analytics" and let it chug for half an hour.

adchurch 1 day ago|

Oh interesting, didn't know Cursor did that! Totally makes sense though, routing subagents is def the easiest win, no need to have any cache awareness.

latchkey 1 hour ago||

What I want is a router that can also provision compute on demand and shut it down when it is done.

spqw 1 day ago||

This + making sure common requests are saved as reusable skills and scripts would probably save a large part of my token usage

As prices increase we will see more of these tools to optimise and make the best use of token budget

adchurch 1 day ago|

100%, from what we've seen, for a lot of big companies that 1. don't have subsidized usage and 2. are pushing AI adoption hard, figuring out token costs is P0 or P1 for their eng leadership

SoftTalker 1 day ago||

So you're saying that since adopting AI/LLM tech many companies have their top engineering priority being optimizing the costs of that rather than ... addressing actual business needs?

adchurch 1 day ago||

I guess delivering business value is always #1, I just meant it's the biggest problem they're trying to solve. Here's a recent example that was public: https://fortune.com/2026/05/26/uber-coo-ai-spending-tokens-c...

forgeshiptoday 13 hours ago||

Just curious how the router decides on which model to use. When I use Claude Code, I often ask Claude Code to decide itself if it should spawn a sub-agent to downgrade or upgrade the model. Claude Code is smart to know how much context and cache it has and will decide if it should use sub-agent with a lesser model (sometimes it costs more to re-fetch tokens with a Sonnet sub-agent if the parent agent already has the context).

adchurch 5 hours ago|

We trained a model to select which LLM to call at any given turn, based on lots of agent traces

jmalicki 1 day ago||

> with no noticeable differences in quality or velocity.

Have you done any A/B tests on this with evidence? (That's one thing I'd be very interested to see for claims like this - I'm not necessarily doubting you, it just seems like it could be useful to understand claims of quality/efficiency)

adchurch 1 day ago|

Great question! Our main product quantifies engineering productivity & quality so I think we're uniquely qualified to answer this - our velocity has only gone up and our quality (bugs introduced, code turnover) has not budged per our own analysis.

jmalicki 20 hours ago||

> our velocity has only gone up

That is super curious - using more low quality cheaper models increased your velocity? My prior would have been slightly reduced velocity but massive reduction in token costs made it worthwhile.

Is that due to the faster inference time?

jawon 20 hours ago||

I got Opus to knock out an MCP server that implements subagents running in pi and tell Opus to send work to DeepSeek. Or I tell it to ask GPT-5.5 for critiques. It's manual but saves a lot of tokens.

jpease 22 hours ago||

Is this noticeably different than having your implementation planning phase break a larger task into sub-tasks, and recording the ideal model to use based on scope as part of the task definition?

adchurch 19 hours ago|

Yes because it's a model explicitly trained to make model selections! Opus probably doesn't have a great idea of when to send a task to DeepSeek vs. to Sonnet, for example.

notatoad 1 day ago|

Is this talking to claude code, or to claude api (and paying api rates)? programatically routing requests through claude code sounds like a good way to get banned, just like the opencode and openclaw users.

adchurch 1 day ago|

If you have a Claude sub with subsidized usage we use that. If not you pay API prices.

ValentineC 21 hours ago||

Is that because you start by running it inside Claude Code? I don't see how Claude would allow any other harness to call them for their subscription, after all that OpenClaw hullabaloo.

adchurch 5 hours ago||

Yep exactly

More comments...