
Posted by fugu2 3 days ago

Claude Code: connect to a local model when your quota runs out (boxc.net)
106 points | 38 comments | page 2
btbuildem 2 hours ago|
I'm confused; wasn't this already available via env vars? ANTHROPIC_BASE_URL and so on, though yes, you may have to write a thin proxy to wrap the calls to fit whatever backend you're using.
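The env-var route the comment describes might look something like this. ANTHROPIC_BASE_URL is a real Claude Code variable; the localhost:8080 address and the idea that a translating proxy listens there are assumptions for the sketch.

```shell
# Sketch: point Claude Code at a local endpoint instead of Anthropic's API.
# Assumes a proxy on localhost:8080 that translates Anthropic's /v1/messages
# calls to whatever backend (llama.cpp, vLLM, Ollama, ...) you run locally.
export ANTHROPIC_BASE_URL="http://localhost:8080"
export ANTHROPIC_AUTH_TOKEN="dummy"   # a local proxy can ignore auth
# then launch as usual:  claude
```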

I've been running CC with Qwen3-Coder-30B (FP8) and I find it just as fast, but not nearly as clever.
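The "thin proxy" mentioned above could be sketched roughly as below: accept Anthropic-style message requests and forward them to a local OpenAI-compatible server. The backend URL, port, and field handling are assumptions; a real proxy would also need streaming, tool calls, and richer content-block translation.

```python
# Minimal sketch of a proxy translating Anthropic-style requests to an
# OpenAI-compatible local backend. All endpoints/ports are assumptions.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKEND = "http://localhost:8000/v1/chat/completions"  # assumed local server

def to_openai(body: dict) -> dict:
    """Translate an Anthropic messages-style request into OpenAI chat format."""
    messages = []
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    for m in body.get("messages", []):
        content = m["content"]
        if isinstance(content, list):  # Anthropic allows lists of content blocks
            content = "".join(b.get("text", "") for b in content)
        messages.append({"role": m["role"], "content": content})
    return {"model": body.get("model", "local"),
            "max_tokens": body.get("max_tokens", 1024),
            "messages": messages}

def to_anthropic(resp: dict) -> dict:
    """Wrap an OpenAI-style chat response in an Anthropic-style envelope."""
    text = resp["choices"][0]["message"]["content"]
    return {"type": "message", "role": "assistant",
            "content": [{"type": "text", "text": text}],
            "stop_reason": "end_turn"}

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        req = urllib.request.Request(
            BACKEND, data=json.dumps(to_openai(body)).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as r:
            out = json.dumps(to_anthropic(json.load(r))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

# To run: HTTPServer(("localhost", 8080), Proxy).serve_forever()
```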

zingar 3 hours ago||
I guess I should be able to use this config to point Claude at the GitHub Copilot licensed models (including Anthropic models). That's pretty great. About 2/3 of the way through every day I'm forced to switch from Claude (Pro license) to Amp's free tier, and the different ergonomics are quite jarring. Open source folks get Copilot tokens for free, so that's another Pro license I don't have to worry about.
raw_anon_1111 2 hours ago||
Or just don’t use Claude Code and use Codex CLI. I have yet to hit a quota with Codex working all day. I hit the Claude limits within an hour or less.

This is with my regular $20/month ChatGPT subscription and my $200/year (company reimbursed) Claude subscription.

mercutio2 28 minutes ago|
Yeah, Anthropic's quotas are vastly less generous than OpenAI's, which are themselves much less generous than Gemini's (I've never paid Google a dime, and I get hours of use out of gemini-cli every day). I run out of my weekly quota in 2-3 days, and the 5-hour quota in ~1 hour. And this is 1-2 tasks at a time, using Sonnet (Opus gets like 3 queries before I've used my quota).

Right now OpenAI is giving away fairly generous free credits to get people to try the macOS Codex client. And... it's quite good! Especially for free.

I've cancelled my Anthropic subscription...

raw_anon_1111 24 minutes ago||
Hmm, I might have to try Gemini. OpenAI, Claude, and Gemini are all explicitly approved by my employer. Especially since we use GSuite anyway.
mcbuilder 2 hours ago||
Opencode has been a thing for a while now
swyx 3 hours ago||
i mean, the other obvious answer is to plug into the Claude Code proxies that other model companies have made for you:

https://docs.z.ai/devpack/tool/claude

https://www.cerebras.ai/blog/introducing-cerebras-code

or i guess one of the hosted gpu providers

if you're basically a homelabber and wanted an excuse to run quantized models on your own device, go for it, but don't lie and mutter under your own tin foil hat that it's a realistic replacement

esafak 2 hours ago|
Or they could just let people use their own harnesses again...
usef- 2 hours ago|
That wouldn't solve this problem.

And they do? That's what the API is.

The subscription always seemed clearly advertised for client usage, not general API usage, to me. I don't know why people are surprised after hacking the auth out of the client. (Note that in their own clients they can control prompting patterns for caching etc., which can make it cheaper.)

esafak 2 hours ago||
End users -- people who use harnesses -- have subscriptions, so that makes no sense. General API usage is for production.
usef- 2 hours ago||
"Production" what?

The API is for using the model directly with your own tools. It can be in dev, or experiments, or anything.

Subscriptions are for using the apps, Claude and Claude Code. That's what it has always said when you sign up.

eli 1 hour ago|||
Production = people who can afford to pay API rates for a coding harness
usef- 1 hour ago||
Saying their prices are too high is an understandable complaint; I'm only arguing against the complaint that people were stopped from hacking the subscriptions.

LLMs are a hyper-competitive market at the moment, and we have a wealth of options, so if Anthropic is overpricing their API they'll likely be hurting themselves.

esafak 2 hours ago|||
Production code, of course; deployed software. For when you need to make LLM calls.