Claude Code users hitting usage limits 'way faster than expected'

Posted by samizdis 10 hours ago

Claude Code users hitting usage limits 'way faster than expected'(www.theregister.com)

232 points | 146 commentspage 3

pagecalm 3 hours ago|

Hit this myself recently, along with a bunch of overloaded errors. I think it's growing pains for where we are with AI right now.

As the tooling matures I think we'll see better support for mixing models — local and cloud, picking the right one for the task. Run the cheap stuff locally, use the expensive cloud models only when you actually need them. That would go a long way toward managing costs.

There's also the dependency risk people aren't talking about enough. These providers can change pricing whenever they want. A tool you've built your entire workflow around can become inaccessible overnight just because the economics shifted. It's the vendor lock-in problem all over again but with less predictability.

canada_dry 7 hours ago||

I hit my limit on the project I've been working on (after I let "MAX" run out and moved to "PRO") after about only 2 hours!

TIP (YMMV): I've found that moving the current code base into a new 'project' after a dozen or so turns helps as I suspect the regurgitation of the old conversations chews up tokens.

canada_dry 6 hours ago|

An aside: https://www.buchodi.com/chatgpt-wont-let-you-type-until-clou...

It seems that anthropic has added something similar to their browser UI because just in the last few days chat has become almost unusable in firefox. %@$#%

stavros 8 hours ago||

Anthropic went about this in a really dishonest way. They had increased demand, fine, but their response was to ban third-party clients (clients they were fine with before), and to semi-quietly reduce limits while keeping the price the same.

Unilaterally changing the deal to give customers less for the same price should not be legal, but companies have slowly boiled the frog in such a way that now we just go "welp, it's corporations, what can you do", and forget that we actually used to have some semblance of justice in the olden days.

mszczodrak 2 hours ago||

I've been hitting the API limit errors over Claude CLI, yet the total usage was 0% on the claude.ai website. Changing the model fixed the problem.

delphic-frog 7 hours ago||

The token usage differs day to day - that's the most frustrating part. You can't effectively plan a development session if you aren't sure how far you'll likely get into a feature.

paulbjensen 3 hours ago||

I have found that:

- If I ask Claude to go and build a product idea out for me from scratch, it can get quite far, but then I will hit quota limits on the pro plan ($20pm).

- I have not drunk the Kool-aid and tried to indulge in ClaudeMaxxing (Max plan at $200pm). I need to sleep and touch grass from time to time.

- I don't bother with a Claude.md in my projects. I just raw-dog context.

- If I have a big codebase, and I'm very clear about what code changes I want to make Claude do, I can easily get a lot of changes made without getting near my quota. It's like Mr Miyagi making precision edits to that Bonsai Tree in Karate Kid.

My last bit of advice - use the tool, but don't let the tool use you.

nitekode 5 hours ago||

This could also be because of the recently introduced 1 million token buffer. I also saw my tokens drain away quickly; then in noticed I was pushing 750k tokens through for every prompt :) Sometimes its hard to get into the habit of clearing

lukewarm707 8 hours ago||

please tell me if i'm crazy.

i just refuse to use openai/google/anthropic subscriptions, i only use open source models with ZDR tokens.

- i like privacy in my work, and i share when i wish. somehow we accepted that our prompts and work may be read and moderated by employees. would you accept people moderating what you write in excel, google docs, apple pages?

- i want a consistent tool, not something that is quantised one day, slow one day, a different harness one day, stops randomly.

- unless i am missing something, the closed source models are too slow for me to watch what they are doing. i feel comfortable with monitoring something, usually at about 200-300tps on GLM 5. above that it might even be too fast!

muskstinks 8 hours ago||

Its a question of price, quality and other factors.

If my company pays for it, i do not care.

If i have a hobby project were it is about converting an idea in my spare time in what i want, i'm happily paying 20$. I just did something like this on the weekend over a few hours. I really enjoy having small tools based on single html page with javascript and json as a data store (i ask it to also add an import/export feature so i can literaly edit it in the app and then save it and commit it).

For the main agent i'm waiting for like the one which will read my emails and will have access tos ystems? I would love a local setup but just buying some hardware today costs still a grant and a lot of energy. Its still sign cheaper to just use a subscription.

Not sure what you mean though regarding speed, they are super fast. I do not have a setup at home which can run 200-300 tps.

lukewarm707 8 hours ago||

i don't use local models, i just use the APIs of cloud providers (eg fireworks, together, friendli, novita, even cerebras or groq).

you can get subscriptions to use the APIs, from synthetic, or ollama, fireworks.

johntash 43 minutes ago|||

I might be missing it, but does fireworks actually have a subscription? All I saw was serverless (per token) and gpu $/hr.

And since I saw a few other comments talking about these, do you have any preference on different cloud providers with ZDR? I look every once in a while and want to switch to completely open models and/or at least ZDR so I can start doing things like summarizing e-mail. I'm thinking I can probably split my use between some sort of cloud api and claude code for heavier tasks.

muskstinks 7 hours ago|||

Whats the big difference then? You can get a lot of tokens for 20$ and not everything is a state secret i'm doing.

But if i would use some API stuff, probably openrouter, isn't that easer to switch around and also have zero konwledge savety?

lukewarm707 7 hours ago||

i think that privacy is good for wellbeing. it may be this is a dying point of view.

muskstinks 7 hours ago||

It is for sure but running your own email is so time intense that i gave that up 10 years ago.

i then decided to trust one company with most stuff.

Also as I said, I would use something different for my personal stuff. But i'm waiting for the right hardware etc.

susupro1 8 hours ago||

You are not crazy, you are just waking up from the SaaS delusion. We somehow allowed the industry to convince us that paying $20/month to rent volatile compute, have our proprietary workflows surveilled, and get throttled mid-thought is an 'upgrade'. The pendulum is swinging violently back to local-native tools. Deterministic, privately owned, unmetered—buying your execution layer instead of renting it is the only way to build actual leverage.

muskstinks 8 hours ago|||

I'm quite aware of my dependency and i'm balancing this in and out regularly over the last 10 years.

Owning is expensive. Not owning is also expensive.

Energy in germany is at 35 cent/kwh and skyrocketed to 60 when we had the russian problem.

I'm planning to buy a farm and add cheap energy but this investment will still take a little bit of time. Until then, space is sparse.

lukewarm707 7 hours ago||||

i don't use local llms. it's mostly the closed source subscriptions that are not private, it really is a choice.

there are many cloud providers of zero data retention llm APIs, and even cryptographic attestation.

they are not throttled, you can get an agreed rate limit.

l72 3 hours ago||

Would you mind naming some of your favorite providers?

staticassertion 8 hours ago||||

No one was convinced to spend money to do the things you're saying. That's just disingenuous. People rent models because (a) it moves compute elsewhere (b) they provide higher quality models.

nprateem 8 hours ago||

c) It's turnkey instead of requiring months/years of custom dev and on-going maintenance.

NoMoreNicksLeft 8 hours ago|||

If I could buy this to run it locally, what's that hardware even look like? What model would I even run on the hardware? What framework would I need to have it do the things Claude Code can do?

giancarlostoro 8 hours ago||

I'm guessing their newer models are taking way more compute than they can afford to give away. The biggest challenge of AI will eventually be, how to bring down how much compute a powerful model takes. I hope Claude puts more emphasis into making Haiku and Sonnet better, when I use them via JetBrains AI it feels like only Opus is good enough, for whatever odd reason.

medwards666 8 hours ago|

I get the same. Work has shifted to being agentic first - and whenever I use anything other than Claude Opus it seems that the model easily gets lost spinning its wheels on even the simplest query - especially with some of our more complex codebases, whereas Opus manages to not only reason adequately about the codebase, but also can produce decent quality code/tests in fairly short order.

Oddly though, when using at home I'm using Sonnet via the standard chat interface and that, whilst it will produce substandard code in its output is still reasonably capable - even in more niche tasks. Granted though that my personal projects are far simpler than the codebase I handle at work.

giancarlostoro 8 hours ago||

Funny, I use Opus at home, but I have a Max plan, and I only use it during their non-peak hours. I can't bring myself to downgrade to Haiku or Sonnet.

anon7000 5 hours ago|

I think I ran into this yesterday, with Claude Code taking FOREVER on a lot of tasks. But using Claude within Cursor seems way faster

More comments...