Posted by samizdis 12 hours ago

Claude Code users hitting usage limits 'way faster than expected' (www.theregister.com)
263 points | 162 comments
spongebobstoes 10 hours ago|
try Codex, it's really good and doesn't have the same limit issues
ChrisArchitect 10 hours ago||
Source: https://old.reddit.com/r/ClaudeCode/comments/1s7zg7h/investi... (https://news.ycombinator.com/item?id=47582671)
raincole 10 hours ago||
Opus 4.6 price:

  Input: $5 / M tokens
  Output: $25 / M tokens

GPT Codex 5.3:

  Input: $1.75 / M tokens
  Output: $14 / M tokens
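
To make the gap concrete, here's a rough cost comparison at those list prices. The session size (2M input / 0.5M output tokens) is just an illustrative assumption, and this ignores prompt caching discounts:

    # Rough per-session cost at the quoted list prices (illustrative only).
    IN_TOK, OUT_TOK = 2_000_000, 500_000   # assumed session size

    def cost(in_price, out_price):
        # prices are per 1M tokens
        return IN_TOK / 1e6 * in_price + OUT_TOK / 1e6 * out_price

    print(f"Opus 4.6:      ${cost(5.00, 25.00):.2f}")   # $22.50
    print(f"GPT Codex 5.3: ${cost(1.75, 14.00):.2f}")   # $10.50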

> Claude Code users hitting usage limits 'way faster than expected'

No shit, Sherlock.

firebot 11 hours ago||
The first hit is free.
shafyy 11 hours ago||
What is the best way to get started with open-weight models? And are they a good alternative to Claude Code?
MarsIronPI 11 hours ago||
If you still want to use APIs, I like OpenRouter because I can use the same credits across various models, so I'm not locked into a single family of models. (You can even use the proprietary models on OpenRouter, but they're eye-wateringly expensive.)
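
A minimal sketch of hitting OpenRouter through its OpenAI-compatible endpoint; the model slug is a placeholder, swap in whatever open-weight model you want to try:

    # pip install openai
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
        api_key="sk-or-...",                       # your OpenRouter key
    )

    resp = client.chat.completions.create(
        model="qwen/qwen3-coder",  # placeholder slug; pick any model from the OpenRouter catalog
        messages=[{"role": "user", "content": "Write a haiku about usage limits."}],
    )
    print(resp.choices[0].message.content)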

Otherwise you should look into running e.g. Qwen3.5-35B-A3B or Qwen3.5-27B on your own computer. They're not Opus-level, but from what I've heard they're capable for smaller tasks. llama.cpp works well for inference; it runs on both CPU and GPU, and can even split a model across both if you want.
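
For the local route, llama-server (part of llama.cpp) exposes an OpenAI-compatible endpoint, so once it's running with a GGUF model you can talk to it like any hosted API. The port and model file name below are assumptions:

    # Assumes llama-server is already running with a local GGUF model, e.g.:
    #   llama-server -m qwen-model.gguf --port 8080
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
    resp = client.chat.completions.create(
        model="local",  # with a single loaded model, llama-server accepts any name here
        messages=[{"role": "user", "content": "Summarize this function for me."}],
    )
    print(resp.choices[0].message.content)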

lukewarm707 11 hours ago|||
I would recommend getting an API account on Fireworks; it's ZDR (zero data retention) and typically the fastest provider.

Otherwise, check the list of providers on OpenRouter, where you can see pricing and quantisation, and sign up directly rather than going through a router. Make sure to look at caching prices, not just the headline input/output API prices.
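
If you want to do that comparison programmatically, OpenRouter publishes per-model pricing in its models listing. A rough sketch; the cache-pricing field name is an assumption, so check the actual response shape:

    # pip install requests
    import requests

    models = requests.get("https://openrouter.ai/api/v1/models", timeout=30).json()["data"]
    for m in models:
        p = m.get("pricing", {})
        # "prompt"/"completion" are per-token prices; cache fields may not exist for every model
        print(m["id"], "in:", p.get("prompt"), "out:", p.get("completion"),
              "cache_read:", p.get("input_cache_read", "n/a"))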

GLM 5 is a frontier model; Kimi 2.5 is similar, with vision support; Minimax M2.7 is a very capable model focused on tool calling.

If you need server-side web search, you could use the Z AI API directly (again ZDR), or Friendli AI, or just install a search MCP server.

For the harness, opencode is the usual choice; it has subagents and parallel tool calling. Or just use Claude Code by pointing it at the Anthropic-compatible APIs of various providers like Fireworks.
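
A hedged sketch of that last option, assuming the provider exposes an Anthropic-compatible endpoint and that Claude Code picks up ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN; the URL below is a placeholder, not a real provider endpoint:

    # Launch Claude Code against a third-party Anthropic-compatible endpoint.
    import os, subprocess

    env = os.environ.copy()
    env["ANTHROPIC_BASE_URL"] = "https://example-provider.invalid/anthropic"  # placeholder endpoint
    env["ANTHROPIC_AUTH_TOKEN"] = "your-provider-api-key"                     # placeholder key

    subprocess.run(["claude"], env=env)  # assumes the claude CLI is on PATH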

wolvoleo 11 hours ago|||
Just install ollama.

And no, they're not as capable as SOTA models. Not by far.

However, they can reduce your token expenditure a lot if you route the low-hanging fruit to them: summaries, translations, stuff like that.
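
A minimal sketch of that routing idea using Ollama's local chat endpoint; the model name and the choice of what counts as low-hanging fruit are assumptions:

    # pip install requests; assumes a small model has been pulled, e.g. `ollama pull llama3.2`
    import requests

    def summarize_locally(text: str) -> str:
        # Send cheap, low-stakes work to the local model instead of a paid API.
        r = requests.post("http://localhost:11434/api/chat", json={
            "model": "llama3.2",  # placeholder; any local model works
            "messages": [{"role": "user", "content": "Summarize:\n" + text}],
            "stream": False,
        }, timeout=120)
        return r.json()["message"]["content"]

    print(summarize_locally("Claude Code users are hitting usage limits faster than expected..."))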

ramon156 11 hours ago||
No need for Ollama; simonw's llm tool is good enough.
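
For reference, llm has a Python API alongside the CLI. A small sketch; the model id assumes you've installed a local-model plugin such as llm-ollama and pulled that model:

    # pip install llm llm-ollama
    import llm

    model = llm.get_model("llama3.2")   # assumed local model id
    response = model.prompt("Translate to English: 'Bonjour le monde'")
    print(response.text())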
scottcha 11 hours ago||
We offer multiple SOTA models at https://portal.neuralwatt.com at very generous pricing, since we have the option to bill per kWh instead of per token. Recipes for your favorite tools here: https://github.com/neuralwatt/neuralwatt-tools
wellthisisgreat 5 hours ago||
Yeah, this is crazy. Hitting limits with non-constant usage on a Max plan?