
Posted by y42 19 hours ago

I cancelled Claude: Token issues, declining quality, and poor support(nickyreinert.de)
871 points | 509 comments
DeathArrow 18 hours ago|
I use Claude Code with GLM, Kimi and MiniMax models. :)

I was worried about Anthropic models quality varying and about Anthropic jacking up prices.

I don't think Claude Code is the best agent orchestrator and harness in existence but it's most widely supported by plugins and skills.

droidjj 17 hours ago|
Where are you getting inference from? I'm overwhelmed by the options at the moment.
alex-onecard 16 hours ago|||
I am also curious. Considering the kimi coding plan but I'm worried about data privacy and security.
DeathArrow 16 hours ago||
I don't send much data to the cloud, mostly code. And I don't believe in security by obscurity; if I need high security, I do a proper implementation.
DeathArrow 16 hours ago|||
I am using Ollama Cloud and Moonshot Ai.
giancarlostoro 18 hours ago||
I'm torn, because I use it in my spare time, so I've missed some of these issues. I don't use it 9 to 5, but I've built some amazing things. When 1 million tokens dropped, that was peak Claude Code for me; it was also when I suspect their issues started. I've built out some things I've been drafting in my head for ages but never had time for, and I can review the code and refine it until it looks good.

I'm debating trying out Codex: from some people I hear it's "uncapped", from others I hear they hit limits in short spans of time.

There's also the really obnoxious "trust me bro" documentation update from OpenClaw where they claim Anthropic is allowing OpenClaw usage again, but no official statement?

Dear Anthropic:

I would love to build a custom harness that just uses my Claude Code subscription. I promise I won't leave it running 24/7, 365. Can you please tell me how I can do this? I don't want to see some obscure tweet; make official blog posts or documentation pages that reflect policies.

Can I get whitelisted for "sane use" of my Claude Code subscription? I would love this. I am not dropping $2400 in credits for something I do for fun in my free time.

fluidcruft 18 hours ago||
It sounds like we have very similar usage/projects. codex had been essentially uncapped (via combination of different x-factors between Plus and Pro and promotions) until very recently when they copied Anthropic's notes.

Plus is still very usable for me though. I have not tried Claude Pro in quite a while and if people are complaining about usage limits I know it's going to be a bad time for me. I had to move up from Claude Pro when the weekly limits were introduced because it was too annoying to schedule my life around 5hr windows.

I started using codex around December when I started to worry I was becoming too dependent on Claude and need to encourage competition. codex wasn't particularly competitive with Claude until 5.4 but has grown on me.

The only thing I really care about is that whatever I'm using "just works" and doesn't hurt limits and Claude code has been flaky as all hell on multiple fronts ever since everyone discovered it during the Pentagon flap. So I tend to reach for ChatGPT and codex at the moment because it will "just work" and there's a good chance Claude will not.

scottyah 18 hours ago|||
Don't forget, Openclaw was basically bought by OpenAI so there's only incentive to use it as a wedge to pry people off Anthropic.
dheera 18 hours ago||
Claude Code now has an official Telegram plugin and cron jobs, and can do 80% of the things people used OpenClaw for if you just give it access to tools and run it with --dangerously-skip-permissions.
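For concreteness, here is a hypothetical sketch of the kind of unattended run described above. The schedule, prompt, paths, and log location are all made up for illustration; `claude -p` is Claude Code's non-interactive (print) mode.

```shell
# Hypothetical crontab entry: one unattended Claude Code run, nightly at 02:00.
# The project path, prompt text, and log file are placeholders.
0 2 * * * cd /home/me/project && claude -p "triage open TODOs and run the test suite" --dangerously-skip-permissions >> /home/me/claude-cron.log 2>&1
```

Redirecting stdout and stderr to a log file matters here, since an unattended run has no terminal to surface errors.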
giancarlostoro 15 hours ago|||
What I'm saying, though, is that I don't use OpenClaw; I use Claude Code for coding, and would like to better equip Claude with a custom coding harness that has superior tooling out of the box. But that is fair.
Der_Einzige 18 hours ago|||
The /loop command, which is supposed to be the equivalent of heartbeat.md, is EXTREMELY unreliable/shitty.
giancarlostoro 15 hours ago||
I use it sparingly with my guardrails project. I basically tell it to:

Check any tasks if it's not currently working on one, and continue until it finishes; dismiss this reminder if it's done. Ensure it runs unit tests / confirms the project builds before moving on to the next one. Compact the context when it moves to the next one. Once it's exhausted all remaining tasks, close the loop.

Works for me for my side projects, I can leave it running for a bit until it exhausts all remaining tasks.

chadleriv 13 hours ago||
Off topic: I do feel like this model switching content feels very circa 2010 "I'm quitting Facebook"
hedgehog 18 hours ago||
I used Opus via Copilot until December and then largely switched over to Claude Code. I'm not sure what the difference is but I haven't seen any of these issues in daily use.
nickdothutton 18 hours ago||
Switched to local models after quality dropped off a cliff and token consumption seemed to double. Having some success with Qwen+Crush and have been more productive.
tfrancisl 17 hours ago|
Would love some more info on how you got any local model working with Crush. Love charmbracelet but the docs are all over the place on linking into arbitrary APIs.
porkloin 16 hours ago||
Assuming you have a locally running llama-server or llama-swap, just drop this into your crush.json with your setup details/local addresses etc.:

Edit: I forgot HN doesn't do code fences. See https://pastebin.com/2rQg0r2L

Obviously the context window settings are going to depend on what you've got set on the llama-server/llama-swap side. Multiple models on the same server like I have in the config snippet above is mostly only relevant if you're using llama-swap.

TL;DR is you need to set up a provider for your local LLM server, then set at least one model on that server, then set the large and small models that crush actually uses to respond to prompts to use that provider/model combo. Pretty straightforward but agree that their docs could be better for local LLM setups in particular.

For me, I've got llama-swap running and set up on my tailnet as a [tailscale service](https://tailscale.com/docs/features/tailscale-services) so I'm able to use my local LLMs anywhere I would use a cloud-hosted one, and I just set the provider baseurl in crush.json to my tailscale service URL and it works great.
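In case the pastebin link rots, here is a rough sketch of the shape that config takes, following the TL;DR above: one OpenAI-compatible provider pointed at the local server, with the large and small roles mapped onto it. This is not the poster's actual snippet; the provider name, model IDs, addresses, and context sizes are placeholders, and the exact field names may differ between Crush releases, so check the Crush docs against it.

```json
{
  "providers": {
    "local-llama": {
      "type": "openai",
      "base_url": "http://127.0.0.1:8080/v1",
      "api_key": "dummy",
      "models": [
        { "id": "qwen2.5-coder-32b", "context_window": 32768 },
        { "id": "qwen2.5-coder-7b", "context_window": 32768 }
      ]
    }
  },
  "models": {
    "large": { "provider": "local-llama", "model": "qwen2.5-coder-32b" },
    "small": { "provider": "local-llama", "model": "qwen2.5-coder-7b" }
  }
}
```

Listing two models under one provider, as here, is mostly only useful with llama-swap, which can load models on demand; plain llama-server serves whatever single model it was started with.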

AJRF 13 hours ago||
We are in the 'we need to IPO so screw our customers' phase of the cycle
sfmike 17 hours ago||
I ran prompts that used up a ton of usage and got no return, just an error.

I asked support: hey, I got nothing back, I tried prompting several times, it used a ton of usage and gave no response. I'd just like the usage back. What I paid for I never got.

Just a bot response: we don't do refunds, no exceptions. Even in the case where they don't serve you what your plan should give you.

caycep 17 hours ago||
If all Claude does is automate mundane code, why not just make a "meta library" of said common mundane code snippets?
twobitshifter 17 hours ago||
maybe make it so that when you start typing it completes the snippet?
queuebert 17 hours ago||
Like Stack Overflow?
caycep 17 hours ago||
You still have to search Stack Overflow and sift, but I'm surprised someone doesn't just make a TL;DR-style or shortcut-expanding product out of it, so you can just pop in code for this use case or that, vs. the current product spending a few datacenters' worth of energy to give you an answer that's only correct x% of the time.
aleqs 18 hours ago||
The usage metering is just so incredibly inconsistent, sometimes 4 parallel Opus sessions for 3 hours straight on max effort only uses up 70% of a session, other times 20 mins / 3 prompts in one session completely maxes it out. (Max x20 plan) Is this just a bug on anthropic side or is the usage metering just completely opaque and arbitrary?
unshavedyak 17 hours ago|
It's something strange, because I never have these issues. I often run two in parallel (though not all day), and generally have something running any time I look at my laptop to advance the steps/tasks/etc. Usually I struggle to hit 50% on my Max20.

Heck, two weeks ago I tried my hardest to hit my limit just to make use of my subscription (I sometimes feel like I'm wasting it), and I still only managed to get to 80% for the week.

I generally prune my context frequently, though: each new plan is a prune, for example, because I don't trust large context windows and degradation. My CLAUDE.md files are also somewhat trim out of the same fear, and I don't use any plugins, and only a couple of MCPs (LSP).

No idea why everyone seems to be having such wildly different experiences on token usage.

subscribed 14 hours ago||
Maybe you should try running exactly the same prompts in exactly the same settings?

Chances are one of you has been drafted into an unpleasant experiment.

smashah 4 hours ago|
Did the same with Google AI Ultra. They rug-pulled the subscribers. They changed the deal, we cancel. Simple.