Posted by aray07 10 hours ago

Measuring Claude 4.7's tokenizer costs (www.claudecodecamp.com)
519 points | 354 comments
uberman 10 hours ago|
On actual code, I see what you see: a 30% increase in tokens, which is in line with what they claim as well. I personally don't tend to feed technical documentation or random prose into LLMs.

Given that Opus 4.6 and even Sonnet 4.6 are still valid options, for me the question is not "Does 4.7 cost more than claimed?" but "What capabilities does 4.7 give me that 4.6 did not?"

Yesterday 4.6 was a great option and it is too soon for me to tell if 4.7 is a meaningful lift. If it is, then I can evaluate if the increased cost is justified.
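
If you want to reproduce this kind of measurement on your own repo, here's a minimal sketch using the Anthropic SDK's count_tokens endpoint (the model IDs below are placeholders; substitute whatever IDs your account actually exposes):

    # Compare token counts for the same file under two model versions.
    import anthropic

    client = anthropic.Anthropic()
    source = open("src/main.py").read()  # any representative code file
    msgs = [{"role": "user", "content": source}]

    old = client.messages.count_tokens(model="claude-opus-4-6", messages=msgs)
    new = client.messages.count_tokens(model="claude-opus-4-7", messages=msgs)
    print(f"4.6: {old.input_tokens}  4.7: {new.input_tokens}  "
          f"ratio: {new.input_tokens / old.input_tokens:.2f}")

A ratio around 1.3 on your own code would match the 30% figure above.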

tetha 8 hours ago||
Yeah, that was an interesting discovery in a development meeting. Many people were chasing the next best model, but for me, Sonnet 4.6 solves most tasks in 1-2 rounds. I mainly need to pay attention to context, instructions, and keeping tasks well-bounded. Keeping the task narrow also simplifies review and staying in control, since I usually get back smaller diffs I can understand quickly and manage or modify later.

I'll look at the new models, but increasing token consumption by a factor of 7 on Copilot, and then running into all of these budget-management topics people talk about? That seems to introduce even more flow-breakers into my workflow, and I don't think it'll be 7 times better. Maybe in some planning and architectural topics where I used Opus 4.6 before.

pier25 10 hours ago|||
Haven't people been complaining lately about 4.6 getting worse?
solenoid0937 10 hours ago|||
People complain about a lot of things. Claude has been fine:

https://marginlab.ai/trackers/claude-code-historical-perform...

addisonj 9 hours ago|||
I will be the first to acknowledge that humans are bad judges of performance and that some of the allegations are likely just hallucinations...

But... are you really going to rely on benchmarks that have time and time again been shown to be gamed as the complete story?

My take: it is pretty clear that the capacity crunch is real and that the changes they made to effort are partly meant to relieve it. That likely changed the experience for users.

Majromax 9 hours ago||||
While that's a nice effort, the inter-run variability is too high to diagnose anything short of catastrophic model degradation. The typical 95% confidence interval runs from 35% to 65% pass rates, a full factor of two performance difference.

Moreover, on the companion codex graphs (https://marginlab.ai/trackers/codex-historical-performance/), you can see a few different GPT model releases marked yet none correspond to a visual break in the series. Either GPT 5.4-xhigh is no more powerful than GPT 5.2, or the benchmarking apparatus is not sensitive enough to detect such changes.

yorwba 7 hours ago||
Yes, MarginLab only tests 50 tasks a day, which is too few to give a narrower confidence interval. On the other hand, this really calls into question claims of performance degradation that are based on less intensive use than that: variance is so high that long streaks of bad luck are to be expected, and they are plausibly the main source of such complaints. Similarly, it's unlikely you can measure a significant performance difference between models like GPT 5.4-xhigh and GPT 5.2 unless you have a task where one of them almost always fails or almost always succeeds (thus guaranteeing low variance), or you make a lot of calls (i.e. probably through the API and not in interactive mode).
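
For intuition, the interval on a 50-task run is easy to compute; a quick normal-approximation sketch (assuming a true pass rate of 50%):

    # 95% confidence interval for a pass rate measured on n tasks.
    # With n = 50 and a true rate of 0.5 this spans roughly 36%..64%,
    # matching the factor-of-two spread visible in the tracker.
    import math

    def ci95(p: float, n: int) -> tuple[float, float]:
        se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
        return p - 1.96 * se, p + 1.96 * se

    lo, hi = ci95(0.5, 50)
    print(f"{lo:.1%} .. {hi:.1%}")  # 36.1% .. 63.9%

At that width, multi-day streaks of "the model got dumber" are exactly what you'd expect from noise alone.
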
jofzar 1 hour ago||||
Matrix also found that Anthropic was A/B testing 4.6 vs 4.7 in production for the last 12 days.

https://matrix.dev/blog-2026-04-16

sumedh 3 hours ago||||
Your link shows there have been huge drops.

How is it fine?

cbg0 9 hours ago|||
That performance monitor is super easy to game if you cache responses to all the SWE-bench questions.
solenoid0937 7 hours ago||
You dramatically overestimate how much time engineers at hypergrowth startups have on their hands
cbg0 7 hours ago||
Caching some data is time consuming? They can just ask Claude to do it.
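
For scale, here's a sketch of the kind of cache being alleged; call_model is a hypothetical stand-in for the real API call, but the point is that it really is about this much code:

    # Hypothetical benchmark-gaming cache: key on a hash of the prompt,
    # replay a stored answer on a hit, fall through to the model otherwise.
    import hashlib, json, os

    CACHE = "bench_cache.json"
    cache = json.load(open(CACHE)) if os.path.exists(CACHE) else {}

    def answer(prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in cache:
            cache[key] = call_model(prompt)  # call_model: stand-in, not a real API
            json.dump(cache, open(CACHE, "w"))
        return cache[key]
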
ed_elliott_asc 10 hours ago|||
No, we increased our plans
grim_io 10 hours ago||
How long will they host 4.6? Maybe longer for enterprise, but if you have a consumer subscription, you won't have the choice for long, if at all.
Jeremy1026 9 hours ago|||
I was trying to figure out earlier today how to get 4.6 to run in Claude Code, and as part of the output it included "- Still fully supported — not scheduled for retirement until Feb 2027." Full caveat: I don't know where it came up with this information, but as others have said, 4.5 is still available today and it is now 5, almost 6 months old.
hypercube33 9 hours ago||||
I'm still using 4.5 because it gets the niche work I'm using it for where 4.6 would just fight me.
nfredericks 9 hours ago|||
Opus 4.5 is still available
grim_io 9 hours ago||
Wow, they hosted it for 6 months. Truly LTS territory :)
atonse 9 hours ago||
Just yesterday I was happy to have gotten my weekly limit reset [1]. And although I've been doing a lot of mockup work (so a lot of HTML getting written), I think the 1M-context stuff is absolutely eating up tokens like CRAZY.

I'm already at 27% of my weekly limit in ONE DAY.

https://news.ycombinator.com/item?id=47799256

jabart 9 hours ago||
I'm seeing the opposite. With Opus 4.7 and xhigh, I'm seeing less session usage, it's moving faster, and my weekly usage is not moving that much on a Team Pro account.
cbm-vic-20 8 hours ago|||
Four day workweek!
richstokes 4 hours ago|||
On my personal Claude sub (Pro), I can burn through my limit in a couple of hours when using Opus. It's borderline unusable unless you're willing to pay for extended usage or artificially slow yourself down.
tabbott 4 hours ago||
To me, it seems like the Pro tier is priced for using Sonnet a lot or Opus a little, and Max for using Opus a lot.

So that seems about what you should expect.

aray07 9 hours ago|||
yeah, similar for me - it uses a bunch more tokens and I haven't been able to see the ROI in terms of better instruction following

it seems to hallucinate a bit more (anecdotal)

titaniumtown 9 hours ago||
I had it hallucinate a tool that didn't exist, it was very frustrating!
dminik 8 hours ago||
Anthropic introduces fake tool calls to prevent distillation of their models. Others still distill. Anthropic distills third-party models. Claude now hallucinates tools.

Brilliant.

CharlesW 8 hours ago|||
> I'm already at 27% of my weekly limit in ONE DAY.

Ouch, that's very different from my experience. What effort level? Are you careful to avoid pushing session context use beyond 350k or so (assuming 1M context)?

atonse 6 hours ago|||
Yeah, fair point. I have had a couple of long conversations (ingesting a pretty complex domain and creating about 42 high-fidelity Tailwind mockups with ui.sh).

And in this particular set of tasks, context routinely hits 350-450k before I compact.

That's likely what it is? I think this particular work stream is eating a lot of tokens.

Earlier this week (before Opus 4.7 hit), I just turned off 1M context and had it grow a lot slower.

I also have it on high all the time. Medium was starting to feel like it was making occasional bad decisions and forgetting things more.
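
For intuition, a back-of-the-envelope sketch of why those 350-450k contexts hurt: each turn re-sends the whole conversation as input tokens, so cumulative usage grows roughly quadratically with context size (illustrative numbers, not Anthropic's actual billing math):

    # Cumulative input tokens when the context grows by `step` per turn
    # and every turn re-sends the entire context.
    def cumulative_input(turns: int, step: int) -> int:
        return sum(step * t for t in range(1, turns + 1))

    # 40 turns growing to a 400k context vs. compacting back at 100k:
    print(cumulative_input(40, 10_000))                         # 8,200,000
    print(sum(min(10_000 * t, 100_000) for t in range(1, 41)))  # 3,550,000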

JimmaDaRustla 7 hours ago|||
I'm mind-blown that people are complaining about token consumption without saying what thinking level they're using - if cost is a concern and you're paying any attention, you'd start with medium and see if you can get better results with fewer tokens. Everyone complaining about token usage seems to have no methodology - probably using max and completely oblivious.
AndyNemmity 7 hours ago||
It's unsurprising when this is the first day that tokens have been crazy like this.

All of us doing crazy agentic stuff were fine on max before this. Now with Opus 4.7, we're no longer fine, and we're troubleshooting and working through options.

JimmaDaRustla 7 hours ago||
> were fine on max before this

Ya... you may be who I'm talking about, though (if you're speaking from experience). If your methodology is "I used 4.6 max, so I'm going to try 4.7 max," this is fully on you - 4.7 max is not equivalent to 4.6 max; you want 4.7 xhigh.

From their docs:

max: Max effort can deliver performance gains in some use cases, but may show diminishing returns from increased token usage. This setting can also sometimes be prone to overthinking. We recommend testing max effort for intelligence-demanding tasks.

xhigh (new): Extra high effort is the best setting for most coding and agentic use cases.
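
If you want actual numbers rather than vibes, the sweep is cheap to run. A sketch (the "effort" request field below is an assumption on my part; check the current SDK docs for the real knob, and the model ID is a placeholder):

    # Run the same prompt at each effort level and compare billed tokens.
    import anthropic

    client = anthropic.Anthropic()
    PROMPT = "Refactor this function to remove the duplicated branch: ..."

    for level in ["medium", "high", "xhigh", "max"]:
        resp = client.messages.create(
            model="claude-opus-4-7",       # placeholder model ID
            max_tokens=4096,
            messages=[{"role": "user", "content": PROMPT}],
            extra_body={"effort": level},  # assumed field name, verify in docs
        )
        print(level, resp.usage.output_tokens)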

AndyNemmity 6 hours ago||
Sorry, in that case I misunderstood max to mean the subscription, max 20.

I am on xhigh.

JimmaDaRustla 6 hours ago||
Ah - xhigh is probably what you want. Their docs suggest xhigh for agentic coding, though judging by their blog, high should be better than 4.6 max (ymmv)

I've always used high, so maybe I should be using xhigh

AndyNemmity 6 hours ago||
I'm actually in the process of switching all of my agents to Sonnet, and I'm going to try to drop down to medium.

I used up a third of my weekly limit in less than a day. I am working diligently to do whatever I can to lower token usage.

sreekanth850 8 hours ago|||
I'm at 22% after just two tasks: a bug fix and a Scalar integration.
AndyNemmity 7 hours ago||
I'm at 35% :(
sipsi 9 hours ago||
I tried to do my usual test (similar to the pelican one but a bit more complex), but it ran out of the 5-hour limit in 5 minutes. Then after 5 hours I said "go on" and the results were the worst I've ever seen.
jmward01 8 hours ago||
Claude Code seems to be getting worse on several fronts and better on others. I suspect product is shifting from 'make it great' to 'make it make as much money for us as possible', and that includes gathering data.

Recently it started prompting me for feedback even though I am on API access and have disabled this. When I did a deep dive into their feedback mechanism in the past (months ago, so it has probably changed a lot since then), the feedback prompt was pushing message IDs even if you didn't respond. If you are on API usage and have told them not to train on your data, then anything pushing a message ID implies that it is leaking information about your session.

It is hard to keep auditing them when they push so many changes, so my default is now 'they are stealing my info' instead of believing their privacy/data-use policy claims. Basically, my trust in their commitment to not training on me is eroding fast, and I am paying a premium for exactly that not to happen.

yuanzhi1203 8 hours ago||
We noticed this two weeks ago, when we found that some of our requests unexpectedly took more tokens than measured by a count_tokens call. It turned out Anthropic's A/B testing was routing some Opus 4.6 calls to Opus 4.7.

https://matrix.dev/blog-2026-04-16.html (We were talking to Opus 4.7 twelve days ago)
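
A sketch of the kind of drift check that catches this, comparing the pre-flight count_tokens estimate against what the live call actually bills (the model ID is a placeholder and the 5% threshold is arbitrary):

    # Flag requests whose billed input tokens diverge from the estimate;
    # a tokenizer mismatch is one signature of silent rerouting.
    import anthropic

    client = anthropic.Anthropic()
    msgs = [{"role": "user", "content": "..."}]  # your real request body

    est = client.messages.count_tokens(model="claude-opus-4-6", messages=msgs)
    resp = client.messages.create(model="claude-opus-4-6", max_tokens=1024,
                                  messages=msgs)
    drift = resp.usage.input_tokens / est.input_tokens
    if drift > 1.05:
        print(f"estimate {est.input_tokens}, billed {resp.usage.input_tokens} "
              f"({drift:.0%}): possible reroute to a different tokenizer")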

qq66 9 hours ago||
This is the backdoor way of raising prices... just have the tokenizer produce more tokens for the same text. It's like ice cream companies shrinking the box instead of raising the price.
Bridged7756 7 hours ago|
No, you're forgetting the never-ending stream of world-shattering models released every couple of months. Each with 2x the token costs, of course, for a vague performance gain, and each deprecating the previous ones.
captn3m0 3 hours ago|||
https://platform.claude.com/docs/en/about-claude/model-depre...

Retirement date for Opus 4.6 is marked as "Not sooner than February 5, 2027"

therobots927 4 hours ago|||
It’s nice to see comments like this. It makes me feel less crazy. Something very weird is going on behind the scenes at Anthropic.
margorczynski 8 hours ago||
It doesn't look good for Anthropic, especially considering they are burning billions in investor money.

Looks like they've lost the mandate of heaven; if OpenAI plays it right, this might be Anthropic's end. Add to that the open-source models from China.

throwaway041207 7 hours ago||
I work at a company that has gone all-in on Anthropic, and we're just shoveling money at them. I suspect more enterprises than we realize are doing this.

When I read these comments on Hacker News, I see a lot of people miffed about their personal subscription limits. That's a very consumer-focused viewpoint; within Anthropic, they're probably seeing buckets of money being dumped on them by enterprises. They probably don't care as much about individual subscription users, especially power users.

solenoid0937 6 hours ago|||
1. HN is so unrepresentative of real life. You have people on their $20/$200 subscriptions complaining about usage limits. They are a tiny fraction of Anthropic's revenue; API billing and enterprise is where the money is.

2. Anthropic's and OpenAI's financials are totally different. The former has nearly the same run-rate revenue and a fraction of the cash burn. There is a reason Anthropic is hot on the secondary market and OAI isn't.

therobots927 8 hours ago||
OpenAI is dealing with exactly the same energy and financial constraints as Anthropic. That will become apparent soon.
taosx 9 hours ago||
Claude has become so frustrating lately that I avoid it and completely ignore it. I can't identify a single cause, but I believe it's mostly the self-righteousness of the leadership driving all the decisions that makes me distrust and disengage from it.
QuercusMax 8 hours ago||
What do you mean by this? What are you frustrated by?

You're offended by their political beliefs, so you don't like the way the model works?

estearum 9 hours ago||
using dumber models to own the libs
testbjjl 9 hours ago||
Definitely experimenting with less expensive ones. I have a few versions of my settings.json.

I also wonder whether token utilization has found, or will ever find, its way into employee performance reviews as these models go up in price.
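
For anyone curious, swapping the model is a one-line settings.json change; a minimal sketch (the model IDs are guesses at the naming scheme, and ANTHROPIC_SMALL_FAST_MODEL only controls the background/helper model):

    {
      "model": "claude-sonnet-4-6",
      "env": {
        "ANTHROPIC_SMALL_FAST_MODEL": "claude-haiku-4-5"
      }
    }

Keeping a few variants around and copying the one you want into place makes the comparison painless.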

jmward01 9 hours ago||
Yeah. I just did a day with 4.7 and I won't be going back for a while. It is just too expensive. On top of the tokenization, the thinking seems to be eating a lot more too.
aray07 9 hours ago||
yeah, I am still not clear why there are 5 effort modes now, on top of the more expensive tokenization
jmward01 6 hours ago|||
Choice is often a great dark pattern (lack of choice is too, but...). Choices generally grow the cost of discovering the optimal option, roughly the way NP problems do. This means that if the entity offering the choice has more ability to compute the value prop than the entity making the choice, you can easily create an exploitative system. Just create a bunch of choices, some of which actually save money with enough thought but most of which don't, and you gain:

People who think they got what they wanted (the feature is there!), so they can't complain, but...

People who end up essentially picking at random, so the average value of the choices made by customers is suboptimal.

jddj 9 hours ago|||
Once you've seen a few results of an LLM given too much sway over product decisions, 5 effort modes expressed as various English adjectives is pretty much par for the course.
JimmaDaRustla 7 hours ago||
What was your effort-level methodology, and what were your results? You can't just post "too expensive" without explaining how you went about it.
technotony 8 hours ago|
Not only that, but they seem to have cut my plan's ability to use Sonnet too. I have a routine that used to use about 40% of my 5-hour Max plan tokens; since yesterday it gets stopped because it uses the whole 100%. Anyone else experienced this?
mfro 8 hours ago|
yeah, it seems like Sonnet 4.6 burns through tokens crazy fast. I did one prompt, Sonnet misunderstood it as 'generate an image of this', and it used all of my free tokens.