Measuring Claude 4.7's tokenizer costs

Posted by aray07 15 hours ago

Measuring Claude 4.7's tokenizer costs(www.claudecodecamp.com)

573 points | 400 commentspage 5

motbus3 11 hours ago|

I've been using 4.6 models since each of them launched. Same for 4.5.

4.6 performers worse or the same in most of the tasks I have. If there is a parameter that made me use 4.6 more frequently is because 4.5 get dumber and not because 4.6 seemed smarter.

redml 13 hours ago||

It does cost more but I found the quality of output much higher. I prefer it over the dumbing of effort/models they were doing for the last two months. They have to get users used to picking the appropriate model for their task (or have an automatic mode - but still let me force it to a model).

DiscourseFan 13 hours ago||

Yeah I noticed today, I had it work up a spreadsheet for me and I only got 3 or 4 turns in the conversation before it used up all my (pro) credits. It wasn't even super-complicated or anything, only moderately so.

lacoolj 14 hours ago||

This is probably an adjacent result of this (from anthropic launch post):

> In Claude Code, we’ve raised the default effort level to xhigh for all plans.

Try changing your effort level and see what results you get

aray07 14 hours ago|

effort level is separate from tokenization. Tokenization impacts you the same regardless.

I find 5 thinking levels to be super confusing - I dont really get why they went from 3 -> 5

kburman 13 hours ago||

Anthropic must be loving it. It's free money.

aliljet 14 hours ago||

This is the reality I'm seeing too. Does this mean that the subscriptions (5x, 10x, 20x) are essentially reduced in token-count by 20-30%?

aray07 14 hours ago||

yeah thats the part that is unclear to me as well - if our usage capacity is now going to run out faster.

AndyNemmity 12 hours ago||

The same thing I've been doing all the time, now has used up 1/3rd of my week in one day on max20.

So yes, for the same tasks, usage runs out faster (currently)

cbg0 12 hours ago||

Boris said on Twitter that they've increased rate limits for everyone.

outlore 11 hours ago||

I can manage session cost effectively myself if forking and rewinds were first class features

markrogersjr 14 hours ago||

4.7 one-shot rate is at least 20-30% higher for me

ChicagoBoy11 14 hours ago||

How are you able to track this as you use it? A bit stumped atm

markrogersjr 13 hours ago||

Purely empirical

omega3 13 hours ago||

Contrary to people here who feel the price increases, reduction of subscription limits etc are the result of the Anthropic models being more expensive to run than the API & subscription revenue they generate I have a theory that Anthropic has been in the enshittification & rent seeking phase for a while in which they will attempt to extract as much money out of existing users as possible.

Commercial inference providers serve Chinese models of comparable quality at 0.1x-0.25x. I think Anthropic realised that the game is up and they will not be able to hold the lead in quality forever so it's best to switch to value extraction whilst that lead is still somewhat there.

CharlesW 13 hours ago|

> Commercial inference providers serve Chinese models of comparable quality…

"Comparable" is doing some heavy lifting there. Comparable to Anthropic models in 1H'25, maybe.

omega3 13 hours ago||

Benchmarks suggests they are comparable: https://artificialanalysis.ai/?models=claude-opus-4-6-adapti...

But let's say for the sake of discussion Opus is much better - still doesn't justify the price disparity especially when considering that other models are provided by commercial inference providers and anthropics is inhouse.

cbg0 12 hours ago|||

Try doing real work with them, it's night and day difference especially for systems programming. The non-frontier models to a lot of benchmaxxing to look good.

xienze 13 hours ago|||

> Benchmarks suggests they are comparable

The problem here is people think AI benchmarks are analogous to say, CPU performance benchmarks. They're not:

* You can't control all the variables, only one (the prompt).

* The outputs, BY DESIGN, can fluctuate wildly for no apparent reason (i.e., first run, utter failure, second run, success).

* The biggest point, once a benchmark is known, future iterations of the model will be trained on it.

Trying to objectively measure model performance is a fool's errand.

dallen33 15 hours ago|

I'm still using Sonnet 4.6 with no issues.

risyachka 15 hours ago|

How does this solve the issue? 4.6 will be disabled after one or more release like any other legacy model.

gadflyinyoureye 14 hours ago||

Won't the thing that replaces 4.6 come down in token cost?

More comments...