Anonymous request-token comparisons from Opus 4.6 and Opus 4.7

Posted by anabranch 6 days ago

Anonymous request-token comparisons from Opus 4.6 and Opus 4.7 (tokens.billchambers.me)

615 points | 575 comments | page 6

gverrilla 6 days ago|

Yeah, I'm seriously considering dropping my Max subscription unless they do something in the next few days, something like releasing a cheap and powerful Sonnet 4.7.

jbrooks84 5 days ago||

I don't get all this talk about the new model. I see enhanced capabilities and more token usage. You need to use external validation and specs.

Frannky 6 days ago||

My subscription was up for renewal today. I gave it a shot with OpenCode Go + Xiaomi model. So far, so good: I can get stuff done the same way, it seems.

blahblaher 6 days ago||

Conspiracy time: they released a new version just so they could increase the price without people complaining so much (along the lines of "see, this is a new model version, so we NEED to increase the price"), similar to how SaaS companies tack some shit onto the product so that they can increase prices.

willis936 6 days ago|

The result is the same: they lose their brand of producing quality output. However, the cleverer the maneuver they try to pull off, the clearer it is to their customers that they are not earning trust. That's what will matter at the end of this. Poor leadership at Claude.

operatingthetan 6 days ago||

They are trying to pull a rabbit out of a hat. It's not surprising that this is their SOP, given that AI in concept is an attempt to do the very same thing.

nickvec 6 days ago||

For all intents and purposes, aren't the "token change" and "cost change" metrics effectively the same thing?
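
One way to check whether the two metrics move together is to work through the arithmetic. The sketch below is only an illustration: the per-million-token prices and the token counts are made-up placeholders, not real pricing or measured data. It shows that the two percentages coincide only when prices are unchanged and input and output tokens grow in the same proportion; if output tokens (usually priced higher) grow faster, the cost change exceeds the token change.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def request_cost(input_tokens, output_tokens, price_in, price_out):
    """Cost in dollars, with prices given per million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical prices (dollars per million tokens), assumed identical for both models.
PRICE_IN, PRICE_OUT = 5.0, 25.0

# Hypothetical request: same input size, 50% more output tokens on the new model.
old_in, old_out = 10_000, 1_000
new_in, new_out = 10_000, 1_500

token_change = pct_change(old_in + old_out, new_in + new_out)
cost_change = pct_change(
    request_cost(old_in, old_out, PRICE_IN, PRICE_OUT),
    request_cost(new_in, new_out, PRICE_IN, PRICE_OUT),
)

print(f"token change: {token_change:.1f}%")  # about 4.5%
print(f"cost change:  {cost_change:.1f}%")   # about 16.7%
```

With these placeholder numbers the total token count rises by roughly 4.5% while the cost rises by roughly 16.7%, because all of the growth is in the more expensive output tokens.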

micromacrofoot 6 days ago||

The latest Qwen actually performs a little better for some tasks, in my experience.

The latest Claude still fails the car wash test.

reddit_clone 6 days ago|

Not just _wrong_. It is confused! It is actually right in the second sentence. This was Friday, Opus 4.6.

>I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

Walk. It's 50 meters — you're going there to clean the car anyway, so drive it over if it needs washing, but if you're just dropping it off or it's a self-service place, walking is fine for that distance.

zozbot234 6 days ago||

This is actually a good diagnostic of whether the model is skimping on the thinking loop. Try raising thinking effort and it should get it right. Of course, if you're running this in a coding harness with a whole lot of extraneous context, the model will be awfully confused as to what it should be thinking about.

eezing 6 days ago||

Not sure if this equates to more spend. Smarter models make fewer mistakes and thus need fewer round trips.
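
The trade-off can be sketched with a rough calculation. Everything below is a hypothetical illustration (the token counts, the 30% inflation figure, and the round-trip counts are assumptions, not measurements): total tokens for a task are tokens per round trip times the number of round trips, so a more verbose model comes out ahead only if it saves enough round trips.

```python
def task_tokens(tokens_per_round_trip, round_trips):
    """Total tokens spent on a task across all round trips."""
    return tokens_per_round_trip * round_trips


def break_even_round_trips(old_round_trips, token_inflation):
    """Round trips at which the more verbose model spends the same total tokens.

    token_inflation is the ratio of new to old tokens per round trip, e.g. 1.3.
    """
    return old_round_trips / token_inflation


# Hypothetical task: the old model needs 4 round trips of 20,000 tokens each.
old_total = task_tokens(20_000, 4)

# Hypothetical new model: 30% more tokens per round trip, but one fewer round trip.
new_total = task_tokens(26_000, 3)

break_even = break_even_round_trips(4, 1.3)

print(old_total, new_total)   # 80000 78000
print(f"{break_even:.2f}")    # 3.08
```

Under these assumptions the new model is slightly cheaper on this task (78,000 versus 80,000 tokens), but only because it dropped below the break-even point of about 3.08 round trips; if it still needed 4 round trips it would cost 30% more.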

bparsons 6 days ago||

Had a pretty heavy workload yesterday, and never hit the limit on Claude Code. Perhaps they allowed for more tokens for the launch?

Claude Design, on the other hand, seemed to eat through its own separate usage limit very fast. I hit the limit this morning in about 45 minutes on a Max plan. I assume they are going to end up spinning that product off as a separate service.

axeldunkel 6 days ago||

The better the tokenizer maps text to its internal representation, the better the model understands what you are saying, or coding! But 4.7 is much more verbose in my experience, and this probably drives cost/limits a lot.

liangyunwuxu 6 days ago|

If possible, I will continue to use version 4.6 until it is discontinued.
