Posted by aray07 10 hours ago
Given that Opus 4.6 and even Sonnet 4.6 are still valid options, for me the question is not "Does 4.7 cost more than claimed?" but "What capabilities does 4.7 give me that 4.6 did not?"
Yesterday 4.6 was a great option and it is too soon for me to tell if 4.7 is a meaningful lift. If it is, then I can evaluate if the increased cost is justified.
I'll look at the new models, but increasing token consumption by a factor of 7 on Copilot, and then running into all of these budget management topics people talk about? That seems to introduce even more flow-breakers into my workflow, and I don't think it'll be 7 times better. Maybe in some planning and architectural topics where I used Opus 4.6 before.
https://marginlab.ai/trackers/claude-code-historical-perform...
But... Are you really going to rely completely on benchmarks that have time and time again been shown to be gamed as the complete story?
My take: It is pretty clear that the capacity crunch is real and the changes they made to effort are in part to reduce that. It likely changed the experience for users.
Moreover, on the companion codex graphs (https://marginlab.ai/trackers/codex-historical-performance/), you can see a few different GPT model releases marked yet none correspond to a visual break in the series. Either GPT 5.4-xhigh is no more powerful than GPT 5.2, or the benchmarking apparatus is not sensitive enough to detect such changes.
How is it fine?
I'm already at 27% of my weekly limit in ONE DAY.
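For a rough sense of what that burn rate means, here is a back-of-the-envelope sketch in Python. It assumes usage stays linear at the same daily rate, which is a simplification; the function name and the 7-day window are illustrative assumptions, not anything from Anthropic's tooling.

```python
def days_until_limit(percent_used: float, days_elapsed: float) -> float:
    """Project how many days a budget lasts at a constant burn rate."""
    if percent_used <= 0 or days_elapsed <= 0:
        raise ValueError("percent_used and days_elapsed must be positive")
    daily_rate = percent_used / days_elapsed  # percent of the limit consumed per day
    return 100.0 / daily_rate  # total days until 100% is consumed


# 27% used after one day of an assumed 7-day window (figures from the comment above)
projected = days_until_limit(27.0, 1.0)
print(f"Limit reached after about {projected:.1f} days of a 7-day week")
```

At 27% per day, the limit would run out a bit under four days into the week if nothing changes.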
So that seems about what you should expect.
it seems to hallucinate a bit more (anecdotal)
Brilliant.
Ouch, that's very different from my experience. What effort level? Are you careful to avoid pushing session context use beyond 350k or so (assuming 1m context)?
And this particular set of things has context routinely hit 350-450k before I compact.
That's likely what it is? I think this particular work stream is eating a lot of tokens.
Earlier this week (before Opus 4.7 hit), I just turned off 1m context and had it grow a lot slower.
I also have it on high all the time. Medium was starting to feel like it was making the occasional bad decisions and also forgetting things more.
All of us doing crazy agentic stuff were fine on max before this. Now with Opus 4.7, we're no longer fine, and we're troubleshooting and working through options.
Ya...you may be who I'm talking about though (if you're speaking from experience). If your methodology is "I used 4.6 max, so I'm going to try 4.7 max" this is fully on you - 4.7 max is not equivalent to 4.6 max, you want 4.7 xhigh.
From their docs:
max: Max effort can deliver performance gains in some use cases, but may show diminishing returns from increased token usage. This setting can also sometimes be prone to overthinking. We recommend testing max effort for intelligence-demanding tasks.
xhigh (new): Extra high effort is the best setting for most coding and agentic use cases.
I am on xhigh.
I've always used high, so maybe I should be using xhigh
I used up 1/3rd of my context in less than a day. I am working diligently to do whatever I can to lower token usage.
Recently it started prompting me for feedback even though I am on API access and have disabled this. When I did a deep dive of their feedback mechanism in the past (months ago, so it has probably changed a lot since then), the feedback prompt was pushing message IDs even if you didn't respond. If you are on API usage and have told them no to training on your data, then anything pushing a message ID implies that it is leaking information about your session. It is hard to keep auditing them when they push so many changes, so I am now 'default they are stealing my info' instead of believing their privacy/data use policy claims. Basically, my level of trust in their commitment to not training on me is eroding fast, and I am paying a premium to not have that happen.
https://matrix.dev/blog-2026-04-16.html (We were talking to Opus 4.7 twelve days ago)
Retirement date for Opus 4.6 is marked as "Not sooner than February 5, 2027"
Looks like they lost the mandate of heaven; if OpenAI plays it right, it might be their end. Add to that the open source models from China.
When I read these comments on Hacker News, I see a lot of people miffed about their personal subscription limits. I think this is a viewpoint that is very consumer focused, and probably within Anthropic they're seeing buckets of money being dumped on them from enterprises. They probably don't really care as much about the individual subscription user, especially power users.
2. Anthropic and OpenAI's financials are totally different. The former has nearly the same RRR and a fraction of the cash burn. There is a reason Anthropic is hot on secondary and OAI isn't.
You're offended by their political beliefs, so you don't like the way the model works?
I also wonder if token utilization has or will ever find its way to employee performance reviews as these models go up in price.
People that think they got what they wanted, the feature is there!, so they can't complain but...
People that end up essentially randomly picking so the average value of the choices made by customers is suboptimal.