Posted by HellsMaddy 10 hours ago

Claude Opus 4.6(www.anthropic.com)
1570 points | 670 comments
dmk 9 hours ago|
The benchmarks are cool and all but 1M context on an Opus-class model is the real headline here imo. Has anyone actually pushed it to the limit yet? Long context has historically been one of those "works great in the demo" situations.
pants2 9 hours ago||
Paying $10 per request doesn't have me jumping at the opportunity to try it!
cedws 8 hours ago|||
Makes me wonder: do employees at Anthropic get unmetered access to Claude models?
swader999 6 hours ago|||
It's like when you work at McDonald's and get one free meal a day. Lol, of course they get access to the full model way before we do...
wiredpancake 5 hours ago||
[dead]
ajam1507 6 hours ago|||
Seems quite obvious that they do, within reason.
schappim 8 hours ago|||
The only way to not go bankrupt is to use a Claude Code Max subscription…
nomel 8 hours ago|||
Has a "N million context window" spec ever been meaningful? Very old, very terrible, models "supported" 1M context window, but would lose track after two small paragraphs of context into a conversation (looking at you early Gemini).
libraryofbabel 7 hours ago||
Umm, Sonnet 4.5 has a 1m context window option if you are using it through the api, and it works pretty well. I tend not to reach for it much these days because I prefer Opus 4.5 so much that I don't mind the added pain of clearing context, but it's perfectly usable. I'm very excited I'll get this from Opus now too.
nomel 4 hours ago||
If you're getting along with 4.5, that suggests you didn't actually need the large context window for your use. If that's true, what's the clear tell that it's working well? Am I misunderstanding?

Did they solve the "lost in the middle" problem? Proof will be in the pudding, I suppose. But that number alone isn't all that meaningful for many (most?) practical uses. Claude 4.5 often starts reverting bug fixes ~50k tokens back, which isn't a context window length problem.

Things fall apart much sooner than the context window length for all of my use cases (which are more reasoning related). What is a good use case? Do those use cases require strong verification to combat the "lost in the middle" problems?

awestroke 9 hours ago||
Opus 4.5 starts being lazy and stupid at around the 50% context mark in my opinion, which makes me skeptical that this 1M context mode can produce good output. But I'll probably try it out and see
itay-maman 7 hours ago||
Important: I didn't see Opus 4.6 in Claude Code. I have the native install (which is the recommended installation). So I re-ran the installation command and, voila, I have it now (v2.1.32).

Installation instructions: https://code.claude.com/docs/en/overview#get-started-in-30-s...

insane_dreamer 6 hours ago|
It’s there. I’m already using it
hmaxwell 6 hours ago||
I just tested both codex 5.3 and opus 4.6 and both returned pretty good output, but opus 4.6's limits are way too strict. I am probably going to cancel my Claude subscription for that reason:

What do you want to do?

  1. Stop and wait for limit to reset
  2. Switch to extra usage
  3. Upgrade your plan

 Enter to confirm · Esc to cancel
How come they don't have "Cancel your subscription and uninstall Claude Code"? Codex lasts for way longer without shaking me down for more money off the base $xx/month subscription.
ArchieScrivener 3 hours ago||
How else are they going to supplement their own development expenses? The more Claude Anthropic needs, the less Claude the customer will get. By their own admission, that is how the Anthropic model works. Their end value is in using vibe coders and engineers alike to create a persistent synthetic developer that replaces their own employees and most of their customers.

Scalable Intelligence is just a wrapper for centralized power. All AI companies are headed that way.

seunosewa 5 hours ago||
They introduced the low limit warning for Opus on claude.ai
minimaxir 10 hours ago||
Will Opus 4.6 via Claude Code be able to access the 1M context limit? The cost increase by going above 200k tokens is 2x input, 1.5x output, which is likely worth it especially for people with the $100/$200 plans.
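The multiplier math can be sketched in a few lines. The per-token prices below are illustrative placeholders, not Anthropic's actual rates, and it's an assumption here that the premium applies to the whole request once input crosses 200k rather than only to the excess:

```python
# Back-of-envelope long-context pricing. BASE_* prices are placeholders,
# not Anthropic's published rates; the 2x input / 1.5x output multipliers
# are the ones quoted above. Assumes the premium applies to the whole
# request once input exceeds the 200k threshold.

BASE_INPUT = 5.00 / 1_000_000    # $/input token (placeholder)
BASE_OUTPUT = 25.00 / 1_000_000  # $/output token (placeholder)

def request_cost(input_tokens: int, output_tokens: int,
                 threshold: int = 200_000) -> float:
    if input_tokens > threshold:
        in_rate, out_rate = BASE_INPUT * 2, BASE_OUTPUT * 1.5
    else:
        in_rate, out_rate = BASE_INPUT, BASE_OUTPUT
    return input_tokens * in_rate + output_tokens * out_rate
```

At these placeholder rates a 150k-in/4k-out request comes to $0.85 while a 500k-in/4k-out request comes to $5.15, which is where the "$10 per request" ballpark elsewhere in the thread comes from.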
CryptoBanker 9 hours ago|
The 1M context is not available via subscription - only via API usage
romanovcode 9 hours ago||
Well this is extremely disappointing to say the least.
ayhanfuat 9 hours ago|||
It says "subscription users do not have access to Opus 4.6 1M context at launch" so they are probably planning to roll it out to subscription users too.
kimixa 8 hours ago||
Man I hope so - the context limit is hit really quickly in many of my use cases - and a compaction event inevitably means another round of corrections and fixes to the current task.

Though I'm wary about that being a magic bullet fix - already it can be pretty "selective" in what it actually seems to take into account documentation wise as the existing 200k context fills.

humanfromearth9 7 hours ago|||
Hello,

I check context use percentage, and above ~70% I ask it to generate a prompt for continuation in a new chat session to avoid compaction.

It works fine, and saves me from using precious tokens for context compaction.

Maybe you should try it.
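The habit described above can be sketched as a threshold check. The chars/4 token estimate is a crude English-text heuristic, not Claude's real tokenizer, and the 200k window and 70% cutoff are the figures from the comment:

```python
# Sketch of the "hand off before compaction" habit: once estimated
# context use crosses ~70%, ask for a continuation prompt instead of
# letting compaction happen. Token counting is a rough chars/4 heuristic.

CONTEXT_WINDOW = 200_000      # standard Opus window
HANDOFF_THRESHOLD = 0.70      # the ~70% mark mentioned above

def estimate_tokens(text: str) -> int:
    # ~4 characters per token is a common approximation for English text
    return len(text) // 4

def should_hand_off(transcript: str) -> bool:
    return estimate_tokens(transcript) / CONTEXT_WINDOW >= HANDOFF_THRESHOLD

# A prompt along these lines (wording is illustrative):
HANDOFF_PROMPT = (
    "Summarize the current task state, decisions made, and remaining "
    "steps as a prompt I can paste into a fresh session."
)
```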

pluralmonad 6 hours ago||
How is generating a continuation prompt materially different from compaction? Do you manually scrutinize the context handoff prompt? I've done that before, but if you don't, I don't see how it's very different from compaction.
robertfw 3 hours ago||
I wonder if it's just: compact earlier, so there's less to compact, and more remaining context that can be used to create a more effective continuation
nickstinemates 8 hours ago||||
Is this a case of doing it wrong, or do you think accuracy is good enough given the amount of context you need to stuff it with?
kimixa 8 hours ago|||
I mean the systems I work on have enough weird custom APIs and internal interfaces just getting them working seems to take a good chunk of the context. I've spent a long time trying to minimize every input document where I can, compact and terse references, and still keep hitting similar issues.

At this point I just think the "success" of many AI coding agents is extremely sector dependent.

Going forward I'd love to experiment with seeing if that's actually the problem, or just an easy explanation of failure. I'd like to play with more controls on context management than "slightly better models" - like being able to select/minimize/compact sections of context I feel would be relevant for the immediate task, to what "depth" of needed details, and those that aren't likely to be relevant so can be removed from consideration. Perhaps each chunk can be cached to save processing power. Who knows.
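The per-chunk controls wished for above might look something like this hypothetical design, where each context chunk carries a relevance flag and a hash-based cache key so irrelevant and duplicate chunks are dropped before assembly (this is a sketch of the idea, not any real tool's API):

```python
import hashlib

# Hypothetical per-chunk context management: tag chunks with relevance,
# dedupe identical chunks by content hash (a stand-in for caching), and
# assemble within a token budget. Token cost uses a crude chars/4 estimate.

class ContextChunk:
    def __init__(self, text: str, relevant: bool = True):
        self.text = text
        self.relevant = relevant
        self.key = hashlib.sha256(text.encode()).hexdigest()  # cache key

def assemble_context(chunks: list, budget_tokens: int) -> list:
    seen, out, used = set(), [], 0
    for c in chunks:
        if not c.relevant or c.key in seen:
            continue  # removed from consideration, or already included
        cost = len(c.text) // 4
        if used + cost > budget_tokens:
            break  # budget exhausted
        seen.add(c.key)
        out.append(c.text)
        used += cost
    return out
```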

romanovcode 7 hours ago|||
In my case the Figma MCP takes ~300k tokens per medium-sized section of the page, and it would be cool if the model could read that in one go and implement Figma designs directly. Currently I have to split it up, which is annoying.
IhateAI_2 6 hours ago|||
lmao what are you building that actually justifies needing 1mm tokens on a task? People are spending all this money to do magic tricks on themselves.
kimixa 6 hours ago||
The opus context window is 200k tokens not 1mm.

But I kinda see your point - assuming from your name you're not just a single-purpose troll - I'm still not sold on the cost effectiveness of the current generation, and can't see a clear and obvious change to that for the next generation - especially as they're still loss leaders. Only if you play silly games like "ignoring the training costs" - i.e. the majority of the costs - do you get even close to the current subscription costs being sufficient.

My personal experience is that AI generally doesn't actually do what it is being sold for right now, at least in the contexts I'm involved with. Especially by somewhat breathless comments on the internet - like why are they even trying to persuade me in the first place? If they don't want to sell me anything, just shut up and keep the advantage for yourselves rather than replying with the 500th "You're Holding It Wrong" comment with no actionable suggestions. But I still want to know, and am willing to put the time, effort and $$$ in to ensure I'm not deluding myself in ignoring real benefits.

IhateAI_2 6 hours ago|||
They want the value of your labor and competency to be 1:1 correlated to the quality and quantity of tokens you can afford (or be loaned)??

It's a weapon whose target is the working class. How does no one realize this yet?

Don't give them money, code it yourself, you might be surprised how much quality work you can get done!

kmod 2 hours ago||
I think it's interesting that they dropped the date from the API model name, and it's just called "claude-opus-4-6", vs the previous was "claude-opus-4-5-20251101". This isn't an alias like "claude-opus-4-5" was, it's the actual model name. I think this means they're comfortable with bumping the version number if they want to release a revision.
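The two naming schemes are easy to tell apart mechanically. The pattern below is inferred from the two examples in the comment, not from Anthropic's documentation:

```python
import re

# Distinguish dated snapshot names like "claude-opus-4-5-20251101" from
# undated names like "claude-opus-4-6". Pattern inferred from the two
# examples above: an optional trailing 8-digit YYYYMMDD date.

SNAPSHOT_DATE = re.compile(r"-(\d{8})$")

def parse_model_name(name: str):
    m = SNAPSHOT_DATE.search(name)
    if m:
        return name[: m.start()], m.group(1)  # (family, snapshot date)
    return name, None  # undated: the name itself is the model identifier

print(parse_model_name("claude-opus-4-5-20251101"))  # ('claude-opus-4-5', '20251101')
print(parse_model_name("claude-opus-4-6"))           # ('claude-opus-4-6', None)
```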
charcircuit 9 hours ago||
From the press release at least it sounds more expensive than Opus 4.5 (more tokens per request and fees for going over 200k context).

It also seems misleading to have charts that compare to Sonnet 4.5 and not Opus 4.5 (Edit: It's because Opus 4.5 doesn't have a 1M context window).

It's also interesting they list compaction as a capability of the model. I wonder if this means they have RL trained this compaction as opposed to just being a general summarization and then restarting the agent loop.

thunfischtoast 6 hours ago||
On OpenRouter it has the same cost per token as 4.5.
charcircuit 2 hours ago||
You missed my point. If the average request uses more tokens than 4.5, then you will pay more sending those requests to 4.6 than 4.5.

Imagine 2 models where, when asked a yes or no question, the first model outputs a single yes or no but the second outputs a 10-page essay and then either yes or no. They could have the same price per token, but ultimately one will be cheaper to ask questions to.
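Putting illustrative numbers on that thought experiment (token counts and the per-token price are made up for the example):

```python
# Same per-token price, very different per-answer cost.
# All figures below are illustrative, not real rates.

PRICE_PER_TOKEN = 25.00 / 1_000_000  # placeholder output price

terse_tokens = 1          # just "yes"
verbose_tokens = 5_000    # a 10-page essay, then "yes"

terse_cost = terse_tokens * PRICE_PER_TOKEN
verbose_cost = verbose_tokens * PRICE_PER_TOKEN
# Identical per-token pricing, but the verbose model costs 5000x per answer.
```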

eaf7e281 9 hours ago||
> From the press release at least it sounds more expensive than Opus 4.5 (more tokens per request and fees for going over 200k context).

That's a feature. You could also not use the extra context, and the price would be the same.

charcircuit 9 hours ago||
The model influences how many tokens it uses for a problem. As an extreme example, if it wanted to, it could fill up the entire context each time just to make you pay more. How efficiently the model can answer without generating a ton of tokens influences the price you will be spending on inference.
mlmonkey 6 hours ago||
> We build Claude with Claude.

How long before the "we" is actually a team of agents?

mercat 3 hours ago|
Starting today maybe? https://code.claude.com/docs/en/agent-teams
22c 7 minutes ago||
I tried teams, good way to burn all your tokens in a matter of minutes.

It seems that the Claude Code team has not properly taught Claude how to use teams effectively.

One of the biggest problems I saw with it is that Claude assumes team members are like a real worker, where once they finish a task they should immediately be given the next task. What should really happen is once they finish a task they should be terminated and a new agent should be spawned for the next task.

mFixman 10 hours ago||
I found that "Agentic Search" is generally useless in most LLMs since sites with useful data tend to block AI models.

The answer to "when is it cheaper to buy two singles rather than one return between Cambridge and London?" is available on sites such as BRFares, but no LLM can scrape it, so it just makes up a generic useless answer.

causalmodels 9 hours ago|
Is it still getting blocked when you give it a browser?
DanielHall 7 hours ago||
A bit surprised that the first release wasn't Sonnet 5 after all, since the Google Cloud API had leaked Sonnet 5's model snapshot codename earlier.
denysvitali 7 hours ago|
Looks like a marketing strategy to bill more for Opus than Sonnet
silverwind 9 hours ago|
Maybe that's why Opus 4.5 has degraded so much in the recent days (https://marginlab.ai/trackers/claude-code/).
jwilliams 7 hours ago|
I’ve definitely experienced a subjective regression with Opus 4.5 the last few days. Feels like I was back to the frustrations from a year ago. Keen to see if 4.6 has reversed this.