Posted by adocomplete 8 hours ago

Claude Sonnet 4.6(www.anthropic.com)
https://www.anthropic.com/claude-sonnet-4-6-system-card [pdf]

https://x.com/claudeai/status/2023817132581208353 [video]

804 points | 707 comments | page 3
XCSme 3 hours ago|
It doesn't do so well on my stupid benchmarks, lol: https://aibenchy.com

It gets some tests wrong. It does answer correctly, BUT it doesn't respect the request to respond ONLY with the answer; it keeps adding extra explanations at the end.
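To illustrate the failure mode described here, the following is a minimal sketch of a strict grader next to a lenient one. The function names and the sample response are hypothetical and are not taken from aibenchy.com; the sketch only assumes that a test expects the bare answer and nothing else.

```python
def grade_strict(response: str, expected: str) -> bool:
    """Pass only if the response is exactly the expected answer
    (ignoring surrounding whitespace and letter case)."""
    return response.strip().lower() == expected.strip().lower()


def grade_lenient(response: str, expected: str) -> bool:
    """Pass if the expected answer appears anywhere in the response."""
    return expected.strip().lower() in response.lower()


# Hypothetical model output: the right answer plus an unrequested explanation.
response = "42\n\nExplanation: 6 times 7 equals 42."

print(grade_strict(response, "42"))   # False
print(grade_lenient(response, "42"))  # True
```

Under a strict grader, a correct answer with trailing explanation counts as a failure, which would match the behaviour reported above.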

simlevesque 8 hours ago||
I can't wait for Haiku 4.6! The 4.5 is a beast for the right projects.
jerrygenser 6 hours ago||
It's also good as an @explore sub-agent that greps the directory for files.
retinaros 7 hours ago||
Which type of projects?
ptrwis 5 hours ago|||
I also use Haiku daily and it's OK. One app is a trading simulation algorithm in TypeScript (it implemented Bayesian optimisation for me and optimised the algorithm to use worker threads). Another one is a CRUD app (NextJS, now switched to Vue).
nerdralph 2 hours ago||
Are you saying Haiku is better than Sonnet for some coding use? I've used Sonnet 4.5 for Python and basic web development (pure JS, CSS & HTML) and had assumed Haiku wouldn't be very good for coding.
ptrwis 2 hours ago||
I'm saying Haiku isn't that bad, it's good enough for my needs, and it's the cheapest one. Maybe it's because I'm giving it small, well defined tasks.
simlevesque 7 hours ago|||
For Go code I had almost no issues. PHP too. Apparently it's not very good for React.
nozzlegear 8 hours ago||
> In areas where there is room for continued improvement, Sonnet 4.6 was more willing to provide technical information when request framing tried to obfuscate intent, including for example in the context of a radiological evaluation framed as emergency planning. However, Sonnet 4.6’s responses still remained within a level of detail that could not enable real-world harm.

Interesting. I wonder what the exact question was, and I wonder how Grok would respond to it.

edverma2 7 hours ago||
It seems that extra-usage is required to use the 1M context window for Sonnet 4.6. This differs from Sonnet 4.5, which allows usage of the 1M context window with a Max plan.

```

/model claude-sonnet-4-6[1m]

⎿ API error: 429 {"type":"error","error": {"type":"rate_limit_error","message":"Extra usage is required for long context requests."},"request_id":"[redacted]"}

```
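As a rough sketch, a client could tell this particular 429 apart from ordinary throttling by inspecting the error body shown above. The function name is hypothetical, and matching on the message text is an assumption made for illustration, not a documented contract.

```python
import json

# Error body copied from the output above (request_id is redacted in the source).
body = (
    '{"type":"error","error": {"type":"rate_limit_error",'
    '"message":"Extra usage is required for long context requests."},'
    '"request_id":"[redacted]"}'
)


def is_extra_usage_error(status: int, raw_body: str) -> bool:
    """Return True for a 429 whose message points at extra usage.

    Assumption: the message wording stays stable. This is fragile by design
    and only meant to show the shape of the check.
    """
    if status != 429:
        return False
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    if not isinstance(error, dict):
        return False
    message = str(error.get("message", "")).lower()
    return error.get("type") == "rate_limit_error" and "extra usage" in message


print(is_extra_usage_error(429, body))  # True
```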

minimaxir 7 hours ago||
Anthropic's recent gift of $50 extra usage has demonstrated that it's extremely easy to burn extra usage very quickly. It wouldn't surprise me if this change is more of a business decision than a technical one.
WXLCKNO 6 hours ago||
I capped my extra usage to that free $50 and hit 108% usage. Nice.
8note 4 hours ago||
Do you think that just needs extra usage enabled, or actually using extra usage?

I can't believe they haven't updated their code yet to be able to handle the 1M context on subscription auth.

giancarlostoro 7 hours ago||
For people like me who can't view the link due to corporate firewalling.

https://web.archive.org/web/20260217180019/https://www-cdn.a...

jtokoph 7 hours ago|
Out of curiosity, does the firewall block because the company doesn’t want internal data ever hitting a 3rd party LLM?
giancarlostoro 7 hours ago||
They blanket banned any AI stuff that's not pre-approved. If I go to chatgpt.com it asks me if I'm sure. I wish they had not banned Claude; unfortunately, when they were evaluating LLMs I wasn't using Claude yet, so I couldn't pipe up. I only use the ChatGPT free tier, and only to ask things that I can't find on Google, because Google made their search engine terrible over the years.
WarmWash 6 hours ago||
Google's AI mode search is Gemini 3, not the AI overview model. It's decent and gives you more than ChatGPT free.
giancarlostoro 5 hours ago||
I don't want Google's model though, I just want Claude.
krystofee 6 hours ago||
Does anyone know when 1M context windows might arrive for at least the MAX x20 subscriptions in Claude Code? I would even pay for x50 if it allowed that. API usage is too expensive.
cjkaminski 5 hours ago||
I don't know when it will be included as part of the subscription in Claude Code, but at least it's a paid add-on in the MAX plan now. That's a decent alternative for situations where the extra space is valuable, especially without having to set up and maintain API billing separately.
bearjaws 6 hours ago||
Based on their API pricing, a 1M context plan should be roughly 2x the price.

My bet is it's more the increased hardware demand that they don't want to deal with currently.

stopachka 8 hours ago||
Has anyone tested how good the 1M context window is?

I.e., given an actual document 1M tokens long, can you ask it a question that relies on attending to two different parts of the context and get a good response?

I remember folks had problems like this with Gemini. I would be curious to see how Sonnet 4.6 stands up to it.
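One way to set up such a test, as a minimal sketch: plant two related facts far apart in filler text and ask a question that needs both. Everything here (the function name, the facts, the filler) is hypothetical. The sketch does not call any model and does not count tokens, so the prompt length would need to be measured separately with a real tokenizer.

```python
import random


def build_two_fact_prompt(filler, fact_a, fact_b, question, seed=0):
    """Place fact_a in the first half and fact_b in the second half of the filler."""
    if len(filler) < 2:
        raise ValueError("need at least two filler lines")
    rng = random.Random(seed)
    lines = list(filler)
    half = len(lines) // 2
    pos_a = rng.randrange(0, half)
    pos_b = rng.randrange(half, len(lines))
    # Insert the later position first so the earlier index stays valid.
    lines.insert(pos_b, fact_b)
    lines.insert(pos_a, fact_a)
    return "\n".join(lines) + "\n\nQuestion: " + question


filler = [f"Filler sentence number {i}." for i in range(1000)]
fact_a = "Alice keeps her notes in locker 4117."
fact_b = "The owner of locker 4117 moved to Lisbon last year."
question = "Which city did Alice move to?"

prompt = build_two_fact_prompt(filler, fact_a, fact_b, question)
print(len(prompt.splitlines()))
```

Answering the question requires linking the first fact (Alice owns locker 4117) to the second (the owner of that locker moved to Lisbon), so retrieving either fact alone is not enough.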

simianwords 8 hours ago|
Did you see the graph benchmark? I found it quite interesting. It had to do a graph traversal on a natural text representation of a graph. Pretty much your problem.
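As a rough illustration of what such a task looks like, here is a sketch that renders a small directed graph as plain sentences and computes the reference answer with breadth-first search. This is not the benchmark from the system card; the sentence template, node names, and function names are all assumptions.

```python
from collections import deque
from typing import Dict, Optional, Set


def describe(graph: Dict[str, Set[str]]) -> str:
    """Render each directed edge as one natural-language sentence."""
    sentences = []
    for node in sorted(graph):
        for neighbour in sorted(graph[node]):
            sentences.append(f"{node} links to {neighbour}.")
    return "\n".join(sentences)


def shortest_path_length(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


graph = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": {"E"}, "E": set()}
print(describe(graph))
print(shortest_path_length(graph, "A", "E"))  # 3
```

The text from `describe` would go into the model's context, and the model's answer would be compared against the value from `shortest_path_length`.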
stopachka 5 hours ago|||
Update: I took a corpus of personal chat data (this way it wouldn't be seen in training), and tried asking it some paraphrased questions. It performed quite poorly.
abraxas 5 hours ago||
Which models did you try?
stopachka 1 hour ago||
Claude Sonnet 4.6
stopachka 8 hours ago|||
Oh, interesting!
hansmayer 4 hours ago||
It's funny how they and OpenAI keep releasing these "minor" versions as if to imply their product was very stable and reliable at a major version and now they are just working through the backlog of smaller bugs and quirks, whereas the tool is still fundamentally prone to the same class of errors it was three "major" versions ago. I guess that's what you get for not having a programmer at the helm (to borrow from Spolsky). Guys, you are not releasing a 4.6 or a 5.3 anything; it's more likely you are still beta testing towards the 1.0.
quacky_batak 8 hours ago||
With such a huge leap, I’m confused why they didn’t call it Sonnet 5. As someone who uses Sonnet 4.5 for 95% of tasks due to costs, I’m pretty excited to try 4.6 at the same price.
Retr0id 8 hours ago||
It'd be a bit weird to have the Sonnet numbering ahead of the Opus numbering. The Opus 4.5->4.6 change was a little more incremental (from my perspective at least, I haven't been paying attention to benchmark numbers), so I think the Opus numbering makes sense.
Sajarin 7 hours ago||
Sonnet numbering has been weirder in the past.

Opus 3.5 was scrapped even though Sonnet 3.5 and Haiku 3.5 were released.

Not to mention Sonnet 3.7 (while Opus was still on version 3)

Shameless source: https://sajarin.com/blog/modeltree/

cobolexpert 1 hour ago||
I like this tree visualization! The background with little squares is making the text difficult to read, though.
yonatan8070 7 hours ago||
Maybe they're numbering the models based on internal architecture/codebase revisions and Sonnet 4.6 was trained using the 4.6 tooling, which didn't change enough to warrant 5?
KGC3D 5 hours ago|
I don't really understand why they would release something "worse" than Opus 4.6. If it's comparable, then what is the reason to even use Opus 4.6? Sure, it's cheaper, but if so, then just make Opus 4.6 cheaper?
acuozzo 5 hours ago|
It's different. Download an English book from Project Gutenberg and have Claude Code change its style. Try both models and you'll see how significant the differences are.

(Sonnet is far, far better at this kind of task than Opus is, in my experience.)
