Claude Sonnet 4.6 - Hacker News

Posted by adocomplete 6 hours ago

https://www.anthropic.com/claude-sonnet-4-6-system-card [pdf]

https://x.com/claudeai/status/2023817132581208353 [video]

716 points | 594 comments | page 2

Arifcodes 4 hours ago|

The interesting pattern with these Sonnet bumps: the practical gap between Sonnet and Opus keeps shrinking. At $3/$15 per million input/output tokens vs whatever Opus 4.6 costs, the question for most teams is no longer "which model is smarter" but "is the delta worth 10x the price."

For agent workloads specifically, consistency matters more than peak intelligence. A model that follows your system prompt correctly 98% of the time beats one that's occasionally brilliant but ignores instructions 5% of the time. The claim about improved instruction following is the most important line in the announcement if you're building on the API.

The computer use improvements are worth watching too. We're at the point where these models can reliably fill out a multi-step form or navigate between tabs. Not flashy, but that's the kind of boring automation that actually saves people time.

skybrian 1 hour ago|

Looking at the pricing page, Sonnet 4.6 seems to be about 60% the price of Opus 4.6. What am I missing?

https://platform.claude.com/docs/en/about-claude/pricing

Arifcodes 1 hour ago||

Fair point on the sticker price. The ratio shifts when you factor in cache read costs on long contexts. Sonnet 4.6 cache reads are $0.30/MTok vs Opus 4.6 at $1.50/MTok - a 5x difference that matters a lot on repeated agentic runs or RAG pipelines where the same large context gets reused. For single-shot short prompts you are right, the gap is not that dramatic. For anything with a warm cache it closes fast.
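To make the cache-read arithmetic concrete, here is a minimal sketch. The two prices are simply the figures quoted in this comment and are not verified here (check the pricing page linked above); the workload (a 150K-token context re-read on 40 turns) is a made-up illustration.

```python
# Cache-read prices in $/MTok, as quoted in the comment above (unverified).
SONNET_CACHE_READ = 0.30
OPUS_CACHE_READ = 1.50


def cache_read_cost(tokens: int, price_per_mtok: float) -> float:
    """Dollar cost of reading `tokens` cached tokens at the given $/MTok rate."""
    return tokens / 1_000_000 * price_per_mtok


# Hypothetical workload: the same 150K-token context read from cache on 40 turns.
context_tokens = 150_000
turns = 40

sonnet = cache_read_cost(context_tokens * turns, SONNET_CACHE_READ)
opus = cache_read_cost(context_tokens * turns, OPUS_CACHE_READ)
print(f"Sonnet: ${sonnet:.2f}, Opus: ${opus:.2f}, ratio: {opus / sonnet:.1f}x")
```

The ratio depends only on the two prices, so swapping in different rates changes the conclusion directly.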

zone411 3 hours ago||

They're improved compared to 4.5 on my Extended NYT Connections benchmark (https://github.com/lechmazur/nyt-connections/).

Sonnet 4.6 Thinking 16K scores 57.6 on the Extended NYT Connections Benchmark. Sonnet 4.5 Thinking 16K scored 49.3.

Sonnet 4.6 No Reasoning scores 55.2. Sonnet 4.5 No Reasoning scored 47.4.

nikcub 5 hours ago||

Enabling /extra-usage in my (personal) claude code[0] with this env:

    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6[1m]"

has enabled the 1M context window.

Fixed a UI issue I had yesterday in a web app very effectively using claude in chrome. Definitely not the fastest model - but the breathing space of 1M context is great for browser use.

[0] Anthropic have given away a bunch of API credits to cc subscribers - you can claim them in your settings dashboard to use for this.

gverrilla 7 minutes ago|

/extra-usage inside claude code also works

stevepike 6 hours ago||

I'm a bit surprised it gets this question wrong (ChatGPT gets it right, even on instant). All the pre-reasoning models failed this question, but it's seemed solved since o1, and Sonnet 4.5 got it right.

https://claude.ai/share/876e160a-7483-4788-8112-0bb4490192af

This was sonnet 4.6 with extended thinking.

bobbylarrybobby 4 hours ago||

Interesting, my sonnet 4.6 starts with the following:

The classic puzzle actually uses *eight 8s*, not nine. The unique solution is: 888+88+8+8+8=1000. Count: 3+2+1+1+1=8 eights.

It then proves that there is no solution for nine 8s.

https://claude.ai/share/9a6ee7cb-bcd6-4a09-9dc6-efcf0df6096b (for whatever reason the LaTeX rendering is messed up in the shared chat, but it looks fine for me).
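For anyone who wants to check the counting themselves, a brute-force sketch, assuming the puzzle allows only addition and digit concatenation (as in the answers quoted in this thread). It enumerates every way to sum 8, 88, and 888 to 1000 and records how many 8s each uses.

```python
from itertools import product

TARGET = 1000
TERMS = (8, 88, 888)  # 8888 already exceeds the target


def eights_sums(target: int = TARGET) -> list[tuple[int, tuple[int, int, int]]]:
    """Return (digit_count, (n_8, n_88, n_888)) for every addition-only solution."""
    solutions = []
    max_counts = [target // t for t in TERMS]
    for counts in product(*(range(m + 1) for m in max_counts)):
        if sum(c * t for c, t in zip(counts, TERMS)) == target:
            digits = sum(c * len(str(t)) for c, t in zip(counts, TERMS))
            solutions.append((digits, counts))
    return sorted(solutions)


solutions = eights_sums()
print(solutions[0])                       # fewest 8s and the term counts used
print(any(d == 9 for d, _ in solutions))  # does any solution use exactly nine 8s?
```

Under that addition-only assumption, the fewest-digit solution is 888 + 88 + 8 + 8 + 8 with eight 8s, and no combination uses exactly nine.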

stevepike 27 minutes ago||

Yeah, earlier in the GPT days I felt like this was a good example of LLMs being "a blurry jpeg of the web", since you could give them something very close to a puzzle that is common on the web, and they'd regurgitate an answer from the training set. It was neat to me to see the question get solved consistently by the reasoning models (though often by churning through a bunch of tokens trying to count 888 + 88 + 8 + 8 + 8 as nine digits).

I wonder if it's a temperature thing or if things are being throttled up/down based on time of day. I was signed in to a paid Claude account when I ran the test.

malfist 5 hours ago|||

ChatGPT doesn't get it right: https://chatgpt.com/share/6994c312-d7dc-800f-976a-5e4fbec0ae...

```
Use digit concatenation plus addition:
888 + 88 + 8 + 8 + 8 = 1000

Digit count:
888 → three 8s
88 → two 8s
8 + 8 + 8 → three 8s

Total: 3 + 2 + 3 = 9 eights
Operation used: addition only
```

Love the 3 + 2 + 3 = 9

simianwords 4 hours ago||

ChatGPT gets it right. Maybe you are using the free or non-thinking version?

https://chatgpt.com/share/6994d25e-c174-800b-987e-9d32c94d95...

leumon 4 hours ago|||

My locally running nemotron-3-nano quantized to Q4_K_M gets this right. (although it used 20k thought tokens before answering the question)

layer8 5 hours ago||

Off-by-one errors are one of the hardest problems in computer science.

anonymous908213 5 hours ago||

That is not an off-by-one error in a computer science sense, nor is it "one of the hardest problems in computer science".

layer8 5 hours ago||

This was in reference to a well-known joke, see here: https://martinfowler.com/bliki/TwoHardThings.html

nubg 6 hours ago||

Waiting for the OpenAI GPT-5.3-mini release in 3..2..1

minimaxir 5 hours ago||

As with Opus 4.6, using the beta 1M context window incurs a 2x input cost and 1.5x output cost when going over 200K tokens: https://platform.claude.com/docs/en/about-claude/pricing

Opus 4.6 in Claude Code has been absolutely lousy with solving problems within its current context limit so if Sonnet 4.6 is able to do long-context problems (which would be roughly the same price as base Opus 4.6), then that may actually be a game changer.
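A rough sketch of what that surcharge does to a single request. The 200K threshold and the 2x/1.5x multipliers come from this comment, and the $3/$15 base prices from the top comment in the thread. Whether the multipliers apply to the whole request or only to tokens past 200K is an assumption here (the code assumes the whole request), so check the pricing page before relying on it.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens, per the comment above
INPUT_MULTIPLIER = 2.0            # "2x input cost"
OUTPUT_MULTIPLIER = 1.5           # "1.5x output cost"


def request_cost(input_tokens, output_tokens, input_price=3.0, output_price=15.0):
    """Estimated dollar cost of one request; prices are in $/MTok.

    Assumption: once the input exceeds the threshold, the multipliers apply
    to every token in the request, not only the tokens past 200K.
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        input_price *= INPUT_MULTIPLIER
        output_price *= OUTPUT_MULTIPLIER
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


print(request_cost(150_000, 4_000))  # under the threshold
print(request_cost(400_000, 4_000))  # over the threshold
```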

sumedh 4 hours ago||

> Opus 4.6 in Claude Code has been absolutely lousy with solving problems

Can you share your prompts and problems?

minimaxir 4 hours ago||

You cut out the "within its current context limit" phrase. It solves the problems, just often with 1% or 0% context limit left and it makes me sweat.

egeozcan 4 hours ago||

Why? You can use the fast version to directly skip to compact! /s

gallerdude 6 hours ago||

The weirdest thing about this AI revolution is how smooth and continuous it is. If you look closely at differences between 4.6 and 4.5, it’s hard to see the subtle details.

A year ago today, Sonnet 3.5 (new) was the newest model. A week later, Sonnet 3.7 would be released.

Even 3.7 feels like ancient history! But in the gradient of 3.5 to 3.5 (new) to 3.7 to 4 to 4.1 to 4.5, I can’t think of one moment where I saw everything change. Even with all the noise in the headlines, it’s still been a silent revolution.

Am I just a believer in an emperor with no clothes? Or, somehow, against all probability and plausibility, are we all still early?

dtech 5 hours ago||

If you've been using them, each new step is very noticeable, and so is the mindshare. Around Sonnet 3.7, Claude Code-style coding became usable and very quickly gained a lot of market share. Opus 4 could tackle significantly more complexity. Opus 4.6 has been another noticeable step up for me: suddenly I can let CC run significantly more independently, allowing multiple parallel agents where previously too much babysitting was required for that.

CuriouslyC 5 hours ago|||

In terms of real work, it was the 4 series models. That raised the floor of Sonnet high enough to be "reliable" for common tasks and Opus 4 was capable of handling some hard problems. It still had a big reward hacking/deception problem that Codex models don't display so much, but with Opus 4.5+ it's fairly reliable.

cmrdporcupine 5 hours ago|||

Honestly, 4.5 Opus was the game changer. From Sonnet 4.5 to that was a massive difference.

But I'm on Codex GPT 5.3 this month, and it's also quite amazing.

jasonsb 6 hours ago||

[dead]

simlevesque 6 hours ago||

I can't wait for Haiku 4.6! The 4.5 is a beast for the right projects.

jerrygenser 5 hours ago||

It's also good as an @explore sub-agent that greps the directory for files.

retinaros 5 hours ago||

Which type of projects?

ptrwis 3 hours ago|||

I also use Haiku daily and it's OK. One app is a trading simulation algorithm in TypeScript (it implemented Bayesian optimisation for me and optimised the algorithm to use worker threads). Another one is a CRUD app (NextJS, now switched to Vue).

nerdralph 1 hour ago||

Are you saying Haiku is better than Sonnet for some coding use? I've used Sonnet 4.5 for Python and basic web development (pure JS, CSS & HTML) and had assumed Haiku wouldn't be very good for coding.

ptrwis 48 minutes ago||

I'm saying Haiku isn't that bad, it's good enough for my needs, and it's the cheapest one. Maybe it's because I'm giving it small, well defined tasks.

simlevesque 5 hours ago|||

For Go code I had almost no issues. PHP too. Apparently for React it's not very good.

hansmayer 2 hours ago||

It's funny how they and OpenAI keep releasing these "minor" versions, as if to imply their product was very stable and reliable at a major version and they are now just working through the backlog of smaller bugs and quirks, whereas the tool is still fundamentally prone to the same class of errors it was three "major" versions ago. I guess that's what you get for not having a programmer at the helm (to borrow from Spolsky). Guys, you are not releasing a 4.6 or a 5.3 anything; it's more likely you are still beta testing towards the 1.0.

edverma2 5 hours ago|

It seems that extra-usage is required to use the 1M context window for Sonnet 4.6. This differs from Sonnet 4.5, which allows usage of the 1M context window with a Max plan.

```

/model claude-sonnet-4-6[1m]

⎿ API error: 429 {"type":"error","error": {"type":"rate_limit_error","message":"Extra usage is required for long context requests."},"request_id":"[redacted]"}

```
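If you are scripting around this, a minimal sketch of telling this rejection apart from an ordinary rate limit. The body is copied from the error above; matching on the message text is an assumption (the wording is not a documented contract and may change), and `needs_extra_usage` is a hypothetical helper name.

```python
import json

# Response body copied from the 429 error above (request_id is redacted there).
body = (
    '{"type":"error","error": {"type":"rate_limit_error",'
    '"message":"Extra usage is required for long context requests."},'
    '"request_id":"[redacted]"}'
)


def needs_extra_usage(status: int, raw_body: str) -> bool:
    """True if a 429 looks like the long-context 'extra usage' rejection.

    Matching on the message text is an assumption; the wording may change.
    """
    if status != 429:
        return False
    try:
        parsed = json.loads(raw_body)
    except json.JSONDecodeError:
        return False
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        return False
    message = str(error.get("message", "")).lower()
    return error.get("type") == "rate_limit_error" and "extra usage is required" in message


print(needs_extra_usage(429, body))
```

A caller could use this to fall back to the standard context window instead of retrying with backoff, which would not help here.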

minimaxir 5 hours ago||

Anthropic's recent gift of $50 extra usage has demonstrated that it's extremely easy to burn extra usage very quickly. It wouldn't surprise me if this change is more of a business decision than a technical one.

WXLCKNO 4 hours ago||

I capped my extra usage to that free $50 and hit 108% usage. Nice.

8note 2 hours ago||

Do you think that just needs extra usage enabled, or actually using extra usage?

I can't believe they haven't updated their code yet to be able to handle the 1M context on subscription auth.
