Posted by adocomplete 6 hours ago
For agent workloads specifically, consistency matters more than peak intelligence. A model that follows your system prompt correctly 98% of the time beats one that's occasionally brilliant but ignores instructions 5% of the time. The claim about improved instruction following is the most important line in the announcement if you're building on the API.
The computer use improvements are worth watching too. We're at the point where these models can reliably fill out a multi-step form or navigate between tabs. Not flashy, but that's the kind of boring automation that actually saves people time.
Sonnet 4.6 Thinking 16K scores 57.6 on the Extended NYT Connections Benchmark. Sonnet 4.5 Thinking 16K scored 49.3.
Sonnet 4.6 No Reasoning scores 55.2. Sonnet 4.5 No Reasoning scored 47.4.
"ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6[1m]"
has enabled the 1M context window.Fixed a UI issue I had yesterday in a web app very effectively using claude in chrome. Definitely not the fastest model - but the breathing space of 1M context is great for browser use.
[0] Anthropic have given away a bunch of API credits to cc subscribers - you can claim them in your settings dashboard to use for this.
https://claude.ai/share/876e160a-7483-4788-8112-0bb4490192af
This was sonnet 4.6 with extended thinking.
The classic puzzle actually uses *eight 8s*, not nine. The unique solution is: 888+88+8+8+8=1000. Count: 3+2+1+1+1=8 eights.
It then proves that there is no solution for nine 8s.
https://claude.ai/share/9a6ee7cb-bcd6-4a09-9dc6-efcf0df6096b (for whatever reason the LaTeX rendering is messed up in the shared chat, but it looks fine for me).
I wonder if it's a temperature thing or if things are being throttled up/down on time of day. I was signed in to a paid claude account when I ran the test.
``` Use digit concatenation plus addition: 888 + 88 + 8 + 8 + 8 = 1000 Digit count:
888 → three 8s
88 → two 8s
8 + 8 + 8 → three 8s
Total: 3 + 2 + 3 = 9 eights Operation used: addition only ```
Love the 3 + 2 + 3 = 9
https://chatgpt.com/share/6994d25e-c174-800b-987e-9d32c94d95...
Opus 4.6 in Claude Code has been absolutely lousy with solving problems within its current context limit so if Sonnet 4.6 is able to do long-context problems (which would be roughly the same price of base Opus 4.6), then that may actually be a game changer.
Can you share your prompts and problems?
A year ago today, Sonnet 3.5 (new), was the newest model. A week later, Sonnet 3.7 would be released.
Even 3.7 feels like ancient history! But in the gradient of 3.5 to 3.5 (new) to 3.7 to 4 to 4.1 to 4.5, I can’t think of one moment where I saw everything change. Even with all the noise in the headlines, it’s still been a silent revolution.
Am I just a believer in an emperor with no clothes? Or, somehow, against all probability and plausibility, are we all still early?
But I'm on Codex GPT 5.3 this month, and it's also quite amazing.
```
/model claude-sonnet-4-6[1m]
⎿ API error: 429 {"type":"error","error": {"type":"rate_limit_error","message":"Extra usage is required for long context requests."},"request_id":"[redacted]"}
```
i cant believe that havent updated their code yet to be able to handle the 1M context on subscription auth