Posted by HellsMaddy 16 hours ago

Claude Opus 4.6(www.anthropic.com)
1907 points | 805 comments
ra 11 hours ago|
Why are Anthropic such a horrible company to deal with?
danielbln 6 hours ago|
Care to elaborate?
ra 3 hours ago||
Obscure billing, unreachable customer support gatekept by an overzealous chatbot, no transparency about inclusions or changes to inclusions over time... just from recent experience.
winterrx 16 hours ago||
Agentic search benchmarks are a big gap up. Let's see Codex release later today.
osti 16 hours ago||
Somehow regresses on SWE bench?
lkbm 16 hours ago||
I don't know how these benchmarks work (do you do a hundred runs? A thousand runs?), but 0.1% seems like noise.
SubiculumCode 16 hours ago|||
That benchmark is pretty saturated, tbh. A "regression" of such small magnitude could mean many different things or nothing at all.
usaar333 16 hours ago||
I'd interpret that as rounding error, i.e. effectively unchanged.

SWE-bench seems really hard once you're above 80%.
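A rough way to sanity-check the "noise" claim above: SWE-bench Verified has 500 problems, so a single run can only move in increments of one problem, and the sampling error of a pass rate dwarfs a 0.1% delta. A minimal sketch (the 80% pass rate is an assumed score in the range the thread discusses, not a reported number):

```python
import math

# Hypothetical setup: an eval with n problems and pass rate p.
# n=500 matches SWE-bench Verified's problem count; p=0.80 is assumed.
n = 500
p = 0.80

# Smallest possible score change from one run: one problem flips.
step = 1 / n

# Standard error of a binomial proportion: sqrt(p(1-p)/n).
se = math.sqrt(p * (1 - p) / n)

print(f"one problem is worth {step:.1%} of the score")  # 0.2%
print(f"standard error of the score: {se:.1%}")          # 1.8%
```

So a 0.1% difference is smaller than a single problem's worth of score, and well inside one standard error.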

Squarex 16 hours ago||
It's not a great benchmark anymore... starting with it being primarily Python/Django... the industry should move to something more representative.
usaar333 16 hours ago||
OpenAI has; they don't even mention a score for gpt-5.3-codex.

On the other hand, it is their own verified benchmark, which is telling.

m-hodges 16 hours ago||
> In Claude Code, you can now assemble agent teams to work on tasks together.
nprz 16 hours ago|
I was just reading about Steve Yegge's Gas Town[0], it sounds like agent orchestration is now integrated into Claude Code?

[0]https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...

simianwords 16 hours ago||
Important: API cost of Opus 4.6 and 4.5 are the same - no change in pricing.
rob 16 hours ago||
System Card: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a5...
niobe 12 hours ago||
Is there a good technical breakdown somewhere of all these benchmarks that get used to market the latest and greatest LLMs? Preferably impartial.
Aztar 12 hours ago|
I just ask Claude, and ask for sources for each one.
niobe 6 hours ago||
Reminds me of how if you make a complaint against a lawyer or a judge it's evaluated by lawyers and judges.
kingstnap 16 hours ago||
I was hoping for a Sonnet as well but Opus 4.6 is great too!
paxys 16 hours ago||
Hmm, all leaks had said this would be Claude 5. Wonder if it was a last-minute demotion due to performance. Would explain the few days' delay as well.
trash_cat 16 hours ago||
I think the naming schemes are quite arbitrary at this point. Going to 5 would come with massive expectations that reality wouldn't meet.
mrandish 15 hours ago|||
After the negative reactions to GPT 5, we may see model versioning that asymptotically approaches the next whole number without ever reaching it. "New for 2030: Claude 4.9.2!"
esafak 7 hours ago||
Or approaching a magic number like e (Metafont) or π (TeX).
Squarex 15 hours ago|||
The standard used to be that a major version means a new base model / full retrain... but now it's arbitrary, I guess.
cornedor 16 hours ago|||
Leaks were mentioning Sonnet 5, and I guess later (a combination of) Opus 4.6.
scrollop 15 hours ago||
Sonnet 5 was mentioned initially.
sanufar 16 hours ago|
Works pretty nicely for research still, not seeing a substantial qualitative improvement over Opus 4.5.