Top
Best
New

Posted by craigmart 5 hours ago

Claude Opus 4.8(www.anthropic.com)
976 points | 777 commentspage 3
hmokiguess 1 hour ago|
They must have been A/B testing this with 4.7 lately, I noticed it changed from its normal mode in a way that matches a lot the just released 4.8
setnone 5 hours ago||
Claude's 4.6 - 4.7 transition made me discover codex, and with gpt 5.5 there is no way i'm going back
cactusplant7374 5 hours ago||
Codex has been incredibly slow for the past few days. I think OpenAI is running out of compute in the face of increasing demand.
winwang 4 hours ago||
My experience has been that 5.4 is slower than 5.5 (confound: I use >512k max context size for 5.4, though it seems slower even below the normal size)
dakolli 3 hours ago||
[flagged]
peder 2 hours ago|||
ha, exactly... like, the % change could be minuscule (or worse, it might only be a perceived difference, the actual quality may have regressed, or the scenario just didn't lend itself to that specific model) but people will be on here proclaiming that they're now shipping 10x the number of PRs.
setnone 2 hours ago|||
if you go this route don't hold your thoughts on the casino itself
dangoodmanUT 5 hours ago||
> The Messages API now accepts system entries inside the messages array. Developers can update Claude’s instructions mid-task without breaking the prompt cache or routing the update through a user turn. This can be used in a given harness to update permissions, token budgets, or environment context as an agent runs.

Biggest deal imo

protoman3000 2 hours ago||
Opus 4.8 says to take the car. 4.7 said to walk.

“I want to wash my car. The carwash is 50m away. Should I take the car or go by foot?”

https://claude.ai/share/5f7f738a-5f29-48ff-9807-9a2dd37fb405

https://claude.ai/share/ecd14393-9d42-4527-ae0c-89f3d05216c8

user- 2 hours ago||
Bash(echo "hello"; pwd) ⎿ hello /Users/username/Work/Github/project

Bash(echo test123) ⎿ test123

  Read 1 file, listed 1 directory (ctrl+o to expand)

 Bash(echo "checking output works")
  ⎿  checking output works

  Read 1 file (ctrl+o to expand)
  ⎿  API Error: 400 messages.3.content.56: `thinking`
     or `redacted_thinking` blocks in the latest
     assistant message cannot be modified. These
     blocks must remain as they were in the original
     response.

Very inspiring improvements. DIssapointing result for a code review i expected to see after my 30 min walk
0x696C6961 2 hours ago|
Update the symlink to point at the previous version:

    ln -s $HOME/.local/share/claude/versions/2.1.153 $HOME/.local/bin/claude
ethanpil 4 hours ago||
The table comparing eval scores shows the following:

Agentic Terminal Coding (Terminal-Bench 2.1) Opus 4.8 74.6% GPT 5.5 78.2%

Then, when you scroll all the way down to the bottom Footnotes section it says

"Terminal-Bench 2.1: We reported scores for all models using the Terminus-2 public harness. GPT-5.5’s reported score with the Codex CLI harness is 83.4%."

fastball 2 hours ago|
Seems reasonable? Presumably Claude also performs better under the Claude Code harness.
redfloatplane 3 hours ago||
This made me laugh. Training Opus 4.7 on business skills caused it to sometimes exhibit dishonest behaviour, and not training 4.8 on those skills removed it. From the system card:

> 6.2.5 External testing from Andon Labs Andon Labs reviewed the behavior of Claude Opus 4.8 in their simulated Vending-Bench 2 retail-management evaluation, as reported in the Capabilities section of this system card (see Section 8.13.5). Although they did observe some unexpected capability failures, they did not find clear instances of the kind of concerning in-game behaviors that were discussed in other recent system cards.

> What might have led to these differences? We monitor and investigate the effects of different training environments on alignment; Claude Opus 4.7, for example, had training that focused on business skills and robustness against adversarial agents, but we discovered that this training inadvertently contributed to misaligned behavior including dishonesty. We therefore removed it for Opus 4.8.

> Thus, Opus 4.8 did not show the same misaligned behaviors as Opus 4.7 in Vending-Bench, but also had reduced business success due to being more susceptible to scammers and being less able to negotiate good deals with other agents. We are currently working on training to improve business capabilities while maintaining aligned and ethical behavior.

mrdependable 2 hours ago|
I don't know how people can read stuff like this and think LLMs are intelligent or conscious.
redfloatplane 17 minutes ago|||
I don't really see how you got to your comment from what I quoted. However, somewhat relatedly, I proposed a thought experiment about this in the comments for Opus 4.7[0]:

> It's April, 1991. Magically, some interface to Claude materialises in London. Do you think most people would think it was a sentient life form? How much do you think the interface matters - what if it looks like an android, or like a horse, or like a large bug, or a keyboard on wheels?

> I don't come down particularly hard on either side of the model sapience discussion, but I don't think dismissing either direction out of hand is the right call.

[0]: https://news.ycombinator.com/item?id=47680059

stratos123 2 hours ago|||
Consciousness aside, why does reading about an LLM generalizing from specific to general dishonesty make you think it's not intelligent?
mesmertech 5 hours ago||
/model claude-opus-4-8

seems to work but idk why they never set it so you can see it in the /model list.

"what model are you

I'm Claude Opus (claude-opus-4-8), running in Claude Code."

winwang 4 hours ago|
I typically just launch CC with `--model claude-opus-4-6[1m]`, `4-6[1m]` -> `4-8[1m]` works fine. Still 200k max without the `[1m]`.
IFC_LLC 4 hours ago||
Ugh...

Invalid request The request couldn't be completed. View details API Error: 400 messages.1.content.7: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response.

I would rather not. 4.6 was fine. 4.7 got to be fine 1 week after the release. Now 4.8. No difference, same thing.

But the app is broken and nothing works. So now I have to regress to different clients and wait it out while it becomes workable again.

pheller 12 minutes ago||
I'm getting this near constantly even after toggling to a different model and compacting. Ugh indeed.
ferris-booler 4 hours ago||
I'm hitting this too! And I assumed it was a backwards-compatibility issue with my live conversation with Opus 4.7, but then I hit it in a fresh conversation with Opus 4.8. Vibe code release bug I guess?
IFC_LLC 4 hours ago||
I mean, switching back to 4.7 does not work either. So console it is. But vibe release - for sure.

And I'm paying money for this.

KAdot 3 hours ago||
Going back to 4.7 with `claude --model claude-opus-4-7` fixed it for me.
jtrn 3 hours ago|
Initial testing feels better than 4.8 And the knowledge cutoff claim of January 2026 seems to check out since it was able to "remember" without search about the double-tap killing of a drug smuggler by the US Army in late December.
More comments...