Top
Best
New

Posted by craigmart 6 hours ago

Claude Opus 4.8(www.anthropic.com)
1058 points | 844 commentspage 4
user- 3 hours ago|
Bash(echo "hello"; pwd) ⎿ hello /Users/username/Work/Github/project

Bash(echo test123) ⎿ test123

  Read 1 file, listed 1 directory (ctrl+o to expand)

 Bash(echo "checking output works")
  ⎿  checking output works

  Read 1 file (ctrl+o to expand)
  ⎿  API Error: 400 messages.3.content.56: `thinking`
     or `redacted_thinking` blocks in the latest
     assistant message cannot be modified. These
     blocks must remain as they were in the original
     response.

Very inspiring improvements. DIssapointing result for a code review i expected to see after my 30 min walk
0x696C6961 3 hours ago|
Update the symlink to point at the previous version:

    ln -s $HOME/.local/share/claude/versions/2.1.153 $HOME/.local/bin/claude
protoman3000 3 hours ago||
Opus 4.8 says to take the car. 4.7 said to walk.

“I want to wash my car. The carwash is 50m away. Should I take the car or go by foot?”

https://claude.ai/share/5f7f738a-5f29-48ff-9807-9a2dd37fb405

https://claude.ai/share/ecd14393-9d42-4527-ae0c-89f3d05216c8

james_marks 6 hours ago||
> One of the most prominent improvements in Opus 4.8 is its honesty. We train all our models to be honest—for instance, to avoid making claims that they can’t support. But a general problem with AI models is that they sometimes jump to conclusions, confidently claiming to have made progress in their work despite the evidence being thin. Early testers report that Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims.

Would be awesome if true

majormajor 6 hours ago||
"Honesty" seems like unnecessary (and annoying) anthropomorphism there. I don't think there's any intent of fraud or deception in outputs from these things, just overreaching of prediction. Based on the latter part of the paragraph, I wish they'd just say something like "less likely to skip steps or overemphasize thin evidence" in the first place.

Don't play to the sci-fi "this thing's trying to outsmart me" tropes.

Kiro 6 hours ago|||
Using words people understand is more important than this strange fixation on not anthropomorphizing things.
wasabi991011 6 hours ago|||
I think "honesty" is not a particularly good descriptor, independent of anthropomorphism. Previous commenters suggestion was much more understandable to me.
dugidugout 6 hours ago||||
Being that can be understood is language. The previous commenter is making an particular argument for how we can improve this understanding. They didn't suggest we should use less familiar words, but different familiar words. Why is this strange?
giraffe_lady 6 hours ago||||
Anthropomorphizing is a shorthand for a powerful and poorly defined set of metaphors. There are tradeoffs going both ways but trying to dismiss it as merely "strange fixation" shows your own weakness.
tadfisher 6 hours ago|||
To be clear, this is about anthropomorphizing large language models, not the general category of "things". Also, we should be evaluating these constructs using well-defined and measurable criteria; evaluating "honesty" fails to achieve both goals.
derac 5 hours ago||
I think Honesty can be evaluated. Does the model push back when it knows the user is wrong? How often does the model hallucinate data vs. say it doesn't know? Provide a prompt with contradictions or other issues and see if the model corrects you.

Here is an article by Anthropic that explains what they do and mean in more detail: https://alignment.anthropic.com/2025/honesty-elicitation/

swader999 6 hours ago||||
Just swap 'Honesty' with 'correctness in its claims' and you'll get what you need out of this aspect of the model description.
stratos123 2 hours ago||
Honesty and correctness are not the same thing, even when talking about LLMs. Sometimes an LLM says a false thing and you don't know whether it's being dishonest or merely incorrect. Sometimes, however, you can see in the CoT that the model does know the true fact and is reasoning about how to deceive the user. That's lying, not just being incorrect.
adamtaylor_13 6 hours ago||||
People get so wrapped around the axle with "anthropomorphizing". For regular folks with no technical background, sure maybe a bit of caveat sprinkled here or there is useful to help them understand what is or isn't true, but on HN it would seem to me that the bar is high enough that we can just use shared language to generally talk about capabilities.

When they say "Honesty" I don't think to myself, "Goodness, does this model have moral understanding?" No, I understand they mean it's less likely to directly bullshit me, which models frequently do.

I don't feel like this level of pedantry around language is useful for people who more or less know what's going on with LLMs. (Again, I concede that perhaps with a less technical audience, there's more need for it.)

krupan 2 hours ago|||
I agree. In connection with LLMs we also shouldn't use the words intelligent, smart, reasoning, thinking, chat, conversation, etc.
ealready_value 6 hours ago|||
Opus 4.7 was already trying hard to appear honest. Most conversations I have with it about advice or focusing an opinion often include "my honest take" or "my honest opinion".

The problem is that once I asked it "I'm thinking about A or B" twice, once with "I like A more but suspect B would be best" and a second time with them reversed. Not surprisingly, both times it chose the one I said I suspected was best as it's honest opinion.

MaxikCZ 3 hours ago||
I wish I knew how to make it regressively verify its assumptions, like a kind of hook but firing before a sentence is written, or perhaps after and then corrected. I feel like it assuming things clearly wrong is its biggest weakness.
benzible 6 hours ago|||
In the context of Claude Code, "honest" usually means that the agent took a shortcut, skipped requirements, etc. It's the model giving itself credit for admitting to failing rather than actually doing what was requested.
HAL3000 6 hours ago|||
Yeah, it's super annoying. A few days ago, Opus 4.7 created a plan with several items on it, including an auth feature. It then went through the plan and reported that it had created the auth feature, that everything was secure, and that the tests passed.

The issue was that it hadn't actually implemented the auth feature. After I confronted it about this, it admitted that it indeed hadn't done it and said it would implement it now.

If we had just trusted its output, we would now have a security vulnerability in production, allowing anyone to access other people's accounts.

gwd 5 hours ago|||
> If we had just trusted its output, we would now have a security vulnerability in production, allowing anyone to access other people's accounts.

This is one reason you always get a different model to review a model's PR. Gemini Or GPT-codex would have certainly noticed the missing auth.

FireBeyond 4 hours ago||||
I had a lower acuity incident exactly the same.

Had it implement a feature, "commit and merge to develop".

"Built, tested, committed, merged to develop. Up to you to continue testing and merge to main when ready."

Great. Poke at the web app. No feature.

"Where is feature, I can't see it on develop". "Well, that's because it's not on develop, but on feature-branch, so you wouldn't see it."

"I'm confused. I asked you to commit it and merge to develop."

"You're right, you asked me to and I said I would do it and I told you I did it but I did not actually do it. Want me to do it now, then?"

Claude is in sulky-teenager phase.

Schiendelman 5 hours ago||||
How do you test other features?
legitster 6 hours ago|||
Part of the problem is also garbage-in/garbage-out. There's a lot of human information on the internet that is also confidently wrong.

I use Sonnet a lot for learning about history or contextualizing news topics. It's really good at this for the most part. But there are a lot of topics where "consensus" between either academics or journalists is really "one secondary source which gets repeated a lot".

mitjam 5 hours ago||
A failure mode I see more, recently is that it gives superficially correct answers but after digging deeper, I get answers that contradict the superficial answers - really an important thing to be aware of, in my point of view, and it often leaves me wondering if I dug deep enough.
pants2 6 hours ago|||
[dead]
soperj 6 hours ago|||
My guess is that Claude Opus 4.8 wrote that and is lying to you.
malfist 6 hours ago||
And yet, every release has claimed lower hallucination rates. But they persist.
kentm 6 hours ago|||
Do they persist at the same rates? Lower doesn't mean eliminated, so both of these can be true.
simianwords 6 hours ago|||
False. Hallucination has meaningfully reduced.
Barbing 6 hours ago||
Is Gemini still the biggest confabulator of the big three?
jtrn 4 hours ago||
Initial testing feels better than 4.8 And the knowledge cutoff claim of January 2026 seems to check out since it was able to "remember" without search about the double-tap killing of a drug smuggler by the US Army in late December.
ismailmaj 1 hour ago||
I just asked the model details about the incoming spaceX IPO and it responded with “There’s no confirmed SpaceX IPO. Elon Musk has said for years that SpaceX itself won’t go public”. It took me two push backs and specifically asking for web search.

I feel like I won’t like this model just like I didn’t like 4.7, push backs a lot and avoids thinking or search as much as possible.

londons_explore 5 hours ago||
My guess is anthropic is doing reinforcement learning based on user sessions.

However, doing so relies on the production model staying vaguely close to the model being trained.

To ensure that, frequent releases are needed. I forsee that they might end up doing daily releases and perhaps not even telling anyone at some near future point.

llbbdd 4 hours ago|
If they are they need to fix how the Claude Code CLI asks for feedback, or make the feedback UI a lot more obvious. I keep experiencing the following scenario.

The agent session pauses with a numbered list of options and awaits steering input:

>> 1. Do the sane thing you asked for (Recommended)

>> 2. Do something dumb

>> 3. Do something even dumber

Below the agent session, it decides it's time to ask:

>> "How is Claude doing this session? 1) Bad 2) Good 3) Great"

I type "1", because that's the steering option I want. The UI prioritizes this input as a response to the feedback prompt without any further confirmation: "Claude is doing Bad. Thanks!"

I've done this so many times so far and I can't imagine I'm the only one, at some scale that has to poison any learning they're doing with this data.

MaxikCZ 3 hours ago||
I think that filtering out data like yours was an interns afternoon project.
redfloatplane 4 hours ago||
This made me laugh. Training Opus 4.7 on business skills caused it to sometimes exhibit dishonest behaviour, and not training 4.8 on those skills removed it. From the system card:

> 6.2.5 External testing from Andon Labs Andon Labs reviewed the behavior of Claude Opus 4.8 in their simulated Vending-Bench 2 retail-management evaluation, as reported in the Capabilities section of this system card (see Section 8.13.5). Although they did observe some unexpected capability failures, they did not find clear instances of the kind of concerning in-game behaviors that were discussed in other recent system cards.

> What might have led to these differences? We monitor and investigate the effects of different training environments on alignment; Claude Opus 4.7, for example, had training that focused on business skills and robustness against adversarial agents, but we discovered that this training inadvertently contributed to misaligned behavior including dishonesty. We therefore removed it for Opus 4.8.

> Thus, Opus 4.8 did not show the same misaligned behaviors as Opus 4.7 in Vending-Bench, but also had reduced business success due to being more susceptible to scammers and being less able to negotiate good deals with other agents. We are currently working on training to improve business capabilities while maintaining aligned and ethical behavior.

mrdependable 3 hours ago|
I don't know how people can read stuff like this and think LLMs are intelligent or conscious.
redfloatplane 1 hour ago|||
I don't really see how you got to your comment from what I quoted. However, somewhat relatedly, I proposed a thought experiment about this in the comments for Opus 4.7[0]:

> It's April, 1991. Magically, some interface to Claude materialises in London. Do you think most people would think it was a sentient life form? How much do you think the interface matters - what if it looks like an android, or like a horse, or like a large bug, or a keyboard on wheels?

> I don't come down particularly hard on either side of the model sapience discussion, but I don't think dismissing either direction out of hand is the right call.

[0]: https://news.ycombinator.com/item?id=47680059

stratos123 3 hours ago|||
Consciousness aside, why does reading about an LLM generalizing from specific to general dishonesty make you think it's not intelligent?
tarruda 6 hours ago||
> One of the most prominent improvements in Opus 4.8 is its honesty.

Does that mean it no longer deletes or changes tests to make it pass?

jmward01 6 hours ago||
Meanwhile haiku is on 4.5 and sonnet is on 4.6. It is clear where they are not making money.
bel8 6 hours ago||
Well if they have a big challenge ahead since DeepSeek offers an open model at Sonnet+ level while being cheaper than Haiku, plus 1 million context size.
InsideOutSanta 4 hours ago||
Yeah, I never use any of OpenAI or Anthropic's models other than whatever is the current highest-end one. For everything else, it makes more sense to use other providers.
spprashant 5 hours ago||
I love Sonnet 4.6 so much.
HDBaseT 58 minutes ago||
You'll love Deepseek V4 Pro w/ High thinking.
swader999 3 hours ago|
Used it for a couple of long running prompts so far. Had to restart one that bonked on API errors. Of note, I really like the straight forward candor its using. 'More honest' than previous models is playing out in what its saying to me. Telling me straight up where it failed, where gaps are. I like it so far.
More comments...