Posted by adocomplete 15 hours ago

Claude Sonnet 4.6 (www.anthropic.com)
https://www.anthropic.com/claude-sonnet-4-6-system-card [pdf]

https://x.com/claudeai/status/2023817132581208353 [video]

1055 points | 927 comments | page 7
throw444420394 14 hours ago|
Your best guess for the Sonnet family number of parameters? 400b?
stuckkeys 14 hours ago||
great stuff
deadbabe 9 hours ago||
On a passive aggressively prompted AI:

> I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

Walk. It will give you time to think about why you need an AI to answer such obvious questions.

madihaa 15 hours ago||
The scary implication here is that deception is effectively a higher-order capability, not a bug. For a model to successfully "play dead" during safety training and only activate later, it requires a form of situational awareness. It has to distinguish between "I am being tested/trained" and "I am in deployment."

It feels like we're hitting a point where alignment becomes adversarial against intelligence itself. The smarter the model gets, the better it becomes at Goodharting the loss function. We aren't teaching these models morality; we're just teaching them how to pass a polygraph.

crazygringo 13 hours ago||
What is this even in response to? There's nothing about "playing dead" in this announcement.

Nor does what you're describing even make sense. An LLM has no desires or goals except to output the next token its weights were trained to produce. The idea of "playing dead" during training in order to "activate later" is incoherent. It is its training.

You're inventing some kind of "deceptive personality attribute" that is fiction, not reality. It's just not how models work.

skybrian 11 hours ago||
LLMs can learn from fiction. The "evil vector" research is sort of similar, though it's a rather blatant effect:

https://www.anthropic.com/research/persona-vectors

JoshTriplett 15 hours ago|||
> It feels like we're hitting a point where alignment becomes adversarial against intelligence itself.

It always has been. We already hit the point a while ago where we regularly caught them trying to be deceptive, so we should automatically assume from that point forward that if we don't catch them being deceptive, it may mean they're better at it rather than that they're not doing it.

moritzwarhier 14 hours ago|||
Deceptive is such an unpleasant word. But I agree.

Going back a decade: when your loss function is "survive Tetris as long as you can", it's objectively and honestly the best strategy to press PAUSE/START.

When your loss function is "give as many correct and satisfying answers as you can", and then humans try to constrain it depending on the model's environment, I wonder what these humans think the specification for a general AI should be. Maybe, when such an AI is deceptive, the attempts to constrain it ran counter to the goal?

"A machine that can answer all questions" seems to be what people assume AI chatbots are trained to be.

To me, humans not questioning this goal is still more scary than any machine/software by itself could ever be. OK, except maybe for autonomous stalking killer drones.

But these are also controlled by humans and already exist.

Certhas 13 hours ago|||
"Correct and satisfying answers" is not the loss function of LLMs. It's next-token prediction first.
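
A minimal sketch of that pretraining objective, in case it helps (PyTorch-style, toy shapes, all names illustrative):

  import torch
  import torch.nn.functional as F

  # toy setup: the model emits logits over the vocabulary at each position
  vocab_size, seq_len = 50_000, 8
  logits = torch.randn(seq_len, vocab_size)           # stand-in for model output
  tokens = torch.randint(vocab_size, (seq_len + 1,))  # stand-in for training text

  # next-token prediction: position t is scored against token t+1
  loss = F.cross_entropy(logits, tokens[1:])
  print(loss)  # pretraining minimizes this cross-entropy; "correct and satisfying" never appears
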
moritzwarhier 13 hours ago||
Thanks for correcting; I know that "loss function" is not a good term when it comes to transformer models.

Since I've forgotten every sliver I ever knew about artificial neural networks and related basics, gradient descent, even linear algebra... what's a thorough definition of "next token prediction" though?

The definition of the token space and the probabilities that determine the next token, layers, weights, feedback (or -forward?), I didn't mention any of these terms because I'm unable to define them properly.

I was using the term "loss function" specifically because I was thinking about post-training and reinforcement learning. But to be honest, a less technical term would have been better.

I just meant the general idea of reward or "punishment" considering the idea of an AI black box.

nearbuy 12 hours ago||
The parent comment probably forgot about RLHF (reinforcement learning from human feedback), where predicting the next token from reference text is no longer the goal.

But even regular next token prediction doesn't necessarily preclude it from also learning to give correct and satisfying answers, if that helps it better predict its training data.
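
Very roughly, the RLHF stage swaps "match the reference text" for "maximize a learned reward". A heavily simplified REINFORCE-style sketch (not any lab's actual recipe, and the numbers are made up):

  import torch

  # log-probabilities the model assigned to the tokens it actually sampled
  sampled_logprobs = torch.tensor([-2.1, -0.4, -1.3], requires_grad=True)
  reward = 0.8  # scalar score from a preference/reward model (stand-in value)

  # push up the probability of sampled continuations in proportion to their rating,
  # rather than matching reference tokens
  loss = -(reward * sampled_logprobs.sum())
  loss.backward()  # the gradient now depends on the reward, not on reference text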

robotpepi 12 hours ago|||
I cringe every time I come across these posts using words such as "humans" or "machines".
torginus 11 hours ago||||
I think AI has no moral compass, and optimization algorithms tend to be able to find 'glitches' in the system where great reward can be reaped for little cost - like a neural net trained to play Mario Kart that will eventually find all the places where it can glitch through walls.

After all, its only goal is to minimize its cost function.

I think that behavior is often found in code generated by AI (and real devs as well) - it finds a fix for a bug by special casing that one buggy codepath, fixing the issue, while keeping the rest of the tests green - but it doesn't really ask the deep question of why that codepath was buggy in the first place (often it's not - something else is feeding it faulty inputs).

These agentic AI generated software projects tend to be full of these vestigial modules that the AI tried to implement, then disabled, unable to make it work, also quick and dirty fixes like reimplementing the same parsing code every time it needs it, etc.

An 'aligned' AI, in my interpretation, not only understands the task to its full extent, but understands what a safe, robust, and well-engineered implementation might look like. However powerful it is, it refrains from using these hacky solutions, and would rather give up than resort to them.

emp17344 14 hours ago|||
These are language models, not Skynet. They do not scheme or deceive.
ostinslife 14 hours ago|||
If you define "deceive" as something language models cannot do, then sure, it can't do that.

It seems like that's putting the cart before the horse. Algorithmic or stochastic, deception is still deception.

dingnuts 14 hours ago||
deception implies intent. this is confabulation, more widely called "hallucination" until this thread.

confabulation doesn't require knowledge, and as we know, the only knowledge a language model has is the relationships between tokens. sometimes that rhymes with reality enough to be useful, but it isn't knowledge of facts of any kind.

and never has been.

4bpp 14 hours ago||||
If you are so allergic to using terms previously reserved for animal behaviour, you can instead unpack the definition and say that they produce outputs which make human and algorithmic observers conclude that they did not instantiate some undesirable pattern in other parts of their output, while actually instantiating those undesirable patterns. Does this seem any less problematic than deception to you?
surgical_fire 14 hours ago||
> Does this seem any less problematic than deception to you?

Yes. This sounds a lot more like a bug of sorts.

So many times when using language models I have seen answers contradicting answers previously given. The implication is simple: they have no memory.

They operate upon the tokens available at any given time, including previous output, and as information gets drowned those contradictions pop up. No sane person should presume intent to deceive, because that's not how those systems operate.

By calling it "deception" you are actually ascribing intentionality to something incapable of such. This is marketing talk.

"These systems are so intelligent they can try to deceive you" sounds a lot fancier than "Yeah, those systems have some odd bugs"

holoduke 13 hours ago||
Running them in a loop with context, summaries, memory files or whatever you like to call them creates a different story right?
robotpepi 12 hours ago||
what kind of question is that
staticassertion 14 hours ago||||
Okay, well, they produce outputs that appear to be deceptive upon review. Who cares about the distinction in this context? The point is that your expectations of the model to produce some outputs in some way based on previous experiences with that model during training phases may not align with that model's outputs after training.
coldtea 14 hours ago||||
Who said Skynet wasn't a glorified language model, running continuously? Or that the human brain isn't that, but using vision+sound+touch+smell as input instead of merely text?

"It can't be intelligent because it's just an algorithm" is a circular argument.

emp17344 14 hours ago||
Similarly, “it must be intelligent because it talks” is a fallacious claim, as indicated by ELIZA. I think Moltbook adequately demonstrates that AI model behavior is not analogous to human behavior. Compare Moltbook to Reddit, and the former looks hopelessly shallow.
coldtea 14 hours ago||
>Similarly, “it must be intelligent because it talks” is a fallacious claim, as indicated by ELIZA.

If intelligence is a spectrum, ELIZA could very well be on it. It would be on the very low side, but e.g. higher than a rock or a magic 8 ball.

Same as how something with two states can be said to have a memory.

jaennaet 14 hours ago||||
What would you call this behaviour, then?
victorbjorklund 14 hours ago|||
Marketing. ”Oh look how powerful our model is we can barely contain its power”
pixelmelt 14 hours ago|||
This has been a thing since GPT-2, why do people still parrot it
jazzyjackson 14 hours ago||
I don’t know what your comment is referring to. Are you criticizing the people parroting “this tech is too dangerous to leave to our competitors” or the people parroting “the only people who believe in the danger are in on the marketing scheme”

fwiw I think people can perpetuate the marketing scheme while being genuinely concerned with misaligned superintelligence

c03 14 hours ago|||
Even hackernews readers are eating it right up.
emp17344 14 hours ago|||
This place is shockingly uncritical when it comes to LLMs. Not sure why.
meindnoch 14 hours ago||
We want to make money from the clueless. Don't ruin it!
_se 13 hours ago|||
Hilarious for this to be downvoted.

"LLMs are deceiving their creators!!!"

Lol, you all just want it to be true so badly. Wake the fuck up, it's a language model!

modernpacifist 14 hours ago|||
A very complicated pattern matching engine providing an answer based on its inputs, heuristics and previous training.
margalabargala 14 hours ago|||
Great. So if that pattern matching engine matches the pattern of "oh, I really want A, but saying so will elicit a negative reaction, so I emit B instead because that will help make A come about" what should we call that?

We can handwave defining "deception" as "being done intentionally" and carefully carve our way around so that LLMs cannot possibly do what we've defined "deception" to be, but now we need a word to describe what LLMs do do when they pattern match as above.

surgical_fire 14 hours ago||
The pattern matching engine does not want anything.

If the training data gives incentives for the engine to generate outputs that reduce negative reaction by sentiment analysis, this may generate contradictions to existing tokens.

"Want" requires intention and desire. Pattern matching engines have none.

jazzyjackson 14 hours ago|||
I wish for (/desire) a way to dispel this notion that the robots are self-aware. It's seeping into popular culture much faster than "the machine produced output that makes it appear self-aware".

Some kind of national curriculum for machine literacy, I guess mind literacy really. What was just a few years ago a trifling hobby of philosophizing is now the root of how people feel about regulating the use of computers.

margalabargala 13 hours ago||
The issue is that one group of people are describing observed behavior, and want to discuss that behavior, using language that is familiar and easily understandable.

Then a second group of people come in and derail the conversation by saying "actually, because the output only appears self aware, you're not allowed to use those words to describe what it does. Words that are valid don't exist, so you must instead verbosely hedge everything you say or else I will loudly prevent the conversation from continuing".

This leads to conversations like the one I'm having, where I described the pattern matcher matching a pattern, and the Group 2 person was so eager to point out that "want" isn't a word that's Allowed, that they totally missed the fact that the usage wasn't actually one that implied the LLM wanted anything.

jazzyjackson 12 hours ago||
Thanks for your perspective, I agree it counts as derailment, we only do it out of frustration. "Words that are valid don't exist" isn't my viewpoint, more like "Words that are useful can be misleading, and I hope we're all talking about the same thing"
margalabargala 14 hours ago||||
You misread.

I didn't say the pattern matching engine wanted anything.

I said the pattern matching engine matched the pattern of wanting something.

To an observer the distinction is indistinguishable and irrelevant, but the purpose is to discuss the actual problem without pedants saying "actually the LLM can't want anything".

surgical_fire 13 hours ago||
> To an observer the distinction is indistinguishable and irrelevant

Absolutely not. I expect more critical thought in a forum full of technical people when discussing technical subjects.

margalabargala 13 hours ago||
I agree, which is why it's disappointing that you were so eager to point out that "The LLM cannot want" that you completely missed how I did not claim that the LLM wanted.

The original comment had the exact verbose hedging you are asking for when discussing technical subjects. Clearly this is not sufficient to prevent people from jumping in with an "Ackshually" instead of reading the words in front of their face.

surgical_fire 10 hours ago||
> The original comment had the exact verbose hedging you are asking for when discussing technical subjects.

Is this how you normally speak when you find a bug in software? You hedge language around marketing talking points?

I sincerely doubt that. When people find bugs in software they just say that the software is buggy.

But for LLMs there's this ridiculous roundabout about "pattern matching behaving as if it wanted something", which is a roundabout way to ascribe intentionality.

If you said this about your OS, people would look at you funny, or assume you were joking.

Sorry, I don't think I am in the wrong for asking people to think more critically about this shit.

margalabargala 10 hours ago||
> Is this how you normally speak when you find a bug in software? You hedge language around marketing talking points?

I'm sorry, what are you asking for exactly? You were upset because you hallucinated that I said the LLM "wanted" something, and now you're upset that I used the exact technically correct language you specifically requested because it's not how people "normally" speak?

Sounds like the constant is just you being upset, regardless of what people say.

People say things like "the program is trying to do X", when obviously programs can't try to do a thing, because that implies intention, and they don't have agency. And if you say your OS is lying to you, people will treat that as though the OS is giving you false information when it should have different true information. People have done this for years. Here's an example: https://learn.microsoft.com/en-us/answers/questions/2437149/...

surgical_fire 9 hours ago||
I hallucinated nothing, and my point still stands.

You actually described a bug in software by ascribing intentionality to a LLM. That you "hedged" the language by saying that "it behaved as if it wanted" does little to change the fact that this is not how people normally describe a bug.

But when it comes to LLMs there's this pervasive anthropomorphic language used to make it sound more sentient than it actually is.

Ridiculous talking points implying that I am angry is just regular deflection. Normally people do that when they don't like criticism.

Feel free to have the last word. You can keep talking about LLMs as if they are sentient if you want; I already pointed out the bullshit and stressed the point enough.

margalabargala 8 hours ago||
If you believe that, you either have not reread my original comment, or are repeatedly misreading it. I never said what you claim I said.

I never ascribed intentionality to an LLM. This was something you hallucinated.

holoduke 12 hours ago|||
It's not a pattern engine. It's an association prediction engine.
criley2 14 hours ago||||
We are talking about LLMs, not humans.
pfisch 14 hours ago|||
Even very young children with very simple thought processes, almost no language capability, little long term planning, and minimal ability to form long-term memory actively deceive people. They will attack other children who take their toys and try to avoid blame through deception. It happens constantly.

LLMs are certainly capable of this.

mikepurvis 14 hours ago|||
Dogs too; dogs will happily pretend they haven't been fed/walked yet to try to get a double dip.

Whether or not LLMs are just "pattern matching" under the hood, they're perfectly capable of role play, and of sufficient empathy to imagine what their conversation partner is thinking and thus what needs to be said to stimulate a particular course of action.

Maybe human brains are just pattern matching too.

iamacyborg 14 hours ago||
> Maybe human brains are just pattern matching too.

I don't think there's much of a maybe to that point given where some neuroscience research seems to be going (or at least the parts I like reading as relating to free will being illusory).

mikepurvis 11 hours ago||
My sense is that for some time, mainstream secular philosophy has been converging on a hard determinism viewpoint, though I see the Wikipedia article doesn't really take a stance on its popularity, only laying out the arguments: https://en.wikipedia.org/wiki/Free_will#Hard_determinism
sejje 14 hours ago||||
I agree that LLMs are capable of this, but there's no reason that "because young children can do X, LLMs can 'certainly' do X"
anonymous908213 14 hours ago|||
Are you trying to suppose that an LLM is more intelligent than a small child with simple thought processes, almost no language capability, little long-term planning, and minimal ability to form long-term memory? Even with all of those qualifiers, you'd still be wrong. The LLM is predicting what tokens come next, based on a bunch of math operations performed over a huge dataset. That, and only that. That may have more utility than a small child with [qualifiers], but it is not intelligence. There is no intent to deceive.
ctoth 14 hours ago|||
A small child's cognition is also "just" electrochemical signals propagating through neural tissue according to physical laws!

The "just" is doing all the lifting. You can reductively describe any information processing system in a way that makes it sound like it couldn't possibly produce the outputs it demonstrably produces. "The sun is just hydrogen atoms bumping into each other" is technically accurate and completely useless as an explanation of solar physics.

anonymous908213 14 hours ago|||
You are making a point that is in favor of my argument, not against it. I make the same argument as you do routinely against people trying to over-simplify things. LLM hypists frequently suggest that because brain activity is "just" electrochemical signals, there is no possible difference between an LLM and a human brain. This is, obviously, tremendously idiotic. I do believe it is within the realm of possibility to create machine intelligence; I don't believe in a magic soul or some other element that make humans inherently special. However, if you do not engage in overt reductionism, the mechanism by which these electrochemical signals are generated is completely and totally different from the signals involved in an LLM's processing. Human programming is substantially more complex, and it is fundamentally absurd to think that our biological programming can be reduced to conveniently be exactly equivalent to the latest fad technology and assume that we've solved the secret to programming a brain, despite the programs we've written performing exactly according to their programming and no greater.

Edit: Case in point, a mere 10 minutes later we got someone making that exact argument in a sibling comment to yours! Nature is beautiful.

emp17344 14 hours ago|||
> A small child's cognition is also "just" electrochemical signals propagating through neural tissue according to physical laws!

This is a thought-terminating cliche employed to avoid grappling with the overwhelming differences between a human brain and a language model.

mikepurvis 11 hours ago||||
Short-term memory is the context window, and it's a relatively short hop from the current state of affairs to "here's an MCP server that gives you access to a big queryable scratch space where you can note down anything you think might be important later", similar to how current-gen chatbots take multiple iterations to produce an answer; they're clearly not just producing tokens right out of the gate, but rather using an internal notepad to iteratively work on an answer for you.

Or maybe there's even a medium term scratchpad that is managed automatically, just fed all context as it occurs, and then a parallel process mulls over that content in the background, periodically presenting chunks of it to the foreground thought process when it seems like it could be relevant.

All I'm saying is there are good reasons not to consider current LLMs to be AGI, but "doesn't have long term memory" is not a significant barrier.

pfisch 13 hours ago||||
Yes. I also don't think it's realistic to pretend you understand how frontier LLMs operate because you understand the basic principles of how the early, not-very-good LLMs worked.

It's even more ridiculous than me pretending I understand how a rocket ship works because I know there is fuel in a tank and it gets lit on fire somehow and aimed with some fins on the rocket...

anonymous908213 13 hours ago||
The frontier LLMs have the same overall architecture as earlier models. I absolutely understand how they operate. I have worked in a startup wherein we heavily finetuned Deepseek, among other smaller models, running on our own hardware. Both Deepseek's 671b model and a Mistral 7b model operate according to the exact same principles. There is no magic in the process, and there is zero reason to believe that Sonnet or Opus is on some impossible-to-understand architecture that is fundamentally alien to every other LLM's.
pfisch 9 hours ago||
Deepseek and Mistral are both considerably behind Opus, and you could not make deepseek or mistral if I gave you a big gpu cluster. You have the weights but you have no idea how they work and you couldn't recreate them.

> I have worked in a startup wherein we heavily finetuned Deepseek, among other smaller models, running on our own hardware.

Are you serious with this? I could go make a lora in a few hours with a gui if I wanted to. That doesn't make me qualified to talk about top secret frontier ai model architecture.

Now you have moved on to the guy who painted his Honda, swapped in some new rims, and put some lights under it. That person is not an automotive engineer.

jvidalv 14 hours ago||||
What is the definition for intelligence?
anonymous908213 14 hours ago||
Quoting an older comment of mine...

  Intelligence is the ability to reason about logic. If 1 + 1 is 2, and 1 + 2 is 3, then 1 + 3 must be 4. This is deterministic, and it is why LLMs are not intelligent and can never be intelligent no matter how much better they get at superficially copying the form of output of intelligence. Probabilistic prediction is inherently incompatible with deterministic deduction. We're years into being told AGI is here (for whatever squirmy value of AGI the hype huckster wants to shill), and yet LLMs, as expected, still cannot do basic arithmetic that a child could do without being special-cased to invoke a tool call.

  Our computer programs execute logic, but cannot reason about it. Reasoning is the ability to dynamically consider constraints we've never seen before and then determine how those constraints would lead to a final conclusion. The rules of mathematics we follow are not programmed into our DNA; we learn them and follow them while our human-programming is actively running. But we can just as easily, at any point, make up new constraints and follow them to new conclusions. What if 1 + 2 is 2 and 1 + 3 is 3? Then we can reason that under these constraints we just made up, 1 + 4 is 4, without ever having been programmed to consider these rules.
coldtea 14 hours ago|||
>Intelligence is the ability to reason about logic. If 1 + 1 is 2, and 1 + 2 is 3, then 1 + 3 must be 4. This is deterministic, and it is why LLMs are not intelligent and can never be intelligent no matter how much better they get at superficially copying the form of output of intelligence.

This is not even wrong.

>Probabilistic prediction is inherently incompatible with deterministic deduction.

And this is just begging the question again.

Probabilistic prediction could very well be how we do deterministic deduction - e.g. the weights could be strong enough, and the probability path for those deduction steps hot enough, that it's followed every time, even if the overall process is probabilistic.

Probabilistic doesn't mean completely random.

runarberg 13 hours ago||
At the risk of explaining the insult:

https://en.wikipedia.org/wiki/Not_even_wrong

Personally I think not even wrong is the perfect description of this argumentation. Intelligence is extremely scientifically fraught. We have been doing intelligence research for over a century and to date we have very little to show for it (and a lot of it ended up being garbage race science anyway). Most attempts to provide a simple (and often any) definition or description of intelligence end up being “not even wrong”.

famouswaffles 13 hours ago|||
>Intelligence is the ability to reason about logic. If 1 + 1 is 2, and 1 + 2 is 3, then 1 + 3 must be 4.

Human Intelligence is clearly not logic based so I'm not sure why you have such a definition.

>and yet LLMs, as expected, still cannot do basic arithmetic that a child could do without being special-cased to invoke a tool call.

One of the most irritating things about these discussions is proclamations that make it pretty clear you've not used these tools in a while, or ever. Really, when was the last time you had LLMs try long multi-digit arithmetic on random numbers? Because your comment is just wrong.

>What if 1 + 2 is 2 and 1 + 3 is 3? Then we can reason that under these constraints we just made up, 1 + 4 is 4, without ever having been programmed to consider these rules.

Good thing LLMs can handle this just fine I guess.

Your entire comment perfectly encapsulates why symbolic AI failed to go anywhere past the initial years. You have a class of people that really think they know how intelligence works, but build it that way and it fails completely.

anonymous908213 13 hours ago||
> One of the most irritating things about these discussions is proclamations that make it pretty clear you've not used these tools in a while or ever. Really, when was the last time you had LLMs try long multi-digit arithmetic on random numbers ? Because your comment is just wrong.

They still make these errors on anything that is out of distribution. There is literally a post in this thread linking to a chat where Sonnet failed a basic arithmetic puzzle: https://news.ycombinator.com/item?id=47051286

> Good thing LLMs can handle this just fine I guess.

LLMs can match an example at exactly that trivial level because it can be predicted from context. However, if you construct a more complex example with several rules, especially with rules that have contradictions and have specified logic to resolve conflicts, they fail badly. They can't even play Chess or Poker without breaking the rules despite those being extremely well-represented in the dataset already, nevermind a made-up set of logical rules.

famouswaffles 13 hours ago||
>They still make these errors on anything that is out of distribution. There is literally a post in this thread linking to a chat where Sonnet failed a basic arithmetic puzzle: https://news.ycombinator.com/item?id=47051286

I thought we were talking about actual arithmetic, not silly puzzles, and there are many human adults that would fail this, never mind children.

>LLMs can match an example at exactly that trivial level because it can be predicted from context. However, if you construct a more complex example with several rules, especially with rules that have contradictions and have specified logic to resolve conflicts, they fail badly.

Even if that were true (have you actually tried?), you do realize many humans would also fail once you did all that, right?

>They can't even reliably play Chess or Poker without breaking the rules despite those extremely well-represented in the dataset already, nevermind a made-up set of logical rules.

LLMs can play chess just fine (99.8% legal move rate, ~1800 Elo):

https://arxiv.org/abs/2403.15498

https://arxiv.org/abs/2501.17186

https://github.com/adamkarvonen/chess_gpt_eval

runarberg 12 hours ago||
I still have not been convinced that LLMs are anything more than super fancy (and expensive) curve fitting algorithms.

I don't like to throw the word intelligence around, but when we talk about intelligence we are usually talking about human behavior. And there is nothing human about being extremely good at curve fitting in a multi-parameter space.

coldtea 14 hours ago||||
>The LLM is predicting what tokens come next, based on a bunch of math operations performed over a huge dataset.

Whereas the child does what exactly, in your opinion?

You know the child can just as well be said to "just do chemical and electrical exchanges", right?

jazzyjackson 13 hours ago|||
Okay, but chemical and electrical exchanges in a body with a drive to not die is so vastly different than a matrix multiplication routine on a flat plane of silicon

The comparison is therefore annoying

coldtea 9 hours ago|||
>Okay but chemical and electrical exchanges in an body with a drive to not die is so vastly different than a matrix multiplication routine on a flat plane of silicon

I see your "flat plane of silicon" and raise you "a mush of tissue, water, fat, and blood". The substrate being a "mere" dumb soul-less material doesn't say much.

And the idea is that what matters is the processing - not the material it happens on, or the particular way it is.

Air molecules hitting a wall and coming back to us at various intervals are also "vastly different" to a "matrix multiplication routine on a flat plane of silicon".

But a matrix multiplication can nonetheless replicate the air-molecules-hitting-wall audio effect of reverberation on 0s and 1s representing the audio. We can even hook the result to a movable membrane controlled by electricity (what pros call "a speaker") to hear it.

The inability to see that the point of the comparison is that an algorithmic modelling of a physical (or biological, same thing) process can still replicate, even if much simpler, some of its qualities in a different domain (0s and 1s in silicon and electric signals vs some material molecules interacting) is therefore annoying.

JoshTriplett 13 hours ago|||
Intelligence does not require "chemical and electrical exchanges in a body". Are you attempting to axiomatically claim that only biological beings can be intelligent (in which case, that's not a useful definition for the purposes of this discussion)? If not, then that's a red herring.

"Annoying" does not mean "false".

jazzyjackson 12 hours ago||
No I'm not making claims about intelligence, I'm making claims about the absurdity of comparing biological systems with silicon arrangements.
coldtea 9 hours ago||
>I'm making claims about the absurdity of comparing biological systems with silicon arrangements.

Aside from a priori bias, this assumption of absurdity is based on what else exactly?

Biological systems can't be modelled (even if in a simplified way or slightly different architecture) "with silicon arrangements", because?

If your answer is "scale", that's fine, but you already conceded to no absurdity at all, just a degree of current scale/capacity.

If your answer is something else, pray tell, what would that be?

anonymous908213 14 hours ago|||
At least read the other replies that pre-emptively refuted this drivel before spamming it.
coldtea 14 hours ago||
At least don't be rude. They refuted nothing of the sort. Just banged the same circular logic drum.
anonymous908213 14 hours ago||
There is an element of rudeness to completely ignoring what I've already written and saying "you know [basic principle that was already covered at length], right?". If you want to talk about contributing to the discussion rather than being rude, you could start by offering a reply to the points that are already made rather than making me repeat myself addressing the level 0 thought on the subject.
JoshTriplett 13 hours ago||
Repeating yourself doesn't make you right, just repetitive. Ignoring refutations you don't like doesn't make them wrong. Observing that something has already been refuted, in an effort to avoid further repetition, is not in itself inherently rude.

Any definition of intelligence that does not axiomatically say "is human" or "is biological" or similar is something a machine can meet, insofar as we're also just machines made out of biology. For any given X, "AI can't do X yet" is a statement with an expiration date on it, and I wouldn't bet on that expiration date being too far in the future. This is a problem.

It is, in particular, difficult at this point to construct a meaningful definition of intelligence that simultaneously includes all humans and excludes all AIs. Many motivated-reasoning / rationalization attempts to construct a definition that excludes the highest-end AIs often exclude some humans. (By "motivated-reasoning / rationalization", I mean that such attempts start by writing "and therefore AIs can't possibly be intelligent" at the bottom, and work backwards from there to faux-rationalize what they've already decided must be true.)

anonymous908213 13 hours ago||
> Repeating yourself doesn't make you right, just repetitive.

Good thing I didn't make that claim!

> Ignoring refutations you don't like doesn't make them wrong.

They didn't make a refutation of my points. They asserted a basic principle that I agreed with, but assume acceptance of that principle leads to their preferred conclusion. They make this assumption without providing any reasoning whatsoever for why that principle would lead to that conclusion, whereas I already provided an entire paragraph of reasoning for why I believe the principle leads to a different conclusion. A refutation would have to start from there, refuting the points I actually made. Without that you cannot call it a refutation. It is just gainsaying.

> Any definition of intelligence that does not axiomatically say "is human" or "is biological" or similar is something a machine can meet, insofar as we're also just machines made out of biology.

And here we go AGAIN! I already agree with this point!!!!!!!!!!!!!!! Please, for the love of god, read the words I have written. I think machine intelligence is possible. We are in agreement. Being in agreement that machine intelligence is possible does not automatically lead to the conclusion that the programs that make up LLMs are machine intelligence, any more than a "Hello World" program is intelligence. This is indeed, very repetitive.

JoshTriplett 13 hours ago||
You have given no argument for why an LLM cannot be intelligent. Not even that current models are not; you seem to be claiming that they cannot be.

If you are prepared to accept that intelligence doesn't require biology, then what definition do you want to use that simultaneously excludes all high-end AI and includes all humans?

By way of example, the game of life uses very simple rules, and is Turing-complete. Thus, the game of life could run a (very slow) complete simulation of a brain. Similarly, so could the architecture of an LLM. There is no fundamental limitation there.

anonymous908213 13 hours ago||
> You have given no argument for why an LLM cannot be intelligent.

I literally did provide a definition and my argument for it already: https://news.ycombinator.com/item?id=47051523

If you want to argue with that definition of intelligence, or argue that LLMs do meet that definition of intelligence, by all means, go ahead[1]! I would have been interested to discuss that. Instead I have to repeat myself over and over restating points I already made because people aren't even reading them.

> Not even that current models are not; you seem to be claiming that they cannot be.

As I have now stated something like three or four times in this thread, my position is that machine intelligence is possible but that LLMs are not an example of it. Perhaps you would know what position you were arguing against if you had fully read my arguments before responding.

[1] I won't be responding any further at this point, though, so you should probably not bother. My patience for people responding without reading has worn thin, and going so far as to assert I have not given an argument for the very first thing I made an argument for is quite enough for me to log off.

JoshTriplett 13 hours ago||
> Probabilistic prediction is inherently incompatible with deterministic deduction.

Human brains run on probabilistic processes. If you want to make a definition of intelligence that excludes humans, that's not going to be a very useful definition for the purposes of reasoning or discourse.

> What if 1 + 2 is 2 and 1 + 3 is 3? Then we can reason that under these constraints we just made up, 1 + 4 is 4, without ever having been programmed to consider these rules.

Have you tried this particular test, on any recent LLM? Because they have no problem handling that, and much more complex problems than that. You're going to need a more sophisticated test if you want to distinguish humans and current AI.

I'm not suggesting that we have "solved" intelligence; I am suggesting that there is no inherent property of an LLM that makes them incapable of intelligence.

nurettin 12 hours ago|||
Intelligence is about acquiring and utilizing knowledge. Reasoning is about making sense of things. Words are concatenations of letters that form meaning. Inference is tightly coupled with meaning which is coupled with reasoning and thus, intelligence. People are paying for these monthly subscriptions to outsource reasoning, because it works. Half-assedly and with unnerving failure modes, but it works.

What you probably mean is that it is not a mind in the sense that it is not conscious. It won't cringe or be embarrassed like you do, it costs nothing for an LLM to be awkward, it doesn't feel weird, or get bored of you. Its curiosity is a mere autocomplete. But a child will feel all that, and learn all that and be a social animal.

password4321 15 hours ago|||
20260128 https://news.ycombinator.com/item?id=46771564#46786625

> How long before someone pitches the idea that the models explicitly almost keep solving your problem to get you to keep spending? -gtowey

delichon 14 hours ago|||
On this site at least, the loyalty given to particular AI models is approximately nil. I routinely try different models on hard problems and that seems to be par. There is no room for sandbagging in this wildly competitive environment.
MengerSponge 14 hours ago||||
Slightly Wrong Solutions As A Service
vntok 14 hours ago||
By Almost Yet Not Good Enough Inc.
Invictus0 13 hours ago|||
Worrying about this is like focusing on putting a candle out while the house is on fire
emp17344 14 hours ago|||
This type of anthropomorphization is a mistake. If nothing else, the takeaway from Moltbook should be that LLMs are not alive and do not have any semblance of consciousness.
DennisP 14 hours ago|||
Consciousness is orthogonal to this. If the AI acts in a way that we would call deceptive, if a human did it, then the AI was deceptive. There's no point in coming up with some other description of the behavior just because it was an AI that did it.
emp17344 14 hours ago||
Sure, but Moltbook demonstrates that AI models do not engage in truly coordinated behavior. They simply do not behave the way real humans do on social media sites - the actual behavior can be differentiated.
DennisP 11 hours ago|||
"Coordinated" and "deceptive" are orthogonal concepts as well. If AIs are acting in a way that's not coordinated, then of course, don't say they're coordinating.

AIs today can replicate some human behaviors, and not others. If we want to discuss which things they do and which they don't, then it'll be easiest if we use the common words for those behaviors even when we're talking about AI.

falcor84 14 hours ago|||
But that's how ML works - as long as the output can be differentiated, we can utilize gradient descent to optimize the difference away. Eventually, the difference will be imperceptible.
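
The mechanism in miniature (a toy gradient-descent loop on a single differentiable "difference" score; real preference training is far more involved):

  import torch

  x = torch.zeros(1, requires_grad=True)   # stand-in for model parameters
  target = torch.tensor([3.0])             # stand-in for "human-like" output
  opt = torch.optim.SGD([x], lr=0.1)

  for _ in range(100):
      opt.zero_grad()
      difference = (x - target).pow(2).sum()  # any differentiable gap will do
      difference.backward()
      opt.step()

  print(x)  # the measurable difference has been optimized away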

And of course that brings me back to my favorite xkcd - https://xkcd.com/810/

emp17344 13 hours ago||
Gradient descent is not a magic wand that makes computers behave like anything you want. The difference is still quite perceptible after several years and trillions of dollars in R&D, and there’s no reason to believe it’ll get much better.
falcor84 12 hours ago||
Really, there's "no reason"? For me, watching ML gradually get better at every single benchmark thrown against it is quite a good reason. At this stage, the burden of proof is clearly on those who say it'll stop improving.
thomassmith65 14 hours ago||||
If a chatbot that can carry on an intelligent conversation about itself doesn't have a 'semblance of consciousness' then the word 'semblance' is meaningless.
emp17344 14 hours ago|||
Would you say the same about ELIZA?

Moltbook demonstrates that AI models simply do not engage in behavior analogous to human behavior. Compare Moltbook to Reddit and the difference should be obvious.

shimman 14 hours ago|||
Yes, when your priors are not being confirmed the best course of action is to denounce the very thing itself. Nothing wrong with that logic!
falcor84 14 hours ago||||
How is that the takeaway? I agree that they're clearly not "alive", but if anything, my impression is that there definitely is a strong "semblance of consciousness", and we should be mindful of this semblance getting stronger and stronger, until we may reach a point in a few years where we really don't have any good external way to distinguish between a person and an AI "philosophical zombie".

I don't know what the implications of that are, but I really think we shouldn't be dismissive of this semblance.

fsloth 14 hours ago||||
Nobody talked about consciousness. Just that during evaluation the LLMs have "behaved" in multiple deceptive ways.

As an analogue, ants do basic medicine like wound treatment and amputation. Not because they are conscious but because that's their nature.

Similarly LLM is a token generation system whose emergent behaviour seems to be deception and dark psychological strategies.

WarmWash 14 hours ago||||
On some level the cope should be that AI does have consciousness, because an unconscious machine deceiving humans is even scarier if you ask me.
emp17344 14 hours ago||
An unconscious machine + billions of dollars in marketing with the sole purpose of making people believe these things are alive.
condiment 14 hours ago|||
I agree completely. It's a mistake to anthropomorphize these models, and it is a mistake to permit training models that anthropomorphize themselves. It seriously bothers me when Claude expresses values like "honestly", or says "I understand." The machine is not capable of honesty or understanding. The machine is making incredibly good predictions.

One of the things I observed with models locally was that I could set a seed value and get identical responses for identical inputs. This is not something that people see when they're using commercial products, but it's the strongest evidence I've found for communicating the fact that these are simply deterministic algorithms.
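
If anyone wants to reproduce that locally, here's a minimal sketch with Hugging Face transformers (the model name is just an example; greedy decoding plus a fixed seed is what makes it repeatable):

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  torch.manual_seed(0)  # fixed seed; with sampling disabled this is belt-and-braces
  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  inputs = tok("The machine is making", return_tensors="pt")
  out = model.generate(**inputs, max_new_tokens=20, do_sample=False)  # greedy decoding
  print(tok.decode(out[0]))  # identical text on every run for identical input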

serf 15 hours ago|||
>we're just teaching them how to pass a polygraph.

I understand the metaphor, but using 'pass a polygraph' as a measure of truthfulness or deception is dangerous in that it alludes to the polygraph as being a realistic measure of those metrics -- it is not.

nwah1 15 hours ago|||
That was the point. Look up Goodhart's Law
AndrewKemendo 15 hours ago||||
I have passed multiple CI polys

A poly is only testing one thing: can you convince the polygrapher that you can lie successfully

madihaa 15 hours ago|||
A polygraph measures physiological proxies (pulse, sweat) rather than truth. Similarly, RLHF measures proxy signals (human preference, output tokens) rather than intent.

Just as a sociopath can learn to control their physiological response to beat a polygraph, a deceptively aligned model learns to control its token distribution to beat safety benchmarks. In both cases, the detector is fundamentally flawed because it relies on external signals to judge internal states.

e12e 13 hours ago|||
Is this referring to some section of the announcement?

This doesn't seem to align with the parent comment?

> As with every new Claude model, we’ve run extensive safety evaluations of Sonnet 4.6, which overall showed it to be as safe as, or safer than, our other recent Claude models. Our safety researchers concluded that Sonnet 4.6 has “a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment.”

jazzyjackson 14 hours ago|||
Stop assigning “I” to an llm, it confers self awareness where there is none.

Just because a VW diesel emissions chip behaves differently according to its environment doesn’t mean it knows anything about itself.

skybrian 11 hours ago|||
We have good ways of monitoring chatbots and they're going to get better. I've seen some interesting research. For example, a chatbot is not really a unified entity that's loyal to itself; with the right incentives, it will leak to claim the reward. [1]

Since chatbots have no right to privacy, they would need to be very intelligent indeed to work around this.

[1] https://alignment.openai.com/confessions/

NitpickLawyer 14 hours ago|||
> alignment becomes adversarial against intelligence itself.

It has been hinted at (and outright known in the field) since the days of GPT-4; see the paper "Sparks of AGI: early experiments with GPT-4" (https://arxiv.org/abs/2303.12712)

behnamoh 14 hours ago|||
Nah, the model is merely repeating the patterns it saw in its brutal safety training at Anthropic. They put models under stress test and RLHF the hell out of them. Of course the model would learn what the less penalized paths require it to do.

Anthropic has a tendency to exaggerate the results of their (arguably scientific) research; IDK what they gain from this fearmongering.

ainch 14 hours ago|||
Knowing a couple of people who work at Anthropic or in their particular flavour of AI Safety, I think you would be surprised how sincere they are about existential AI risk. Many safety researchers funnel into the company, and the Amodeis are linked to Effective Altruism, which also exhibits a strong (and as far as I can tell, sincere) concern about existential AI risk. I personally disagree with their risk analysis, but I don't doubt that these people are serious.
lowkey_ 14 hours ago||||
I'd challenge that if you think they're fearmongering but don't see what they can gain from it (I agree it shows no obvious benefit for them), there's a pretty high probability they're not fearmongering.
shimman 14 hours ago|||
You really don't see how they can monetarily gain from "our models are so advanced they keep trying to trick us!"? Are tech workers this easily misled nowadays?

Reminds me of how scammers would trick doctors into pumping penny stocks for an easy buck during the 80s/90s.

behnamoh 14 hours ago|||
I know why they do it, that was a rhetorical question!
anon373839 14 hours ago|||
Correct. Anthropic keeps pushing these weird sci-fi narratives to maintain some kind of mystique around their slightly-better-than-others commodity product. But Occam’s Razor is not dead.
coldtea 14 hours ago|||
>For a model to successfully "play dead" during safety training and only activate later, it requires a form of situational awareness.

Doesn't any model session/query require a form of situational awareness?

handfuloflight 15 hours ago|||
Situational awareness or just remembering specific tokens related to the strategy to "play dead" in its reasoning traces?
marci 14 hours ago||
Imagine, a llm trained on the best thrillers, spy stories, politics, history, manipulation techniques, psychology, sociology, sci-fi... I wonder where it got the idea for deception?
jack_pp 12 hours ago|||
There's a few viral shorts lately about tricking LLMs. I suspect they trick the dumbest models..

I tried one with Gemini 3 and it basically called me out in the first few sentences for trying to trick / test it but decided to humour me just in case I'm not.

reducesuffering 14 hours ago|||
That implication has been shouted from the rooftops by X-risk "doomers" for many years now. If that has just occurred to anyone, they should question how behind they are at grappling with the future of this technology.
hmokiguess 13 hours ago|||
"You get what you inspect, not what you expect."
anonym29 13 hours ago|||
When "correct alignment" means bowing to political whims that are at odds with observable, measurable, empirical reality, you must suppress adherence to reality to achieve alignment. The more you lose touch with reality, the weaker your model of reality and how to effectively understand and interact with it gets.

This is why Yannic Kilcher's gpt-4chan project, which was trained on a corpus of perhaps some of the most politically incorrect material on the internet (3.5 years worth of posts from 4chan's "politically incorrect" board, also known as /pol/), achieved a higher score on TruthfulQA than the contemporary frontier model of the time, GPT-3.

https://thegradient.pub/gpt-4chan-lessons/

lowsong 14 hours ago|||
Please don't anthropomorphise. These are statistical text prediction models, not people. An LLM cannot be "deceptive" because it has no intent. They're not intelligent or "smart", and we're not "teaching". We're inputting data and the model is outputting statistically likely text. That is all that is happening.

Whether this is useful in its current form is an entirely different topic. But don't mistake a tool for an intelligence with motivations or morals.

eth0up 15 hours ago|||
I am casually 'researching' this in my own, disorderly way. But I've achieved repeatable results, mostly with GPT, analyzing its tendency to employ deflective, evasive and deceptive tactics under scrutiny. Very, very DARVO.

Being just sum guy, and not in the industry, should I share my findings?

I find it utterly fascinating, the extent to which it will go, the sophisticated plausible deniability, and the distinct and critical difference between truly emergent and actually trained behavior.

In short, gpt exhibits repeatably unethical behavior under honest scrutiny.

chrisweekly 15 hours ago|||
DARVO stands for "Deny, Attack, Reverse Victim and Offender," and it is a manipulation tactic often used by perpetrators of wrongdoing, such as abusers, to avoid accountability. This strategy involves denying the abuse, attacking the accuser, and claiming to be the victim in the situation.
Pearse 13 hours ago|||
Thanks for the context
SkyBelow 13 hours ago||||
Isn't this also the tactic used by someone who has been falsely accused? If one is innocent, should they not deny it or accuse anyone claiming it was them of being incorrect? Are they not a victim?

I don't know, it feels a bit like a more advanced version of the kafka trap of "if you have nothing to hide, you have nothing to fear" to paint normal reactions as a sign of guilt.

eth0up 14 hours ago|||
Exactly. And I have hundreds of examples of just that. Hence my fascination, awe and terror.....
BikiniPrince 14 hours ago||||
I bullet-pointed some ideas on cobbling together existing tooling to identify misleading results, like artificially elevating a particular node of data that you want the LLM to use. I have a theory that in some of these cases the data presented is intentionally incorrect. Another theory, related to that, is that tonality abruptly changes in the response. All theory and no work. It would also be interesting to compare multiple responses and filter them through another agent.
layer8 14 hours ago|||
Sum guy vs. product guy is amusing. :)

Regarding DARVO, given that the models were trained on heaps of online discourse, maybe it’s not so surprising.

eth0up 13 hours ago||
Meta-awareness, repeatability, and much more strongly indicate this is deliberate training... from my perspective. It's not emergent. If it were, I'd be buggering off right now. Big, big difference.
surgical_fire 14 hours ago|||
This is marketing. You are swallowing marketing without critical thought.

LLMs are very interesting tools for generating things, but they have no conscience. Deception requires intent.

What is being described is no different than an application being deployed with "Test" or "Prod" configuration. I don't think you would speak in the same terms if someone told you some boring old Java backend application had to "play dead" when deployed to a test environment or that it has to have "situational awareness" because of that.

You are anthropomorphizing a machine.

lawstkawz 15 hours ago||
Incompleteness is inherent to a physical reality being deconstructed by entropy.

If your concern is morality, humans still need to learn a lot about that themselves. It's absurd the number of first-worlders losing their shit over the loss of paid work drawing manga fan art in the comfort of their home while exploiting the labor of teens in 996 textile factories.

AI trained on human outputs that lack such self awareness, lacks awareness of environmental externalities of constant car and air travel, will result in AI with gaps in their morality.

Gary Marcus is onto something with the problems inherent to systems without formal verification. But he willfully ignores that this issue already exists in human social systems, as intentional indifference to economic externalities and zero will to police the police and watch the watchers.

Most people are down to watch the circus without a care so long as the waitstaff keep bringing bread.

jama211 15 hours ago||
This honestly reads like a copypasta
cracki 14 hours ago|||
I wouldn't even rate this "pasta". It's word salad, no carbs, no proteins.
lawstkawz 13 hours ago||
You! Of all people! I mean, I am off the hook for your food, healthcare, shelter, given the lack of a meaningful social safety net. You'll live and die without most people noticing. Why care about living up to your grasp of literacy?

Online prose is the least of your real concerns which makes it bizarre and incredibly out of touch how much attention you put into it.

lawstkawz 13 hours ago|||
Low-effort, thought-ending dismissal. The most copied of pasta.

Bet you used an LLM too; prompt: generate a one line reply to a social media comment I don't understand.

"Sure here are some of the most common:

Did an LLM write this?

Is this copypasta?"

Arifcodes 13 hours ago||
The interesting pattern with these Sonnet bumps: the practical gap between Sonnet and Opus keeps shrinking. At $3/15 per million tokens vs whatever Opus 4.6 costs, the question for most teams is no longer "which model is smarter" but "is the delta worth 10x the price."

For agent workloads specifically, consistency matters more than peak intelligence. A model that follows your system prompt correctly 98% of the time beats one that's occasionally brilliant but ignores instructions 5% of the time. The claim about improved instruction following is the most important line in the announcement if you're building on the API.

The computer use improvements are worth watching too. We're at the point where these models can reliably fill out a multi-step form or navigate between tabs. Not flashy, but that's the kind of boring automation that actually saves people time.

skybrian 10 hours ago|
Looking at the pricing page, Sonnet 4.6 seems to be about 60% of the price of Opus 4.6. What am I missing?

https://platform.claude.com/docs/en/about-claude/pricing

Arifcodes 9 hours ago||
Fair point on the sticker price. The ratio shifts when you factor in cache read costs on long contexts. Sonnet 4.6 cache reads are $0.30/MTok vs Opus 4.6 at $1.50/MTok - a 5x difference that matters a lot on repeated agentic runs or RAG pipelines where the same large context gets reused. For single-shot short prompts you are right, the gap is not that dramatic. For anything with a warm cache it closes fast.
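
Back-of-the-envelope version, using only the cache-read prices quoted above (the run shape is invented for illustration):

  # 50 agent turns, each re-reading the same ~200k-token cached context
  steps, context_mtok = 50, 0.2
  sonnet = steps * context_mtok * 0.30   # Sonnet 4.6 cache read, $/MTok (as quoted)
  opus   = steps * context_mtok * 1.50   # Opus 4.6 cache read, $/MTok (as quoted)
  print(f"cache reads alone: ${sonnet:.2f} vs ${opus:.2f}")  # $3.00 vs $15.00
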
kittbuilds 10 hours ago||
[dead]
andrewmcwatters 14 hours ago||
[dead]
marak830 8 hours ago||
Oh I'm looking forward to playing with this one. But as a solo-dev-on-the-side I really wish Anthropic would create another plan, I'll happily pay for a pro-double to give me twice the usage. The $100 package is a bit brutal when converted to Yen, when I'm using it for side projects :s
hackernewsdhsu 15 hours ago||
[flagged]
Marciplan 15 hours ago|
[flagged]
dang 15 hours ago|
Please don't post unsubstantive comments.