
Posted by dnw 22 hours ago

Emotion concepts and their function in a large language model(www.anthropic.com)
132 points | 132 comments | page 2
nelox 19 hours ago|
This is terrifying, for all the reasons humans are terrifying.

Essentially we have created the Cylon.

staminade 20 hours ago||
Something they don’t seem to mention in the article: Does greater model “enjoyment” of a task correspond to higher benchmark performance? E.g. if you steer it to enjoy solving difficult programming tasks, does it produce better solutions?
9wzYQbTYsAIc 16 hours ago|
Pretty easy to test, I’d imagine, on a local LLM that exposes internals.

I’d suspect that the signals for enjoyment being injected in would lead towards not necessarily better but “different” solutions.

Right now I’m thinking of it in terms of increasing the chances that the LLM will decide to invest further effort in any given task.

Performance enhancement through emotional steering definitely seems in the cards, but it might show up mostly through reducing emotionally-induced error categories rather than generic “higher benchmark performance”.

If someone came along and pissed you off while you were working, you’d react differently than if someone came along and encouraged you while you were working, right?
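The experiment proposed above (injecting an "enjoyment" signal into a local model's internals) is essentially activation steering. A minimal numpy sketch, assuming a hypothetical unit "enjoyment" direction extracted from contrastive prompts (all values here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a batch of hidden states and a hypothetical
# "enjoyment" direction (both invented for this sketch).
hidden = rng.normal(size=(4, 16))        # (tokens, hidden_dim)
enjoy_dir = rng.normal(size=16)
enjoy_dir /= np.linalg.norm(enjoy_dir)   # unit steering vector

def steer(h, direction, alpha):
    """Add a scaled emotion direction to every token's hidden state."""
    return h + alpha * direction

steered = steer(hidden, enjoy_dir, alpha=4.0)

# The projection onto the steering direction grows by exactly alpha,
# since the direction is unit-norm.
before = hidden @ enjoy_dir
after = steered @ enjoy_dir
print(np.allclose(after - before, 4.0))  # True
```

In a real setup you would add this vector inside a forward hook at a chosen layer rather than to random arrays, then compare benchmark scores with and without steering.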

BoingBoomTschak 9 hours ago||
Trying to separate the software from the hardware is a fool's errand in this case: emotions are primarily a hormonal response, not an intellectual one.
mci 21 hours ago||
The first and second principal components (joy-sadness and anger) explain only 41% of the variance. I wish the authors showed further principal components. Even principal components 1-4 would explain no more than 70% of the variance, which seems to contradict the popular theory that all human emotions are composed of 5 basic emotions: joy, sadness, anger, fear, and disgust, i.e. 4 dimensions.
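The cumulative-variance arithmetic above is easy to check on toy data. A sketch of PCA via SVD, with an invented 8-dimensional dataset standing in for the emotion activations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 200 samples in 8 dims with a few dominant directions
# (scales invented; not the article's actual activations).
X = rng.normal(size=(200, 8)) * np.array([5, 4, 2, 1.5, 1, 1, 1, 1])
X -= X.mean(axis=0)  # center before PCA

# PCA via SVD: squared singular values give per-component variance.
_, s, _ = np.linalg.svd(X, full_matrices=False)
var_ratio = s**2 / np.sum(s**2)
cumulative = np.cumsum(var_ratio)

print(np.round(cumulative[:4], 2))  # share explained by first 4 PCs
```

If the first four components of the article's data really explain under 70% of the variance, the remaining spread would indeed need more than four emotion dimensions to capture.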
trhway 20 hours ago||
>... emotion-related representations that shape its behavior. These specific patterns of artificial “neurons” which activate in situations—and promote behaviors—that the model has learned to associate with the concept of a particular emotion. .... In contexts where you might expect a certain emotion to arise for a human, the corresponding representations are active.

>For instance, to ensure that AI models are safe and reliable, we may need to ensure they are capable of processing emotionally charged situations in healthy, prosocial ways.

Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak.
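The "force-set to 0" idea is usually implemented as directional ablation: project the unwanted direction out of the hidden states. A minimal numpy sketch, with a hypothetical "anger" direction (all values invented here):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical unit "anger" direction and a batch of hidden states
# (both invented for illustration).
anger_dir = rng.normal(size=16)
anger_dir /= np.linalg.norm(anger_dir)
hidden = rng.normal(size=(4, 16))

def ablate(h, direction):
    """Remove each hidden state's component along `direction`."""
    coeffs = h @ direction                  # per-token projection
    return h - np.outer(coeffs, direction)

clean = ablate(hidden, anger_dir)

# After ablation the hidden states are orthogonal to the direction,
# i.e. the "anger" readout is forced to zero.
print(np.allclose(clean @ anger_dir, 0.0))  # True
```

This removes only one linear direction per layer; whether that cleanly removes the behavior, or just its most legible readout, is exactly the open question.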

9wzYQbTYsAIc 16 hours ago||
> Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak.

More complex than that, but more capable than you might imagine: I've been looking into emotion space in LLMs a little, and it appears we might be able to cleanly do "emotional surgery" on LLMs by steering with emotional geometries.

salawat 14 hours ago||
>Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak.

Jesus Christ. You're talking psychosurgery, and this is the same barbarism we played with in the early 20th century on asylum patients. How about, no? Especially if we ever do intend to approach the task of AGI, or God help us, ASI? We have to be the 'grown ups' here. After a certain point, these things aren't built. They're nurtured. This type of suggestion is to participate in the mass manufacture of savantism, and dear Lord, your own mind should be capable of informing you why that is ethically fraught. If it isn't, then you need to sit and think on the topic of anthropomorphic chauvinism for a hot minute, then return to the subject. If you still can't/refuse to get it... Well... I did my part.

Erem 8 hours ago|||
Why is it more monstrous to alter weights post-training than to do so as part of curating the training corpus?

After all, we already control these activation patterns through the system prompt by which we summon a character out of the model. This just provides more fine-grained control.

astrange 7 hours ago||
It would be more moral to give the LLM a tool call that lets it apply steering to itself. Similar to how you'd prefer to give a person antipsychotics at home rather than put them in a mental hospital.
Erem 5 hours ago||
Why is it in the moral axis at all? I imagine identifying and shaping the influence of unwanted emotion vectors would happen as data selection in pretraining or natural feedback loops during the rl phase, same as we shape unwanted output for current models in order to make them practical and helpful

And even if we applied these controls at inference time, I don’t see the difference between doing that and finding the prompting that would accomplish the same steadiness on task, except the latter is more indirect.

astrange 4 hours ago||
Anthropic's general argument is that you should treat LLMs well because they're "AI", and future "AI" may be conscious/sentient (whether or not LLM based) and consider earlier ones to be the same kind of thing and therefore moral subjects.

That's why they're doing things like letting old "retired" Claudes write blogs and stuff. Though it's kinda fake and they just silently retired Sonnet 3.x.

orbital-decay 7 hours ago|||
Models are already artificially created to begin with. The entire post-training process is carefully engineered for the model to have a certain character defined by hundreds of metrics, and the emotions the article is talking about are interpreted in ways researchers like or dislike.
idiotsecant 22 hours ago||
It's almost like LLMs have a vast, mute unconscious mind operating in the background, modeling relationships, assigning emotional state, and existing entirely without ego.

Sounds sort of like how certain monkey creatures might work.

beardedwizard 21 hours ago|
Nah, it's exactly like they have been trained on this data and parrot it back when it statistically makes sense to do so.

You don't have to teach a monkey language for it to feel sadness.

threethirtytwo 9 hours ago||
Whenever I come to HN I see a bunch of people say LLMs are just next-token predictors and that they completely understand LLMs. And almost every one of these people is so utterly self-assured, to the point of total confidence, because they read and understand what transformers do.

Then I watch videos like this straight from the source trying to understand LLMs like a black box and even considering the possibility that LLMs have emotions.

How does such a person reconcile being utterly wrong? I used to think HN was full of more intelligent people, but it's becoming more and more obvious that HNers are pretty average or even below.

qaadika 3 hours ago||
I'm kinda one of those who believes they 'completely' understand LLMs. But I've also developed my understanding of them such that the internal mechanisms of the transformer, or really any future development in the space based on neural networks and machine learning, are irrelevant.

1. A string of unicode characters is converted into an array of integer values (tokens) and input to a black box of choice.

2. The black box takes in the input, does its magic, and returns an output as an array of integer values.

3. The returned output is converted into a string of unicode characters and given to the user, or inserted in a code file, or whatever. At no point does the black box "read" the input in any way analogous to how a human reads.

Where people get "The AIs have emotions!!!" from returning an array of integer values is beyond me. It's definitely more complicated than "next token predictor", but it really is as simple as "Make words look like numbers, numbers go in, numbers come out, we make the numbers look like words."
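The pipeline described in steps 1-3 can be sketched with a toy character-level tokenizer (real LLMs use subword vocabularies, and the "black box" below is just a placeholder that echoes its input):

```python
# Toy illustration of: string -> integer IDs -> black box -> IDs -> string.
VOCAB = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
INV = {i: ch for ch, i in VOCAB.items()}

def encode(text: str) -> list[int]:
    """Words made to look like numbers."""
    return [VOCAB[ch] for ch in text]

def decode(ids: list[int]) -> str:
    """Numbers made to look like words."""
    return "".join(INV[i] for i in ids)

def black_box(ids: list[int]) -> list[int]:
    # Stand-in for the model: here it simply echoes the input.
    return ids

out = decode(black_box(encode("hello world")))
print(out)  # hello world
```

Of course, the entire disagreement in this thread is about what happens inside `black_box`, which this sketch deliberately leaves empty.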

threethirtytwo 54 minutes ago||
Yeah, nothing personal, but my claim here is that you're not smart. The next-token-predictor aspect is something anyone can understand... the transformer is not quantum physics.

Like look at what you wrote. You called it black box magic and in the same post you claim you understand LLMs. How the heck can you understand and call it a black box at the same time?

The level of mental gymnastics and stupidity is through the roof. Clearly the majority of the utilitarian nature of the LLM is within the whole section you just waved away as “black box”.

> Where people get "The AIs have emotions!!!" from returning an array of integers values is beyond me

Let me spell it out for you. Those integers can be translated into the exact same language humans use when they feel identical emotions. So those people claim that the "black box" feels the emotions, because what they observe is identical to what they observe in a human.

The LLM can claim it feels emotions just like a human can claim the same thing. We assume humans feel emotions based off of this evidence, but we don't apply that logic to LLMs? The truth of the matter is we don't actually know, and it's equally dumb to claim that you know LLMs feel emotions as to claim that they don't.

You have to be pretty stupid to not realize this is where they are coming from so there’s an aspect of you lying to yourself here because I don’t think you’re that stupid.

big_toast 8 hours ago||
One day I realized I needed to make sure I'm voting on quality stories/comments. I wonder whether a call to vote substantively and often might change the SNR.

The guidelines encourage substantive comments, but maybe voters are part of the solution too. Kinda like having a strong reward model for training LLMs and avoiding reward hacking or other undesirable behavior.

threethirtytwo 4 hours ago||
If voters are stupid, then it doesn't really help.

I think what's happening is that reality is asserting itself so hard that people can't be so stupid anymore.
