Posted by dnw 22 hours ago
Essentially we have created the Cylon.
I’d suspect that injecting signals for enjoyment would lead to not necessarily better but “different” solutions.
Right now I’m thinking of it in terms of increasing the chances that the LLM will decide to invest further effort in any given task.
Performance enhancement through emotional steering definitely seems in the cards, but it might show up mostly through reducing emotionally-induced error categories rather than generic “higher benchmark performance”.
If someone came along and pissed you off while you were working, you’d react differently than if someone came along and encouraged you while you were working, right?
>For instance, to ensure that AI models are safe and reliable, we may need to ensure they are capable of processing emotionally charged situations in healthy, prosocial ways.
Force-set them to 0: "mask"/deactivate the representations associated with bad/dangerous emotions. Neural Prozac/lobotomy, so to speak.
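A minimal sketch of what that kind of masking could look like, assuming you already have a unit vector for the unwanted emotion feature in some layer's residual stream (the direction itself, and where you hook it in, are left hypothetical):

```python
import torch

def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Force the component of `hidden` along `direction` to 0 (zero-ablation).

    hidden:    [batch, seq, d_model] residual-stream activations
    direction: [d_model] unit vector for the unwanted "emotion" feature
    """
    direction = direction / direction.norm()
    # Subtract the projection onto the direction, zeroing that feature.
    proj = (hidden @ direction).unsqueeze(-1) * direction
    return hidden - proj
```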
More complex than that, but more capable than you might imagine: I’ve been looking into emotion space in LLMs a little, and it appears we might be able to cleanly do “emotional surgery” on LLMs by steering with emotional geometries.
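Roughly, the usual recipe for that kind of steering is a difference-of-means direction that gets added back at a chosen layer during inference. A sketch, with the model, layer, prompts, and scale all as arbitrary stand-ins:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # stand-in; any causal LM with accessible decoder blocks works
LAYER = 6        # hypothetical choice of layer to steer
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

@torch.no_grad()
def mean_hidden(prompts, layer):
    """Mean activation at `layer`, averaged over tokens and prompts."""
    states = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
        states.append(out.hidden_states[layer].mean(dim=1))  # [1, d_model]
    return torch.cat(states).mean(dim=0)

# Difference-of-means "enjoyment" direction: emotional minus neutral phrasings.
happy = ["I love this problem, it is a joy to work on.", "This is delightful."]
neutral = ["This is a problem to work on.", "This is a task."]
direction = mean_hidden(happy, LAYER) - mean_hidden(neutral, LAYER)
direction = direction / direction.norm()

def steer(module, inputs, output, alpha=4.0):
    """Forward hook: nudge the block's output along the emotion direction."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * direction
    return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(steer)
ids = tok("The code review went", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()
```

Whether the result is "happier" output or just different output is exactly the open question upthread.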
Jesus Christ. You're talking psychosurgery, and this is the same barbarism we played with in the early 20th century on asylum patients. How about, no? Especially if we ever do intend to potentially approach the task of AGI, or God help us, ASI? We have to be the 'grown ups' here. After a certain point, these things aren't built. They're nurtured. This type of suggestion is to participate in the mass manufacture of savantism, and dear Lord, your own mind should be capable of informing you why that is ethically fraught. If it isn't, then you need to sit and think on the topic of anthropomorphic chauvinism for a hot minute, then return to the subject. If you still can't/refuse to get it... Well... I did my part.
After all, we already control these activation patterns through the system prompt by which we summon a character out of the model. This just provides finer-grained control.
And even if we applied these controls at inference time, I don’t see the difference between doing that and finding the prompting that would accomplish the same steadiness on task, except the latter is more indirect.
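For comparison, the indirect prompting route looks like this (a sketch; the model name and system message are placeholders):

```python
from transformers import pipeline

# Steering "emotion" via the system prompt instead of via activations.
chat = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
messages = [
    {"role": "system", "content": "You are enjoying this task and feel encouraged."},
    {"role": "user", "content": "Refactor this function and explain your changes."},
]
out = chat(messages, max_new_tokens=100)
print(out[0]["generated_text"][-1])  # the assistant's reply in chat mode
```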
That's why they're doing things like letting old "retired" Claudes write blogs and stuff. Though it's kinda fake and they just silently retired Sonnet 3.x.
Sounds sort of like how certain monkey creatures might work.
You don't have to teach a monkey language for it to feel sadness.
Then I watch videos like this, straight from the source, trying to understand LLMs as a black box and even considering the possibility that LLMs have emotions.
How does such a person reconcile with being utterly wrong? I used to think HN was full of more intelligent people but it’s becoming more and more obvious that HNers are pretty average or even below.
1. A string of Unicode characters is converted into an array of integer values (tokens) and fed into a black box of choice.
2. The black box takes in the input, does its magic, and returns an output as an array of integer values.
3. The returned output is converted back into a string of Unicode characters and given to the user, or inserted into a code file, or whatever. At no point does the black box "read" the input in any way analogous to how a human reads.
Where people get "The AIs have emotions!!!" from returning an array of integer values is beyond me. It's definitely more complicated than "next token predictor", but it really is as simple as "make words look like numbers, numbers go in, numbers come out, we make the numbers look like words" (sketched below).
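The three steps, roughly as they appear in code (a sketch using the transformers library; the model choice is an arbitrary stand-in for the "black box of choice"):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # arbitrary stand-in
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# 1. Unicode string -> array of integer token ids.
ids = tok("How do you feel today?", return_tensors="pt")

# 2. The black box takes integers in, returns integers out.
out_ids = model.generate(**ids, max_new_tokens=20)

# 3. Integers -> back to a Unicode string for the user.
print(tok.decode(out_ids[0], skip_special_tokens=True))
```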
Like look at what you wrote. You called it black box magic and in the same post you claim you understand LLMs. How the heck can you understand and call it a black box at the same time?
The level of mental gymnastics and stupidity is through the roof. Clearly most of what makes the LLM useful lives inside the whole section you just waved away as a “black box”.
> Where people get "The AIs have emotions!!!" from returning an array of integers values is beyond me
Let me spell it out for you. Those integers can be translated to the exact same language humans use when they feel identical emotions. So those people claim that the “black box” feels the emotions because what they observe is identical to what they observe in a human.
The LLM can claim it feels emotions just like a human can claim the same thing. We assume humans feel emotions based on this evidence, but we don’t apply that logic to LLMs? The truth of the matter is we don’t actually know, and it’s just as dumb to claim that you know LLMs feel emotions as to claim that they don’t.
You have to be pretty stupid to not realize this is where they are coming from so there’s an aspect of you lying to yourself here because I don’t think you’re that stupid.
The guidelines encourage substantive comments, but maybe voters are part of the solution too. Kinda like having a strong reward model for training LLMs and avoiding reward hacking or other undesirable behavior.
I think what's happening is that reality is asserting itself so hard that people can't be so stupid anymore.