Where the goblins came from

Posted by ilreb 11 hours ago

792 points | 472 comments | page 3

iterateoften 10 hours ago|

This is funny because it’s a silly topic, but I think it shows something seriously wrong with LLMs.

The goblins stand out because it’s obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because it’s not as obvious.

Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain it had to be added to system prompt.

ninjagoo 10 hours ago||

> Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain it had to be added to system prompt.

May I introduce you to homo sapiens, a species so vulnerable to such subtle (or otherwise) biases (and affiliations) that they had to develop elaborate and documented justice systems to contain the fallouts? :)

chongli 10 hours ago|||

We’re really not that vulnerable to such things as a species, because we as individuals all have our own minds and our own sets of biases that cancel out and get lost in the noise. If we all had the exact same bias then it would be a huge problem.

arglebarnacle 10 hours ago|||

I hear you but of course history is full of examples of biases shared across large groups of people resulting in huge human costs.

The analogy isn’t perfect, but the way humans learn about their world is full of opportunities to introduce and sustain these large correlated biases: social pressure, tradition, parenting, education standardization. Not all of them are bad, of course, but some are, and many others are at least as weird as stray references to goblins and creatures.

Ekaros 6 hours ago||||

Doesn't that depend on the biases in question? Many argue that homogeneous societies do many things better. And part of homogeneity is sharing the same set of biases.

lifis 5 hours ago||||

And what do you think society/culture is?

It's a set of biases installed in people, whose purpose is mostly to replicate themselves.

Humans are MORE susceptible than LLMs, because LLMs' biases are easily steered to something else, unlike most humans'.

ninjagoo 10 hours ago||||

> If we all had the exact same bias then it would be a huge problem.

And may I introduce you to "groupthink" :))

Dylan16807 10 hours ago||

Now imagine that every opinion you have is automatically fully groupthinked and you see the difference/problem with training up a big AI model that has a hundred million users.

The problem does exist when using individual humans but in a much smaller form.

ninjagoo 10 hours ago||

> The problem does exist when using individual humans but in a much smaller form.

And may I introduce you to organized religion :)

Dylan16807 9 hours ago||

That's still a lot smaller!

Make a major religion where everyone is a scifi clone of one person including their memories and then it'll be in the same ballpark of spreading bias.

jychang 10 hours ago|||

> We’re really not that vulnerable to such things as a species, because we as individuals all have our own minds and our own sets of biases that cancel out and get lost in the noise.

[Citation Needed]

If you have a species-wide bias, people within the species would not easily recognize it. You can't claim with a straight face that "we're really not that vulnerable to such things".

For example, I think it's pretty clear that all humans are vulnerable to phone addiction, especially kids.

hbs18 4 hours ago|||

An LLM is a computer program, which isn't a human. You wouldn't excuse a calculator being occasionally wrong because humans sometimes get manual calculations wrong too.

snakebiteagain 9 hours ago|||

Mandatory reading on that topic: www.anthropic.com/research/small-samples-poison

We're probably not noticing a LOT of malicious attempts at poisoning major AIs, only because we don't know what keywords to ask about (but the scammers do and will abuse it).

tptacek 10 hours ago|||

I think it's extraordinarily telling that people are capable of being reflexively pessimistic in response to the goblin plague. It's like something Zitron would do.

This story is wonderful.

bitexploder 10 hours ago||

I feel at least partially responsible. I would often instruct agents to "stop being a goblin". I really enjoyed this story too, though.

bitexploder 10 hours ago|||

We do not have the complete picture.

ordinarily 10 hours ago||

Doesn't seem that surprising or terrifying to me. Humans come equipped with a lot more internal biases (learned in a fairly similar fashion), and they're usually a lot more resistant to getting rid of them.

The truly terrifying stuff never makes it out of the RLHF NDAs.

Terr_ 10 hours ago|||

We ought to be terrified, when one adjusts for all the use-cases people are talking about using these algorithms in. (Even if they ultimately back off, it's a lot of frothy bubble opportunity cost.)

There are a great many things people do which are not acceptable in our machines.

Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.

famouswaffles 9 hours ago||

>Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.

You might if that was the best an autopilot could be. Have you never used a bus or taken a taxi?

The vast majority of things people are using LLMs for isn't stuff deterministic logic machines did great at, but stuff those same machines did poorly at or straight up stuff previously relegated to the domains of humans only.

If your competition also "just zones out sometimes" then it's not something you're going to focus on.

agnishom 10 hours ago|||

Humans also take a lot of time in producing output, and do not feed into a crazy accelerationistic feedback loop (most of the time).

2dvisio 8 hours ago||

I’ve been having consistent issues with it adding Hindi words (usually just one) in the middle of its output. It sounds like others have been having this too: https://news.ycombinator.com/item?id=47832912. I don’t speak Hindi and have never asked it to translate anything into Hindi.

dtech 8 hours ago||

I wonder if a proportionally large amount of RLHF was done by Indians which causes this behavior.

djyde 4 hours ago||

My Claude often starts sleep-talking in Korean suddenly.

SomewhatLikely 8 hours ago||

Checking my history, I searched ["chaos goblin" chatgpt] on March 6th after seeing too many goblins and gremlins and didn't find anyone talking about it then. I did have the nerdy personality turned on, and in my testing of ChatGPT 5.5 I noticed the nerdy personality was gone because some responses were not considering as many plausible interpretations or covering as many useful answers as the response recorded for 5.4. Rather than having the LLM guess the most plausible interpretation and focus on the most likely answer, I prefer a more well-rounded response, and if I want less I'll scan. Anyway, after seeing the personality was gone I just added a custom instruction to take on a nerdy persona and got back my desired behavior. But the gremlins and goblins are also back, so I don't think their mitigation is strong enough to overcome the personality tuning.

rippeltippel 8 hours ago||

I started reading this article with keen interest, expecting some deep fix involving arcane model weights. Instead it was "Never talk about goblins", justified by Codex being "quite nerdy". Bottom line: even OpenAI have to raise their hands when facing the complexity of LLMs.

bahadiraydin 9 hours ago||

I'd like to see them explain why AI has such a distinctive writing style that is very easy to detect most of the time. Even though it has made immense progress in coding, it didn't get better at writing.

lelanthran 3 hours ago||

If coding in some language was your native language, you'd pick it up.

I pick up the equivalent of "the core insight" in code when I am programming in my primary language (30 years of daily usage), but I don't see it in languages that I am not as fluent in (say... 10 years of daily usage).

My guess is that all those people who gush about AI output and have 30 years of experience have broad experience in many stacks but not primary-language fluency in any specific language, like they have for English.

slopinthebag 8 hours ago|||

it's as good at writing as it is at coding, you just can't tell the difference between them

mrob 1 hour ago|||

Repetitive patterns in code are called "idiomatic" and are considered a good thing. Repetitive patterns in writing are just bad writing.

Tenoke 7 hours ago|||

Its style of writing text is very readable, if aesthetically meh. This is what I care about in how code is written anyway.

BOOSTERHIDROGEN 8 hours ago||

The vector syncopancy is very informal for human writing, while programming itself is already a "formal" language.

maxdo 11 hours ago||

article :

bla blah blah, marketing... we are fun people, bla blah, goblin, we will not destroy the world you live in.. RL rewards bug is a culprit. blah blah.

luke-stanley 1 hour ago||

Yeah, though it's not great marketing. Especially for hiring interpretability researchers. Their own alignment research has reward model interpretability, personality features and so on (see https://alignment.openai.com). It just seems like a different department wrote it, which is a shame because I'd love to read about goblin feature vectors and functional emotions.

llbbdd 11 hours ago|||

someone woke up on the wrong side of the goblin today

blinkbat 10 hours ago||

real goblin-y response

zahirbmirza 5 hours ago||

I find it worrying that a handful of software companies will define what classifies personality "type".

tomasantunes89 2 hours ago||

"Goblin Mode" was Oxford's 2022 Word of the Year.

red_admiral 6 hours ago|

"goblins showing up in an inappropriate context" is my favourite (para)phrase of the day. It feels like the setting for a D&D campaign - no wonder the "Nerdy" personality is affected.

(For Dwarf Fortress, it would just be a normal day.)
