Posted by ilreb 16 hours ago

Where the goblins came from(openai.com)
969 points | 579 comments
innis226 15 hours ago|
I suspect this was intentionally added. Just to give some personality and to fuel hype
pezgrande 11 hours ago||
They should call it "El Quijote" syndrome
hansmayer 13 hours ago||
> We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.

WTF does this even mean? How the hell do you do something like this "unknowingly"? What other features are you bumping "unknowingly"? Suicide suggestions or weapon instructions come to mind. Horrible, this ship obviously has no captain!

ben_w 13 hours ago|
Yes? They know, they've always known. Why do you think they've been saying, since GPT-2, not even ChatGPT, that their LLMs need careful study before being released?
hansmayer 11 hours ago||
Well obviously they have - but the press and the common folk still treat these people as some kind of geniuses, when they are obviously more similar to that junior dev using some framework without understanding its internals.
ben_w 11 hours ago||
FWIW, none of the press or public I see regard them that highly (but, I live in Berlin); mostly it's the technically minded people who see them as geniuses (because we can't get those jobs), while the general public find examples which the AI can't do (strawberry, walk to car wash) and share them around with disappointment, wondering "why can't these teams fix such simple bugs?"
hansmayer 10 hours ago||
> while the general public find examples which the AI can't do

We must have very different experiences with the general public then, because from my interactions there are non-tech demographics leaning way too heavily into it:

- teachers
- realtors
- generic "office workers"
- and even some doctors!

What they all have in common: they seem highly unaware of the technology's deficiencies, using it routinely and daily as if it were some kind of upgraded Google search.

wewewedxfgdf 13 hours ago||
It should be OK for AI to develop personality traits.
JoshTriplett 16 hours ago||
A plausible theory I've seen going around: https://x.com/QiaochuYuan/status/2049307867359162460
NonHyloMorph 1 hour ago||
I like this. There's an interesting Terry Pratchett novel where some guy finds out he's actually an orc (quite different from the high-fantasy concept of orcs). There are also goblins, little wretched creatures, and the manifest anthropomorphised Darkness which speaks to Commander Samuel Vimes, commander of the Night Watch, the police force of Ankh-Morpork. Vimes, who is the guarantor of bottom-up working-class justice and integrity, is led by the Darkness at some point to help the goblins, because there is no creature too wretched to find refuge in the Darkness. Loosely resonates.
danpalmer 15 hours ago|||
If you tell an LLM it's a mushroom you'll get thoughts considering how its mycelium could be causing the goblins.

This "theory" is simply role playing and has no grounding in reality.

krackers 15 hours ago|||
I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin?
palmotea 15 hours ago|||
> I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin?

Speculation: because nerds stereotypically like sci-fi and fantasy to an unhealthy degree, and goblins, gremlins, and trolls are fantasy creatures that stereotype should like? Then maybe goblins hit a sweet spot where the problem could sneak up on them: matching the stereotype, but not so out of place as to be immediately obnoxious.

autumnstwilight 14 hours ago||||
Perhaps it has something to do with recent human trends for saying "goblin" or "gremlin" to describe... basically the opposite of dignified and socially acceptable behavior, like hunching under a blanket, unshowered, playing video games all day and eating shredded cheese directly out of the bag.

The fact that it was strongly associated with the "nerdy" personality makes me think of this connection.

NonHyloMorph 2 hours ago||
Check out goblin style in queer culture ;)
in-silico 13 hours ago|||
Either someone hard-coded it in a system prompt to the reward model (similar to how they hard-coded it out), or the reward model mixed up some kind of correlation/causation in the human preference data (goblins are often found in good responses != goblins make responses good). It's also possible that human data labellers really did think responses with goblins were better (in small doses).
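The correlation/causation mix-up described above can be sketched in a few lines. This is a toy illustration with invented data and names, not how a real reward model is trained: a naive per-token "reward" derived from preference labels assigns a perfect score to "goblin" simply because it only ever co-occurred with good responses.

```python
# Hypothetical sketch: a spurious "goblin" correlation learned from
# toy preference data. All data below is invented for illustration.
from collections import Counter

# (response_tokens, human_rated_good)
data = [
    (["here", "is", "a", "goblin", "metaphor"], True),
    (["a", "playful", "goblin", "aside"], True),
    (["dry", "literal", "answer"], False),
    (["terse", "reply"], False),
    (["thorough", "helpful", "answer"], True),
]

good = Counter()
total = Counter()
for tokens, rated_good in data:
    for t in set(tokens):
        total[t] += 1
        if rated_good:
            good[t] += 1

# Naive per-token "reward" = P(rated good | token appears).
reward = {t: good[t] / total[t] for t in total}

# "goblin" only ever appeared in good responses, so it scores 1.0:
# correlation mistaken for causation.
print(reward["goblin"])  # 1.0
print(reward["dry"])     # 0.0
```

A learned reward model does something far more sophisticated, but the failure mode is the same shape: a feature that co-occurs with good responses gets credited as if it caused them.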
yard2010 13 hours ago|||
I love the people thinking "I should ask ChatGPT and copy pasta the response to the (tweet|gh comment)"
dakolli 16 hours ago||
It is a stateless text/pixel auto-complete; it has no reference of self. Stop spreading this BS.
doph 15 hours ago|||
is a kv cache not a kind of state? what does statefulness have to do with selfhood? how does a system prompt work at all if these things have no reference to themselves?
danpalmer 15 hours ago||
The kv cache is not persistent. It's a hyper-short-term memory.
in-silico 13 hours ago||
Modern kv caches can contain up to 1 million tokens (~3000 pages of text). It's not that short, it's like 48 straight hours of reading.
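A quick back-of-envelope check of those figures, using common rough constants (the conversion factors are assumptions, not measurements):

```python
# Rough sanity check: 1M tokens in pages and reading hours.
tokens = 1_000_000
words = tokens * 0.75        # assume ~0.75 English words per token
pages = words / 250          # assume ~250 words per printed page
hours = words / 250 / 60     # assume reading at ~250 words per minute

print(round(pages))  # 3000
print(round(hours))  # 50
```

So roughly 3000 pages and on the order of 50 hours of continuous reading, in the same ballpark as the figures quoted.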
danpalmer 7 hours ago||
Yes and no, it's not just text, it's images, video, etc, and it's not just the pages of content, it's also all the "thinking" as well. Plus the models tend to work better earlier on in the context.

I regularly get close to filling up context windows and have to compact the context. I can do this several times in one human session of me working on a problem, which you could argue is roughly my own context window.

My point though was that almost nothing of the model's knowledge is in the context, it's all in the training. We have no functional long term memory for LLMs beyond training.

cyanydeez 3 hours ago||
The KV cache isn't memory; it's the saved intermediate state of the process, so inference can resume where the last generated output is concatenated with the next input. It's entirely about saving compute and has nothing to do with memory.

This really obscures how stupid LLMs are: they're just text logs as output and text logs as input; hence the goblins are just tokens that are problematically more probable in the output.

But the KV cache is a thing made to keep a session from having to re-run the entire inference. The only sense in which you could call it "memory" is that there are no random perturbations in the KV cache, while there may be when re-running a chat, which ends up being non-deterministic. You can think of it as a deterministic seed that keeps a conversation from its normal non-deterministic drift.
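The compute-saving argument can be sketched without any real transformer machinery. This is a minimal illustration under invented stand-ins (the `kv_for` projection is fake): without a cache, every generation step recomputes keys/values for the whole prefix; with a cache, each step only computes the new token's entry.

```python
# Illustrative sketch only: counting "projection" work with and
# without a KV cache. kv_for is a fake per-token key/value projection.

def kv_for(token):
    return (hash(token) % 97, hash(token) % 89)

def generate_no_cache(prompt, n_steps):
    work = 0
    seq = list(prompt)
    for _ in range(n_steps):
        kvs = [kv_for(t) for t in seq]   # recompute the whole prefix
        work += len(kvs)
        seq.append("tok")                # pretend we generated a token
    return work

def generate_with_cache(prompt, n_steps):
    cache = [kv_for(t) for t in prompt]  # prefix computed exactly once
    work = len(cache)
    for _ in range(n_steps):
        cache.append(kv_for("tok"))      # only the new token's K/V
        work += 1
    return work

print(generate_no_cache("abcd", 3))    # 4 + 5 + 6 = 15 projections
print(generate_with_cache("abcd", 3))  # 4 + 3 = 7 projections
```

Quadratic recomputation becomes linear, which is the whole point: it changes what is computed when, not what the model "remembers".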

mediaman 15 hours ago||||
It has trained on vast amounts of content that contains the concept of self, of course the idea of self is emergent.

And autoregressive LLMs are not stateless.

dakolli 9 hours ago||
> of course the idea of self is emergent

You sound really sure of yourself, thousands of ML researchers would disagree with you that self awareness is emergent or at all apparent in large language models. You're literally psychotic if you think this is the case and you need to go touch grass.

NonHyloMorph 2 hours ago||
There is a difference between the emergence of self-awareness and the emergence of its idea. Probably.
yard2010 12 hours ago||||
Imagine people would just click words on iOS auto complete mistaking this for intelligence:

"I think the problem is that when you don't have to be perfect for me that's why I'm asking you to do it but I would love to see you guys too busy to get the kids to the park and the trekkers the same time as the terrorists."

How do you like this theory?

andai 15 hours ago|||
Ask Claude about Claude.
tim-tday 16 hours ago||
So, you brain damaged your model with a system prompt.
sailfast 6 hours ago||
Posted January 2037 after the end of the second civil conflict and the first robot uprising: “Where the fascism came from”
suncore 11 hours ago||
Marketing grab
leadgenman 11 hours ago|
anyone solving the goblin mystery???
nephihaha 8 hours ago|
Surely the prevalence of fantasy fanfic etc online?