What dangers lurk beneath the surface.
This is not funny.
Here is an academic paper discussing this kind of worry: https://link.springer.com/article/10.1007/s11023-022-09605-x
This is cute now, and a huge problem when future AI does everything and is responsible for problems it isn't even directly optimized for. Who knows what quirks would arise then.
Also to be honest I think OpenAI models struggle a lot with this, I primarily stopped using them in the sycophancy/emoji era but ever since the way they talk or passive aggressively offer to do something with buzzwords just pisses me off so much. Like I’m constantly being negged by a robot because some SFT optimized for that really strongly to the point it can’t even hold a coherent conversation and this is called “AI safety” when it’s just haphazard data labeling
After doing the Karpathy tutorials I tried to train my AI on tiny stories dataset. Soon I noticed that my AI was always using the same name for its stories characters. The dataset contains that name consistently often.
1 This data is still heavily filtered/cleaned
The goblins stand out because it’s obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because it’s not as obvious.
Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain it had to be added to system prompt.
May I introduce you to homo sapiens, a species so vulnerable to such subtle (or otherwise) biases (and affiliations) that they had to develop elaborate and documented justice systems to contain the fallouts? :)
The analogy isn’t perfect of course but the way humans learn about their world is full of opportunities to introduce and sustain these large correlated biases—social pressure, tradition, parenting, education standardization. And not all of them are bad of course, but some are and many others are at least as weird as stray references to goblins and creatures
It's a set of biases installed in people, whose purpose is mostly to replicate themselves.
Humans are MORE susceptible that LLMs, because LLMs's biases are easily steered to something else, unlike most humans.
And may I introduce you to "groupthink" :))
The problem does exist when using individual humans but in a much smaller form.
And may I introduce you to organized religion :)
Make a major religion where everyone is a scifi clone of one person including their memories and then it'll be in the same ballpark of spreading bias.
[Citation Needed]
Just because if you have a species-wide bias, people within the species would not easily recognize it. You can't claim with a straight face that "we're really not that vulnerable to such things".
For example, I think it's pretty clear that all humans are vulnerable to phone addiction, especially kids.
We're probably not noticing a LOT of malicious attempts at poisoning major AI's only because we don't know what keywords to ask (but the scammers do and will abuse it).
This story is wonderful.
The truly terrifying stuff never makes it out of the RLHF NDAs.
There a great many things people do which are not acceptable in our machines.
Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.
You might if that was the best auto-pilot could be. Have you never used a bus or taken a taxi ?
The vast majority of things people are using LLMs for isn't stuff deterministic logic machines did great at, but stuff those same machines did poorly at or straight up stuff previously relegated to the domains of humans only.
If your competition also "just zones out sometimes" then it's not something you're going to focus on.