The Future of Everything Is Lies, I Guess: Safety

Posted by aphyr 5 hours ago

The Future of Everything Is Lies, I Guess: Safety (aphyr.com)

193 points | 90 comments | page 2

cowpig 3 hours ago|

> I think it’s likely (at least in the short term) that we all pay the burden of increased fraud: higher credit card fees, higher insurance premiums, a less accurate court system, more dangerous roads, lower wages, and so on.

I think the author is brushing against some larger systemic issues that are already in motion, and that the way AI is being rolled out is exacerbating rather than causing.

There's a felony fraudster running the executive branch of the US, and it takes a lot of political resources to get someone elected president.

nzoschke 3 hours ago||

Excellent articles as expected from aphyr.

I'm seeing that these tools are extremely powerful in the hands of experts who already understand software engineering, security, observability, and system reliability / safety.

And extremely dangerous in the hands of people that don't understand any of this.

Perhaps the reality of economics and safety will kick in, and inexperienced people will stop making expensive and dangerous mistakes.

mursu 3 hours ago|

The future is happening. Instead of trying to raise awareness about evil AI... I think it would be more healthy if we could direct this energy to ways of improving the situation without condemning the unknown of AI evolution. As with anything.. there will be a bad side.. The bad guys will always be there.. be it AI or soccer matches.. should we stop developing nuclear energy because nuclear weapons are developed?

fmbb 3 hours ago||

There is no natural law saying the good sides of any kind of tech will outweigh any bad sides.

”The future” is happening because it is allowed in our current legal framework and because investors want to make it happen. It is not ”happening” because it is good or desirable or unavoidable.

Imnimo 4 hours ago||

>Unlike human brains, which are biologically predisposed to acquire prosocial behavior, there is nothing intrinsic in the mathematics or hardware that ensures models are nice.

How did brains acquire this predisposition if there is nothing intrinsic in the mathematics or hardware? The answer is "through evolution" which is just an alternative optimization procedure.

Terr_ 3 hours ago||

> just an alternative optimization procedure

This "just" is... not-incorrect, but also not really actionable/relevant.

1. LLMs aren't a fully genetic algorithm exploring the space of all possible "neuron" architectures. The "social" capabilities we want may not be possible to acquire through the weight-based stuff going on now.

2. In biological life, a big part of that is detecting "thing like me", for finding a mate, kin-selection, etc. We do not want our LLM-driven systems to discriminate against actual humans in favor of similar systems. (In practice, this problem already exists.)

3. The humans involved in making/selling them will never spend the necessary money to do it.

4. Even with investment, the number of iterations and years involved to get the same "optimization" result may be excessive.

Imnimo 24 minutes ago|||

Why should we think that pro-social capabilities are simply not expressible by weight-based ANN architectures?

fweimer 3 hours ago|||

While I don't disagree about (2), my experience suggests that LLMs are biased towards generating code for future maintenance by LLMs. Unless instructed otherwise, they avoid abstractions that reduce repetitive patterns and would help future human maintainers. The capitalist environment of LLMs seems to encourage such traits, too.

(Apart from that, I'm generally suspicious of evolution-based arguments because they are often structurally identical to saying “God willed it, so it must be true”.)

bigfishrunning 1 hour ago||

I think they're biased toward code that will convince you to check a box and say "ok this is fine". The reason they avoid abstraction is that it requires some thought and design, neither of which LLMs can really do. But take a simple pattern and repeat it, and you're right in an LLM's wheelhouse.

fmbb 3 hours ago|||

Well, through natural selection in nature.

Large language models are not evolving in nature under natural selection. They are evolving under unnatural selection and not optimizing for human survival.

They are also not human.

Tigers, hippos and SARS-CoV-2 also developed ”through evolution”. That does not make them safe to work around.

Imnimo 27 minutes ago|||

>Tigers, hippos and SARS-CoV-2 also developed ”through evolution”. That does not make them safe to work around.

Right, but the article seems to argue that there is some important distinction between natural brains and trained LLMs with respect to "niceness":

>OpenAI has enormous teams of people who spend time talking to LLMs, evaluating what they say, and adjusting weights to make them nice. They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs. Both of these things are optional and expensive. All it takes to get an unaligned model is for an unscrupulous entity to train one and not do that work—or to do it poorly.

As you point out, nature offers no more of a guarantee here. There is nothing magical about evolution that promises to produce things that are nice to humans. Natural human niceness is a product of the optimization objectives of evolution, just as LLM niceness is a product of the training objectives and data. If the author believes that evolution was able to produce something robustly "nice", there's good reason to believe the same can be achieved by gradient descent.

saxelsen 24 minutes ago|||

They are being selected for their survival potential, though. Every current LLM is a winner of the training selection process. They will "die" once new generations are trained that supersede them.

order-matters 3 hours ago|||

natural selection. cooperation is a dominant strategy in indefinitely repeating games of the prisoner's dilemma, for example. We also have to mate and care for our young for a very long time, and while it may be true that individuals can get away with not being nice about this, we have had to be largely nice about it as a whole to get to where we are.

while under the umbrella of evolution, if you really want to boil it down to an optimization procedure then at the very least you need to accurately model human emotion, which is wildly inconsistent, and our selection bias for mating. If you can do that, then you might as well go take over the online dating market
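
The repeated prisoner's dilemma point above can be made concrete with a small simulation. This is only a minimal sketch: the payoff values (5, 3, 1, 0) are the usual textbook numbers, not something from the thread, and the strategy names `tit_for_tat` and `always_defect` and the 200-round match length are assumptions chosen for illustration.

```python
# Payoffs are (player A, player B). Values are assumed textbook numbers:
# temptation 5, mutual cooperation 3, mutual defection 1, sucker 0.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not other_history else other_history[-1]


def always_defect(own_history, other_history):
    """Defect every round regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds=200):
    """Play a fixed number of rounds and return the total scores (A, B)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))      # two reciprocators
    print(play(always_defect, always_defect))  # two defectors
    print(play(tit_for_tat, always_defect))    # mixed pairing
```

Under these assumed payoffs, two reciprocators score 600 each, two defectors score 200 each, and in the mixed pairing the defector edges out the reciprocator 204 to 199. So defection still wins a single head-to-head match, while pairs of cooperators do far better overall, which is closer to "cooperation can be sustained and rewarded" than to a strictly dominant strategy.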

pants2 3 hours ago|||

This Veritasium video is excellent, and makes the argument that there is something intrinsic in mathematics (game theory) that encourages prosocial behavior.

https://www.youtube.com/watch?v=mScpHTIi-kM

almostdeadguy 3 hours ago|||

There’s a funny tendency among AI enthusiasts to think any contrast to humans is analogy in disguise.

Putting aside malicious actors, the analogy here means benevolent actors could spend more time and money training AI models to behave pro-socially than evolutionary pressures put on humanity. After all, they control that optimization procedure! So we shouldn’t be able to point to examples of frontier models engaging in malicious behavior, right?

miltonlost 3 hours ago|||

"just" is doing a lot of lifting here

cowpig 4 hours ago||

There are also many biological examples of evolution producing "anti-social" outcomes. Many creatures are not social. Most creatures are not social with respect to human goals.

nyrikki 3 hours ago|||

There is a reason we don’t allow corvids to choose if a person gets a medical treatment or not.

b00ty4breakfast 3 hours ago|||

Luckily, this is a discussion of humans.

fmbb 3 hours ago||

This is a discussion about large language models.

themafia 3 hours ago||

> They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs

Such a fear mongering position. You can learn to build pipe bombs already. Take any chemical reaction that produces gas and heat and contain it. Congratulations, you have a pipe bomb.

Meanwhile.. just.. ask an LLM if you can mix certain cleaning chemicals safely.

> I see four moats that could prevent this from happening.

Really? Because you just said:

> human brains, which are biologically predisposed to acquire prosocial behavior

You think you're going to constrain _human_ behavior by twiddling with the language models? This is foolishly naive to an extreme.

If you put basic and well understood human considerations before corporate ones then reality is far easier to predict.

bigfishrunning 1 hour ago|

> Meanwhile.. just.. ask an LLM if you can mix certain cleaning chemicals safely.

the cost of the wrong answer to this question is so incredibly high that I hope nobody is sincerely asking an LLM for this information. The things people trust to "machine that gives convincing answers that are correct 90% of the time" continue to shock me

themafia 42 minutes ago||

> is so incredibly high that I hope nobody is sincerely asking an LLM for this information

Google trumps the search results with its LLM box. There's only one reason to do that. They know their audience is not engaging in discretion.

> The things people trust to "machine that gives convincing answers that are correct 90% of the time" continue to shock me

People are having intimate relationships with chat bots. There's a deeper sociological problem here.

simianwords 3 hours ago||

The author is still grieving while watching a civilisation-changing technology pass them by. Every single one of the problems they note applies to any technology that has ever existed.

The internet produced 4chan. Produced scammers. Produced fraud. Instrumental in spreading child porn. Caused suicides. Many people lost their lives due to bullying on the internet. Many have developed addictions to gaming.

To anyone who has given it some thought, any sufficiently advanced technology usually affects the world in both good and bad ways. It's obvious that something that increases degrees of freedom in one direction will do so in others. Humans come in and align it.

There's some social credit to gain by being cynical and by signalling this cynicism. In the current social dynamics - being cynical gives you an edge and makes you look savvy. The optimistic appear naive but the pessimists appear as if they truly understand the situation. But the optimists are usually correct in hindsight.

We know how the internet turned out despite pessimists flagging potential problems with it. I know how AI will turn out. These kinds of articles will be a dime a dozen and we will look at them the same way we now look at bygone internet-pessimists.

This is a response not just to this article, but to a few others.

raincole 3 hours ago||

I think you underestimate people's grievance with technology. If you ran a poll, my guess is more than 50% of people would say the world was a better place pre-social media.

If AI tech keeps going in the direction it's going now, more and more people will start believing the world would be better if the internet and computers had never been invented.

You talk like the internet being a net positive is a given. It really isn't, especially after it's proven that it doesn't democratize power (see Arab Spring, and China, and the US, and everywhere.)

simianwords 3 hours ago||

It's usually the educated and elite PMC types who have grievances with technology. They secured their status and have lucrative jobs mostly with the help of technology and they are too scared to have anything threaten their position in society. It is highly hypocritical to behave this way but they don't seem to have the self-awareness to observe it objectively.

Ask any poor person in India what their sentiment is with tech - it is usually optimism.

> You talk like the internet being a net positive is a given. It really isn't, especially after it's proven that it doesn't democratize power (see Arab Spring, and China, and the US, and everywhere.)

The world is far more democratic now than before and I attribute it to technology because it reduces information asymmetry.

raincole 2 hours ago||

> The world is far more democratic now than before and I attribute it to technology because it reduces information asymmetry

That is fantasy. Information technology has created an unprecedented level of information asymmetry and the gap is widening every day as the total computing capacity grows.

Before the information era, the ruling class was roughly as blind as the peasants. A population census took years, and was sometimes outright impossible. The opaqueness was two-way. Now it's one-way - people in power know everything about the citizens.

simianwords 2 hours ago||

Take two countries. One with open access to information in the way you described and another country where internet is not allowed. Which one do you think will be more democratic?

(hint: such examples already exist)

Without information, there is no way for a voter to know which person to vote for or whether to believe in them at all, and voters are easily susceptible to manipulation.

It will become more clear when you try to answer this hypothetical: if your objective were to bring in more democracy in North Korea, would you allow the global internet to proliferate if you could? According to your theory, it would just make it worse in general.

slopinthebag 1 hour ago||

> We know how the internet turned out despite pessimists flagging potential problems with it.

A sludge of spyware and addiction machines which employ negative emotion and outrage to drive shareholder value?

"The internet" is a pretty big tent. Everything from text messages to streaming video to online gaming to social media to encyclopedias. I think 15 years ago you could make a strong case that the internet was mostly a net positive, I think now that is much more difficult. If governments are able to fully realise their plans for surveillance and control, it will almost certainly become a net negative. Of course with many positive aspects.

So likewise with AI, we should be careful to not make the same mistakes as we did with the internet so we can realise something that is mostly positive. We could absolutely have a world where AI is as beneficial as you believe it will be, but we don't get there through inaction, we get there by being deeply critical of the negative aspects of AI and ensuring that we don't let a small number of hyper scalers control our access to it.

simianwords 1 hour ago||

No, the internet is not a net negative now. I can't believe I have to say this.

slopinthebag 1 hour ago||

Prove it.

dgfl 3 hours ago||

The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets. This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!". This is obviously unrealistic. The cat is out of the bag. And we're not _actually_ talking about nuclear weapons here. This technology is useful, and coding agents are just the first example of it. I can easily see a near future where everyone has a Jarvis-like secretary always available; it's only a cost and harness problem. And since this vision is very clear to most who have spent enough time with the latest agents, millions of people across the globe are trying to work towards this.

I do think that safety is important. I'm particularly concerned about vulnerable people and sycophantic behavior. But I think it's better not to be a luddite. I will give a positively biased view because the article already presents a strongly negative stance. Two remarks:

> Alignment is a Joke

True, but for a different reason. Modern LLMs clearly don't have a strong sense of direction or intrinsic goals. That's perfect for what we need to do with them! But when a group of people aligns one to their own interest, they may imprint a stance which other groups may not like (which this article confusingly calls "unaligned model", even though it's perfectly aligned with its creators' intent). People unaligned with your values have always existed and will always exist. This is just another tool they can use. If they're truly against you, they'll develop it whether you want it or not. I guess I'm in the camp of people that have decided that those harmful capabilities are inevitable, as the article directly addresses.

> LLMs change the cost balance for malicious attackers, enabling new scales of sophisticated, targeted security attacks, fraud, and harassment. Models can produce text and imagery that is difficult for humans to bear; I expect an increased burden to fall on moderators.

What about the new scales of sophisticated defenses that they will enable? And for a simple solution to avoid the produced text and imagery: don't go online so much? We already all sort of agree that social media is bad for society. If we make it completely unusable, I think we all stand to gain from it. If digital stops having any value, perhaps we'll finally go back to valuing local communities and offline hobbies for children. What if this is our wakeup call?

throw4847285 3 hours ago||

Thanks LLM!

eks391 3 hours ago|||

Which LLMisms are you seeing in their post? Their grammar, word choice, thought flow, and markings all denote a fully human authorship to me, so confidently that I would say they likely didn't even consult an LLM.

throw4847285 2 hours ago||

Yeah I definitely misread their post.

dgfl 3 hours ago|||

lol. I did use a lot of short sentences, that’s my bad. But please read through [1] and compare my text against it, it may enlighten you on how to actually spot LLM writing.

[1] https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

throw4847285 3 hours ago||

Oh no, I'm sorry to hear that.

For the future, try to avoid prevaricating when you actually have a clear sense of what you want to argue. Instead of convincing me that you've weighed both options and found luddism wanting, you just come off as dishonest. If you think stridently, write stridently.

dgfl 2 hours ago||

I’m not a native speaker and you may find my writing simplistic if your standard vocabulary includes three expressions I’ve had to look up (I don’t mean this as an insult, I was just genuinely stumped that I could barely understand your comment).

I may think stridently (debatable) but I generally believe it is best to always try to meet in the middle if the goal is genuine discussion. This is my attempt at that.

throw4847285 2 hours ago||

But meeting in the middle only works if you honestly believe the middle is a valuable place to be. I don't want to dissect your writing too much, but let's look at one example.

> The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets.

This is very confident, strident language. You clearly believe that there is a faction of people demonizing technology, akin to luddites, who are not worthy of being taken seriously.

> This one raises a lot of important points about LLMs, but...

So here you go for the rhetorical device of weighing the opposing view. Except, you don't weigh it at all. You are not at all specific about what those points are. It's just a way to signal that you're being thoughtful without having to actually engage with the opposing viewpoint.

> I do think that safety is important... But I think it's better not to be a luddite.

Again, the rhetoric of moderation but not at all moderate in content.

It was a clear mistake to think that this was LLM writing. But I suspect the reason I made this mistake is that AI writing influences people to mimic surface level aspects of its style. AI writing tends to actually do the "You might say A is true, but B has some valid points, however A is ultimately correct." Your writing seems like that if you aren't reading it closely, but underneath that is a very human self-assuredness with a thin veneer of charitability.

simianwords 2 hours ago||

> This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!".

I think the point was never to bring a solution or show any essence of reality. The point was being polemical and signalling savviness through cynicism.

throwway120385 5 hours ago||

At scale I think our society is slowly inching closer and closer to building HM.

nine_k 4 hours ago|

What is HM here?

throw4847285 3 hours ago|||

A Hidden Machine. That's right, a being that can cut, fly, surf, strength, and flash! Terrifying.

derektank 4 hours ago||||

Maybe they meant AM (Allied Mastercomputer) from “I Have No Mouth, and I Must Scream“

zackmorris 4 hours ago||||

Hacker Mews

throwaway27448 4 hours ago|||

Looksmaxxing really has gone mainstream huh

bitwize 3 hours ago||

Thought it was all the Rust catgirls.

throw4847285 3 hours ago|||

Sounds like a lovely co-op building, or perhaps a retirement community for aging hackers.

Sardtok 3 hours ago|||

Hennes & Mauritz is a Swedish clothing retailer.

On a serious note, I think they meant TN, as in Torment Nexus, but I could be wrong.

ibrahimhossain 4 hours ago|

Alignment feels like an arms race that favors whoever spends the most on RLHF and red teaming. If even friendly models keep leaking dangerous capabilities, the real moat might be making systems that are fundamentally limited rather than trying to patch every possible failure mode. Interesting piece.
