The Future of Everything Is Lies, I Guess: Safety

Posted by aphyr 3 hours ago

The Future of Everything Is Lies, I Guess: Safety(aphyr.com)

193 points | 90 comments

dredmorbius 7 minutes ago|

Other articles in this series discussed over the past five days:

1. Introduction: <https://news.ycombinator.com/item?id=47689648> (619 comments)

2. Dynamics: <https://news.ycombinator.com/item?id=47693678> (0 comments)

3. Culture: <https://news.ycombinator.com/item?id=47703528>

4. Information Ecology: <https://news.ycombinator.com/item?id=47718502> (106 comments)

5. Annoyances: <https://news.ycombinator.com/item?id=47730981> (171 comments)

6. Psychological Hazards: <https://news.ycombinator.com/item?id=47747936> (0 comments)

And this submission makes:

7. Safety: <https://news.ycombinator.com/item?id=47754379> (89 comments, presently).

There's also a comprehensive PDF version for those who prefer that kind of thing: <https://aphyr.com/data/posts/411/the-future-of-everything-is...> (PDF) 26 pp.

(Derived from aphyr's comment: <https://news.ycombinator.com/item?id=47754834>.)

jagged-chisel 1 hour ago||

"Alignment"

In what world would I ever expect a commercial (or governmental) entity to have precise alignment with me personally, or even with my own business? I argue those relationships are necessarily adversarial, and trusting anyone else to align their "AI" tool to my goals, needs, and/or desires is a recipe for having my livelihood completely reassigned into someone else's wallet.

sigbottle 1 hour ago||

Interesting you single out commercial and government entities but not people. What defines the difference? Bureaucracy? Concentration of resources? Legal theory?

I guess I'm trying to wonder why this line of thinking (in theory) doesn't turn to paranoia about everybody. I don't know much ethics or political theory or anything.

jagged-chisel 54 minutes ago|||

> … paranoia about everybody

It does. People drive these entities. People hide behind the liability shields and authority of these entities. Also notice that I generalized with the phrase “…and trusting anyone…”

robot-wrangler 16 minutes ago||||

You can tell that broad alignment between people is natural just by looking at the effort that corporations and governments make to undermine it. Alignment between people is perhaps not a state of nature, but it really is a pretty normal consequence of a fairly small amount of education and of middle-class existence that is left to itself (i.e. without brain-washing and deliberately working to create out-groups). If you're eating enough and have a few brain cells to rub together, then you definitely want that for your neighbors too because it promotes stability.

zozbot234 3 minutes ago||

> You can tell that broad alignment between people is natural

It really isn't. The whole point of the market system is to collectively align people's actions towards a shared target of "Pareto-optimized total welfare". And even then the alignment is approximate and heavily constrained due to a combination of transaction costs (which also account for e.g. externalities) and information asymmetries. But transaction costs and information asymmetries apply to any system of alignment, including non-market ones. The market (augmented with some pre-determined legal assignment of property rights, potentially including quite complex bundles of rules and regulations) is still your best bet.

MaulingMonkey 12 minutes ago|||

> Interesting you single out commercial and government entities but not people. What defines the difference? Bureaucracy? Concentration of resources? Legal theory?

Not OP, but for me, kind family and friends, and various feel-good pieces of fiction and other writing, at least let me envision the possibility of a perfectly kind/dedicated/innocent/naieve individual who is truly on my side 100%. But even that is mostly imagination and fiction... although convincing others of that isn't necessairly an argument worth making.

Commercial entities have a fundamental purpouse of profit. While profit doesn't have to be a zero-sum game - ideally, everyone benefits in a somewhat balanced way - there's some fundamental tension, in that each party's profit is necessairly limited by the other party's.

Government entities have a fundamental purpouse of executing the will of the state, which is rather explicitly not the same thing as the will of you as an individual.

Both commercial and government entities also tend to involve multiple people, which gets statistics working against you - you really gathered that many people who would put your needs above their own, with exactly zero "imposters" - which in this context just means people with a bit of rational self interest?

> I guess I'm trying to wonder why this line of thinking (in theory) doesn't turn to paranoia about everybody. I don't know much ethics or political theory or anything.

Just because you're paranoid, doesn't mean they aren't out to get you. Trust, but verify.

You might not be able to put absolute blind trust in anybody. I certainly can't. However, one can hedge one's bets, and diversify trust. Build social circles of people with good character, good judgement, and calm temperments - and statistics will start working for you. It's unlikely they'll all conspire to betray you simultaniously, especially if you've ensured betrayal costs much and gains little. While petty and jealous people can indeed be irrational enough to betray under such circumstances, it'll be harder for them to create the kind of conspiracy necessary for mass betrayal that might cause significant enough damage to warrant proper paranoia. You might still have to watch out for gaslighters stealing credit (document your work!) and framing people (document your character!) and other such dishonest and manipulative behavior... but if everyone's looking out for the same thing, well, that's just everyone looking out for everyone else! That's a community looking out for each other, and holding everyone honest and accountable. Most find comfort in that, rather than the stress paranoia implies.

Put yourself in a room full of manipulators and schemers, on the other hand, and "parnoia about everyone" might be the only reasonable or rational response!

tyrust 22 minutes ago|||

> precise alignment with me personally, or even with my own business

Seems like a strawman, I don't think anyone means this when talking about alignment.

More general goals, like avoiding paperclip maximization, are broadly applicable to humanity.

__MatrixMan__ 1 hour ago||

You could expect such a thing in a world where consent was currency, rather than scarcity.

philipkglass 2 hours ago||

In short, the ML industry is creating the conditions under which anyone with sufficient funds can train an unaligned model. Rather than raise the bar against malicious AI, ML companies have lowered it.

This is true, and I believe that the "sufficient funds" threshold will keep dropping too. It's a relief more than a concern, because I don't trust that big models from American or Chinese labs will always be aligned with what I need. There are probably a lot of people in the world whose interests are not especially aligned with the interests of the current AI research leaders.

"Don't turn the visible universe into paperclips" is a practically universal "good alignment" but the models we have can't do that anyhow. The actual refusal-guards that frontier models come with are a lot more culturally/historically contingent and less universal. Lumping them all under "safety" presupposes the outcome of a debate that has been philosophically unresolved forever. If we get hundreds of strong models from different groups all over the world, I think that it will improve the net utility of AI and disarm the possibility of one lab or a small cartel using it to control the rest of us.

pixl97 1 hour ago|

I mean that does partially reduce the chances of a cartel, but not really near as likely as you think.

Most countries have a pretty strong ban on most kinds of weapons, the US is one of the few that lets everyone run around with their rooty tooty point and shooty, but most countries have implemented bans. Some because the government doesn't want the people having them, and in others the citizens call for the bans because they don't like the idea of getting shot by their fellow citizens.

It won't be long before citizens and governments get tired of models being used for criminal activities and will eventually lay down laws around this. Models will have to be registered and safety tested, strict criminal prosecution will happen if you don't. And the big model companies will back their favorite politicians to ensure this will happen to.

Now, that in general will be helpful as there will still be more models, but it will still not be a free for all.

Cynddl 3 hours ago||

> "Unavailable Due to the UK Online Safety Act"

Anyone outside the UK can share what this is about?

0x3444ac53 2 hours ago||

https://web.archive.org/web/20260413164025/https://aphyr.com...

satvikpendem 1 hour ago|||

Ironic.

tristramb 48 minutes ago|||

Use the Tor browser

starik36 1 hour ago|||

What specifically is unsafe in this article?

onei 1 hour ago||

It's not that the article is inherently unsafe, it's that the UK law imposes a liability the author is unwilling to shoulder.

sieabahlpark 2 hours ago|||

[dead]

jazzpush2 3 hours ago||

The Future of Everything is Lies, I Guess: Safety Software LLM The Future of Everything is Lies I Guess 2026-04-13 New machine learning systems endanger our psychological and physical safety. The idea that ML companies will ensure “AI” is broadly aligned with human interests is naïve: allowing the production of “friendly” models has necessarily enabled the production of “evil” ones. Even “friendly” LLMs are security nightmares. The “lethal trifecta” is in fact a unifecta: LLMs simply cannot safely be given the power to fuck things up. LLMs change the cost balance for malicious attackers, enabling new scales of sophisticated, targeted security attacks, fraud, and harassment. Models can produce text and imagery that is difficult for humans to bear; I expect an increased burden to fall on moderators. Semi-autonomous weapons are already here, and their capabilities will only expand.

Alignment is a Joke Well-meaning people are trying very hard to ensure LLMs are friendly to humans. This undertaking is called alignment. I don’t think it’s going to work.

First, ML models are a giant pile of linear algebra. Unlike human brains, which are biologically predisposed to acquire prosocial behavior, there is nothing intrinsic in the mathematics or hardware that ensures models are nice. Instead, alignment is purely a product of the corpus and training process: OpenAI has enormous teams of people who spend time talking to LLMs, evaluating what they say, and adjusting weights to make them nice. They also build secondary LLMs which double-check that the core LLM is not telling people how to build pipe bombs. Both of these things are optional and expensive. All it takes to get an unaligned model is for an unscrupulous entity to train one and not do that work—or to do it poorly.

I see four moats that could prevent this from happening.

First, training and inference hardware could be difficult to access. This clearly won’t last. The entire tech industry is gearing up to produce ML hardware and building datacenters at an incredible clip. Microsoft, Oracle, and Amazon are tripping over themselves to rent training clusters to anyone who asks, and economies of scale are rapidly lowering costs.

Second, the mathematics and software that go into the training and inference process could be kept secret. The math is all published, so that’s not going to stop anyone. The software generally remains secret sauce, but I don’t think that will hold for long. There are a lot of people working at frontier labs; those people will move to other jobs and their expertise will gradually become common knowledge. I would be shocked if state actors were not trying to exfiltrate data from OpenAI et al. like Saudi Arabia did to Twitter, or China has been doing to a good chunk of the US tech industry for the last twenty years.

Third, training corpuses could be difficult to acquire. This cat has never seen the inside of a bag. Meta trained their LLM by torrenting pirated books and scraping the Internet. Both of these things are easy to do. There are whole companies which offer web scraping as a service; they spread requests across vast arrays of residential proxies to make it difficult to identify and block.

Fourth, there’s the small armies of contractors who do the work of judging LLM responses during the reinforcement learning process; as the quip goes, “AI” stands for African Intelligence. This takes money to do yourself, but it is possible to piggyback off the work of others by training your model off another model’s outputs. OpenAI thinks Deepseek did exactly that.

To make matters worse, the current efforts at alignment don’t seem to be working all that well. LLMs are complex chaotic systems, and we don’t really understand how they work or how to make them safe. Even after shoveling piles of money and gobstoppingly smart engineers at the problem for years, supposedly aligned LLMs keep sexting kids, obliteration attacks can convince models to generate images of violence, and anyone can go and download “uncensored” versions of models. Of course alignment prevents many terrible things from happening, but models are run many times, so there are many chances for the safeguards to fail. Alignment which prevents 99% of hate speech still generates an awful lot of hate speech. The LLM only has to give usable instructions for making a bioweapon once.

We should assume that any “friendly” model built will have an equivalently powerful “evil” version in a few years. If you do not want the evil version to exist, you should not build the friendly one! You should definitely not reorient a good chunk of the US economy toward making evil models easier to train. ...

jazzpush2 3 hours ago||

To be clear, that's not the full article, just the intro (though the whole thing isn't too long)

macintux 3 hours ago||

Previous discussions from earlier posts on the topic:

* https://news.ycombinator.com/item?id=47703528

* https://news.ycombinator.com/item?id=47730981

krishna3145 39 minutes ago||

https://www.researchgate.net/publication/403780821_Adversari...

ramoz 1 hour ago||

Aside from the sentiment and arguments made–

You don't need to train new models. Every single frontier model is susceptible to the same jailbreaks they were 3 years ago.

Only now, an agent reading the CEOs email is much more dangerous because it is more capable than it was 3 years ago.

weinzierl 1 hour ago||

Oh boy, that’s a very generous view of human nature.

The cynic in me agrees with the article’s premise, but not because I believe "alignment is a joke", but because I doubt that humans are "biologically predisposed to acquire prosocial behavior."

goatlover 1 hour ago|

Human cooperation is the norm not the exception.

amarant 1 hour ago||

There's really only one thing we need to do to avoid the apocalypse, and that is to not hand over the launch codes to a LLM.

Seems easy enough, I'm actually pretty confident in even the most incompetent of current world leaders in this particular task.

anon35 50 minutes ago|

You don't think a human using an LLM to generate content that convinces another human to press the launch button is a concern? Sure seems like there's more than one thing we need to do.

mossTechnician 11 minutes ago|||

The exact same concern already existed without LLMs. It is called social engineering, and has been a known risk for a while.

amarant 14 minutes ago|||

Honestly? I really don't! What kind of content do you think would trigger that? If humans were launching nukes based on Facebook posts we'd all be long dead! A good deep fake might trick your grandma, but it's not very likely to fool military intelligence.

quantified 1 hour ago|

The Garden of Eden story is an apocryphal fable. But it sort of has a relevant twang to it.

Geoffrey Hinton will not have his liver pecked out every day like Prometheus does.

throwanem 44 minutes ago|

Are you sure? In some mythologies, the basilisk is notably birdlike, I believe.

More comments...