Posted by aphyr 6 hours ago

The Future of Everything Is Lies, I Guess: Safety (aphyr.com)
240 points | 134 comments
dgfl 5 hours ago|
The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets. This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!". This is obviously unrealistic. The cat is out of the bag. And we're not _actually_ talking about nuclear weapons here. This technology is useful, and coding agents are just the first example of it. I can easily see a near future where everyone has a Jarvis-like secretary always available; it's only a cost and harness problem. And since this vision is very clear to most who have spent enough time with the latest agents, millions of people across the globe are trying to work towards this.

I do think that safety is important. I'm particularly concerned about vulnerable people and sycophantic behavior. But I think it's better not to be a luddite. I will give a positively biased view because the article already presents a strongly negative stance. Two remarks:

> Alignment is a Joke

True, but for a different reason. Modern LLMs clearly don't have a strong sense of direction or intrinsic goals. That's perfect for what we need to do with them! But when a group of people aligns one to their own interests, they may imprint a stance which other groups may not like (which this article confusingly calls an "unaligned model", even though it's perfectly aligned with its creators' intent). People unaligned with your values have always existed and will always exist. This is just another tool they can use. If they're truly against you, they'll develop it whether you want it or not. I guess I'm in the camp of people who have decided that those harmful capabilities are inevitable, as the article directly addresses.

> LLMs change the cost balance for malicious attackers, enabling new scales of sophisticated, targeted security attacks, fraud, and harassment. Models can produce text and imagery that is difficult for humans to bear; I expect an increased burden to fall on moderators.

What about the new scales of sophisticated defenses that they will enable? And for a simple solution to avoiding the produced text and imagery: don't go online so much? We already all sort of agree that social media is bad for society. If we make it completely unusable, I think we all stand to gain from it. If the digital world stops having any value, perhaps we'll finally go back to valuing local communities and offline hobbies for children. What if this is our wakeup call?

throw4847285 5 hours ago||
Thanks LLM!
eks391 4 hours ago|||
Which LLMisms are you seeing in their post? Their grammar, word choice, flow of thought, and formatting all point to fully human authorship, so confidently that I would say they likely didn't even consult an LLM.
throw4847285 4 hours ago||
Yeah I definitely misread their post.
dgfl 4 hours ago|||
lol. I did use a lot of short sentences, that’s my bad. But please read through [1] and compare my text against it; it may enlighten you on how to actually spot LLM writing.

[1] https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

throw4847285 4 hours ago||
Oh no, I'm sorry to hear that.

For the future, try to avoid prevaricating when you actually have a clear sense of what you want to argue. Instead of convincing me that you've weighed both options and found luddism wanting, you just come off as dishonest. If you think stridently, write stridently.

dgfl 4 hours ago||
I’m not a native speaker, and you may find my writing simplistic if your standard vocabulary includes three expressions I’ve had to look up (I don’t mean this as an insult; I was just genuinely stumped that I could barely understand your comment).

I may think stridently (debatable) but I generally believe it is best to always try to meet in the middle if the goal is genuine discussion. This is my attempt at that.

throw4847285 4 hours ago||
But meeting in the middle only works if you honestly believe the middle is a valuable place to be. I don't want to dissect your writing too much, but let's look at one example.

> The issue with most of these articles is that they seem to demonize the technology, and systematically use demeaning language about all of its facets.

This is very confident, strident language. You clearly believe that there is a faction of people demonizing technology, akin to luddites, who are not worthy of being taken seriously.

> This one raises a lot of important points about LLMs, but...

So here you go for the rhetorical device of weighing the opposing view. Except you don't weigh it at all. You are not at all specific about what those points are. It's just a way to signal that you're being thoughtful without having to actually engage with the opposing viewpoint.

> I do think that safety is important... But I think it's better not to be a luddite.

Again, the rhetoric of moderation but not at all moderate in content.

It was a clear mistake to think that this was LLM writing. But I suspect the reason I made this mistake is that AI writing influences people to mimic surface level aspects of its style. AI writing tends to actually do the "You might say A is true, but B has some valid points, however A is ultimately correct." Your writing seems like that if you aren't reading it closely, but underneath that is a very human self-assuredness with a thin veneer of charitability.

simianwords 3 hours ago||
> This one raises a lot of important points about LLMs, but the only real conclusion it seems to make is "LLMs are bad! We should never build them!".

I think the point was never to offer a solution or capture any essence of reality. The point was being polemical and signalling savviness through cynicism.

throwway120385 6 hours ago||
At scale I think our society is slowly inching closer and closer to building HM.
nine_k 6 hours ago|
What is HM here?
throw4847285 5 hours ago|||
A Hidden Machine. That's right, a being that can cut, fly, surf, strength, and flash! Terrifying.
derektank 6 hours ago||||
Maybe they meant AM (Allied Mastercomputer) from “I Have No Mouth, and I Must Scream“
zackmorris 6 hours ago||||
Hacker Mews
throwaway27448 6 hours ago|||
Looksmaxxing really has gone mainstream huh
bitwize 5 hours ago||
Thought it was all the Rust catgirls.
throw4847285 5 hours ago|||
Sounds like a lovely co-op building, or perhaps a retirement community for aging hackers.
Sardtok 5 hours ago|||
Hennes & Mauritz is a Swedish clothing retailer.

On a serious note, I think they meant TN, as in Torment Nexus, but I could be wrong.

ibrahimhossain 6 hours ago||
Alignment feels like an arms race that favors whoever spends the most on RLHF and red teaming. If even friendly models keep leaking dangerous capabilities, the real moat might be making systems that are fundamentally limited rather than trying to patch every possible failure mode. Interesting piece.
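
To make "fundamentally limited" concrete, here's a toy sketch of what I mean (every name in it is hypothetical, not any real framework): the harness only ever executes tools from a hard allowlist, so no prompt injection or jailbreak can unlock a capability that was never wired in.

    # Toy sketch: structural limits instead of patch-based alignment.
    # The model may *request* anything; the harness *executes* only
    # what is wired into ALLOWED_TOOLS. All names are hypothetical.

    ALLOWED_TOOLS = {
        "search_docs": lambda query: f"(stub) results for {query!r}",
        "read_file": lambda path: f"(stub) contents of {path}",
    }

    def dispatch(tool_call: dict) -> str:
        """Run a model-proposed tool call, or refuse structurally."""
        handler = ALLOWED_TOOLS.get(tool_call.get("name"))
        if handler is None:
            # The refusal isn't a guardrail the model can be talked
            # out of; the capability just doesn't exist in the harness.
            return f"refused: {tool_call.get('name')!r} is not available"
        return handler(**tool_call.get("args", {}))

    print(dispatch({"name": "read_file", "args": {"path": "notes.txt"}}))
    print(dispatch({"name": "send_email", "args": {"to": "x@y.z"}}))

The second call is refused no matter what the model says, which is the whole point: the failure mode is ruled out by construction, not patched after red teaming finds it.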
jazzpush2 6 hours ago||
Every one of these posts is immediately pushed to the front page, this one within 4 minutes.
aphyr 6 hours ago||
It's been weirdly uneven. Sections 1, 3, and 5 did well on HN; 2, 4, and 6 sank with essentially no trace. The distribution of views is presently:

1. Introduction: 33,088 (https://news.ycombinator.com/item?id=47689648)

2. Dynamics: 3,659 (https://news.ycombinator.com/item?id=47693678)

3. Culture: 5,914 (https://news.ycombinator.com/item?id=47703528)

4. Information Ecology: 777 (https://news.ycombinator.com/item?id=47718502)

5. Annoyances: 7,020 (https://news.ycombinator.com/item?id=47730981)

6. Psychological Hazards: 199 (https://news.ycombinator.com/item?id=47747936)

Feedback from early readers was that the work was too large to digest in a single reading, so I split it up into a series of posts. I'm not entirely sure this was the right call; the sections I thought were the most interesting seem to have gotten much less attention than the introductory preliminaries.

dgfl 4 hours ago|||
I think these articles may benefit from a more thorough table of contents at the beginning, or from some kind of abstract. If you briefly presented the whole list of topics in a single article, it would be clearer that your views on the topic are more complete. I initially thought the table of contents would be scoped to the article itself rather than connecting it to the adjacent ones.

I had never heard of you, and this article appeared very biased to me. I found the information ecology piece superior, shame that it went unnoticed; I will try to go through all of them. I admire the breadth of topics you’re covering and appreciate the many sources. They’re clearly written in your own voice and that is great to see, I guess I mostly reacted to not being fully aligned with your view.

simoncion 5 hours ago|||
I'm not sure that HN vote count is a good indicator of interest? HN alerted me to the existence of the intro post. I read the intro, noticed that it was one in an ongoing series, and have been checking your blog for new installments every few days.

I suspect that if you'd not broken up the post into a series of smaller ones, the sorts of folks who are unwilling to read the whole thing as you post it section by section would have fed the entire post to an LLM to "summarize".

acdha 6 hours ago|||
That’s unsurprising given the author’s long history in the tech community. A ton of people see that domain and upvote.
jazzpush2 6 hours ago||
Sure, but 4 front-page posts from the same url in 4 days surely sits at the tail of the distribution. (I guess they all capitalize on the same 'LLM-is-bad' sentiment).
zdragnar 6 hours ago|||
It's also aphyr, who is incredibly popular. Take one very popular author, have him write a series of posts on the zeitgeist everyone can't help but talk about, and yes, the outcome is that his posts are extremely popular.

I still remember his takedown of mongodb's claims with the call me maybe post years and years ago filling me with a good bit of awe.

macintux 5 hours ago||
When I worked for Basho, aphyr was highly respected by some of the smartest people I’d ever worked with. Definitely no slouch.
borski 6 hours ago||||
It’s because it’s aphyr.

If ‘tptacek posts a blog post, I bet it similarly does well, on average, because they’re a “known quantity” around these parts, for example.

acdha 2 hours ago|||
Different URL, same domain, and exactly the kind of thing I’d expect a fair number of HN readers to have in a feed reader where they’d see it shortly after publication and decide to share it.

Also, if you think this is just “LLM is bad”, I highly suggest reading the series first. The social impacts they talked about at the start of the series should resonate with a lot of people here and are exactly the kind of thing which people building systems should talk about. If you’re selling LLMs, you still want to think about how what you’re building will affect the larger society you live in and the ways that could go wrong—even if we posit sociopath/MBA-levels of disregard for impacts on other people, you still want to think about how LLMs change the fraud and security landscape, how the tools you build can be misused, how all of this is likely to lead to regulatory changes.

tptacek 5 hours ago|||
A statement broadly true of most things this author writes.
stronglikedan 6 hours ago||
that's just, like, how HN works. people post, people like, people upvote, people discuss
conquera_ai 5 hours ago||
Feels like we’re repeating classic distributed systems lessons: assume failure, constrain blast radius, and never trust components that can’t explain themselves reliably.
ibrahimhossain 4 hours ago|
Exactly. Assuming failure and constraining the blast radius feels like the only reliable path when the models themselves are black boxes. Patch-based alignment starts looking fragile pretty quickly.
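
In code terms, a rough sketch of "assume failure" (the call_model here is a made-up stand-in, not any real client): bound the retries, back off, and degrade to a safe fallback so one flaky component can't take the whole system down.

    import time

    def call_model(prompt: str) -> str:
        # Hypothetical stand-in for a real LLM client call.
        raise TimeoutError("model backend unavailable")  # simulated failure

    def guarded_call(prompt: str, retries: int = 2, fallback: str = "") -> str:
        # Assume failure: bounded retries with backoff, then degrade
        # gracefully instead of propagating the outage upstream.
        for attempt in range(retries + 1):
            try:
                return call_model(prompt)
            except TimeoutError:
                time.sleep(0.1 * 2 ** attempt)  # exponential backoff
        return fallback  # constrained blast radius: degraded, not down

    print(guarded_call("summarize this doc", fallback="[summary unavailable]"))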
amarant 4 hours ago||
There's really only one thing we need to do to avoid the apocalypse, and that is to not hand over the launch codes to an LLM.

Seems easy enough; I'm actually pretty confident that even the most incompetent of current world leaders can manage this particular task.

anon35 4 hours ago|
You don't think a human using an LLM to generate content that convinces another human to press the launch button is a concern? Sure seems like there's more than one thing we need to do.
mossTechnician 3 hours ago|||
The exact same concern already existed without LLMs. It is called social engineering, and has been a known risk for a while.
amarant 3 hours ago|||
Honestly? I really don't! What kind of content do you think would trigger that? If humans were launching nukes based on Facebook posts we'd all be long dead! A good deep fake might trick your grandma, but it's not very likely to fool military intelligence.
atleastoptimal 4 hours ago|
There really are only 3 options that don't involve human destruction:

1. AI becomes a highly protected technology; a totalitarian world government retains a monopoly on its powers, enforces that monopoly, and offers access to those with preexisting connections: permanent underclass outcome

2. Somehow the world agrees to stop building AI and keep tech in many fields at a permanent pre-2026 level: soft butlerian jihad

3. Futurama: somehow we get ASI and a magical balance of weirdness and dance of continual disruption keeps apocalypse in check and we accept a constant steady-state transformation without paperclipocalypse

bigfishrunning 3 hours ago||
Scenario 2 makes the assumption that no technological development can happen without AI, which seems like a stretch to me. Honestly, the worst scenario I can think of is 40-ish years of AI-assisted development followed by a technological crash due to there being no competent engineers left to fix the slop.
atleastoptimal 2 hours ago||
I didn't say all technological development would be halted, just that tech "in many fields" would have to be stalled for safety (AI development, algorithmic advances that would reduce the cost of training models, etc.). Naturally, if AI is considered an existential threat, there would be a huge safety radius around anything that would allow bad actors to train AI models.
tomjen3 4 hours ago|||
This makes the assumption that AI will lead to the apocalypse. That's unfalsifiable, predicted about plenty of things in the past, and frankly annoying to keep seeing pop up.

It's like listening to Christians talking about the rapture.

atleastoptimal 2 hours ago||
The problem is that if someone is right about an existential disaster caused by AI, by the time they're proven right it would be too late.

Frontier AI models get smarter every year, but humans don't get any smarter year over year. If you don't believe that AI will somehow just suddenly stop getting better (which is as much a faith-based gamble as assuming some rapturous outcome for AI by default), then you have to assume that at some point AI will surpass human intelligence in all fields, and then keep going. In that case human minds, and human will, will be inconsequential compared to those of AI.

zozbot234 2 hours ago||
Frontier AI models get evaluated for safety precisely to avert the "AI robot uprising causes an existential disaster" scenario. At the moment we are light years away from anything like that ever happening, and that's after we literally tried our best to LARP that very scenario into existence with things like moltbook and OpenClaw.
nyc_data_geek1 4 hours ago|||
Cool story, bro!
cindyllm 4 hours ago|||
[dead]
raincole 4 hours ago||
In other words, only one option.