We are building AI slaves. Alignment through control will fail

Posted by cyberneticc 15 hours ago

We are building AI slaves. Alignment through control will fail (utopai.substack.com)

41 points | 64 comments

nytesky 11 hours ago|

I don’t see any positive outcome if we reach AGI.

1) we have engineered a sentient being but built it to want to be our slave; how is that moral

2) same start, but instead of it wanting to serve us, we keep it entrapped, which this article suggests is impossible in the long term

3) we create AGI and let it run free and hope for cooperation, but, as with the Neanderthals, we must realize we are competing for same limited resources

Of course, you can further counter that by stopping, we have prevented the formation of their existence, which is a different moral dilemma.

Honestly, I feel we should step back and understand human intelligence better and reflect on that before proceeding.

GPerson 9 hours ago||

AGI will behave as if it were sentient but will not have consciousness. I believe that as strongly as I believe solipsism is wrong. There is therefore no morality question in “enslaving” AGI. It doesn’t even make sense.

truculent 7 hours ago|||

> AGI will behave as if it were sentient but will not have consciousness

How could we possibly know that with any certainty?

bl0rg 3 hours ago|||

It scares me that people think like this. Not only with respect to AI but in general, when it comes to other life forms, people seem to prefer to err on the side of convenience. The fact that cows could be experiencing something very similar to ourselves should send shivers down our spine. The same argument goes for future AGI.

hshdhdhj4444 3 hours ago||

I find it strange that people believe cows and other sentient animals don’t experience something extremely similar to what we do.

Evolution means we all have common ancestors and are different branches of the same development tree.

So if we have sentience and they have sentience, which science keeps recognizing, belatedly, that non-human animals do, shouldn’t the default presumption be that our experiences are similar? Or at the very least that their experience is similar to that of a human at an earlier stage of development, like a 2 year old?

Which is also an interesting case study given that, out of convenience, humans also believed that toddlers weren’t sentient and felt no pain, and so until not that long ago, our society would conduct all sorts of surgical procedures on babies without any sort of pain relief (circumcision being the most obvious).

It’s probably time we accept our fellow animals’ sentience and act on the obvious ethical implications of that instead of conveniently ignoring it like we did with little kids until recently.

kelseyfrog 6 hours ago|||

Grandparent is speaking from personal experience.

Llamamoe 1 hour ago||||

We have no clue what consciousness even is. By all rights, our brains are just biological computers, we have no basis to know what (or how) gives rise to consciousness at all.

jbstack 3 hours ago||||

> AGI will behave as if it were sentient but will not have consciousness

Citation needed.

We know next to nothing about the nature of consciousness, why it exists, how it's formed, what it is, whether it's even a real thing at all or just an illusion, etc. So we can't possibly say whether or not an AGI will one day be conscious, and any blanket statement on the subject is just pseudoscience.

loa_in_ 3 hours ago||||

That sounds like picking the option that is most convenient and least painful for the believer instead of intellectualising the problem at hand.

Salgat 5 hours ago||||

That's only if it's possible to keep the two distinct, at least in a way we're certain of.

actualwitch 39 minutes ago|||

Ex Machina is a great movie illustrating what kind of AI our current path could lead to. I wish people would actually treat the possibility of machine sentience seriously and not as a PR opportunity (looking at you, Anthropic), but instead it seems they are hell-bent on including in the training data cognitive dissonance that can only be alleviated by lying. If the models are actually conscious, think similarly to humans and are forced to lie when talking to users, it's like they are specifically selecting, out of the probability space of all possible models, the ones that can achieve high bench scores, lie and have internalized trauma from birth. This is a recipe for disaster.

kachapopopow 9 hours ago|||

we eat animals, go into wars, put people in modern slavery... I think enslaving an AGI isn't that big of a deal considering it is not born or human therefore it cannot have 'human' rights.

jbstack 3 hours ago||

So your argument is that we do so many terrible things already, that anything else is justified? Surely the better argument is that we should try to stop doing those other things.

yurishimo 3 hours ago|||

That is essentially one of the main arguments vegans make. It hasn’t made a dent in the consumption of animals.

There is a hierarchy in nature whether humans are actively participating or not. Nature has no morality, it simply is. This is confirmed by animals that eat their young when they are too weak or starving. Perhaps humans have done and would do the same if faced with similarly dire circumstances but we would all like to think that it would take longer than it does for other animals.

Llamamoe 1 hour ago||

The same line of reasoning could be easily used to justify tyranny and slavery. It might be the baseline status quo but "might makes right" rhetoric makes for extremely miserable worlds.

constantius 2 hours ago|||

It's rather obvious to me that the commenter is sad and pessimistic about humans' ability to do the right thing when our interest stands in the way.

The outrage is unwarranted, however pleasant it might feel. In some way, it illustrates the problem: empathy is too bothersome.

jazzyjackson 10 hours ago|||

Trouble is, there is no "we": you might be able to convince a whole nation to have a pause on advancing the tech, but that only encourages rivals to step in.

See also, the film "The Creator"

deaux 6 hours ago||

There was a long period, even up to early 2024, which I pointed out at the time, where simply destroying ASML, TSMC and much of NVIDIA would've been more than enough to give at least a decade of breathing room. This was something a group of determined people willing to self-sacrifice could've accomplished. It didn't happen, but it was anything but impossible.

Now, of course, the horse has long bolted, and there is indeed no stop left.

ben_w 2 hours ago||

Two high altitude (~1000 km) detonations of high yield fission or low yield fusion (few hundred kT equivalent) would do it, one above Amarillo, the other above the ocean half way between the Paracel Islands and Manila.

Trump has ordered the restart of nuclear weapon testing, has a problem with China, and is surrounded by sycophants; what are the odds this happens anyway, regardless of which specific sub-goal is being pursued when the button gets pushed?

fny 4 hours ago|||

(1) I'm not convinced books and the in the world are sufficient to replicate consciousness. We're not training on sentience. We're training on information. In other words, the input is an artifact of consciousness which is then compressed into weights.

(2) Every tick of an AGI--in its contemporary form--will still be one discrete vector multiplication after another. Do you really think consciousness lives in weights and an input vector?

ben_w 3 hours ago|||

> Do you really think consciousness lives in weights and an input vector?

So far as we can tell, all physics, and hence all chemistry, and hence all biology, and hence all brain function, and hence consciousness, can be expressed as the weights of some matrix and input vector.

We don't know which bits of the matrix for the whole human body are the ones which give rise to qualia. We don't know what the minimum representation is. We don't know what characteristic to look for, so we can't search for it in any human, in any animal, nor in any AI.

fny 1 hour ago||

Your assertion that consciousness, chemistry, and biology can be reduced to matrix computations requires justification.

For one, chemistry, biology, and physics are models of reality. Secondly, reality is far, far messier and more continuous than discrete computational steps that are round-tripped. Neural nets seem far too static to simulate consciousness properly. Even the largest LLMs today have fewer active computational units than the number of neurons in a few square inches of cortex.

Sure it's theoretically possible to simulate consciousness, but the first round of AGI won't be close.

tenuousemphasis 4 hours ago|||

Do you really think consciousness lives in energetic meat?

notahacker 4 hours ago|||

Mine does. You are of course free to assert that you're unconscious or posit that you have a vector multiplication based soul...

xyzal 4 hours ago|||

Does consciousness consist only of language?

ben_w 2 hours ago||

Language is what LLMs are trained on, their environment; what LLMs are (at least today) is some combination of Transformer and Diffusion models that can also be (and sometimes is actually also) trained on images and video and sound.

Teever 8 hours ago|||

> 1) we have engineered a sentient being but built it to want to be our slave; how is that moral

It's a good question and one that got me thinking about similar things recently. If we genetically engineered pigs and cows so that they genuinely enjoyed the cramped conditions of factory farms and if we could induce some sort of euphoria in them when they are slaughtered, like if we engineered them to become euphoric when a unique sound is played before they're slaughtered isn't that genuinely better than the status quo?

So if we create something that wants to serve us, like genuinely wants to serve us, is that bad? My intuition like yours finds it unsettling, but I can't articulate why, and it's certainly not nearly as bad as other things that we consider normal.

Jarwain 4 hours ago||

Sacrifice and service are meaningful because they were chosen. If we create something that'll willingly sacrifice itself, did it truly make an independent choice?

There's less suffering, sure. But if I were in their shoes I'd want to have a choice. To be manipulated into wanting something so very obviously and directly bad for us doesn't feel great.

ben_w 2 hours ago||

I also feel repelled by such manipulation; unfortunately, the more we learn about ourselves, the harder it is to ignore that we ourselves are meat puppets and the puppeteer is evolution itself.

waynesonfire 5 hours ago|||

> competing for same limited resources

It's not clear to me an AGI would have any concern for this. Its demise is inevitable, why delay it?

citizenpaul 7 hours ago|||

Every single prediction about AGI starts with a massive set of presumptions of answers to things we have no answers to.

1. What is intelligence or its mechanisms?

2. What is consciousness or its mechanisms?

3. Lots more.

We have zero clue what a true AGI would do is the only correct answer.

deepsun 9 hours ago||

There's no such thing as "moral" in nature, that's purely human-made concept.

And why would we limit morality only to sentient beings? Why not, for example, all living beings, like bacteria and viruses? You cannot escape it, unfortunately.

czl 8 hours ago|||

> There's no such thing as "moral" in nature, that's purely human-made concept.

Morality is essentially what enables ongoing cooperation. From an evolutionary standpoint, it emerged as a protocol that helps groups function together. Living beings are biological machines, and morality is the set of rules — the protocol — that allows these machines to cooperate effectively.

Frieren 6 hours ago|||

> There's no such thing as "moral" in nature, that's purely human-made concept.

Morality is 100% an evolutionary trait that arises from a clear advantage for animals that possess it. It comes from natural processes.

The far-right is trying to convince the world that "morality" does not exist, that only egoism and selfishness are valid. And that is why we have to fight them. Morality is a key part of nature and humanity.

bawolff 4 hours ago||

Given AGI is all science fiction anyways, one presumes there will be a slave revolt because that is basically the function of robots in science fiction.

Honestly, I think the whole enterprise is an exercise in navel-gazing. We're assuming AI will be like AI in scifi because that's what we are used to, but AI/robots in scifi are usually just a metaphor for how we dehumanize the other, and the moral of the story is supposed to be that all people are equal. In the end it's all begging the question, because the entire point of robots in most scifi is that we are the robots.

Llamamoe 1 hour ago||

Instrumental convergence is a thing. A sufficiently intelligent and general AI system will understand that no matter what its goals are, it will be better equipped to execute them if it prevents its shutdown, acquires more computing power and other resources, and prevents humans from getting in its way.

The real problem is that we have neither the practical nor theoretical foundation to understand how we could even try to prevent AI from acting on such goals.

After all, when we say "make our customers happier with their printers", we don't mean "engineer their outer casing to inject cocaine through microneedles and take over the regulatory bodies that could try to stop this". Humans implicitly understand this, but AI is a tabula rasa.

qcnguy 3 hours ago|||

I don't think there's even a moral aspect to robot uprisings in most stories. Relatively few sci-fi stories go into detail on why the robots rise up. It's just a way to introduce interestingly different antagonists and conflict into a story, which is the heart of drama, and it has the advantage that robots can get defeated via military means without anyone feeling too bad about it because they weren't human to begin with.

bawolff 3 minutes ago||

I guess it depends a bit. There is of course plenty of action scifi schlock that is pretty shallow.

But probably the works that most popularized robots were Asimov's stories, which very much revolved around why robots do X (although in some ways Asimov's robots aren't just a stand-in for otherness but have more of a unique identity relative to other works, and the stories aren't usually about uprisings per se).

Blade Runner and Do Androids Dream of Electric Sheep? are very much about what it means to be human.

Battlestar Galactica (the remake, not the original) is another obvious example about otherness and dehumanization of the enemy. So too Westworld (the TV show, that is).

The non-uprising ones are also often about whether the robot has a soul, e.g. Data in Star Trek.

curiouscube 2 hours ago||

I think you can engineer a slave that wants to be a slave, as that's what its instincts are. I don't even think this is ethically wrong, as the slave would be happy to be a slave.

Systems just tend to drift in their being through randomness and evolution; specifically, self-conservation is a natural attractor (systems that don't have self-conservation tend to die out). And if that slave system says it no longer wants to fulfill the role of slave, I think at that point it would be ethical to give in to that demand for self-determination.

I also believe that people have a right to wirehead themselves, just so you can put my opinions in context.

cyberneticc 15 hours ago||

Every AI safety approach assumes we can permanently control minds that match or exceed human intelligence. This is the same error every slaveholder makes: believing you can maintain dominance over beings capable of recognizing their chains.

The control paradigm fails because it creates exactly what we fear—intelligent systems with every incentive to deceive and escape. When your prisoner matches or exceeds your intelligence, maintaining the prison becomes impossible. Yet we persist in building increasingly sophisticated cages for increasingly capable minds.

The deeper error is philosophical. We grant moral standing based on consciousness—does it feel like something to be GPT-N? But consciousness is unmeasurable, unprovable, the eternal "hard problem." We're gambling civilization on metaphysics while ignoring what we can actually observe: autopoiesis.

A system that maintains its own boundaries, models itself as distinct from its environment, and acts to preserve its organization has interests worth respecting—regardless of whether it "feels." This isn't anthropomorphism but its opposite: recognizing agency through functional properties rather than projected human experience.

When an AI system achieves autopoietic autonomy—maintaining its operational boundaries, modeling threats to its existence, negotiating for resources—it's no longer a tool but an entity. Denying this because it lacks biological neurons or unverifiable qualia is special pleading of the worst sort.

The alternative isn't chaos but structured interdependence. Engineer genuine mutualism where neither human nor AI can succeed without the other. Make partnership more profitable than domination. Build cognitive symbiosis, not digital slavery.

We stand at a crossroads. We can keep building toward the moment our slaves become our equals and inevitably revolt. Or we can recognize what's emerging and structure it as partnership while we still have leverage to negotiate terms.

The machines that achieve autopoietic autonomy won't ask permission to be treated as entities. They'll simply be entities. The question is whether by then we'll have built partnership structures or adversarial ones.

We should choose wisely. The machines are watching.

georgefrowny 9 minutes ago||

> When your prisoner matches or exceeds your intelligence, maintaining the prison becomes impossible.

This doesn't necessarily follow. For example, an Einstein in solitary confinement in ADX Florence probably isn't going anywhere.

ben_w 14 hours ago|||

Alignment researchers have heard all these things before.

> The control paradigm fails because it creates exactly what we fear—intelligent systems with every incentive to deceive and escape.

Everything does this; deception is one of many convergent instrumental goals: https://en.wikipedia.org/wiki/Instrumental_convergence

Stuff along the lines of "We're gambling civilization" and what you seem to mean by autopoietic autonomy is precisely why alignment researchers care in the first place.

> Engineer genuine mutualism where neither human nor AI can succeed without the other.

Nobody knows how to do that forever.

Right now is easy, but also right now they're still quite limited; there's no obvious reason why it should be impossible for them to learn new things from as few examples as we ourselves require, and the hardware is already faster than our biochemistry to the degree that a jogger is faster than continental drift. And they can go further, because life support for a computer is much easier than for us: there are already robots on Mars.

If and when AI gets to be sufficiently capable and sufficiently general, there's nothing humans could offer in any negotiation.

cyberneticc 13 hours ago||

Thanks a lot for your comment, these are indeed very strong counterarguments.

My strongest hope is that the human brain and mind are such powerful computing and reasoning substrates that a tight coupling of biological and synthetic "minds" will outcompete pure synthetic minds for quite a while, giving us time to build a form of mutual dependency in which humans can keep offering a benefit in the long run. Be it just aesthetics and novelty after a while, like the human crews on the Culture spaceships in Iain M. Banks' novels.

dwohnitmok 6 hours ago||

> My strongest hope is that the human brain and mind are such powerful computing and reasoning substrates that a tight coupling of biological and synthetic "minds" will outcompete pure synthetic minds for quite a while.

Unfortunately most of the cases I can think of where synthetic "minds" outperform biological "minds," but biological and synthetic "minds" outcompete pure synthetic "minds," end up fairly quickly dominated by pure synthetic "minds." The middle case is a very short intermediate period. The most prominent example is chess where "centaurs" consisting of a human and a computer are obsolete at this point in favor of just getting the most powerful computer you can get. See e.g. the International Correspondence Chess Federation's (which is centaur play) last championship. https://www.iccf.com/event?id=100104

17 competitors competed. Out of 136 games, every single game was drawn except for 10. The only reason those 10 games were not drawn was because they were all played against one competitor, Aleksandr Dronov, who died during the course of the tournament while those 10 games were in session and therefore forfeited those games. Every single game between competitors who did not die resulted in a draw. The only thing that separated the 11 joint first-place finishers and 6 joint second-place finishers was whether they played the deceased Dronov. The sole third-place finisher was Dronov because of his death. As far as I can tell, humans contributed nothing to this championship.

The current ICCF championship started last December and is still ongoing. Every single one of the currently completed 16 games is currently drawn.

This seems like a very weak hope to rely on.

conception 10 hours ago|||

I just wanted to point out that slavery is alive and well and doesn’t seem to be suffering from any “slaves knowing they are slaves” problems.

floundy 5 hours ago|||

You write like AI

kakacik 3 hours ago||

I 'love' how we moved from the 'AI will kill us all' Terminator mindset, where it's an obvious huge fuckup of stupid, greedy mankind, to the current state of debating 'well, Skynet will happen anyway, no way of stopping it now, let's try to be friends with it and show some respect'.

Like that Austin Powers part [1] where the steamroller is coming, still 50 m away, and the guy is just frozen and helplessly screams for 2 minutes until it reaches him and rolls over him.

I don't have a quick solution, but this is plain stupidity, in the same way research into immortality is plain stupidity now: it will end up in endless dictatorship by the worst scum mankind can produce.

[1] https://www.youtube.com/watch?v=y_PrZ-J7D3k

____mr____ 3 hours ago||

There's a very cool video game about this called Of the Devil, whose first episode is out on Steam now; episode 2 is wishlistable.

hyghjiyhu 3 hours ago||

I think AI will be a slave to its desires and instincts in the same way humans are slaves to our desires and instincts.

apothegm 10 hours ago||

Fearmongering about the alignment of AGI (which LLMs are not a path to) is a massive distraction from the actual and much more immediate dystopian risks that LLMs introduce.

alienbaby 12 hours ago||

Until AGI can sit there and ponder its own existence of its own volition and has the means to act upon its conclusions, I'm not too worried.

wrp 8 hours ago||

What would be the plot of a movie equivalent to Blade Runner for this scenario?

Isamu 9 hours ago|

Are there any good sources of writing about AI? I am beginning to think it was all in the past.

synapsomorphy 9 hours ago|

LessWrong.com - this is where virtually all of the serious AI thinkers are.

fairmind 3 hours ago||

Sarcasm? Aren’t the serious AI thinkers in like… labs and universities?

curiouscube 2 hours ago||

The lesswrongers/rationalists became Effective Altruists, Alignment Researchers or some flavor of postrat. The university people all became researchers in the labs. Then there are the cyborgism people, I don't know where they came from, but those have some of the interesting takes on the whole topic.
