"And it confessed in writing" - no, it created probabilistically token after token based on the context without any other access to what happened.
LLMs can't explain themselves in the manner relevant here, much less confess.
In D&D 3.5 edition, there was a rule that let you "take 20" on a d20 roll: a guaranteed 20 in exchange for taking 20 times as long in-game to perform the action, but only on checks with no consequences for failure, since it was essentially a shortcut past the RNG of rolling until a 20 came up. Maybe framing it like this will make sense to more people, but if not, I'll at least have more fun making my case.
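Roughly, in code, the rule collapses to this (a toy sketch, not anything from the rulebook):

```python
# Toy sketch of the 3.5e "take 20" mechanic: when failure has no
# consequence you can keep rolling until a 20 comes up, so the rule
# skips the dice entirely and charges you the time instead.
import random

def roll_until_20(rng: random.Random) -> int:
    """Roll a d20 until it shows 20; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        if rng.randint(1, 20) == 20:
            return attempts  # 20 attempts on average, hence "20x as long"

def take_20() -> int:
    """The shortcut: a guaranteed 20 at 20 times the in-game duration."""
    return 20
```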
Not picking on you specifically, but in general the comments here have me wondering if AI has stolen our basic reading comprehension, or if we were always this bad.
Anyway, take “LLM user had delete permission” off your list and add “deleting the production db also deletes all the backups” to the list.
The issue isn't the number of guardrails in place before performing an action. Yes, obviously there should be some before any critical operation, such as deleting a database.
The issue is that the "agent" completely disregarded instructions, which in the age of "skills" and "superpowers" seems like an important issue that should be addressed.
Considering that these tools are given access to increasingly sensitive infrastructure, allowed to make decisions autonomously, and able to find all sorts of loopholes in order to make "progress", this disaster could have happened even with more guardrails in place. Shifting the blame onto the human for this incident sweeps the real issue under the rug, and is itself irresponsible.
There are far scarier scenarios that should concern us all than losing some data.
There is currently no way to prevent this apart from not giving the LLM full control. It will not delete what it cannot delete.
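To make that concrete, here is one way to enforce it at the database layer instead of in the prompt (a hedged sketch; the role, database, and credentials are invented):

```python
# Give the agent a Postgres role that simply lacks destructive
# privileges. Run this once as an admin, then hand the agent only the
# restricted credentials. All names here are placeholders.
import psycopg2

RESTRICTED_ROLE = """
CREATE ROLE agent_rw LOGIN PASSWORD 'replace-me';
GRANT CONNECT ON DATABASE appdb TO agent_rw;
GRANT USAGE ON SCHEMA public TO agent_rw;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO agent_rw;
-- Deliberately no DELETE, TRUNCATE, DROP, or table ownership.
"""

with psycopg2.connect("dbname=appdb user=admin") as conn:
    with conn.cursor() as cur:
        cur.execute(RESTRICTED_ROLE)
```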
Use an LLM to write an Ansible playbook or some Terraform code if you want, but review it, test it, then apply it. Keep backups (the 3-2-1 rule at minimum).
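For reference, 3-2-1 means three copies of the data, on two different media, with one copy offsite. A rough sketch of one nightly job under those assumptions (paths, bucket, and database name are placeholders):

```python
# 3-2-1 backup sketch: local dump, second medium (NAS), offsite (S3).
import shutil
import subprocess
from datetime import datetime, timezone

stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
local = f"/backups/appdb-{stamp}.dump"

# Copy 1: a fresh dump on the primary disk.
subprocess.run(["pg_dump", "-Fc", "-f", local, "appdb"], check=True)

# Copy 2: a second medium, e.g. a mounted NAS.
shutil.copy2(local, f"/mnt/nas/backups/appdb-{stamp}.dump")

# Copy 3: offsite object storage, so deleting the db can't take the
# backups with it.
subprocess.run(
    ["aws", "s3", "cp", local, f"s3://example-offsite/appdb-{stamp}.dump"],
    check=True,
)
```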
Letting an LLM have access to everything is just a bad idea and will lead to bad outcomes. You cannot replace a person who has a mind and experience with an LLM. You can try, but you will probably fail.
But deleting something is just one action you might not want it to take.
The recent "agentic" craze is fueled by a narrative, pushed by companies and influencers alike, that the more access you give an LLM, the more useful it becomes. I think this is ludicrous for the same reasons you do, but evidently most people buy it.
We can blame users for misusing the tools, and suggest that sandboxing is the way to go, but at the end of the day most people will favor convenience over anything else a reasonable person might find important.
So at what point should we start blaming the tools, and forcing "AI" companies to fix them? I certainly hope this is done before something truly catastrophic happens.
Still, if I cut off my finger with a bandsaw, that is usually my fault: I didn't use the tool in a safe way. People have to learn how to use their tools safely. You wouldn't give an intern that much power on day one.
Plausible text is sometimes right, sometimes not.
Humans have a world model, a model of what happens. LLMs have a model of what humans would plausibly say.
The only good guardrail seems to be a human in the loop.
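A minimal sketch of what that gate can look like (the action names and wiring are illustrative):

```python
# Human-in-the-loop gate: the agent may *propose* destructive actions,
# but only an explicit human "yes" lets one actually run.
DESTRUCTIVE = {"drop_database", "delete_table", "truncate_table"}

def execute(action: str, run, approve=input):
    """Run `run()` for `action`, pausing for human approval if destructive."""
    if action in DESTRUCTIVE:
        answer = approve(f"Agent requests '{action}'. Type 'yes' to allow: ")
        if answer.strip().lower() != "yes":
            raise PermissionError(f"Human rejected '{action}'")
    return run()
```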
I'm getting so tired of this.
On a more technical level, very serious people have voiced doubts, for example Richard Sutton in an interview with Dwarkesh Patel [1].
[1] https://m.youtube.com/watch?v=21EYKqUsPfg&pp=ygUnZmF0aGVyIG9...
It actually helps. I do copy backups to another place as well. One backup is good, but two is better.
> Why does a public-facing API that can delete all your production databases even exist?
Because it takes time and effort to build an API, and even if you build one with a structured permission system so that only an admin can delete things, users probably won't spend the effort to use it. They're running a rental-car SaaS business, not a mission-critical Mars mission.
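For what it's worth, the structured-permission version is not much code; the cost is mostly in operating it. A sketch with Flask (the header check stands in for real authentication):

```python
# Hypothetical delete endpoint that only an admin role can reach.
from functools import wraps
from flask import Flask, abort, request

app = Flask(__name__)

def require_role(role):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            # Stand-in for real auth: check a role claim on the request.
            if request.headers.get("X-Role") != role:
                abort(403)
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@app.delete("/databases/<name>")
@require_role("admin")
def delete_database(name):
    ...  # the destructive work would go here; non-admins never reach it
    return "", 204
```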
The best I can say is that with the advent of AI these choices could be different now, but I don't think they will be. Fundamentally, a fuck-up every few months at a rental-car SaaS company, in exchange for 30% higher velocity or 30% lower cost, is probably fine.