Posted by emilburzo 14 hours ago
You can't assume that.
Attackers with LLMs are capable enough to have them build exploits for kernel vulnerabilities [0] or bypass sandboxes to exfiltrate data covertly [1].
It is entirely possible to craft a chained attack that lets an agent bypass a sandbox, with or without a kernel exploit.
From [0] and [1]
[0] https://sean.heelan.io/2026/01/18/on-the-coming-industrialis...
[1] https://www.promptarmor.com/resources/claude-cowork-exfiltra...
https://github.com/EstebanForge/construct-cli
For Linux, WSL also of course, and macOS.
Any coding agent (from the supported ones, or you can install your own).
Podman, Docker or even Apple's container.
In case anyone is interested.
Please inform me if my thinking is wrong.
If Claude is writing a program to go that low level I'd pay money to watch that.
Also, is overwriting the same as deleting? Maybe it will just clobber your files with `echo >file` and `mv` them out of the way.
Maybe it realizes you have Time Machine backups enabled, so deleting your entire directory is permitted since it's not actually deleted. ;)
So it's basically adding "don't delete my files pretty please" to the prompt?
EDIT: I misread, the natural language description of the rule is just a shortcut to generate the actual rule which is based on regexp patterns.
Still, it only protects you against very specific commands. Won't help you if the LLM decides to fill your disk with `cat /dev/urandom > foo` for example.
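To make that limitation concrete, here is a minimal sketch of a regexp-based deny list. The patterns and the `is_blocked` helper are hypothetical, made up for illustration, and are not the actual rule format of the tool being discussed. It shows that a rule written against `rm` catches the obvious deletions but lets overwrites, moves and disk-filling commands straight through.

```python
import re

# Hypothetical deny patterns, invented for this example.
# They are NOT taken from any real tool's rule set.
DENY_PATTERNS = [
    re.compile(r"(^|[;&|]\s*)rm\s"),      # an rm invocation at the start of a command
    re.compile(r"(^|[;&|]\s*)rmdir\s"),   # same for rmdir
    re.compile(r"\bfind\b.*\s-delete\b"), # find ... -delete
]


def is_blocked(command: str) -> bool:
    """Return True if any deny pattern matches the shell command string."""
    return any(p.search(command) for p in DENY_PATTERNS)


if __name__ == "__main__":
    commands = [
        "rm -rf ~/project",             # matched by the rm pattern
        "find . -name '*.py' -delete",  # matched by the find pattern
        "echo > notes.txt",             # truncates the file, not matched
        "mv ~/project /tmp/gone",       # moves the directory away, not matched
        "cat /dev/urandom > foo",       # fills the disk, not matched
    ]
    for cmd in commands:
        verdict = "BLOCK" if is_blocked(cmd) else "allow"
        print(f"{verdict:5}  {cmd}")
```

Adding more patterns narrows the gap but never closes it, since the list has to anticipate every destructive command in advance.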
I don't know anyone who inspects every binary, yet apparently we should not trust shell scripts?
So there's that
But a simple vm and some automation to install developer tools using ansible, nix or whatever you prefer isn't that hard to (vibe) code together. I like Lima but it feels slightly sub-optimal for the job currently.
Some useful things to consider:
- SSH agent forwarding for authenticating against e.g. git is useful. But maybe don't use the same key that authenticates to your production machines as well ...
- How do you authenticate without a browser? Most AI tools have ways to deal with that but it's slightly tedious to automate during provisioning.
- Making sure all your development tools are there; I use things like sdkman, nvm, bun, etc. And I have my shell preferences and some other tools I like to have around.
- Minimizing time provisioning these vms over and over again. This gets tedious really quickly.
- Keeping the VMs fast is important too. In my projects, build tool performance adds up and AI tools like to call them a lot. So assign enough memory and CPU.
- It would be nice to switch between local and remote/cloud based vms easily.
- Software flexibility; developers are picky about their tools. There is no one size fits all here. Even just deciding on the base image to use for your vm is likely to escalate. I picked debian for what it is worth.
In short, I think there's enough out there that you can pull something together but it still involves quite a bit of DIY. It would be nice if this got easier. And AI tools asking for permission for everything is not a good security model. Because people just turn that off. Sandboxing those things is the way to go. But AI tools need to be able to do enough to work with your software.
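As a rough sketch of that trade-off using a container runtime rather than a full VM, the snippet below only builds a `docker run` / `podman run` command line: the project directory is the one host path mounted in, capabilities are dropped, and memory and CPU are set generously so builds stay fast. The function name, the image name `dev-sandbox:latest` and the agent command `agent-cli` are assumptions made up for illustration. Keep in mind that a container shares the host kernel, so this does not address the kernel-exploit concern raised earlier in the thread.

```python
import shlex
from pathlib import Path


def sandbox_command(project_dir, image, agent_cmd,
                    runtime="docker", memory="8g", cpus=4):
    """Build (but do not run) a container command that exposes only project_dir.

    The flags used here exist in both Docker and Podman; everything else
    (image name, agent command, defaults) is a placeholder.
    """
    project = Path(project_dir).resolve()
    return [
        runtime, "run", "--rm", "-it",
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid
        "--memory", memory,                     # enough RAM for build tools
        "--cpus", str(cpus),                    # enough CPU for build tools
        "-v", f"{project}:/work",               # the only host path the agent can see
        "-w", "/work",
        image, *agent_cmd,
    ]


if __name__ == "__main__":
    # "dev-sandbox:latest" and "agent-cli" are hypothetical names.
    cmd = sandbox_command(".", "dev-sandbox:latest", ["agent-cli"], runtime="podman")
    print(shlex.join(cmd))
```

The network is deliberately left at the runtime default here, because most agents need to reach their API; tightening that is a separate decision.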
IMO, if you are not running in the dangerous mode then you are really missing out on one of the best aspects of Claude Code: its ability to iterate. If you have to confirm each iteration then it's just not practical.