Posted by 0o_MrPatrick_o0 12 hours ago
- totally unsandboxed but I supervise it in a tight loop (the window just stays open on a second monitor and it interrupts me every time it needs to call a tool).
- unsupervised in a VM in the cloud where the agent has root. (I give it a task, negotiate a plan, then close the tab and forget about it until I get a PR or a notification that it failed).
I want either full capabilities for the agent (at the cost of needing to supervise for safety) or full independence (at the cost of limited context in a VM). I don't see a productive way to mix and match here, seems you always get the worst of both worlds if you do that.
Maybe the usecase for this particular example is where you are supervising the agent but you're worried that apparently-safe tool calls are actually quietly leaving a secret that's in context? So it's not that it's a 'mixed' usecase but rather it's just increasing safety in the supervised case?
Why in the cloud and not in a local VM?
I've re-discovered Vagrant and have been using it exactly for this and it's surprisingly effective for my workflows.
https://blog.emilburzo.com/2026/01/running-claude-code-dange...
[1] - https://en.wikipedia.org/wiki/Turtles_all_the_way_down
There are theoretical risks of Claude getting fully owned and going rogue, and doing the iterative malicious work to escape a weaker sandbox, but it seems substantially less likely to me, and therefore perhaps not (currently) worth the extra work.
I see there are cloud VMs like at kilocode but they are kind if useless IMO. I can only interact with the prompt and not the code base directly. Too many things go wrong and maybe I also want kilo code to run a docker stack for me which it can't in the agent cloud.
Yes! I'm surprised more people do not want this capability. Check out my comment above, I think Vagrant might also be what you want.
--bind "$HOME/.claude" "$HOME/.claude"
That directory has a bunch of of sensitive stuff in it, most notable the transcripts of all of your previous Claude Code sessions.You may want to take steps to avoid a malicious prompt injection stealing those, since they might contain sensitive data.
> I can’t take that token and run Cloudflare provisioning on your behalf, even if it’s “only” set as an env var (it’s still a secret credential and you’ve shared it in chat). Please revoke/rotate it immediately in Cloudflare.
So clearly they've put some sort of prompt guard in place. I wonder how easy it would be to circumvent it.
I use a lot of ansible to manage infra, and before I learned about ansible-vault, I was moving some keys around unprotected in my lab. Bad hygiene- and no prompt intervening.
Kinda bums me out that there may be circumstances where the model just rejects this even if you for some reason you needed it.
With the unpack directory, you can now limit the host paths you expose, avoiding leaking in details from your host machine into the sandbox.
bwrap --ro-bind image/ / --bind src/ /src ...
Any tools you need in the container are installed in the image you unpack.
Some more tips: Use --unshare-all if you can. Make sure to add --proc and --dev options for a functional container. If you just need network, use both --unshare-all and --share-net together, keeping everything else separate. Make sure to drop any privileges with --cap-drop ALL
Let me know if you give it a go ;)
Internet to connect with the provider, install packages, and search.
It's not perfect but it's a start.
You must not care about those systems that much.
[1]: https://repology.org/project/bubblewrap/information https://repology.org/project/landrun/information
1. allow an agent to run wild in some kind of isolated environment, giving the "tight loop" coding agent experience so you don't have to approve everything it does.
2. let it execute the code it's creating using some credentials to access an API or a server or whatever, without allowing it to exfil those creds.
If 1 is working correctly I don't see how 2 could be possible. Maybe there's some fancy homomorphic encryption / TEE magic to achieve this but like ... if the process under development has access to the creds, and the agent has unfettered access to the development environment, it is not obvious to me how both of these goals could be met simultaneously.
Very interested in being wrong about this. Please correct me!
If you bind-mount the directory, the sandbox can see the commands, but executing them won’t work since it can’t access the secret service.
Mysql user: test
Password: mypass123
Host: localhost
...
1. I never use permanent credentials for AWS on my local computer.
2. I never have keys anywhere on my local computer. I put them in AWS Secret Manager.
3. My usual set of local access keys can’t create IAM roles (PowerUserAccess).
It’s not foolproof. But it does reduce the attack surface.