Posted by atombender 13 hours ago
Problem 1: the agent does something destructive by accident — rm -rf, hard git revert, writes to the wrong config. Filesystem sandboxing solves this well.
Problem 2: the agent does something destructive because it was prompt-injected via a file it read. Sandboxing doesn't help here — the agent already has your credentials in memory before it reads the malicious file.
The only real answer to problem 2 is either to never give the agent credentials that can do real damage, or to have a separate process audit tool calls before they execute. Neither is fully solved yet.
Agent Safehouse is a clean solution to problem 1. That's genuinely useful and worth having even if problem 2 remains open.
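The "separate process auditing tool calls" idea can be pictured as a pre-execution check that every proposed shell command must pass. Below is a minimal POSIX sh sketch, assuming a hypothetical `audit_command` function and a small hand-written deny list. A pattern blocklist like this is easy to evade (e.g. `find . -delete`), which is part of why problem 2 is described as unsolved. Wiring it into a real agent, for example through a pre-tool-use hook, is left out.

```sh
#!/bin/sh
# Hypothetical pre-execution auditor: every shell command the agent proposes
# is checked here before it runs. Returns 0 to allow, 2 to deny.
# Deny patterns (illustrative, not exhaustive), one ERE per heredoc line:
#   1. recursive rm (-r, -rf, -fr, -R, ...)
#   2. git reset --hard
#   3. force push (--force, --force-with-lease, or a standalone -f)
audit_command() {
  local cmd pattern
  cmd=$1
  while IFS= read -r pattern; do
    [ -n "$pattern" ] || continue
    if printf '%s\n' "$cmd" | grep -Eq -- "$pattern"; then
      echo "blocked by policy: $cmd" >&2
      return 2
    fi
  done <<'EOF'
(^|[;&|[:space:]])rm[[:space:]]+-[[:alpha:]]*[rR]
git[[:space:]]+reset[[:space:]].*--hard
git[[:space:]]+push[[:space:]].*(--force|[[:space:]]-f([[:space:]]|$))
EOF
  return 0
}

# Example: classify a few commands.
for c in "git status" "rm -rf ./build" "git push --force origin main"; do
  if audit_command "$c" 2>/dev/null; then echo "allow: $c"; else echo "deny:  $c"; fi
done
```

Returning 2 rather than 1 leaves room to distinguish "blocked by policy" from the auditor itself failing.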
1. I built this because I like my agents to be local. Not in a container, not in a remote server, but running on my finely-tuned machine. This helps me run all agents on full-auto, in peace.
2. Yes, it's just a policy generator for sandbox-exec. IMO, that's the best part of the project - no dependencies, no fancy tech, no virtualization. But I did put in many hours to identify the minimum permissions agents need to keep working with auto-updates, keychain integration, pasting images, etc. My notes on what each agent needs are here (AI-generated): https://agent-safehouse.dev/docs/agent-investigations/
3. You don't even need the rest of the project: you can use just the Policy Builder to generate a single sandbox-exec policy and put it in your dotfiles https://agent-safehouse.dev/policy-builder.html
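For readers who haven't seen a sandbox-exec policy, here is a deliberately loose sketch of what a generated profile can look like: everything stays readable, and writes are confined to one project directory plus temp locations. This is not what the Policy Builder produces (Safehouse denies by default and is far more granular). The function name is hypothetical, and the rule-precedence and resolved-path behavior noted in the comments reflect commonly published profiles, not official documentation.

```sh
#!/bin/sh
# Emit a minimal SBPL profile: allow everything except writes, then re-allow
# writes to WORKDIR (symlinks resolved) and a few scratch locations.
make_write_jail_profile() {
  local workdir
  workdir=$(cd -- "$1" && pwd -P) || return 1
  case $workdir in
    *'"'*|*'\'*) echo "refusing path containing a quote or backslash: $workdir" >&2; return 1 ;;
  esac
  cat <<EOF
(version 1)
(allow default)
; Later matching rules take precedence, so deny all writes first...
(deny file-write*)
; ...then re-allow the project directory and scratch locations.
; On macOS the temp dirs live under /private, and rules match resolved paths.
(allow file-write* (subpath "$workdir"))
(allow file-write* (subpath "/private/tmp"))
(allow file-write* (subpath "/private/var/folders"))
(allow file-write* (literal "/dev/null"))
EOF
}
```

On macOS, usage would look like `make_write_jail_profile "$PWD" > agent.sb && sandbox-exec -f agent.sb <agent command>`. Interactive tools may need extra write rules (terminal devices, for example).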
I've seen sandbox policy documents for agents before, but this is the first ready-to-use app I've come across.
I've only had a couple of points of friction so far:
- Files like .gitconfig and .gitignore in the home folder aren't accessible, and can't be made accessible without granting read-only access to the whole home folder, I think?
- Process access is limited, so I can't ask Claude to run lldb or pkill or other commands that can help me debug local processes.
More fine-grained control would be really nice.
For handling global rules (like ~/.gitconfig and ~/.gitignore), I keep a local policy file that whitelists my "shared globals" paths, and I tell Safehouse to include that policy by default. I just updated the README with an example that might be useful[1]. I also enabled access to ~/.gitignore by default as it's a common enough default.
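On top of a deny-by-default base profile, a "shared globals" policy can be as small as a handful of read-only rules. The sketch below, using a hypothetical helper name, emits one `file-read*` rule per existing file. How such a fragment gets included by default is covered in the Safehouse README and isn't reproduced here.

```sh
#!/bin/sh
# Emit read-only SBPL rules for "shared globals" such as ~/.gitconfig.
# Files that don't exist are skipped; paths containing quotes or
# backslashes are rejected rather than escaped.
shared_globals_fragment() {
  local f target
  for f in "$@"; do
    [ -e "$f" ] || continue
    case $f in
      *'"'*|*'\'*) echo "skipping unsafe path: $f" >&2; continue ;;
    esac
    printf '(allow file-read* (literal "%s"))\n' "$f"
    # Assumption: the sandbox matches resolved paths, so if the dotfile is a
    # symlink (e.g. into a dotfiles repo), allow its target as well.
    if command -v realpath >/dev/null 2>&1; then
      target=$(realpath "$f")
      if [ -n "$target" ] && [ "$target" != "$f" ]; then
        printf '(allow file-read* (literal "%s"))\n' "$target"
      fi
    fi
  done
  return 0
}
```

For example: `shared_globals_fragment "$HOME/.gitconfig" "$HOME/.gitignore" > shared-globals.sb`.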
For process management, there is a blurry line about how much to allow without undermining the sandboxing concept. I just added new integrations[2] to allow more process control and lldb, but I don't know this area well. You can try cloning the repo, asking your agents to tweak the rules in the repo until your use-case works, and send a PR - I'll merge it!
Alternatively, using the "custom policy" feature above, you can selectively grant broad access to your tools (you can use log monitoring to see rejections, and then add more permissions to the policy file).
[1] https://github.com/eugene1g/agent-safehouse?tab=readme-ov-fi...
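The "watch the logs for rejections, then add permissions" loop can be partly mechanized. This sketch assumes kernel sandbox denials are logged in the form `Sandbox: git(4242) deny(1) file-read-data /path`, and it turns path-based denials into candidate rules for a human to review. Non-path denials such as `mach-lookup` are skipped. Applying the output automatically would defeat the purpose of the sandbox.

```sh
#!/bin/sh
# Turn sandbox denial log lines (stdin) into candidate allow rules (stdout).
# Assumed line shape: "... deny(1) <operation> <absolute path>".
# Lines containing double quotes are dropped rather than escaped, and
# denials whose target is not an absolute path (e.g. mach-lookup) are skipped.
denials_to_rules() {
  grep -v '"' |
    sed -n -E 's|.*deny\([0-9]+\) ([a-z*-]+) (/.*)$|(allow \1 (literal "\2"))|p' |
    sort -u
}
```

On macOS, input could come from something like `log show --last 5m --predicate 'eventMessage CONTAINS "deny("' | denials_to_rules`. Treat the exact predicate as an assumption.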
The process control policy, that's kind of niche and should definitely not be something agents are always allowed to do, so having a shorthand flag like you added in that pull request is the right choice.
I'm sure Anthropic and the other major players will catch up and add better sandboxing eventually, but for now, this tool has been exactly what I needed — many thanks!
I also wonder if this could have been a plugin or MCP server? I was using this plugin [1] for a bit, and it appears to use a "PreToolUse" hook that modifies every tool invocation. The benefit here would be that you could even change the Safehouse settings inside a session, e.g. turn process control on or off.
- I asked the agent to change my global git username, Codex asked my permission to execute `git config --global user.name "Botje"` and after I granted permission, it was able to change this global configuration.
- I asked it to list my home directory and it was able to (this time without Codex asking for permission).
I've been trying to get microsandbox to play nicely. But this is much closer to what I actually need.
I skimmed the site and the script but couldn't see any obvious gotchas.
Have you found any so far that haven't been documented yet?
But lately I’ve been using agents to test via browsers, and starting headless browsers from the agent is flaky. I’m working on that, but it’s hard to find a secure default to run Chrome.
In the repo, I have policies for running the Claude desktop app and VSCode inside the same sandbox (so you can use yolo mode there too), so there is hope for sandboxing headless Chrome as well.
I migrated from playwright mcp to playwright-cli myself last week, and it has been playing much nicer so far. I guess you would run into the same issues you've already mentioned about running headless Chrome in one of these sandboxes.
I'll for sure keep an eye out for updates.
Kudos to the project!
Yet the first thing I find in your README is that to install your tool I need to trust some random server to serve me an .sh file that I then execute on my computer (not sure if with sudo... but still).
Come on man, give me a tarball :)
EDIT: PS: before someone gives me the typical "but you could have malware in that tarball too!!!", well, it's easier to inspect what's inside the tarball and compare it to the sources of the repo, maybe also take a look at the CI of the repo to see if the tarball is really generated automatically from the contents of the repo ;)
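The "download, inspect, then run" alternative to piping an installer into a shell can be as simple as checking the file against a checksum obtained through a separate channel before reading and running it. A minimal sketch, with hypothetical helper names, assuming `sha256sum` on Linux or `shasum` (which ships with macOS):

```sh
#!/bin/sh
# Print the SHA-256 of a file, using whichever tool is available.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'   # shasum ships with macOS
  fi
}

# Succeed only if FILE's SHA-256 equals EXPECTED (a lowercase hex digest
# obtained from somewhere other than the server that served FILE).
verify_sha256() {
  local actual
  actual=$(sha256_of "$1")
  if [ -n "$actual" ] && [ "$actual" = "$2" ]; then
    return 0
  fi
  echo "checksum mismatch for $1: got ${actual:-nothing}" >&2
  return 1
}
```

For example: `curl -fsSL -o install.sh <installer-url> && verify_sha256 install.sh <published-sha256> && less install.sh`. This only helps if the project actually publishes a digest somewhere you trust independently of the download server.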
Alternatively, you can feed these instructions to your LLM and have it generate you a minimal policy file and a shell wrapper https://agent-safehouse.dev/llm-instructions.txt
Anyway, thanks for building Agent Safehouse.
I've been trying out similar things to help internal teams to use systems and languages like Rego (for Open Policy Agent) to have a visual and more 'a la carte' experience when starting out, so they don't have to jump straight to learning all syntax and patterns for a language they might have never seen before.
The downside is that it requires access to more than it technically needs (Claude keys for example). I’m working on a version where you sandbox the agent’s Bash tool, not the agent itself. https://github.com/Kiln-AI/Kilntainers
I honestly think that sandboxing is currently THE major challenge that needs to be solved for the tech to fully realise its potential. Yes, the early adopters will YOLO it and run agents natively. It won't fly at all longer term or in regulated or more conservative corporate environments, let alone production systems where critical operations or data are in play.
The challenge is that we need a much more sophisticated version of sandboxing than anybody has made before. We can start with network, file system and execute permissions - but we need way more than that. For example, if you really need an agent to use a browser to test your application in a live environment, capture screenshots and debug them, you have to give it all kinds of permissions that go beyond what can be constrained with a traditional sandboxing model. If it has to interact with resources that cost money (say, create cloud resources), then you need an agent-aware cloud cost/billing constraint.
Somehow all this needs to be pulled together into an actual cohesive approach that people can work with in a practical way.
Have you considered that it's unsolvable? Or - at least - there is an irreconcilable tension between capability and safety. And people will always choose the former if given the choice.
The most unsolvable part is prompt injection. For that you need full tracking of the trust level of content the agent is exposed to and a method of linking that to the actions available to it. I actually think this needs to be fully integrated into the sandboxing solution. Once an agent is "tainted", its sandbox should inherently shrink down to the radius where risk is balanced with value. For example, my fully trusted agent might have a balance of $1000 in my AWS account, while a tainted one might have that reduced to $50.
So another aspect of sandboxing is to make the security model dynamic.
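A toy version of the "tainted agent gets a smaller sandbox" idea, assuming taint is tracked per session and that only file reads outside a trusted root can taint it. Real prompt-injection sources also include web pages, tool output, and files inside the project itself. All function names are hypothetical, and the $1000/$50 caps are just the figures from the example above.

```sh
#!/bin/sh
# Toy taint tracking: the session starts trusted; reading any file outside
# TRUSTED_ROOT marks it tainted for the rest of the session (taint is sticky),
# which shrinks the spending cap. Names and caps are illustrative.
TRUSTED_ROOT=$(cd -- "${TRUSTED_ROOT:-.}" && pwd -P)
TAINTED=0

# Read a file on the agent's behalf, recording taint if it is untrusted.
agent_read() {
  local dir path
  dir=$(cd -- "$(dirname "$1")" && pwd -P) || return 1
  path=$dir/$(basename "$1")
  case $path in
    "$TRUSTED_ROOT"/*) ;;     # inside the trusted root: no change
    *) TAINTED=1 ;;           # anything else taints the session
  esac
  cat -- "$path"
}

# Spending cap in USD that a budget-enforcing layer would apply.
spend_cap_usd() {
  if [ "$TAINTED" -eq 1 ]; then echo 50; else echo 1000; fi
}
```

Making taint sticky is the conservative choice: once untrusted text is in the agent's context, going back to a trusted file doesn't remove it.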
One idea is to have the coding agent write a security policy in plan mode before reading any untrusted files:
That's not the case with Agent Safehouse - you can give your agent access to select ~/.dotfiles and env, but by default it gets nothing (outside of CWD)
This looks like a competent wrapper around sandbox-exec. I've seen a whole lot of similar wrappers emerging over the past few months.
What I really need is help figuring out which ones are trustworthy.
I think this needs to take the form of documentation combined with clearly explained and readable automated tests.
Most sandboxes - including sandbox-exec itself - are massively under-documented.
If I am going to trust them, I need both detailed documentation and proof that they work as advertised.
Your point is totally fair for evaluating security tooling. A few notes -
1. I implemented this in Bash to avoid having an opaque binary in the way.
2. All sandbox-exec profiles are split up into individual files by specific agent/integration, and are easily auditable (https://github.com/eugene1g/agent-safehouse/tree/main/profil...)
3. There are E2E tests validating sandboxing behavior under real agents
4. You don't even need the Safehouse Bash wrapper, and can use the Policy Builder to generate a static policy file with minimal permissions that you can feed to sandbox-exec directly (https://agent-safehouse.dev/policy-builder). Or feed the repo to your LLMs and have them write your own policy from the many examples.
5. This whole repo should be a StrongDM-style readme to copy&paste to your clanker. I might just do that "refactor", but for now I've added LLM instructions for creating your own sandbox-exec profiles https://agent-safehouse.dev/llm-instructions.txt
Would xcodebuild work in this context? Presumably I'd watch a log (or have an agent) and add permissions until it works?
Yes, Safehouse should work for xcodebuild workloads in the way you described - try to run it, watch for failures, extend the profile, try again. Your agent can do this in a loop by itself - just feed it the repo as there are many integrations that are not enabled by default that will help it.
I like that it's just a shell script.
I do wish that there was a simple way to sandbox programs with an overlay or copy-on-write semantics (or better yet bind mounts). I don't care if, in the process of doing some work, an LLM agent modifies .bashrc -- I only care if it modifies _my_ .bashrc
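macOS has no mount namespaces, but for dotfiles specifically, a rough approximation is to point `$HOME` at a throwaway directory seeded with copies of selected files, so a write to `~/.bashrc` lands in the copy. This sketch uses a hypothetical `with_scratch_home` helper and is not a security boundary. Programs that look up the home directory from the user database instead of `$HOME` still see the real one, and writes outside `$HOME` are unaffected.

```sh
#!/bin/sh
# Run a command with $HOME pointed at a throwaway copy of selected dotfiles.
# SCRATCH_HOME_SEED lists top-level dotfiles to copy (space-separated,
# intentionally word-split). The scratch path is printed on stderr and left
# in place so the changes can be reviewed or discarded.
with_scratch_home() {
  local scratch f
  scratch=$(mktemp -d) || return 1
  for f in ${SCRATCH_HOME_SEED:-.bashrc .zshrc .gitconfig}; do
    if [ -e "$HOME/$f" ]; then cp -p "$HOME/$f" "$scratch/$f"; fi
  done
  echo "scratch home: $scratch" >&2
  HOME=$scratch "$@"
}
```

For example, `with_scratch_home some-agent-command` prints the scratch path on stderr, so you can `diff` it against your real dotfiles afterwards and throw it away.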
┌─ YOLO shell ──────────────────────┬─ Outer shell ─────────────────────┐
│                                   │                                   │
│  yoloai new myproject . -a        │                                   │
│                                   │                                   │
│  # Tell the agent what to do,     │                                   │
│  # have it commit when done.      │                                   │
│                                   │  yoloai diff myproject            │
│                                   │  yoloai apply myproject           │
│                                   │  # Review and accept the commits. │
│                                   │                                   │
│  # ... next task, next commit ... │                                   │
│                                   │  yoloai apply myproject           │
│                                   │                                   │
│                                   │  # When you have a good set of    │
│                                   │  # commits, push:                 │
│                                   │  git push                         │
│                                   │                                   │
│                                   │  # Done? Tear it down:            │
│                                   │  yoloai destroy myproject         │
└───────────────────────────────────┴───────────────────────────────────┘
Works with Docker, Seatbelt, and Tart backends (I've even had it build an iOS app inside a seatbelt container).
Re “overlay FS” - I too wish this was possible on Macs, but the closest I got was restricting agents to be read-only outside of CWD which, after a few turns, bullies them into working in $TMP. Not the same though.
It's tailored to play nicely with Git: spin up sandboxes from the CLI, expose TCP/UDP ports of apps to check your work, and, if running hosted sandboxes, share the sandbox URLs with teammates. I basically want running sandboxed agents to be as easy as `git clone ...`.
Docs are early and edges are rough. This week I'm starting to dogfood all my dev using Amika. Feedback is super appreciated!
FYI: we are also a startup, but local sandbox mgmt will stay OSS.
Just use Docker, or a VM.
The other issue is that this does not facilitate unpredictable file access -- I have to mount everything up front. Sometimes you don't know what you need. And even then copying in and out is very different from a true overlay.
It sounds like a big part of your use case is to safely give an agent control of your computer? Like, for things besides codegen?
We're probably not going to directly support that type of use case, since we're focused on code-gen agents and migrating their work between localhost and the cloud.
We are going to add dynamic filesystem mounting after sandbox creation. Haven't figured out the exact implementation yet. Might be a FUSE layer we build ourselves. Mutagen is pretty interesting as well here.
The main issue I want to solve is unexpected writes to arbitrary paths should be allowed but ultimately discarded. macOS simply doesn't offer a way to namespace the filesystem in that way.
Apple is likely preparing to replace it with a more secure alternative, and all it takes is for someone to find one or more vulnerabilities in sandbox-exec to give everyone a wake-up call about why they were using it in the first place.
I predict that there is a CVE lurking in sandbox-exec waiting to be discovered.
Security researchers will leverage every part of the OS stack to bypass the XNU sandbox, which they have done multiple times.
Now, thanks to the hype around 'agents', there is a good reason for them to break the sandbox. It could even take a single file to break it. [0]
> My guess is sandbox-exec is deprecated more because it never was adequately documented rather than because it’s flawed in some way.
You do not know that. I am saying that it has been bypassed before, and the fact that it is used all over the OS doesn't mean anything. It actually makes it worse.
[0] https://the-sequence.com/crashone-cve-2025-24277-macos-sandb...
Apple can still decide to change it for any reason, regardless of who uses it, since it is undocumented and meant for their own use anyway.
> I’m not sure Apple could remove it even if they were sufficiently motivated to.
It can take multiple security issues for them to remove it.
Basically, give an agent its own unprivileged user account (interacting with it via sudo, SSH, and shared directories), then add sandbox-exec on top for finer-grained control of access to system resources.
I also found the author to be helpful and responsive and the tool to be nicely minimalistic rather than the usual vibe coded ever expanding mess.
‘brew install sandvault’ and running ‘sv’ should get you going.
(full disclosure: I created the Homebrew formula and submitted a few PRs to the project)
$ container system start
$ container run -d --name myubuntu ubuntu:latest sleep infinity
$ container exec myubuntu bash -c "apt-get update -qq && apt-get install -y openssh-server"
$ container exec myubuntu bash -c "
apt-get install -y curl &&
curl -fsSL https://deb.nodesource.com/setup_lts.x |
bash - &&
apt-get install -y nodejs
"
$ container exec myubuntu npm install -g @anthropic-ai/claude-code
$ container exec myubuntu claude --version
Its manpage has been saying it's deprecated for a decade now, yet we're continuing to find great uses for it. And the 'App Sandbox' replacement doesn't work at all for use cases like this where end users define their own sandbox rules. Hope Apple sees this usage and stops any plans to actually deprecate sandbox-exec. I recall a bunch of macOS internal services also rely on it.
In particular, has the profile language ever been documented by anything other than the examples used by the OS and third parties reverse engineering it?