Agent Safehouse – macOS-native sandboxing for local agents

Posted by atombender 15 hours ago

Agent Safehouse – macOS-native sandboxing for local agents (agent-safehouse.dev)

577 points | 142 comments

alpb 8 hours ago|

As I understand it, the problem nowadays doesn't seem to be so much that the agent is going to rm -rf / my host; it's more that it's going to connect to a production system or a database tool that I'm authorized for on my machine, and then run a potentially destructive command. There is a ton of value in running agents against production systems to troubleshoot things, but there are not enough guardrails to prevent destructive actions from the get-go. The solution seems to be specific to each system, and the filesystem is just one aspect out of many.

crossroadsguy 8 hours ago|

As I understand it, the problem is that these apps/agents can do all of these and a lot more (if not absolutely everything, I am sure they can get quite close to it).

Solution could be two parts:

1. The OS bringing better and easier-to-use OS-level limitations: more granular permissions, plus install-time options and defaults that are visible to the user right there and that the user can reject with choices like:

- “ask later”

- “no”

- “fuck no”

with eli5 level GUIs (and well documented). Hell, a lot of these are already solved for mobile OSes. All while not taking tools away from the hands of the user who wants to go inside and open things up (with clear intention and effort; without having to notarise some shit or pay someone).

2. Then apps[1] having to, forced to, adhere to use those or never getting installed.

[1] So no treating of agents as some “other” kinds of apps. Just limit it for every app (unless user explicitly decides to open things up).

It will also be a great time to nuke the despicable mess like Electron Helpers and shit and app devs considering it completely fine to install a trillion other “things” when user installed just one app without explaining it in the beginning (and hence forced to keep their apps’ tentacles simple and limited)

carderne 3 hours ago||

How do agents tend to deal with getting blocked? Messing around with sandboxes, I've quite often seen them get blocked, assume something is wrong, and go _crazy_ trying to get around the block, never stopping to ask for user input. It might be good to add to the error message: "This is deliberate, don't try to get around it."

For those using pi, I've built something similar[1] that works on macOS+Linux, using sandbox-exec/bubblewrap. The only benefit over OP is that there's some UX for temporarily/permanently bypassing blocks.

[1] https://github.com/carderne/pi-sandbox
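
A minimal sketch of the error-message idea above, in Python: a hypothetical wrapper that runs a command and, when its stderr looks like a sandbox denial, appends a note saying the block is intentional. The function names, the marker strings, and the note wording are all assumptions for illustration; none of this is taken from pi-sandbox or Safehouse.

```python
import subprocess

# Substrings that commonly show up when a sandbox denies an operation.
# This list is an assumption; tune it for your own sandbox.
DENIAL_MARKERS = ("Operation not permitted", "Permission denied")

# Hypothetical wording for the hint shown to the agent.
SANDBOX_NOTE = (
    "[sandbox] This block is deliberate. Do not try to work around it; "
    "stop and ask the user for access instead."
)


def annotate_denial(stderr: str) -> str:
    """Append the sandbox note if stderr looks like a sandbox denial."""
    if any(marker in stderr for marker in DENIAL_MARKERS):
        return stderr.rstrip("\n") + "\n" + SANDBOX_NOTE + "\n"
    return stderr


def run_for_agent(argv: list[str]) -> tuple[int, str, str]:
    """Run a command and return (exit code, stdout, annotated stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout, annotate_denial(proc.stderr)
```

Matching on stderr text is crude and will also tag ordinary permission errors, so treat the marker list as a starting point rather than a reliable detector.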

e1g 3 hours ago||

Claude Code and Codex quickly figure out they are inside a sandbox-exec environment. Maybe because they know it internally. Other agents often realize they are being blocked, and I haven't seen them go haywire yet.

Big love for Pi - it was the first integration I added to Safehouse. I wanted something that offers strong guarantees across all agents (I test and write them nonstop), has no dependencies (e.g., the Node runtime), and is easy to customize, so I didn't use the Anthropic sandbox-runtime.

carderne 3 hours ago||

Interesting, that's not been my experience! Maybe you've got the list of things to allow/block just right. While testing different policies I've frequently seen Opus 4.6 go absolutely nuts trying to get past a block, unless I made it more clear what was happening.

Yeah I think for general use the transparency of what your thing does is really great compared to a pile of TypeScript and whatnot.

gbrindisi 2 hours ago||

ah I also did my own sandbox and at least twice the agent inside tried really hard to go around the firewall, so I ended up intercepting calls to `connect` to return a message that says "Connection refused by the sandbox, don't try to bypass".

Code here: https://github.com/gbrindisi/agentbox
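
The comment doesn't say how agentbox intercepts `connect`, so the following is only an in-process Python sketch of the same idea: check a host against an allowlist and refuse with an explicit message. `ALLOWED_HOSTS` and `guarded_connect` are hypothetical names, and a cooperative check like this is not an enforcement boundary by itself.

```python
import socket

# Hypothetical allowlist; replace with the hosts your agent really needs.
ALLOWED_HOSTS = {"api.example.com"}

# Message wording taken from the comment above.
REFUSAL = "Connection refused by the sandbox, don't try to bypass"


def guarded_connect(host: str, port: int, timeout: float = 10.0) -> socket.socket:
    """Open a TCP connection only if the host is on the allowlist."""
    if host not in ALLOWED_HOSTS:
        # Fail with a message the agent can read, instead of a bare errno.
        raise ConnectionRefusedError(REFUSAL)
    return socket.create_connection((host, port), timeout=timeout)
```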

davidcann 11 hours ago||

I made a native macOS app with a GUI for sandbox-exec, plus a network sandbox with per-domain filtering and secrets detection: https://multitui.com/

brutuscat 3 hours ago||

What do you think of sandbox-exec being marked as deprecated?

https://news.ycombinator.com/item?id=31973232

https://github.com/openai/codex/issues/215

tl2do 13 hours ago||

Intriguing, but...

Around last summer (July–August 2025), I desperately needed a sandbox like this. I had multiple disasters with Claude Code and other early AI models. The worst was when Claude Code did a hard git revert to restore a single file, which wiped out ~1000 lines of development work across multiple files.

But now, as of March 2026, at least in my experience, agents have become more reliable. With proper guardrails in claude.md and built-in safety measures, I haven't had a major incident in about 3 months.

That said, layering multiple safeguards is always recommended—your software assets are your assets. I'd still recommend using something like this. But things are changing, bit by bit.

e1g 13 hours ago||

No doubt they are getting better, but even a 0.1% chance of “rm -rf” makes it a question of “when” not “if”. And we sure spin that roulette a lot these days. Safehouse makes that 0%, which is categorically different.

Also, I don’t want it to be even theoretically possible for some file in node_modules to inject instructions to send my dotfiles to China.
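
To put a rough number on the roulette point above, assuming each agent run is independent and carries the same 0.1% chance of a destructive command (both simplifications, and the run counts are arbitrary):

```python
def p_at_least_one(p_per_run: float, runs: int) -> float:
    """Probability of at least one incident across independent runs."""
    return 1.0 - (1.0 - p_per_run) ** runs


# Run counts chosen only for illustration.
for runs in (100, 1000, 5000):
    print(runs, round(p_at_least_one(0.001, runs), 3))
```

Under those assumptions the chance of at least one incident is roughly 10% after 100 runs, 63% after 1,000, and 99% after 5,000.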

bilalq 13 hours ago|||

Look into git reflog. If the changes were committed, it was almost certainly possible to still restore them, even if the commit is no longer in your branch.
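
A small Python sketch of how that recovery might be scripted, assuming the default `git reflog` output format (`<sha> HEAD@{n}: <action>: <message>`). The helper names are hypothetical, and this only helps for work that was committed at some point.

```python
import re
import subprocess

# Matches lines like: 1a2b3c4 HEAD@{0}: commit: add parser
# An optional "(branch, ...)" decoration after the hash is tolerated.
_LINE = re.compile(
    r"^(?P<sha>[0-9a-f]+) (?:\([^)]*\) )?(?P<ref>\S+@\{[^}]+\}): (?P<msg>.*)$"
)


def parse_reflog(text: str) -> list[dict[str, str]]:
    """Parse reflog output into dicts with 'sha', 'ref' and 'msg' keys."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def read_reflog(repo: str = ".") -> list[dict[str, str]]:
    """Run git reflog in the given repository and parse its output."""
    result = subprocess.run(
        ["git", "-C", repo, "reflog"],
        capture_output=True, text=True, check=True,
    )
    return parse_reflog(result.stdout)


def restore_file_command(sha: str, path: str) -> list[str]:
    """Build the git command that restores one file from a given commit."""
    return ["git", "checkout", sha, "--", path]
```

Picking the entry just before the destructive reset and restoring files from its hash is usually enough; creating a branch at that hash is another option.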

ZYbCRq22HbJ2y7 12 hours ago||

There are probably other tools like this that keep version history based on filesystem events, independent from the project's git repository

https://www.jetbrains.com/help/idea/local-history.html

jeremyjh 13 hours ago||

Prompt injection attacks are very much a thing. It doesn't matter how good the agent is, it's vulnerable, and you don't know what you don't know.

ramoz 13 hours ago||

Where are we at with SOTA or reliable prompt injection detection mechanisms?

w10-1 6 hours ago||

But... why not just run macOS in a VM?

If/since AI agents work continuously, it seems like running macOS in a VM (via the virtualization framework directly) is the most secure solution and requires a lot less verification than any sandboxing script. (Critical feature: no access to my keychain.)

AI agents are not at all like container deploys which come and go with sub-second speed, and need to be small enough that you can run many at a time. (If you're running local inference, that's the primary resource hog.)

I'm not too worried about multiple agents in the same vm stepping on each other. I give them different work-trees or directory trees; if they step over 1% of the time, it's not a risk to the bare-metal system.

Not sure if I'm missing something...

sunnybeetroot 6 hours ago|

One limitation is that Apple's Virtualization framework does not offer USB passthrough for connecting to iPhones for iOS development.

jeff_antseed 4 hours ago||

the macOS-only constraint is the biggest blocker for us. most of our agents run on linux VMs and there's basically nothing equivalent -- you end up choosing between full docker isolation (heavy) or just... not sandboxing at all and hoping.

been watching microsandbox but it's pretty early. landlock is the linux kernel primitive that could theoretically enable something like this but nobody's built the nice policy layer on top yet.

curious if anyone has a good solution for the "agent running on a remote linux server" case. the threat model is a bit different anyway (no iMessage/keychain to protect) but filesystem and network containment still matter a lot

carderne 3 hours ago||

There is sandbox-runtime [1] from Anthropic that uses bubblewrap to sandbox on Linux (and works the same as OP on macOS). You can look at the code to see how it uses it. Anthropic's tool only supports a read blacklist, not a whitelist, so I forked it yesterday to support that [2].

[1] https://github.com/anthropic-experimental/sandbox-runtime [2] https://github.com/carderne/sandbox-runtime
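
As a rough illustration of why a read whitelist fits bubblewrap well (only paths that are explicitly bound exist inside the sandbox), here is a Python sketch that builds a `bwrap` command line. It is not how sandbox-runtime or the fork does it; `bwrap_argv` and its default path list are assumptions, and real distros usually need a few more binds or symlinks.

```python
def bwrap_argv(
    command: list[str],
    workdir: str,
    ro_paths: tuple[str, ...] = ("/usr", "/etc"),  # assumed defaults
    allow_network: bool = False,
) -> list[str]:
    """Build a bwrap argv: read-only system paths plus one writable workdir."""
    argv = ["bwrap", "--unshare-all", "--die-with-parent"]
    if allow_network:
        # --unshare-all also unshares the network; opt back in explicitly.
        argv.append("--share-net")
    for path in ro_paths:
        argv += ["--ro-bind", path, path]
    # Minimal /proc, /dev and a private /tmp for the sandboxed process.
    argv += ["--proc", "/proc", "--dev", "/dev", "--tmpfs", "/tmp"]
    # The project directory is the only writable bind.
    argv += ["--bind", workdir, workdir, "--chdir", workdir]
    return argv + list(command)
```

Anything not listed in `ro_paths` or `workdir`, such as the home directory, is simply absent from the sandbox's filesystem view.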

edf13 4 hours ago||

We are taking a different approach and are targeting Linux for our first release (Windows & Mac shortly afterwards).

Taking more of an automated supervisor approach with limited manual approval for edge cases.

Grith.ai

hsaliak 9 hours ago||

This is a very nice and clean implementation. Related to this - I've been exploring injecting landlock and seccomp profiles directly into the ELF binary, so that applications that are backed by some LLM, but want to 'do the right thing', can lock themselves out. This ships a custom process loader that reads the .sandbox section and applies the policies (not unlike bubblewrap, which uses namespaces). The loading can be pushed to a kernel module in the future.

https://github.com/hsaliak/sacre_bleu very rough around the edges, but it works. In the past there were apps that either behaved well, or had malicious intent, but with these LLM backed apps, you are going to see apps that want to behave well, but cannot guarantee it. We are going to see a lot of experimentation in this space until the UX settles!

garganzol 14 hours ago||

While we have `sandbox-exec` in macOS, we still don't have a proper Docker for macOS. Instead, the current Docker runs on macOS as a Linux VM, which is useful but only as far as a Linux machine goes.

Having real macOS Docker would solve the problem this project solves, and 1001 other problems.

egorfine 2 hours ago||

> Having real macOS Docker would solve the problem

I'm very slowly working on a mock docker implementation for macOS that uses an ephemeral VM to launch a true guest macOS and perform commands as per the Dockerfile (copying files, etc.). I use it internally for builds. No public repo yet though. Not sure if there is demand.

mkagenius 14 hours ago|||

Apple containers were released a few months back. Been using it to sandbox claude/gemini-cli generated code[1].

You can use it to completely sandbox claude code too.

1. Coderunner - https://github.com/instavm/coderunner

arianvanp 14 hours ago||

That is also a Linux VM on macOS. They're not macOS containers. So it's completely pointless / useless for macOS or iOS development.

mkagenius 14 hours ago||

Oh, yes. I thought GP was mostly worried about shared VM problem.

hrmtst93837 5 hours ago|||

If you expect macOS to behave like Linux, you are asking the wrong OS to do the job. Docker and runtimes like runc depend on Linux kernel primitives such as namespaces and cgroups that XNU does not provide, and macOS adds System Integrity Protection, TCC, signed system frameworks, and launchd behaviors that make sharing the host kernel for arbitrary workloads technically hard and legally messy.

A practical path is ephemeral macOS VMs using Apple's Virtualization.framework coupled with APFS copy-on-write clones for fast provisioning, or limited per-process isolation via seatbelt and the hardened runtime. The VM route respects Apple's licensing, which restricts macOS VMs to Apple hardware, and gives strong isolation at the cost of higher RAM and storage overhead compared with Linux containers.

dpe82 14 hours ago|||

Nitpick, which probably doesn't matter too much in this context but is always good to remember: Docker containers are not security boundaries.

PlasmaPower 14 hours ago|||

Why not? They're definitely not perfect security boundaries, but neither are VMs. I think containers provide a reasonable security/usability tradeoff for a lot of use cases including agents. The primary concern is kernel vulnerabilities, but if you're keeping your kernel up-to-date it's still imo a good security layer. I definitely wouldn't intentionally run malware in it, but it requires an exploit in software with a lot of eyes on it to break out of.

dpe82 5 hours ago||

It's certainly better than nothing. Hence "probably doesn't matter too much in this context" - but of course it always matters what your threat model is. Your own agents under your control with aligned models and not interacting with attacker data? Should be fine.

But too many people just automatically equate docker with strong secure isolation and... well, it can be, sometimes, depending on a hundred other variables. Thus the reminder: to foster conversations like this.

fredoliveira 14 hours ago|||

counter-intuitively, the fact that docker on the mac requires a linux-based VM makes it safer than it otherwise would be. But your point stands in general, of course.

PufPufPuf 14 hours ago||

What would native containers bring over Linux ones? The performance of VZ emulation is good, existing tools have great UX, and using a virtualized kernel is a bit safer anyways. I regularly use a Lima VM as a VSCode remote workspace to run yolo agents in.

qalmakka 5 hours ago|||

> What would native containers bring over Linux ones?

What would a Phillips screwdriver bring over a flathead screwdriver? Sometimes you don't want/need the flathead screwdriver, simple as that. There are macOS-specific jobs you need to run in macOS, such as Xcode toolchains etc. You can try cross compiling, but it's a pain and ridiculous given that every other OS supports containers natively (including Windows). It's clear to me that Apple is trying to make the ratio jobs/#MacMinis as small as possible.

garganzol 14 hours ago||||

Sometimes you just have to run native software. In my case, that means macOS build agents using Xcode and Apple toolchains which are only available on macOS.

It's not a pleasure to run them in a mutable environment where everything has a floating state as I do now. Native Docker for macOS would totally solve that.

hirvi74 14 hours ago|||

VZ has been exceptional for me. I have been running headless VMs with Lima and VZ for a while now with absolutely zero problems. I just mount a directory I want Claude Code to be able to see and nothing more.

abhisek 7 hours ago|

I think this is the right approach to building a sandbox for agents, i.e. on top of existing OS-native sandbox capabilities so that the rules are truly enforced.

However, the challenge is that sandbox profiles (rules) are always workload specific. How do you define “least privilege” for a workload and then enforce it through the sandbox?

Which is why general sandboxes won't be useful or even feasible. The value is in observing and probably auto-generating a baseline policy for a given workload.

Wrong or overly relaxed policies would make the sandbox ineffective against the real threats it is expected to protect against.
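
One way the "observe, then auto-generate a baseline" idea could look, as a Python sketch: take a list of observed file accesses and reduce it to the smallest set of directories to allow. How the events get collected (sandbox logs, tracing, and so on) is left out, and `baseline_policy` and its event format are hypothetical.

```python
from pathlib import PurePosixPath


def _collapse(dirs: set[str]) -> list[str]:
    """Drop directories that are already covered by a listed ancestor."""
    kept: list[str] = []
    # Visit shallow directories first so ancestors are kept before children.
    for d in sorted(dirs, key=lambda s: len(PurePosixPath(s).parts)):
        p = PurePosixPath(d)
        if not any(PurePosixPath(k) in (p, *p.parents) for k in kept):
            kept.append(d)
    return sorted(kept)


def baseline_policy(events: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Turn (operation, absolute path) events into allowed directories.

    operation is 'read' or 'write'; each file access is widened to its
    parent directory, which is a deliberate (and debatable) simplification.
    """
    reads: set[str] = set()
    writes: set[str] = set()
    for op, path in events:
        parent = str(PurePosixPath(path).parent)
        (writes if op == "write" else reads).add(parent)
    return {"read": _collapse(reads), "write": _collapse(writes)}
```

A generated baseline like this still needs human review, since anything the workload happened to touch during observation ends up allowed.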
