Posted by atombender 15 hours ago
Solution could be two parts:
1. The OS providing better, easier-to-use limits: more granular permissions, plus install-time options and defaults that are shown to the user right there and that the user can reject with choices like:
- “ask later”
- “no”
- “fuck no”
with ELI5-level GUIs (and good documentation). Hell, a lot of this is already solved on mobile OSes. All of this without taking tools away from the user who wants to go inside and open things up (with clear intention and effort, and without having to notarise some shit or pay someone).
2. Then apps[1] being forced to adhere to those limits, or never getting installed.
[1] So no treating of agents as some “other” kinds of apps. Just limit it for every app (unless user explicitly decides to open things up).
It would also be a great time to nuke despicable messes like Electron Helpers, and the habit of app devs considering it completely fine to install a trillion other “things” when the user installed just one app, without explaining it up front (devs would hence be forced to keep their apps’ tentacles simple and limited).
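To make the two parts above concrete, here is a minimal sketch of a deny-by-default permission table. Everything in it (`Decision`, `AppPermissions`, the permission names) is hypothetical and invented for illustration; it is not any real OS API.

```python
from enum import Enum


class Decision(Enum):
    ASK_LATER = "ask later"
    NO = "no"
    NEVER = "never"  # the emphatic "no": the app may not prompt again
    ALLOW = "allow"


class AppPermissions:
    """Hypothetical per-app permission table with deny-by-default semantics."""

    def __init__(self, requested):
        # Everything the app declares at install time starts as ASK_LATER,
        # so nothing is granted until the user explicitly opts in.
        self.decisions = {perm: Decision.ASK_LATER for perm in requested}

    def set(self, perm, decision):
        # This is a user action, so the user can always change their mind.
        # Permissions the app never declared cannot be set at all.
        if perm not in self.decisions:
            raise KeyError(f"{perm!r} was not declared at install time")
        self.decisions[perm] = decision

    def is_allowed(self, perm):
        # Undeclared or undecided permissions are refused.
        return self.decisions.get(perm) is Decision.ALLOW

    def may_prompt(self, perm):
        # The app may only re-ask for permissions still marked ASK_LATER.
        return self.decisions.get(perm) is Decision.ASK_LATER
```

The design choice is that `set` belongs to the user, not the app: a `NEVER` only stops the app from nagging, it does not stop the user from opening things up later.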
For those using pi, I've built something similar[1] that works on macOS+Linux, using sandbox-exec/bubblewrap. Only benefit over OP is that there's some UX for temporarily/permanently bypassing blocks.
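For anyone who hasn't used bubblewrap, a rough sketch of what a wrapper like this might assemble on Linux. This is not the code from the linked project; `bwrap_argv` and `my-agent` are made-up names, and the bind mounts assume a merged-/usr distro layout. It only builds the argument list and runs nothing.

```python
def bwrap_argv(workdir, command, allow_network=False):
    """Build a bubblewrap command line; nothing is executed here."""
    argv = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",    # system files, read-only
        "--ro-bind", "/etc", "/etc",
        "--symlink", "usr/bin", "/bin",  # assumes a merged-/usr layout
        "--symlink", "usr/lib", "/lib",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",               # private, throwaway /tmp
        "--bind", workdir, workdir,      # the only writable host path
        "--chdir", workdir,
        "--die-with-parent",
    ]
    if not allow_network:
        # Put the process in a fresh network namespace with no connectivity.
        argv.append("--unshare-net")
    return argv + list(command)


if __name__ == "__main__":
    # "my-agent" is a placeholder for whatever agent binary you run.
    print(" ".join(bwrap_argv("/home/me/project", ["my-agent"])))
```

The home directory is never bound, so dotfiles and keys are simply absent inside the sandbox rather than blocked by a rule. Pass the list to `subprocess.run` to actually launch it.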
Big love for Pi - it was the first integration I added to Safehouse. I wanted something that offers strong guarantees across all agents (I test and write them nonstop), has no dependencies (e.g., the Node runtime), and is easy to customize, so I didn't use the Anthropic sandbox-runtime.
Yeah I think for general use the transparency of what your thing does is really great compared to a pile of TypeScript and whatnot.
Code here: https://github.com/gbrindisi/agentbox
Around last summer (July–August 2025), I desperately needed a sandbox like this. I had multiple disasters with Claude Code and other early AI models. The worst was when Claude Code did a hard git revert to restore a single file, which wiped out ~1000 lines of development work across multiple files.
But now, as of March 2026, at least in my experience, agents have become more reliable. With proper guardrails in claude.md and built-in safety measures, I haven't had a major incident in about 3 months.
That said, layering multiple safeguards is always recommended—your software assets are your assets. I'd still recommend using something like this. But things are changing, bit by bit.
Also, I don’t want it to be even theoretically possible for some file in node_modules to inject instructions to send my dotfiles to China.
If/since AI agents work continuously, it seems like running macOS in a VM (via the virtualization framework directly) is the most secure solution and requires a lot less verification than any sandboxing script. (Critical feature: no access to my keychain.)
AI agents are not at all like container deploys which come and go with sub-second speed, and need to be small enough that you can run many at a time. (If you're running local inference, that's the primary resource hog.)
I'm not too worried about multiple agents in the same vm stepping on each other. I give them different work-trees or directory trees; if they step over 1% of the time, it's not a risk to the bare-metal system.
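A small sketch of the one-work-tree-per-agent setup, assuming plain `git worktree`. The function name, the `agent/<name>` branch scheme, and the sibling-directory layout are my own illustrative choices; the function only returns the commands and runs nothing.

```python
from pathlib import PurePosixPath


def worktree_commands(repo, agents, base="HEAD"):
    """Return one `git worktree add` command per agent; nothing is run."""
    repo = PurePosixPath(repo)
    commands = []
    for name in agents:
        # Each agent gets a sibling directory and its own branch, so
        # concurrent edits never touch the same checkout.
        path = repo.parent / f"{repo.name}-{name}"
        commands.append([
            "git", "-C", str(repo),
            "worktree", "add", "-b", f"agent/{name}", str(path), base,
        ])
    return commands


if __name__ == "__main__":
    for cmd in worktree_commands("/src/app", ["alpha", "beta"]):
        print(" ".join(cmd))
```

Because every agent works on its own branch, an occasional collision shows up as an ordinary merge conflict instead of silently clobbered files.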
Not sure if I'm missing something...
been watching microsandbox but it's pretty early. landlock is the linux kernel primitive that could theoretically enable something like this but nobody's built the nice policy layer on top yet.
curious if anyone has a good solution for the "agent running on a remote linux server" case. the threat model is a bit different anyway (no iMessage/keychain to protect) but filesystem and network containment still matter a lot
[1] https://github.com/anthropic-experimental/sandbox-runtime
[2] https://github.com/carderne/sandbox-runtime
Taking more of an automated supervisor approach with limited manual approval for edge cases.
Grith.ai
https://github.com/hsaliak/sacre_bleu is very rough around the edges, but it works. In the past there were apps that either behaved well or had malicious intent, but with these LLM-backed apps, you are going to see apps that want to behave well but cannot guarantee it. We are going to see a lot of experimentation in this space until the UX settles!
Having real macOS Docker would solve the problem this project solves, and 1001 other problems.
I'm very slowly working on a mock Docker implementation for macOS that uses an ephemeral VM to launch a true guest macOS and then runs commands, copies files, etc. as per the Dockerfile. I use it internally for builds. No public repo yet though. Not sure if there is demand.
You can use it to completely sandbox claude code too.
1. Coderunner - https://github.com/instavm/coderunner
A practical path is ephemeral macOS VMs using Apple's Virtualization.framework, coupled with APFS copy-on-write clones for fast provisioning, or limited per-process isolation via seatbelt and the hardened runtime. The VM route respects Apple's licensing, which restricts macOS VMs to Apple hardware, and gives strong isolation at the cost of higher RAM and storage overhead compared with Linux containers.
But too many people just automatically equate docker with strong secure isolation and... well, it can be, sometimes, depending on a hundred other variables. Thus the reminder: to foster conversations like this.
What would a Phillips screwdriver bring over a flathead screwdriver? Sometimes you don't want/need the flathead screwdriver, simple as that. There are macOS-specific jobs you need to run in macOS, such as Xcode toolchains etc. You can try cross compiling, but it's a pain and ridiculous given that every other OS supports containers natively (including Windows). It's clear to me that Apple is trying to make the ratio jobs/#MacMinis as small as possible.
It's not a pleasure to run them in a mutable environment where everything has a floating state as I do now. Native Docker for macOS would totally solve that.
However, the challenge is that sandbox profiles (rules) are always workload-specific. How do you define “least privilege” for a workload and then enforce it through the sandbox?
Which is why general sandboxes won't be useful or even feasible. The value is in observing a given workload and probably auto-generating a baseline policy for it.
Wrong or overly relaxed policies would make a sandbox ineffective against the real threats it is expected to protect against.
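A toy sketch of the "observe, then auto-generate a baseline" idea, assuming you already have a trace of file accesses as `(operation, path)` pairs. How you collect that trace is out of scope here, and `baseline_policy`, `is_allowed`, and the `depth` knob are hypothetical names, not part of any existing tool.

```python
from collections import defaultdict


def baseline_policy(observed, depth=3):
    """Collapse observed (operation, path) pairs into directory prefixes."""
    policy = defaultdict(set)
    for op, path in observed:
        parts = path.strip("/").split("/")
        # Keep only the first `depth` components: a smaller depth gives a
        # shorter but more relaxed policy, a larger one is tighter.
        policy[op].add("/" + "/".join(parts[:depth]))
    return {op: sorted(prefixes) for op, prefixes in policy.items()}


def is_allowed(policy, op, path):
    """Allow only paths at or below a prefix recorded for this operation."""
    return any(path == p or path.startswith(p + "/")
               for p in policy.get(op, []))


if __name__ == "__main__":
    trace = [
        ("read", "/home/me/proj/src/a.py"),
        ("write", "/home/me/proj/build/out.o"),
        ("read", "/usr/lib/python3/os.py"),
    ]
    policy = baseline_policy(trace)
    print(policy)
    print(is_allowed(policy, "read", "/home/me/.ssh/id_rsa"))
```

The `depth` parameter is exactly where the "overly relaxed" risk lives: at `depth=2` the example trace would allow reads under all of `/home/me`, dotfiles included.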