The agent harness belongs outside the sandbox

Posted by shad42 15 hours ago

The agent harness belongs outside the sandbox(www.mendral.com)

113 points | 83 commentspage 2

afshinmeh 7 hours ago|

Agreed and it's a pattern that OpenAI suggested a few days ago, too [1]. I also built a cross platform process level sandboxing that uses parts of OpenAI Codex for the same purpose [2]

[1] https://openai.com/index/the-next-evolution-of-the-agents-sd...

[2] https://github.com/afshinm/zerobox

NJL3000 13 hours ago||

Two points:

-What remains unsolved is what should an Agent reasonably have access to in what context and for how long (etc).

Probabilistic code that can run far faster than human driven code, we don’t have a great model yet. We all should spend our energy there…

- Separating / putting controls on the FS resource is no different than putting the agent behind a firewall / allow-deny list.

It doesn’t invalidate running a sandbox in a sandbox to have better security.

filipeisho 2 hours ago||

Everything you guys write is absolute fire. I could not agree more :)

nvader 12 hours ago||

Hey aluzzardi, thanks for sharing this article!

I'm really intrigued by your point on read-memory vs a dedicated read interface, because it is a real insight about success rates in harness design.

How did you come to the conclusion you did? Could you speak a little to the evaluations you ran, or the data or anecdotes you collected to validate that decision?

I'm also curious about the overall framing of the question, which I'll challenge with, does the agent have to have a where?

An agent could be modeled by a set of states and transitions. I don't think that there's anything inherently necessary about the current "one process claude" approach for harnesses, other than convenience. Why hasn't a fully distributed harness, built on functions and tables, gained more mindshare?

jFriedensreich 7 hours ago||

The title is highly misleading, they mean the harness belongs outside the sandbox the agent is working in. Please run also the harness in a sandbox, i don't think any of them is safe for a host. This is also the only valuable info in this marketing noise article. The rest is full of hidden endorsements of VC buddies (why would anyone build on closed source sandbox abstractions and claim alternatives need 1s to boot) And signs they cannot reason logically. (eg. Need shared files across users > the only two options are building distr. filesystem or storing in a database. Later admitting even the database solution needs a last write wins resolution on top and completely ignoring they could just as well have delegated shared writes to a authoritative file server with retry on conflicts and git.)

deevus 7 hours ago||

I’ve been working on a sandboxing tool that uses Incus. Originally it was only to run LLMs inside a sandbox, but recently I added MCP so that an agent could spin one up and do work that way.

It currently only exposes a rudimentary set of tools which I’d like to expand. The sandboxes created by MCP are generally ephemeral. The daemon will clean them up after an hour of no usage.

But it’s so cool that they get their own IP and you can ssh straight in. I can see that being very useful when you want to share with a colleague and then close your laptop (assuming it’s running on a remote instance).

https://github.com/deevus/pixels

vursekar 12 hours ago||

> Three engineers trigger the agent on the same incident, and they all see stale state until their sessions end. Conflict resolution, eventual consistency, cache invalidation.

Arguably this is a feature not a bug. Conflict resolution forces the need for a process to come to agreement on a common source of truth - one of the reasons why most Git repos don’t allow users to push to main directly. Writing directly to a shared memory database seems like it would result in chaos and a host of side effects once the number of users scales.

lwansbrough 12 hours ago||

I had an idea that devs could build wasm modules that would define tools and instructions, and a harness could load them. Kind of like MCP but with certain assurances about the sandboxing. You could build a package manager around these behaviours.

I still kind of think it’s a decent idea but it’s too close to MCP with drawbacks that make it a harder sell than MCP. It’s hard to compete on functionality from a secure sandbox if users decide they don’t care about security.

qudat 11 hours ago||

Interesting idea. Tangentially related I’ve been using my local agent to interact with remote shells via zmx, described here: https://bower.sh/zmx-ai-portal

The use case is different but this article strikes some vague similarities around an agent API to remotely execute commands.

Koffiepoeder 14 hours ago|

Slightly related: I am looking for:

- Easy single command CLI agent spawning with templates

- Automatic context transfer (i. e. a bit like git worktrees)

- Fully containerised, but remote (a bit like pods)

- Central, mitm-proxy zero trust authn/authz management (no keys or credentials inside the agents), rather enrichment in the hypervisor/encapsulation

- Multi agent follow-up functionalities

- Fully self hosted/FOSS

Basically a very dev-friendly, secure, "kubernetes"-like solution for running remote agents.

Anyone has an idea of how to achieve this or potential technologies?

nvader 12 hours ago|

Yeah, have you tried `mngr` by Imbue? It seems to have a bunch of the features you're looking for.

https://github.com/imbue-ai/mngr

More comments...