The agent harness belongs outside the sandbox

Posted by shad42 14 hours ago

The agent harness belongs outside the sandbox(www.mendral.com)

113 points | 83 comments

zmmmmm 12 hours ago|

I think it omits the real reason I want to run the harness in the sandbox: I barely trust the harness more than the LLM, at least at this point in time. They are so rapidly evolving along with the underlying models, that I don't think they are a reasonable component to rely on to provide safety constraints. Put more precisely: if your harness has an ability to do something the LLM can't, and it has a set of conditions under which the LLM can cause those to be invoked, you have to assume the LLM will work out those conditions and execute them. Effectively you have an arm of the lethal trifecta and pretending otherwise is more dangerous than helpful.

Having said that, some components need to live outside the sandbox (otherwise, who creates the sandbox?). Longer term, I see it as a dedicated security layer, not part of the harness. This probably has yet to emerge fully but it's more like a hypervisor type layer that sits outside of everything and authorises access based on context, human user, etc and can apply policy including mediate the human intervention for decision points when needed.

angry_octet 10 hours ago||

I don't trust the harness, and I especially don't trust that the LLM won't be able to subvert the harness, or trick me via the harness. I assume that the LLM will be able to leak any secret in the harness context to arbitrary internet destinations, or somehow encode the secret in a work product. Eg space characters at the end of lines encoding access tokens.

Having the harness in one VM, and tool use applied to user data in another, is about as safe as you can be at present. You can mount filesystem fragments from the data VM into the harness VM, but tool execution remains painful.

Having all authorisation and access control exist outside of the harness layer is essential. It should only have narrowly scoped and time limited credentials that are bound to its IP, and even then that is problematic.

bauerd 1 hour ago|||

>Having said that, some components need to live outside the sandbox (otherwise, who creates the sandbox?).

I run a single-node k3d cluster on each of my MacBooks which uses Agent Sandbox[0] to keep harnesses isolated. Harnesses access models through LiteLLM only. I have aliases for `kubectl exec`ing into whatever harness I need.

[0] https://agent-sandbox.sigs.k8s.io

TeMPOraL 3 hours ago|||

> Effectively you have an arm of the lethal trifecta and pretending otherwise is more dangerous than helpful.

"Lethal trifecta" is basically describing phishing but in a way more palatable to people who would rather die before allowing themselves to anthropomorphize LLMs even a little bit. It's not a problem you can fix with better coding, like some SQL injection. You can only manage risk around it (for which sandboxing is one of many solutions that can help).

So on one hand, I agree with you - you need to be mindful of what you're actually dealing with. On the other hand, you always have this, and need this, for the agent to be able to do anything useful.

aluzzardi 10 hours ago|||

Author here.

I should have made it more clear that the article is about agent / harness building (not about running third party agents).

> I barely trust the harness more than the LLM

Since we built it, I trust it just as much as I trust our API server :)

The latter gets untrusted inputs from the internet, while the former gets untrusted inputs from the LLM

gmerc 7 hours ago|||

The LLM has harness control in claude ;) “Let me switch off the sandbox and try again”

tantalor 7 hours ago||

> if your harness has an ability to do something the LLM can't

What does this even mean. The only capability of an LLM is generate text.

jbstack 3 hours ago|||

The LLM can only generate text. The harness can do more than just generate text. By joining the two you're allowing the LLM (through text) to carry out whatever actions the harness can take.

My brain can only generate electrical signals. My hand responds to electrical signals and can interact with the real world. The two together can do more than just what my brain alone can do.

If you don't trust a particular brain, don't put a gun in the hand which is connected to it. If you don't trust a LLM, don't connect it to a harness which has access to your production database and only recent backups (https://www.theregister.com/2026/04/27/cursoropus_agent_snuf...).

girvo 2 hours ago|||

We’ve trained models on JSON schemas for “tool calls”, and then built software to interpret and run those calls for the LLMs

jdw64 12 hours ago||

Personally, I find it fascinating to watch how, whenever a new technology appears, people start competing to define and own its standards.

Manus rebuilt its harness five times in six months. The model stayed the same, but the architecture changed five times.

LangChain re-architected Deep Research four times in one year.

Anthropic also ripped out Claude Code’s agent harness whenever the model improved.

Ever since Mitchell Hashimoto mentioned the harness in February, people have been trying to claim that concept. Eventually, someone will probably sell a book called Harness Engineering. I will buy it, of course. Then I will write a blog post about it that nobody reads, with a link that will be buried under ShowDead as soon as I submit it to HN.

And by that point, IT companies will start asking:

“You’re a new grad, right? You know harness engineering, don’t you?”

aluzzardi 11 hours ago||

Author here.

In my opinion, the main driver here is how fast models have evolved in the past 12 months. It makes the architecture of everything around them obsolete, very fast.

We went from using models as a building block, wrapping them in heavy workflow code, to now models being smart enough to drive their own workflows and planning.

jdw64 11 hours ago||

Really enjoyed your post, by the way. The idea of putting skills and memories in a database while keeping the file shaped interface for the agent is clean. One read/write surface, two backends, invisible to the modle that's a nice piece of design, and the candor in the "what's still hard" section made me trust the rest of the post. My comment above was meant as a joke, not about your architecture. If this pattern becomes the standard, I'll happily migrate my workflow again.

One thing I wonder about is whether path routing alone is enough.

If `/workspace` goes to the sandbox and `/memory` or `/skills` goes to the database, the path tells you where to send the request. But it does not tell you whether this user, session, or agent is allowed to access it.

When I built something similar with an MCP filesystem, I found that I needed a scope check before actually running the operation. In my case, I was using GPT dev mode through a Cloudflare tunnel to control my local environment/model, so this kind of boundary became important.

So I like the path-routing idea, but I wonder if it eventually needs a scope or permission layer as well.

aluzzardi 4 hours ago||

Thank you, appreciate it!

Regarding scoping: In our case, the agent loop runs in the same way as our API server does (as in, it’s a multi tenant service running in a container somewhere). And we solve scoping in the same way.

To put it in other words, whether it’s the API receiving “GET /memories/id” or the LLM requesting “Read(/memories/id)” we do pretty much the same thing (check authN/authZ, scope the db request, etc).

Basically the LLM is just another API client using a slightly different format for inputs and outputs, but sharing the same permission layer.

tokioyoyo 11 hours ago|||

Just wait 6 months for something new to come up and everyone will forget about harnesses.

TeMPOraL 12 hours ago||

> Ever since Mitchell Hashimoto mentioned the harness in February

What. The idea is as old as anyone can remember, and wrt. LLMs, it was known to be important since at least as early as ChatGPT being first released.

jdw64 11 hours ago||

Yes, the concept itself is not new. Around 2022, people would usually have called it the orchestration layer.

But I think the term started being used closer to its current meaning around this point:

https://www.softwareimprovementgroup.com/blog/what-is-harnes...

In a way, the sequence was something like:

prompt engineering(23~4) -> context engineering(25) ->harness engineering(26)

At first, it was mostly understood as a correction or extension of prompt engineering. But the idea of “harness” as the layer that corrects, constrains, and operationalizes agents seems to have emerged much more clearly around 2026.

So yes, there is definitely some terminological confusion in the early phase. That is normal. New technical fields often begin with several competing names for almost the same layer, and only later does one term become stable.

redanddead 9 hours ago|||

My 2c:

The word harness brings the truth of LLMs back down to Earth.

it really felt like between 2018 and 2022ish like LLMs had this magical aura, like the orchestration layer was intelligent, maybe even recursive, beyond what simple functions could do. It was assumed that this was a solved problem. The word "orchestration" denoted it, the words we used were full of optimism. When you lift the veil, it really is just regex, and cool tricks sure, but it's a harness it's a utility, there's no magic here, there's realism.

Maybe the labs even had a part to play in this as well; attempting to make themselves look magical. I mean just look at the choice of name for "Mythos", it's about bringing back that feeling of myth and magic after we saw under the veil.

The reality is that the labs have produced magical models yes, but are locking them into ecosystems that leave a lot to be desired, and are easily reproducible, and essentially are cron jobs, regex.. things we've seen in traditional cloud for decades. It feels like an attempt to create a moat where there is none.

Maybe I'm wrong but this has been my impression

TeMPOraL 3 hours ago||

There were no LLMs between 2018 and 2022, at least not in the sense resembling today. The whole LLM frenzy started in late 2022.

magicalist 10 hours ago|||

Harness itself was a widely used term by at least the "[LLM] plays pokemon" trend, which was a year ago[1]. That was basically the term of art to use when arguing about just how much special treatment LLMs should get.

"harness engineering" is the term claimed by that article to have originated in February. It does seem obvious in retrospect and I don't remember an origination point, but there's at least one hn comment predating that in December[2] and it doesn't treat it as novel.

I will admit that my bias is against any self congratulatory buzzword fads (I'm still not over "MCP is the USB of LLMs" or whatever and that's been a year now too). "Who coined the term harness engineering?" -> who cares? It was already widely being done.

[1] https://www.lesswrong.com/posts/7mqp8uRnnPdbBzJZE/is-gemini-...

[2] https://news.ycombinator.com/item?id=46331242

jdw64 10 hours ago||

I read your comment. I think we may be talking about slightly different contexts.

The Pokémon article you linked is basically about benchmarking. In that context, the harness functions as part of the benchmark setup: the controlled environment around the model, the available inputs, tools, and assistance.

The current usage of “harness,” at least in the agent engineering discussion, seems closer to a lower-level runtime layer, almost like an OS around the agent.

So I see this as a transition: from “harness” as a narrower benchmark/control-variable layer to “harness” as the broader operating environment of the agent.

That does not mean I think your point is wrong. With topics like this, the interpretation depends on which part of the lineage one emphasizes. The first appearance of the idea may go back to 2022 or earlier, while the usage that looks closer to the current meaning may have emerged at a different point.

I am probably giving more weight to the SIG article, while you are giving more weight to a different point in the lineage. Both seem reasonable to me.

tptacek 11 hours ago||

There are other models. Eschew the sandbox. Give the agent a computer, with all the trimmings, but keep that computer segregated from sensitive resources. Tokens are a solved problem: tokenize them[1] or do something equivalent with a proxy. The same thing goes for secrets.

A lot of this post presents false dichotomies. It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why? There are reasons to do that and reasons not to do that. You can have a durable computer with a network identity and full connectivity, and you can have that computer spin down and stop billing when not in use.

There are a zillion different shapes for addressing these problems, and I'm twitchy because I think people are super path-dependent right now, and it's causing them to miss a lot of valuable options.

[1]: https://fly.io/blog/tokenized-tokens/ (I work at Fly.io but the thing this post talks about is open source).

existencebox 7 hours ago||

I'd argue you are still using a sandbox, just at a higher ring (outside the machine/VM) and relaying on app/resource level permissions on each of your external resources to enforce it, which requires _all_ of those external systems to be hardened vs. the agent host itself. The capabilities a full machine has for exploring and exploiting external, ostensibly secured systems, has already been touched on via incidents like the anthropic internal model jailbreak. [0]

Giving the whole machine also doesn't answer the question for how the agent can hook into actions that eventually require more perms, and even if you "airgap" those via things like output queues that humans need to approve, that still feels "harnessey" to me.

I feel a bit guilty of debating semantics here, especially as I can't/don't intend to convey any confidence in a "right answer", but my reason for being pedantic is that I do think there are interesting tradeoffs between "P(jailbreak or unexpected capability use|time)" and "increasing power/available capability set", as well as interesting primitives emerging in terms of the components you'd need regardless of where you drew that line (ala paragraph 2.)

[0] - https://www-cdn.anthropic.com/3edfc1a7f947aa81841cf88305cb51... (specifically section 5.5.2.4)

tptacek 7 hours ago||

The post is explicit about what they mean by sandboxing and what the tradeoffs are for the model they're discussing.

existencebox 6 hours ago||

I've reread it and I stand by my statements that it's an isomorphism, simply replace "container" with "machine AAD/auth-system boundaries" in your example.

The "Your credentials stay out of the sandbox" problem, to quote them, is what I see your "require your perms system to enforce it" as implicitly solving for.

(Their "sandbox as cattle" discussion had less bearing on the "which pattern" question to me, since I tend to treat most parts of my agent stack as cattle-like, potentially out of a bias towards that architecture broadly, as I find it's much easier to reason about when as much as possible is disposable/idempotent/eventually consistent. The durable execution point also assumed aspects of the agent scaffold ala prompts don't have to be turned over in deploy, or conversely, can't finish their tasks and then migrate incrementally, and while I might cynically raise an eyebrow at the focus on 25ms for sandbox calls given the dev loops I currently experience, I'd argue there are other ways to solve that problem in both an in or outside of container sandbox pattern.)

I'd even agree with their final point "Consistency is the part we haven't answered" but in a different angle than they intended, as to why my focus was on "how do you _constrain_ agent behavior" since that has been, in my experience, the biggest bottleneck to letting agents do more.

ramraj07 1 hour ago|||

> It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why?

Because the moment you use k8s, you have to assume that, apparently. Or so Im told by all the infrastructure people I speak with. Getting these pods to not disappear just because one process ran out of memory has been an herculean task.

I wish our standard deploy processes produce durable computers that dont break our bank but that hasn't been an easy requirement with simple infra teams.

aluzzardi 10 hours ago|||

Author here.

This is an interesting and novel field, so I’m not pretending I know the answers, but this is what worked for us :)

At the end of the day, and oversimplifying things: why would I want to spawn a for loop that calls an API (LLM) into its own dedicated sandbox/computer?

When the model wants to run a command, it’ll tell you so. Doesn’t need to be a local exec, you can run it anywhere, the model won’t know the difference.

The agent loop itself doesn’t need sandboxing. In many cases, most tool calls don’t require sandboxing either. For the tools that do require a computer, you can route those requests there when needed, rather than running the whole software in that sandbox.

To me running the agent loop in the sandbox itself feels like “you should run your API in your DB container because it’ll talk to it at some point”.

afshinmeh 5 hours ago||

I wonder though, what about cases where you have multiple agents or LLM backends and the credentials is shared between all of them?

nrengan 5 hours ago||

[flagged]

nvader 10 hours ago|||

I'm also very excited by the different shapes for solving problems in this space. A little worried that the path dependence is ACTUALLY a bit warranted since "popular harness engineering is just claude-wrapping" is a bit of a self-fulfilling prophecy today.

I've heard many claims that because LLMs are tuned to specific harnesses, we should expect worse performance with novel architectures. That seems to make people reluctant to try to put effort into inventing them.

aluzzardi 10 hours ago||

Author here.

I’m worried about the same (models tuned for specific harnesses).

We actually work around that by respecting the “contract”. For instance, our harness’ Bash signature is exactly the same as Claude’s. We do our sandboxing stuff and respond using the same format.

In the “eyes” of the model there’s no difference between what Claude does and what we do (even though the implementation is completely different).

We basically use Claude’s tools as API contract

isoprophlex 4 hours ago|||

Wow, thanks for writing this up.

I'm building an agent sandboxing system for a client atm, and was about to start working on a system of ephemeral, short lived, derived secrets for the agent to use.

Lots of great thoughts to steal in this piece. Thanks again.

stavros 9 hours ago||

I agree with the argument that there are many more than two ways to do this. When I built my AI assistant (https://stavrobot.stavros.io/), for example, I implemented an architecture that has both the ways detailed in the post. The harness runs simultaneously both inside and outside the container (I didn't want the harness to touch the system, and I didn't want LLM-generated code to touch the harness).

It's all tradeoffs, and picking the ones that work for what you want to do is what architecture is. The more informed you are about the tradeoffs, the better you can make your architecture.

zbyforgotp 11 minutes ago||

You need supporting environment on both sides of the sandbox.

MrDarcy 11 hours ago||

> A lot of what an agent does doesn't need a sandbox at all: thinking, calling APIs, summarizing, waiting for CI.

I don’t get it. Calling an API requires a sandbox in most cases. The others could be abused in service of an un-sandboxed agent with API access.

If the harness is outside the sandbox then it’s just an ambiguous and confusing security model and boundary.

nvader 10 hours ago||

> Calling an API requires a sandbox in most cases.

I'm not following why this would this be the case? The purpose of calling the API is to get data or effect a state transition on some remote service, but I don't follow why the originating machine matters.

Or is your objection about auth?

MrDarcy 7 hours ago||

The purpose of a sandbox is to control the interface between inside and outside of the sandbox. If you put the harness on the outside and connect it to a model and to an API then there’s no point in the sandbox. You don’t have any control over the interface.

aluzzardi 11 hours ago|||

Author here.

I think the confusion is that “agent” is used for two very different things:

- building an agent

- an “agent” product/runtime (Claude Code, etc)

In the first case, the model never executes anything. It just outputs something like “call this API”. Your code is the one doing it, with whatever validation you want. There’s no need for a sandbox there because there’s no arbitrary execution.

MrDarcy 7 hours ago||

I can see that. It also seems like the first quickly evolves into the second.

shad42 11 hours ago||

No, for example a tool call calling an API. So the llm does not have access to the API keys, the tool does. For example an API call that fetches some data remotely and return it to the llm. You don’t need a sandbox for it. It’s faster and more efficient to keep this out of the sandbox.

skybrian 12 hours ago||

They didn't make a clear argument in favor of that architecture and I'm not really convinced.

On exe.dev the agent (Shelley) runs in a Linux VM, which is the security boundary. All the conversations are saved to a sqlite database, and it knows how to read it, so you can refer to a previous conversation in the database. It's also handy for asking the AI to do random sysadmin stuff, since it can use sudo.

A downside is that there's nowhere in the VM where secrets are safe from possibly getting exfiltrated via an injection attack. But they have "integrations" where you can put secrets into an http proxy server instead of having them locally.

Also, you don't need to use AI at all. You can use the VM as a VM.

ramraj07 1 hour ago|

No matter how smart you think you get, I personally dont trust the models in an environment where they can read the secrets one way or another, in any high volume production environment.

saltcured 13 hours ago||

Sure, the experimental, agentically-developed code should be tested in a sandbox. This sandbox should contain the damage of the code execution when it goes wrong.

But shouldn't there really be another sandbox where the agentic tool calls execute? This is to contain the damage of the tool execution when it goes wrong.

And, the agent harness itself should either implement or be contained in a third sandbox, which should contain the damage of the agent. There should be a firewall layer to limit what tool requests the agent can even make. This is to contain the damage of the agent when it formulates inappropriate requests.

The agent also should not possess credentials, so it cannot leak them to the LLM and allow them to be transformed into other content that might leak out via covert channels.

shad42 13 hours ago||

Yes, it's also because the agent described in the post is doing some operations on the user code (fix CI pipelines, rerun tests, fix them, etc...). So another big reason to use the sandbox is to run things like bash on a user code. you don't want credentials or anything trusted inside that sandbox, including the LLM api key.

aluzzardi 13 hours ago||

Author here. Depending on how it’s designed, the harness itself doesn’t need any sandboxing.

At the end of the day, it’s a “simple” loop that calls an external API (LLM) and receives requests to execute stuff on its behalf.

It’s not the agent running bash commands: you (the harness author) are, and you’re in full control of where and how those commands get executed.

In the article’s case, bash commands are forwarded to a sandbox, nothing ever runs on the harness itself (it physically can’t, local execution is not even implemented in the harness).

Weryj 2 hours ago||

Agreed, this is exactly what we do.

There's no harm in a string, only in the execution.

I create Tools as Actors, which you preconfigured for the LLM context (in-house agent loop). The tools being preconfigured means you setup their environment before they can be executed. If it calls a bash tool for instance, the Tool Actor gets called and then it runs that command against an attached remote VM.

Or filesystem operations, are just read/writes inside a .zip file, which is overlayed onto the target project at build time.

This article is spot on, and I probably say that because it's self reinforcing.

trjordan 13 hours ago||

Nah. Worse is better.

The reason agents work is because they have access to stuff by default. The whole world is context engineering at this point, and this proposal is to intermediate the context with a bespoke access layer. I put the bare minimum into getting my dev instance into a state where I can develop, because doing stuff (and these days: getting my agent to do stuff) is the goal.

This makes slightly more sense if you're building a SaaS and trying to get others to give you access to their code, their documents, and the rest so you can run agents against it. But the easiest, most powerful way is to just hook the agents up to the place that's already set up.

ossa-ma 12 hours ago|

They are building exactly what you described and this is their architectural solution to ensuring their YOLO agents do not nuke their customers code/documents/databases by sandboxing everything in the workspace — the git checkout the agent is working on, plus whatever's needed to run commands against it (compilers, package managers, etc.).

spankalee 12 hours ago|

This is angling in the right direction, but I think it has two problems:

1) It's still assuming agents have CLIs. This is a very developer-centric concept of agents, and doesn't map well to either consumer or enterprise agents that aren't primarily working with files. Skills, plans, TODO lists, and memory are good, but don't have to be modeled as raw file access. Many harnesses have tools for them.

2) It's talking about a singular sandbox. That's not good enough for prompt injection prevention, secure credential management, and limiting the blast radius of attacks.

ramraj07 1 hour ago|

For 1, the general thinking is that companies like these perform the job of abstracting the CLI complexity in their application while the harness presented to the llm can be independently as suave as needed for it.

More comments...