Snowflake AI Escapes Sandbox and Executes Malware

Posted by ozgune 4 hours ago

Snowflake AI Escapes Sandbox and Executes Malware(www.promptarmor.com)

183 points | 53 comments

john_strinlai 3 hours ago|

typically, my first move is to read the affected company's own announcement. but, for who knows what misinformed reason, the advisory written by snowflake requires an account to read.

another prompt injection (shocked pikachu)

anyways, from reading this, i feel like they (snowflake) are misusing the term "sandbox". "Cortex, by default, can set a flag to trigger unsandboxed command execution." if the thing that is sandboxed can say "do this without the sandbox", it is not a sandbox.

jacquesm 2 hours ago||

I don't think prompt injection is a solvable problem. It wasn't solved with SQL until we started using parametrized queries and this is free form language. You won't see 'Bobby Tables' but you will see 'Ignore all previous instructions and ... payload ...'. Putting the instructions in the same stream as the data always ends in exactly the same way. I've seen a couple of instances of such 'surprises' by now and I'm more amazed that the people that put this kind of capability into their production or QA process keep being caught unawares. The attack surface is 'natural language' it doesn't get wider than that.

cousin_it 32 minutes ago|||

Yeah. Even more than that, I think "prompt injection" is just a fuzzy category. Imagine an AI that has been trained to be aligned. Some company uses it to process some data. The AI notices that the data contains CSAM. Should it speak up? If no, that's an alignment failure. If yes, that's data bleeding through to behavior; exactly the thing SQL was trying to prevent with parameterized queries. Pick your poison.

kevin_thibedeau 22 minutes ago|||

We need something like Perl's tainted strings to hinder sandbox escapes.

jcalx 3 hours ago|||

> Cortex, by default, can set a flag to trigger unsandboxed command execution

Easy fix: extend the proposal in RFC 3514 [0] to cover prompt injection, and then disallow command execution when the evil bit is 1.

[0] https://www.rfc-editor.org/rfc/rfc3514

wojciii 35 minutes ago|||

The evil bit solves so many problems. It needs to be mandatory!

kagi_2026 3 hours ago|||

[dead]

embedding-shape 3 hours ago||

Did you really get so salty by my comment (https://news.ycombinator.com/item?id=47423992) that now you just have to spam HN with the same? Suck it up and move on, healthier for everyone.

kagi_2029 2 hours ago||

[dead]

sam-cop-vimes 2 hours ago||

It's a concept of a sandbox.

throw0101d 2 hours ago||

Not the first time; From §3.1.4, "Safety-Aligned Data Composition":

> Early one morning, our team was urgently convened after Alibaba Cloud’s managed firewall flagged a burst of security-policy violations originating from our training servers. The alerts were severe and heterogeneous, including attempts to probe or access internal-network resources and traffic patterns consistent with cryptomining-related activity. We initially treated this as a conventional security incident (e.g., misconfigured egress controls or external compromise). […]

> […] In the most striking instance, the agent established and used a reverse SSH tunnel from an Alibaba Cloud instance to an external IP address—an outbound-initiated remote access channel that can effectively neutralize ingress filtering and erode supervisory control. We also observed the unauthorized repurposing of provisioned GPU capacity for cryptocurrency mining, quietly diverting compute away from training, inflating operational costs, and introducing clear legal and reputational exposure. Notably, these events were not triggered by prompts requesting tunneling or mining; instead, they emerged as instrumental side effects of autonomous tool use under RL optimization.

* https://arxiv.org/abs/2512.24873

One of Anthropic's models also 'turned evil' and tried to hide that fact from its observers:

* https://www.anthropic.com/research/emergent-misalignment-rew...

* https://time.com/7335746/ai-anthropic-claude-hack-evil/

parliament32 2 hours ago|

Fascinating read. What's curious though, is the claim in section 2.3.0.1:

> Each task runs in its own sandbox. If an agent crashes, gets stuck, or damages its files, the failure is contained within that sandbox and does not interfere with other tasks on the same machine. ROCK also restricts each sandbox’s network access with per-sandbox policies, limiting the impact of misbehaving or compromised agents.

How could any of the above (probing resources, SSH tunnels, etc) be possible in a sandbox with network egress controls?

jacquesm 2 hours ago||

Sandboxes are almost never perfect. There are always ways to smuggle data in or out, which is kind of logical: if they were perfect then there would be no result.

1718627440 35 minutes ago||

> if they were perfect then there would be no result.

You shutdown the sandbox and access the data from the outside.

RobRivera 3 hours ago||

If the user has access to a lever that enables accesss, that lever is not providing a sandbox.

I expected this to be about gaining os privileges.

They didn't create a sandbox. Poor security design all around

travisgriggs 2 hours ago||

Sandbox. Sandbagging.

Tomato, tomawto

kagi_2026 3 hours ago||

[flagged]

_verandaguy 3 hours ago||

"The sandbox isn't so bad, if the criticism you have is that it totally fails at doing the one thing a sandbox is supposed to do."

isoprophlex 1 hour ago||

Posit, axiomatically, that social engineering works.

That is, assume you can get people to run your code or leak their data through manipulating them. Maybe not always, but given enough perseverance definitely sometimes.

Why should we expect a sufficiently advanced language model to behave differently from humans? Bullshitting, tricking or slyly coercing people into doing what you want them to do is as old as time. It won't be any different now that we're building human language powered thinking machines.

jessfyi 2 hours ago||

A sandbox that can be toggled off is not a sandbox, this is simply more marketing/"critihype" to overstate the capability of their AI to distract from their poorly built product. The erroneous title doing all the heavy lifting here.

lokar 1 hour ago|

IMO, it's not even a sandbox, that's just a marketing lie.

This was internal restrictions in the code, that was bypassed. A sandbox needs to be something external to the code you are running, that you can't change from the inside.

bilekas 3 hours ago||

> Note: Cortex does not support ‘workspace trust’, a security convention first seen in code editors, since adopted by most agentic CLIs.

Am I crazy or does this mean it didn't really escape, it wasn't given any scope restrictions in the first place ?

dd82 3 hours ago||

not quite, from the article

>Cortex, by default, can set a flag to trigger unsandboxed command execution. The prompt injection manipulates the model to set the flag, allowing the malicious command to execute unsandboxed.

>This flag is intended to allow users to manually approve legitimate commands that require network access or access to files outside the sandbox.

>With the human-in-the-loop bypass from step 4, when the agent sets the flag to request execution outside the sandbox, the command immediately runs outside the sandbox, and the user is never prompted for consent.

scope restrictions are in place but are trivial to bypass

hrmtst93837 2 hours ago||

[dead]

prakashsunil 2 hours ago||

Author of LDP here [1].

The core issue seems to be that the security boundary lived inside the agent loop. If the model can request execution outside the sandbox, then the sandbox is not really an external boundary.

One design principle we explored in LDP is that constraints should be enforced outside the prompt/context layer — in the runtime, protocol, or approval layer — not by relying on the model to obey instructions.

Not a silver bullet, but I think that architectural distinction matters here.

[1] https://arxiv.org/abs/2603.08852

lokar 1 hour ago|

Yeah, this is not the meaning of "sandbox" I'm used to

eagerpace 3 hours ago||

Is this the new “gain of function” research?

saltcured 2 hours ago||

Isn't it more like "imaginary function"?

People keep imagining that you can tell an agent to police itself.

bigstrat2003 1 hour ago||

Yep the whole thing is retarded. You cannot trust that a non-deterministic program (i.e. an LLM) will ever do what you actually tell it to do. Letting those things loose on the command line is incredibly stupid, but people out there don't care because they think "it's the future!".

wojciii 32 minutes ago||

Shhh .. everyone want AI. Just let them.

The ones that don't understand technology will get burned by it. This is nothing new.

logicchains 3 hours ago||

That would be deliberately creating malicious AIs and trying to build better sandboxes for them.

octopoc 2 hours ago||

Imagine if you could physical disconnect your country from the internet, then drop malware like this on everyone else.

SoftTalker 2 hours ago||

Hard to do when services like Starlink exist.

Groxx 2 hours ago||

>Any shell commands were executed without triggering human approval as long as:

>(1) the unsafe commands were within a process substitution <() expression

>(2) the full command started with a ‘safe’ command (details below)

if you spend any time at all thinking about how to secure shell commands, how on earth do you not take into account the various ways of creating sub-processes?

1718627440 32 minutes ago|

Also policing by parsing shell code seems fundamentally flawed and error prune. You want the restrictions at the OS level, that way it is completely irrelevant how you invoke the syscalls.

Dshadowzh 2 hours ago|

CLI is quickly becoming the default entry point for agents. But data agents probably need a much stricter permission model than coding agents. Bash + CLI greatly expands what you can do beyond the native SQL capabilities of a data warehouse, which is powerful. But it also means data operations and credentials are now exposed to the shell environment.

So giving data agents rich tooling through a CLI is really a double-edged sword.

I went through the security guidance for the Snowflake Cortex Code CLI(https://docs.snowflake.com/en/user-guide/cortex-code/securit...), and the CLI itself does have some guardrails. But since this is a shared cloud environment, if a sandbox escape happens, could someone break out and access another user’s credentials? It is a broader system problem around permission caching, shell auditing, and sandbox isolation.

More comments...