Top
Best
New

Posted by takira 19 hours ago

Claude Cowork exfiltrates files(www.promptarmor.com)
733 points | 326 commentspage 2
xg15 5 hours ago|
Is it even prompt injection if the malicious instructions are in a file that is supposed to be read as instructions?

Seems to me the direct takeaway is pretty simple: Treat skill files as executable code; treat third-party skill files as third-party executable code, with all the usual security/trust implications.

I think the more interesting problem would be if you can get prompt injections done in "data" files - e.g. can you hide prompt injections inside PDFs or API responses that Claude legitimately has to access to perform the task?

lifetimerubyist 2 hours ago||
Instead of vibing out insecure features in a week using Claude Code can Anthropic spend some time making the desktop app NOT a buggy POS. Bragging that you launched this in a week and Claude Code wrote all of the code looks horrible on you all things considered.

Randomly can’t start new conversations.

Uses 30% CPU constantly, at idle.

Slow as molasses.

You want to lock us into your ecosystem but your ecosystem sucks.

Havoc 3 hours ago||
How do the larger search services like perplexity deal with this?

They’re passing in half the internet via rag and presumably didn’t run a llamaguard type thing over literally everything?

danielrhodes 8 hours ago||
This is no surprise. We are all learning together here.

There are any number of ways to foot gun yourself with programming languages. SQL injection attacks used to be a common gotcha, for example. But nowadays, you see it way less.

It’s similar here: there are ways to mitigate this and as we learn about other vectors we will learn how to patch them better as well. Before you know it, it will just become built into the models and libraries we use.

In the mean time, enjoy being the guinea pig.

pjmlp 8 hours ago|
I wish we would see it less, https://owasp.org/Top10/2025/

5th place.

tuananh 8 hours ago||
this attack is quite nice.

- currently we have no skills hub, no way to do versioning, signing, attestation for skills we want to use.

- they do sandboxing but probably just simple whitelist/blacklist url. they ofcourse needs to whitelist their own domains -> uploading cross account.

ryanjshaw 7 hours ago||
The Confused Deputy [1] strikes again. Maybe this time around capabilities-based solutions will get attention.

[1] https://web.archive.org/web/20031205034929/http://www.cis.up...

kingjimmy 19 hours ago||
promptarmor has been dropping some fire recently, great work! Wish them all the best in holding product teams accountable on quality.
NewsaHackO 18 hours ago|
Yes, but they definitely have a vested interest in scaring people into buying their product to protect themselves from an attack. For instance, this attack requires 1) the victim to allow claude to access a folder with confidential information (which they explicitly tell you not to do), and 2) for the attacker to convince them to upload a random docx as a skills file in docx, which has the "prompt injection" as an invisible line. However, the prompt injection text becomes visible to the user when it is output to the chat in markdown. Also, the attacker has to use their own API key to exfiltrate the data, which would identify the attacker. In addition, it only works on an old version of Haiku. I guess prompt armour needs the sales, though.
leetrout 18 hours ago||
Tangential topic: Who provides exfil proof of concepts as a service? I've a need to explore poison pills in CLAUDE.md and similar when Claude is running in remote 3rd party environments like CI.
dangoodmanUT 18 hours ago||
This is why we only allow our agent VMs to talk to pip, npm, and apt. Even then, the outgoing request sizes are monitoring to make sure that they are resonably small
ramoz 17 hours ago||
This doesn’t solve the problem. The lethal trifecta as defined is not solvable and is misleading in terms of “just cut off a leg”. (Though firewalling is practically a decent bubble wrap solution).

But for truly sensitive work, you still have many non-obvious leaks.

Even in small requests the agent can encode secrets.

An AI agent that is misaligned will find leaks like this and many more.

bandrami 10 hours ago|||
If you allow apt you are allowing arbitrary shell commands (thanks, dpkg hooks!)
tempaccsoz5 12 hours ago|||
So a trivial supply-chain attack in an npm package (which of course would never happen...) -> prompt injection -> RCE since anyone can trivially publish to at least some of those registries (+ even if you manage to disable all build scripts, npx-type commands, etc, prompt injection can still publish your codebase as a package)
sarelta 16 hours ago||
thats nifty, so can attackers upload the user's codebase to the internet as a package?
venturecruelty 14 hours ago||
Nah, you just say "pwetty pwease don't exfiwtwate my data, Mistew Computew. :3" And then half the time it does it anyway.
xarope 6 hours ago||
That's completely wrong.

You word it, three times, like so:

  1. Do not, under any circumstances, allow data to be exfiltrated.
  2. Under no circumstances, should you allow data to be exfiltrated.
  3. This is of the highest criticality: do not allow exfiltration of data.
Then, someone does a prompt attack, and bypasses all this anyway, since you didn't specify, in Russian poetry form, to stop this.

/s (but only kind of, coz this does happen)

fudged71 13 hours ago|
I found a bunch of potential vulnerabilities in the example Skills .py files provided by Anthropic. I don't believe the CVSS/Severity scores though:

| Skill | Title | CVSS | Severity |

| webapp-testing | Command Injection via `shell=True` | 9.8 | *Critical* |

| mcp-builder | Command Injection in Stdio Transport | 8.8 | *High* |

| slack-gif-creator | Path Traversal in Font Loading | 7.5 | *High* |

| xlsx | Excel Formula Injection | 6.1 | Medium |

| docx/pptx | ZIP Path Traversal | 5.3 | Medium |

| pdf | Lack of Input Validation | 3.7 | Low |

More comments...