Claude Code Is Steganographically Marking Requests

Posted by kirushik 3 hours ago

Claude Code Is Steganographically Marking Requests (thereallo.dev)

631 points | 194 comments | page 2

sigmoid10 3 hours ago|

If they only collect the data for analysis I guess this is fine (they already get way more sensitive data from users anyways, so if privacy is your concern you've made the mistake many steps ago). The much more interesting question is if they directly act on this data in their API. For example by rate-limiting, compute-limiting or rerouting to weaker models. That might even be legally questionable. I would really like to see this as a follow-up analysis, but I guess it is way more difficult and will also cost quite a bit in tokens.

SubiculumCode 2 hours ago||

Would it be legally questionable, or actually complying with U.S. export law?

krupan 2 hours ago|||

"If they only collect the data for analysis I guess this is fine"

I think you missed the memo on how foolish this attitude is. It came out around the time Edward Snowden made his discoveries at the NSA public. I suggest you look into it.

sigmoid10 2 hours ago||

As I said above, if you are worried about privacy while hooking up Claude Code, you need to reevaluate your understanding of this technology.

bakugo 3 hours ago||

I've heard that it was possible to trigger really obvious output poisoning on Fable with something as basic as asking the model to think outside of its built-in hidden thinking delimiters.

This watermark may trigger a similar mechanism.

ryanisnan 2 hours ago||

This is weird, but help me understand how this meaningfully impacts our exposure.

I'm authenticated to Claude, so they already have the whole attribution thing solved.

chinathrow 2 hours ago|

User != paying person/company/reseller.

tgtweak 2 hours ago||

None of this is surprising - they're trying to mask and relay when they detect known patterns of what looks like distillation attacks and client app copying/modification. The list obfuscation here is likely to prevent or make it difficult for those same adversaries to work around this or delete/null it out when making a bootleg copy.

Cool reverse engineering/analysis report, but if this is the extent of the nefarious activity that came of it (trying to catch/mitigate Chinese lab model distillations), that's kind of encouraging.

throwawayffffas 3 hours ago||

Claude Code does feel very malwarey to be honest. They have been like that from the start.

fny 3 hours ago||

This was already discovered during the source map leak.

> This is not a malicious feature, but it is a weird choice for a developer tool that asks for trust.

They already tell you they scan for malicious prompts, and they have no ZDR guarantees for consumers. Why do signatures like this matter at all?

llelouch 2 hours ago|

There has been an anti-Anthropic propaganda push by bad actors across social media sites, especially Reddit and Twitter. This started a few months ago when Anthropic started beating OpenAI.

zulban 1 hour ago||

Absolutely. Nothing makes me believe dead internet theory more than text threads discussing Anthropic and OpenAI.

port3000 2 hours ago||

That's a lot of effort when they could just play a short video saying 'You wouldn't steal a car' instead

100ms 3 hours ago||

What's the point of even trying to obfuscate this with such a simple method? Could at least have hidden the targeted features by storing their hashes or embedding a bloom filter or similar
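To make the suggestion concrete, here is a minimal Python sketch of both ideas: shipping only SHA-256 digests of the targeted strings, and packing them into a small Bloom filter. Everything here is an assumption for illustration. The terms are placeholders, the names `digest`, `is_watched`, and `BloomFilter` are hypothetical, and none of it reflects what Claude Code actually does. In a real build the digests or filter bits would be computed offline so the plaintext never ships.

```python
import hashlib


def digest(term: str) -> str:
    """Return the hex SHA-256 digest of a normalized (trimmed, lowercased) term."""
    return hashlib.sha256(term.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical watch list. Placeholder terms are hashed here only for the demo;
# a shipped client would embed the precomputed digests instead.
WATCHED_DIGESTS = frozenset(digest(t) for t in ("example-term-a", "example-term-b"))


def is_watched(candidate: str) -> bool:
    """Check a candidate against the digest set without storing any plaintext."""
    return digest(candidate) in WATCHED_DIGESTS


class BloomFilter:
    """Tiny Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

Worth noting that plain hashing only hides the list from casual inspection: if the targeted strings are guessable, anyone can hash a dictionary of candidates and compare.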

ajb 2 hours ago||

In this case, this is probably not the only steganographic tattletale.

Had a competitor pull something like this with a previous employer. They were supposed to be interoperating with a standard, but they had a secret steganographic handshake, which they used to pretend that competitors' products were unreliable (they had a first mover position in a smaller national market with specific requirements, so this wasn't shooting themselves in the foot). Our guys figured out the handshake and just silently implemented it. In this case, the competitor wasn't big enough to waste engineering time on multiple such hacks, but Anthropic have time (or Claude does).

gonzalohm 3 hours ago||

The point is not raising red flags I guess

kej 3 hours ago||

I love how well this comment works as a vexillology joke, even if it wasn't intended.

chvid 2 hours ago||

(This sounds like a clumsy way of catching the Chinese that easily can be side-stepped.)

Claude Code has more or less full access to the client computer. The server (that hosts the actual AI) can just go: execute this payload and tell me the result - otherwise I won't answer any further questions or re-route you to a stupider model.

The payload could check for Chinese time-zones, scan for copies of the little red book on the local hard-drive, or ping truth.social to see if it was behind the great firewall.

drnick1 1 hour ago|

> Claude Code has more or less full access to the client computer.

It shouldn't, not if you run CC as a separate unprivileged user. I wouldn't run CC on my main user account with sudo and access to my home directory or other resources. This is what the UNIX permissions system was designed for.
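As a rough illustration of the permission bits involved, here is a small Python sketch that checks whether a directory is closed to every account except its owner. The function names are hypothetical, and the check looks only at the classic mode bits: it ignores ACLs, sudo rights, and anything the agent account can reach through other means, so treat it as a sanity check rather than an audit.

```python
import os
import stat


def others_can_access(mode: int) -> bool:
    """True if the 'other' class has any read, write, or execute bit set."""
    return bool(mode & stat.S_IRWXO)


def group_can_access(mode: int) -> bool:
    """True if the 'group' class has any read, write, or execute bit set."""
    return bool(mode & stat.S_IRWXG)


def is_private_mode(mode: int) -> bool:
    """True if only the owner has any access according to the mode bits."""
    return not (others_can_access(mode) or group_can_access(mode))


def is_private_path(path: str) -> bool:
    """Apply the same check to a real file or directory on disk."""
    return is_private_mode(os.stat(path).st_mode)
```

For example, `is_private_mode(0o700)` is true and `is_private_mode(0o755)` is false, so a home directory with mode 755 would still be readable by a separate agent account.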

jacobgold 2 hours ago||

> "That also means the client itself deserves scrutiny. If a coding agent can read your repo and run commands, the binary that ships it should be boring (for example, pi harness)"

You're actually trusting your security to your harness AND model AND inference API provider in this scenario: https://jacob.gold/posts/why-i-wont-run-untrusted-models/

iqandjoke 2 hours ago|

It is about China detection. They seem to put a tracker on the email as well.
