Top
Best
New

Posted by kirushik 11 hours ago

Claude Code is steganographically marking requests(thereallo.dev)
1429 points | 409 commentspage 7
827a 10 hours ago|
This seems really, really stupid. Similar to the weird Zig runtime signature thing from a few months ago ago, it was bound to be discovered, quickly, and all the resellers have to do is find a new domain name that (checks notes) doesn't have the word DEEPSEEK in it. Like, seriously? Your goal was to identify resellers by checking if the proxy has the corporate name of one of your competitors in it? Is this amateur hour?

All Anthropic has done is reduce trust, once again, with legitimate customers, while doing nothing to stop illegitimate customers. They need to get adults into key leadership roles, quickly.

timmytokyo 7 hours ago|
To Claude Code: "Please modify Claude Code to mark requests in a way that is not immediately obvious to a human user. Requests should be marked if they originated from one of the following Chinese AI labs or LLM service providers: ..."

Consider also that Claude Code is explicitly designed to limit human agency [1].

[1] https://neuromatch.social/@jonny/11635101584259395

phendrenad2 10 hours ago||
Non-hugged: https://archive.is/Wdhp0
ZappoMan 9 hours ago||
One more example of "I thought Anthropic was supposed to be the good guys."
theplumber 10 hours ago||
The more I learn about Anthropic the more they disgust me. Finger crossed for all the companies from their “ban list”
conception 10 hours ago||
Which AI company have you learned more about where you liked them more as more details came out?
tancop 10 hours ago|||
nous research. started out making overhyped llama finetunes, now they got a great agent harness and a cutting edge distributed training network that actually works.
nmfisher 2 hours ago||
I haven't tried their Hermes agent yet, because I only want a coding agent and I wasn't sure if theirs was suitable. Would you recommend it?
selfhoster11 10 hours ago||||
Moonshot.
chvid 9 hours ago|||
Deepseek.
TZubiri 8 hours ago||
based and steganopilled
wolttam 10 hours ago||
I used Claude Code for a month because my boss gifted me a sub and wanted me to try it.

I used that month to complete a work project and then beef up my personal harness so I'd never have to deal with Anthropic (and these sorts of shenanigans) again.

thih9 10 hours ago||
How do people build something like a personal harness? Are there tools for that or is it done from scratch?
andai 10 hours ago|||
I like this tutorial for an agent in 50 lines:

http://minimal-agent.com/

And if you add one additional while loop, for user input, you can actually use it! :)

https://gist.github.com/a-n-d-a-i/5461a662ef8a7ee0a5eb7778c8...

nowittyusername 10 hours ago||||
Build it from scratch. Understanding fundamentals of how agentic coding harnesses is a must though if you gonna go that route. I think everyone should take time and learn these things, maybe reverse engineer Codex Cli or something like that as a starter. That info is very valuable in this day and age.
andai 9 hours ago||
Can you say more about Codex? I'm using GPT-5.5 in my own harness and it's not liking it very well, so I'm thinking I ought to make it more Codexy so it's more ergonomic for it. (edit format, tool calls etc.) But haven't gotten around to it yet.
nowittyusername 1 hour ago||
In short its a good idea to have tool calling be closely representative to what the model expects as these models are tuned to their own preferred way of doing things, it will surely save you lots of time. The disadvantage is that now your harness system is not as model agnostic as you would like and also you will have to keep up in changing landscape by adapting the tool calling structure with major updates for best results. Its a personal decision you will have to make for yourself. Personally my harness system uses its own way of doing tool calling as I am trying to experiment with simpler tool schema's that also work for smaller less intelligent models but I have yet to do enough A/B testing to say that is a smart approach. As time goes on I think the smart thing to do might be to set up an adapter type of module that changes its tool schema's based on underlying model used for the agent. This preserves optimal behavior patterns with little investment from me. You might have to adjust system prompt in some minor ways as well so keep that in mind. As far as codex i prefer it as i like the way Open Ai does things in that harness system (the spirit if you will), there's interesting tidbits I always find and while I don't usually use them for my own harness system they are inspirational in other ways. you can gather what the devs were trying to achieve with certain implementations.
hakunin 10 hours ago||||
Not the comment author, but I use pi and customize it with my own extensions. Pi automatically tells models how to customize itself, so it's a pretty easy process.
abtinf 9 hours ago||||
Here is a video I made explaining it from absolute basics:

https://m.youtube.com/watch?v=_AgKuFGvJfI

And the repo:

https://github.com/abtinf/homunctor

airhangerf15 8 hours ago||
I hope you've already invalidated that bearer token :-P
abtinf 6 hours ago||
Of course.
wolttam 10 hours ago||||
I started mine from scratch in 2023 because I wanted to use LLMs from a terminal and there was nothing else compelling at the time (nowadays there is pi and opencode)

Harnesses are/can be incredibly simple things, not much more than a HTTP client that renders things in a way that suites your taste.

kolinko 10 hours ago||||
It’s not that difficult, it’s just a system prompt and a set of basic file edit/bash/etc tools.

Me, personally, I didn’t build it from scratch but I ported original CC from published sources into Python and extended it to match my own requirements.

andai 9 hours ago||
Are you using it with Claude? They only allow their own harness with the subs right? (And per-token billing is like 10x more expensive?)
yomismoaqui 9 hours ago||||
Building something like this is the todo list of agents.

I found this one easy to understand:

https://ampcode.com/notes/how-to-build-an-agent

AJ007 9 hours ago||||
The real question is when do you transition from building it with codex/CC to the harness itself.
verdverm 8 hours ago||||
Lots of ways, it's a good exercise that you will learn a lot doing. Might make you cynical w.r.t. big ai harnesses

I used ADK, Dagger, and a VS Code extension for mine. Currently using opencode though.

echelon 10 hours ago|||
Why use a personal harness?

You have to pay API pricing, which is far more costly.

I'd either switch to GLM wholesale or just continue to use Opus within Claude Code as the blessed, subsidized path.

JTbane 9 hours ago|||
I would guess it is to avoid model lock-in.
echelon 9 hours ago||
My question is still this - why not just use GLM at that point?

The pricing of Opus outside of Claude Code is insane.

The tokens cost too much outside of Anthropic's blessed path.

andai 9 hours ago|||
I use GLM in my custom harness. It completes the same tasks at the same level of quality, except 8x faster and 8x cheaper. (Same goes for GPT!)

I'm not sure how that's possible. I expected to get increased correctness for that order of magnitude (something something test-time compute!) but I am not getting it.

WinstonSmith84 7 hours ago|||
Yes, this is actually "funny" that Anthropic feels the need to build such intrusive features into Claude Code, when anybody can build a (basic) Claude Code alternative. And the Chinese labs are certainly not "anybody". One may wonder what Anthropic really tries to achieve aside from awful publicity.
helloplanets 8 hours ago|||
The issue is that using Claude Code is an easy compromise for most to make, when you get to use the models 10x cheaper than through API pricing with a custom harness.

The cheap tokens are the product.

nananana9 8 hours ago||
Which is why my vibeslop harness supports `claude -p` as one of its backends.
helloplanets 8 hours ago||
If that ain't getting steganographically tagged...
tonmoy 10 hours ago|||
What models are you using? Aren’t you still dealing with some provider even if you are not using their binary
wolttam 10 hours ago||
I self-host DeepSeek V4 Flash on 2 DGX Sparks (approx. $10k)

I expect DeepSeek V4 Flash (or an equivalently sized model) to reach parity with GLM 5.2 some time this year (this based on DeepSeek V4 Flash launching at GLM 5.0 parity[0], and GLM 5.2 being freely available to distill from)

GLM 5.2 is within spitting distance of Opus 4.8 and is at least as good as Opus 4.6[1] which some devs were willing to spend hundreds to single-digit thousands of dollars a month for a few months ago.

[0]: https://artificialanalysis.ai/models/comparisons/deepseek-v4...

[1]: https://artificialanalysis.ai/models/comparisons/claude-opus...

ipsod 10 hours ago||
How fast is it?
wolttam 10 hours ago|||
2000 t/s prompt processing and 40-50 t/s generation. We should see 60-70 t/s generation with DSpark support solidifying in vLLM in a few days

Recent discussion on DSpark: https://news.ycombinator.com/item?id=48696585

krupan 10 hours ago|||
Given the Anthropic shenanigans, do you trust the personal harness code it wrote for you?
wolttam 10 hours ago|||
It did not write it for me, I used it to add a feature I wanted. It's a pretty small and understandable codebase, in fact :)
MichaelZuo 10 hours ago|||
Does anyone know what’s gone wrong with Anthropic?

They used to be a decently credible company with not-too-shady behaviour...

I hope they can actually regain some credibility…

hombre_fatal 9 hours ago|||
I don't think many people care that they are trying to detect resellers and distillation.

It also doesn't seem very consistent to fixate on that while sending Anthropic everything about you via your day to day prompts, every line of the projects and environments you're working on at work, etc.

Their credibility comes from having one of the best models.

MichaelZuo 9 hours ago||
This sounds similar to what people were saying regarding Microsoft when the shady tricks of consumer Windows 10 versions were revealed.

…And then Windows 11 became even worse.

satvikpendem 8 hours ago||||
When have they ever been credible? They have always been shady with their talk of safety, Dario was the one who wrote back in 2019 that GPT 2 was too dangerous to release.
slowmovintarget 9 hours ago||||
Their philosophy is what's gone wrong.

It has some good effects on the their models, like Claude seeking cooperation first. But the people behind the company have a typical "unconstrained" (in the Sowell vision sense) perspective that assumes that they know better, so they are righteous for attempting to control things (users, paying customers, their model outputs, their tool chain, the supposed deity they assume they will produce... etc.)

pishpash 9 hours ago|||
Amodei world: pompous zealot with God complex

Altman world: malfeasant nihilist with God complex

MichaelZuo 9 hours ago|||
Yeah I guess there is a slight undertone that they are the superiors… with the rest of the tech world being the inferiors.

But I hadn’t thought that as anything more than temporary flights of fancy.

AlexandrB 10 hours ago||||
They've only been around 5 years and have grown tremendously during that time. There's no stable reputation you can rely on yet.
skeptic_ai 10 hours ago||||
They just show their true face. You’ve been lied all this time. They were never “good”.
MichaelZuo 10 hours ago||
I used to interact with the LW crowd… and they were mostly not outright swindlers or scoundrels. (from what I could sense)

I think it’s fair to say most had decent respectability.

Anthropic hired heavily from that pool so it’s astonishing how it turned out.

solenoid0937 2 hours ago||
Everything they do is understandable if you think they are being honest when they say they're building superintelligence.

In this case they want to prevent a nation that censors its citizenry, puts/disappears dissidents into concentration camps for decades, and makes its own human rights lawyers literally eat their own shit, before raping and/or murdering them, from reaching superintelligence.

In this light, some client side code to potentially identify and ban the Chinese labs to slow them down by even a few days, is totally reasonable.

imhoguy 9 hours ago|||
Enshitification. Too big to.. upset the govt.
SubiculumCode 10 hours ago|||
[flagged]
tiahura 10 hours ago||
Phased rollouts are a triggering microagression for some.
bibimsz 9 hours ago||
this is the one they wanted us to find
bitlad 9 hours ago||
Silicon valley season 6 was on point.
ajross 10 hours ago||
Headline is, frankly, awful. This isn't the AI secretly doing stuff and hiding it. This is the very human Anthropic engineers trying to detect Chinese scraping via some frankly hamfisted and unimaginative URL trickery.
krupan 10 hours ago||
I didn't assume it was the AI, just that some part of the the overall Claude Code product was doing this. I didn't assume the feature was added to Claude Code without human oversight. If it was added by Claude-the-AI itself without the humans prompting it to I would still hold the humans at Anthropic responsible. Does that make you feel better?
zulban 9 hours ago|||
Defence in depth isn't hamfisted. They're only noobs if this is all they do.
ajross 8 hours ago||
FWIW: Defense in depth is a security technique, and abuse detection isn't part of that domain. Security starts from the premise that the system is supposed to be undefeatable but might have holes, and then asking where the holes might lie to decide where to put backstops.

Here the system is "insecure" by design (literally they're trying to get the whole world to sign up for Claude Code for $200/month!) and they're trying to plug the hole that results from a "Except for Chinese Scrapers!" add-on requirement. That might be possible as an arms race kind of thing. But it's very unlikely to work by (as in the linked article) doing stuff like checking the system time zone.

LoganDark 9 hours ago|||
The model is Claude. Claude Code is the harness.
Beigale 7 hours ago||
[dead]
grayhatter 10 hours ago|
Here's the sha of the prompt I submitted... no I don't know why there are no saved prompts with that sha.

What do you mean you don't know where the bug is coming from?

No, I absolutely didn't make it up, how could you accuse me of that?

Does anyone know when this regex isn't working? I double checked it 27 times, I even asked the LLM. They all say this regex should be finding these dates.

Weird, suddenly all the conversations are breaking when I feed them into this other tool? Something about UTF-8 errors, but I'm sure I'm only using ASCII?

I do try to take care to make sure the things I build can be used by other people even when they care about different things. I care about understandably, determinism (as it relates to computing), and repeatability (because I want to be able to trust the systems I use).

If y'all would be willing to try to account for use cases of others, and try not to break them... that would be nice.

Please note: that generally when you modify something that belongs to someone else without telling them... things should be expected to break.

More comments...