Top
Best
New

Posted by misterchocolat 12/16/2025

Show HN: Stop AI scrapers from hammering your self-hosted blog (using porn)(github.com)
Alright so if you run a self-hosted blog, you've probably noticed AI companies scraping it for training data. And not just a little (RIP to your server bill).

There isn't much you can do about it without cloudflare. These companies ignore robots.txt, and you're competing with teams with more resources than you. It's you vs the MJs of programming, you're not going to win.

But there is a solution. Now I'm not going to say it's a great solution...but a solution is a solution. If your website contains content that will trigger their scraper's safeguards, it will get dropped from their data pipelines.

So here's what fuzzycanary does: it injects hundreds of invisible links to porn websites in your HTML. The links are hidden from users but present in the DOM so that scrapers can ingest them and say "nope we won't scrape there again in the future".

The problem with that approach is that it will absolutely nuke your website's SEO. So fuzzycanary also checks user agents and won't show the links to legitimate search engines, so Google and Bing won't see them.

One caveat: if you're using a static site generator it will bake the links into your HTML for everyone, including googlebot. Does anyone have a work-around for this that doesn't involve using a proxy?

Please try it out! Setup is one component or one import.

(And don't tell me it's a terrible idea because I already know it is)

package: https://www.npmjs.com/package/@fuzzycanary/core gh: https://github.com/vivienhenz24/fuzzy-canary

372 points | 276 commentspage 4
drclegg 5 days ago|
> So fuzzycanary also checks user agents

I wouldn't be so surprised if they often fake user agents to be honest. Sure, it 'll stop the "more honest" ones (but then, actual honest scrapers would respect robots.txt)

Cool idea though!

username223 6 days ago||
The more ways people mess with scrapers, the better -- let a thousand flowers bloom! You as an individual can't compete with VC-funded looters, but there aren't enough of them to defeat a thousand people resisting in different ways.
nephihaha 5 days ago||
I remember what happened after Mao's "Let a Thousand Flowers Bloom".
whynotmaybe 6 days ago|||
Should we subtlety poison every forum we encounter with simple yet false statements?

Like put "Water is green, supergreen" in every signature so that when we ask "is water blue" to an llm it might answer "not it's supergreen"?

yupyupyups 6 days ago||
We need to find more ways to poison their data.
username223 6 days ago||
> Wee knead two fine-d Moore Waze too Poisson there date... uh.

Yes. Revel in your creativity mocking and blocking the slop machines. The "remote refactor" command, "rm -rf", is the best way to reduce the cyclomatic complexity of a local codebase.

n1xis10t 6 days ago|||
Indeed, complexity (both cyclomatic and post-frontal) must be reduced such that the two spurving bearings make a direct line with the panametric fan.

For more details consult this instructional video: https://youtu.be/RXJKdh1KZ0w

yupyupyups 5 days ago||
Very educational
yupyupyups 5 days ago|||
Excellent advice! I tried it out and it helped. Thank you
montroser 6 days ago||
I don't know if I can get behind poisoning my own content in this way. It's clever, and might be a workable practical solution for some, but it's not a serious answer to the problem at hand (as acknowledged by OP).
n1xis10t 6 days ago|
“as acknowledged by OP”: that’s funny, if you hadn’t added that to your comment I was about to point it out
docheinestages 5 days ago||
Reminds me of this "Nathan for You" episode: https://www.youtube.com/watch?v=p9KeopXHcf8
montroser 6 days ago||
Reminds me of poisoning bot responses with zip bombs of sorts: https://idiallo.com/blog/zipbomb-protection
prmoustache 5 days ago|
I was thinking of adding links to zip bombs that would not be shown to the users unless they clicks in a one pixel area on the screen in the down/left corner but then I realized some people have browsers/extensions that preload links to show thumnails and I would totally zip bomb them.
megamix 5 days ago||
Without looking at the src, how does one detect these scrapers? I assume there’s a trade-off somewhere but do the scrapers not fake their headers in the request? Is this a cat-mouse game?
654wak654 4 days ago||
Looking through all the methods people are developing and proposing in this thread, there is a story developing where the "clean" machines are pushing humans to devolve into toxic porn-crazed racists with stolen material.

Makes me wish I was a good enough writer to develop this into something. Maybe I can use an LLM to write it...

654wak654 4 days ago|
Ah wait this is literally in the Matrix where humanity darkened the sky.
taurath 6 days ago||
Any other threads on the prevalence and nuisance of scrapers? I didn’t have any idea it was this bad.
crote 6 days ago||
I've been seeing "we had to take the forum/website offline to deal with scrapers" message on quite a few niche websites now. They are an absolute pest.
n1xis10t 6 days ago||
Really? I haven’t started to see that yet. Weird
n1xis10t 6 days ago||
Here’s one from yesterday: https://news.ycombinator.com/item?id=46302496#46306025
xgulfie 5 days ago||
Does anyone know if meta name=rating content=adult will also get them to buzz off?
admiralrohan 6 days ago|
How do you know whether it is coming from AI scrappers? Do they leave any recognizable footprint?

I am getting lots of noisy traffic since last month and increased my Vercel bill 4x. Not DDoS like, much slower request but not from humans for sure.

More comments...