> You don’t really need any bot detection: just linking to the garbage from your main website will do. Because each page links to five more garbage pages, the crawler’s queue will quickly fill up with an exponential amount of garbage until it has no time left to crawl your real site.
> If a link is posted somewhere, the bots will know it exists,
> Unfortunately, based on what I'm seeing in my logs, I do need the bot detection. The crawlers that visit me have a list of URLs to crawl and do not immediately visit newly discovered URLs, so it would take a very, very long time to fill their queue. I don't want to give them that much time.
A single site doing this does nothing. But many sites doing it have a severe negative impact on the utility of AI scrapers - at least until a countermeasure is developed.
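For concreteness, a minimal sketch of the mechanism from the quote at the top - each generated page links to five more generated pages. This is a hypothetical CGI-style handler; the /maze/ prefix and the word list location are assumptions:

#!/bin/bash
# Hypothetical CGI handler: every URL under /maze/ is routed here, and
# every response links to five more /maze/ URLs, so a crawler's queue
# grows five-fold with each page it fetches. Assumes GNU shuf and a
# word list at /usr/share/dict/words.
echo "Content-Type: text/html"
echo
echo "<html><body><p>$(shuf -n 50 /usr/share/dict/words | tr '\n' ' ')</p>"
for _ in 1 2 3 4 5 ; do
  echo "<a href=\"/maze/$(shuf -n 1 /usr/share/dict/words)-$RANDOM\">read more</a>"
done
echo "</body></html>"

As the reply above notes, though, crawlers that work from a fixed URL list rather than following links immediately will take much longer to fall in.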
How about adding a link like
https://ih879.requestcatcher.com/test
to each of the nonsense pages, so we can see an endless flood of funny requests at
https://ih879.requestcatcher.com
?
I'm not sure requestcatcher is a good choice; it's just the first service that came up when I googled. But I guess there are many such services, or one could also use a link shortener with public logs.
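A sketch of wiring that in, assuming the nonsense pages live as static files under a hypothetical pages/ directory:

# Append the beacon link to every generated nonsense page, so each bot
# that follows it shows up in requestcatcher's public log.
for f in pages/*.html ; do
  echo '<a href="https://ih879.requestcatcher.com/test">further reading</a>' >> "$f"
done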
Example code:
# Generate a solid-color canvas with a word annotated in the center,
# one JPEG per color/word combination (requires ImageMagick 7).
for c in aqua blue green yellow ; do
  for w in hello world huba hop ; do
    magick -size 1024x768 "xc:$c" -gravity center -annotate 0 "$w" "/tmp/$w-$c.jpeg"
  done
done
Do this in a loop over all colors known to the web and over a number of words from a text corpus, and voilà, ... ;-)
Edit: added example
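A sketch of that scaled-up loop, assuming ImageMagick 7 (for magick -list color) and a word list at /usr/share/dict/words; the output directory is arbitrary, and the listing's header lines may vary between versions:

# Enumerate every color name ImageMagick knows (skipping the listing's
# header lines) and pair each with a random dictionary word.
mkdir -p /tmp/garbage
magick -list color | awk 'NR > 2 { print $1 }' | while read -r c ; do
  w=$(shuf -n 1 /usr/share/dict/words)
  magick -size 1024x768 "xc:$c" -gravity center -annotate 0 "$w" "/tmp/garbage/$w-$c.jpeg"
done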
https://iocaine.madhouse-project.org/
From the overview:
"This software is not made for making the Crawlers go away. It is an aggressive defense mechanism that tries its best to take the blunt of the assault, serve them garbage, and keep them off of upstream resources. "
A thought, though: what happens if one of the bot operators sees the random stuff?
Do you think they will try to bypass it and put you and them in a cat-and-mouse game? Or would that be too time-consuming and unlikely?
And if they did, it seems like a trivial thing to fix - just don't follow incorrect/suspicious links?
Modern bots do this very well. Moreover, the structure of the Web is such that it is sufficient to skip a few links here and there; most probably there will exist another path toward the skipped page that the bot can go through later on.
What is being blocked here is violent scraping and, to an extent, the major LLM companies' bots as well. If I disagree that OpenAI should be able to train off of everyone's work, especially if they're going to hammer the whole internet irresponsibly and ignore all the rules, then I'm going to prevent that type of company from being profitable off my properties. You don't get to play unfairly for the unfulfilled promise of "the good of future humanity".
And another "classic" solution is to use white link text on a white background, or a font with zero-width characters - all things which are rather unlikely to be analysed by a scraper interested primarily in text.
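A sketch of the white-on-white variant, written as a shell snippet for consistency with the example above; the trap URL /maze/start is hypothetical, and the colors would need to match the page's real background:

# Append a link that human visitors cannot see (white text on a white
# background, no underline, skipped by keyboard navigation) but that a
# text-oriented scraper will happily extract and follow. The anchor
# text is a zero-width space.
cat >> index.html <<'EOF'
<a href="/maze/start" style="color:#fff;background:#fff;text-decoration:none" tabindex="-1">&#8203;</a>
EOF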
https://www.cloudflare.com/press/press-releases/2025/cloudfl...