
Posted by damir 10/23/2024

Ask HN: Website with 6^16 subpages and 80k+ daily bots

Last year, just for fun, I created a single index.php website that converts HEX colors to RGB. It takes 3- and 6-digit notation (i.e. #c00 and #cc0000) and converts it to an RGB value. No database, just a single .php file, converting values on the fly.
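For readers who want to see the conversion itself, here is a minimal sketch in Python rather than the site's PHP. The function name `hex_to_rgb` is made up for illustration, and this is only one plausible way to handle the 3- and 6-digit notations, not the site's actual code.

```python
import string

def hex_to_rgb(value: str) -> tuple[int, int, int]:
    """Convert '#c00' or '#cc0000' (leading '#' optional) to (r, g, b)."""
    digits = value.lstrip("#")
    if not all(ch in string.hexdigits for ch in digits):
        raise ValueError(f"not a hex color: {value!r}")
    if len(digits) == 3:
        # Shorthand notation: each digit is doubled, so 'c00' means 'cc0000'.
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"expected 3 or 6 hex digits, got {value!r}")
    # Each pair of hex digits is one channel in the range 0..255.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#c00"))
print(hex_to_rgb("#cc0000"))
```

Both calls describe the same color, because the 3-digit form is just shorthand for the 6-digit one.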

It's a little over a year old, and now every day there are 60k-100k bots visiting and crawling the shit out of two-trillion-something sub-pages...

I am out of ideas for what to do with this site. I mean, it's probably one of the largest websites on the Internet, if counted by sub-pages...
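A quick arithmetic check on the page count, assuming one sub-page per distinct 3-digit or 6-digit hex string (that counting rule is an assumption, not something the post spells out). Under that assumption the total comes from powers of 16, the number of hex digits, and it is far smaller than 6^16:

```python
HEX_DIGITS = 16  # 0-9 and a-f

three_digit_pages = HEX_DIGITS ** 3   # strings like c00
six_digit_pages = HEX_DIGITS ** 6     # strings like cc0000
total_pages = three_digit_pages + six_digit_pages

print(f"3-digit pages: {three_digit_pages:,}")
print(f"6-digit pages: {six_digit_pages:,}")
print(f"total:         {total_pages:,}")

# For comparison, the figure in the title:
print(f"6^16:          {6 ** 16:,}")
```

So "two-trillion-something" matches 6^16, while the number of distinct hex color strings is 16^6 plus 16^3, roughly 16.8 million.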

What cool experiment/idea/stuff should I do/try with this website?

I'm sure AI could be (ab)used somehow here... :)

287 points | 201 comments
tallesttree 11/4/2024|
I agree with several posters here who say to use Cloudflare to solve this problem. A combination of their "bot fight" mode and a simple rate limit would solve this problem. There are, of course, lots of ways to fight this problem, but I tend to prefer a 3-minute implementation that requires no maintenance. Using a free Cloudflare account comes with a lot of other benefits. A basic paid account brings even more features and more granular controls.
iamleppert 10/24/2024||
If you want to make a bag, sell it to some fool who is impressed by the large traffic numbers. Include a free course on digital marketing if you really want to zhuzh it up! Easier than taking money from YC for your next failed startup!
Prbeek 10/24/2024|
Would be difficult as everyone would run the moment they got granular in checking traffic stats.
Kon-Peki 10/23/2024||
Put some sort of grammatically-incorrect text on each page, so it fucks with the weights of whatever they are training.

Alternatively, sell text space to advertisers as LLM SEO

damir 10/24/2024||
Actually, I did take some content from Wikipedia regarding HEX/RGBA/HSL/etc. colors and stuffed it all together into one big variable. Then, on each sub-page reload, I generate random content via a Markov chain function, which outputs semi-readable content that is unique on each reload.
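The Markov chain idea described above can be sketched roughly like this. It is a word-level, first-order chain in Python; the names `build_chain` and `generate` and the tiny `CORPUS` string are placeholders for illustration, not the site's real function or its Wikipedia-derived text.

```python
import random
from collections import defaultdict

# Placeholder corpus; the real site reportedly uses text taken from Wikipedia.
CORPUS = (
    "a hex color is written as three or six hex digits and "
    "each pair of hex digits encodes one color channel and "
    "each color channel is a number from zero to 255"
)

def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)

def generate(chain: dict[str, list[str]], length: int = 30, seed=None) -> str:
    """Walk the chain, picking a random follower of the current word each step."""
    rng = random.Random(seed)
    word = rng.choice(list(chain))
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Dead end (a word with no recorded follower): jump to a random word.
        word = rng.choice(followers) if followers else rng.choice(list(chain))
        output.append(word)
    return " ".join(output)

print(generate(build_chain(CORPUS), length=20))
```

With no seed, each call produces a different word sequence, which is what makes every reload look unique to a crawler.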

Not sure it helps in SEO though...

purple-leafy 10/23/2024||
Start a mass misinformation campaign or Opposite Day
inquisitor27552 10/23/2024||
so it's a honeypot except they get stuck on the rainbow and never get to the pot of gold
zahlman 10/23/2024||
Wait, how are bots crawling the sub-pages? Do you automatically generate "links to" other colours' "pages" or something?
damir 10/23/2024|
Yeah, each generated page has links to ~20 "similar" color sub-pages to feed the bots :)
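How "similar" is defined is not stated, so the following is only a guess at one way to do it: nudge each RGB channel by a small random offset and keep the first 20 distinct results. The function name `similar_colors`, the `max_step` parameter, and the `/{color}` URL scheme are all hypothetical.

```python
import random

def similar_colors(hex_color: str, count: int = 20, max_step: int = 16) -> list[str]:
    """Return `count` distinct 6-digit hex colors near `hex_color` (no '#')."""
    # Seeding with the color itself keeps the link set stable across reloads.
    rng = random.Random(hex_color)
    base = tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    neighbours: set[str] = set()
    # Assumes max_step is large enough to yield at least `count` neighbours.
    while len(neighbours) < count:
        candidate = tuple(
            min(255, max(0, channel + rng.randint(-max_step, max_step)))
            for channel in base
        )
        if candidate != base:
            neighbours.add("%02x%02x%02x" % candidate)
    return sorted(neighbours)

for color in similar_colors("cc0000")[:3]:
    print(f'<a href="/{color}">#{color}</a>')
```

Because every page links to about 20 more pages, a crawler that follows links never runs out of new URLs until it has visited the whole color space.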
dahart 10/24/2024||
Wait, how are bots crawling these “sub-pages”? Do you have URL links to them?

How important is having the hex color in the URL? How about using URL params, or doing the conversion in JavaScript UI on a single page, i.e. not putting the color in the URL? Despite all the fun devious suggestions for fortifying your website, not having colors in the URL would completely solve the problem and be way easier.

bediger4000 10/23/2024||
Collect the User Agent strings. Publish your findings.
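A rough sketch of how that tally could be done, assuming the server writes access logs in the common "combined" format where the User-Agent is the last quoted field. The two log lines and the bot name below are made-up samples, not data from the site.

```python
import re
from collections import Counter

# In the combined log format, the User-Agent is the final quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines) -> Counter:
    """Count how often each User-Agent string appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [24/Oct/2024:10:00:00 +0000] "GET /cc0000 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [24/Oct/2024:10:00:01 +0000] "GET /c00 HTTP/1.1" 200 498 "-" "ExampleBot/1.0"',
]

for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
    print(hits, agent)
```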
ecesena 10/24/2024||
Most bots are prob just following the links inside the page.

You could try serving back html with no links (as in no a-href), and render links in js or some other clever way that works in browsers/for humans.

You won’t get rid of all bots, but it should significantly reduce useless traffic.

Alternatively, just make a static page that renders the content in JS instead of PHP and put it on GitHub Pages or any other free host.

stop50 10/23/2024||
How about the alpha value?
damir 10/23/2024|
You mean adding 2 hex digits at the end of the 6-digit notation to increase the number of sub-pages? I love it, will do :)
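To make the effect of an alpha channel concrete, here is a small sketch of parsing the 8-digit form and counting the resulting URL space. The function name `hex_to_rgba` is hypothetical, and reporting alpha as a fraction between 0 and 1 is just one common convention.

```python
def hex_to_rgba(value: str) -> tuple[int, int, int, float]:
    """Convert '#rrggbbaa' to (r, g, b, alpha), with alpha scaled to 0..1."""
    digits = value.lstrip("#")
    if len(digits) != 8:
        raise ValueError(f"expected 8 hex digits, got {value!r}")
    r, g, b, a = (int(digits[i:i + 2], 16) for i in (0, 2, 4, 6))
    return r, g, b, round(a / 255, 3)

print(hex_to_rgba("#cc000080"))

# Two extra hex digits multiply the 6-digit page count by 16**2 = 256.
print(f"8-digit pages: {16 ** 8:,}")
```

That takes the 6-digit space from about 16.8 million strings to about 4.3 billion.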
bpowah 10/24/2024|
I think I would use it to design a bot attractant. Create some links with random text, then use a genetic algorithm to refine those words based on how many bots click on them. It might be interesting to see what they fixate on.