Posted by jeffpalmer 14 hours ago
Sounds pretty useless for any serious AI company
If your delay is 1s and you publish fewer than 60 updates a minute on average, I can still get 100%. Most crawls aren't that latency sensitive, certainly not the AI ones.
HFT bots, now that is an entirely different ballgame.
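The arithmetic in the parent comment can be sketched as a toy model: with evenly spaced updates and a fixed 1-second crawl delay, any update rate under 60/minute leaves each version live for more than one poll interval, so a polite poller still observes every version. This is a simplified simulation of my own, not any real crawler's logic:

```python
import math

def coverage(updates_per_minute: int, poll_interval_s: float = 1.0,
             minutes: int = 10) -> float:
    """Fraction of page versions a fixed-interval poller observes.

    Assumes evenly spaced updates, and that each version stays live
    until the next update replaces it (a hypothetical model).
    """
    gap = 60.0 / updates_per_minute          # seconds between updates
    horizon = minutes * 60
    update_times = [i * gap for i in range(int(horizon / gap))]
    seen = 0
    for i, start in enumerate(update_times):
        end = update_times[i + 1] if i + 1 < len(update_times) else horizon
        # The poller hits multiples of poll_interval_s; a version is seen
        # iff at least one poll tick lands inside its [start, end) window.
        first_poll = math.ceil(start / poll_interval_s) * poll_interval_s
        if first_poll < end:
            seen += 1
    return seen / len(update_times)

print(coverage(30))    # 30 updates/min, 2s gaps: every version seen
print(coverage(120))   # 120 updates/min, 0.5s gaps: versions get missed
```

Once the update gap drops below the poll interval, coverage falls off, which is why the 60-updates-a-minute threshold is the break-even point for a 1s delay.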
They certainly behave like they are. We constantly see crawlers trying to do cache busting on pages that haven't changed in days, if not weeks. It's hard to tell where the bots are coming from these days, as most have taken to just lying and saying they're Chrome.
I'd agree that respecting robots.txt makes this a non-starter for the problematic scrapers. These are the bots that will hammer a site into the ground; they don't respect robots.txt, especially if it tells them to go away.
All of this would be much less of a problem if the authors of the scrapers actually knew how to code, understood how the Internet works, and had just the slightest bit of respect for others. But they don't, so now all scrapers get labeled as hostile, meaning only the very largest companies, like Google, get special access.
Do you have a source for this? Not saying you're wrong, I'd just like to know more