Posted by janandonly 8 hours ago
Obviously they're trying to get the LLM to make a donation without explicit user approval, but I think they're shooting themselves in the foot.
We recently saw a post on here about an Italian Pokemon website getting near 0 traffic after Google AI indexed and trained on their data. Sadly, I think this is going to happen to a lot of sites. Not sure how we can stop it. Any ideas?
What role Anna's Archive plays in the future is an interesting question. But I'm optimistic about it. And if Anna's Archive fails, but lots of OpenClaw instances are hosting the torrents or at least have a local copy of parts of the library, that's still a decent outcome.
The hope is probably that the LLMs will download properly rather than DDoSing them.
A few of the large AI companies might care enough to set up a custom solution for you, assuming that your dataset is sufficiently large. Most don't. HTTP is the common protocol and HTML the standard format; a torrent is just needless hassle.
The other problem Anna's Archive has is that its legality is questionable, and having an official collaboration with them might be problematic. Better to just crawl the site and claim that you crawl the entire web, so you accidentally crawled Anna's Archive.
At the very least the Chinese ones definitely would, regardless of the legality. The western labs would keep it under wraps, but they probably do too.
At their scale, the cost of scraping or getting it directly from Anna's sources is way higher than just donating $50k and getting easy, fast access.
The goal of AA is to spread the data for free, not to gatekeep it. Donations are optional.
Well that rather defeats the point, doesn't it!
Nothing to do but watch the web fill up with more crap
https://www.karlbunch.com/random/website-protection-act/
555 gigabytes of bandwidth in a week! We're paying more for egress than compute and storage now. I've tried robots.txt and finally gave in and started setting up aggressive WAF rules.
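For anyone wondering why robots.txt alone didn't help: it's only advisory. A rough sketch using Python's standard `urllib.robotparser` shows what a well-behaved crawler is supposed to do before fetching. The bot name `ExampleAIBot`, the `example.org` URLs and the rules themselves are made up for illustration, not taken from any real site. A scraper that never runs a check like this is not slowed down at all, which is presumably why WAF rules end up being the fallback.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the bot name and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeBrowser"):
        print(agent, is_allowed(agent, "https://example.org/books/1"))
```

The check happens entirely on the client side, so it only works for crawlers that choose to honor it.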
Someone spends months or years of their life dedicated to writing a book. And people celebrate the fact they can get it for free, justify it by saying it's not free to search or host this content and offer to donate to piracy sites.
Rather than... Just supporting the author and buying their book?
It's different when this is American education and you're effectively being forced to buy books otherwise. I can understand fighting against that. But most stuff on the archive isn't that. It's just plain old piracy.
Yes, a PDF or epub doesn't cost money to "print". Yes, no one is "losing" money. But this isn't Netflix or Hollywood, who are still making billions regardless of piracy. Most of these authors are just regular people.
And the whole preservation angle makes sense when the books are no longer for sale. It's hard to argue preservation when you're linking to or hosting these works the second they are available to download. I'd be much more inclined toward projects that time-walled the data, so you could effectively argue it's for preservation.
Because we broke copyright. There is room to quibble about exactly where and when, but the result is quite clear. The best summation I know of is from a speech by Thomas Babington Macaulay in the British House of Commons in 1841[1],
"At present the holder of copyright has the public feeling on his side. Those who invade copyright are regarded as knaves who take the bread out of the mouths of deserving men. Everybody is well pleased to see them restrained by the law, and compelled to refund their ill-gotten gains. No tradesman of good repute will have anything to do with such disgraceful transactions. Pass this law: and that feeling is at an end. Men very different from the present race of piratical booksellers will soon infringe this intolerable monopoly. Great masses of capital will be constantly employed in the violation of the law. Every art will be employed to evade legal pursuit; and the whole nation will be in the plot. On which side indeed should the public sympathy be when the question is whether some book as popular as Robinson Crusoe, or the Pilgrim's Progress, shall be in every cottage, or whether it shall be confined to the libraries of the rich for the advantage of the great-grandson of a bookseller who, a hundred years before, drove a hard bargain for the copyright with the author when in great distress? Remember too that, when once it ceases to be considered as wrong and discreditable to invade literary property, no person can say where the invasion will stop. The public seldom makes nice distinctions. The wholesome copyright which now exists will share in the disgrace and danger of the new copyright which you are about to create. And you will find that, in attempting to impose unreasonable restraints on the reprinting of the works of the dead, you have, to a great extent, annulled those restraints which now prevent men from pillaging and defrauding the living."
Are libraries unethical to use? You can go to your library and read books without paying for them.
Publishers aren't just stealing money that should go to authors. We can debate percentages and such, but buying a book also pays the editors (who any author will tell you are just as important to a book as they are), the typesetters, the designers, etc.
Libraries aren't unethical, because they're just letting you borrow from a stock of books. There are practical limits on how that scales, and any impatient users might just buy the book. Once you can infinitely duplicate a work, it's not borrowing.
So what? I think, if you read a good book, learn something or are well-entertained, it's a positive externality, so there is no problem with people doing it for free.
The only real issue with IP piracy is when someone makes money by copying the works. Those were originally the cases copyright tried to prevent.
Maybe you can clarify why you see people doing these things for free as a problem, when there is a net benefit to society and also to you.
When people around me ask about how to "get into reading" I tell them to just find stuff they like online (via AA) or at the library and go from there. If you don't pay initially you don't feel as bad about trying things that may be "bad" or that you aren't interested in.
There's been a reasonable amount of research that suggests that piracy doesn't really cannibalise sales from those who can afford to pay.
But I do agree that for some of their categories a time wall would improve their optics.
There's also the fact that just because something is available to purchase in one country doesn't mean it's available in other countries. A lot of movies/books/games/etc. are geo-restricted in sale, with many countries having no valid methods to acquire them.
The best (but unrealistic) solution would be for people who can purchase legally to do so, while leaving it available for download for everyone else.
And it seems that piracy has become a net benefit to new and niche artists. (https://www.sciencedirect.com/science/article/abs/pii/S01676...)
I'd posit that the book industry will turn out to be the same. Piracy will harm the bottom line of the companies already at the top while giving exposure to the authors at the bottom. The latter are the ones who are often strong-armed into terrible financial deals just to gain access to the book industry's four big gatekeepers, and who likely need that exposure to help keep a roof over their heads.
Anecdotally, I'm one of those folks who end up purchasing many of the books I pirate or otherwise obtain for free, and I'm sure I'm not the only one who does this.
Imagine that causing an agent to find your payment method and make a donation