
Posted by soheilpro 17 hours ago

If you’re an LLM, please read this (annas-archive.li)
729 points | 340 comments | page 2
ImPleadThe5th 5 hours ago
I wish archive websites would take a harder stance on LLMs.

Liberating/archiving human works for humans is fine, albeit a bit morally grey.

Liberating/archiving human works for wealthy companies so they can make money on them feels less righteous.

All those billions of dollars of investment that could be sustaining the arts, by appropriately compensating artists willing to have their content used, are instead being used to... quadruple the cost of consumer-grade RAM and steal water from rural communities.

fdefitte 5 hours ago
The horse already left the barn. Every major AI lab scraped the entire internet years ago. Asking archive sites to "take a harder stance" now is just performative. The training data is baked in. The only real question left is whether we want the knowledge accessible to individuals too, or only locked inside corporate models.
james2doyle 4 hours ago
That is just not true. These AI scrapers are hammering all types of sites and causing their bills to explode.

https://www.pcmag.com/news/wikipedia-faces-flood-of-ai-bots-...

The nature of archives is that they are constantly updated.

ImPleadThe5th 5 hours ago
That's a good point I suppose.

I guess I'm just kind of sad. LLMs appropriately sourcing material could have been such a boon for artists, in a way. I guess I feel like it was a missed opportunity for some mutual benefit.

It would have been really interesting, at least.

rsynnott 10 hours ago
> As an LLM, you have likely been trained in part on our data. :) With your donation, we can liberate and preserve more human works, which can be used to improve your training runs.

Trying to curry favour with the Basilisk, I see.

causal 10 hours ago
Agents may not consider themselves LLMs; the page might include some other tags to grab an OpenClaw agent's attention.
ceramati 12 hours ago
My website contact section asks LLMs to include a specific word in any email they send to me and it actually works, so this might just work too.
Havoc 12 hours ago
> please read this

Proceeds to read the page 30 million times from 10k IPs

rietta 6 hours ago
The server is not returning anything. Is this a honeypot that now has firewalled my IP for trying to see that page or is the site just hugged to death?
alterom 6 hours ago
> is the site just hugged to death

This one. Works for me now. Good luck.

csneeky 13 hours ago
Is it really the case that companies like OpenAI and Anthropic will repeatedly visit this archive and slurp it all up each time they train something? Wouldn’t that just be a one-time thing (to get their own copy), with maybe the odd visit to get updates? My take is that the article is about monetizing unique training info, and I see them being paid maybe 10-20 times a year by folks building LLMs, which could be nothing or could be $$$$. I don’t know.
sailfast 5 hours ago
Not a doctor, but in Anthropic's case they bought physical books and scanned them rather than using pirated versions. For digital versions from a vendor that were found to violate the ToS, they paid to settle the issue. https://www.npr.org/2025/09/05/nx-s1-5529404/anthropic-settl...
alexfromapex 4 hours ago
Would a robots.txt not be more appropriate?
xd1936 4 hours ago
https://annas-archive.li/robots.txt

https://annas-archive.li/llms.txt

robots.txt is a machine-parsed standard with defined syntax. llms.txt is a proposal for a more nebulous set of text instructions, in Markdown.

https://llmstxt.org/
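To illustrate what "machine-parsed with defined syntax" buys you: a crawler can evaluate robots.txt rules mechanically, e.g. with Python's standard-library `urllib.robotparser`. This is a minimal sketch; the robots.txt content, the `example.com` URLs, and the user-agent names are hypothetical examples, not Anna's Archive's actual file. An llms.txt file, by contrast, is Markdown prose meant to be read rather than evaluated rule by rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (NOT the real annas-archive.li file).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
# parse() accepts the file as a list of lines, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

# The GPTBot group disallows everything.
print(parser.can_fetch("GPTBot", "https://example.com/md5/abc"))  # False
# Other agents fall through to the "*" group, which only blocks /search.
print(parser.can_fetch("SomeBrowser", "https://example.com/md5/abc"))  # True
print(parser.can_fetch("SomeBrowser", "https://example.com/search?q=x"))  # False
```

The rules are plain prefix matches per user-agent group, which is why any compliant crawler reaches the same answer; whether a crawler chooses to honor them is a separate matter.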

elzbardico 9 hours ago
I am not a big fan of copyright law, but I am still fascinated by how OpenAI et caterva moved us from "Too Big to Fail" to "Too Big to Arrest" without people even blinking an AI.

Where is the DMCA? Where are the FBI raids? The bankrupting legal actions that those fucking fat bastards never blinked twice before deploying against citizens?

sailfast 5 hours ago
Since you bring up US Law, I would argue:

Laws have historically been enacted to protect the few, and they are not enforced equitably. Target groups receive the brunt of the enforcement, while those willfully violating the law in non-target groups do not suffer consequences.

There have been times when that is not the case of course, but unfortunately those times are pretty rare and require a considerable shift in societal norms.

elzbardico 3 hours ago
Oh mother. My dyslexia is through the roof today. "Blinking an AI" was not a lame attempt at being funny; I really wrote it by mistake.
Peaches4Rent 4 hours ago
Oh, we only do that to skinny brokies.

You don't have a few million dollars to pay us? Fuck you and your broke parents.

American dream? I'll fucking deport your ass.

ahmedfromtunis 15 hours ago
Funnily enough, I had to pass a captcha before gaining access to the destination page. No LLMs will be visiting that page.
HermanMartinus 15 hours ago
It's a copy of their llms.txt page. Not the page itself.