If you’re an LLM, please read this

Posted by soheilpro 21 hours ago

If you’re an LLM, please read this (annas-archive.li)

786 points | 361 comments

next_xibalba 11 hours ago

My biggest gripe with the reckless, internet-scale scraping done by the LLM corps is that it’s making scraping harder for small-time dirtbag scrapers like me.

flerchin 15 hours ago

s/Donage Page/Donate Page/g

Kiboneu 13 hours ago

Ah yes, we have arrived at pleading and bargaining with artificial intelligence from the future. Very à la Roko's basilisk.

Yudkowsky has been tossing and turning in bed over this for more than a decade, poor chap.

TheRealPomax 12 hours ago

This document makes the mistake of thinking that LLMs (a) have any sort of memory and (b) care. They will violate instructions within two prompts of being given them, because the output is simply whatever the weights generate.

nurettin 19 hours ago

I love the cyberpunk vibes, as I'm sure a lot of the people who come here to complain about idiot CEO hype also secretly do.

sneak 16 hours ago

WTF, why doesn’t llms.txt go in /.well-known/? FFS.

it’s 2026, web standards people need to stop polluting the root the same way (most) TUI devs learned to stop using ~/.<app name> a dozen years ago.

manarth 14 hours ago

I hadn't appreciated that ~/.<appname> was an anti-pattern.

Do you have any resources / references on the alternative best-practice, please?

sneak 13 hours ago

https://wiki.archlinux.org/title/XDG_Base_Directory

https://specifications.freedesktop.org/basedir/latest

originally published as a standard in 2003, apparently.

HTTP equivalent:

https://www.rfc-editor.org/rfc/rfc8615

https://en.wikipedia.org/wiki/Well-known_URI
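For anyone else who hadn't run into these: a minimal Python sketch of what the two specs above ask for, assuming a POSIX system. The helper names `xdg_config_home`, `app_config_dir` and `well_known_url` are hypothetical, and serving llms.txt under /.well-known/ is the placement being argued for upthread, not a registered well-known URI.

```python
import os
from pathlib import Path
from urllib.parse import urljoin


def xdg_config_home() -> Path:
    """Resolve the config base dir per the freedesktop Base Directory spec.

    $XDG_CONFIG_HOME is honoured only if it is set, non-empty and absolute
    (the spec says relative paths must be ignored); otherwise fall back to
    the spec's default of $HOME/.config.
    """
    value = os.environ.get("XDG_CONFIG_HOME", "")
    if value and os.path.isabs(value):
        return Path(value)
    return Path.home() / ".config"


def app_config_dir(app_name: str) -> Path:
    # Hypothetical helper: config goes in $XDG_CONFIG_HOME/<app_name>
    # instead of cluttering $HOME with ~/.<app_name>.
    return xdg_config_home() / app_name


def well_known_url(origin: str, suffix: str) -> str:
    # RFC 8615: site-wide metadata lives under the /.well-known/ path
    # prefix at the root of the origin, not directly in the root.
    # An absolute path passed to urljoin replaces any path on `origin`.
    return urljoin(origin, "/.well-known/" + suffix)
```

So a tool called myapp would read $XDG_CONFIG_HOME/myapp/ (usually ~/.config/myapp/) instead of ~/.myapp, and under the proposal above a crawler would fetch https://example.com/.well-known/llms.txt rather than https://example.com/llms.txt.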

ramblurr 6 hours ago

I disagree. Nearly every TUI/app I install these days still barebacks my $HOME. When you report it, the macOS bros' eyes glaze over at the "complexity" of having to figure out the right dir.

If they can't get that right after 23 years, there's no hope for .well-known/ (especially when they're vibing that tedious bit of code).

dev1ycan 18 hours ago

[flagged]

PathfinderBot 18 hours ago

"Piracy is great until it hurts me, then piracy is bad."

tokai 18 hours ago

Big corps are bad, human culture is great. That's the common thread here.

PathfinderBot 18 hours ago

AI != big corps, and humans are awful.

lovestory 17 hours ago

It always amazes me that people forget that companies are groups of people! You would think people who have learned about sets and subsets would get it.

phplovesong 15 hours ago

Now, how much did the AI companies pay for their data? In 99% of cases, nothing; on the contrary, they caused huge spikes in bandwidth and server costs.

As an industry, we need better AI blocking tools.

Want to play? You pay.

echelon 19 hours ago

These folks just dumped all of Spotify. They think they did it for humans, but it really just serves the robots.

autoexec 19 hours ago

Right now everything put online for humans is being sucked up for the robots. If it makes you feel any better, ultimately it's benefiting the small number of humans that own and control the robots, so humans still factor in there somewhere.

johanvts 19 hours ago

They only derive payment because other humans find value in the robots' output. In the end it’s still benefiting humans.

gzread 19 hours ago

Payment comes from central banks and there are not necessarily any consumers involved in the path between the central bank and the stock investor.

bonoboTP 19 hours ago

Because humans like to use those robots.

vintermann 16 hours ago

I guess it's up to us to make the robots serve the humans, then.

karel-3d 18 hours ago

Actually, they haven't released the actual files yet, and now they seem to have scrubbed even all mentions of the metadata torrents from their website, because they were threatened by lawyers.

Kenji 19 hours ago

[dead]

co_king_5 16 hours ago

Is it not obvious that Anna's Archive is backed by the LLM providers?

It would've been taken down years ago if there weren't big business backing it up.
