Posted by jaredwiener 2 hours ago
It is not hard to imagine a future in 50 years' time where a huge percentage of this content is lost forever, or at best incredibly hard to find.
I'm sure that plays a role, but still... This obviously is about cost and money making, not security as a whole (ime)
The next natural thing to happen would be privatization or consolidation of the internet itself. It's already happening in the form of grabbing and consolidating IPv4 addresses.
Blocking archiving in a flailing attempt to keep AIs away is extremely shortsighted. Archiving is important for keeping historical context, especially when it comes to news and journalism.
One possible long-term solution I can think of would be to allow archival but block retrieval of the latest content for at least 6 months or a year. This should theoretically satisfy most goals.
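A minimal sketch of how such an embargo check might look, assuming the archive records the date each snapshot was taken. The function name `is_retrievable` and the 180-day window are illustrative assumptions, not anything an existing archive is known to implement.

```python
from datetime import date, timedelta

# Hypothetical embargo length: roughly the "6 months" end of the suggested range.
EMBARGO = timedelta(days=180)


def is_retrievable(archived_on: date, today: date, embargo: timedelta = EMBARGO) -> bool:
    """Return True once the full embargo window has elapsed since archiving."""
    return today - archived_on >= embargo
```

The snapshot is still captured on day one; only public retrieval is delayed, which is the distinction the proposal relies on.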
This trend of outright banning the Internet Archive has me extremely worried. I fear a future where news articles are memoryholed, and no one can remember exactly what was reported and how sensational it all seemed.
I've been working on this project [0] for a while. Originally, I started with a tool that would allow people to snapshot webpages in their own browser, and they could selectively share their snapshots. Then by consensus, everyone could understand what exactly had changed, and they could draw their own conclusion about why.
While working on it, I realized that an authoritative answer to "what did it look like on $DATE" can't be produced by a no-name company. It's gotta be a non-commercial entity that's got a track record of integrity. The dream would be to allow MemoryHole customers to submit their snapshots to the Internet Archive (or other non-commercial entity). It's definitely a copyright nightmare - so no clue how this could work.
[0] - https://memoryhole.app
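For illustration only, the "what exactly had changed" comparison between two snapshots of the same page could be sketched with Python's standard `difflib`. This is not how MemoryHole works internally (the comment doesn't say); `snapshot_diff` and the file labels are hypothetical names.

```python
import difflib


def snapshot_diff(old_html: str, new_html: str) -> list[str]:
    """Return a unified diff between two saved snapshots of one page."""
    return list(
        difflib.unified_diff(
            old_html.splitlines(),
            new_html.splitlines(),
            fromfile="snapshot_old",  # hypothetical label
            tofile="snapshot_new",    # hypothetical label
            lineterm="",
        )
    )
```

Identical snapshots produce an empty list, so any non-empty result marks a page that was edited between captures.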
It could work as a decentralized free and open source system that doesn't care about copyright. Like how torrents work now, but it would be good to have it work over Tor or something. Perhaps as a DAO for the management aspect of it. I don't know how exactly. But disregarding copyright by using a centralized company is the wrong idea.
Or you can do the lawful approach and try to work within the framework of that copyright nightmare. But "fuck copyright" is an easier path.
The torrent approach is nice. I could imagine a self-hosted way to store the data (for a group of people).
Linkwarden does this well. You can share a collection for a small group of people.
Is there a way to export/download my saves in a reasonable way?
It looks like this:
├── files
│   └── 632daffb-2f4f-4795-bb4d-3149d24f4264
│       ├── original.html
│       ├── readerview.html
│       └── screenshot.png
├── manifest.json
└── metadata.csv
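A rough sketch of how one might walk an export laid out like this, assuming each subdirectory of `files` is one saved page. The schemas of `manifest.json` and `metadata.csv` aren't shown in the thread, so this deliberately ignores them; `list_saves` is a made-up helper name.

```python
from pathlib import Path

# File names taken from the listing above.
EXPECTED = ("original.html", "readerview.html", "screenshot.png")


def list_saves(export_root):
    """Map each save ID (directory name) to the expected files it contains."""
    files_dir = Path(export_root) / "files"
    saves = {}
    for item in sorted(files_dir.iterdir()):
        if item.is_dir():
            saves[item.name] = [name for name in EXPECTED if (item / name).is_file()]
    return saves
```

Calling `list_saves("path/to/export")` returns a plain dictionary, which makes it easy to spot saves that are missing a screenshot or reader view.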
But I think this will hurt them more than help as time goes on. IIRC, one news org blocked free access and their revenue fell. I think that was in Australia.
But it seems they are using AI as the reason, so allowing access after a week would not keep AIs out.
But what happens if an AI company subscribes to the news site using a person's name (or a fake name)? They will still get the article and avoid the hassle.
One of the tests for Fair Use in the US, as I understand it, would be whether the archived work "competes" with the original.
If people start going to IA instead to read the news, the newspaper might have a claim. But if they're doing it to get around paywalls, or purely for archival/historical/research purposes, that may be allowed.
But the reality is such decisions are subjective and will be up to whatever judge happens to get such a case in front of them if this is challenged.