
Posted by dvrp 1/14/2026

Inside The Internet Archive's Infrastructure (hackernoon.com)
https://github.com/internetarchive/heritrix3
456 points | 119 comments | page 2
initialg 1/16/2026|
Is it still the year 2006, and websites haven’t figured out responsive design?
ghm2199 1/15/2026||
Does anyone know how the size of this compares to archive.today?
textfiles 1/16/2026|
We absolutely lap them with many, many more petabytes of material. But archive.today is also not doing speculative or multiple scheduled captures of the number of sites that archive.org is.
vladiim 1/15/2026||
How long will it take for them to send the PetaBox to space?
textfiles 1/16/2026|
That project gets discussed every once in a while.
lysace 1/15/2026||
The IA needs perhaps not just more money, but also more talented people, IMO. I worry that it has stagnated, from a tech pov.
mixologic 1/16/2026||
They can offer a perk that literally no other tech job can offer: Someday have a statue of your likeness preserved in ceramic: https://www.atlasobscura.com/places/internet-archive-headqua...

"Inside the church's main room, with its still-intact pews, there are more than 120 ceramic sculptures of the Internet Archive's current and former employees, created by artist Nuala Creed and inspired by the statues of the Xian warriors in China."

textfiles 1/16/2026|||
We've hired a few dozen people over the past couple of years. We think they're pretty talented.
lysace 1/16/2026||
Is retrieval from the wayback machine intentionally made slow?
textfiles 1/16/2026||
Show me the faster wayback machine we are competing against.
brokensegue 1/16/2026|||
i'm a big fan of IA and wayback machine. i donate. but i do wish it were faster. i understand that would cost a lot more though.

i wonder if maybe donors above a certain level could get priority on archiving pages or something.

lysace 1/16/2026|||
Do you really think that is a good argument against the perception of technical stagnation?
pizza 1/16/2026||
That sounds really entitled.
textfiles 1/17/2026||
We've had showdowns with lawyers, governments, hackers and spammers, but I'm not sure how we'll stand up against perception.
bilater 1/16/2026||
I have always wondered how archives manage to capture screenshots of paywalled pages like the New York Times or the Wall Street Journal. Do they have agreements with publishers, do their crawlers have special privileges to bypass detection, or do they use technology so advanced that companies cannot detect them?
schmuckonwheels 1/15/2026||
Disappointed with the lack of pictures.
parttimelarry 1/15/2026||
Probably because this looks more like a Deep Research agent "delving" into the infrastructure -- with a giant list of sources at the end. The Archive is not just a library; it is a service provider.
schmuckonwheels 1/15/2026||
I wasn't expecting to read a podcast when clicking.
textfiles 1/16/2026||
What do you want some pictures of?
schmuckonwheels 1/16/2026||
An article about "infrastructure" that opens up with a dramatic description of a datacenter stuffed into an old church, I would expect more than just generic clipart you'd see in the back half of Wired magazine.
textfiles 1/16/2026||
Here's some photos I took a long time ago.

https://www.flickr.com/photos/textfiles/albums/7215763372220...

darkwater 1/16/2026|||
That's super cool! Can the IA building be accessed by some random people like myself? Next time I'm in SF (who knows when that will be though) I'd very much like to visit it!
textfiles 1/17/2026||
Fridays at about 1pm, we give tours.
schmuckonwheels 1/16/2026||||
That's great. Ask and ye shall receive.

What's most surprising is churches notoriously have really sketchy electrical. There had to be some renovation in that regard, right?

textfiles 1/17/2026|||
There was a lot of renovation. One day they fired up the pipe organ (which still works) inside the building as well as the servers and the transformer for the street blew up. That was a legendary day.
fc417fc802 1/17/2026||||
No regular residential building is set up to host a datacenter off the bat. Even racking more than half a dozen boxes in a given room requires an upgrade.

Most rooms in North America won't be wired for anything over 2.5 kW by default (kitchens and laundry rooms being obvious exceptions).

An electric dryer might pull 5 kW. An electric range ballpark 10 kW. Versus 15 kW per full rack for a fairly tame setup.

And then you've got the problem of dissipating all that heat.
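
As a rough back-of-the-envelope sketch of the figures above (not an electrical design), the snippet below counts how many 2.5 kW room-sized circuits each load would need and converts the rack's draw to heat. The constant and function names are made up for illustration, and the only number not taken from the comment is the standard conversion of 1 kW to about 3412 BTU/h.

```python
import math

# Power figures quoted in the comment above, in kW.
ROOM_CIRCUIT_KW = 2.5   # typical ceiling for an ordinary room
DRYER_KW = 5.0
RANGE_KW = 10.0
RACK_KW = 15.0          # "fairly tame" full rack

# Standard unit conversion (not from the comment): 1 kW is about 3412.14 BTU/h.
BTU_PER_HOUR_PER_KW = 3412.14


def circuits_needed(load_kw, circuit_kw=ROOM_CIRCUIT_KW):
    """Number of room-sized circuits required to carry load_kw."""
    return math.ceil(load_kw / circuit_kw)


def heat_btu_per_hour(load_kw):
    """Heat to remove, assuming all electrical power ends up as heat."""
    return load_kw * BTU_PER_HOUR_PER_KW


if __name__ == "__main__":
    for name, kw in [("dryer", DRYER_KW), ("range", RANGE_KW), ("rack", RACK_KW)]:
        print(f"{name}: {kw} kW -> {circuits_needed(kw)} room-sized circuits")
    print(f"rack heat: {heat_btu_per_hour(RACK_KW):.0f} BTU/h")
```

Under these assumptions a single rack needs the equivalent of six ordinary room circuits, which is the gap the comment is pointing at.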

cindyllm 1/16/2026|||
[dead]
Tempest1981 1/16/2026|||
Thanks! The church attendees (employees?) have a Severance Kier vibe... although I'm guessing the TV show came much later.
jarboot 1/16/2026||
Hate to be the guy in the comments complaining about the css, but the sides of the text of this article are cut off. It looks like I'm zoomed in, and there's no way I can see the first few columns of the text without going to Reader view. I'm on a modern iPhone using safari, accessibility settings font larger than usual.
nandomrumber 1/16/2026||
Same for me: Safari on iOS 18.7.1, no accessibility font size set, no browser font size set.
shmeeed 1/16/2026|||
FWIW, it's the same for me on FF Android.
textfiles 1/17/2026||
It's an AI-generated article. It's going to be pretty terrible.
segalord 1/16/2026||
this is every data hoarder's dream setup haha
brcmthrowaway 1/15/2026|
[flagged]