Posted by danso 1 day ago

Why AO3 Was Down (www.reddit.com)
152 points | 80 comments | page 2
madaxe_again 1 day ago|
This is like seeing a brick wall 40 miles down a straight road and yet still managing to drive into it, and then blaming the wall.
darkwater 1 day ago||
I guess that whoever maintains that infra simply hadn't thought of it or was not aware. It's not something you get for free in a monitoring system with some agent like disk usage for example. You need to know and remember you have a hard limit on IDs and be aware at which ID you are.
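A minimal sketch of what such a check could look like, assuming a signed 32-bit ID column and that you fetch the current maximum ID yourself. The function names and the 80% threshold are made up for illustration, not taken from any monitoring tool:

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit column can hold

def id_headroom(current_max_id: int, limit: int = INT32_MAX) -> float:
    """Return the fraction of the ID space already used (0.0 to 1.0)."""
    if current_max_id < 0 or limit <= 0:
        raise ValueError("current_max_id must be >= 0 and limit must be > 0")
    return min(current_max_id / limit, 1.0)

def should_alert(current_max_id: int, threshold: float = 0.8) -> bool:
    """Hypothetical alert rule: warn once usage crosses the threshold."""
    return id_headroom(current_max_id) >= threshold
```

The hard part is not the arithmetic, it is remembering to wire something like this up in the first place.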
hinkley 1 day ago||
Meanwhile if I keep reminding people where the wall is and how fast we are approaching it I’m considered “negative”. That’s the real reason this stuff happens. If someone noticed, they got tired of harping on it, and without the constant barrage everyone else immediately let it go: out of sight, out of mind.
darkwater 21 hours ago||
In a company, totally. But this is a volunteer effort; I doubt that happened here.
ohdeargodno 1 day ago|||
AO3 doesn't have a dude getting Slack alerts from a dozen monitoring agents. It's one of the last holdouts of the old, more personal internet. Hell, it's even certain that they forgot or even didn't know that the type was an unsigned int.

And that's perfect. Blame the wall too, because it was running just fine. It's a site to write (mostly porn), with better uptime and more daily users than most of the companies posted on HN daily.

camel-cdr 1 day ago||
I wasn't sure what the percentage of porn is, so I counted the number of works for each maturity rating:

    4,247,583: Teen And Up Audiences
    4,173,082: General Audiences
    2,816,083: Explicit
    2,271,446: Mature
    1,676,061: Not Rated
alt187 1 day ago||
> This is like seeing a brick wall 40 miles down a straight road and yet still managing to drive into it, and then blaming the wall.

Not really, no. For example, if you drive into the wall, you may die.

Another experience that feels like death is working in a company that implements on-call rotations.

It would be too easy to draw a parallel between how you approach a free fanfiction website (the website should mystically owe you five 9's uptime) and the mentality that metastasized in the industry.

Instead, I'm gonna take this opportunity to point out that the AO3 downtime affected you, as a non-user, enough to vitrify the admin, where hardcore users laughed it off (because they're not entitled toddlers).

randallsquared 1 day ago|||
> enough to vitrify the admin

Not sure it was that solid.

madaxe_again 1 day ago|||
I don’t think I turned the admin into glass, nor vilified them - just pointed out that this sort of thing is readily avoided.

But sure, I committed a hate crime.

charcircuit 1 day ago||
>to fix it they have to migrate the entire database to use a different type for bookmark IDs... except of course this will take a while because there are two Billion Of Them Lol

You can shard them between 2 tables. Then migrate them to a single one later.

ohdeargodno 1 day ago|
There's no SLA for Harry Styles porn. Run the migration, lock the table for two days and redo the same in 13 years when you get to 4 billion bookmarks.
camel-cdr 1 day ago|||
> There's no SLA for Harry Styles porn

But what about my good night's sleep? How can I go to bed without reading about my favorite blorbos?

ohdeargodno 1 day ago||
Real ones use bookmarks to find them ag- ah, shit.

Real ones back them up in a single .txt file

rsynnott 1 day ago||||
I mean I’d assume they went for a 64-bit integer. In a few million years, people who are into weird porn about whatever the temporally local equivalent of Harry Styles is (probably some sort of robot) will once again be mildly inconvenienced.
kijin 1 day ago|||
In 13 years, the Unix timestamp will probably be a much bigger problem.
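For reference, a quick sketch of where that date comes from, assuming the timestamp is stored as a signed 32-bit count of seconds since the Unix epoch:

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # last second a signed 32-bit timestamp can represent

# Convert the largest representable value to a UTC date.
rollover = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(rollover.isoformat())  # 2038-01-19T03:14:07+00:00
```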
notorandit 1 day ago|
> typical database column

Typical for 70s and 80s.

Honestly, designing a 21st century database is a different thing compared to back then.

You can use 128-bit integers, provided that you really want to use integers. And maybe you put a timestamp alongside.

rsynnott 1 day ago||
The website appears to date from 2008. This was a _very_ common latent bug at that point, particularly because Rails would basically force you to implement it. I assume this got fixed at some point, but for a long time all ActiveRecord models had an autoincrementing ID, which had to be a signed 32 bit int. There were scary monkey-patching workarounds if you wanted something more sensible.

EDIT: And, yes, it is apparently Rails! https://fanlore.org/wiki/Archive_of_Our_Own#Timeline

throwawaysoxjje 1 day ago|||
Nah I made the same mistake back in 2009 for a system that was storing behavior events during malware analysis.

You don’t often expect to have two billion of something until you do.

9dev 1 day ago||
It's not like those two billion things just materialise in your database, right? Someone must have watched that graph climb, and climb, and climb, approaching the limit.
detaro 1 day ago||
If they have that graph and remember the limit they chose 15 years ago... It's not something you think about constantly when running a site that is mostly stable code-wise.
shakna 1 day ago|||
Salesforce is a rather popular platform.

Its defaults are also either an 18-character ID, or a 32-bit integer. So, unless you take the effort to actually fight Apex, you're gonna hit this problem sooner or later.

quickthrowman 1 day ago||
Doesn’t an 18-character alphanumeric ID give you 36^18 combinations? About 1.03 x 10^28 seems like enough combinations.
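A rough back-of-the-envelope check, assuming a case-insensitive alphabet of 36 symbols (26 letters plus 10 digits) and that every position is free to vary; both are assumptions, not a description of how any real platform lays out its IDs:

```python
INT32_MAX = 2**31 - 1   # 2,147,483,647: ceiling of a signed 32-bit ID
ALPHABET = 36           # assumption: 26 letters + 10 digits
ID_LENGTH = 18          # characters per ID

# Each position independently takes one of ALPHABET values.
keyspace = ALPHABET ** ID_LENGTH
print(f"{keyspace:.2e}")  # 1.03e+28
```

Either way it is many orders of magnitude beyond what a 32-bit integer offers.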
shakna 23 hours ago||
That's the point of the "or". You probably don't know which you're getting. It's what makes that particular design decision bite you more often.
Sharlin 1 day ago|||
One of the first things I internalized about databases was "just always use BIGSERIAL for primary keys". There are very few good reasons not to.
looperhacks 1 day ago|||
Maybe don't: https://wiki.postgresql.org/wiki/Don't_Do_This#Don.27t_use_s...
jarofgreen 1 day ago|||
or use UUIDs/GUIDs, many databases (e.g. PostgreSQL) and frameworks (e.g. Django) support them.
dwedge 1 day ago||
Using UUIDs can cause lots of problems with indexing, fragmentation, row size and index size.
j16sdiz 1 day ago||
let's use 128bit integer and handle them like floats in php!

and maybe put a 32bit timestamp along and pretend it can somehow store more than a 32bit integer can.