Posted by sam_bristow 20 hours ago

Nobody ever gets credit for fixing problems that never happened (2001) [pdf](web.mit.edu)
697 points | 231 comments | page 3
ChicknNuggt 18 hours ago|
This is exactly the problem with the nature of prevention. When it's well done, it seems like nothing was done.
HerbManic 17 hours ago|
When Covid started, our local government was very clear from the start in saying "If people think we have overreacted, then that means we have done a good job."

Alas, that doesn't always fly with the populace.

teiferer 15 hours ago||
Trouble is that it is not always true either. You can legitimately overreact and in hindsight it can be hard to distinguish between these two things.

Plus, even if you did overreact, that can still be the better side to have erred on, in moderation.

rmunn 19 hours ago||
Article published in the Summer 2001 edition of California Management Review, yet it never mentioned Y2K, the first thing I thought of when I read the line "fixing problems that never happened". Perhaps it was actually written in 1999 and took a while to get published, because otherwise that seems a very strange omission. The Y2K problem was very much over-hyped by the American news media at the time (no, at no point would airplanes have been falling out of the sky — I literally heard someone say that would happen once — even if no effort had been put into fixing the bug).

But in recent years I have seen people (elsewhere, not on HN) claim that Y2K was a big nothingburger, and all the money spent on fixing the bug was wasted. No, that's not true either. All the money spent on fixing the bug was why it turned into a big nothingburger. Sure, some of that money was wasted, by executives who wanted an "official" Y2K-certified certificate, issued by a consulting firm that had nothing "official" about it except their own say-so. And so they spent $2 million learning what their own employees could have told them for $2,000. THAT money was wasted. But a lot of banks were running old COBOL code that used 2-digit years, and needed to be fixed. The fact that in January 2000, everyone's bank interest was still calculated correctly, and not calculated as if it was January 1900? THAT was entirely due to the vast amounts of money spent paying old COBOL coders to come out of retirement and fix the 2-digit years.
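
To make the "January 1900" failure concrete, here is a minimal Python sketch (not the actual COBOL, which I have not seen). The function names and the pivot value of 50 are assumptions for illustration; date windowing is one common style of fix, not necessarily what any given bank used.

```python
from datetime import date

def naive_year(yy: int) -> int:
    """Pre-fix assumption: every 2-digit year belongs to the 1900s."""
    return 1900 + yy

def windowed_year(yy: int, pivot: int = 50) -> int:
    """Windowing fix (pivot is an assumed value): 00-49 -> 20xx, 50-99 -> 19xx."""
    return 2000 + yy if yy < pivot else 1900 + yy

def days_elapsed(start_yy: int, end_yy: int, expand) -> int:
    """Days between 1 January of two 2-digit years, using the given expansion rule."""
    start = date(expand(start_yy), 1, 1)
    end = date(expand(end_yy), 1, 1)
    return (end - start).days

# Interest period from 1 Jan '99 to 1 Jan '00
naive_days = days_elapsed(99, 0, naive_year)     # negative: '00 is read as 1900
fixed_days = days_elapsed(99, 0, windowed_year)  # 365: '00 is read as 2000
```

With the naive rule the elapsed time comes out as roughly minus 99 years, so anything computed from it (interest, billing periods) is nonsense. Windowing only postpones the problem until the pivot year, which is part of why storing 4-digit years was the sturdier fix.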

The lesson I learned from that is that it's possible for a problem to be overhyped, even massively overhyped, and yet still be a serious problem. The other lesson I should have learned is that people rarely get credit (I won't go so far as the article authors and say "nobody ever gets credit") for fixing problems that never happened.

armada651 19 hours ago||
The problem is that a lot of people have a very binary view on life. Either something is a complete success or a complete waste of money, rarely do we accept that most projects fall somewhere in the middle.
yvdriess 13 hours ago|||
The binary view is mostly true, unless it's for events or problems they are themselves familiar with. There is a term for this, but can't for the life of me remember it: People think the problems they are dealing with are infinitely more nuanced, complex and unique than the problems other people are dealing with.
cortesoft 18 hours ago|||
And even worse, they don't think probability is a thing. If something happens, it was certain to happen and we just failed to predict it correctly.

So when someone predicts something will happen with a 90% probability, and then the 10% chance happens and the predicted event does not happen, people will talk about what a bad prediction that was and how they were clearly wrong.

It's the same logic that causes people to say vaccines don't work because they don't stop a disease with 100% effectiveness, or that there is no point in wearing a seatbelt because people still die while wearing one.

takinola 19 hours ago|||
My issue with this version of explaining the lack of severity of Y2K is that there were lots of countries that were being derided for not taking the issue seriously but did not seem to suffer any ill effects.
akoboldfrying 19 hours ago||
This is interesting, do you have any links?

A couple of possible confounding factors I can think of:

1. Plenty of countries use software developed elsewhere.

2. I suspect that the more recently you computerised your economy, the less likely it would be to have code vulnerable to Y2K.

rmunn 16 hours ago||
It's also possible that in some places there were a few issues, but people looked at bills for 100 years of electrical service and said "Yeah right," and fixed the now-easier-to-find code that still used 2-digit dates. If that only happened a few times, the extra work involved in working out the January bill by hand (or waiting until February then billing for 2 months) wouldn't cause too many issues in the economy, and anyone looking in from outside wouldn't even realize there had been an issue. If it happened everywhere the economic impact would be more noticeable from outside.
kristiandupont 15 hours ago|||
>no, at no point would airplanes have been falling out of the sky

The assertion may have been unfounded, but I think it's just as unreasonable to assert the opposite. Bugs have cascading effects and in a sufficiently complex piece of software they can create chaos with unpredictable outcomes.

rmunn 15 hours ago||
The one case I'm aware of where a software glitch did cause a plane crash, there was pilot error compounding the problem. Air France flight 447 was an Airbus A330 flying from France to Brazil, and while high over the Atlantic, the software recorded inconsistent data in its airspeed measurements. (The official crash analysis team concluded that the inconsistent data was likely due to ice crystals blocking the pitot tubes on the plane). The inconsistent data made the autopilot disengage. Pilot error then caused a stall. One pilot then tried the correct move to recover from a stall, pushing forward on the stick to nose down and regain speed. The other pilot was pulling up on the stick to stop the dive, not realizing that that's exactly the wrong thing to do in a stall (or more likely forgetting his training due to panic; he had a lot less experience). The flight software, receiving inconsistent inputs from both controls, averaged the inputs, resulting in zero change in pitch. (It also sounded the "Dual Input" alarm, but the pilots were too preoccupied with their own controls to figure out what that meant at first, and by the time they figured out what was going on it was too late to recover before the plane hit the water).

https://news.ycombinator.com/item?id=4224707 has some discussion of the events, including the fact that the control design (where each pilot has an independent stick) was part of the problem. On a design like Boeing uses where both sets of controls move together, the experienced pilot would have noticed the less-experienced pilot pulling up on the stick because his own stick would be moving, and he would have said "No, nose down." And if they had nosed down to recover speed while still high enough in the air, they almost certainly could have regained control of the plane and saved 228 lives (including their own).

So in retrospect, I think my first sentence was wrong. The software did not glitch, it did exactly what it was supposed to do. It was pilot error that caused the initial stall, and multiple pilot errors that caused the failure to recover from the stall.

There may be examples of software error that has caused planes to fall out of the sky, but I don't know of any. The only plane crashes whose cause I know were due to hardware failure or pilot error, usually a combination of the two.

teiferer 14 hours ago||
I think your conclusion is upside down. Air safety is based on the "Swiss cheese" model. Multiple layers of safety nets are in place to compensate for issues in one layer. In particular, technical safeguards are there to prevent disasters if the human in the loop makes a mistake which will eventually happen. Any weakening of any technical safeguard makes the system less safe. No matter if the human ultimately made a mistake -- the technical system failing contributed to the accident just as much.
tjwebbnorfolk 19 hours ago|||
Y2K is especially interesting because the fact that the year 2000 would one day occur was entirely foreseeable, and no less probable in 1990 than in 1999. I can hardly think of anything with closer to 100% probability of happening.
alduino 19 hours ago|||
To be fair, there was a non-zero chance that society could have ended (or your company, or the tech became obsolete) before 2000, which would be higher the earlier before 2000 you were.
rmunn 18 hours ago||
The tech being obsolete is why Y2K was a smaller problem than it would have been otherwise. Most places were no longer running much COBOL code. But banks are famously slow to upgrade their tech, and for good reason much of the time, so most of the world's remaining COBOL code (and other code too, COBOL is just what I'm most familiar with, not that I'm all that familiar with it) was in banks and other financial institutions.
Dwedit 18 hours ago|||
Year 2038 says hi.
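
For anyone who hasn't run into it: a rough Python sketch of what happens when a Unix timestamp is stored in a signed 32-bit integer. The helper name `wrap_int32` is made up for illustration, and the sketch simulates the wraparound rather than touching any real system clock.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1

def wrap_int32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", n & 0xFFFFFFFF))[0]

def to_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)

last_good = to_utc(INT32_MAX)         # 2038-01-19 03:14:07 UTC
wrapped = wrap_int32(INT32_MAX + 1)   # one second later overflows to -2**31
after_wrap = to_utc(wrapped)          # 1901-12-13 20:45:52 UTC
```

Same shape as Y2K: the date is entirely foreseeable, and the fix (wider time types) is boring work that nobody will notice if it is done in time.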
marcus_holmes 16 hours ago||
My first thought too. I've met a few people who assert that Y2K was a complete waste of money.

I earned my first house deposit helping the team fixing the water and gas company in Wales, UK. Their entire system was running off a set of COBOL programs on a mainframe, none of which had been properly documented over the years, and the whole thing used 2-digit dates. It would have caused actual deaths if not fixed; everything would have shut down, and no water and no heating in a British winter is potentially lethal. And then it would have sent everyone in Wales a bill for 100 years of water and gas.

They were bribing retired software devs to come out of retirement with huge stacks of money, because that was cheaper than training new COBOL devs and getting them familiar with the spaghetti system.

It worked, no-one died, life went on. So obviously it was all fake *rolls eyes*

rmunn 16 hours ago||
I'm curious why things would have shut down when the system thought it was 1900. What part of the logic had the effect of "shut the system down if current date is less than (X date)?" (If you can remember the code 25+ years later, that is).
sandeepkd 18 hours ago||
It's really hard to measure effectiveness, and the problem becomes even harder when a non-engineering person has the job of measuring the effectiveness of an engineering person.

On the other hand, for software engineering, some of the signals that can be used to measure the management itself are:

1. On-call requirements, outages and team burnout - Well-written software should not require on-call duty from the dev team

2. Ask them about the "concrete" roadmap for next 6 months to a year - Absence of concrete items is a bad sign

t43562 8 hours ago||
I recognise almost every aspect of this document - it's exactly what's so intractable about the software business. This is why I think you do need to do some programming every now and again no matter what your level is because otherwise you cannot see what's happening and you'll be tempted into the "lazy developer" attribution.
Root_Access 3 hours ago||
This is true but if you remove the need for credit then you can just get back to work and not have to create a category about it.
afisxisto 13 hours ago||
I remember finding a comment in the first codebase I ever worked on professionally in my first ever job.

It read "This fixes a bug that hasn't happened yet".

It seemed really smart at first, but later I learned that the developer that added that code also had a pattern of appending spaces to the start and end of user input and comparing the length to 2 to determine whether the value was empty or not...

So I'm fairly sure "that hasn't happened yet" was probably more a case of "that I personally haven't introduced unnecessarily yet" :)
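
For the curious, a reconstruction in Python of the pattern as I described it (the original was in another codebase, and these function names are invented for illustration), next to the direct checks it is equivalent to or falls short of.

```python
def is_empty_padded(value: str) -> bool:
    """The pattern described above: pad with a space on each side, compare length to 2."""
    return len(" " + value + " ") == 2

def is_empty_direct(value: str) -> bool:
    """Equivalent check without the padding detour."""
    return len(value) == 0

def is_blank(value: str) -> bool:
    """What was probably wanted: treat whitespace-only input as empty too."""
    return value.strip() == ""

samples = ["", "a", "  ", "hello"]
padded_results = [is_empty_padded(s) for s in samples]
direct_results = [is_empty_direct(s) for s in samples]
```

The padded version gives the same answers as the direct length check, so the padding adds nothing. Neither catches whitespace-only input, which `is_blank` does.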

matja 7 hours ago||
Apt comic: https://www.workchronicles.com/p/comic-prevention-vs-cure
arkensaw 11 hours ago||
I feel obliged to point out Stanislav Petrov, who absolutely got credit for fixing a problem that never happened. Granted, it's a very extreme case.
ekjhgkejhgk 10 hours ago|
Credit only in fame.

https://en.wikipedia.org/wiki/Stanislav_Petrov#Aftermath

> Petrov underwent intense questioning by his superiors about his judgment. Initially, he was praised for his decision.[2] Colonel-general Yuri Votintsev, the then-commander of the Soviet Air Defense's Missile Defense Units, who was the first to hear Petrov's report of the incident (and the first to reveal it to the public in the 1990s), states that Petrov's "correct actions" were "duly noted".[2] Petrov himself states he was initially praised by Votintsev and promised a reward,[2][22] but recalls that he was also reprimanded for improper filing of paperwork because he had not described the incident in the war diary.[22][23]

> Petrov has said that he was neither rewarded nor punished for his actions.[24] According to Petrov, he received no reward because the incident and other bugs found in the missile detection system embarrassed his superiors and the scientists who were responsible for it, so that if he had been officially rewarded, they would have had to be punished.[2][24][22][23] He was reassigned to a less sensitive post,[23] took early retirement (although he emphasized that he was not "forced out" of the army),[22] and suffered a nervous breakdown.[23]

pedroza_alex 5 hours ago|||
The same article points out that he received at least £26k in awards. It could be argued that the reward isn't proportional to the magnitude of his actions, but it exists.
thx67 4 hours ago|||
His page links to https://en.wikipedia.org/wiki/Nuclear_close_calls which is a harrowing thing to read. We keep rolling the dice with no changes to the game.
random3 19 hours ago||
Like nobody gets credit for avoiding problems or unnecessary things/complexity altogether. In fact the opposite may happen.
coldtea 13 hours ago|
Pedantically speaking, people do get credit for fixing problems that never happened.

E.g. if the problems are quantifiable and there's a record, like dropping homicides from 100 per year to 20 per year in a city. Those extra homicides "didn't happen", but the improvement is understood.

For a one-off problem, it depends on how clear the path to the problem is. An electrician doing an inspection and noticing and fixing big electrical issues in the installation, would be appreciated, even if the accidents didn't happen.

dormento 7 hours ago||
> Those extra homicides "didn't happen", but the improvement is understood.

People are gonna criticize by saying "see? it was an overreaction to the problem, since there's not been many homicides at all!", when in fact the homicides were prevented by fixing the original problem. Same way with the electrician: "how much are you gonna charge again? And you're charging for a fix to a problem that didn't happen yet? Nah, I'll call you when the problem happens".

It's maddening.

Steve16384 5 hours ago||
> An electrician doing an inspection and noticing and fixing big electrical issues in the installation, would be appreciated, even if the accidents didn't happen.

Not if nobody knew he'd fixed it.
