Posted by tartieret 10/29/2025
Tell HN: Azure outage
How did we get here? Is it because of scale? Going to market in minutes by using someone else's computers instead of building out your own, like co-location or dedicated servers, like back in the day.
Now, they go down a lot less frequently, but when they do, it's more widespread.
I work on a product hosted on Azure. That's not the case. Except for front door, everything else is running fine. (Front door is a reverse proxy for static web sites.)
The product itself (an iot stormwater management system) is running, but our customers just can't access the website. If they need to do something, they can go out to the sites or call us and we can "rub two sticks together" and bypass the website. (We could also bypass front door if someone twisted our arms.)
Most customers only look at the website a few times a year.
---
That being said, our biggest point of failure is a completely different iot vendor who you probably won't hear about on Hacker News when they, or their data networks, have downtime.
Now I will admit I am more of a point-and-click person; but if I had to I could have figured out how to use the command line.
> Big Tech lobbying is riding the EU’s deregulation wave by spending more, hiring more, and pushing more, according to a new report by NGO’s Corporate Europe Observatory and LobbyControl on Wednesday (29 October).
> Based on data from the EU’s transparency register, the NGOs found that tech companies spend the most on lobbying of any sector, spending €151m a year on lobbying — a 33 percent increase from €113m in 2023.
Gee whizz, I really do wonder how they end up having all the power!
Decentralisation is winning it seems.
I think the response lies in the surrounding ecosystem.
If you have a company it's easier to scale your team if you use AWS (or any other established ecosystem). It's way easier to hire 10 engineers that are competent with AWS tools than it is to hire 10 engineers that are competent with the IBM tools.
And from the individuals perspective it also make sense to bet on larger platforms. If you want to increase your odds of getting a new job, learning the AWS tools gives you a better ROI than learning the IBM tools.
In our highly interconnected world, decentralization paradoxically requires a central authority to enforce decentralization by restricting M&A, cartels, etc.
Pick your point on the scale
But the cloud compute market is basically centralized into 2.5 companies at this point. The point of paying companies like Azure here is that they've in theory centralized the knowledge and know-how of running multiple, distributed datacenters, so as to be resilient.
But that we keep seeing outages encompassing more than a failure domain, then it should be fair game for engineers / customers to ask "what am I paying for, again?"
Moreover, this seems to be a classic case of large barriers to entry (the huge capital costs associated with building out a datacenter) barring new entrants into the market, coupled with "nobody ever got fired for buying IBM" level thinking. Are outages like these truly factored into the napkin math that says externalizing this is worth it?
Stonks
And it's very clear from these updates that they're more focused on the portal than the product, their updates haven't even mentioned fixing it yet, just moving off of it, as if it's some third party service that's down.
Unsubstantiated idea: So the support contract likely says there is a window between each reporting step and the status page is the last one and the one in the legal documents giving them several more hours before the clauses trigger.
If Azure goes down and nobody feels it, does Azure really matter?
If Azure goes down, it's mostly affecting internal stuff at big old enterprises. Jane in accounting might notice, but the customers don't. Contrast with AWS which runs most of the world's SaaS products.
People not being able to do their jobs internally for a day tends not to make headlines like "100 popular internet services down for everyone" does.
You name them. Other good providers you have experience with?
There is no reason for an expensive cloud. Never has been, but decision makers tried to keep their pants dry.
It acts as a GSLB controller inside Kubernetes — doing DNS-level health checks, region awareness, and automatic failover between clusters when one goes down.
It integrates with ExternalDNS and supports multiple DNS providers (Infoblox, Route53, Azure DNS, NS1, etc.), so it can handle failover across both on-prem and cloud clusters.
It’s not a silver bullet for every architecture, but it’s one of the few OSS projects that make multi-region failover actually manageable in practice.
When you find an honest vendor, cherish them. They are rare, and they work hard to earn and keep your confidence.