Tell HN: Azure outage

Posted by tartieret 10/29/2025

Azure is down for us; we can't even access the Azure portal. Are others experiencing this? Our services are located in Canada/Central and US-East 2.

https://downdetector.ca/status/windows-azure/

https://azure.status.microsoft/en-gb/status

885 points | 806 comments | page 7

elFarto 10/29/2025|

We saw all incoming traffic to our app drop to zero at about 15:45. I wonder how long this one will take to fix.

sech8420 10/29/2025|

Same exact time for us as well.

NDizzle 10/29/2025||

My best guess at the moment is something global like the CDN is having problems affecting things everywhere. I'm able to use a legacy application we have that goes directly to resources in uswest3, but I'm not able to use our more modern application which uses APIM/CDN networks at all.

vs4vijay 10/29/2025||

Service Status: https://status.cloud.microsoft/ and https://azure.status.microsoft/en-us/status

ipsum2 10/29/2025||

Status page (first link) is down for me. Second one works

charv 10/29/2025||

oh the irony, the status link being down too

karateka01 10/29/2025||

status page being affected by the same issue is so lame

millzlane 10/29/2025||

It begs the question from a noob like me... Where should they host the status page? Surely it shouldn't be on the same infra that it's supposed to be monitoring. Am I correct in thinking that?

aftergibson 10/29/2025||

Looks like the status page is overloaded...

andhuman 10/29/2025||

I bet it’s DNS.

andhuman 10/29/2025||

“Starting at approximately 16:00 UTC, we began experiencing DNS issues resulting in availability degradation of some services. Customers may experience issues accessing the Azure Portal. We have taken action that is expected to address the portal access issues here shortly. We are actively investigating the underlying issue and additional mitigation actions. More information will be provided within 60 minutes or sooner.

This message was last updated at 16:35 UTC on 29 October 2025”

pbhjpbhj 10/30/2025|||

That was my bet too, then I looked at ISC and noticed there were PoCs released for critical BIND9 vulns yesterday ... might be related?

vinyl7 10/29/2025||

Vibe coded internet keeps getting better

avgDev 10/29/2025||

Quick, find someone who can actually read documentation and code!

the_af 10/29/2025||

You just paste the outage error codes back to the LLM and pray it's still working and can fix whatever went wrong!

m_fayer 10/29/2025||

When all the people forget to code for themselves, every LLM will code itself out of existence with that one last bug. One, after another.

ApolloFortyNine 10/29/2025||

Two hours after the initial outage, they have finally updated the Front Door status on their status page.

LouisLazaris 10/29/2025||

The VS Code website is down: https://code.visualstudio.com/

And so is Microsoft: http://www.microsoft.com/

codethief 10/29/2025|

https://www.microsoft.com works for me (with the www subdomain).

tonymet 10/29/2025||

Any healthcare IT admins care to chime in? A predominantly MS industry with critical workloads.

SoftTalker 10/29/2025||

We're on Office 365 and so far it's still responding. At least Outlook and Teams are.

jeffdn 10/29/2025|

They don't run on Azure!

RajT88 10/29/2025|||

They definitely do run on Azure. Probably not 100%, but at least some footprint of those services does.

rcarmo 10/29/2025|||

Are you absolutely sure?

jansper39 10/29/2025||

They don't; however, authentication for those services relies on Entra ID, which seems to be affected.

rcarmo 10/29/2025||

I'd say DNS/Front Door (or some carrier interconnect) is the thing affected, since I can auth just fine in a few places. (I'm at MS, but not looped into anything operational these days, so I'm checking my personal subscription).

rvz 10/29/2025|

Looking forward to the post mortem.

internet_points 10/30/2025|

> What went wrong and why?

> An inadvertent tenant configuration change within Azure Front Door (AFD) triggered a widespread service disruption affecting both Microsoft services and customer applications dependent on AFD for global content delivery. The change introduced an invalid or inconsistent configuration state that caused a significant number of AFD nodes to fail to load properly, leading to increased latencies, timeouts, and connection errors for downstream services.

> As unhealthy nodes dropped out of the global pool, traffic distribution across healthy nodes became imbalanced, amplifying the impact and causing intermittent availability even for regions that were partially healthy. We immediately blocked all further configuration changes to prevent additional propagation of the faulty state and began deploying a ‘last known good’ configuration across the global fleet. Recovery required reloading configurations across a large number of nodes and rebalancing traffic gradually to avoid overload conditions as nodes returned to service. This deliberate, phased recovery was necessary to stabilize the system while restoring scale and ensuring no recurrence of the issue.

> The trigger was traced to a faulty tenant configuration deployment process. Our protection mechanisms, to validate and block any erroneous deployments, failed due to a software defect which allowed the deployment to bypass safety validations. Safeguards have since been reviewed and additional validation and rollback controls have been immediately implemented to prevent similar issues in the future.

So, so far they're saying it's a combination of bad config + their config-validator had a bug. Would love more details.
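For anyone wondering what "validate, block, and fall back to last known good" could look like in the small, here is a minimal sketch. It is not Microsoft's implementation; the `validate` rules, the `Node` class, and `staged_rollout` are all hypothetical names made up for illustration, and a real fleet would involve far more than a dict check.

```python
from dataclasses import dataclass, field


class InvalidConfig(Exception):
    """Raised when a candidate configuration fails validation."""


def validate(config: dict) -> None:
    # Hypothetical rules for illustration; a real validator checks far more.
    routes = config.get("routes")
    if not isinstance(routes, list) or not routes:
        raise InvalidConfig("config must define at least one route")
    for route in routes:
        if not isinstance(route, dict) or not route.get("origin"):
            raise InvalidConfig(f"route missing origin: {route!r}")


@dataclass
class Node:
    name: str
    active: dict
    last_known_good: dict = field(init=False)

    def __post_init__(self) -> None:
        # Assume the config a node starts with is known to be good.
        self.last_known_good = self.active

    def apply(self, candidate: dict) -> bool:
        """Load the candidate if it validates; otherwise keep the last known good config."""
        try:
            validate(candidate)
        except InvalidConfig:
            self.active = self.last_known_good
            return False
        self.active = candidate
        self.last_known_good = candidate
        return True


def staged_rollout(nodes: list, candidate: dict, batch_size: int = 2) -> list:
    """Push the candidate in small batches and stop at the first batch with a failure."""
    updated = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        results = [node.apply(candidate) for node in batch]
        if not all(results):
            return updated  # block further propagation of the faulty config
        updated.extend(node.name for node in batch)
    return updated
```

The point of the batching is that a bad config only ever reaches the first batch, and every node in that batch keeps serving its previous config. The post mortem describes the case where the validation step itself was buggy, which a sketch like this does not protect against.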

Aldipower 10/30/2025||

We have some trouble with the AFD in Germany too.
