Posted by tartieret 10/29/2025
Tell HN: Azure outage
Timeline
15:45 UTC on 29 October 2025 – Customer impact began.
16:04 UTC on 29 October 2025 – Investigation commenced following monitoring alerts being triggered.
16:15 UTC on 29 October 2025 – We began the investigation and started to examine configuration changes within AFD.
16:18 UTC on 29 October 2025 – Initial communication posted to our public status page.
16:20 UTC on 29 October 2025 – Targeted communications to impacted customers sent to Azure Service Health.
17:26 UTC on 29 October 2025 – Azure portal failed away from Azure Front Door.
17:30 UTC on 29 October 2025 – We blocked all new customer configuration changes to prevent further impact.
17:40 UTC on 29 October 2025 – We initiated the deployment of our ‘last known good’ configuration.
18:30 UTC on 29 October 2025 – We started to push the fixed configuration globally.
18:45 UTC on 29 October 2025 – Manual recovery of nodes commenced while gradual routing of traffic to healthy nodes began after the fixed configuration was pushed globally.
23:15 UTC on 29 October 2025 – PowerApps mitigation of dependency, and customers confirm mitigation.
00:05 UTC on 30 October 2025 – AFD impact confirmed mitigated for customers.
Me: "How do I connect [X] to [Y] using [Z]?"
Copilot: "Please select the AKS cluster you'd like to delete"
Can't say I've experienced many bugs in there either. It definitely is overpriced but I assume they all are?
At a large scale, Azure is dramatically worse than AWS.
Don't forget extremely insecure. There is a quarterly critical cross-tenant CVE with trivial exploitation for them, and it has been like that for years.
But what we do when things are easy is not who we are. That's a fiction. It's how we show up when we are in the shit that matters. It's discipline that tells you to voluntarily go into all of the multi-tenant mitigations instead of waiting for your boss to notice and move the goalposts you should have moved on your own.
> 16:04 UTC on 29 October 2025 – Investigation commenced following monitoring alerts being triggered.

A 19-minute delay in alerting is a joke.
It would be nice though if alert systems made it easy to wire up CD to turn down sensitivity during observed actions. Sort of like how the immune system turns down a bit while you're eating.
I think if you really wanted to do on call right to avoid gaps you’d want no more than 6 hours on primary per day per shift, and you want six, not four, shifts per day. So you’re only alone for four hours in the middle of your shift and have plenty of time to hand off.
The reason is probably because changes to the status page require executive approval, because false positives could lead to bad publicity, and potentially having to reimburse customers for failing to meet SLAs.
We should count ourselves lucky MSFT is so consistent!
Hug ops to the Azure team, since management is shredding up talent over there.
You don’t want to debug stuff with low sugar.
16:04 Started running around screaming
16:15 Sat down & looked at logs

Very circular way of saying "the validator didn't do its job". This is, AFAICT, a pretty fundamental root cause of the issue.
It’s never good enough to have a validator check the content and hope that finds all the issues. Validators are great and can speed a lot of things up. But because they are independent code paths they will always miss something. For critical services you have to assume the validator will be wrong, and be prepared to contain the damage WHEN it is wrong.
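One common way to "contain the damage when the validator is wrong" is a ringed rollout with health checks and automatic restore of the last known good config. A minimal sketch of that idea (all names and the `Ring` abstraction are hypothetical, not anything Azure actually runs):

```python
class Ring:
    """A deployment ring (canary, one region, global...)."""

    def __init__(self, name, accepts=None):
        self.name = name
        self.config = None
        self.accepts = accepts  # configs this ring stays healthy on; None = any

    def apply(self, config):
        self.config = config

    def healthy(self):
        # Stand-in for a real post-deploy health check.
        return self.accepts is None or self.config in self.accepts


def deploy(new_config, last_known_good, rings):
    """Stage a config through progressively larger rings; the moment any
    ring's health check fails, restore last-known-good everywhere touched.
    The validator has already passed -- and is assumed to be fallible."""
    touched = []
    for ring in rings:
        ring.apply(new_config)
        touched.append(ring)
        if not ring.healthy():
            for r in reversed(touched):
                r.apply(last_known_good)
            return False
    return True
```

The point is that the bad config never reaches the later rings, so the blast radius is one ring rather than the planet.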
Troubleshooting has completed
Troubleshooting was unable to automatically fix all of the issues found. You can find more details below.
>> We initiated the deployment of our ‘last known good’ configuration.
System Restore can help fix problems that might be making your computer run slowly or stop responding.
System Restore does not affect any of your documents, pictures, or other personal data. Recently installed programs and drivers might be uninstalled.
Confirm your restore point
Your computer will be restored to the state it was in before the event in the Description field below.
Looks like there was no monitoring and no alerts.
Which is kinda weird.
I think it's perhaps a gap in the tools. We apply the same alert criteria at 2 am that we do while someone is actively running deployment or admin tasks and there's a subset that should stay the same, like request failure rate, and others that should be tuned down, like overall error rate and median response times.
And it means one thing if the failure rate for one machine is 90% and something else if the cluster failure rate is 5%, but if you've only got 18 boxes it's hard to discern the difference. And which is the higher priority error may change from one project to another.
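Both ideas in this subthread fit in a few lines: relaxing only the tunable thresholds during an observed deploy, and distinguishing a one-box failure from a cluster-wide one. A toy sketch (the 3x relaxation factor and all names are made up for illustration):

```python
def alert_threshold(base, in_deploy_window, tunable):
    """Relax tunable thresholds (overall error rate, median latency) during
    an observed deploy; hard invariants like request failure rate stay put.
    The 3x factor is a made-up illustration, not a recommendation."""
    return base * 3 if (in_deploy_window and tunable) else base


def failure_rates(failures, requests):
    """Return (worst single-machine rate, cluster-wide rate) so an alert
    can tell 'one box at 90%' apart from 'the whole cluster at 5%'."""
    per_machine = [f / r for f, r in zip(failures, requests) if r]
    cluster = sum(failures) / sum(requests)
    return max(per_machine), cluster
```

With only 18 boxes the two rates converge quickly, which is the discernment problem the comment above describes.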
Azure Portal Access Issues
Starting at approximately 16:00 UTC, we began experiencing Azure Front Door issues resulting in a loss of availability of some services. In addition, customers may experience issues accessing the Azure Portal. Customers can attempt to use programmatic methods (PowerShell, CLI, etc.) to access/utilize resources if they are unable to access the portal directly. We have failed the portal away from Azure Front Door (AFD) to attempt to mitigate the portal access issues and are continuing to assess the situation.
We are actively assessing failover options of internal services from our AFD infrastructure. Our investigation into the contributing factors and additional recovery workstreams continues. More information will be provided within 60 minutes or sooner.
This message was last updated at 16:57 UTC on 29 October 2025
---
Update: 16:35 UTC:
Azure Portal Access Issues
Starting at approximately 16:00 UTC, we began experiencing DNS issues resulting in availability degradation of some services. Customers may experience issues accessing the Azure Portal. We have taken action that is expected to address the portal access issues here shortly. We are actively investigating the underlying issue and additional mitigation actions. More information will be provided within 60 minutes or sooner.
This message was last updated at 16:35 UTC on 29 October 2025
---
Azure Portal Access Issues
We are investigating an issue with the Azure Portal where customers may be experiencing issues accessing the portal. More information will be provided shortly.
This message was last updated at 16:18 UTC on 29 October 2025
---
Message from the Azure Status Page: https://azure.status.microsoft/en-gb/status
Starting at approximately 16:00 UTC, we began experiencing Azure Front Door issues resulting in a loss of availability of some services. We suspect an inadvertent configuration change as the trigger event for this issue. We are taking two concurrent actions: we are blocking all changes to the AFD services and at the same time rolling back to our last known good state.
We have failed the portal away from Azure Front Door (AFD) to mitigate the portal access issues. Customers should be able to access the Azure management portal directly.
We do not have an ETA for when the rollback will be completed, but we will update this communication within 30 minutes or when we have an update.
This message was last updated at 17:17 UTC on 29 October 2025
"This message was last updated at 18:11 UTC on 29 October 2025"
This message was last updated at 19:57 UTC on 29 October 2025
> In 50%+ the cases they just don‘t report it anywhere, even if its for 2h+.
I assume you mean publicly. Are you getting the service health alerts?
But, for future reference:
site:microsoft.com csam
Child Sex-Abuse Material?!? Well, a nice case of acronym collision.
No -- the one referencing crime should NEVER have been turned into an acronym.
Crimes should not be described in euphemistic terms (which is exactly what the acronym is)
actual Managers hate that
Storytelling is how issues get addressed. Help the CSAM tell the story to the higher ups.
I'm simplifying a bit, but I don't think it's likely that Azure has a similar race condition wiping out DNS records on _one_ system that then propagates to all others. The similarity might just end at "it was DNS".
They didn't provide any details on latency. It could have been delayed an hour or a day and no one noticed
They quickly updated the message to REMOVE the link. Comical at this point.
Edit: Typo!
• https://www.xbox.com/en-US also doesn't fully paint. Header comes up, but not the rest of the page.
• https://www.minecraft.net/en-us is extremely slow, but eventually came up.
The other day during the AWS outage they "reported" OVH down too.
We already had to do it for large files served from Blob Storage since they would cap out at 2MB/s when not in cache of the nearest PoP. If you’ve ever experienced slow Windows Store or Xbox downloads it’s probably the same problem.
I had a support ticket open for months about this and in the end the agent said “this is to be expected and we don’t plan on doing anything about it”.
We’ve moved to Cloudflare and not only is the performance great, but it costs less.
Only thing I need to move off Front Door is a static website for our docs served from Blob Storage, this incident will make us do it sooner rather than later.
Be aware that if you’re using Azure as your registrar, it’s (probably still) impossible to change your NS records to point to CloudFlare’s DNS server, at least it was for me about 6 months ago.
This also makes it impossible to transfer your domain to them, as CloudFlare’s domain transfer flow requires you set your NS records to point to them before their interface shows a transfer option.
In our case we had to transfer to a different registrar, we used Namecheap.
However, transferring a domain from Azure was also a nightmare. Their UI doesn’t have any kind of transfer option, I eventually found an obscure document (not on their Learn website) which had an az command which would let you get a transfer code which I could give to Namecheap.
Then I had to wait over a week for the transfer timeout to occur because there is no way on Azure side that I could find to accept the transfer immediately.
I found CloudFlare’s way of building rules quite easy to use, different from Front Door but I’m not doing anything more complex than some redirects and reverse proxying.
I will say that Cloudflare’s UI is super fast, with Front Door I always found it painfully slow when trying to do any kind of configuration.
Cloudflare also doesn’t have the problem that Front Door has where it requires a manual process every 6 months or so to renew the APEX certificate.
https://news.ycombinator.com/item?id=32031639
https://news.ycombinator.com/item?id=32032235
Edit: wow, I can't believe we hadn't put https://news.ycombinator.com/item?id=32031243 in https://news.ycombinator.com/highlights. Fixed now.
Long before that, the first raid array anyone set up for my (teams’) usage, arrived from Sun with 2 dead drives out of 10. They RMA’d us 2 more drives and one of those was also DOA. That was a couple years after Sun stopped burning in hardware for cost savings, which maybe wasn’t that much of a savings all things considered.
I was an intern but everyone seemed very stressed.
dang saying it's temporary: https://news.ycombinator.com/item?id=32031136
$ dig news.ycombinator.com
; <<>> DiG 9.10.6 <<>> news.ycombinator.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54819
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;news.ycombinator.com. IN A
;; ANSWER SECTION:
news.ycombinator.com. 1 IN A 209.216.230.207
;; Query time: 79 msec
;; SERVER: 100.100.100.100#53(100.100.100.100)
;; WHEN: Wed Oct 29 13:59:29 EDT 2025
;; MSG SIZE rcvd: 65
And that IP says it's with M5 again.

Bunch of on-call peeps over there that definitely know the instant something major goes down
But they won't be.
Time and time again it's shown that AWS is far more expensive than other solutions, just easier for the Execs to offshore the blame.
I've seen it multiple times at various stores; only once did I see them taking cash and writing things down (probably to enter into the system later when it came back up).
they think that they are 'eliminating a single point of failure', but in reality, they end up adding multiple, complicated points of mostly failure.
I always go everywhere adequately prepared for beverages and food. Thanks to your comment, I have a new reason to do so. Take out coffees are actually far from guaranteed. Payment systems could go down, my bank account could be hacked or maybe the coffee shop could be randomly closed. Heck, I might even have an accident crossing the road. Anything could happen. Hence, my humble flask might not have the top beverage in it but at least it works.
We all design systems with redundancy, backups and whatnot, but few of us apply this thinking to our food and drink. Maybe get a kettle for the office and a backup kettle, in case the first one fails?
However Netatmo does need to have a server to store data, as you need to consolidate across devices; plus you can query for a year's data, and that won't and can't be held locally.
Here in The Netherlands, almost all trains were first delayed significantly, and then cancelled for a few hours because of this, which had real impact because today is also the day we got to vote for the next parliament (I know some who can't get home in time before the polls close, and they left for work before they opened).
If it’s a multi day event, it’s probably that way for a reason. Partially the same as the solution to above.
The description of voting in the Netherlands is that you can see your ballot physically go into a clear box and stay to see that exact box be opened and all ballots tallied.
Dropping a ballot in a box in your neighborhood helps ensure nothing with regard to the actual ballot count.
> You can stay there and wait for the count at the end of the day if you want to.
And if you watch the election night news, you'll see footage of multiple people counting the votes from the ballot boxes, again with various people observing to check that nothing dodgy is going on.
Having everyone just put their ballots in a postbox seems like a good way to remove public trust from the electoral system, because no one's standing around waiting for the postie to collect the mail, or looking at what happens in the mail truck, or the rest of the mail distribution process.
I'm sure I've seen reports in the US of people burning postboxes around election time. Things like this give more excuses to treat election results as illegitimate, which I believe has been an issue over there.
(Yes, we do also have advanced voting in NZ, but I think they're considered "special votes" and are counted separately .. the elections are largely determined on the day by in-person votes, with the special votes being confirmed some days later)
Yeah that happened once in OR then got re-plastered all over the news dozens of times. I'm sure you can find way more incidents of intimidation, fighting, long lines and other issues for in-person voting. But individual incidents do not mean that there is anything wrong with a system that has worked for decades in multiple states.
It is a small but distinct difference between mail/early voting and putting the votes directly into the ballot box.
(AI-generated explanation) How the double-envelope system works:

Inner "secrecy" envelope – You mark your ballot, fold it, and slip it into an unmarked inner envelope. No name or identifying info is on this envelope, so your choices stay anonymous.

Outer declaration envelope – The inner envelope goes inside a larger outer envelope that carries: a ballot ID/barcode unique to you, and a signature line that must match the one on file with your election office. In many states, a detachable privacy flap or perforated strip hides the signature until election officials open the outer envelope, keeping the ballot secret.
There's so much more you have to trust.
If you wish, you can write a phrase on your ballot. The phrases and their corresponding vote are broadcast (on tv, internet, etc). So if you want to validate that your vote was tallied correctly, write a unique phrase. Or you could pick a random 30 digit number, collisions should be zero-probability, right?
I mean, this would be annoying because people would write slurs and advertisements, and the government would have to broadcast them. But, it seems pretty robust.
I’d suggest the state handle the number issuing, but then they could record who they issued which numbers to, and the winning party could go about rounding up their opposition, etc.
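The "collisions should be zero-probability" intuition checks out with a birthday-bound estimate; a quick sketch (the voter count is a round-number assumption, not real turnout data):

```python
def collision_probability(n_voters, digits=30):
    """Birthday-bound approximation: the chance that any two of n voters
    independently pick the same uniformly random `digits`-digit number
    is roughly n(n-1) / (2 * 10^digits)."""
    space = 10 ** digits
    return n_voters * (n_voters - 1) / (2 * space)


# Assume ~160 million voters picking 30-digit numbers:
p = collision_probability(160_000_000)
```

That works out to roughly 1e-14, so even a whole-country electorate is nowhere near a collision in a 10^30 space.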
Googling around a bit, it sounds like there are systems that let you verify that your ballot made it, but not necessarily that it was counted correctly. (For this reason, I guess?)
When I vote in person, I know all the officials there from various parties are just like...looking at the box for the whole day to make sure everything is counted. It's much easier to understand and trust.
Sure you got a notification! That doesn't mean anything. Even with human counted ballots or electronic ballots.
Following the chain of custody from vote to verification, in some way, would be nice.
Here in Latvia the "election day" is usually (always?) on weekend, but the polling stations are open for some (and different!) part of every weekday leading up. Something like couple hours on monday morning, couple hours on tuesday evening, couple around midday wednesday, etc. In my opinion, it's a great system. You have to have a pretty convoluted schedule for at least one window not to line up for you.
Here is the form to register for postal voting in the Republic of Ireland - https://www.dublincity.ie/sites/default/files/2024-01/pv4-wo...
Instructions on how to submit the form / register for mail-in votes is on page 4.
Hope that helps anyone else out who needs it in Ireland.
> You may use this form to apply for a postal vote if, due to the circumstances of your work/service or your full-time study in the State, you cannot go to your polling station on polling day.
Which seems to indicate that's only for people who can't go to the polling station, otherwise you do have to go there.
As someone who spent the first 30 years of my life in Ireland but is now part of that diaspora, it's frustrating but I get it. I don't get to vote, but neither do thousands of plastic paddys who have very little genuine connection to Ireland.
That said, I'm sure they could expand the voting window to a couple of days at least without too much issue.
But I still prefer the paper vote, and I'm usually blockchain-apathetic.
In most countries, in the elections you vote for the member of parliament you want. Presidential elections and city council elections are held separately, but are also equally simple. In one election you cast your vote for one person, and that's it.
With this kind of elections, many countries manage to hold the elections on paper ballots, count them all by hand, and publish results by midnight.
But on an American ballot, you vote for, for example:
- US president
- US senator
- US member of congress
- state governor
- state senator
- state member of congress
- several votes for several different state judge positions
- several other state officer positions
- several votes for several local county officers
- local sheriff
- local school board member
- several yes/no votes for several proposed laws, whether they should be passed or not
I don't think it would be possible to count all these 20 or 40 votes by hand. That's why they use voting machines in America.

Here in Indonesia, in a city of 2 million people there are over 7,000 voting stations. While we vote with 5 ballots (President, and Legislative at the National, Province, and City/Regency levels), we still use paper ballots and count them by hand.
We've been closing a lot of polling places recently:
https://abcnews.go.com/US/protecting-vote-1-5-election-day-p...
Here's the President of the United States on Sunday: https://truthsocial.com/@realDonaldTrump/posts/1154418712892...
"No mail-in or 'Early' Voting, Yes to Voter ID! Watch how totally dishonest the California Prop Vote is! Millions of Ballots being 'shipped.' GET SMART REPUBLICANS, BEFORE IT IS TOO LATE!!!"
Mail in voting is just better all around for a geographically diverse place as the US and I wish would be adopted by all states.
So excited to see how the right-wing pedants here disagree with this.
If so, I see a lot to dislike. As the point I was making is you can’t anticipate what might come up. Just because it’s worked thus far doesn’t mean it’s designed for resilience. There’s a lot of ways you could miss out in that type of situation. It seems silly to make sure everything else is redundant and fault-tolerant in the name of democracy when the democratic process itself isn’t doing the same.
That’s just ridiculous in my opinion. Makes me wonder how many well-intentioned would-be voters end up missing out each election, 'cause shit happens and voting is pretty optional.
What is the that group's deviation from the general voting population's preferences?
What are the margins of the votes on those ballot questions?
I really do feel the only viable future for clouds is hybrid or agnostic clouds.
There is a ballot tracking system as well, I can see and be notified as my ballot moves through the counting system. It's pretty cool.
I actually just got back from dropping off my local elections ballot 15m ago, quick bike trip maybe a mile or so away and back.
Of course, because it makes it easy for people to vote, the republicans want to do away with it. If you have to stand in line for several hours (which seems to be very normal in most cities) and potentially miss work to do it that's going to all but guarantee that working people and the less motivated will not vote.
So yes in places that only do in person voting, national or state holiday.
I do need a human to provision a few servers and configure e.g. load balancing and when to spin up additional servers under load. But that is far less of a PITA than having my systems tied to a specific provider or down whenever a cloud precipitates.
The moment you choose to use S3 instead of hosting your own object store, though, you either use AWS because S3 and IAM already have you or spend more time on the care and feeding of your storage system as opposed to actually doing the thing you customers are paying you to do.
It's not impossible, just complicated and difficult for any moderately complex architecture.
One thing very important, is that I can authorise specific web clients (users) to access specific resources from S3. Such as a document that he can download, but others with the link should not be able to download.
Thank you!
Another way you can do it is generating pre-signed URLs in your backend on each request to download something... but the URL that is generated when you do that is only valid for some small time period, so not a stable URL at all.
In my use case, I needed stable URLs, so I went the proxy route.
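For anyone curious why pre-signed URLs can't be stable: the deadline is baked into the signature itself. A stdlib sketch of the general idea (hypothetical key and paths, and deliberately not S3's actual SigV4 scheme):

```python
import hashlib
import hmac
import time

SECRET = b"hypothetical-signing-key"  # would live in your backend's config


def sign_url(path, expires_at):
    """Return an expiring URL: the expiry timestamp is covered by the HMAC,
    so neither the path nor the deadline can be tampered with."""
    msg = f"{path}?expires={expires_at}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&sig={sig}"


def verify(path, expires_at, sig, now=None):
    """Recompute the signature and check the clock; False once lapsed."""
    now = time.time() if now is None else now
    msg = f"{path}?expires={expires_at}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and now < expires_at
```

Because the expiry is inside the signed message, a "stable" URL would need either a far-future expiry or the proxy route described above, where the backend does the auth check on every request instead.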
Thank you.
https://nltimes.nl/2025/10/29/ns-hit-microsoft-cloud-outage-...
It should be noted that the article isn't complete: while the travel planner and ticket machines were the first to fail, trains were cancelled soon after; it took a few hours before everything restarted.
Based on what the conductors said, I would speculate that the train drivers digital schedule was not operative, so they didn't know where to go next.
This list doesn't have anything that looks relevant: https://www.rijdendetreinen.nl/en/disruptions/archive?date_b...
The day does not appear as an outlier in the monthly statistics: https://www.rijdendetreinen.nl/en/statistics/2025/10
I don't find a detailed statistic on the overall delays, but the per-station statistics for Amsterdam Centraal say 5% of trains were cancelled and 17% were delayed by 5 minutes or more (mostly by 10 minutes): https://www.rijdendetreinen.nl/en/train-archive/2025-10-29/a...
Horses were famously tamed in 2007 after AWS released S3 to the public, this is the best of times.
Old trains had paper tickets, the locomotive was its own power source, the conductor had a flashlight, and the conductor could sell tickets for cash.
And if everything else failed, the conductor would just let you ride for free.
Now everything's so interconnected that any one part failing brings everything to a halt.
Personally I am thinking more and more about Hetzner. Yes, I know it's not an apples-to-apples comparison, but it's honestly so good.
Someone had created a video where they showed the underlying hardware, etc. I am wondering if there is something like https://vpspricetracker.com/ but with Geekbench scores as well.
This video was affiliated with ScalaHosting, but I still don't think there was too much bias from them, and they showed at around 3:37 a graph comparing prices: https://www.youtube.com/watch?v=9dvuBH2Pc1g

Now it shows that Contabo has better hardware, but I am pretty sure there might be some other issues, and honestly I feel a sense of trust with Hetzner that I am not sure about with others.

So: either Hetzner, or self-hosting stuff personally, or just having a very cheap VPS and moving to Hetzner if need be (though Hetzner is already pretty cheap), or I might use some free services that I know are good as well.
https://blog.cloudflare.com/rearchitecting-workers-kv-for-re...
Personally I just trust cloudflare more than google, given how their focus is on security whereas google feels googly...
I have heard some good things about Google Cloud Run, and Google's interface feels the best out of AWS, Azure, and GCloud, but I still would just prefer Cloudflare/Hetzner.
Another question: has there ever been a list of all major cloud outages? I am interested in how many times Google Cloud and the other providers went majorly down. Is there a website/git project that tracks this?
I have never had much confidence in Azure as a cloud provider. The vertical integration of all the things for a Microsoft shop was initially very compelling. I was ready to fight that battle. But, this fantasy was quickly ruined by poor execution on Microsoft's part. They were able to convince me to move back to AWS by simply making it difficult to provision compute resources. Their quota system & availability issues are a nightmare to deal with compared to EC2.
At this point I'd rather use GCP over Azure and I have zero seconds of experience with it. The number of things Microsoft gets right in 2025 can be counted single-handedly. The things they do get right are quite good, but everything else tends to be extremely awful.
I remember I at one point had expanded enough menus that it covered the entirety of the screen.
Never before have I felt so lost in a cloud product.
Yeah, that had some fun ideas but was way more confusing than it needed to be. But also that was quite a few years back now. The Portal ditched that experience relatively quickly. Just long enough to leave a lot of awful first impressions, but not long enough for it to be much more than a distant memory at this point, several redesigns later.
[0] The name "Blades" for that came from the early years of the Xbox 360, maybe not the best UX to emulate for a complex control panel/portal.
Like, AWS, and GCP to a lesser extent, has a principled approach where simple click-ops goals are simple. You can access the richer metadata/IAM object model at any time, but the wizards you see are dumb enough to make easy things easy.
With Azure, those blades allow tremendously complex “you need to build an X Container and a Container Bucket to be able to add an X” flows to coexist on the same page. While this exposes the true complexity, and looks cool/works well for power users, it is exceedingly unintuitive. Inline documentation doesn’t solve this problem.
I sometimes wonder if this is by design: like QuickBooks, there’s an entire economy of consultants who need to be Certified and thus will promote your product for their own benefit! Making the interface friendly to them and daunting to mere mortals is a feature, not a bug.
But in Azure’s case it’s hard to tell how much this is intentional.
I don't want to pay for or lock myself into, "Azure Insights".
I just want to see the logging, that I know if I can remember the right buttons to click, are available.
The worst place to try is "Monitoring > Logs", this is where you get faced up front with a query designer. I've never worked out how to do a simple "list by time" on that query designer, but it doesn't matter, because if you suffer through that UX, you find out that's not actually where the logs are anyway.
You have to go down a different path. Don't be distracted by "Log Stream", that's not it either, it sounds useful but it's not. By default it doesn't log anything. If you do configure it to log, then it still doesn't actually log everything.
What you have to actually do, and I've had to open the portal to check this, is click "Diagnose and Solve Problems" and then look for "Diagnostic tools" and then a small link to "Application Event Logs".
Finally you get to your logs, although it's still a bad way to try to view logs, it's at least marginally better than the real windows event viewer, an application that feels like it hasn't been updated since NT4. ( Although some might suggest that's a good thing. )
By bringing those eyeballs onto your cloud console, you're creating infinitely more opportunities for branded interaction and discovery of your other cloud products - you could even quantify these eyeballs as you would ad inventory! There should have been an arms race for each cloud provider to have the best log-tailing and log-searching and log-aggregation system imaginable. OTel could have been killed before it began, because Honeycomb and its other originators would have been acquired years ago and made specific locked-in value-adds for each cloud.
But nobody had this foresight, and thus comments like yours are absolutely correct. OTel is a blessing and I love the tools coming out. But from a cloud provider's perspective, it's a massive missed opportunity that continues to be missed.
I think that's what Application Insights has always been, Azure's free-to-start, suggest-out-of-the-box Honeycomb. App Insights had a long slow road away from Microsoft-specific log and metrics ingesters that weren't OTel, but it is hard to argue that standard ingestors are a bad idea. App Insights still downplays that it can be "just a Honeycomb" using only OTel sources and still encourages "secret sauce" ingestors in addition to OTel ones. App Insights is a small moat (around a data lake; to mix metaphors). That said, it's also a standards-supporting tool now as well.
It's not been as clear of an arms race because AWS and GCP didn't invest in it in a similar way and it mostly impacted what are often called "dark matter" teams (Microsoft shops doing "boring" stuff that rarely makes HN headlines), but I have worked in teams that absolutely favored Azure over AWS/GCP with one of the reasons being Application Insights was an easy install and powerful first-party supported tool rather than an extra third party vendor relationship like Grafana/Honeycomb/Dynatrace/etc.
Here's a somewhat ancient Stack Overflow screenshot I found: https://i.sstatic.net/yCseI.png
(I think that's from near the transition because it has full "windowing" controls of minimize/maximize/close buttons. I recall a period with only close buttons.)
All that blue space you could keep filling with more "blades" as you clicked on things until the entire page started scrolling horizontally to switch between "blades". Almost everything you could click opened in a new blade rather than in place in the existing blade. (Like having "Open in New Window" as your browser default.)
It was trying to merge the needs of a configurable Dashboard and a "multi-window experience". You could save collections of blades (a bit like Niri workspaces) as named Dashboards. Overall it was somewhere between overkill and underthought.
(Also someone reminded me that many "blades" still somewhat exist in the modern Portal, because, of course, Microsoft backwards compatibility. Some of the pages are just "maximized Blades" and you can accidentally unmaximize them and start horizontally scrolling into new blades.)
Depending on the resource you're accessing, you can get 5+ sections, each with its own UI/UX, on the same page/tab, and it can be confusing to understand where you are in your resources.
If you're having trouble visualizing it, imagine a URL where each new level is a different application with its own UI/UX and purpose, all on the same webpage.
I never understood why a clear, consistent UI and improved UX isn't more of a priority for the big three cloud providers. Even though you mostly talk to the platform via SDKs, I'd consider a better UI, especially early on, a good way to win new customers and get them to pick your platform over the others.
I guess with their bottom line they don't need it (or cynically, you don't want to learn and invest in another cloud if you did it once).
For some reason this applies to all of AWS, GCP, and Azure. Seems like the result of dozens of acquisitions.
Any time something is that unintuitive to get started, I automatically assume that if I encounter a problem that I’ll be unable to solve it. That thought alone leads me to bounce every time.
AWS is a complete mess. Everything is obscured behind other products, and they're all named in the most confusing way possible.
MSFT : Hold my beer...
TBH, GCP is very good! More people should use it.
https://cloud.google.com/resource-manager/docs/project-suspe...
I'd hope you can create a Google Cloud account under a completely different email address, but I do as little business with Google as I can get away with, so I have no idea.
>TBH, GCP is very good! More people should use it.
These takes couldn't be further apart. Gotta love HN comments.
I feel like compliance is the entire point of using these cloud providers. You get a huge head start. Maintaining something like PCI-DSS when you own the real estate is a much bigger headache than if it's hosted in a provider who is already compliant up through the physical/hardware/networking layers. Getting application-layer checkboxes ticked off is trivial compared to "oops we forgot to hire an armed security team". I just took a look and there are currently 316 certifications and attestations listed under my account.
Microsoft really wants you to use their PaaS offerings, and things on Azure are priced accordingly. For a Microsoft shop just wanting to lift and shift, Azure isn't the best choice unless the org has that "nobody ever got fired for buying Microsoft" attitude.
They think they have the market captured, but I think what their dwindling quality and ethics are really going to drive is adoption of self-hosting and distributed computing frameworks. Nerds are the ones who drove adoption of these platforms, and we can eventually end it if we put in the work.
Seriously, with container technology and a bit more work/adoption on distributed compute systems and file storage (IPFS, Filecoin), there's a future where we don't have to use big brother's compute platform. Fuck these guys.
I really hope this pushes the internet back to how it used to be, self hosted, privacy, anonymity. I truly hope that's where we're headed, but the masses seem to just want to stay comfortable as long as their show is on TV
if all companies focused on fixing each and every social issue that exists in the world, how would they make any money?
From 2000-2016, most tech marketing/branding was aimed at some kind of social benefit.
I would link to that article, but that one does seem down ;)
> They're stating they're working with the Azure teams, so I suspect this is related.
Chick-fil-a has this.
One of the tech people there was on HN a few years ago describing their system. Credit card approval slows down the line, so the cards are automatically "approved" at the terminal, and the transaction is added to a queue.
The loss from fraudulent transactions turns out to be less than the loss from customers choosing another restaurant because of the speed of the lines.
Credit card information would be recorded by the POS, synced to a mini-server in the back office (using store-and-forward to handle network issues) and then in a batch process overnight, sent to HQ where the payment was processed.
It wasn't until chip-and-PIN was rolled out that they started supporting "online" (i.e. processed then and there) card transactions, and even then the old method still worked if there was a network issue or power failure (all POSes had their own UPS).
The only real risk at the time was that someone tried to pay with a cancelled credit card - the bank would always honour the payment otherwise. But that was pretty uncommon back then, as you'd have to phone your bank to do it, not just press a button in an app.
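The approve-now, settle-overnight flow described above is a classic store-and-forward queue. Here's a minimal sketch in Python (all names hypothetical; a real POS would encrypt or tokenize card data and follow PCI rules rather than persist it as plain JSON):

```python
import json
import os
import tempfile

class StoreAndForwardQueue:
    """Approve transactions locally, persist them, forward them in a nightly batch."""

    def __init__(self, path):
        self.path = path
        if not os.path.exists(path):
            with open(path, "w") as f:
                json.dump([], f)

    def approve(self, txn):
        # Approve at the terminal immediately; settlement happens later.
        with open(self.path) as f:
            pending = json.load(f)
        pending.append(txn)
        with open(self.path, "w") as f:
            json.dump(pending, f)
        return "APPROVED"

    def flush(self, send_batch):
        # Overnight batch job: forward everything, clear the queue only on success.
        with open(self.path) as f:
            pending = json.load(f)
        if pending and send_batch(pending):
            with open(self.path, "w") as f:
                json.dump([], f)
        return len(pending)

# The queue survives on disk until the nightly sync runs.
q = StoreAndForwardQueue(os.path.join(tempfile.mkdtemp(), "pending.json"))
q.approve({"amount_cents": 1250, "card": "tokenized-ref-1"})
q.approve({"amount_cents": 825, "card": "tokenized-ref-2"})
sent = []
q.flush(lambda batch: sent.extend(batch) or True)
```

The trade-off is exactly the one the comment describes: the store accepts the risk of a bad card in exchange for never blocking the line on the network.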
I go there daily because it's a nice 30-minute round-trip walk and I WFH. I go up there to get a Diet Coke or something else, just to get out of the house. It amazes me when I see a handwritten sign on the door: "closed, system is down". I've gotten to know the cashiers, so I asked, and it's because the internet connection goes down all the time. That store has to be one of the most poorly run things I've ever seen, yet it stays in business somehow.
Your responses imply that you think people are questioning whether you would lose money on the deal while we are instead saying you’ll get laughed out of the store, or possibly asked never to come back.
It seems like an easy problem to fix, and a retail store being closed for a whole weekday because of internet access sounds crazy to me.
1: I doubt they're "with it" enough to put together a backup arrangement for internet.
2: Their internet problems are probably due to a cheapo router, a loose wire, etc.
3: The employees probably like the break.
Good luck if you make this work for you, it would be exciting to hear about if you're able to get them to work with you.
EDIT: their last quarterly was 36%. They lost $3.7bn in 24Q4 -- the Christmas quarter. Sold to PE in Q1.
Why doesn't someone in the store at least have one of those manual kachunk-kachunk carbon copy card readers in the back that they can resuscitate for a few days until the technology is turned back on? Did they throw them all away?
And that was the day Visa had a full on outage. We would walk into one shop, try to buy stuff, get declined, then go into the next and get accepted because they were running in offline mode.
Got a nice big bill from my cellphone carrier for making the call to visa to ask them wtf as well.
How aptly descriptive.
The stores are in the hood or middle of nowhere. The customers don’t have many options.
Last week I couldn't pay for flowers for grandma's grave because the smartphone-sized card terminal refused to work (it was stuck in a charging/booting loop), so I had to get cash. Though my partner thinks she actually wanted the cash for herself, without a receipt, to avoid the taxes.
Whereas the smaller, owner-run stores have more leeway; the local tiny grocery "sold" all freezer/refrigerator food for cheap/free during a power failure. The big Walmart closed and threw everything away the next day.
God help me if I hand someone $25 for a $14.75 total. I’m getting small bills back.
I wonder what they teach in Germany.
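The arithmetic a register does for the cashier is just greedy change-making. A minimal sketch, working in integer cents to avoid floating-point drift (US denominations assumed):

```python
# Denominations in cents: $20 bill down to the penny.
DENOMS = [2000, 1000, 500, 100, 25, 10, 5, 1]

def make_change(tendered_cents, total_cents):
    """Break the change owed into the fewest bills/coins, largest first."""
    change = tendered_cents - total_cents
    if change < 0:
        raise ValueError("insufficient payment")
    breakdown = {}
    for d in DENOMS:
        count, change = divmod(change, d)
        if count:
            breakdown[d] = count
    return breakdown

# $25.00 tendered on a $14.75 total: $10.25 back,
# i.e. one $10 bill and one quarter.
print(make_change(2500, 1475))  # {1000: 1, 25: 1}
```

Greedy works here because US denominations form a canonical coin system; for arbitrary denominations you'd need dynamic programming instead.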
It's not that we are not capable. It's: is the business willing to assume the risk?
There's a fairly large supermarket near me that has both kinds of outages.
Occasionally it can't take cards because the (fiber? cable?) internet is down, so it's cash only.
Occasionally it can't take cash because the safe has its own cellular connection, and the cell tower is down.
I was at Frank's Pizza in downtown Houston a few weeks ago and they were giving slices of pizza away because the POS terminal died, and nobody knew enough math to take cash. I tried to give them a $10 and told them to keep the change, but "keep the change" is an unknown phrase these days. They simply couldn't wrap their brains around it. But hey, free pizza!
And microsoft.com too - that's gotta hurt
- on a US tenant I am unable to access login.microsoftonline.com and the login flow stalls on any SSO authentication attempt.
- on a European tenant, probably germany-west, I am able to login and access the Azure portal.
I feel pretty justified in my previous decisions to move away from Azure. Using it feels like building on quicksand…
At this point I don't believe any one of them is any better or more reliable than the others.
I felt this way about AWS last week
Luckily, we moved off Azure Front Door about a year ago. We’d had three major incidents tied to Front Door and stopped treating it as a reliable CDN.
They weren’t global outages, more like issues triggered by new deployments. In one case, our homepage suddenly showed a huge Microsoft banner about a “post-quantum encryption algorithm” or something along those lines.
Kinda wild that a company that big can be so shaky on a CDN, which should be rock solid.