Posted by ndhandala 10/29/2025

AWS to bare metal two years later: Answering your questions about leaving AWS (oneuptime.com)
727 points | 491 comments | page 3
sema4hacker 10/29/2025|
Anycast, Argo Rollouts, Aurora Serverless, AWS, BGP, Ceph, ClickHouse, Cloudflare, CloudFront, DWDM, Flux, Frankfurt, Glacier, Helm, Kinesis, Kubernetes, Metabase, MicroK8s, NVMe, OneUptime, OpenTelemetry Collector, Paris, Postgres, Posthog, PXE, Redis, Step Functions, Supermicro, Talos, Terraform, Tinkerbell, VMs.

I wish you had started out by telling me how many customers you have to serve, how many transactions they generate, and how much I/O there is.

jammo 10/29/2025||
Equinix Metal is now EOL, so that's worth bearing in mind.
submeta 10/29/2025||
There is so much hidden cost in maintaining your own bare metal infrastructure. I am always astounded by how people overlook the massive opportunity cost involved in not only setting up, securing, and maintaining your bare metal infrastructure, but also keeping it state of the art: following best practices, ensuring the required uptime, and monitoring and intervening when necessary. I work in a highly regulated market with 700 coworkers, and our IT maintains an endless number of VMs. You cannot imagine how much more work they have to do compared to a setup where you spin up services in AWS or Azure and destroy them when you don’t need them. No updates, no patches, no misconfiguration. Not every company uses automation either (Chef, Ansible, and whatnot).
saxenaabhi 10/29/2025|
I agree. I have a restaurant POS system, and I think self-hosting would easily kill our product velocity, and if we screwed up badly, even the company.

However, I do get the point about the cost premium and, more importantly, the vendor risk you take on when using managed services.

We are hosted on Cloudflare Workers, which is very cheap, but to mitigate the vendor risk we have also set up replicas of our API servers on bunny.net and render.com.
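
A minimal sketch of what that kind of multi-provider fallback can look like on the client side (the hostnames and path below are placeholders, not real endpoints):

    import urllib.request
    import urllib.error

    # Placeholder endpoints: a primary on Cloudflare Workers with replicas on
    # other providers, as described above. Real hostnames and paths will differ.
    API_ENDPOINTS = [
        "https://pos-api.example.workers.dev",        # primary (Cloudflare Workers)
        "https://pos-api-replica.example.b-cdn.net",  # replica (bunny.net)
        "https://pos-api-example.onrender.com",       # replica (render.com)
    ]

    def fetch_with_failover(path, timeout=3.0):
        """Try each provider in order; move on to the next one on any failure."""
        last_error = None
        for base in API_ENDPOINTS:
            try:
                with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                    return resp.read()
            except (urllib.error.URLError, TimeoutError) as exc:
                last_error = exc  # provider unreachable or timed out; try the next
        raise RuntimeError("all providers failed: %r" % last_error)

    # Example: body = fetch_with_failover("/v1/orders")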

kyledrake 10/29/2025||
The article mentions Equinix Metal, but if you look it up, they are shutting down the service: https://docs.equinix.com/metal/hardware/standard-servers

Doesn't make me want to be an Equinix customer when they just randomly shut down critical hosting services.

I'm pretty sure that it's just the post-merger name for Packet, which was an incredible provider that even had BYO IP with an anycast community. Really a shame that it went away; it was a solid alternative to both AWS and bare metal, and prices were pretty good.

There's a missing middle between ultra expensive/weird cloud and cheap junk servers that I would really love to see get filled.

dilyevsky 10/29/2025|
FWIW, Equinix Metal was an acquisition (Packet). Seems like it didn't go too well.
spprashant 10/29/2025||
The thing I find counterintuitive about AWS and hyperscalers in general is that they make so much sense when you are starting a new project. A few VMs, some gigs of data storage, and you are off to the races in a day or two.

As soon as you start talking about any kind of serious data storage and data transfer, the costs start piling up like crazy.

In my mind, the cost curve should flatten out over time, but that just doesn't seem to be the reality.

marcinzm 10/29/2025||
They were running for a long time (months? over a year?) on a single rack in a single datacenter. Eventually they scaled out, but the operative word is eventually. I think that summarizes both sides of this debate in a nutshell. You can move off of AWS, but unless you invest a lot, you will take on increased risk. Maybe you'll get lucky and your one rack won't burn down. Maybe you won't. They did get lucky.
shakow 10/29/2025||
> Maybe you'll get lucky and your one rack won't burn down

Given the rate of fires in DCs, you'd need to be quite unlucky for it to happen to you.

gizzlon 10/29/2025|||
Hm, I wonder what the risk of a rack going offline is. Maybe 5% in a given year? Less? More?

Compared to all the other things that can and will go wrong, this risk seems pretty small, but I have no data to back that up.
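
Taking that 5%-per-year guess at face value, the way it compounds over a few years is easy to sketch:

    # Chance of at least one rack-down event over N years, assuming independent
    # years and the guessed 5%/year rate (a made-up number, as noted above).
    p_year = 0.05

    for years in (1, 3, 5, 10):
        p_any = 1 - (1 - p_year) ** years
        print("%2d years: %.1f%%" % (years, 100 * p_any))
    # -> 5.0%, 14.3%, 22.6%, 40.1%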

athrowaway3z 10/29/2025||
From the story, they seem to have kept the option to fall back on AWS.
hedora 10/29/2025||
Reason to use AWS from the article:

> You do not have the appetite to build a platform team comfortable with Kubernetes, Ceph, observability, and incident response.

Has work been using AWS wrong? Other than Ceph, all of those things add up to onerous half-time jobs for rotating software engineers.

Before gp3 came out, working around EBS price/performance terribleness was also on the list.

dev_l1x_be 10/29/2025||
> AWS is extremely expensive.

I really like how people throw around these baseless accusations.

S3 is one of the cheapest storage solutions ever created. Over the last 10 years I have migrated roughly 10-20PB worth of data to AWS S3, and it resulted in significant cost savings every single time.

If you do not know how to use cloud computing, then yes, AWS can be really expensive.

MontyCarloHall 10/29/2025||
Assuming those 20PB are hot/warm storage, S3 costs roughly $0.015/GB/month (50:50 average of S3 standard/infrequent access). That comes out to roughly $3.6M/year, before taking into account egress/retrieval costs. Does it really cost that much to maintain your own 20PB storage cluster?

If those 20PB are deep archive, the S3 Glacier bill comes out to around $235k/year, which also seems ludicrous: it does not cost six figures a year to maintain your own tape archive. That's the equivalent of a full-time sysadmin (~$150k/year) plus $100k in hardware amortization/overhead.
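
A rough sanity check of that arithmetic (the per-GB rates are the assumed list prices above; real bills vary by region, tiering, and discounts):

    # Back-of-envelope S3 cost for 20 PB (S3 bills per GB, decimal units).
    data_gb = 20 * 1_000_000

    blended_hot_warm = 0.015   # $/GB/month, 50:50 Standard / Infrequent Access (assumed)
    deep_archive = 0.00099     # $/GB/month, Glacier Deep Archive (assumed list rate)

    print("hot/warm:     $%.1fM/year" % (data_gb * blended_hot_warm * 12 / 1e6))  # ~$3.6M
    print("deep archive: $%.0fk/year" % (data_gb * deep_archive * 12 / 1e3))      # ~$238k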

The real advantage of S3 here is flexibility and ease-of-use. It's trivial to migrate objects between storage classes, and trivial to get efficient access to any S3 object anywhere in the world. Avoiding the headache of rolling this functionality yourself could well be worth $3.6M/year, but if this flexibility is not necessary, I doubt S3 is cheaper in any sense of the word.
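
For what it's worth, the "trivial to migrate objects between storage classes" part really is a single lifecycle rule; a sketch with boto3, where the bucket name, prefix, and day thresholds are hypothetical:

    import boto3

    s3 = boto3.client("s3")

    # Transition objects under a prefix to colder storage classes as they age.
    s3.put_bucket_lifecycle_configuration(
        Bucket="example-archive-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "tier-down-old-objects",
                    "Filter": {"Prefix": "raw/"},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                    ],
                }
            ]
        },
    )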

matwood 10/29/2025|||
Like most of AWS, it depends if you need what it provides. A 20PB tape system will have an initial cost in the low to mid 6 figures for the hardware and initial set of tapes. Do the copies need to be replicated geographically? What about completely offline copies? Reminds me of conversations with archivists where there's preservation and then there's real preservation.
torginus 10/29/2025||||
How the heck does anyone have that much data? I once built myself a compressed plaintext library from one of those data-hoarder sources that had almost every fiction book in existence, and that was like 4TB compressed (and it would've been much less if I had bothered hunting for duplicates and dropped the non-English titles).

I suspect the only way you could have 20PB is if you have metrics you don't aggregate or keep ancient logs (why do you need to know your auth service had a transient timeout a year ago?).

MontyCarloHall 10/29/2025|||
Lots of things can get to that much data, especially in aggregate. Off the top of my head: video/image hosting, scientific applications (genomics, high energy physics, the latter of which can generate PBs of data in a single experiment), finance (granular historic market/order data), etc.
geoka9 10/29/2025|||
In addition to what others have mentioned, before the "AI bubble", there was a "data science bubble" where every little signal about your users/everything had to be saved so that it could be analyzed later.
dev_l1x_be 10/30/2025|||
> Does it really cost that much to maintain your own 20PB storage cluster?

If you think S3 = storage cluster, then the answer is no.

If you think of S3 as what it actually is: scalable, high throughput, low latency, reliable, durable, low operational overhead, high uptime, encrypted, distributed, replicated storage with multiple tier1 uplinks to the internet, then the answer is yes.

MontyCarloHall 10/30/2025||
>scalable, high throughput, low latency, reliable, durable, low operational overhead, high uptime, encrypted, distributed, replicated storage with multiple tier1 uplinks to the internet

If you need to tick all of those boxes for every single byte of 20PB worth of data, you are working on something very cool and unique. That's awesome.

That said, most entities who have 20PB of data only need to tick a couple of those boxes, usually encryption/reliability. Most of their 20PB will get accessed at most once a year, from a predictable location (i.e. on-prem), with a good portion never accessed at all. Or if it is regularly accessed (with concomitant low latency/high throughput requirements), it almost certainly doesn't need to be globally distributed with tier1 access. For these entities, a storage cluster and/or tape system is good enough. The problem is that they naïvely default to using S3, mistakenly thinking it will be cheaper than what they could build themselves for the capabilities they actually need.

Aurornis 10/29/2025||
The implicit claim is more misleading, in my opinion: the claim that self-hosting is free or nearly free in terms of time and engineering brain drain.

The real cost of self-hosting, in my direct experience with multiple startup teams trying it, is the endless stream of small tasks, decisions, debates, and “one more thing” changes that add up to more overhead than anyone would have expected. Everyone thinks it’s going to be as simple as having the colo put the boxes in the rack and then doing some SSH work, and then you’re free of those AWS bills. In my experience it’s a Pandora’s box of tiny tasks and overhauls that keeps draining the team long after the honeymoon period is over.

If you’re a stable business with engineers sitting idle that could be the right choice. For most startups who just need to get a product out there and get customers, pulling limited headcount away from the core product to save pennies (relatively speaking) on a potential AWS bill can be a trap.

rafabulsing 10/30/2025|||
> The implicit claim is more misleading, in my opinion: the claim that self-hosting is free or nearly free in terms of time and engineering brain drain.

Not only is that not an implicit claim in the post, they explicitly say that it's not free; it's actually about the same amount of time they used to spend with AWS:

> Total toil is ~14 engineer-hours/month, including prep. The AWS era had us spending similar time but on different work: chasing cost anomalies, expanding Security Hub exceptions, and mapping breaking changes in managed services. The toil moved; it did not multiply.

As for the following:

> If you’re a stable business with engineers sitting idle that could be the right choice. For most startups who just need to get a product out there and get customers, pulling limited headcount away from the core product to save pennies (relatively speaking) on a potential AWS bill can be a trap.

You're just agreeing with them:

> Cloud-first was the right call for our first five years. Bare metal became the right call once our compute footprint, data gravity, and independence requirements stabilised.

marcosdumay 10/29/2025|||
> The claim that self-hosting is free or nearly free in terms of time and engineering brain drain.

Free? No, it's not free. It only costs less engineering time than AWS.

twodave 10/29/2025||
I feel like it just depends on what you’re trying to do.

* Data-driven website with some internal API integration, maybe some client-side application or tooling? Put a server rack in a closet and get a fiber line.

* Trying to serve the general public in a bursty, non-cacheable way? You’ll probably have to carry a lot of machines that usually don’t do much; cloud might make more sense.

* Lots of ingress or DNS rules? A hybrid approach could make sense.

In general, scaling data to larger capacities is the point where you start considering the cloud, because the storage solution alone ends up on a long amortization schedule if you need it to be resilient, let alone the servers you’re racking to drive the DB.
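
To make that amortization point concrete, a sketch with entirely made-up numbers (plug in your own hardware quotes and cloud rates):

    # Hypothetical comparison: amortized on-prem storage vs. a managed cloud volume.
    # Every figure below is a placeholder, not a real quote.
    capex = 150_000             # resilient storage array + disks, upfront ($)
    amortization_months = 48    # write the hardware off over 4 years
    opex_per_month = 2_500      # colo space, power, support contract ($/month)

    onprem_monthly = capex / amortization_months + opex_per_month

    capacity_gb = 100_000       # 100 TB usable
    cloud_rate = 0.10           # assumed all-in $/GB/month for DB-grade cloud storage

    cloud_monthly = capacity_gb * cloud_rate

    print("on-prem (amortized): $%.0f/month" % onprem_monthly)  # ~$5,625
    print("cloud equivalent:    $%.0f/month" % cloud_monthly)   # $10,000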

layoric 10/30/2025|
> In general once you start thinking about scaling data to larger capacities is when you start considering the cloud

What kind of capacities as a rule of thumb would you use? You can fit an awful lot of storage and compute on a single rack, and the cost for large DBs on AWS and others is extremely high, so savings are larger as well.

twodave 10/30/2025||
Well, if you want proper DR you really need an off-site backup, disk failover/recovery, etc. And if you don’t want to be manually maintaining individual drives, then you’re looking at one of the big, expensive storage solutions with enterprise-grade hardware, and those will easily cost some large multiple of whatever 2U DB server you end up putting in front of it.
suralind 10/30/2025|
Love this write-up. But I also love the fact that the company pretty much uses the best solutions it can (that definitely helps with hosting, regardless of cloud/self-hosting, but it is impressive anyway). I’ve been a big Talos fan for a couple of years, but I have never worked with a client that would use anything other than the managed offering from their cloud provider.

Similarly, the fact that they are regularly updating Kubernetes, Talos, and presumably other things means that these upgrades are a normal flow they can optimize and automate. If you’re only doing that because AWS is forcing you to upgrade, then it’s stressful.
