Posted by ndhandala 10/29/2025
I wish you had started out by telling me how many customers you have to serve, how many transactions they generate, and how much I/O there is.
However, I do get the point about the cost premium and, more importantly, the vendor risk you take on when using managed services.
We are hosted on Cloudflare Workers, which is very cheap, but to mitigate the vendor risk we have also set up replicas of our API servers on bunny.net and render.com.
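To make the idea concrete, here is a minimal sketch of the kind of client-side failover such a multi-provider setup allows; the endpoint URLs and the retry logic are illustrative assumptions, not a description of our actual deployment.

```python
# Hypothetical client-side failover across replicas on different providers.
# The endpoint URLs below are placeholders, not real infrastructure.
import urllib.error
import urllib.request

ENDPOINTS = [
    "https://api.example.com",          # primary (e.g. Cloudflare Workers)
    "https://api-mirror1.example.com",  # replica (e.g. bunny.net)
    "https://api-mirror2.example.com",  # replica (e.g. render.com)
]

def fetch_with_failover(path: str, timeout: float = 3.0) -> bytes:
    """Try each replica in order and return the first successful response."""
    last_error = None
    for base in ENDPOINTS:
        try:
            with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                if resp.status == 200:
                    return resp.read()
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc  # provider unreachable or slow; try the next one
    raise RuntimeError(f"all replicas failed: {last_error}")
```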
Doesn't make me want to be an Equinix customer when they just randomly shut down critical hosting services.
I'm pretty sure that it's just the post-merger name for Packet, which was an incredible provider that even had BYO IP with an anycast community. Really a shame that it went away; it was a solid alternative to both AWS and bare metal, and prices were pretty good.
There's a missing middle between ultra-expensive/weird cloud and cheap junk servers that I would really love to see filled.
As soon as you start talking about any kind of serious data storage and data transfer the costs start piling up like crazy.
Like in my mind, the cost curve should flatten out over time. But that just doesn't seem to be the reality.
Given the rate of fires in DCs, you'd have to be quite unlucky for one to happen to you.
Compared to all the other things that can and will go wrong, this risk seems pretty small, but I have no data to back that up.
> You do not have the appetite to build a platform team comfortable with Kubernetes, Ceph, observability, and incident response.
Has work been using AWS wrong? Other than Ceph, all of those things add up to onerous half-time jobs for rotating software engineers.
Before gp3 came out, working around EBS price/performance terribleness was also on the list.
I really like how people throw around these baseless accusations.
S3 is one of the cheapest storage solutions ever created. Over the last 10 years I have migrated roughly 10-20PB worth of data to AWS S3, and it resulted in significant cost savings every single time.
If you do not know how to use cloud computing, then yes, AWS can be really expensive.
If those 20PB are deep archive, the S3 Glacier bill comes out to around $235k/year, which also seems ludicrous: maintaining your own tape archive should not cost six figures a year. That bill is roughly the equivalent of a full-time sysadmin (~$150k/year) plus $100k in hardware amortization and overhead.
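For anyone who wants to check the arithmetic, a rough storage-only calculation at the published Glacier Deep Archive rate (about $0.00099/GB-month at the time of writing; retrieval and request fees excluded) lands in the same ballpark:

```python
# Back-of-the-envelope check of the ~$235k/year figure (storage only;
# retrieval, request, and early-deletion fees are not included).
GB_PER_PB = 1_000_000            # decimal petabytes
rate_per_gb_month = 0.00099      # approximate Glacier Deep Archive rate, USD

stored_gb = 20 * GB_PER_PB       # 20 PB
monthly = stored_gb * rate_per_gb_month
print(f"${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")
# -> $19,800/month, $237,600/year, in line with the ~$235k cited above
```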
The real advantage of S3 here is flexibility and ease-of-use. It's trivial to migrate objects between storage classes, and trivial to get efficient access to any S3 object anywhere in the world. Avoiding the headache of rolling this functionality yourself could well be worth $3.6M/year, but if this flexibility is not necessary, I doubt S3 is cheaper in any sense of the word.
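As a small, simplified illustration of that ease of use, a single lifecycle rule is enough to move objects to a colder class automatically; the bucket name and prefix below are made up:

```python
# Sketch: automatically transition objects under an (invented) prefix to a
# colder storage class after 90 days via an S3 lifecycle rule.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",            # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "cold-after-90-days",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},  # placeholder prefix
                "Transitions": [
                    {"Days": 90, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```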
I suspect the only way you could have 20PB is if you have metrics you don't aggregate or keep ancient logs (why do you need to know your auth service had a transient timeout a year ago?)
If you think S3 = storage cluster, then the answer is no.
If you think about what S3 actually is: scalable, high-throughput, low-latency, reliable, durable, encrypted, distributed, replicated storage with low operational overhead, high uptime, and multiple tier1 uplinks to the internet, then the answer is yes.
If you need to tick all of those boxes for every single byte of 20PB worth of data, you are working on something very cool and unique. That's awesome.
That said, most entities who have 20PB of data only need to tick a couple of those boxes, usually encryption/reliability. Most of their 20PB will get accessed at most once a year, from a predictable location (i.e. on-prem), with a good portion never accessed at all. Or if it is regularly accessed (with concomitant low latency/high throughput requirements), it almost certainly doesn't need to be globally distributed with tier1 access. For these entities, a storage cluster and/or tape system is good enough. The problem is that they naïvely default to using S3, mistakenly thinking it will be cheaper than what they could build themselves for the capabilities they actually need.
The real cost of self-hosting, in my direct experience with multiple startup teams trying it, is the endless stream of small tasks, decisions, debates, and “one more thing” changes that add up over time to more overhead than anyone would have expected. Everyone thinks it’s going to be as simple as having the colo put the boxes in the rack and then doing some SSH stuff, and then you’re free of those AWS bills. In practice it’s a Pandora’s box of little changes and overhauls that keep draining the team after the honeymoon period is over.
If you’re a stable business with engineers sitting idle that could be the right choice. For most startups who just need to get a product out there and get customers, pulling limited headcount away from the core product to save pennies (relatively speaking) on a potential AWS bill can be a trap.
Not only is that not an implicit claim in the post, they explicitly say that it's not free; it's actually just around the same amount of time they used to spend with AWS:
> Total toil is ~14 engineer-hours/month, including prep. The AWS era had us spending similar time but on different work: chasing cost anomalies, expanding Security Hub exceptions, and mapping breaking changes in managed services. The toil moved; it did not multiply.
As for the following:
> If you’re a stable business with engineers sitting idle that could be the right choice. For most startups who just need to get a product out there and get customers, pulling limited headcount away from the core product to save pennies (relatively speaking) on a potential AWS bill can be a trap.
You're just agreeing with them:
> Cloud-first was the right call for our first five years. Bare metal became the right call once our compute footprint, data gravity, and independence requirements stabilised.
Free? No, it's not free. It only costs less engineering time than AWS.
* data-driven website with some internal API integration, maybe some client-side application or tooling? Put a server rack in a closet and get a fiber line.
* trying to serve the general public in a bursty, non-cacheable way? Probably going to have to carry a lot of machines that usually don’t do much, so cloud might make more sense.
* lots of ingress or DNS rules? A hybrid approach could make sense.
In general, it’s once you start scaling data to larger capacities that you start considering the cloud, because the storage solution alone ends up on a long amortization schedule if you need it to be resilient, let alone the servers you’re racking to drive the DB.
What capacities would you use as a rule of thumb? You can fit an awful lot of storage and compute in a single rack, and the cost of large DBs on AWS and others is extremely high, so the savings are larger as well.
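One way to reason about it is a back-of-the-envelope amortization comparison. Every number below is an illustrative assumption (hardware, colo, and usable capacity will vary wildly), not a quote:

```python
# Purely illustrative comparison of self-hosted rack storage vs. S3 Standard.
# All inputs are assumptions; plug in your own hardware and colo numbers.
rack_hw_cost = 150_000            # assumed: servers + drives for one storage rack, USD
amortization_years = 5            # assumed useful life
colo_power_bw_monthly = 3_000     # assumed colo space, power, bandwidth, USD/month
usable_gb = 1_000_000             # assumed ~1 PB usable after replication

self_hosted_monthly = rack_hw_cost / (amortization_years * 12) + colo_power_bw_monthly
self_hosted_per_gb = self_hosted_monthly / usable_gb   # ~$0.0055/GB-month here

s3_standard_per_gb = 0.021        # approximate published S3 Standard rate, USD/GB-month

print(f"self-hosted: ~${self_hosted_per_gb:.4f}/GB-month")
print(f"S3 Standard: ~${s3_standard_per_gb:.4f}/GB-month")
# Under these assumptions the rack is cheaper per GB, but the difference is
# what has to cover staff time, durability engineering, and growth headroom.
```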
Similarly, the fact that they are regularly updating Kubernetes, Talos, and presumably other things means that these updates are part of their normal flow, and thus something they can optimize and automate. If you’re only doing that because AWS is forcing you to upgrade… then it’s stressful.