Posted by ndhandala 10/29/2025
Eventually, I realized that it was because the devs wanted to put "AWS" on their resumes. I wondered how long it would take management to catch on that the company was being used as a place to spruce up a resume before moving on to bigger fish.
But not long after, I realized that management was doing the same thing. "Led a team migration to AWS" looked good on their resumes too, and they likewise intended to move on/up. Shortly after I left, the place got bought, and the building it was in is empty now.
I wonder, now that Amazon is having layoffs and Big Tech in general is no longer many people's target employer, will "migrated off of AWS to in-house servers" be what devs (and management) want on their resumes?
And then discussions on how to move forward are held between people who only know AWS and people who want to use other stuff, but only one side is transparent about it.
What people forget about the OVH or Hetzner comparison is that the entry-level servers they are known for (think the Advance line at OVH or the AX line at Hetzner) come with some drawbacks.
The OVH Advance line, for example, comes without ECC memory; in a server that might host databases, that's a disaster waiting to happen. There is no option to add ECC memory to the Advance line, so you have to move up to the Scale or High Grade servers, which are far from "affordable".
Hetzner by default comes with a single PSU and a single uplink. If nothing goes wrong this is probably fine, but if you need a reliable private network or 10G, it will cost extra.
For a startup with one rack in each of two data centers, it’s probably fine. You’ll end up testing failover a bit more, but you’ll need that if you scale anyway.
If it’s for some back-office thing that will never have any load but must not permanently fail (e.g. payroll), maybe just slap it on an EC2 VM and enable off-site backup / ransomware protection.
But IMO, systems like these (like the ones handling bank transactions) should have a degree of resiliency to this kind of failure, since any hardware or software problem can cause something similar.
Very much this.
A small team in a large company that has an enterprise agreement (discount) with a cloud provider? The cloud can be very empowering: teams that own their infra in the cloud can make changes that benefit the product in a fraction of the time it would take to push those changes through the org on-prem. This depends on having a team with enough understanding of database, network, and systems administration to own its infrastructure. If you have more than one team like this, it also pays to have a central cloud enablement team that provides common config and controls, so teams have room to work without accidentally overrunning a budget or creating a potential security vulnerability.
A startup that wants to be able to scale? You can start in the cloud without tying yourself to the cloud or to one provider if you are really careful. Or at least design your system architecture so that you can migrate in the future if/when it makes sense.
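To make "not tying yourself to a provider" concrete, here is a minimal sketch of the idea in Python: keep the cloud SDK behind a narrow interface you own, so only one class knows about AWS. All names here are hypothetical, not from any particular codebase:

    import os
    from abc import ABC, abstractmethod

    import boto3  # only the S3 implementation below depends on this


    class BlobStore(ABC):
        """Narrow interface the application codes against."""

        @abstractmethod
        def put(self, key: str, data: bytes) -> None: ...

        @abstractmethod
        def get(self, key: str) -> bytes: ...


    class LocalBlobStore(BlobStore):
        """On-prem / dev implementation backed by the local filesystem."""

        def __init__(self, root: str):
            self.root = root
            os.makedirs(root, exist_ok=True)

        def put(self, key: str, data: bytes) -> None:
            with open(os.path.join(self.root, key), "wb") as f:
                f.write(data)

        def get(self, key: str) -> bytes:
            with open(os.path.join(self.root, key), "rb") as f:
                return f.read()


    class S3BlobStore(BlobStore):
        """Cloud implementation; the only class that knows about AWS."""

        def __init__(self, bucket: str):
            self.bucket = bucket
            self.s3 = boto3.client("s3")

        def put(self, key: str, data: bytes) -> None:
            self.s3.put_object(Bucket=self.bucket, Key=key, Body=data)

        def get(self, key: str) -> bytes:
            return self.s3.get_object(Bucket=self.bucket, Key=key)["Body"].read()

Migrating later then means writing one new implementation, not hunting SDK calls through the whole codebase.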
There are huge economies of scale in computer operations in a few areas:
- facility: the capital and running costs of a purpose-built datacenter are far cheaper per rack than putting machines in existing office-class buildings, as long as it's a reasonable size. Ours is ~1000 racks, but you might get decent scale at a quarter of that. (Also: one fat network pipe instead of a bunch of slow ones.)
- purchasing: unlike with consumer PCs, low-volume prices for major-vendor servers are wildly inflated, and you don't get decent prices until you buy quite a few of them.
- operations: people come in integer units, and (assuming your salary ranges are bounded) each is only competent in a small number of technical areas. Whether you have one machine or thousands, you need someone who can handle each technology your deployment depends on, from Kubernetes to network ops; multiply by 4x for anything requiring 24/7 coverage, or accept long response times for off-hours failures.
That last one is probably the kicker. To keep salary costs below 50% of your total, assuming US pay rates and 5-year depreciation since machines aren't getting faster as quickly as they used to, you probably need to be running tens of millions of dollars in hardware.
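To make that arithmetic concrete, a back-of-envelope sketch in Python. The headcount and salary figures are illustrative assumptions, not numbers from the comment above:

    # Back-of-envelope; all inputs are illustrative assumptions.
    engineers    = 10        # roughly one per technology area, some doubled for coverage
    cost_per_fte = 300_000   # fully loaded US cost, $/year
    salaries     = engineers * cost_per_fte        # $3.0M/year

    # For salaries to stay below 50% of total spend, hardware depreciation
    # must at least match them:
    depreciation_per_year = salaries               # >= $3.0M/year
    fleet_value = depreciation_per_year * 5        # 5-year straight-line depreciation
    print(f"fleet value: ${fleet_value / 1e6:.0f}M")  # fleet value: $15M

Which is how you land in the "tens of millions of dollars in hardware" range.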
Note that a tiny deployment of a few machines in a tech company is an exception, since you have existing technical staff who can run them in their spare time. (and you have other interesting work for them to do, so recruiting and retention isn't the same problem as if their only job was to babysit a micro-deployment)
That's why it can be simultaneously true that (a) profit margins on AWS-like services are very high, and (b) AWS is cheaper than running your own machines for a large number of companies.
Just want to confirm what I am reading. You are talking about ~1000 racks as the facility size, not what a typical university requires.
https://en.wikipedia.org/wiki/Bare_metal
Edit: For clarity, Wikipedia does also have pages for other meanings of "bare metal", including "bare metal server". The above link is what you get if you just look up "bare metal".
I do aim to be some combination of clear, accurate and succinct, but I very often seem to end up in these HN pissing matches so I suppose I'm doing something wrong. Possibly the mistake is just commenting on HN in itself.
I'm not sure what you did, but when you go to that Wikipedia article, it redirects to "Bare Machine", and the article content is about "Bare Machine". Clicking your link sends you to https://en.wikipedia.org/wiki/Bare_machine
So it seems like you almost intentionally shared the article that redirects, instead of linking to the proper page?
Can we stop this now? Please?
Fix it then, if you think it's incorrect. Otherwise, link to https://en.wikipedia.org/wiki/Bare_metal_(disambiguation) like any normal and charitable commentator would do.
> Can we stop this now? Please?
Sure, feel free to stop at any point you want to.
I thought my link made the point a bit better. I think maybe you've misunderstood something about how Wikipedia works, or about what I'm saying, or something. Which is OK, but maybe you could try to be a bit more polite about it? Or charitable, to use your own word?
Edit: In case this part isn't obvious: Wikipedia redirects are managed by Wikipedia editors, just like the rest of Wikipedia. Where the redirect goes is as much an indication of the collective will of Wikipedia editors as, e.g., a disambiguation page. I don't decide where a request for the "bare metal" page goes; Wikipedia does.
Edit2: Unless you're suggesting I edited the redirect page? The redirect looks to have been created in 2013, and hasn't been changed since.
The biggest downside I see? We had to sign a 3 year contract with the colocation facility up front, and any time we want to change something they want a new commitment. On AWS you don't commit to spending until after you've got it working, and even then it's your choice.
It's pretty simple to run low cost services in AWS. If you're small enough that an $800 desktop on a home internet connection will handle it, surely you could run it completely serverless for much less.
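As a rough illustration of how small "small" can be: a single Python Lambda handler behind a function URL covers a lot of $800-desktop-class services. The handler shape below is the standard proxy-style event; the query parameter is just a made-up example:

    import json

    def handler(event, context):
        # Standard Lambda proxy-style handler: read a query parameter,
        # return a JSON response. No server to patch, reboot, or back up.
        params = event.get("queryStringParameters") or {}
        name = params.get("name", "world")
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"message": f"hello, {name}"}),
        }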
I'm surprised how many people I see wanting to go on-prem vs. AWS/public cloud. It feels penny-wise and pound-foolish to me. Lots of people are so deep into the technical side of things that they don't understand the business side.
Bandwidth is usually the killer with AWS. Bandwidth on-prem is effectively free: you pay for the port, not per gigabyte. Some cloud vendors are much better, in fairness.
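Rough numbers, assuming a list-price AWS egress rate of about $0.09/GB (tiers and current pricing vary; check your own bill):

    # Illustrative egress math; $0.09/GB is an assumed list price.
    tb_out_per_month = 50
    price_per_gb = 0.09
    monthly_cost = tb_out_per_month * 1000 * price_per_gb
    print(f"~${monthly_cost:,.0f}/month")  # ~$4,500/month, for traffic that a
                                           # flat-rate colo port carries at no
                                           # marginal cost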
You don't get to overcomplicated AWS madness without a few engineers already pushing complexity.
And an overcomplicated setup also means it needs maintenance. There are no personnel savings there.
You could get manual failover with a single-writer, replicated, managed Postgres setup and a warm standby VM.
That’s on the order of a thousand dollars a month for a medium workload. It’s probably a 10x markup versus buying the servers, but that doesn’t matter if it saves you an employee.
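A sketch of what that manual failover can amount to, assuming an RDS-style managed Postgres with a read replica (the instance identifier below is hypothetical): the whole runbook is one promote call plus repointing the warm VM.

    import boto3

    rds = boto3.client("rds")

    def manual_failover(replica_id: str) -> str:
        """Promote the read replica to a standalone writer and return its
        endpoint, so the warm standby VM can be repointed at it."""
        rds.promote_read_replica(DBInstanceIdentifier=replica_id)
        rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=replica_id)
        desc = rds.describe_db_instances(DBInstanceIdentifier=replica_id)
        return desc["DBInstances"][0]["Endpoint"]["Address"]

    # new_writer = manual_failover("app-db-replica")  # hypothetical identifier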
Our .NET apps are still packaged as Docker Compose apps, which we deploy with GitHub Actions and Kamal [1]. Most apps use SQLite + Litestream with real-time replication to R2, but we've switched to a local PostgreSQL for our latest app, with regular backups to R2.
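For the PostgreSQL side, "regular backups to R2" can be as small as a cron-driven script like this sketch. The endpoint, bucket, and database names are placeholders; R2 speaks the S3 API, so the standard client works:

    import datetime
    import subprocess

    import boto3  # R2 is S3-compatible, so the standard S3 client works

    R2_ENDPOINT = "https://<account-id>.r2.cloudflarestorage.com"  # placeholder
    BUCKET = "db-backups"                                          # placeholder

    def backup_postgres(database: str) -> None:
        # pg_dump's custom format is compressed by default and restores
        # with pg_restore.
        stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        dump = subprocess.run(
            ["pg_dump", "--format=custom", database],
            check=True, capture_output=True,
        ).stdout
        s3 = boto3.client("s3", endpoint_url=R2_ENDPOINT)
        s3.put_object(Bucket=BUCKET, Key=f"{database}/{stamp}.dump", Body=dump)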
Thanks to AI, which can walk you through any hurdle and generate whatever deployment, backup, and automation scripts you need, it's never been easier to self-host.
Gee, how hard is it to find SE experts in that particular combination of ops tools? Whereas in AWS every AWS-certified engineer speaks the same language, the DIY approach surely suffers from the lack of "one way" to do things. Swap Flux for Argo, for example (assuming the post is talking about that Flux and not another tool with the same name), and you have an almost completely different GitOps workflow. How do they manage to settle on a specific set of tools?
You find experts in ops, not in tools: people who know the fundamentals, not just which buttons to push in "certain situations" without knowing what's really going on under the hood.
What are the major differences?