Posted by ndhandala 10/29/2025

AWS to bare metal two years later: Answering your questions about leaving AWS (oneuptime.com)
727 points | 491 comments
rossdavidh 10/29/2025|
I had a problem figuring out why the place I was working wanted to move from in-house to AWS; their workload was easily handled by a few servers, they had no big bursts of traffic, and they didn't need any of the specialized features of AWS.

Eventually, I realized that it was because the devs wanted to put "AWS" on their resumes. I wondered how long it would take management to catch on that they were being used as a place to spruce up your resume before moving on to catch bigger fish.

But not long after, I realized that the management was doing the same thing. "Led a team migration to AWS" looked good on their resume, also, and they also intended to move on/up. Shortly after I left, the place got bought and the building it was in is empty now.

I wonder, now that Amazon is having layoffs and Big Tech generally is not as many people's target employer, will "migrated off of AWS to in-house servers" be what devs (and management) want on their resume?

whstl 10/29/2025||
Devs wanting to put AWS on their resume push for it, then the next wave you hire only knows AWS.

And then discussions on how to move forward are held between people that only know AWS and people who want to use other stuff, but only one side is transparent about it.

ahel 10/29/2025||
with "dev wanting X" nothing happens. "leadership deciding X" then it needs to get done.
rossdavidh 10/30/2025||
That was what I thought, but when the middle management also wants it, then it can become the 'obvious choice', a la 'nobody ever got fired for choosing IBM'. It seems that middle management + devs can make it seem inevitable to the people above them, especially if those people are non-tech.
nik736 10/29/2025||
It's an interesting article, thanks for that.

What people forget about the OVH or Hetzner comparison is that the entry servers they are known for (think the Advance line at OVH or the AX line at Hetzner) come with some drawbacks.

The OVH Advance line, for example, comes without ECC memory in servers that might host databases. That's a disaster waiting to happen. There is no option to add ECC memory on the Advance line, so you have to use Scale or High Grade servers, which are far from "affordable".

Hetzner by default comes with a single PSU and a single uplink. Yes, if nothing fails this is probably fine, but if you need a reliable private network or 10G it will cost extra.

lossolo 10/29/2025||
These concerns are exaggerated. I've been running on Hetzner, OVH and friends for 20 years. During that time I've had only two issues, one about 15 years ago when a PSU failed on one of the servers, and another a few years ago when an OVH data center caught fire and one of the servers went down. There have been no other hardware issues. YMMV.
hedora 10/29/2025|||
They matter at scale, where 1% issues end up happening on a daily or weekly basis.

For a startup with one rack in each of two data centers, it’s probably fine. You’ll end up testing failover a bit more, but you’ll need that if you scale anyway.

If it’s for some back office thing that will never have any load, and must not permanently fail (eg payroll), maybe just slap it on an EC2 VM and enable off-site backup / ransomware protection.

ghaff 10/29/2025||
It wasn't my product as a product manager, but my long-ago company came out with an under-the-desk minicomputer product for distributed sites, and they didn't use ECC memory in the design. The servers didn't fail very often, but multiply that fairly low error rate by a large number of servers and a system failure was happening every few days or so. The customer wasn't happy.
torginus 10/29/2025|||
I never understood the draw of 'server-grade hardware'. Consumer hardware fails rarely enough that you could 2x your infra and still be paying less.
jammo 10/29/2025|||
Yes, but there are dedicated server providers who offer dual PSUs, ECC RAM, etc. It's more expensive though: e.g. a 24-core Epyc with 384 GB RAM and dual 10G network is around $500/month (though there are smaller servers on serversearcher.com for other examples)
montecarl 10/29/2025|||
I can't believe how affordable Hetzner is. I just rented a bare metal 48-core AMD EPYC 9454P with 256 GB of RAM and two 2 TB NVMe SSDs for $200/month (or $0.37 per hour). It's hard to directly compare with AWS, but I think it's about 10x cheaper.
titanomachy 10/29/2025||
Wow. Probably performs better too, with a recent CPU and non-"elastic" disk. What about ingress/egress?
wombarly 10/30/2025||
Hetzner dedicated servers by default have 1 Gbit and free egress. You can opt for 10 Gbit, which includes 20 TB and then €1/TB overage.
hedora 10/29/2025|||
Their current Advance offerings use AMD EPYC 4004 with on-die ECC. I can’t figure out whether it’s “real” single-error-correction, double-error-detection, or whether the data lines between the processor and the DIMMs are protected, though.
nik736 10/29/2025||
It's only on-die ECC, not real ECC.
vjerancrnjak 10/29/2025||
Is there software that works without ECC RAM? I think most popular databases just assume memory never corrupts.
torginus 10/29/2025||
I'm pretty sure they keep internal checksums at various points to make sure the data on disk is intact, and so does the filesystem. I think they can catch when memory corruption occurs and roll back to a consistent state (you still get some data loss).

But imo, systems like these (like the ones handling bank transactions) should have a degree of resiliency to this kind of failure, as any hw or sw problem can cause something similar.
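
For illustration, a minimal sketch of that page-checksum idea (CRC32 here; real databases and filesystems use their own on-disk formats and stronger algorithms):

    import zlib

    PAGE_SIZE = 4096  # payload bytes per page; the checksum is stored alongside

    def write_page(payload: bytes) -> bytes:
        """Prepend a CRC32 of the payload so later corruption can be detected."""
        checksum = zlib.crc32(payload)
        return checksum.to_bytes(4, "big") + payload

    def read_page(stored: bytes) -> bytes:
        """Verify the checksum before trusting the payload."""
        expected = int.from_bytes(stored[:4], "big")
        payload = stored[4:]
        if zlib.crc32(payload) != expected:
            # A real engine would fall back to the WAL, a replica or a backup here.
            raise IOError("page checksum mismatch: corruption detected")
        return payload

    # A single flipped bit between write and read is caught on the next read.
    page = write_page(b"x" * PAGE_SIZE)
    corrupted = page[:100] + bytes([page[100] ^ 0x01]) + page[101:]
    try:
        read_page(corrupted)
    except IOError as e:
        print(e)

Note this only catches corruption introduced after the checksum is computed; a bit flip in RAM before the page is checksummed gets written out as "valid" data, which is exactly the gap ECC is meant to close.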

TYPE_FASTER 10/29/2025||
> It depends on your workload.

Very much this.

Small team in a large company who has an enterprise agreement (discount) with a cloud provider? The cloud can be very empowering, in that teams who own their infra in the cloud can make changes that benefit the product in a fraction of the time it would take to work those changes through the org on prem. This depends on having a team that has enough of an understanding of database, network and systems administration to own their infrastructure. If you have more than one team like this, it also pays to have a central cloud enablement team who provides common config and controls to make sure teams have room to work without accidentally overrunning a budget or creating a potential security vulnerability.

Startup who wants to be able to scale? You can start in the cloud without tying yourself to the cloud or a provider if you are really careful. Or, at least design your system architecture in such a way that you can migrate in the future if/when it makes sense.

pjdesno 10/29/2025||
I'm involved in a fairly large academic cloud deployment, sited in a 15MW data center built and shared by a few large universities.

There are huge advantages of scale to computer operations in a few areas:

- facility: the capital and running cost of a purpose-built datacenter is far cheaper per rack than putting machines in existing office-class buildings, as long as it's a reasonable size - ours is ~1000 racks, but you might get decent scale at a quarter of that. (also one fat network pipe instead of a bunch of slow ones)

- purchasing: unlike consumer PCs, low-volume prices for major vendor servers are wildly inflated, and you don't get decent prices until you buy quite a few of them.

- operations: people come in integer units, and (assuming your salary ranges are bounded) are only competent in a small number of technical areas each. Whether you have one machine or 1000s you need someone who can handle each technology your deployment depends on, from Kubernetes to network ops; multiply 4x for those requiring 24/7 coverage, or accept long response times for off-hours failures.

That last one is probably the kicker. To keep salary costs below 50% of your total, assuming US pay rates and 5-year depreciation (machines aren't getting faster as quickly as they used to), you probably need to be running tens of millions of dollars in hardware.
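
As a rough sketch of that math (every figure below is an assumed, illustrative number, not a real budget):

    # Rough sketch of the "keep salaries under 50% of total" constraint.
    areas = 3                        # e.g. compute/Kubernetes, storage, network ops
    coverage_multiplier = 4          # ~4x headcount per area for 24/7 coverage
    fully_loaded_salary = 200_000    # assumed US fully-loaded cost per person/year

    staff_cost = areas * coverage_multiplier * fully_loaded_salary   # $2.4M/year

    # Salaries at or below 50% of total means hardware + facility spend
    # has to at least match the staff cost every year.
    min_infra_spend_per_year = staff_cost

    depreciation_years = 5
    min_fleet_value = min_infra_spend_per_year * depreciation_years  # ~$12M of hardware

    print(f"staff: ${staff_cost:,}/yr -> fleet on the order of ${min_fleet_value:,}+")

Nudge the team size or the salaries up at all and you land squarely in the tens of millions.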

Note that a tiny deployment of a few machines in a tech company is an exception, since you have existing technical staff who can run them in their spare time. (and you have other interesting work for them to do, so recruiting and retention isn't the same problem as if their only job was to babysit a micro-deployment)

That's why it can be simultaneously true that (a) profit margins on AWS-like services are very high, and (b) AWS is cheaper than running your own machines for a large number of companies.

kshacker 10/29/2025|
> the capital and running cost of a purpose-built datacenter is far cheaper per rack than putting machines in existing office-class buildings, as long as it's a reasonable size - ours is ~1000 racks, but you might get decent scale at a quarter of that.

Just want to confirm what I am reading. You are talking about ~1000 racks as the facility size, not what a typical university requires.

seidleroni 10/29/2025||
As someone who works with firmware, it is funny how different our definitions of "bare metal" are.
embedding-shape 10/29/2025||
As someone who does material science, it's funny how our definition of "bare metal" is so different.
onionisafruit 10/29/2025|||
As someone who listens to loud rock and roll music …
amluto 10/29/2025|||
Ask an astronomer what a “metal” is.
Joeboy 10/29/2025|||
Wikipedia still thinks it means the thing I (and presumably you) do.

https://en.wikipedia.org/wiki/Bare_metal

Edit: For clarity, wikipedia does also have pages with other meanings of "bare metal", including "bare metal server". The above link is what you get if you just look up "bare metal".

I do aim to be some combination of clear, accurate and succinct, but I very often seem to end up in these HN pissing matches so I suppose I'm doing something wrong. Possibly the mistake is just commenting on HN in itself.

embedding-shape 10/29/2025||
Seems there is a difference between "Bare Metal" and "Bare Machine".

I'm not sure what you did, but when you go to that Wikipedia article, it redirects to "Bare Machine", and the article contents are about "Bare Machine". Clicking the link you have sends you to https://en.wikipedia.org/wiki/Bare_machine

So it seems like you almost intentionally shared the article that redirects, instead of linking to the proper page?

Joeboy 10/29/2025||
I indeed deliberately pasted a link that shows what happens when you try to go to the Wikipedia page for "bare metal".
embedding-shape 10/29/2025||
Right, slightly misleading though, as https://en.wikipedia.org/wiki/Bare-metal_server is a separate page.
Joeboy 10/29/2025||
Yes, but if you look up "bare metal" it goes to the page about actual bare metal (aka "bare machines" or whatever).

Can we stop this now? Please?

embedding-shape 10/29/2025||
> Yes, but if you look up "bare metal" it goes to the page about actual bare metal (or bare machines or whatever).

Fix it then, if you think it's incorrect. Otherwise, link to https://en.wikipedia.org/wiki/Bare_metal_(disambiguation) like any normal and charitable commentator would do.

> Can we stop this now? Please?

Sure, feel free to stop at any point you want to.

Joeboy 10/29/2025||
There is nothing that needs fixing? Both my link and yours give the same "primary" definition for "bare metal". Which is not unequivocally the correct definition, but it's the one I and the person I was replying to favour.

I thought my link made the point a bit better. I think maybe you've misunderstood something about how Wikipedia works, or about what I'm saying, or something. Which is OK, but maybe you could try to be a bit more polite about it? Or charitable, to use your own word?

Edit: In case this part isn't obvious, Wikipedia redirects are managed by Wikipedia editors, just like the rest of Wikipedia. Where the redirect goes is as much an indication of the collective will of Wikipedia editors as eg. a disambiguation page. I don't decide where a request for the "bare metal" page goes, that's Wikipedia.

Edit2: Unless you're suggesting I edited the redirect page? The redirect looks to have been created in 2013, and hasn't been changed since.

andrewl-hn 10/29/2025|||
In a similar way, I once worked on a financial system where a COBOL-powered mainframe was referred to as the "Backend", and all other systems around it (written in C++, Java, .NET, etc. since the early 80s) as the "Frontend".
embedding-shape 10/29/2025||
I had a somewhat similar experience: the first "frontend" I worked on was basically a sort of proxy server that sat in front of a database, meant as a barrier for other applications to communicate through. At one point we called the client-side web application the "frontend-frontend", as it was the frontend for the frontend.
pgwhalen 10/29/2025|||
I don't work in firmware at all, but I'm working next to a team now migrating an application from VMs to K8S, and they refer to the VMs as "bare metal" which I find slightly cringeworthy - but hey, whatever language works to communicate an idea.
ghaff 10/29/2025||
I'm not sure I've ever heard bare metal used to refer to virtualized instances. (There were debates around Type 1 and Type 2 (hosted) hypervisors at one point, but I haven't heard that come up in years.)
electroly 10/29/2025||
I put our company onto a hybrid AWS-colocation setup to attempt to get the best of both worlds. We have cheap fiddly/bursty things and expensive stable things and nothing in between. Obviously, put the fiddly/bursty things in AWS and put the stable things in colocation. Direct Connect keeps latency and egress costs down; we are 1 millisecond away from us-east-1 and for egress we pay 2¢/GB instead of the regular 9¢/GB. The database is on the colo side so database-to-AWS reads are all free ingress instead of egress, and database-to-server traffic on the colo side doesn't transit to AWS at all. The savings on the HA pair of SQL Server instances is shocking and pays for the entire colo setup, and then some. I'm surprised hybrids are not more common. We are able to manage it with our existing (small) staff, and in absolute terms we don't spend much time on it--that was the point of putting the fiddly stuff in AWS.
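
To give a sense of the egress math, a toy calculation using the rates above (the traffic volume is hypothetical, not ours):

    # Illustrative monthly egress comparison; only the per-GB rates come from our setup.
    egress_tb_per_month = 50             # assumed AWS -> colo traffic volume
    regular_rate = 0.09                  # $/GB, standard internet egress
    direct_connect_rate = 0.02           # $/GB, egress over Direct Connect

    gb = egress_tb_per_month * 1000
    print(f"regular egress:     ${gb * regular_rate:,.0f}/month")
    print(f"via Direct Connect: ${gb * direct_connect_rate:,.0f}/month")
    print(f"difference:         ${gb * (regular_rate - direct_connect_rate):,.0f}/month")
    # Traffic flowing the other way, from the colo-side database into AWS,
    # is ingress and free in either setup.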

The biggest downside I see? We had to sign a 3 year contract with the colocation facility up front, and any time we want to change something they want a new commitment. On AWS you don't commit to spending until after you've got it working, and even then it's your choice.

jcalvinowens 10/29/2025||
I have seen multiple startups paying thousands of dollars a month in AWS bills to run a tiny service which could trivially run on an $800 desktop on a residential internet connection. It's absolutely tragic.
kaydub 10/30/2025||
Then they architected and built it out wrong.

It's pretty simple to run low cost services in AWS. If you're small enough that an $800 desktop on a home internet connection will handle it, surely you could run it completely serverless for much less.

I'm surprised how many people I see wanting to go on-prem vs AWS/public cloud. Feels penny smart and pound foolish to me. Lots of people are so deep into the technical side of things that they don't even understand the business side.

jcalvinowens 10/30/2025|||
You vastly underestimate what you can do on prem. Equivalent elastic compute for an $800 desktop that uses $100/mo in electricity is >$2000/mo. As sibling points out, it could easily be much, much more depending on what you're doing.

Bandwidth is usually the killer with AWS. Bandwidth on prem is free. Some cloud vendors are much better, in fairness.

winrid 10/30/2025|||
Hard to say. An $800 desktop could easily equal the compute of $100k/mo in Lambda costs.
hedora 10/29/2025||
That’s like $24K a year. Assuming they have working failover and business continuity plans, it’s actually a really good deal (vs having a 10-20% time employee deal with it).
whstl 10/29/2025|||
AWS doesn't get magically expensive just because you put your website there.

You don't get to an overcomplicated AWS madness without having a few engineers already pushing complexity.

And an overcomplicated setup also means it needs maintenance. There are no personnel savings there.

hedora 10/29/2025||
For one VM, EBS with backups gives you business continuity.

You could get manual failover with a single writer replicated managed Postgres setup and a warm VM.

That’s on the order of a thousand a month for a medium workload. It’s probably a 10x markup vs buying the servers, but it doesn’t matter if it saves an employee.

whstl 10/29/2025||
It doesn’t save employees. Over-complicated infrastructure doesn’t magically appear out of nowhere. Someone has to set it up and maintain it. It’s expensive.
dimitrios1 10/29/2025||
One thing I can say definitively, as someone who is definitely not an AI zealot (more of an AI pragmatist): GPT language models have lowered the barrier to running your own bare metal server. AWS salesfolk have long used the boogeyman of the costs (opportunity, actual, maintenance) of running your own server as the reason you should pick AWS (not realizing you are trading one set of boogeymen for another), but AI has reduced a lot of that burden.
mythz 10/29/2025||
Several years off AWS, the only thing I still prefer AWS for is SES; otherwise Cloudflare has the more cost-effective managed services. For everything else we use Hetzner US Cloud VMs for hosting all App Servers and Server Software.

Our .NET Apps are still deployed as Docker Compose Apps, which we deploy with GitHub Actions and Kamal [1]. Most Apps use SQLite + Litestream with real-time replication to R2, but we have switched to a local PostgreSQL, with regular backups to R2, for our latest App.
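
For anyone wondering what the "regular backups to R2" part can look like, here's a minimal sketch; the endpoint, bucket and database names are placeholders, and since R2 is S3-compatible the standard boto3 client works once you point it at an R2 endpoint:

    import datetime
    import subprocess

    import boto3  # R2 speaks the S3 API, so the normal S3 client is enough

    # Placeholders; substitute your own account ID, bucket and credentials.
    R2_ENDPOINT = "https://<account-id>.r2.cloudflarestorage.com"
    BUCKET = "db-backups"
    DB_NAME = "app"

    def backup_postgres_to_r2() -> str:
        """Dump the database with pg_dump and upload the archive to R2."""
        stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d-%H%M%S")
        dump_path = f"/tmp/{DB_NAME}-{stamp}.dump"

        # Custom-format dump (-Fc) is compressed and restorable with pg_restore.
        subprocess.run(["pg_dump", "-Fc", "-f", dump_path, DB_NAME], check=True)

        # Access keys are read from the environment or the usual credentials file.
        s3 = boto3.client("s3", endpoint_url=R2_ENDPOINT)
        key = f"postgres/{DB_NAME}/{stamp}.dump"
        s3.upload_file(dump_path, BUCKET, key)
        return key

    if __name__ == "__main__":
        print("uploaded", backup_postgres_to_r2())

Run it from cron or a scheduled GitHub Actions workflow, and restore-test a dump with pg_restore now and then so the backups are known to be good.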

Thanks to AI that can walk you through any hurdle and create whatever deployment, backup and automation scripts you need, it's never been easier to self-host.

[1] https://docs.servicestack.net/kamal-deploy

aeve890 10/29/2025|
>We're now moving to Talos. We PXE boot with Tinkerbell, image with Talos, manage configs through Flux and Terraform, and run conformance suites before each Kubernetes upgrade.

Gee, how hard is it to find SE experts in that particular combination of available ops tools? While in AWS every AWS certified engineer would speak the same language, the DIY approach surely suffers from the lack of "one way" to do things. Swap Flux for Argo, for example (assuming the post is talking about that Flux and not another tool with the same name), and you have an almost completely different GitOps workflow. How do they manage to settle on a specific set of tools?

zppln 10/29/2025||
If you're that much of a slave to your tool chain you don't get to call yourself an engineer.
film42 10/29/2025||
Or you have PTSD after 10 years of being on-call 24/7 for your company's stack. I've built my next chapter around offloading the pager. Worth every penny.
amluto 10/29/2025|||
I would not want to hire an engineer who claimed to be proficient with any cloud Kubernetes stack but couldn’t learn Talos in a week.
Ragnarork 10/30/2025|||
> Gee, how hard is it to find SE experts in that particular combination of available ops tools?

You find experts in Ops, not in tools. People who know the fundamentals, not just the buttons to push in "certain situations" without knowing what's really going on under the hood.

63stack 10/29/2025||
Argo CD and Flux are "almost completely different"? The last time I looked was about a year ago, and there seemed to be only minor differences.

What are the major differences?
