Posted by yusufusta 12 hours ago
if agents eat that glue, the moat gets thin fast.
No wonder they hallucinate :)
But maybe I’m just thinking of the current capabilities of agents, and if we fast forward a couple years, even removing these abstractions or migrating will be very low friction.
I run k8s on a bunch of dedicated servers that are super cheap and I have all the bells and whistles - just tell your coding agent to do it. You can literally design the thing you would never do yourself and it works brilliantly.
Postgres running on dedicated hardware, replicated and with WAL backups - easy, just tell codebuff (my harness of choice) to do it. Then any number of firewalls, load balancers, bastion servers, etc. If you can imagine it, codebuff will implement it.
How deep does this go?
I know your comment is tongue-in-cheek and the poster here is kinda known, but this kind of astroturfing is a new low and it's everywhere on forums such as these.
It's too bad Reddit allows accounts to hide their comment history now. That was an easy way to identify bot accounts before they started allowing it.
They're incredibly exposed to investor sentiment and were likely panicking around sept/oct/nov time when AI bubble stories were trending.
These posts were really consistent and repetitive - similar language about "scary good" models and fear of losing jobs.
I'm not. I stick around for the popcorn, and I'm not gonna miss the schadenfreude in a few years.
Just noting for fellow just-waking-up people
(edit: OP edited)
Of course, if all you do is "host a WordPress website" (like 80% of what "webdevs" do), it will work. The issue is that the last 20% are the hardest to cover, and current AI methods will not get there (you need much more complex methods, like integrating logic with learning-based ML, to do this)
So it's a Claude ad inside a Hetzner ad inside a decent grammar ad.
Btw this type of grammar error can be found by proofreading your posts with ChatGPT powered OpenClaw assistant.
I don't see it as much different from "I used script X to do it" or something.
Or you can update the app to remove the dependency on the library.
But honestly, this is what containers or VMs are built for in the first place.
https://en.wikipedia.org/wiki/Salvatore_Sanfilippo
This whole thread is hilarious.
Sure you lose a little of the benefit of a “virtual” server which can be migrated but Hetzner’s support has always been super fast and capable, should I wind up in a situation where I’ve got downtime.
What’s exciting is how simple cli tools can be so impactful to dev workflows
Obviously I agree that AI can be useful to write boilerplate, but it's in no way something you should use blindly when trying to do a migration or anything touching prod
So, to be more precise: no, "Claude Code didn't migrate it all". Claude Code helped you write boilerplate so that you could migrate
And, recent research suggests that anthropomorphization may actually be positively correlated with intelligence.
Syntax did a nice episode on this topic recently. They went over where it works well, and where it does not work well.
https://syntax.fm/show/992/migrating-legacy-code-just-got-ea...
Amazon might think that they’re locking people in with the egress fees. But they’re also locking people out. As soon as you switch one part to a competitor, the high egress forces you to switch over everything.
It’s going to be complicated to switch, but it’s made easier by the fact that I didn’t fall into the trap of building my platform on Amazon-specific services.
AWS matched a few months later:
https://aws.amazon.com/blogs/aws/free-data-transfer-out-to-i...
I'm not trying to convince you to stay (I work for neither anymore!), just wanted to note that you can technically request a waiver. I'm not sure how this works in practice though. Like, if you want to leave Athena and move to something on-premise is that enough to have just that workload? Maybe!
Edit: I also didn't follow this at the time, but the AWS wording suggests that the "EU Data Act" is also involved.
This doesn't actually work as advertised. I attempted free data egress from AWS in December. It took them 31 days to respond to my initial ticket. At which point they gave me a multi-page questionnaire to determine eligibility and they also told me I could not begin DTO until 60 days had passed from approval of the questionnaire.
By the time I was allowed "free egress" my cumulative S3 storage charges over the prior 100 days would have roughly matched the cost of egress if I just did so originally.
I'm in the US so the EU Data Act protections don't apply.
So in my case that would have been 14 weeks plus the time to migrate away. The egress costs are equivalent to around 17 weeks storage cost. So you save around 1c/gb if they don't find some reason to reject it.
You saved a lot of money but you'll spend a lot of time in maintenance and future headaches.
Sometimes it's completely acceptable that a server will run for 10 years with say 1 week or 1 month of downtime spread over those 10 years, yes. That's the sort of uptime you can see with single servers that are rarely changed and over-provisioned as many on Hetzner are. Some examples:
Small businesses where the website is not core to operations and is more of a shop-front or brochure for their business.
Hobby websites don't really matter either if they go down for short periods of time occasionally.
Many forums and blogs just aren't very important either, and downtime is no big deal.
There are a lot of these websites, and they are at the lower end of the market for obvious reasons, but they are probably the majority of websites in fact: the long tail of low-traffic websites.
Not everything has to be high availability and if you do want that, these providers usually provide load balancers etc too. I think people forget here sometimes that there is a huge range in hosting from squarespace to cheap shared hosting to more expensive self-hosted and provisioned clouds like AWS.
But I do agree the poster should think about this. I don't think it's 'off' or misleading, they just haven't encountered a hardware error before. If they had one on this single box with 30 databases and 34 Nginx sites it would probably be a bad time, and yes they should think about that a bit more perhaps.
They describe a db follower for cutover for example but could also have one for backups, plus rolling backups offsite somewhere (perhaps they do and it just didn't make it into this article). That would reduce risk a lot. Then of course they could put all the servers on several boxes behind a load-balancer.
But perhaps if the services aren't really critical it's not worth spending money on that, depends partly what these services/apps are.
Could I take it down for the afternoon? Sure. Or could I wait and do it after hours? Also sure. But would I rather not have to deal with complaints from users that day and still go home by 5pm? Of course!
Also, in general, you can architect your application to be more friendly to migration. It used to be a normal thing to think about and plan for.
VMware has a conversion tool that converts bare metal into images.
One could image, then do regular snapshots, maybe centralize a database being accessed.
Sometimes it's possible to create a migration script that you run over and over to the new environment for each additional step.
Others can put a backup server in between to not put a load on the drive.
Digital Ocean makes it impossible to download your disk image backups, which is a grave sin they can never be forgiven for. They used to offer this to some extent.
Still, a few commands can back up the running server to an image, and stream it remotely to another server, which in turn can be updated to become bootable.
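For a concrete picture of those "few commands", here is a sketch: raw-read the disk, compress, and verify on the other side. An ordinary file stands in for /dev/sda so this can be run without privileges; the ssh pipe in the comment is the illustrative real-world form:

```shell
# Sketch: image a "disk" and restore it elsewhere. In practice the read
# would target a block device and stream over ssh, roughly:
#   dd if=/dev/sda bs=64K | gzip -c | ssh user@backup 'cat > sda.img.gz'
set -eu
workdir=$(mktemp -d)
printf 'bootloader + filesystem bytes' > "$workdir/disk"   # stand-in device
# Back up: raw read, compress.
dd if="$workdir/disk" bs=64K 2>/dev/null | gzip -c > "$workdir/disk.img.gz"
# "Other server": decompress back onto a device/file, then verify.
gzip -dc "$workdir/disk.img.gz" | dd of="$workdir/restored" bs=64K 2>/dev/null
cmp -s "$workdir/disk" "$workdir/restored" && echo "image verified"
```

Caveat: imaging a mounted, running filesystem this way can capture an inconsistent state; snapshot first (LVM/ZFS) or quiesce databases before the raw read.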
This is the tip of the iceberg in the number of tasks that can be done.
Someone with experience can even instruct LLMs to do it and build it, and someone skilled with LLMs could probably work to uncover the steps and strategies for their particular use case.
This is a general response to it.
I have run hosting on bare metal for millions of users a day. Tens of thousands of concurrent connections. It can scale way up by doing the same thing you do in a cloud: provision more resources.
For "downtime" you do the same thing with metal as you do with Digital Ocean: just get a second server and have them fail over.
You can run hypervisors to split and manage a metal server just like Digital Ocean, except you're not vulnerable to the shared-memory and CPU exploits of shared hosting like Digital Ocean. When Intel CPU or memory flaws or kernel exploits come out, as they have, one VM user can read the memory and data of processes belonging to other users.
Both Digital Ocean, and IaaS/PaaS are still running similar linux technologies to do the failover. There are tools that even handle it automatically, like Proxmox. This level of production grade fail over and simplicity was point and click, 10 years ago. Except no one's kept up with it.
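One long-standing open-source building block for the "second server + failover" setup is a floating IP managed by keepalived (VRRP). A minimal sketch; the interface name, VIP, and priorities are illustrative placeholders, not anyone's real config:

```
# /etc/keepalived/keepalived.conf on the primary (sketch)
vrrp_instance VI_1 {
    state MASTER              # the standby runs state BACKUP
    interface eth0            # placeholder NIC name
    virtual_router_id 51
    priority 100              # standby uses a lower value, e.g. 90
    advert_int 1
    virtual_ipaddress {
        203.0.113.10/24       # floating IP that moves on failover
    }
}
```

If the primary stops sending VRRP advertisements, the standby claims the VIP within a few seconds; combined with replicated state (DRBD, database replicas, etc.) this is the same pattern cloud load balancers automate.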
The cloud is convenient. Convenience can make anyone comfortable. Comfort always costs way more.
It's relatively trivial to put the same web app on a metal server, with a hypervisor/IaaS/PaaS behind the same Cloudflare to access "scale".
Digital Ocean and Cloud providers run on metal servers just like Hetzner.
The software to manage it all is becoming more and more trivial.
> This level of production grade fail over and simplicity was point and click, 10 years ago.
While some of the tools are _designed_ for point and click, they don't always work. Mostly because of bugs.
We run Ceph clusters under our product, and have seen a fair share of non-recoveries after temporary connection loss [1], kernel crashes [2], performance degradations on many small files, and so on.
Similarly, we run HA postgres (Stolon), and found bugs in its Go error handling that cause failure to recover from crashes and full-disk conditions [3] [4]. This week, we found that full-disk situations will not necessarily trigger failovers. We also found that if DB connections are exhausted, the daemon that's supposed to trigger postgres failover cannot connect to do that (currently testing the fix).
I believe that most of these things will be more figured out with hosted cloud solutions.
I agree that self-hosting HA with open-source software is the way to go. These tools are good, and the more people use them, the fewer bugs they will have.
But I wouldn't call it "trivial".
If you have large data, it is also brutally cheaper; we could hire 10 full-time sysadmins for the cost of hosting on AWS, vs doing our own Hetzner HA with Free Software, and we only need ~0.2 sysadmins. And it still has higher uptime than AWS.
It is true that Proxmox is easy to set up and operate. For many people it will probably work well for a long time. But when things aren't working, it's not so easy anymore.
[1]: "Ceph does not recover from 5 minute network outage because OSDs exit with code 0" - https://tracker.ceph.com/issues/73136
[2]: "Kernel null pointer derefecence during kernel mount fsync on Linux 5.15" - https://tracker.ceph.com/issues/53819
[3]: https://github.com/sorintlab/stolon/issues/359#issuecomment-...
Even if Amazon were down, if I was planning to buy, I'd wait. Heck, I've got a bunch of crap in my cart right now that I haven't checked out.
Intentional downtime lets everyone plan around it, reduces costs by not needing N layers of marginal utility which are all fragile and prone to weird failures at times you don't intend.
> Intentional downtime lets everyone plan around it, reduces costs by not needing N layers of marginal utility which are all fragile and prone to weird failures at times you don't intend.
Quite frankly, I would manage if things were run "on-supply" with solar and would just go dark at night.
That's a strawman version of what happens.
There have been times when I've tried to visit a webshop to buy something but the site was broken or down, so I gave up and went to Amazon and bought an alternative.
I've also experienced multiple business situations where one of our services went down at an inconvenient time, a VP or CEO got upset, and they mandated that we migrate away from that service even if alternatives cost more.
If you think of your customers or visitors as perfectly loyal with infinite patience then downtime is not a problem.
> Unless you are Amazon and every minute costs you bazillions, you are likely gonna get the better deal not worrying about availability and scalability. That 250€/m root server is a behemoth. Complete overkill for most anything.
You don't need every minute of downtime to cost "bazillions" to justify a little redundancy. If you're spending 250 euros/month on a server, spending a little more to get a load balancer and a pair of servers isn't going to change your spend materially. Having two medium size servers behind a load balancer isn't usually much more expensive than having one oversized server handling it all.
There are additional benefits to having the load balancer set up for future migrations, or to scale up if you get an unexpected traffic spike. If you get a big traffic spike on a single server and it goes over capacity you're stuck. If you have a load balancer and a pair of servers you can easily start a 3rd or 4th to take the extra traffic.
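A pair-behind-a-balancer really is a small amount of config. A minimal nginx sketch (hostnames and ports are placeholders); absorbing a spike with a 3rd or 4th backend is one more server line plus a reload:

```nginx
# nginx load balancer sketch: two medium servers, round-robin by default
upstream app_pool {
    server app1.internal:8080 max_fails=3 fail_timeout=10s;
    server app2.internal:8080 max_fails=3 fail_timeout=10s;
    # server app3.internal:8080;   # uncomment + reload to take extra traffic
}

server {
    listen 80;
    location / {
        proxy_pass http://app_pool;
    }
}
```

The max_fails/fail_timeout parameters also give you passive health checking: a backend that stops answering is taken out of rotation automatically.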
Great. So how much did the webshop lose in that hour of maintenance (which realistically would be in the middle of the night for their main audience) and how much would they have paid for redundancy? It's also a bit hard to believe you repeatedly ran into an item sold both at a self-hosted webshop and on Amazon. Are you sure they hadn't just messed up the web dev side of the biz? You could totally do that with AWS too...
> If you're spending 250 euros/month on a server, spending a little more to get a load balancer and a pair of servers isn't going to change your spend materially.
Of course, but that's not the argument. It's implied you can just double the 250€/m server for redundancy, as you would still get an offer at the fraction of cloud prices. But really that server needs no more optimization in terms of hardware diversification. As I said, it's complete overkill. Blogs and forums could easily be run on a 30€/m recycled machine.
Spot on! People still go to Chick-fil-A, even if they are closed on Sundays!
The confusing part about this article is the emphasis on a zero-downtime migration toward a service that isn't really ideal for uptime. It wouldn't be that expensive to add a little bit of architecture on the Hetzner side to help with this. I guess if you're doing a migration and you're paid salary or your time is free-ish, doing the migration in a zero-downtime way is smart. It's a little funny to see the emphasis on zero downtime juxtaposed with the architecture they chose, where uptime depends on nothing ever failing.
Clever architecture will always beat cleverly trying to pick only one cloud.
Being cloud agnostic is best.
This means setting up a private cloud.
Hosted servers and managed servers are perfectly capable of near-zero downtime. This is because it's the same equipment (or often more consumer-grade) that the "cloud" runs on, and the cloud plans for even more failure.
Digital Ocean definitely does not guarantee zero downtime. That's a lot of 9's.
It's simple to run well-established tools like Proxmox on bare metal that will do everything Digital Ocean promises, and it's not susceptible to attacks or exploits where shared memory and CPU usage leak what customers believe is their private VPS.
"Nothing ever failing" in the case of a tool like Proxmox means: install it on two servers, have the VM exist on both nodes (you connect both servers as nodes), click high availability, and it's generally up and running. Put Cloudflare in front of it, per today's best practices.
If you're curious about this, there's some pretty eye opening and short videos on Proxmox available on Youtube that are hard to unsee.
When you have 2 nodes running, both mirrored, one can have hardware break.
Also, hardware can provide failure notifications before it breaks, and experience teaches to just update and upgrade before hard drives break.
Since tools like Proxmox can just add a node, you add new hardware, mark the VM to mirror to that node, and it is taken care of.
Terraform etc can sit below Proxmox and alleviate what you're speaking about:
Some examples: https://www.youtube.com/watch?v=dvyeoDBUtsU
Also, don't underestimate the reliability of simplicity.
I was a Linux sysadmin for many years, and I have never seen as much downtime from simpler systems as I routinely see from the more complicated setups. Somewhere between theory and reality, simpler systems just come out ahead most of the time.
Usually those articles describe two situations:
- they were "on the cloud" for the wrong reasons and migrating to something more physical is the right approach
- they were "on the cloud" for the right reasons and migrating to something more physical is going to be a disaster
Here they appear to be in the first situation.
If their setup was running fine on DO and they put the right DR policies in place at Hetzner, they should be fine. As a bonus, Hetzner is European.
Dealing with over-engineered bullshit that behaved in strange ways and disrupted the service was far more often a problem.
So, yes, redundancy is something that can be left out, if you're comfortable being responsible for fixing things on a Saturday morning.
They saved money and lost nothing.
Now, if they so wish, they could use a portion of that to increase redundancy - but that wasn't the point of the article.
If someone starts thinking about redundancy and load balancers, then DO's solution is to rent a second similar-sized droplet and add their load balancing service. If you do those things with Hetzner instead, you would still be spending less than you did with Digital Ocean.
Personally, what is keeping me on DO is that no single droplet I have is large enough to justify moving on its own, and I'm not prepared to deal with moving everything.
It’s amusing that the US government can shut down for days/weeks/months over budget reasons and there are no adult discussions about fixing the cause. Yet the latest HN demo that 100 people will use needs all-nines reliability and gets hundreds of responses.
Given the downtimes we saw in the past year(s) (AWS, Cloudflare, Azure - the latter even down several times), I would argue moving to any of the big cloud providers gives you not much of a better guarantee.
I myself am a Hetzner customer with a dedicated vServer, meaning it is a shared virtual server but with dedicated CPUs (read: still oversubscribed, but some performance guarantee) and had zero hardware-based downtime for years [0]. I would guess their vservers are on similar redundant hardware where the failing components can be hotswapped.
[0] = They once, within the last 3 years, sent me an email that they had to update a router, which would affect network connectivity for the vServer; the notification came weeks in advance and the maintenance lasted about 15 minutes. No reboot/hardware failure on my vServer though.
Not a bad tradeoff for 99.8% of shops out there.
I know people like FAANG LARPing. Not everyone has budget or need to run four nines with 24/7 and FAANG level traffic.
If you can tolerate a few hours of downtime and some data rollback/loss, a single server + robust backups can be a viable strategy
If your scaling need is not that high, you can get very far with a single server
Like, I know Leetcode tells you otherwise, but most companies really don't need a full FAANG stack with 99.999% uptime. A day of outage in a few years isn't going to affect bottom lines.
LARPing as a FAANG is a waste of money, and lots of companies don't even need three nines, let alone five.
People underestimate how far you can go with one or two servers. In fact, what I have seen in my career is many examples of services that should have been running on one or two servers and instead went for a hugely complex microserviced approach, all-in on cloud providers, with crazy reliability requirements for a scale that never came.
Deploying a new docker instance or just restoring the app from a snapshot and restoring the latest db in most cases is enough.
For backups we use both Velero and application-level backup for critical workloads (i.e. Postgres WAL backups for PITR). We also ensure all state is on at least two nodes for HA.
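The WAL-for-PITR part is mostly a few postgresql.conf settings plus a periodic base backup. A sketch, with placeholder paths and the simple cp-style archive_command from the Postgres docs (real setups usually ship WAL to remote storage with a tool like wal-g or pgBackRest):

```
# postgresql.conf sketch for WAL archiving / PITR
wal_level = replica
archive_mode = on
# only archive if the segment isn't already there, then copy it
archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'
archive_timeout = 300        # force a segment switch at least every 5 min
```

Recovery then means restoring the latest pg_basebackup and replaying the archived WAL up to a recovery_target_time.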
We also find bare metal to be a lot more performant in general. Compared to AWS we typically see service response times halve. It is not that virtualisation inherently has that much overhead, rather it is everything else. Eg, bare metal offers:
- Reduced disk latency (NVMe vs network block storage)
- Reduced network latency (we run dedicated fibre, so inter-az is about 1/10th the latency)
- Less cache contention, etc [1]
Anyway, if you want to chat about this sometime just ping me an email: adam@ company domain.
[1] I wrote more on this 6 months ago: https://news.ycombinator.com/item?id=45615867
I measured this several years back and never looked at virtual servers again. Since CPU time isn't reserved (like RAM is), the performance is abysmal compared to real hardware.
https://jan.rychter.com/enblog/cloud-server-cpu-performance-...
My entire stack is: k8s, hosted Postgres, S3-type storage. I can always host my own Postgres. So really it's down to k8s and S3. I think Hetzner has some kind of S3 storage but I haven’t looked into it, and I assume moving in 100 TB is a process…
High availability, in case anyone else was wondering.
Your post was reasonable until the spam tagline.
Not cool.
Is Hetzner in the wrong for denying service to clients with unpaid balances?
Personally, if I knew they were gonna shut me down if I didn't pay before X date, I'd fight it up until X-2 days, pay it, then continue fighting (depending on the amount, of course). But it's not clear that OP was given such a deadline.
I see the DigitalOcean vs Hetzner comparison as a tradeoff that we make in different domains all day long, similar to opening your DoorDash or UberEats instead of making your own dinner (and the cost ratio is similar too).
I work in all 3 major clouds, on-prem, the works. I still head to the DigitalOcean console for bits and pieces type work or proof of concept testing. Sometimes you just want to click a button and the server or bucket or whatever is ready and here's the access info and it has sane defaults and if I need backups or whatnot it's just a checkbox. Your time is worth money too.
One is about all the steps of zero downtime migration. It's widely applicable.
The other is the decision to replace a cloud instance with bare metal. It saves a lot in costs, but also the loss of fast failover and data backups is priced in.
If I were doing this, I would run a hot spare for an extra $200, and switch the primary every few days, to guarantee that both copies work well and the switchover is easy. It would be a relatively low price for a massive reduction of the risk of a catastrophic failure.
You're describing Hetzner Cloud, which has been like this for many years. At least 6.
Hetzner also offers Hetzner Cloud API, which allows us to not have to click any button and just have everything in IaC.
I hardly ever visit their website; everything from the terminal.
However, the dealbreaker for me was that Hetzner IPs have a bad reputation. At work, I learned that one of the managed AWS firewall rules blocks many (maybe all) of their IPs. I can’t even open a website hosted on a Hetzner IP from my work laptop because it’s blocked by some IT policy (maybe this is not an issue for you if you are using CloudFlare or similar).
I've read online that the DDoS protection is very bad as well.
So in the end, I picked DO App Platform in one of the EU regions. Having the option to use a managed DB was a big plus as well.
source: moved away from DO for this very reason.
https://news.ycombinator.com/item?id=47279518
It looks like Hetzner is Tor (and Tor-adjacent) friendly. I suggested this might affect IP reputation; 2 users responded they had no IP reputation issues, but it looks like that wasn't quite the whole story.
https://community.torproject.org/relay/community-resources/g...
It seems that Hetzner holds 7% of the Tor network (if I understood the table right).
So much so that I'm thinking of selling nothing but an AWS and Azure blocker as a service.
Because with a single-server setup like this, I'd imagine that hardware (e.g. SSD) failure brings down your app, and in the case of SSD failure, you then have hours or days downtime while you set everything up again.
Once the first SSD fails after some years, and your monitoring catches that, you can either migrate to a new box, find another intermediate solution/replica, or let them hotswap it while the other drive takes over.
Of course though, going to physical servers loses the redundancy of the cloud, but that's something you need to price in when looking at the savings and deciding your risk model.
And yes, running this without at least daily snapshotting/backup to remote storage is insane - that applies to cloud as well, albeit it's easier to set up there.
Have at least 2x servers, then invest in proper monitoring.
Server can fail without disk failures.
For quite a while we ran single power supplies because they were pretty high quality, but then Supermicro went through a ~6 month period where basically every power supply in machines we got during that time failed within a year, and replacements were hard to come by (because of high demand, because of failures), and we switched to redundant. This was all cost savings trade-offs. When running single power supplies, we had in-rack Auto Transfer Switches, so that the single power supplies could survive A or B side power failure.
But, and this is important, we were monitoring the systems for drive failures and replacing them within 24 hours. Ditto for power supplies. If you don't monitor your hardware for failure, redundancy doesn't mean anything.
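For the monitoring half, Linux software RAID makes this easy to script: a degraded md array shows an underscore in its status brackets in /proc/mdstat. A sketch that parses the text (sample strings stand in for the real file so it can be run anywhere):

```shell
# Sketch: detect a degraded Linux software-RAID array. In /proc/mdstat a
# missing member shows as "_" in the status brackets, e.g. [U_] vs [UU].
# Sample strings stand in for the real file so this runs anywhere; a cron
# job would instead do: check_mdstat "$(cat /proc/mdstat)"
check_mdstat() {
    if printf '%s\n' "$1" | grep -q '\[U*_U*\]'; then
        echo "DEGRADED"
        return 1
    fi
    echo "OK"
}

healthy='md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]'
degraded='md0 : active raid1 sda1[0]
      976630336 blocks super 1.2 [2/1] [U_]'

check_mdstat "$healthy"            # prints OK
check_mdstat "$degraded" || true   # prints DEGRADED
```

The same cron job would typically also run smartctl for drive health and alert (mail, webhook) on anything abnormal, so the 24-hour replacement window described above is actually achievable.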
Yeah. This blog post reads like it was written by someone who didn't think things through and just focused on hyper-aggressive cost-cutting.
I bet their DigitalOcean VM did live migrations and supported snapshots.
You can get that at Hetzner but only in their cloud product.
You absolutely will not get that in Hetzner bare-metal. If your HD or other component dies, it dies. Hetzner will replace the HD, but its up to you to restore from scratch. Hetzner are very clear about this in multiple places.
They could, but they didn't and instead they wrote that blog post which, even being generous is still kinda hard to avoid describing as misleading.
I would not have written the post I did if they had presented a multi-node bare-metal cluster or whatever more realistic config.
What do you feel was misleading?
They don't.
And reading the article, they don't seem to understand that.
Erm. I already spelt it out in my original post?
I'm not going to re-write it; the TL;DR is they are making an apples-and-oranges comparison.
Yes they "saved money" but in no way, shape or form are the two comparable.
The polite way to put it is... they saved as much money as they did because they made very heavy-handed "architectural decisions". "Decisions" that they appear to be unaware of having made.
I agree with the other poster, this is fine for a toy site or sites but low quality manual DR isn't good for production.
I don't know where to start with this comment. Do I really need to spell out the difference between cloud and bare metal?
A few examples...
- Live migration? Cloud only.
- Snapshots? Cloud only.
- Want to increase disk space? Tick box in cloud vs. replace disks (or move to a different machine) and re-install/restore in bare metal.
- Want to increase RAM? Tick box in cloud vs. shutdown, pull out of rack, install new chips (or move to a different machine and re-install/restore).
- Want to upgrade to a beefier processor? Tick box in cloud vs. move to a completely different machine and re-install/restore.
Also, with something like Hetzner you would not be going in and physically doing anything. You also just tick a box for a RAM upgrade, and then migrate over or do an active/passive switch.
The cloud does have advantages, mostly in how "easy" it is to do some specific workflows, but per-compute it's at least 10x the cost. Some will argue it's less than that, but they forget to factor in just how slow virtual disks and CPUs are. Cloud only makes sense for very small businesses, for which the operational cost of colocation or on-prem hosting is too expensive.
Yeah you pay for and get additional stuff with cloud. Nobody disputed that.
Well, technically it's still a possibility.
I am old enough to have seen issues with RAID1 setups not being able to restore redundancy, as well as RAID controller failures and software RAID failures.
Also, frankly you are being somewhat pedantic. My broader point was regarding cloud. I gave HD Failure as one example, randomly selected by my brain ... I could have equally randomly chosen any of the other items ... but this time, my brain chose HD.
Curious what the delta in pain-in-the-ass would be if I want to deal with storing data. (And not just backups/migrations, but also GDPR, age verification, etc.)
I already design with Auto Scaling Groups in mind; we run on spot instances, which tend to be much cheaper. Spot instances can be reclaimed anytime, so you need to keep this in mind.
I also have data blobs which are memory-mapped files, swapped with no downtime by pulling a manifest from a GCS bucket each hour and swapping out the mmapped data.
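The parent swaps mmapped blobs in-process; the analogous filesystem-level trick is an atomic symlink swap, so readers never observe a half-written file. A sketch with placeholder paths (the GCS download step is elided):

```shell
# Sketch: hourly blob refresh via atomic symlink swap.
set -eu
datadir=$(mktemp -d)
printf 'blob v1' > "$datadir/blob.v1"
ln -s "$datadir/blob.v1" "$datadir/current"     # readers open "current"
# ...an hour later a new version has been pulled from the bucket...
printf 'blob v2' > "$datadir/blob.v2"
# Build the new symlink aside, then rename it over the old one. rename(2)
# is atomic, so "current" always points at a complete file.
ln -s "$datadir/blob.v2" "$datadir/current.new"
mv -T "$datadir/current.new" "$datadir/current"
cat "$datadir/current"   # prints: blob v2
```

In-process, the app would then re-open and re-mmap "current"; existing mappings of the old file stay valid until they are unmapped, so there is no downtime window.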
I use replicas, with automatic voting-based failover.
I've used mongo with replication and automatic failover for a decade in production with no downtime and no data loss.
Recently, I got into Postgres; so far so good. Before that I always used RDS or other managed solutions like Datastore, but they cost so much compared to running your own stuff.
Health checks start a new server in no time; even if my Hetzner server goes out, or the whole of Hetzner goes out, my system will launch Digital Ocean nodes which will start soaking up all requests.
Recently, I did it in PostgreSQL using pg_auto_failover. I have 1 monitor node, 1 primary, and 1 replica.
Surprisingly, once you get the hang of PostgreSQL configuration and its gotchas, it’s also very easy to replicate.
I’m guessing MySQL is even easier than PostgreSQL for this.
I also achieved zero downtime migration.
Not every app needs 24/7 availability. The vast majority of websites out there will not suffer any serious consequences from a few hours of downtime (scheduled or otherwise) every now and then. If the cost savings outweigh the risk, it can be a perfectly reasonable business decision.
A more interesting question would be what kind of backup and recovery strategy they have, and which aspects of it (if any) they had to change when they moved to Hetzner.