Go ahead, self-host Postgres

Posted by pavel_lishin 12/20/2025

Go ahead, self-host Postgres(pierce.dev)

683 points | 396 commentspage 3

vitabaks 12/21/2025|

Just use Autobase for PostgreSQL

automates the deployment and management of highly available PostgreSQL clusters in production environments. This solution is tailored for use on dedicated physical servers, virtual machines, and within both on-premises and cloud-based infrastructures.

nhumrich 12/20/2025||

What do you postgres self hosters use for performance analysis? Both GCP-SQL and RDS have their performance analysis pieces of the hosted DB and it's incredible. Probably my favorite reason for using them.

cuu508 12/20/2025||

I use pgdash and netdata for monitoring and alerting, and plain psql for analyzing specific queries.

sa46 12/20/2025||

I’ve been very happy with Pganalyze.

_ea1k 12/21/2025||

I often find it sad how many things that we did, almost without thinking about them, that are considered hard today. Take a stroll through this thread and you will find out that everything from RAID to basic configuration management are ultrahard things that will lead you to having a bus factor of 1.

What went so wrong during the past 25 years?

wreath 12/20/2025||

> Take AWS RDS. Under the hood, it's:

    Standard Postgres compiled with some AWS-specific monitoring hooks
    A custom backup system using EBS snapshots
    Automated configuration management via Chef/Puppet/Ansible
    Load balancers and connection pooling (PgBouncer)
    Monitoring integration with CloudWatch
    Automated failover scripting

I didn't know RDS had PgBouncer under the hood, is this really accurate?

The problem i find with RDS (and most other managed Postgres) is that they limit your options for how you want to design your database architecture. For instance, if write consistency is important to you want to support synchronous replication, there is no way to do this in RDS without either Aurora or having the readers in another AZ. The other issue is that you only have access to logical replication, because you don't have access to your WAL archive, so it makes moving off RDS much more difficult.

mystifyingpoi 12/20/2025|

> I didn't know RDS had PgBouncer under the hood

I don't think it does. AWS has this feature under RDS Proxy, but it's an extra service and comes with extra cost (and a bit cumbersome to use in my opinion, it should have been designed as a checkbox, not an entire separate thing to maintain).

Although, it technically has "load balancer", in form of a DNS entry that resolves to a random reader replica, if I recall correctly.

zsoltkacsandi 12/20/2025||

I've operated both self-hosted and managed database clusters with complex topologies and mission-critical data at well-known tech companies.

Managed database services mostly automate a subset of routine operational work, things like backups, some configuration management, and software upgrades. But they don't remove the need for real database operations. You still have to validate restores, build and rehearse a disaster recovery plan, design and review schemas, review and optimize queries, tune indexes, and fine-tune configuration, among other essentials.

In one incident, AWS support couldn't determine what was wrong with an RDS cluster and advised us to "try restarting it".

Bottom line: even with managed databases, you still need people on the team who are strong in DBOps. You need standard operating procedures and automation, built by your team. Without that expertise, you're taking on serious risk, including potentially catastrophic failure modes.

Nextgrid 12/20/2025|

I've had an RDS instance run out of disk space and then get stuck in "modifying" for 24 hours (until an AWS operator manually SSH'd in I guess). We had to restore from the latest snapshot and manually rebuild the missing data from logs/other artifacts in the meantime to restore service.

I would've very much preferred being able to SSH in myself and fix it on the spot. Ironically the only reason it ran out of space in the first place is that the AWS markup on that is so huge we were operating with little margin for error; none of that would happen with a bare-metal host where I can rent 1TB of NVME for a mere 20 bucks a month.

As far as I know we never got any kind of compensation for this, so RDS ended up being a net negative for this company, tens of thousands spent over a few years for laptop-grade performance and it couldn't even do its promised job the only time it was needed.

ergonaught 12/20/2025||

> Self-hosting a database sounds terrifying.

Is this actually the "common" view (in this context)?

I've got decades with databases so I cannot even begin to fathom where such an attitude would develop, but, is it?

Boggling.

Nextgrid 12/20/2025||

Over a decade of cloud provider propaganda achieves that. We appear to have lost the basic skill of operating a *nix machine, so anything even remotely close to that now sounds terrifying.

You mean you need to SSH into the box? Horrifying!

gnusi 12/21/2025||

Can't agree more.

vbezhenar 12/21/2025||

I don't feel like it's easy to self-host postgres.

Here are my gripes:

1. Backups are super-important. Losing production data just is not an option. Postgres offers pgdump which is not appropriate tool, so you should set up WAL archiving or something like that. This is complicated to do right.

2. Horizontal scalability with read replicas is hard to implement.

3. Tuning various postgres parameters is not a trivial task.

4. Upgrading major version is complicated.

5. You probably need to use something like pgbouncer.

6. Database usually is the most important piece of infrastructure. So it's especially painful when it fails.

I guess it's not that hard when you did it once and have all scripts and memory to look back. But otherwise it's hard. Clicking few buttons in hoster panel is much easier.

npn 12/21/2025||

wal archiving is piss easy. you can also just use basebackup. with postgres 17 it is easier than ever with incremental backup feature.

you don't need horizontal scalability when a single server can have 384 cpu real cores, 6TB of ram, some petabytes of pcie5 ssd, 100Gbps NIC.

for tuning postgres parameters, you can start by using pgtune.leopard.in.ua or pgconfig.org.

upgrading major version is piss easy since postgres 10 or so. just a single command.

you do not need pgbouncer if your database adapter library already provide the database pool functionality (most of them do).

for me maintained database also need that same amount of effort, due to shitty documents and garbage user interfaces (all aws, gcp or azure is the same), not to mention they change all the time.

tonyhart7 12/21/2025|||

"all scripts and memory to look back. But otherwise it's hard. Clicking few buttons in hoster panel is much easier."

so we need open source way to do that, coolify/dokploy comes to mind and it exactly do that way

I would say 80% of your point wouldnt be hit at certain scale, as most application grows and therefore outgrow your tech stack. you would replace them anyway at some point

nrhrjrjrjtntbt 12/21/2025||

Scaling to a different instance size is also easy on AWS.

That said a self hosted DB on a dedicated Hetzner flies. It does things at the price that may save you time reworking your app to be more efficient on AWS for cost.

So swings and roundabouts.

npn 12/21/2025||

> I sleep just fine at night thank you.

I also self-host my webapp for 4+ years. never have any trouble with databases.

pg_basebackup and wal archiving work wonder. and since I always pull the database (the backup version) for local development, the backup is constantly verified, too.

cosmodust 12/21/2025||

I would suggest if you do host your database yourself consider taking the data seriously. Few easy solutions are using a multi zonal disk [1] with scheduled automatic snapshots [2].

[1] https://docs.cloud.google.com/compute/docs/disks/hd-types/hy... [2] https://docs.cloud.google.com/compute/docs/disks/create-snap...

cube00 12/21/2025||

Scheduled automatic snapshots are not the kind of consistent snapshots you need for a filesystem based backup.

cosmodust 12/22/2025||

Snapshots might break ACID for last few transactions but it will flush all in-memory writes before taking the freeze. Consider its 1 click solution, its good enough than losing everything?

ipsento606 12/20/2025|

> If your database goes down at 3 AM, you need to fix it.

Of all the places I've worked that had the attitude "If this goes down at 3AM, we need to fix it immediately", there was only one where that was actually justifiable from a business perspective. I'm worked at plenty of places that had this attitude despite the fact that overnight traffic was minimal and nothing bad actually happened if a few clients had to wait until business hours for a fix.

I wonder if some of the preference for big-name cloud infrastructure comes from the fact that during an outage, employees can just say "AWS (or whatever) is having an outage, there's nothing we can do" vs. being expected to actually fix it

From this perspective, the ability to fix problems more quickly when self hosting could be considered an antifeature from the perspective of the employee getting woken up at 3am

laz 12/20/2025||

The worst SEV calls are the one where you twiddle your thumbs waiting for a support rep to drop a crumb of information about the provider outage.

You wake up. It's not your fault. You're helpless to solve it.

OccamsMirror 12/20/2025||

Not when that provider is AWS and the outage is hitting news websites. You share the link to AWS being down and go back to sleep.

sixdonuts 12/21/2025|||

News is one thing, if the app/service down impacts revenue, safety or security you won't be getting any sleep AWS or not.

laz 12/20/2025|||

No. You sit on the call and wait to restore your service to your users. There’s bullshit toil in disabling scale in as the outage gets longer.

Eventually, AWS has a VP of something dial in to your call to apologize. They’re unprepared and offer no new information. The get handed to a side call for executive bullshit.

AWS comes back. Your support rep only vaguely knows what’s going on. Your system serves some errors but digs out.

Then you go to sleep.

jonahx 12/20/2025|||

This is also the basis for most SaaS purchases by large corporations. The old "Nobody gets fired for choosing IBM."

zbentley 12/20/2025||

Really? That might be an anecdote sampled from unusually small businesses, then. Between myself and most peers I’ve ever talked to about availability, I heard an overwhelming majority of folks describe systems that really did need to be up 24/7 with high availability, and thus needed fast 24/7 incident response.

That includes big and small businesses, SaaS and non-SaaS, high scale (5M+rps) to tiny scale (100s-10krps), and all sorts of different markets and user bases. Even at the companies that were not staffed or providing a user service over night, overnight outages were immediately noticed because on average, more than one external integration/backfill/migration job was running at any time. Sure, “overnight on call” at small places like that was more “reports are hardcoded to email Bob if they hit an exception, and integration customers either know Bob’s phone number or how to ask their operations contact to call Bob”, but those are still environments where off-hours uptime and fast resolution of incidents was expected.

Between me, my colleagues, and friends/peers whose stories I know, that’s an N of high dozens to low hundreds.

What am I missing?

runako 12/20/2025||

> What am I missing?

IME the need for 24x7 for B2B apps is largely driven by global customer scope. If you have customers in North American and Asia, now you need 24x7 (and x365 because of little holiday overlap).

That being said, there are a number of B2B apps/industries where global scope is not a thing. For example, many providers who operate in the $4.9 trillion US healthcare market do not have any international users. Similarly the $1.5 trillion (revenue) US real estate market. There are states where one could operate where healthcare spending is over $100B annually. Banks. Securities markets. Lots of things do not have 24x7 business requirements.

zbentley 12/20/2025||

I’ve worked for banks, multiple large and small US healthcare-related companies, and businesses that didn’t use their software when they were closed for the night.

All of those places needed their backend systems to be up 24/7. The banks ran reports and cleared funds with nightly batches—hundreds of jobs a night for even small banking networks. The healthcare companies needed to receive claims and process patient updates (e.g. your provider’s EMR is updated if you die or have an emergency visit with another provider you authorized for records sharing—and no, this is not handled by SaaS EMRs in many cases) over night so that their systems were up to date when they next opened for business. The “regular” businesses closed for the night generated reports and frequently had IT staff doing migrations, or senior staff working on something at midnight due the next day (when the head of marketing is burning the midnight oil on that presentation, you don’t want to be the person explaining that she can’t do it because the file server hosting the assets is down all the time after hours).

And again, that’s the norm I’ve heard described from nearly everyone in software/IT that I know: most businesses expect (and are willing to pay for or at least insist on) 24/7 uptime for their computer systems. That seems true across the board: for big/small/open/closed-off-hours/international/single-timezone businesses alike.

runako 12/20/2025|||

You are right that a lot of systems at a lot of places need 24x7. Obviously.

But there are also a not-insignificant number of important systems where nobody is on a pager, where there is no call rotation[1]. Computers are much more reliable than they were even 20 years ago. It is an Acceptable Business Choice to not have 24x7 monitoring for some subset of systems.

Until very recently[2], Citibank took their public website/user portal offline for hours a week.

1 - if a system does not have a fully staffed call rotation with escalations, it's not prepared for a real off-hours uptime challenge 2 - they may still do this, but I don't have a way to verify right now.

stickfigure 12/21/2025|||

This lasts right up until an important customer can't access your services. Executives don't care about downtime until they have it, then they suddenly care a lot.

true_religion 12/21/2025||

You can often have services available for VIPs, and be down for the public.

Unless there's a misconfiguration, usually apps are always visible internally to staff, so there's an existing methodology to follow to make them visible to VIPs.

But sometimes none of that is necessary. I've seen at a 1B market cap company, a failure case where the solution was manual execution by customer success reps while the computers were down. It was slower, but not many people complained that their reports took 10 minutes to arrive after being parsed by Eye Ball Mk 1s, instead of the 1 minute of wait time they were used to.

sixdonuts 12/21/2025|||

Thousands of orgs have full stack OT/CI apps/services that must run 24/7 365 and are run fully on premise.

chickensong 12/20/2025|||

Uptime is also a sales and marketing point, regardless of real-world usage. Business folks in service-providing companies will usually expect high availability by default, only tempered by the cost and reality of more nines.

Also, in addition to perception/reputation issues, B2B contracts typically include an SLA, and nobody wants to be in breach of contract.

I think the parent you're replying to is wrong, because I've worked at small companies selling into large enterprise, and the expectation is basically 24/7 service availability, regardless of industry.

More comments...