Posted by pavel_lishin 3 days ago

Go ahead, self-host Postgres (pierce.dev)
671 points | 395 comments | page 2
kachapopopow 3 days ago|
Since this is on the front page (again?), I guess I'll chime in: learn Kubernetes - it's worth it. It took me three attempts to finally wrap my head around it, so I really suggest trying out many different things and seeing what works for you.

And I really recommend starting with *default* k3s - don't look at any alternatives for CNI, CSI, or networked storage. Treat your first cluster as something that can spontaneously fail, don't bother keeping it clean, and learn as much as you can.

Once you have that, you can use great open-source k8s-native controllers which take care of the vast majority of self-hosting requirements and save more time in the long run than it took to set up and learn these things.

Honorable mentions: k9s, Lens (I don't suggest using it long-term, but the UI is really good as a starting point), and the Rancher web UI.

PostgreSQL specifically: https://github.com/cloudnative-pg/cloudnative-pg

If you really want networked storage: https://github.com/longhorn/longhorn

I do not recommend Ceph unless you're okay with forgoing shared filesystems (they have a bunch of gotchas), or unless you want S3 without having to install a dedicated deployment for it.

ninkendo 3 days ago||
At $WORK we’ve been using the Zalando Postgres kubernetes operator to great success: https://github.com/zalando/postgres-operator

As someone who operated Postgres clusters for over a decade before k8s was even a thing, I fully recommend just using a Postgres operator like this one and moving on. The out-of-the-box config is sane, it's easy to override things, and failover etc. has been working flawlessly for years. It's just the right line between total DIY and the simplicity of a hosted solution. Postgres is solved; next problem.

vovavili 3 days ago||
For something like a database, what is the added advantage to using Kubernetes as opposed to something simple like Docker Compose?
alex23478 2 days ago|||
In this case the advantage is operators for running Postgres.

With Docker Compose, the abstraction level you're dealing with is containers, which means in this case you're saying "run the postgres image and mount the given config and the given data directory". When running the service, you need to know how to operate the software within the container.

Kubernetes at its heart is an extensible API server, which allows so-called "operators" to define custom resources and react to them. In this case, that means a Postgres operator defines, for example, a PostgresDatabaseCluster resource, and contains control loops that turn those resources into actual running containers. That way, you don't necessarily need to know how Postgres is configured or that it requires a data directory mount. Instead, you create a resource that says "give me a Postgres 15 database with two instances for HA fail-over", and the operator goes to work managing the underlying containers and volumes.

Essentially operators in kubernetes allow you to manage these services at a much higher level.
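As a concrete sketch of what that looks like: with the Zalando operator mentioned above, the "Postgres 15 with two instances" request is just a small custom resource (names and sizes here are illustrative, not an exact working manifest):

```yaml
# minimal Zalando-operator-style cluster manifest (values illustrative)
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-example-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2      # primary + one replica for HA fail-over
  volume:
    size: 10Gi              # the operator provisions the volumes itself
  postgresql:
    version: "15"
```

Apply that, and the operator's control loop creates the StatefulSet, volumes, and replication config underneath.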

mystifyingpoi 3 days ago||||
Docker Compose (ignoring Swarm, which seems to be obsolete) manages containers on a single machine. With Kubernetes, the pod that hosts the database is a pod like any other (I assume). It gets moved to a healthy machine when a node goes bad, respects CPU/mem limits, works with generic monitoring tools, can be deployed from GitOps tools, etc. All the k8s goodies apply.
Nextgrid 2 days ago||
When it comes to a DB moving the process around is easy, it's the data that matters. The reason bare-metal-hosted DBs are so fast is that they use direct-attach storage instead of networked storage. You lose those speed advantages if you move to distributed storage (Ceph/etc).
ninkendo 2 days ago||
You don’t need to use networked storage, the zalando postgres operator just uses local storage on the host. It uses a StatefulSet underneath so that pods will stay on the same node until you migrate them.
Nextgrid 2 days ago||
But if I'm pinning it to dedicated machines then Kubernetes does not give me anything, but I still have to deal with its tradeoffs and moving parts - which from experience are more likely to bring me down than actual hardware failure.
ninkendo 2 days ago|||
It’s not like anyone’s recommending you set up k8s just to use Postgres. The advice is that, if you’re already using k8s, the Postgres operator is pretty great, and you should try it instead of using a hosted Postgres offering or having a separate set of dedicated (non-k8s) servers just for Postgres.

I will say that even though the StatefulSet pins the pod to a node, it still has advantages. The StatefulSet can be scaled to N nodes, and if one goes down, failover is automatic. Then you have a choice as an admin to either recover the node, or just delete the pod and let the operator recreate it on some other node. When it gets recreated, it resyncs from the new primary and becomes a replica and you’re back to full health, it’s all pretty easy IMO.
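The recovery choice described above is a one-liner in practice - a hedged sketch (pod and label names invented for illustration):

```shell
# option: delete the failed pod; the StatefulSet recreates it and the
# operator resyncs it from the new primary as a replica
kubectl delete pod my-cluster-1

# watch it rejoin and return to Running
kubectl get pods -l cluster-name=my-cluster -w
```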

lukaslalinsky 2 days ago|||
I run PostgreSQL+Patroni on Kubernetes where each instance is a separate StatefulSet pinned to dedicated hosts, with data on local ZFS volumes, provisioned by the OpenEBS controller.

I do this for multiple reasons, one is that I find it easier to use Kubernetes as the backend for Patroni, rather than running/securing/maintaining just another etcd cluster. But I also do it for observability, it's much nicer to be able to pull all the metrics and logs from all the components. Sure, it's possible to set that up without Kubernetes, but why if I can have the logs delivered just one way. Plus, I prefer how self-documenting the whole thing is. No one likes YAML manifests, but they are essentially running documentation that can't get out of sync.

ninkendo 2 days ago||||
The assumption is that you’re already using Kubernetes, sorry.

Docker compose has always been great for running some containers on a local machine, but I’ve never found it to be great for deployments with lots of physical nodes. k8s is certainly complex, but the complexity really pays off for larger deployments IMO.

kachapopopow 2 days ago|||
I hate that this is starting to sound like a bot Q&A, but the primary advantages are secure remote configuration, platform agnosticism, multi-node orchestration, built-in load balancing and a services framework, way more networking control than Docker, better security, self-healing - and the list goes on. You have to read more about it to really understand the advantages over Docker.
satvikpendem 3 days ago|||
Check out canine.sh, it's to Kubernetes what Coolify or Dokploy is to Docker, if you're familiar with self hosted open source PaaS.
chuckadams 3 days ago|||
And on a similar naming note yet totally unrelated, check out k9s, which is a TUI for Kubernetes cluster admin. All kinds of nifty features built-in, and highly customizable.
satvikpendem 3 days ago||
If we're talking about CLIs, check out Kamal, the deploy tool that 37signals / Basecamp / DHH developed, specifically to move off the cloud. I think it uses Kubernetes, but I'm not positive - it might just be Docker.
Nextgrid 2 days ago||
It's just Docker - it SSHes into the target servers and runs `docker` commands as needed.
kachapopopow 3 days ago|||
I just push to git where there is a git action to automatically synchronize deployments
chandureddyvari 3 days ago|||
Any good recommendations you got for learning kubernetes for busy people?
mystifyingpoi 2 days ago|||
No path for busy people, unfortunately. Learn everything from the ground up: containers, then Compose, then k3s, maybe on to kubeadm or a hosted offering. The huge abstraction layers in Kubernetes serve their purpose well, but can screw you up when anything goes slightly wrong at an upper layer.

For start, ignore operators, ignore custom CSI/CNI, ignore IAM/RBAC. Once you feel good in the basics, you can expand.

kachapopopow 2 days ago|||
k3sup a cluster, then ask an AI how to serve an nginx static site using Traefik on it, and have it explain every step and what it does (it should produce: a ConfigMap, a Deployment, a Service, and an Ingress).

k3s provides a CNI and CSI (Container Network Interface, Container Storage Interface) out of the box: Flannel for networking, and local-path storage, which just maps volumes (PVCs) to disk.

Traefik is what routes your traffic from outside to inside your cluster (via an Ingress resource).
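For reference, the four resources that exercise should produce look roughly like this (a sketch with invented names; k3s's bundled Traefik picks up plain Ingress resources like this one):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: static-site
data:
  index.html: "<h1>hello</h1>"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: static-site
spec:
  replicas: 1
  selector:
    matchLabels: { app: static-site }
  template:
    metadata:
      labels: { app: static-site }
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          volumeMounts:
            - { name: html, mountPath: /usr/share/nginx/html }
      volumes:
        - name: html
          configMap: { name: static-site }
---
apiVersion: v1
kind: Service
metadata:
  name: static-site
spec:
  selector: { app: static-site }
  ports:
    - { port: 80, targetPort: 80 }
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: static-site
spec:
  rules:
    - host: example.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service: { name: static-site, port: { number: 80 } }
```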

groundzeros2015 2 days ago||
Are you working on websites with millions of hourly visits?
lbrito 3 days ago||
I'm probably just an idiot, but I ran unmanaged postgres on Fly.io, which is basically self hosting on a vm, and it wasn't fun.

I did this for just under two years, and I've lost count of how many times one or more of the nodes went down and I had to manually deregister it from the cluster with repmgr, clone a new VM, and promote a healthy node to primary. I ended up writing an internal wiki page with the steps. I never got it: if one of the purposes of clustering is higher availability, why did repmgr not handle zombie primaries?
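For the curious, that runbook amounted to roughly the following - a hedged sketch using repmgr's CLI, with node IDs and hosts invented for illustration, not the exact wiki page:

```shell
# 1. on the healthiest standby: promote it to primary
repmgr -f /etc/repmgr.conf standby promote

# 2. drop the dead/zombie primary from the cluster metadata
repmgr -f /etc/repmgr.conf primary unregister --node-id=2

# 3. on a freshly cloned VM: clone from the new primary and register as standby
repmgr -h new-primary -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
repmgr -f /etc/repmgr.conf standby register
```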

Again, I'm probably just an idiot out of my depth with this. And I probably didn't need a cluster anyway, although with the nodes failing like they did, I didn't feel comfortable moving to a single-node setup either.

I eventually switched to managed postgres, and it's amazing being able to file a sev1 for someone else to handle when things go down, instead of the responsibility being on me.

indigodaddy 3 days ago|
Assuming you are using fly's managed postgres now?
lbrito 2 days ago||
Yep
jpgvm 2 days ago||
Beyond the usual points there are some other important factors to consider self-hosting PG:

1. Access to any extension you want and importantly ability to create your own extensions.

2. Being able to run any version you want, including being able to adopt patches ahead of releases.

3. Ability to tune for maximum performance based on the kind of workload you have. If it's massively parallel you can fill the box with huge amounts of memory and screaming fast SSDs, if it's very compute heavy you can spec the box with really tall cores etc.
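Point 3 in practice is a handful of postgresql.conf knobs that hosted offerings often cap or hide - the values here are purely illustrative, for a hypothetical big-memory, fast-SSD box:

```ini
# illustrative postgresql.conf tuning for a large-memory, NVMe box
shared_buffers = 64GB
effective_cache_size = 192GB
max_parallel_workers_per_gather = 8   # for massively parallel workloads
maintenance_work_mem = 4GB
random_page_cost = 1.1                # fast SSDs make random I/O cheap
```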

Self-hosting is rarely about cost; it's usually about control for me. Being able to replace complex application logic/types with a nice custom pgrx extension can save massive amounts of time. Similarly, using a custom index access method can unlock a step change in performance unachievable without some non-PG solution that would compromise on simplicity by forcing a second data store.

ijustlovemath 3 days ago||
And if you want Supabase-like functionality, I'm a huge fan of PostgREST (which is actually how Supabase works/worked under the hood). Make a view for your application and boom, you have a GET-only REST API. Add a plpgsql function, and now you can POST. It uses JWT for auth, but usually I have the application on the same VLAN as the DB, so it's not as ripe for abuse.
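The view/function pattern being described is tiny in practice - a sketch with an invented `api` schema and table names:

```sql
-- a view PostgREST exposes as GET /active_users
create view api.active_users as
  select id, email from users where active;

-- a function PostgREST exposes as POST /rpc/create_user
create function api.create_user(p_email text) returns void
language plpgsql as $$
begin
  insert into users (email) values (p_email);
end;
$$;
```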
satvikpendem 3 days ago|
You can self host Supabase too.
SamDc73 2 days ago||
Last time I checked, it was a pain in the ass to self-host it
satvikpendem 1 day ago||
I assume by their own design and also because there are a lot of moving pieces they packaged up together.
markstos 3 days ago||
I hosted PostgreSQL professionally for over a decade.

Overall, a good experience. Very stable service and when performance issues did periodically arise, I like that we had full access to all details to understand the root cause and tune details.

Nobody was employed as a full-time DBA. We had plenty of other things going on in addition to running PostgreSQL.

jbmsf 2 days ago||
I started in this industry before cloud was a thing. I did most of the things RDS does the hard way (except being able to dynamically increase memory on a running instance, that's magic to me). I do not want that responsibility, especially because I know how badly it turns out when it's one of a dozen (or dozens) of responsibilities asked of the team.
arichard123 3 days ago||
I've been self hosting it for 20 years. Best technical decision I ever made. Rock solid
newsoftheday 3 days ago|
I've been self-hosting it for at least 10 years - it and MySQL, MySQL for longer. No issues self-hosting either one. I have backups and I know they work.
moxplod 3 days ago||
What server company are you guys using with high reliability? Looking for a server in US-East right now.
vitabaks 2 days ago||
Just use Autobase for PostgreSQL

https://github.com/vitabaks/autobase

It automates the deployment and management of highly available PostgreSQL clusters in production environments. The solution is tailored for dedicated physical servers, virtual machines, and both on-premises and cloud-based infrastructure.

sergiotapia 2 days ago||
Some fun math for you guys.

I had a single API endpoint performing ~178 Postgres SQL queries.

  Setup              Latency/query   Total time
  -------------------------------------------------
  Same geo area      35ms            6.2s
  Same local network 4ms             712ms
  Same server        ~0ms            170ms
This is with zero code changes; the time savings come purely from network latency. A lot of devs these days aren't even aware of the latency costs of their service locations. It's crazy!
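The arithmetic behind the table is just per-query round trips multiplying out - a quick sanity check (the "~0ms" row works out to roughly 1ms per query on-box):

```python
queries = 178

def total_ms(per_query_latency_ms):
    # sequential queries: each one pays the full network round trip
    return queries * per_query_latency_ms

print(total_ms(35))             # same geo area      -> 6230 ms (~6.2 s)
print(total_ms(4))              # same local network -> 712 ms
print(round(170 / queries, 2))  # same server: 170 ms total -> ~0.96 ms/query
```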
conradfr 2 days ago|
I've been self-hosting PostgreSQL for 12+ years at this point. Directly on bare metal back then, and now in a container with CapRover.

I have a cron shell script to back up to S3 (it used to be FTP).

It's not "business grade" but it has also actually NEVER failed. Well once, but I think it was more the container or a swarm thing. I just destroyed and recreated it and it picked up the same volume fine.

The biggest pain point is upgrading, as PostgreSQL can't upgrade the data directory without the previous major version's binaries installed. It's VERY annoying.
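That pain is pg_upgrade needing both the old and new major-version binaries on the box at once - the usual dance looks roughly like this (paths are Debian-flavored examples, not universal):

```shell
# illustrative major-version upgrade with pg_upgrade
# both old and new binaries must be installed side by side
/usr/lib/postgresql/16/bin/pg_upgrade \
  --old-bindir=/usr/lib/postgresql/15/bin \
  --new-bindir=/usr/lib/postgresql/16/bin \
  --old-datadir=/var/lib/postgresql/15/main \
  --new-datadir=/var/lib/postgresql/16/main \
  --link   # hard-link files instead of copying: faster, but no going back
```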
