Posted by pavel_lishin 3 days ago
And I really recommend starting with *default* k3s, do not look at any alternatives to cni, csi, networked storage - treat your first cluster as something that can spontaniously fail and don't bother keeping it clean learn as much as you can.
Once you have that, you can use great open-source k8s native controllers which take care of vast majority of requirements when it comes to self-hosting and save more time in the long run than it took to set up and learn these things.
Honerable mentions: k9s, lens(I do not suggest using it in the long-term, but UI is really good as a starting point), rancher webui.
PostgreSQL specifically: https://github.com/cloudnative-pg/cloudnative-pg If you really want networked storage: https://github.com/longhorn/longhorn
I do not recommend ceph unless you are okay with not using shared filesystems as they have a bunch of gotchas or if you want S3 without having to install a dedicated deployment for it.
As someone who has operated Postgres clusters for over a decade before k8s was even a thing, I fully recommend just using a Postgres operator like this one and moving on. The out of box config is sane, it’s easy to override things, and failover/etc has been working flawlessly for years. It’s just the right line between total DIY and the simplicity of having a hosted solution. Postgres is solved, next problem.
With Docker Compose, the abstraction level you're dealing with is containers, which means in this case you're saying "run the postgres image and mount the given config and the given data directory". When running the service, you need to know how to operate the software within the container.
Kubernetes at its heart is an extensible API Server, which allows so called "operators" to create custom resources and react to them. In the given case, this means that a postgres operator defines for example a PostgresDatabaseCluster resource, and then contains control loops to turn these resources into actual running containers. That way, you don't necessarily need to know how postgres is configured and that it requires a data directory mount. Instead, you create a resource that says "give me a postgres 15 database with two instances for HA fail-over", and the operator then goes to work and manages the underlying containers and volumes.
Essentially operators in kubernetes allow you to manage these services at a much higher level.
I will say that even though the StatefulSet pins the pod to a node, it still has advantages. The StatefulSet can be scaled to N nodes, and if one goes down, failover is automatic. Then you have a choice as an admin to either recover the node, or just delete the pod and let the operator recreate it on some other node. When it gets recreated, it resyncs from the new primary and becomes a replica and you’re back to full health, it’s all pretty easy IMO.
I do this for multiple reasons, one is that I find it easier to use Kubernetes as the backend for Patroni, rather than running/securing/maintaining just another etcd cluster. But I also do it for observability, it's much nicer to be able to pull all the metrics and logs from all the components. Sure, it's possible to set that up without Kubernetes, but why if I can have the logs delivered just one way. Plus, I prefer how self-documenting the whole thing is. No one likes YAML manifests, but they are essentially running documentation that can't get out of sync.
Docker compose has always been great for running some containers on a local machine, but I’ve never found it to be great for deployments with lots of physical nodes. k8s is certainly complex, but the complexity really pays off for larger deployments IMO.
For start, ignore operators, ignore custom CSI/CNI, ignore IAM/RBAC. Once you feel good in the basics, you can expand.
k3s provides: csi, cni (cluster storage interface, cluster network interface) which is flannel and and local-pv which just maps volumes to disk (pvcs)
trafeik is what routes your traffic from the outside to inside your cluster (to an ingress resource)
I did this for just under two years, and I've lost count of how many times one or more of the nodes went down and I had to manually deregister it from the cluster with repmgr, clone a new vm and promote a healthy node to primary. I ended up writing an internal wiki page with the steps. I never got it: if one of the purposes of clusters is having higher availability, why did repmgr not handle zombie primaries?
Again, I'm probably just an idiot out of my depth with this. And I probably didn't need a cluster anyway, although with the nodes failing like they did, I didn't feel comfortable moving to a single node setup as well.
I eventually switched to managed postgres, and it's amazing being able to file a sev1 for someone else to handle when things go down, instead of the responsibility being on me.
1. Access to any extension you want and importantly ability to create your own extensions.
2. Being able to run any version you want, including being able to adopt patches ahead of releases.
3. Ability to tune for maximum performance based on the kind of workload you have. If it's massively parallel you can fill the box with huge amounts of memory and screaming fast SSDs, if it's very compute heavy you can spec the box with really tall cores etc.
Self hosting is rarely about cost, it's usually about control for me. Being able to replace complex application logic/types with a nice custom pgrx extension can save massive amounts of time. Similarity using a custom index access method can unlock a step change in performance unachievable without some non-PG solution that would compromise on simplicity by forcing a second data store.
Overall, a good experience. Very stable service and when performance issues did periodically arise, I like that we had full access to all details to understand the root cause and tune details.
Nobody was employeed as a full-time DBA. We had plenty of other things going on in addition to running PostgreSQL.
https://github.com/vitabaks/autobase
automates the deployment and management of highly available PostgreSQL clusters in production environments. This solution is tailored for use on dedicated physical servers, virtual machines, and within both on-premises and cloud-based infrastructures.
I had a single API endpoint performing ~178 Postgres SQL queries.
Setup Latency/query Total time
-------------------------------------------------
Same geo area 35ms 6.2s
Same local network 4ms 712ms
Same server ~0ms 170ms
This is with zero code changes, these time shavings are coming purely from network latency. A lot of devs lately are not even aware of latency costs coming from their service locations. It's crazy!I have a cron sh script to backup to S3 (used to be ftp).
It's not "business grade" but it has also actually NEVER failed. Well once, but I think it was more the container or a swarm thing. I just destroyed and recreated it and it picked up the same volume fine.
The biggest pain point is upgrading as Postgresql can't upgrade the data without the previous version installed or something. It's VERY annoying.