Posted by KraftyOne 1 hour ago
The main benefit is centralizing all the data in one place so we don't need to worry about copying data between multiple systems. Once something becomes the bottleneck, you can eventually migrate to a purpose-specific tool to scale out. To be honest, LISTEN/NOTIFY is in my opinion the most fragile part of PG, but it's fine as a start until you scale out.
I'm working at a scale where almost every day I have to ask people "are you sure you need to treat that as relational data? It doesn't seem relational."
I think if you grow enough to look for these extensions, it's usually better to bet on purpose-specific tooling. For example, I use DuckDB/Iceberg combination extensively for columnar data and connect DuckDB to PG when I need it.
That said, my gamer-brain wants to call this "Save-scumming at scale." Which is to say, a lot of people already know that this approach works, but maybe they haven't made the connection to abstract CS stuff.
Another strategy that can be used to build robustness is to build your workflow out of idempotent operations. That can be useful for situations where the workflow state is too large to back up. Instead, you just run the job from the top and it's a bunch of no-ops until you start making progress again.
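To make the idempotent approach concrete, here is a minimal sketch in Python. It assumes each step can record its completion as an output file on disk; the step names, the `.out` file convention, and the `fail_at` crash switch are all made up for illustration and not taken from any particular workflow library.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def run_step(workdir: Path, name: str, produce: Callable[[], str]) -> bool:
    """Run one step only if its output is missing. Returns True if work was done."""
    out = workdir / f"{name}.out"
    if out.exists():
        return False  # finished on an earlier run, so this call is a no-op
    tmp = workdir / f"{name}.tmp"
    tmp.write_text(produce())
    tmp.replace(out)  # rename into place so a crash mid-write leaves no partial output
    return True


def run_workflow(workdir: Path, fail_at: Optional[str] = None) -> List[str]:
    """Run every step from the top; return the names of steps that did real work."""
    did_work = []
    for name in ("extract", "transform", "load"):  # hypothetical step names
        if name == fail_at:
            raise RuntimeError(f"simulated crash in {name}")
        if run_step(workdir, name, lambda n=name: f"result of {n}"):
            did_work.append(name)
    return did_work


if __name__ == "__main__":
    wd = Path(tempfile.mkdtemp())
    try:
        run_workflow(wd, fail_at="transform")  # first attempt dies partway through
    except RuntimeError:
        pass
    print(run_workflow(wd))  # the rerun skips "extract" and resumes from "transform"
```

Nothing here checkpoints the workflow itself. The only state is the outputs the steps already produced, which is why restarting from the top is safe.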
https://lucumr.pocoo.org/2025/11/3/absurd-workflows/
https://github.com/earendil-works/absurd
https://earendil-works.github.io/absurd/
I've not used it, but it's worth comparing to other options.
Postgres is not cheap to run in the cloud at scale. We went for the cheapest infra, which is basically the disk storage.
https://github.com/agentspan-ai/agentspan, which is essentially an agentic SDK layer for Conductor, can convert any of your LangGraph, OpenAI, Vercel, or ADK agents, making them durable and adding orchestration with no code changes.
I have used Temporal in the past and it works really well. My only problem with it was the limits on request payload and event sizes, which created some inconveniences for us when building solutions. It also enforces good engineering practices, but sometimes you don't want to write special logic so that, if your CSV file is larger than 2 MB, you upload it to S3, pass a link, and then download it in the workflow.
What is your experience with DBOS? How does it compare to Temporal in terms of operational complexity, feature parity and anything else?
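For anyone who hasn't had to write that special logic, this is roughly what it looks like. It's only a sketch: `BLOB_STORE` is an in-memory dict standing in for S3, `pack_payload` and `unpack_payload` are hypothetical helper names, and the 2 MB threshold is just the figure from the comment above. None of this is Temporal's API.

```python
import uuid

MAX_INLINE_BYTES = 2 * 1024 * 1024  # assumed limit, for illustration only

BLOB_STORE = {}  # stand-in for S3 or any other object store


def pack_payload(data: bytes) -> dict:
    """Caller side: pass small data inline, otherwise store it and pass a reference."""
    if len(data) <= MAX_INLINE_BYTES:
        return {"kind": "inline", "data": data}
    key = f"payloads/{uuid.uuid4()}"
    BLOB_STORE[key] = data  # with real S3 this would be an upload
    return {"kind": "ref", "key": key}


def unpack_payload(payload: dict) -> bytes:
    """Workflow side: resolve whatever was passed back into bytes."""
    if payload["kind"] == "inline":
        return payload["data"]
    return BLOB_STORE[payload["key"]]  # with real S3 this would be a download


if __name__ == "__main__":
    big_csv = b"x" * (MAX_INLINE_BYTES + 1)
    handle = pack_payload(big_csv)
    print(handle["kind"])  # the oversized file travels as a reference
    assert unpack_payload(handle) == big_csv
```

Every step that might touch a large input ends up going through both helpers, which is the inconvenience being described.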
Then I tried their Cloud offering and was appalled at their pricing. I burned through the $1,000 free credits before I even got something to production. Didn't want to bother with running a local Temporal, either.
Best solution is to just take inspiration from their architecture and then do it yourself in Postgres, IMO.
Temporal is, in my opinion, having run it in prod for over a year, poorly designed, slow and ridiculously heavy infra-wise.
If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day, you're going to spend millions on infra, and it's still going to absolutely suck.
Try running their own benchmarks, the numbers are pathetic.
Their sales team is also absolutely appalling and desperate.
From a developer standpoint, the SDK is quite nice though.
Don't get trapped into Nexus, and if the sales team calls you, make sure legal is in the room.
https://github.com/temporalio/temporal/blob/e22e6304b3c4a409...
https://github.com/temporalio/temporal/blob/e22e6304b3c4a409...
Temporal does a crazy amount of database operations and all of these are behind that mutex.
Oh, and you can't change the shard count on existing clusters.
Great stuff.
I recently developed a distributed queue and it works really well. It benchmarks well too, with no race conditions or conflicts. I used SKIP LOCKED so that workers can compete safely.
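For reference, the usual shape of a SKIP LOCKED dequeue looks something like the sketch below. The `jobs` table and its `id`, `status`, `payload` and `started_at` columns are assumptions made for the example, and `conn` is any DB-API style Postgres connection (psycopg, for instance). This is not the queue described above, just the common pattern.

```python
# Assumed schema: jobs(id, status, payload, started_at). Adjust to your own table.
CLAIM_JOB_SQL = """
UPDATE jobs
SET status = 'running', started_at = now()
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_job(conn):
    """Claim one pending job and return (id, payload), or None if none is free."""
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_JOB_SQL)
        row = cur.fetchone()  # None when every pending row is locked or absent
        conn.commit()
        return row
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Rows that another worker has already locked are skipped rather than waited on, so concurrent workers each pick up a different job instead of queueing behind the same row.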
You can also have multiple workers across nodes avoid conflicts by using session-wide mutexes, i.e. pg_advisory_lock.
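A rough sketch of the advisory lock idea, using the non-blocking `pg_try_advisory_lock` variant so a worker can move on if another node already holds the lock. The `advisory_lock` and `lock_key` helpers are hypothetical names, the `%s` placeholder assumes a psycopg-style driver, and hashing the lock name with CRC32 is just one convenient way to get an integer key.

```python
import zlib
from contextlib import contextmanager


def lock_key(name: str) -> int:
    """Map a lock name to a stable integer key (a CRC32 value fits in a bigint)."""
    return zlib.crc32(name.encode("utf-8"))


@contextmanager
def advisory_lock(conn, name: str):
    """Yield True if this session got the lock, False if another session holds it."""
    key = lock_key(name)
    cur = conn.cursor()
    cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
    acquired = bool(cur.fetchone()[0])
    try:
        yield acquired
    finally:
        if acquired:
            # Session-level locks survive commits, so release explicitly.
            cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
        cur.close()
```

Usage would look like `with advisory_lock(conn, "nightly-report") as got:` followed by doing the work only when `got` is true. Postgres also drops session-level advisory locks when the connection closes, so a crashed worker does not hold the lock forever.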
Edit: Actually I checked this again and apparently the advice has now changed to the inverse.