Show HN: Hatchet v1 – A task orchestration platform built on Postgres

Posted by abelanger 4/3/2025

Show HN: Hatchet v1 – A task orchestration platform built on Postgres(github.com)

Hey HN - this is Alexander from Hatchet. We’re building an open-source platform for managing background tasks, using Postgres as the underlying database.

Just over a year ago, we launched Hatchet as a distributed task queue built on top of Postgres with a 100% MIT license (https://news.ycombinator.com/item?id=39643136). The feedback and response we got from the HN community was overwhelming. In the first month after launching, we processed about 20k tasks on the platform — today, we’re processing over 20k tasks per minute (>1 billion per month).

Scaling up this quickly was difficult — every task in Hatchet corresponds to at minimum 5 Postgres transactions and we would see bursts on Hatchet Cloud instances to over 5k tasks/second, which corresponds to roughly 25k transactions/second. As it turns out, a simple Postgres queue utilizing FOR UPDATE SKIP LOCKED doesn’t cut it at this scale. After provisioning the largest instance type that CloudSQL offers, we even discussed potentially moving some load off of Postgres in favor of something trendy like Clickhouse + Kafka.

But we doubled down on Postgres, and spent about 6 months learning how to operate Postgres databases at scale and reading the Postgres manual and several other resources [0] during commutes and at night. We stuck with Postgres for two reasons:

1. We wanted to make Hatchet as portable and easy to administer as possible, and felt that implementing our own storage engine specifically on Hatchet Cloud would be disingenuous at best, and in the worst case, would take our focus away from the open source community.

2. More importantly, Postgres is general-purpose, which is what makes it both great but hard to scale for some types of workloads. This is also what allows us to offer a general-purpose orchestration platform — we heavily utilize Postgres features like transactions, SKIP LOCKED, recursive queries, triggers, COPY FROM, and much more.

Which brings us to today. We’re announcing a full rewrite of the Hatchet engine — still built on Postgres — together with our task orchestration layer which is built on top of our underlying queue. To be more specific, we’re launching:

1. DAG-based workflows that support a much wider array of conditions, including sleep conditions, event-based triggering, and conditional execution based on parent output data [1].

2. Durable execution — durable execution refers to a function’s ability to recover from failure by caching intermediate results and automatically replaying them on a retry. We call a function with this ability a durable task. We also support durable sleep and durable events, which you can read more about here [2]

3. Queue features such as key-based concurrency queues (for implementing fair queueing), rate limiting, sticky assignment, and worker affinity.

4. Improved performance across every dimension we’ve tested, which we attribute to six improvements to the Hatchet architecture: range-based partitioning of time series tables, hash-based partitioning of task events (for updating task statuses), separating our monitoring tables from our queue, buffered reads and writes, switching all high-volume tables to use identity columns, and aggressive use of Postgres triggers.

We've also removed RabbitMQ as a required dependency for self-hosting.

We'd greatly appreciate any feedback you have and hope you get the chance to try out Hatchet.

[0] https://www.postgresql.org/docs/

[1] https://docs.hatchet.run/home/conditional-workflows

[2] https://docs.hatchet.run/home/durable-execution

240 points | 74 commentspage 3

anentropic 4/4/2025|

Quick feedback:

Would love to see some sort of architecture overview in the docs

The top-level docs have a section on "Deploying workers" but I think there are more components than that?

It's cool there's a Helm chart but the docs don't really say what resources it would deploy

https://docs.hatchet.run/self-hosting/docker-compose

...shows four different Hatchet services plus, unexpectedly, both a Postgres server and RabbitMQ. Can't see anywhere that describes what each one of those does

Also in much of the docs it's not very clear where the boundary between Hatchet Cloud and Hatchet the self-hostable OSS part lies

gabrielruttner 4/4/2025|

Thanks for this feedback, we'll add some details and an architecture diagram.

The simplest way to run hatchet is with `hatchet-lite`[0] which bundles all internal services. For most deployments we recommend running these components separately hence the multiple services in the helm chart [1]. RabbitMQ is now an optional dependency which is used for internal-service messages for higher throughput deployments [2].

Your workers are always run as a separate process.

[0] https://docs.hatchet.run/self-hosting/hatchet-lite

[1] https://docs.hatchet.run/self-hosting/improving-performance#...

[2] https://hatchet.run/launch-week-01/pg-only-mode

edit: missed your last question -- currently self-host includes everything in cloud except managed workers

bluelightning2k 4/4/2025||

Is this Python only?

More importantly: can this be used to run untrusted jobs? E.g. user-supplied or AI supplied code?

wilted-iris 4/3/2025||

This looks very cool! I see a lot of Python in the docs; is it usable in other languages?

abelanger 4/3/2025|

Thanks! There are SDKs for Python, Typescript and Go. We've gotten a lot of requests for other SDKs which we're tracking here: https://github.com/hatchet-dev/hatchet/discussions/436

throwaway9w4 4/3/2025||

Is there any documentation of the api, so that someone can call it directly without going through the sdk?

abelanger 4/3/2025||

We use gRPC on our workers. All API specs can be found here: https://github.com/hatchet-dev/hatchet/tree/main/api-contrac...

However, the SDKs are very tightly integrated with the runtime in each language, and we use gRPC on the workers which will make it more difficult to call the APIs directly.

pkiv 4/4/2025||

Congrats on the launch guys!

bomewish 4/3/2025||

Why not fix all the broken doc links and make sure you have the full sdk spec down first, ready to go? Then drop it all at once, when it’s actually ready. That’s better and more respectful of users. I love the product and want y’all to succeed but this came off as extremely unprofessional.

abelanger 4/3/2025|

Really appreciate the candid feedback, and glad to hear you like the product. We ran a broken links checker against our docs, but it's possible we missed something. Is there anywhere you're seeing a broken link?

Re SDK specs -- I assume you mean full SDK API references? We're nearly at the point where those will be published, and I agree that they would be incredibly useful.

szvsw 4/4/2025||

I’ve been using Hatchet since the summer, and really do love it over celery. I’ve been using Hatchet for academic research experiments with embarrassingly parallel tasks - ie thousands of simultaneous tasks just with different inputs, each CPU bound and on the order of 10s-2min, totaling in the millions of tasks per experiment - and it’s been going great. I think the team is putting together a very promising product. Switching from a roll-my-own SQS+AWS batch system to Hatchet has made my research life so much better. Though part of that also probably comes from the forced improvements you get when re-designing a system a second time.

Although there was support for pydantic validation in v0, now that the v1 SDK has arrived, I would definitely say that the #1 distinguishing feature (at least from a dx perspective) for anyone thinking of switching from Celery or working on a greenfield project is the type safety that comes with the first class pydantic support in v1. That is a huge boon in my opinion.

Another big boon for me was that the combo of both Python and Typescript SDKs - being able to integrate things into frontend demos without having to set up a separate Python api is great.

There are a couple rough edges around asyncio/single worker concurrency IMO - for instance, choosing between 100 workers each with capacity for 8 concurrent task runs vs 800 workers each with capacity for 1 concurrent task run. In Celery it’s a little bit easier to launch a worker node which uses separate processes to handle its concurrent tasks, whereas right now with Hatchet, that’s not possible as far as I am aware, due to how asyncio is used to handle the concurrent task runs which a single worker may be processing. If most of your work is IO bound or already asyncio friendly, this does not really affect you and you can safely use eg a worker with 8x task run capacity, but if you are CPU bound there might be some cases where you would prefer the full process isolation and feel more assured that you are maximally utilizing all your compute in a given node, and right now the best way to do that is only through horizontal scaling or 1x task workers I think. Generally, if you do not have a great mental model already of how Python handles asyncio, threads, pools, etc, the right way to think about this stuff can be a little confusing IMO, but the docs on this from Hatchet have improved. In the future though, I’d love to see an option to launch a Python worker with capacity for multiple simultaneous task runs in separate processes, even if it’s just a thin wrapper around launching separate workers under the hood.

There are also a couple of rough edges in the dashboard right now, but the team has been fixing them, and coming from celery/flower or SQS, it’s already such an improved dashboard/monitoring experience that I can’t complain!

It’s hard to describe, but there is just something fun about working with Hatchet for me, compared to Celery or my previous SQS system. Almost all of the design decision just align with what I would desire, and feel natural.

revskill 4/4/2025||

Confusing docs as there is no setup self hosted for postgres.

abelanger 4/4/2025||

Hey there - all of our self-hosting docs show you how to set up Postgres: https://docs.hatchet.run/self-hosting

Would love to hear more about what you found confusing!

curtisszmania 4/4/2025||

[dead]

tombhowl 4/3/2025|

[dead]