
Posted by enether 10/29/2025

Kafka is Fast – I'll use Postgres (topicpartition.io)
559 points | 392 comments
justinhj 10/29/2025|
As engineers we should try to use the right tool for the job, which means thinking about the development team's strengths and weaknesses as well as differentiating factors your product should focus on. Often we are working in the cloud and it's much easier to use a queue or a log database service than manage a bunch of sql servers and custom logic. It can be more cost effective too once you factor in the development time and operational costs.

The fact that there is no common library implementing the author's strategy is a good sign that there isn't much demand for it.

jasonthorsness 10/29/2025||
Using a single DBMS for many purposes because it is so flexible and “already there” from an operations perspective is something I’ve seen over and over again. It usually goes wrong eventually with one workload/use screwing up others but maybe that’s fine and a normal part of scaling?

I think a bigger issue is the DBMS themselves getting feature after feature and becoming bloated and unfocused. Add the thing to Postgres because it is convenient! At least Postgres has a decent plugin approach. But I think more use cases might be served by standalone products than by add-ons.

quaunaut 10/29/2025|
It's a normal part of scaling because often bringing in the new technology introduces its own ways of causing the exact same problems. Often they're difficult to integrate into automated tests so folks mock them out, leading to issues. Or a configuration difference between prod/local introduces a problem.

Your DB, on the other hand, is usually a well-understood part of your system, and while scaling issues like that can cause problems, they're often fairly easy to predict, just unfortunate in timing. This means that while they'll cause disruption, it's usually resolved quickly, which you can't always say for additional systems.

udave 7 days ago||
I find the distinction between a queue and a pub/sub system quite poor. A pub/sub system is just a persistent queue at its core; the only difference is that you have one queue per subscriber, hence multiple readers. Everything else stays the same. Ordering is expected to be strict in both cases, and durability is baked into both systems. On the question of bounded vs. unbounded queues: don't message queues also spill to disk to prevent OOM scenarios?
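The point that pub/sub is essentially a persistent queue with one read cursor per subscriber can be sketched in a few lines of plain Python. This is a toy model, not any real broker's API; all names are made up:

```python
class Log:
    """A toy append-only log with independent per-subscriber cursors."""

    def __init__(self):
        self.entries = []   # durable, ordered storage
        self.cursors = {}   # subscriber name -> next offset to read

    def publish(self, message):
        self.entries.append(message)

    def subscribe(self, name):
        self.cursors[name] = 0   # each subscriber starts at the beginning

    def poll(self, name):
        """Return the next unread message for this subscriber, or None."""
        offset = self.cursors[name]
        if offset >= len(self.entries):
            return None
        self.cursors[name] = offset + 1
        return self.entries[offset]

log = Log()
log.subscribe("billing")
log.subscribe("audit")
log.publish("order-created")

# Both subscribers see the same message, in the same strict order,
# because they differ only in where their cursor points.
print(log.poll("billing"))  # order-created
print(log.poll("audit"))    # order-created
```

With a single cursor shared by all readers, the same structure degenerates into a plain work queue, which is exactly the "only distinction" being argued.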
woile 7 days ago||
There are a few things missing I think.

I think Kafka makes it easy to create an event-driven architecture. This is particularly useful when you have many teams, since they are properly isolated from each other.

And with many teams comes another problem: there's no guarantee that queries are going to be properly written, so Postgres's performance may suffer.

Given this, I think using Kafka in companies with many teams can be useful, even if the data they move is not insanely big.

Copenjin 10/29/2025||
I'm not really convinced by the comment recommending NOTIFY over (at least in theory inferior) polling. If the global queue is really global, I'd expect it to be only a temporary location for collecting notifications before sending them, not a bottleneck. I've never benchmarked this with PG or Oracle (which has a similar feature), but I expect that, depending on the polling frequency and the average amount of updates, either solution could be the best depending on the circumstances.
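The polling-vs-NOTIFY trade-off really does hinge on poll frequency vs. update rate. A back-of-envelope model (assumptions: uniform arrivals, one query per poll, all numbers illustrative) makes the tension visible:

```python
def polling_cost(poll_interval_s, updates_per_s):
    """Back-of-envelope model of polling a queue table.

    Assumes updates arrive uniformly at random, so a new row waits on
    average half a poll interval before it is seen.  Every poll costs
    one query, whether or not anything arrived.
    """
    avg_latency_s = poll_interval_s / 2
    queries_per_s = 1 / poll_interval_s
    wasted_polls = max(0.0, queries_per_s - updates_per_s)  # polls that find nothing
    return avg_latency_s, queries_per_s, wasted_polls

# Fast polling: low latency, but most queries come back empty at low volume.
print(polling_cost(0.1, 2))

# Slow polling: almost no idle queries, but latency grows accordingly.
print(polling_cost(5.0, 2))
```

NOTIFY removes the wasted-poll term at the cost of the listener machinery, which is why neither approach dominates across all workloads.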
cpursley 10/29/2025||
Related: https://www.pgflow.dev

It's built on pgmq and not married to Supabase (nearly everything is in the database).

Postgres is enough.

GrumpyGoblin 7 days ago||
There is another aspect that many people aren't discussing: communication.

For a medium to large organization with independent programs that need to talk to each other, Kafka provides an essential capability that would be much slower and higher risk with Postgres.

Standardizing the flow of information across an organization is difficult, and Kafka is crucial for that. To achieve it in Postgres, you'd need either a shared database, which is inherently risky, or a custom API for access, which introduces another performance bottleneck, adds build/maintenance cost, and slows development. So with an API you get a double whammy of performance degradation.

And for multiple consumers operating on the same events (for example: write to storage, perform an action, send to a data lake), a database needs a multiple more access: N×Q queries, where N is the number of consumers and Q the queries each needs to consume. With three consumers you're tripling your database queries, which adds up fast across topics. Now you need to start fixing indexes, creating views, and doing other work to keep performance optimal. And at some point you're just poorly recreating Kafka in a database.
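The N-fold read amplification can be illustrated with a toy in-memory model of consumer groups independently polling the same events table (hypothetical batch sizes and consumer names, not a real benchmark):

```python
# Toy model: N consumers each drain the same "events table" with their
# own batch queries, so reads scale with the number of consumers even
# though the data was written only once.

events = [f"event-{i}" for i in range(100)]   # the "table"
query_count = 0

def consume_all(last_seen):
    """Simulates one consumer draining the table with repeated batch queries."""
    global query_count
    batch_size = 10
    seen = last_seen
    while seen < len(events):
        query_count += 1                       # one SELECT per batch
        batch = events[seen:seen + batch_size]
        seen += len(batch)
    return seen

consumers = ["storage-writer", "action-runner", "lake-exporter"]
for _ in consumers:
    consume_all(0)

# 100 events, batches of 10, 3 consumers: 30 queries to read data written once.
print(query_count)  # 30
```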

The common denominator in every "which is better" debate is always the use case. This article seems like it would primarily apply to small organizations or limited consumer needs. And yeah, at that point why are you using events in the first place? Use a single API or database and be done with it. This is where the buzzword thing is relevant: if you're using Kafka for a single team, single database, small organization, it's overkill.

Side note: someone mentioned Postgres as an audit log. Oh god. Done it. It was a nightmare. We ended up migrating to pub/sub with long-term storage in Mongo, which solved significant performance issues. An audit log is inherently write-once-read-many; there is no advantage to storing it in a relational database.

asah 7 days ago||
"500 KB/s workload should not use Kafka" - yes, indeed! I'm running a 5 MB/s logging system through a single-node RDS instance costing <$1,000/mo (plus 2x for failover). There's easily 4-10x headroom for growth by paying AWS more money, and another 3-5x of savings by optimizing the data structure.
EdwardDiego 7 days ago|
I've always said, don't even think about Kafka until you're into MiB/s territory.

It's a complex piece of software that solves a complex problem, but there are many trade-offs, so only use it when you need to.

Sparkyte 10/29/2025||
You can also use Redis as a queue if the data isn't too important to lose.
joaohaas 10/29/2025|
Even if the data is important, you can enable AOF persistence and make sure the worker/consumer gets items by RPOPLPUSHing them to a working queue. That way you can easily requeue the data if the worker ever goes offline mid-process.
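A sketch of that reliable-queue pattern, with plain Python lists standing in for Redis lists (in real Redis the atomic move is RPOPLPUSH, or LMOVE since 6.2, and the ack is an LREM; all job names here are made up):

```python
# Reliable-queue pattern: atomically move an item from the main queue
# to a per-worker "working" list, and delete it from the working list
# only after processing succeeds.  If the worker dies mid-process, the
# item is still in `working` and a reaper can requeue it.

main_queue = ["job-3", "job-2", "job-1"]   # head is newest, tail is oldest
working = []

def rpoplpush(src, dst):
    """Pop from the tail of src and push onto the head of dst, like Redis."""
    if not src:
        return None
    item = src.pop()       # RPOP: takes the oldest item
    dst.insert(0, item)    # LPUSH onto the working list
    return item

def ack(item):
    """LREM: remove the item from the working list once processing is done."""
    working.remove(item)

job = rpoplpush(main_queue, working)
# ... process `job` here; a crash at this point leaves it in `working` ...
ack(job)

print(job)          # job-1
print(working)      # []
print(main_queue)   # ['job-3', 'job-2']
```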
Sparkyte 7 days ago||
Very true.
0xDEAFBEAD 7 days ago|
Why does it matter how many distinct tools you use? It seems easiest to just always use the most standard tool in the most standard way, to minimize the amount of custom code you have to write.