
Posted by enether 10/29/2025

Kafka is Fast – I'll use Postgres (topicpartition.io)
559 points | 392 comments
dev_l1x_be 7 days ago|
Apples are sweet, I am going to eat an onion.

I love these articles.

> The other camp chases common sense

It is never too late to inject some tribalism into any discussion.

> Trend 1 - the “Small Data” movement.

404

Just perfect.

CuriouslyC 10/29/2025||
If you don't need all the bells and whistles of Kafka, NATS Jetstream is usually the way to go.
suyash 7 days ago||
Postgres isn't ideal; you need a time-series database for streaming data.
jackvanlightly 10/29/2025||
> A 500 KB/s workload should not use Kafka

This is a simplistic take. Kafka isn't just about scale; like other messaging systems, it provides queue/streaming semantics for applications. Sure, you can roll your own queue on a database for small use cases, but it adds complexity to the lives of developers. You can offload the burden of running Kafka by choosing a Kafka-as-a-service vendor, but you can't offload the additional work that falls on the developer who uses a database as a queue.

enether 10/29/2025||
The question is the organizational overhead of adopting yet another specialized distributed system - which, by the way, is frequently about scalability at its core. Kafka's original paper emphasizes this ("We introduce Kafka, a distributed messaging system that we developed for collecting and delivering high volumes of log data with low latency.", "We made quite a few unconventional yet practical design choices in Kafka to make our system efficient and scalable.")[1]

To be honest, there isn't a large burden in running Kafka when it's 500 KB/s. The system is so underutilized there's nothing to cause issues with it. But regardless, the organizational burden persists. As the piece mentions - "Managed SaaS offerings trade off some of the organizational overhead for greater financial costs - but they still don’t remove it all.". Some of the burden continues to exist even if a vendor hosts the servers for you. The API needs to be adopted, the clients have many configs, concepts like consumer groups need to be understood, the vendor has its own UI, etc.

The Kafka API isn't exactly the simplest. I wouldn't recommend people write the pub-sub-on-Postgres SQL themselves - a library should abstract it away. What complexity is added by a library with a simple API? Regardless of whether that library is built on top of Postgres, Kafka or another system - precisely what complexity is added to the lives of developers?

I really don't see any complexity existing at this minuscule scale, at either the app-developer layer or the infra-operator layer. But of course, I haven't run this in production, so I could be wrong.
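For illustration, here is a toy sketch of the kind of minimal API such a library might expose. The names are hypothetical, and this in-memory version stands in for the real thing: a Postgres-backed implementation would put the same two operations behind an INSERT on publish and a SELECT ... FOR UPDATE SKIP LOCKED on consume, so concurrent workers never grab the same row.

```python
from collections import deque

class TinyQueue:
    """Toy in-memory stand-in for a Postgres-backed queue library.

    A real implementation would INSERT on publish and use
    SELECT ... FOR UPDATE SKIP LOCKED on consume; the point is
    that app code only ever sees this two-method surface.
    """

    def __init__(self):
        self._messages = deque()

    def publish(self, payload: dict) -> None:
        self._messages.append(payload)

    def consume(self):
        """Return the next message, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None

q = TinyQueue()
q.publish({"event": "signup", "user": 1})
q.publish({"event": "login", "user": 1})
print(q.consume())  # {'event': 'signup', 'user': 1}
```

Whether that surface is backed by Postgres or Kafka is invisible to the caller, which is the point being argued above.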

[1] - https://notes.stephenholiday.com/Kafka.pdf

cyanf 10/29/2025||
There are existing solutions for queues in Postgres, notably pgmq.
zer00eyz 10/29/2025||
> Should You Use Postgres? Most of the time - yes. You should always default to Postgres until the constraints prove you wrong.

Kafka, GraphQL... These are the two technologies where my first question is always this: does the person who championed/led this project still work here?

The answer is almost always "no, they got a new job after we launched".

Resume Architecture is a real thing. Meanwhile the people left behind have to deal with a monster...

bencyoung 10/29/2025||
Kafka is great tech, never sure why people have an issue with it. Would I use it all the time? No, but where it's useful, it's really useful, and opens up whole patterns that are hard to implement other ways
evantbyrne 10/29/2025|||
Managed hosting is expensive, and self-managing Kafka is a job in and of itself. At my last employer, they were spending six figures to run three low-volume clusters before I did some work to get them off some enterprise features, which halved the cost - but it was still at least 5x the cost of running a mainstream queue. Don't use Kafka if you just need queuing.
CuriouslyC 10/29/2025|||
I always push people to start with NATS jetstream unless I 100% know they won't be able to live without Kafka features. It's performant and low ops.
bencyoung 10/29/2025||||
Cheapest MSK cluster is $100 a month and can easily run a dev/uat cluster with thousands of messages a second. They go up from there but we've made a lot of use of these and they are pretty useful
singron 10/29/2025|||
I've basically never had a problem with MSK brokers. The issue has usually been "why are we rebalancing?" and "why aren't we consuming?", i.e. client problems.
evantbyrne 10/29/2025|||
It's not the dev box with zero integrations/storage that's expensive. AWS was quoting us similar numbers for MSK. Part of the issue is that modern kafka has become synonymous with Confluent, and once you buy into those features, it is very difficult to go back. If you're already on AWS and just need queuing, start with SQS.
j45 10/29/2025|||
Engaging with difficulty is a form of procrastination, and in some cases a way of avoiding shipping a product.

Instead of having just one new thing to learn before launch... let's pick as many new-to-us things as possible - that will surely increase the chances of success.

bonesss 10/29/2025|||
Kafka also provides early architectural scaffolding that lets multiple teams build in parallel with predictable outcomes (in addition to giving categorical answers to hard, error-prone patterns). It's been adopted in principle by the services on all the major cloud providers, and is offered turn-key by them.

Personally I'd expect some kind of internal interface to abstract away such an external dependency and to grow reusable components around it, which readily enables having relational data stores mirroring the broker's functionality. Handy for testing and some specific local scenarios, and those database-backed stores can easily pull from the main cluster(s) later to mirror data as needed.
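A hedged sketch of what such an internal interface could look like, with hypothetical names: application code depends only on the protocol, so a broker-backed and a database-backed implementation are interchangeable, and an in-memory one serves tests.

```python
from typing import Protocol

class EventBus(Protocol):
    """Internal interface; Kafka-, Postgres-, and in-memory-backed
    implementations would all satisfy it."""
    def publish(self, topic: str, payload: bytes) -> None: ...
    def poll(self, topic: str) -> list[bytes]: ...

class InMemoryBus:
    """Stand-in implementation used for tests and local runs."""
    def __init__(self):
        self._topics: dict[str, list[bytes]] = {}

    def publish(self, topic: str, payload: bytes) -> None:
        self._topics.setdefault(topic, []).append(payload)

    def poll(self, topic: str) -> list[bytes]:
        msgs = self._topics.get(topic, [])
        self._topics[topic] = []
        return msgs

def record_order(bus: EventBus, order_id: int) -> None:
    # Application code sees only the interface, never the transport.
    bus.publish("orders", str(order_id).encode())

bus = InMemoryBus()
record_order(bus, 42)
print(bus.poll("orders"))  # [b'42']
```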

janwijbrand 10/29/2025|||
"resume" as in "resumé" not as in "begin again or continue after a pause or interruption" - it took me longer than I care to admit to get that.
Groxx 10/29/2025|||
is there some reason GraphQL gets so much hate? it always feels to me like it's mostly just a normal RPC system but with some incredibly useful features (pipelining, and it's super easy to not request data you don't need), plus obvious perf issues in code and obvious room for perf abuse, because it's easy to allow callers to do N+1 nonsense. (I've never hosted a GraphQL service, but I can see plenty of obvious room for problems.)

so I can see why it's not popular for public APIs unless you have infinite money - it's relatively wide open for abuse - but private seems pretty useful because you can just smack the people abusing it. or is it more due to specific frameworks being frustrating, or stuff like costly parsing and serialization and difficult validation?

twodave 10/29/2025|||
As someone who works with GraphQL daily, many of the criticisms out there are from before the times of persisted queries, query cost limits, and composite schemas. It’s a very mature and useful technology. I agree with it maybe being less suitable for a public API, but less because of possible abuse and more because simple HTTP is a lot more widely known. It depends on the context, as in all things, of course.
Groxx 10/29/2025||
yeah, I took one look at it and said "great, so add some cost tracking and kill requests before they exceed it" because like. obviously. it's similar to exposing a SQL endpoint: you need to build for that up front or the obvious results will happen.

which I fully understand is more work than "it's super easy just X" which it gets presented as, but that's always the cost of super flexible things. does graphql (or the ecosystem, as that's part of daily life of using it) make that substantially worse somehow? because I've dealt with people using protobuf to avoid graphql, then trying to reimplement parts of its features, and the resulting API is always an utter abomination.
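As a toy sketch of that cost-tracking idea, here is a recursive estimator over a plain dict standing in for a parsed query. Everything here is hypothetical: a real implementation would walk the GraphQL AST and weight fields by per-resolver cost rather than by depth alone.

```python
class QueryTooExpensive(Exception):
    pass

def estimate_cost(fields: dict, depth: int = 1) -> int:
    """Charge per field, weighted by nesting depth, so the deeply
    nested selections (the N+1-shaped ones) cost the most."""
    cost = 0
    for sub in fields.values():
        cost += depth
        if isinstance(sub, dict):
            cost += estimate_cost(sub, depth + 1)
    return cost

def check_budget(fields: dict, budget: int) -> int:
    """Reject the query before execution if it exceeds the budget."""
    cost = estimate_cost(fields)
    if cost > budget:
        raise QueryTooExpensive(f"cost {cost} exceeds budget {budget}")
    return cost

# {user {posts {comments {author}}}} as a plain-dict stand-in
query = {"user": {"posts": {"comments": {"author": None}}}}
print(check_budget(query, budget=20))  # 10
```

Killing the request up front, instead of mid-execution, is what keeps one abusive caller from exhausting the resolvers.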

marcosdumay 10/29/2025|||
Take a look at how to implement access control over GraphQL requests. It's useless for anything that isn't public data (at least public to your entire network).

And yes, you don't want to use it for public APIs. But if you have private APIs so complex that you need a query language, and you still want to use those over web services, you are very likely doing something really wrong.

Groxx 10/29/2025||
I'm honestly not seeing much here that isn't identical to almost all other general purpose RPC systems: https://graphql.org/learn/authorization/

"check that the user matches the data they're requesting by comparing the context and request field by hand" is ultra common - there are some real benefits to having authorization baked into the language, but it seems very rare in practice (which is part of why it's often flawed, but following the overwhelming standard is hardly graphql's mistake imo). I'd personally think capabilities are a better model for this, but that seems likely pretty easy to chain along via headers?

marcosdumay 10/29/2025||
> identical to almost all other general purpose RPC systems

The problem is that GraphQL doesn't behave like all other general purpose RPC systems. As a rule, authorization does not work on the same abstraction level as GraphQL.

And that explanation you quoted is disingenuous, because GraphQL middleware and libraries don't usually expose places where you can do anything by hand.

forgetfulness 10/29/2025|||
We're all just passing through our jobs; the value of the solutions remains in the hands of the shareholders. If you don't try to squeeze some long-term value for your resume and long-term employability out of them, you're assuming a significant opportunity cost on the shareholders' behalf.

They'll be fine if you made something that works, even if it was a bit faddish. Make sure you take care of yourself along the way (they won't).

candiddevmike 10/29/2025||
Attitudes like this are why management treats developers like children who constantly need to be kept on task, IMO.
forgetfulness 10/29/2025||
Software is a line of work that has astounding amounts of autonomy, if you compare it to working in almost anything else.

My point stands, company loyalty tallies up to very little when you’re looking for your next job; no interviewer will care much to hear of how you stood firm, and ignored the siren song of tech and practices that were more modern than the one you were handed down (the tech and practices they’re hiring for).

The moment that reverses, I will start advising people not to skill up, as it will look bad in their resumes.

darkstar_16 10/29/2025|||
GraphQL, sure, but I'm not sure I'd put Kafka in the same bucket. It is a nice technology that has its uses in some cases where PostgreSQL would not work. It is also something a small team should not start with. Start with Postgres and then move to something else when the need arises.
sitestable 10/29/2025|||
The best architecture decision is the one that's still maintainable when the person who championed it leaves. Always pretend the person who maintains a project after you knows where you live and all that.
kvdveer 10/29/2025||
To be fair, this is true for all technologically interesting solutions, even when they use postgres. People championing novel solutions typically leave after the window for creativity has closed.
odie5533 10/29/2025||
How fast is failover?
lmm 7 days ago||
If Kafka had come first, no-one would ever pick Postgres. Yes, it offers a lot of fancy functionality. But most of that functionality is overengineered stuff you don't need, and/or causes more problems than it solves (e.g. transactions sound great until you have to deal with the deadlocks and realise they don't actually help you solve any business problems). Meanwhile, with no true master-master HA in the base system, you have to use a single-point-of-failure server or a flaky (and probably expensive) third-party addon.

Just use Kafka. Even if you don't need speed or scalability, it's reliable, resilient, simple and well-factored, and gives you far fewer opportunities to architect your system wrong and paint yourself into a corner than Postgres does.

psadri 10/29/2025||
A resource that would benefit the entire community is a set of ballpark figures for what kind of performance is "normal" given a particular hardware + data volume. I know this is a hard problem because there is so much variation across workloads, but I think even order of magnitude ballparks would be useful. For example, it could say things like:

task: msg queue

software: kafka

hardware: m7i.xlarge (vCPUs: 4 Memory: 16 GiB)

payload: 2kb / msg

possible performance: ### - #### msgs / second

etc…

So many times I've found myself wondering: is this thing behaving within an order of magnitude of a correctly setup version so that I can decide whether I should leave it alone or spend more time on it.
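Even without such a resource, the first sanity check is back-of-envelope arithmetic of the kind that table would encode, using the thread's own numbers (the article's 500 KB/s workload and the 2 KB payload above; the code is just the division, shown for concreteness):

```python
def msgs_per_second(bandwidth_kb_s: float, payload_kb: float) -> float:
    """Throughput implied by a bandwidth budget and a payload size."""
    return bandwidth_kb_s / payload_kb

# The article's "small" workload: 500 KB/s of 2 KB messages
print(msgs_per_second(500, 2))  # 250.0
```

250 msgs/s is several orders of magnitude below what a single modest broker is commonly benchmarked at, which is the kind of gap the wished-for table would make obvious at a glance.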

me551ah 10/29/2025||
Imagine if historic humans had decided that hammers alone were enough - that there was no need for specialized tools like the scissors, chisel, axe, wrench, shovel, or sickle, and that a hammer and fingers would do.

Use the tool appropriate for the job. It is trivial to write code against these tools with LLMs these days, the software is mature enough to rarely cause problems, and a tool built for a purpose will always be more performant.

rjurney 10/29/2025|
One bad message in a Kafka queue and guess what? The entire queue is down, because it kills your workers over and over. To fix it? You have to resize the queue to zero, which means losing requests. This KILLS me. Jay Kreps says there is no reason it can't be fixed, but it never has been, and this infuriates me because it happens so often :)
pram 10/30/2025|
You can modify a consumer group's offset to any value, JFYI, so you really don't need to purge the topic. You can just start after the bad message.
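A toy simulation of that fix, with an in-memory list standing in for the partition log (with real Kafka clients the equivalent move is seeking the group's offset to one past the bad record): instead of crash-looping on the poison pill, the consumer advances past it.

```python
def consume_log(log: list, start_offset: int, handler):
    """Process records from start_offset onward, skipping any record
    the handler rejects instead of retrying it forever."""
    processed, offset = [], start_offset
    while offset < len(log):
        record = log[offset]
        try:
            processed.append(handler(record))
        except ValueError:
            # Bad message: advance the offset past it, as the
            # parent comment suggests, rather than crash-looping.
            pass
        offset += 1
    return processed, offset

log = ["ok-1", "GARBAGE", "ok-2"]

def handler(rec: str) -> str:
    if rec == "GARBAGE":
        raise ValueError(rec)
    return rec.upper()

print(consume_log(log, 0, handler))  # (['OK-1', 'OK-2'], 3)
```

Passing start_offset=2 is the manual "seek past the bad message" recovery described above: only "ok-2" is processed.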