Posted by enether 7 days ago
> ...
> The other camp chases common sense
I don't really like these simplifications. Like, one group obviously isn't just dumb; they're doing things for reasons you maybe don't understand. I don't know enough about data science to make the call, but I'm guessing there were reasons to use Kafka, whether hardware limits at the time or scalability concerns, and while those issues may not be as present today, that doesn't mean they used Kafka just because they heard a new word and wanted to repeat it.
1) People constantly chasing the latest technology with no regard for whether it's appropriate for the situation.
2) People constantly trying to shoehorn their favourite technology into everything with no regard for whether it's appropriate for the situation.
The third camp:
3) People who look at a task, then apply a tool appropriate for the task.
Postgres has also been around for a long time, and a lot of people don't know everything it can do, much of which isn't what we normally associate with a database.
Appropriateness is a nice way to look at it, as long as it's clear whether that's a genuine judgment or just personal preference and interpretation being used to act righteous towards others.
Customers rarely care about the backend or what it's developed in, except maybe for developer products. Arguing about it is a great way to waste time, though.
This made me wonder about a tangential statistic that would, in all likelihood, be impossible to derive:
If we looked at all database systems running at any given time, what proportion does each technology represent (e.g., Postgres vs. MySQL vs. [your favorite DB])? You could try to measure this in a few ways: bytes written/read, total rows, dollars of revenue served, etc.
It would be very challenging to land on a widely agreeable definition. We'd quickly get into the territory of what counts as a "database" and whether to include file systems, blockchains, or even paper. Still, it makes me wonder. I feel like such a question would be immensely interesting to answer.
Because then we might have a better definition of "most of the time."
Server side. Client side. iOS, iPad, Mac apps. Uses in every field. Uses in aerospace.
Just think for a moment: literally every photo and video taken on every iPhone (and I would assume Android as well) ends up in a SQLite db, either stored directly or with a sizable amount of its metadata there.
You need some sort of server-side logic to manage that, plus the consumer heartbeats and generation tracking, to make sure that only the "correct" instances can actually commit new offsets. Distributed systems are hard, and Kafka goes to a lot of trouble to ensure that you don't fail to process a message.
Of course an implementation based on that is going to miss a bit.
Unless you're a five-man shop where everybody just agrees to use that one table, manage transactions correctly, cron-job the retention, YOLO the clustering, etc. etc.
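For the single-table version, a minimal sketch of the claim-and-process loop, assuming psycopg2 and a hypothetical jobs table (retention left to the cron job), might look like this:

    # Rough sketch only: claim one job with FOR UPDATE SKIP LOCKED so
    # concurrent workers never grab the same row. The jobs table and
    # connection string are assumptions, not anyone's real schema.
    import psycopg2

    conn = psycopg2.connect("dbname=app")

    def work_one(handle):
        with conn:                      # one transaction: claim, process, delete
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT id, payload FROM jobs "
                    "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED"
                )
                row = cur.fetchone()
                if row is None:
                    return False        # nothing claimable right now
                job_id, payload = row
                handle(payload)         # a crash here unlocks the row for retry
                cur.execute("DELETE FROM jobs WHERE id = %s", (job_id,))
                return True

That covers the work-queue case reasonably well; it's the Kafka-style fan-out, consumer groups, and rebalancing where the missing pieces pile up.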
Performance is probably last on the list of reasons to choose Kafka over Postgres.
There are several implementations of queues that increase the chance of accomplishing what one is after: https://github.com/dhamaniasad/awesome-postgres
I truly miss a good standard client-side library following the Kafka-in-SQL philosophy. I started one at my previous job and we used it internally, but it never got good enough to be widely used elsewhere, and now I work somewhere else...
(PS: Talking about the pub/sub Kafka-like use case, not the work queue FOR UPDATE use case)
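For what it's worth, the pub/sub shape isn't much code either. Here is a rough sketch under assumed names (an append-only events table plus one consumer_offsets row per group, which must already exist), with the group's offset row locked so two instances of the same group can't consume the same batch, a very crude stand-in for the generation fencing mentioned above:

    # Rough pub/sub sketch, not a library: events and consumer_offsets are
    # hypothetical tables. Messages are retained and each group tracks its
    # own position, Kafka-style, instead of deleting rows like a work queue.
    import psycopg2

    conn = psycopg2.connect("dbname=app")

    def poll(group_id, handle, batch=100):
        with conn:
            with conn.cursor() as cur:
                # Locking the group's offset row keeps a second instance of
                # the same group from reading the same batch concurrently.
                cur.execute(
                    "SELECT last_offset FROM consumer_offsets "
                    "WHERE group_id = %s FOR UPDATE",
                    (group_id,),
                )
                (last_offset,) = cur.fetchone()
                cur.execute(
                    "SELECT id, payload FROM events "
                    "WHERE id > %s ORDER BY id LIMIT %s",
                    (last_offset, batch),
                )
                rows = cur.fetchall()
                for _, payload in rows:
                    handle(payload)
                if rows:
                    cur.execute(
                        "UPDATE consumer_offsets SET last_offset = %s "
                        "WHERE group_id = %s",
                        (rows[-1][0], group_id),
                    )

The hard part is everything around this: rebalancing work across many consumers in a group, retention, backpressure, and packaging it as a reusable client library rather than a snippet, which is exactly the part that never quite got finished.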
I'm not saying they're useless, but if I see something like that lying around, it's more likely that someone put it there based on vibes than on an actual engineering need. If Postgres is good enough for OpenAI, chances are it's good enough for you.
Ok so instead of running Kafka, we're going to spend development cycles building our own?