Posted by saisrirampur 3 days ago
Used to take a few seconds to get a week's uptime data and do some useful analysis.
Since moving to ClickHouse I think I can grab a full year's data in around 200ms (probably less if I try optimising it). Still completely blows my mind every day.
Managers rejected it because it wasn't well known and was seen as "some database made by Russians."
On a personal level, it's quite sad to have seen that train coming so early and not been able to get on board.
ClickHouse is no panacea but if you understand how your data is accessed and thus how to arrange it you will get so many miles out of it.
afaik CH introduced FTS recently.
Most probably don't even realize it. I see it as something similar to what racial minorities in the US go through: ask a random stranger on the street if he's racist, and he will honestly say no, even if he actually simply does not realize it, while it deeply affects how he sees the world.
I've also been seeing similar attitudes in relation to the Chinese. People avoiding excellent projects because they were written by some Chinese guy, including things where supply chain security is of no concern. Again apparently not realizing that these days a large part of the work on the Linux kernel is committed by paid employees of several large Chinese companies, all of them tightly intertwined with the government. Forget talking about who is building the hardware we all use.
Whatever, the internet is fracturing and balkanizing at full speed anyway, and the borders are slowly closing. Won't be long before we won't be able to exchange anything non-destructive anymore. It was good while it lasted.
My lord you people are beyond patronizing.
When people refer to "the Chinese" or "the Russians", we are talking about the nation state, not the people. And there are legitimate security concerns. Whether we should be adversarial is another question. But we are.
I think the US is very tolerant when it comes to people from Russia.
You said, "ask a random stranger...and he will honestly say no" not "ask a random stranger...and he will probably honestly say no".
Most people are racist, it's just different groups. Americans obviously have less distrust of Americans, but then I am just as certain that there are many many humans who would proudly share their "dumb american" stories as if that is not every bit as prejudicial to those of us who do not fit the description as any other "weak french" or "commie russian" or "sister fucking indian" or whatever else.
Racism is about race (i.e. phenotypical or genotypical properties), while being US-American/French/Russian/Indian/... is about nationality. So, these stories are not about racism (since they are not about race), but about prejudices against other nations/nationalities.
Every time, we hit some limit or a huge optimization burden that was infeasible. ClickHouse has been rock solid for the past 4 years.
In my current setup I was thinking of doing both: upgrading PostgreSQL to timescaledb (to archive old data etc.), and deploying ClickHouse in parallel. I'm still considering whether to go big on PeerDB to get a ClickHouse mirror or just deploy it separately without an additional layer of fragility.
Would you not recommend using timescaledb at all? I definitely want to avoid alpha-quality software pain, since PostgreSQL is one of the most rock-solid parts of the stack at the moment.
But it's an ETL tool. Stupid naming.
Agree that Level 3 is what inspires confidence. But we need to invent new business models to sustain in the era of vibe-coded databases.
Most people don't have scale problems like that, but when you do...
Wow
CREATE TABLE default.events (
    `timestamp` DateTime,
    `event` String, -- e.g. 'product.updated' or empty
    `message` String, -- human readable message
    `raw` String -- the raw message - this is very useful when pushing logs that aren't JSON - you just leave `event` empty and dump the entire message here
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp)
ORDER BY (timestamp, event)
TTL timestamp + toIntervalMonth(6)
ClickHouse is extremely performant even in cases like:
SELECT count(*) FROM `events` WHERE `raw` LIKE '%hello world%'
Of course, the more columns you splat out (e.g. correlation_id, user_id, order_id, etc.) the better you can index and the better you can expect those queries to perform, but in general I don't bother outside the obvious core domain ones (shown above); the performance is so good that unindexed queries are significantly faster than indexed queries in Loki. I have reached the point where I JSON extract on-the-fly in the WHERE clause of very large queries with no meaningful performance issues.
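For anyone wondering what the on-the-fly extraction might look like, here is a rough sketch. It assumes `raw` holds a JSON object and that it carries a `user_id` field; that field name and the value `'42'` are made up for illustration and are not part of the schema above. The second statement shows one possible way to "splat out" a column later, if a field turns out to be queried a lot.

```sql
-- Filter on a JSON field extracted at query time (no dedicated column needed).
-- `user_id` is a hypothetical key inside the JSON stored in `raw`.
SELECT count(*)
FROM events
WHERE JSONExtractString(raw, 'user_id') = '42'
  AND timestamp >= now() - INTERVAL 7 DAY;

-- If that field becomes a hot filter, it could be promoted to a real column
-- that ClickHouse computes from `raw` on insert.
ALTER TABLE events
    ADD COLUMN user_id String MATERIALIZED JSONExtractString(raw, 'user_id');
```

Restricting by `timestamp` first lets the partitioning and sort key prune most of the data before the JSON is parsed, which is probably a large part of why this stays fast.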
That’s probably not to advertise for that company. I don’t see why it’s sad?