DuckDB Internals: Why Is DuckDB Fast? (Part 1)

tobyhinloopen 13 hours ago|

The only reason I know and use DuckDB is because my (internal, private-use-only, experimental) vibe coded projects use it a ton. I didn't pick it - LLMs did. Until this article, I wasn't aware of what it actually is capable of.

Most of these projects use JSON(L) files for storage, and duckdb to process them.

mootothemax 13 hours ago||

If you haven’t investigated storing in parquet format - and it doesn’t break other consumers that need your jsonl formatted files - it could be worth trialling for your use case. You’ll see vastly smaller file sizes (even more so if you use zstd compression), and query speed will shoot up.

Usual caveats apply, but as a general rule it’s held up well for me. Only downside is that inspecting the results moves from vi on the output file to duckdb and a select * from.

medvezhenok 9 hours ago|||

Yup, especially data backups (although I wouldn't store critical backups like this, only nice-to-have ones). One minor note is that parquet file sizes / compressed sizes can be sensitive to ordering, so you can try different sort orders to get optimal compression.

I found that, using various tricks, I can get the zstd parquet to be 10x (or more) smaller than an equivalent Postgres table - but obviously the exact compression ratios will depend on the kind of data you have and how well your Postgres table is normalized.
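
The sort-order point can be shown with a rough stdlib-only sketch. Note the swap: this uses `zlib` on a plain text column as a stand-in, not zstd or parquet's own encodings, so it only illustrates the general effect and says nothing about real parquet ratios. The column values and function names are invented for the example.

```python
import random
import zlib


def compressed_size(values):
    """Return the byte count after zlib-compressing the values, one per line."""
    return len(zlib.compress("\n".join(values).encode("utf-8"), level=9))


def compare_orderings(n_rows=20_000, n_distinct=50, seed=0):
    """Compress the same low-cardinality column unsorted and sorted.

    Returns (unsorted_bytes, sorted_bytes). zlib here is only a stand-in
    for zstd-compressed parquet; the data is synthetic.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    column = [f"customer_{rng.randrange(n_distinct):03d}" for _ in range(n_rows)]
    return compressed_size(column), compressed_size(sorted(column))


if __name__ == "__main__":
    unsorted_bytes, sorted_bytes = compare_orderings()
    print("unsorted:", unsorted_bytes, "sorted:", sorted_bytes)
```

Sorting groups identical values into long runs, which the compressor can encode far more cheaply than the same values scattered at random. In DuckDB the equivalent experiment would be adding an `ORDER BY` to the query that writes the file and comparing the resulting sizes.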

tobyhinloopen 11 hours ago|||

I'll 100% try DuckDB in more serious projects where I would normally use SQLite.

ai_fry_ur_brain 13 hours ago||

What an incredible way to build software