Posted by marklit 3 days ago
Most of these projects use JSON(L) files for storage, and duckdb to process them.
Usual caveats apply, but as a general rule it’s held up well for me. Only downside is that inspecting the results moves from vi on the output file to duckdb and a select * from.
I've found that, with various tricks, I can get zstd-compressed Parquet to be 10x (or more) smaller than an equivalent Postgres table, but obviously the exact compression ratio will depend on the kind of data you have and how well your Postgres table is normalized.
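For anyone who wants to try the JSONL-to-Parquet step, here is a minimal sketch, not necessarily how the projects above do it. The file names and record fields are made up for illustration, and it assumes the third-party duckdb Python package is installed; if it isn't, the script only writes the JSONL file and skips the conversion. It doesn't apply any of the extra size tricks, just plain zstd compression.

```python
import json
import os
import tempfile

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": i, "city": "Tallinn" if i % 2 else "Riga", "value": i * 1.5}
    for i in range(1000)
]

workdir = tempfile.mkdtemp()
jsonl_path = os.path.join(workdir, "events.jsonl")      # assumed file name
parquet_path = os.path.join(workdir, "events.parquet")  # assumed file name

# JSONL: one JSON object per line.
with open(jsonl_path, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

try:
    import duckdb  # third-party: pip install duckdb
except ImportError:
    duckdb = None

row_count = None
if duckdb is not None:
    con = duckdb.connect()  # in-memory database
    # read_json_auto infers the schema from the file;
    # COPY ... TO writes the query result as zstd-compressed Parquet.
    con.execute(
        f"COPY (SELECT * FROM read_json_auto('{jsonl_path}')) "
        f"TO '{parquet_path}' (FORMAT PARQUET, COMPRESSION ZSTD)"
    )
    # Inspecting the result is now a query rather than opening the file in vi.
    row_count = con.execute(
        f"SELECT count(*) FROM read_parquet('{parquet_path}')"
    ).fetchone()[0]
    print("jsonl bytes:  ", os.path.getsize(jsonl_path))
    print("parquet bytes:", os.path.getsize(parquet_path))
```

The printed sizes let you compare the two files on your own data; how much smaller the Parquet file ends up will vary with the data.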