Posted by upmostly 4 days ago
A lot of data pipeline work can function pretty well with each step producing artifacts that the next can consume. If you need unique items, use UUIDs. If you want to check whether an item exists, use a hash of something (maybe a URL). And each day goes in its own subdirectory. It works.
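The pattern described here can be sketched in a few lines (all names and the layout are illustrative, not from the original):

```python
# Artifact-per-step sketch: one subdirectory per day, items keyed by a URL hash,
# each item carrying a UUID. Purely illustrative names and layout.
import hashlib
import json
import uuid
from datetime import date
from pathlib import Path

def write_item(root: Path, url: str, payload: dict) -> Path:
    """Store one item under today's subdirectory, keyed by a hash of its URL."""
    day_dir = root / date.today().isoformat()  # each day gets its own subdirectory
    day_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()[:16]
    path = day_dir / f"{key}.json"
    if not path.exists():  # "does it exist?" is just a file-existence check
        payload = {"id": str(uuid.uuid4()), **payload}  # unique item id
        path.write_text(json.dumps(payload))
    return path
```

The existence check is where this shines and where it later bites: dedupe is free, but anything beyond key lookup means globbing and reading everything.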
Sometimes, though, you end up reinventing database constructs. Joins, indexing, versioning. A little of that is ok, but it gets gnarly fast. If you mess up, it could mean corrupting your data in a way that's irreversible. Or corrupting your data in a way that's reversible, but only with a lot of pain.
So a database has the benefit of enforcing consistency that a glob over json files doesn’t. It doesn’t mean every one-off script needs a blue-green Postgres DB, but there can be an inflection point.
In my (hypothetical, 'cause I never actually sat down and wrote it) case, I wanted the personal transactions for a month, and I realized I could just keep a single file per month and read the whole thing at once (also 'cause the application would display the whole month at once).
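A minimal sketch of that one-file-per-month idea (the paths and the transaction fields are made up for illustration):

```python
# One JSON file per month, read and rewritten whole, since the app shows a
# whole month at a time anyway. Illustrative sketch only.
import json
from pathlib import Path

def load_month(root: Path, year: int, month: int) -> list[dict]:
    """Read the whole month at once."""
    path = root / f"{year:04d}-{month:02d}.json"
    return json.loads(path.read_text()) if path.exists() else []

def add_transaction(root: Path, year: int, month: int, txn: dict) -> None:
    """Append one transaction and rewrite the single monthly file."""
    txns = load_month(root, year, month)
    txns.append(txn)
    (root / f"{year:04d}-{month:02d}.json").write_text(json.dumps(txns))
```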
Filesystems can be considered a key-value (or key-document) database. The funny thing about the example used in the link is that one could simply create a structure like `user/[id]/info.json` and access the user by ID directly instead of scanning files to find them. Then again, given the examples used, search by name would be a pain, and that's one point where databases handle things better.
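To make the contrast concrete, here's a sketch of both lookups under that assumed `user/[id]/info.json` layout: the ID lookup is a direct path, while the name search has to read every file.

```python
# Filesystem as key-value store: key lookup is cheap, anything else is a scan.
# The layout and the "name" field are assumptions for the example.
import json
from pathlib import Path

def get_user(root: Path, user_id: str) -> dict:
    # Direct access: user/[id]/info.json, no search needed
    return json.loads((root / "user" / user_id / "info.json").read_text())

def find_by_name(root: Path, name: str) -> list[dict]:
    # The painful case: every non-key query reads everything
    hits = []
    for info in (root / "user").glob("*/info.json"):
        user = json.loads(info.read_text())
        if user.get("name") == name:
            hits.append(user)
    return hits
```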
Now that said, if there's value to the "database" being human readable/editable, json is still well worth a consideration. Dealing with even sqlite is a pain in the ass when you just need to tweak or read something, especially if you're not the dev.
How? With SQL it's super easy to search, compare, and update data. That's what it's built for.
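For instance, with the `sqlite3` module from Python's standard library (table and column names invented for the example), searching, aggregating, and updating are each a single declarative statement:

```python
# Search, compare, and update with SQL instead of hand-rolled file scans.
# Schema and data are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (id INTEGER PRIMARY KEY, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO txns (category, amount) VALUES (?, ?)",
    [("food", 12.5), ("rent", 900.0), ("food", 7.25)],
)

# Search and aggregate in one statement, no manual loop over files:
total = conn.execute(
    "SELECT SUM(amount) FROM txns WHERE category = ?", ("food",)
).fetchone()[0]

# Updating is just as direct:
conn.execute("UPDATE txns SET category = 'groceries' WHERE category = 'food'")
```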
However, if all your application state can be represented in a single JSON file of less than a dozen MB, then yes, a database can be overkill.
Yes, agreed, but it's usually a lot easier to find the file in the first place, especially if the application follows the XDG spec. SQLite databases are usually buried somewhere because they aren't expected to be looked at.
1. You're paying very little for 1000x the features.
2. The chances your application will keep doing "pure ID lookups" forever are zero. The moment you need to query the data in any other way or have more than one writer you're going to have to throw all this nonsense code into the trash.
3. Do you need the 1.7x speedup? No, of course you don't. It's just optimizing for the sake of optimizing.
I'd have just used SQLite to begin with and not benchmarked anything. No custom code, no time wasted, great performance, amazing flexibility, all with minimum effort.
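For the pure-ID-lookup case being discussed, "just use SQLite to begin with" might look like this (the schema is an assumption; `sqlite3` ships with Python, so there's nothing to install):

```python
# Key-value lookups in SQLite: barely more code than the JSON-file version,
# and any future non-key query is just another SELECT plus maybe an index.
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in a real app
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, info TEXT)")

def put_user(user_id: str, info: str) -> None:
    conn.execute("INSERT OR REPLACE INTO users VALUES (?, ?)", (user_id, info))

def get_user(user_id: str):
    row = conn.execute("SELECT info FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None
```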