Do you even need a database?

Posted by upmostly 1 day ago

Do you even need a database? (www.dbpro.app)

142 points | 208 comments

jrecursive 1 day ago|

I suggest every developer write a database from scratch at least once, and use it for something real. Or, even better, let somebody else use it for something real. Then you will know "why database".

traderj0e 1 day ago||

My first time was with a Bukkit plugin as a kid. One of my updates broke existing flat json files. Someone asked me if it has MySQL support, I didn't know what that was, then realized oh this is nice.

There are also things besides databases that I'll DIY and then still wonder why so many people use a premade tool for it, like log4j

yubblegum 1 day ago|||

It's indeed an amazing design and implementation space to explore. If distributed it is nearly comprehensive in scope. (However, did lol @ your "every developer" - that's being super kind and generous or "developer" is doing heavy lifting here.)

goerch 1 day ago|||

Hm, sometimes opening a book could do wonders? But these were the old times...

subhobroto 1 day ago||

I'll do one better.

I suggest every developer learn how to replicate, back up and restore the very database they are excited about, from scratch at least once. I propose this will teach them what it takes to build a production-ready system and give them some appreciation for other ways of managing state.

gf000 1 day ago||

If the app is not running, then `cp sqlite.db sqlite.db.BK`
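For the case where the app is still running, a plain `cp` can capture a half-written file. A minimal sketch of an alternative, assuming Python's standard `sqlite3` module and its `Connection.backup` method; the function name `backup_sqlite` and the file paths are illustrative only:

```python
import sqlite3


def backup_sqlite(src_path: str, dest_path: str) -> None:
    """Copy a SQLite database to dest_path using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies the database page by page and is designed
        # to work while other connections are still using the source.
        src.backup(dest)
    finally:
        dest.close()
        src.close()


# Example (hypothetical paths):
# backup_sqlite("sqlite.db", "sqlite.db.BK")
```

A restore is the same call with the two paths swapped, which makes it cheap to actually rehearse.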

subhobroto 1 day ago||

The "if" is doing a lot of heavy lifting there, but with that out of the way you do make a strong case for "local first" apps. Nothing beats the simplicity of a simple `cp`, but there are other tradeoffs to be made there.

That's why I was encouraging people to make backups and restores of their DB because personally, it has made me appreciate why different designs exist and when to use them vs just slapping a DB connection to an RDS instance and calling it a day.

agustechbro 1 day ago||

Not to destroy the article's author, and to appreciate his effort to prove something: this might be useful in an extreme case of optimization with a limited amount of data and NO NEED to update/write the files. Just a read-only cache.

If you ever need to update a single byte in your data, please USE A PROPER DATABASE. Databases do a lot of fancy things to ensure you are not going to corrupt/break your data on disk, among other safety things.

cold_tom 1 day ago||

you can get surprisingly far with files, but the moment you care about things like concurrent writes or not losing data on crash, the whole thing changes. at that point you're not choosing speed vs simplicity anymore - you're choosing how much risk you're willing to carry

jmull 1 day ago||

> Binary search beats SQLite... For a pure ID lookup, you're paying for machinery you're not using.

You'll likely end up quite a chump if you follow this logic.

sqlite has pretty strong durability and consistency mechanisms that their toy disk binary search doesn't have.

(And it is just a toy. It waves away the maintenance of the index, for god's sake, which is almost the entire issue with indexes!)
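To make the index-maintenance point concrete, here is a rough sketch of a disk binary search over sorted fixed-width records. This is not the article's code: the record layout (8-byte ID plus 56-byte payload) and the names `write_sorted` and `lookup` are assumptions made for illustration.

```python
import os
import struct
from typing import Dict, Optional

# Hypothetical record layout: 8-byte unsigned big-endian ID + 56-byte payload.
RECORD = struct.Struct(">Q56s")


def write_sorted(path: str, rows: Dict[int, bytes]) -> None:
    """Rewrite the whole file with records sorted by ID."""
    with open(path, "wb") as f:
        for key in sorted(rows):
            f.write(RECORD.pack(key, rows[key]))


def lookup(path: str, key: int) -> Optional[bytes]:
    """Binary-search the file for key; return its payload or None."""
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // RECORD.size
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD.size)
            rec_id, payload = RECORD.unpack(f.read(RECORD.size))
            if rec_id == key:
                return payload.rstrip(b"\0")  # drop the zero padding
            if rec_id < key:
                lo = mid + 1
            else:
                hi = mid
    return None
```

The lookup is short, but adding or changing a single record means calling `write_sorted` again and rewriting the entire file, with nothing protecting readers or surviving a crash mid-write. That is the part being waved away.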

Typically, people need to change things over time as well, without losing all their data, so backwards compatibility and other aspects of flexibility that sqlite has are likely to matter too.

I think once you move beyond a single file read/written atomically, you might as well go straight to sqlite (or other db) rather than write your own really crappy db.
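For reference, "a single file read/written atomically" usually means the write-to-temp-then-rename pattern. A minimal sketch in Python, assuming a JSON payload; the function name `atomic_write_json` is made up for this example:

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data) -> None:
    """Replace path with new JSON content so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to put the bytes on the device
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This covers one file and one writer. On POSIX systems the containing directory may also need an fsync for the rename itself to survive a crash, and once there are several files or concurrent writers the pattern stops being enough.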

WatchDog 8 hours ago|

It's also just wrong, their SQLite benchmark is only using a single thread for the SQLite connection. It's much faster with multiple connections.

waldrews 1 day ago||

File systems are nice if you need to do manual or transparent script-based manipulations. Like 'oh hey, I just want to duplicate this entry and hand-modify it, and put these others in an archive.' Or use your OS's access control and network sharing easily with heterogeneous tools accessing the data from multiple machines. Or if you've got a lot of large blobs that aren't going to get modified in place.

What the world needs is a hybrid - database ACID/transaction semantics with the ability to cd/mv/cp file-like objects.

Joeboy 1 day ago||

Don't know if it counts, but my London cinema listings website just uses static json files that I upload every weekend. All of the searching and stuff is done client side. Although I do use sqlite to create the files locally.

Total hosting costs are £0 ($0) other than the domain name.
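A setup like this can be sketched in a few lines. The following is only an illustration of "use sqlite locally, publish static JSON", not the commenter's actual code; the function name `export_query_to_json` and whatever table or query you pass in are assumptions:

```python
import json
import sqlite3


def export_query_to_json(db_path: str, sql: str, out_path: str) -> int:
    """Run sql against a local SQLite file and dump the rows as a JSON array."""
    con = sqlite3.connect(db_path)
    con.row_factory = sqlite3.Row  # rows become addressable by column name
    try:
        rows = [dict(row) for row in con.execute(sql)]
    finally:
        con.close()
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(rows, f)
    return len(rows)
```

The resulting file can be uploaded to any static host and searched client side.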

nishagr 1 day ago||

The real question - do you really need to hack around with in-memory maps and files when you could just use a database?

randusername 1 day ago||

Separate from performance, I feel like databases are a sub-specialty that has its own cognitive load.

I can use databases just fine, but will never be able to make wise decisions about table layouts, ORMs, migrations, backups, scaling.

I don't understand the culture of "oh we need to use this tool because that's what professionals use" when the team doesn't have the knowledge or discipline to do it right and the scale doesn't justify the complexity.

goerch 1 day ago|

Hm, I somewhat understand your point of `making wise decisions`. But doesn't that concern all kinds of software development? For me, it does.

vovanidze 1 day ago||

people wildly underestimate the os page cache and modern nvme drives tbh. disk io today is basically ram speeds from 10 years ago. seeing startups spin up managed postgres + redis clusters + prisma on day 1 just to collect waitlist emails is peak feature vomit.

a jsonl file and a single go binary will literally outlive most startup runways.
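For anyone who hasn't used one, a jsonl "database" is roughly this much code. Sketched in Python rather than Go, with hypothetical function names; it makes no attempt at durability or concurrent writers:

```python
import json
from typing import Iterator


def append_record(path: str, record: dict) -> None:
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_records(path: str) -> Iterator[dict]:
    """Yield every record in the file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Usage for the waitlist case would be something like `append_record("waitlist.jsonl", {"email": "someone@example.com"})`, where the file name and field are placeholders.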

also, the irony of a database gui company writing a post about how you dont actually need a database is pretty based.

upmostly 1 day ago||

The irony isn’t lost on us, trust me. We spent a while debating whether to even publish this one.

But yeah, the page cache point is real and massively underappreciated. Modern infrastructure discourse skips past it almost entirely. A warm NVMe-backed file with the OS doing the caching is genuinely fast enough for most early-stage products.

vovanidze 1 day ago|||

props for actually publishing it tbh. transparent engineering takes are so rare now, usually its just seo fluff.

weve basically been brainwashed to think we need kubernetes and 3 different databases just to serve a few thousand users. gotta burn those startup cloud credits somehow i guess.

mad respect for the honesty though, actually makes me want to check out db pro when i finally outgrow my flat files.

upmostly 1 day ago|||

I feel like I could write another post: "Do you even need serverless/Cloud?", because we've also been brainwashed into thinking we need to spend hundreds/thousands a month on AWS when a tiny VPS will do.

Similar sentiment.

vovanidze 1 day ago|||

id 100% read that post. the jump from free tier serverless to why is my aws bill $400 this month for a hobby project is a rite of passage at this point. a $5 hetzner or digitalocean box with dokku/docker-compose is basically a superpower that most newer devs just bypass entirely now.

hilariously 1 day ago||||

You are both right, with the exception that it requires knowledge and taste to accomplish, both of which are in short supply in the industry.

Why set up a go binary and a json file? Just use google forms and move on, or pay someone for a dead simple form system so you can capture and communicate with customers.

People want to do the things that make them feel good - writing code to fit in just the right size, spending money to make themselves look cool, getting "the right setup for the future so we can scale to all the users in the world!" - most people don't consider the business case.

What they "need" is an interesting one because it requires a forecast of what the actual work to be done in the future is, and usually the head of any department pretends they do that when in reality they mostly manage a shared delusion about how great everything is going to go until reality hits.

I have worked for companies getting billions of hits a month and ones that I had to get the founder to admit there's maybe 10k users on earth for the product, and neither of them was good at planning based on "what they need".

hooverd 1 day ago|||

Serverless is cheap as hell at low volumes. Your tiny VPS can't scale to zero. If you're doing sustained traffic your tiny VPS might win though. The real value in Cloud is turning capex spend into opex spend. You don't have to wait weeks or months to requisition equipment.

locknitpicker 1 day ago|||

> weve basically been brainwashed to think we need kubernetes and 3 different databases just to serve a few thousand users. gotta burn those startup cloud credits somehow i guess.

I don't think it makes any sense to presume everyone around you is brainwashed and you are the only soul cursed with reasoning powers. Might it be possible that "we" are actually able to analyse tradeoffs and understand the value of, say, having complete control over deployments with out of the box support for things like deployment history, observability, rollback control, and infrastructure as code?

Or is it brainwashing?

Let's put your claim to the test. If you believe only brainwashed people could see value in things like SQLite or Kubernetes, what do you believe are reasonable choices for production environments?

vovanidze 1 day ago||

i think you missed the "on day 1" part of my comment. k8s, iac, and observability are incredible tools when you actually have the scale and team to justify them.

my point is strictly about premature optimization. ive seen teams spend their first month writing helm charts and terraform before they even have a single paying user. if you have product-market fit and need zero-downtime rollbacks, absolutely use k8s. but if youre just validating an mvp, a vps and docker-compose (or sqlite) is usually enough to get off the ground.

its all about trade-offs tbh.

locknitpicker 16 hours ago||

> i think you missed the "on day 1" part of my comment. k8s, iac, and observability are incredible tools when you actually have the scale and team to justify them.

No, not really. It's counterproductive and silly to go out of your way to set up your whole IaC in a tool you know doesn't fit your needs just because you have an irrational dislike for a tool that does. You need to be aware that nowadays Kubernetes is the interface, not the platform. You can easily use things like minikube, k3s, microk8s, etc, or even have sandbox environments in local servers or cloud providers. It doesn't matter if you target a box under your desk or AWS.

It's up to you to decide whether you want to waste your time to make your life harder. Those who you are accusing of being brainwashed seem to prefer getting stuff done without fundamentalisms.

tracker1 1 day ago||||

Definitely appreciate the post and the discussion that has come from it... While I'm still inclined to just reach for SQLite as a near starting point, it's often worth considering depending on your needs.

In practice, I almost always separate the auth chain from the service chain(s), so that if auth gets kicked over under a DDoS, at least already authenticated users stand a chance of still being able to use the apps. I've also designed auth systems essentially abstracted to key/value storage with adapters for differing databases (including SQLite) for deployments...
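A key/value abstraction with swappable adapters can be quite small. This is an illustrative sketch, not the design described above: the `KeyValueStore` interface, the `kv` table, and both adapter classes are assumptions made for the example.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface the auth code would depend on."""

    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...
    def delete(self, key: str) -> None: ...


class SQLiteStore:
    """Adapter backed by a single SQLite table."""

    def __init__(self, path: str) -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB NOT NULL)"
        )
        self._db.commit()

    def get(self, key: str) -> Optional[bytes]:
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return None if row is None else bytes(row[0])

    def put(self, key: str, value: bytes) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def delete(self, key: str) -> None:
        with self._db:
            self._db.execute("DELETE FROM kv WHERE k = ?", (key,))


class MemoryStore:
    """Adapter backed by a dict, e.g. for tests."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
```

Code written against `KeyValueStore` does not care which adapter it gets, so moving from SQLite to another backend means writing one more class with the same three methods.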

Would be interested to see how LevelDB might perform for your testing case, in that it seems to be a decent option for how your example is using data.

grep_it 1 day ago||||

Except that eventually you'll find you lose a write when things go down because the page cache is write behind. So you start issuing fsync calls. Then one day you'll find yourself with a WAL and buffer pool wondering why you didn't just start with sqlite instead.
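The first step on that road looks roughly like this. A minimal sketch assuming Python; `durable_append` is a made-up name:

```python
import os


def durable_append(path: str, line: str) -> None:
    """Append a line and do not return until the OS has been asked to persist it."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()             # move data from Python's buffer to the OS page cache
        os.fsync(f.fileno())  # ask the OS to flush the page cache to the device
```

Every call now pays for a device flush, which is why the next steps tend to be batching writes and keeping a log, i.e. the WAL and buffer pool mentioned above.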

skapadia 21 hours ago|||

The second paragraph sounds eerily AI-generated.

mamcx 1 day ago|||

> people wildly underestimate the os page cache and modern nvme drives

And worse, overestimate how safe their data is!

All this fancy talk about not using an RDBMS could have been true only if the APIs and actual implementations across ALL of the IO path were robust and RELIABLE.

But they are not!

EVERY LAYER LIES

ALL of them

ALL OF THE TIME

That is the biggest reason for building a real database (whatever the flavor): there is no way to avoid paying performance taxes all over the place, because you can't trust the IO, and having a single file (or a few files) getting hammered over and over makes this painfully obvious.

One of the most sobering experiences is that you write your IO with all the care in the world, let your brand new DB run for hours on good, even great, hardware, and in less than a week you will find that it breaks in funny ways.

P.S.: I was part of a team building a DB

phillipcarter 1 day ago||

> seeing startups spin up managed postgres + redis clusters + prisma on day 1 just to collect waitlist emails is peak feature vomit.

I'm pretty sure most startups just use a quick and easy CRM that makes this process easy, and that tool will certainly use a database.

jmaw 1 day ago|

Very interesting, I'd never heard of JSONL before: https://jsonlines.org/

Also notable mention for JSON5 which supports comments!: https://json5.org/
