Posted by upmostly 4 days ago
I will still reach for a database 99% of the time, because I like things like SQL and transactions. However, I've recently been working on a 100% personal project to manage some private data; extracting insights, graphing trends, etc. It's not high volume data, so I decided to use just the file system, with data stored in YAML files plus some simple indexing, and I haven't run into any performance issues yet. I probably never will at my scale and volume.
In this particular case having something that was human readable, and more importantly diffable, was more valuable to me than outright performance.
Having said that, I will still gladly reach for a database with a query language and all the guarantees that come with it 99% of the time.
Also, a notable mention for JSON5, which supports comments: https://json5.org/
Memory of course, as you wrote, also seems reasonable in many cases.
It would be much better to write all of this as binary data, omitting separators.
• Since it’s fixed-width and simple, inspecting the data is still pretty easy—there are tools for working with binary data of declared schema, or you could write a few-liner to convert it yourself. You don’t lose much by departing from ASCII.
• You might want to complicate it a little by writing a version tag at the start of the file or outside it so you can change the format more easily (e.g. if you ever add a third column). I will admit the explicit separators do make that easier. You can also leave that for later, it probably won’t hurt.
• UUID: 36 bytes → 16 bytes.
• Offset: 20 bytes (zero-padded base-ten integer) → 8 bytes.
• It removes one type of error altogether: now all bit patterns are syntactically valid.
• It’ll use less disk space, be cheaper to read, be cheaper to write, and probably take less code.
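A minimal sketch of what that fixed-width binary layout could look like, assuming a raw 16-byte UUID followed by an 8-byte big-endian offset (24 bytes per record, no separators); encodeRecord/decodeRecord are hypothetical names, not from the article:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// recordSize is 16 bytes of raw UUID plus an 8-byte big-endian offset.
const recordSize = 16 + 8

// encodeRecord packs a raw 16-byte UUID and a file offset into a
// fixed-width 24-byte record with no separators.
func encodeRecord(uuid [16]byte, offset uint64) [recordSize]byte {
	var rec [recordSize]byte
	copy(rec[:16], uuid[:])
	binary.BigEndian.PutUint64(rec[16:], offset)
	return rec
}

// decodeRecord is the inverse; every 24-byte pattern is syntactically valid,
// so there is no parse error to handle.
func decodeRecord(rec [recordSize]byte) ([16]byte, uint64) {
	var uuid [16]byte
	copy(uuid[:], rec[:16])
	return uuid, binary.BigEndian.Uint64(rec[16:])
}

func main() {
	uuid := [16]byte{0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef,
		0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef}
	rec := encodeRecord(uuid, 42)
	gotUUID, gotOffset := decodeRecord(rec)
	fmt.Println(len(rec), gotUUID == uuid, gotOffset) // 24 true 42
}
```

Note the record shrinks from 57 bytes of padded text to 24 bytes, matching the per-field savings listed above.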
I also want to register alarm at the sample code given for func FindUserBinarySearch. To begin with, despite a return type of (*User, error), it always returns a nil error: it swallows all I/O errors and ignores JSON decode errors. Then:

    entryID := strings.TrimRight(string(buf[:36]), " ")

That strings.TrimRight will only do anything if your data is corrupted.

    cmp := strings.Compare(entryID, id)

Not important when you control the writing, but worth noting that UUID string comparison is case-insensitive.

    offsetStr := strings.TrimLeft(string(buf[37:57]), "0")

Superfluous. ParseInt doesn’t mind leading zeroes, and it’ll probably skip them faster than a separate TrimLeft call.

    dataOffset, _ := strconv.ParseInt(offsetStr, 10, 64)

That’s begging to make data corruption difficult to debug. Most corruption will now become dataOffset 0. Congratulations! You are now root.

I don't choose a DB over a flat file for its speed. I choose a DB for the consistent interface and redundancy.
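Returning to the FindUserBinarySearch critique above: here is a sketch of the same 57-byte text-entry parse (36-byte UUID, one separator byte, 20-digit zero-padded offset, as in those snippets) that surfaces corruption as an error instead of discarding it. parseIndexEntry is a hypothetical helper, not the article's code:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseIndexEntry parses one 57-byte text index record: a 36-byte UUID,
// a separator byte, and a 20-digit zero-padded offset. Unlike the
// original, it reports corruption as an error instead of silently
// producing offset 0.
func parseIndexEntry(buf []byte) (id string, offset int64, err error) {
	if len(buf) < 57 {
		return "", 0, fmt.Errorf("short index entry: %d bytes", len(buf))
	}
	id = string(buf[:36])
	// ParseInt already tolerates leading zeroes, so no TrimLeft is
	// needed; crucially, we keep its error instead of discarding it.
	offset, err = strconv.ParseInt(string(buf[37:57]), 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("corrupt offset in entry %q: %w", id, err)
	}
	return id, offset, nil
}

func main() {
	entry := []byte("0f8fad5b-d9cb-469f-a165-70867728950e 00000000000000001234")
	id, off, err := parseIndexEntry(entry)
	fmt.Println(id, off, err)
}
```

With this shape, a flipped bit in the offset field becomes a visible ParseInt failure rather than a mysterious read at offset 0.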
In practice, the records themselves took no fewer than 30 joins to produce the flat view needed for a single view of what could/should have been one somewhat denormalized record. In the early 2010s that meant the main database was often kicked over under load, and it took a lot of effort to add in appropriate caching and the search db, which wound up handling most of the load on a smaller server.
But if you have data that is static or effectively static (data that is updated occasionally or batched), then serving via custom file handling can have its place.
If the records are fixed width and sorted on the key value, then it becomes trivial to do a binary search on the mmapped file. It's about as lightweight as could be asked for.
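A minimal sketch of that pattern on Unix, using syscall.Mmap and sort.Search over hypothetical 24-byte records (16-byte key plus 8-byte payload) sorted by key; findRecord and the layout are assumptions for illustration:

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"sort"
	"syscall"
)

const recSize = 24 // 16-byte key + 8-byte payload, fixed width

// findRecord binary-searches a byte slice (typically an mmapped file) of
// fixed-width records sorted by their 16-byte key, returning the matching
// record or nil.
func findRecord(data []byte, key []byte) []byte {
	n := len(data) / recSize
	i := sort.Search(n, func(i int) bool {
		return bytes.Compare(data[i*recSize:i*recSize+16], key) >= 0
	})
	if i < n && bytes.Equal(data[i*recSize:i*recSize+16], key) {
		return data[i*recSize : (i+1)*recSize]
	}
	return nil
}

func main() {
	// Build a small sorted file of three records for the demo.
	f, err := os.CreateTemp("", "idx")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	for _, k := range []byte{1, 5, 9} {
		rec := make([]byte, recSize)
		rec[15] = k // keys differ only in their last byte, so they sort by it
		f.Write(rec)
	}

	// Map the whole file read-only; the OS pages it in on demand.
	fi, _ := f.Stat()
	data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(data)

	key := make([]byte, 16)
	key[15] = 5
	fmt.Println(findRecord(data, key) != nil, findRecord(data, make([]byte, 16)) != nil)
}
```

No allocation, no parsing, no server process: the kernel's page cache does the heavy lifting, which is why this is about as lightweight as lookups get.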