Posted by malgamves 13 hours ago
Relational is better. Hell, any kind of unique identifier would be nice. There are so many better ways to organize data stores.
The best compromise is what modern OSs have: a tree-like structure to store files but a database index on top for queries.
So often we want to look up 'the last file I printed' or 'that message I got from Bob'. Instead of just creating that lookup, we have to go spelunking.
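A minimal Python sketch of the "index on top of the tree" idea. It assumes file modification time is a usable proxy for "recent", since the filesystem itself doesn't record events like "printed". `MetadataIndex` and `Entry` are hypothetical names, not any OS's API, and unlike a real OS index (which updates incrementally) this one is built once by walking the tree.

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Entry:
    path: Path
    suffix: str
    size: int
    mtime: float


class MetadataIndex:
    """Toy query index layered over an ordinary directory tree.

    The tree remains the storage layout; the index is a flat list of
    metadata rows that can be queried without walking directories again.
    """

    def __init__(self, root):
        self.entries = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                p = Path(dirpath) / name
                st = p.stat()
                self.entries.append(Entry(p, p.suffix.lower(), st.st_size, st.st_mtime))

    def latest(self, suffix=None):
        """Most recently modified file, optionally filtered by extension."""
        matches = [e for e in self.entries if suffix is None or e.suffix == suffix]
        return max(matches, key=lambda e: e.mtime, default=None)

    def modified_within(self, seconds):
        """Files modified in the last `seconds`, newest first."""
        cutoff = time.time() - seconds
        recent = [e for e in self.entries if e.mtime >= cutoff]
        return sorted(recent, key=lambda e: e.mtime, reverse=True)
```

For example, `MetadataIndex(Path.home() / "Documents").latest(".pdf")` returns the newest PDF anywhere under that folder (or None), no matter which subdirectory it landed in.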
Hell, every major app creates its own abstractions because the OS/filesystem doesn't offer anything useful. Email systems organize messages and tags; document editors have collections of document aspects that they store in a structured blob instead of asking the OS to do that.
All files are represented in a table with rows and columns. "Directories" simply have a special "directory = true" attribute in a row (simplified).
The hierarchy is for you, the human.
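A toy model of the "flat table plus a directory flag" idea, heavily simplified. The record numbers, field names, and helpers are hypothetical; using record 5 for the root mirrors NTFS, where the root directory is MFT record 5. Real NTFS stores names in $FILE_NAME attributes and gives directories their own B+ tree indexes, so listing a directory is not a table scan there. In this sketch the path is rebuilt purely by following parent references, which is the sense in which the hierarchy is a view for humans.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    record_id: int
    parent_id: int
    name: str
    is_directory: bool


# A flat "table" of rows; in this sketch the root record points at itself.
TABLE = [
    FileRecord(5, 5, "", True),            # root
    FileRecord(40, 5, "Users", True),
    FileRecord(41, 40, "alice", True),
    FileRecord(42, 41, "notes.txt", False),
    FileRecord(43, 41, "q3-report.xlsx", False),
]

BY_ID = {r.record_id: r for r in TABLE}


def full_path(record_id):
    """Rebuild a human-readable path by following parent references."""
    parts = []
    rec = BY_ID[record_id]
    while rec.parent_id != rec.record_id:
        parts.append(rec.name)
        rec = BY_ID[rec.parent_id]
    return "\\" + "\\".join(reversed(parts))


def children(dir_id):
    """'Listing a directory' here is just a filter over the table."""
    return [r.name for r in TABLE if r.parent_id == dir_id and r.record_id != dir_id]
```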
Like many file systems, NTFS also contains a log for recoverability/rollback purposes.
It's not quite relational, but it doesn't make sense for it to be relational. Why would you need more than one 'table' to contain everything you need to know about a file? Microsoft experimented with WinFS, which wasn't a traditional file system (it was an MSSQL database with BLOB storage that sat on top of a regular NTFS volume). Performance was bad, and SkyDrive replaced the need for it (in MSFT's view).
Please elaborate.
NTFS is still the better choice for common desktop usage. ReFS's goals are centered on data integrity, but that comes at the cost of performance.
One thing directories solve: they're great grouping mechanisms. "All the Q3 stuff lives in this directory"
I bet we move towards a world where files are just UUIDs, then directory structures get created on demand, like tags.
You can have several versions of the same set of data objects at once: an entire source set for a build, all the names duplicated but tagged with 'revision' so they can be distinguished.
Hard to do that without a UUID at the root to uniquely identify each particular 'particle' of a particular data set.
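A minimal sketch of "files are UUIDs, directories are tags", assuming a simple in-memory store; `TaggedStore` and its methods are hypothetical. Two objects can share a name (`main.c`) and differ only by a `revision` tag, and a "directory" is just a query or grouping computed on demand rather than a fixed place in a tree.

```python
import uuid
from collections import defaultdict


class TaggedStore:
    """Objects are addressed by UUID; 'directories' are views computed from tags."""

    def __init__(self):
        self._objects = {}  # UUID -> (name, tags, data)

    def put(self, name, data, **tags):
        oid = uuid.uuid4()
        self._objects[oid] = (name, dict(tags), data)
        return oid

    def get(self, oid):
        return self._objects[oid][2]

    def view(self, **criteria):
        """(name, uuid) pairs whose tags match every criterion, in insertion order."""
        return [
            (name, oid)
            for oid, (name, tags, _data) in self._objects.items()
            if all(tags.get(k) == v for k, v in criteria.items())
        ]

    def group_by(self, tag):
        """Build a directory-like grouping on demand: tag value -> list of names."""
        groups = defaultdict(list)
        for name, tags, _data in self._objects.values():
            if tag in tags:
                groups[tags[tag]].append(name)
        return dict(groups)
```

Names are only unique within a view that pins a revision; the UUID is what keeps the two `main.c` objects apart.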
All good.
Which is mainly to say: trust me, this is a temporary state; the god of complexity is coming. It is utterly inevitable. The people who created React, Kubernetes, all those Java frameworks you hated, etc. didn't go away. Right now they are thinking about how amazing it would be if you stacked ten different tools together with brand-new structured file formats and databases. We already have "beads" and "gastown", where this is starting. Enjoy these times, because a couple of years from now it will already be the end of the "fun" part, I think.
If you've got a coding convention, enforce it using a linter. Have the LLM write the rules and integrate them into the local build and CI tooling.
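As an illustration of the kind of rule meant here, a minimal custom lint check built on Python's standard `ast` module. The convention it enforces (no bare `print()` calls, use logging instead) is a hypothetical house rule, and `NoPrintChecker`, `check_source`, and `main` are illustrative names. A non-zero return from `main` is what a CI step would treat as a failure.

```python
import ast


class NoPrintChecker(ast.NodeVisitor):
    """Hypothetical house rule: flag every bare print() call."""

    def __init__(self, filename):
        self.filename = filename
        self.problems = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id == "print":
            self.problems.append(
                f"{self.filename}:{node.lineno}: use logging instead of print()"
            )
        self.generic_visit(node)  # keep walking into nested calls


def check_source(source, filename="<string>"):
    checker = NoPrintChecker(filename)
    checker.visit(ast.parse(source, filename))
    return checker.problems


def main(paths):
    """Check each file; return a process exit code (0 = clean)."""
    problems = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            problems += check_source(f.read(), path)
    for p in problems:
        print(p)
    return 1 if problems else 0
```

In CI, a small script could call `sys.exit(main(sys.argv[1:]))` over the changed files so a violation fails the build.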
Has no one ever thought about how – gasp – a future human collaborator would be onboarded?
Instead of reading multi-meg data into memory to determine what to do, I used the file system and the program would store data related to the key in sub directories instead. The older people saw what I did and thought that was interesting. With development time factored in, doing it this way ended up being much faster and avoided memory issues that would have occurred.
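A minimal sketch of the "store per-key data in subdirectories" approach, assuming line-oriented text records. `DirectoryKV` is a hypothetical name, and SHA-256 sharding is one possible layout, not necessarily what the commenter did. Each lookup touches only that key's directory, so memory use stays flat however large the total data grows.

```python
import hashlib
from pathlib import Path


class DirectoryKV:
    """Keep each key's data in its own subdirectory instead of one big in-memory map."""

    def __init__(self, root):
        self.root = Path(root)

    def _dir_for(self, key: str) -> Path:
        # Hash the key so arbitrary strings become safe directory names;
        # the first two hex chars form a shard level so no directory gets huge.
        h = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.root / h[:2] / h

    def append(self, key: str, record: str) -> None:
        if "\n" in record:
            raise ValueError("records are newline-delimited and must not contain newlines")
        d = self._dir_for(key)
        d.mkdir(parents=True, exist_ok=True)
        (d / "key").write_text(key, encoding="utf-8")  # original key, for humans
        with open(d / "records.txt", "a", encoding="utf-8") as f:
            f.write(record + "\n")

    def read(self, key: str) -> list:
        p = self._dir_for(key) / "records.txt"
        if not p.exists():
            return []
        return p.read_text(encoding="utf-8").splitlines()
```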
So with AI, back to the old ways I guess :)
My life's data, including all the official stuff (bank statements, notary acts, statements made to the police [witness, etc.], insurance, property titles), all my coding projects, all the family pictures (not just the ones I took) and all the stuff I forgot, is in files, not in a dedicated DB. But these files are definitely a database.
And because I don't want to deal with data corruption, and even less want to deal with syncing now-corrupted data, many of my files contain a partial cryptographic checksum in their filename. E.g. "dsc239879879.jpg" becomes "dsc239879879-b3-6f338201b7.jpg" (meaning the Blake3 hash of that file has to begin with 6f338201b7, or the file is corrupted).
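A sketch of the checksum-in-filename scheme. BLAKE3 isn't in Python's standard library (it needs a third-party package), so this version swaps in `hashlib.blake2b` and tags names `b2` rather than `b3`, to avoid claiming a BLAKE3 digest. The 10-hex-character prefix follows the example above; `tagged_name` and `verify` are hypothetical helpers.

```python
import hashlib
from pathlib import Path

PREFIX_LEN = 10  # hex characters of the digest kept in the filename
TAG = "b2"       # BLAKE2b stand-in; the scheme described above uses "b3" for BLAKE3


def file_digest(path: Path) -> str:
    """Hex BLAKE2b digest of the file, read in 1 MiB chunks."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tagged_name(path: Path) -> str:
    """'dsc1.jpg' -> 'dsc1-b2-<first 10 hex chars>.jpg'."""
    return f"{path.stem}-{TAG}-{file_digest(path)[:PREFIX_LEN]}{path.suffix}"


def verify(path: Path):
    """True/False if the name carries a checksum, None if it doesn't."""
    stem, sep, digest = path.stem.rpartition(f"-{TAG}-")
    if not sep or not stem or len(digest) != PREFIX_LEN:
        return None
    if any(c not in "0123456789abcdef" for c in digest):
        return None
    return file_digest(path).startswith(digest)
```

Running `verify` over a whole tree (or over a freshly restored backup) flags any file whose bytes no longer match its name; untagged files return None and can be skipped.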
At any time, if I want to, I can import these into "real" dedicated DBs. For example, I can hand my pictures read-only to "I'm Mich" (immich) and then query them: "Find me all the pictures of Eliza" or "Find me all the pictures taken in 2016 on the French Riviera".
But the real database of all my life is and shall always be files on a filesystem.
With a "real" database, a backup can be as simple as a dump. With files, backing up involves... making sure you keep a proper version of all your files.
I'd say files are even more important than the filesystem: a backup on a Blu-ray disc, on an ext4-formatted SSD, on an exFAT-formatted SSD or on a tape... Doesn't matter: the files are the data.
A filesystem is the first "database" with these data: a crude one, with only simple queries. But a filesystem is definitely a database.
The main advantage of this very simple database is that as long as the data are accessible, you know your data is safe and can always use them to populate more advanced databases if needed.
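A minimal sketch of populating a more advanced database from the files, using Python's built-in `sqlite3`. The schema is hypothetical, and the `year` column comes from modification time, which for photos is often not the capture date (a real import would read EXIF data). The files stay the source of truth; the SQLite copy can be thrown away and rebuilt at any time.

```python
import os
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def load_into_sqlite(root, db=":memory:"):
    """Copy basic file metadata into SQLite; the files themselves are untouched."""
    conn = sqlite3.connect(db)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, suffix TEXT, size INTEGER, year INTEGER)"
    )
    rows = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            p = Path(dirpath) / name
            st = p.stat()
            # Year of last modification (UTC), a rough stand-in for "when".
            year = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).year
            rows.append((str(p), p.suffix.lower(), st.st_size, year))
    conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn
```

A query such as `SELECT path FROM files WHERE suffix = '.jpg' AND year = 2016` then does what a plain directory listing can't.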
Were Haiku more mature and stable, it would have been a nice fit as the OS for personal LLM/AI use cases.