Posted by jakobem 16 hours ago
Basically an abstraction that is filesystem-like but doesn't require a filesystem. That said, you can both export storage-combinators as a filesystem and, of course, also access filesystems via storage-combinators.
[1] https://dl.acm.org/doi/10.1145/3359591.3359729
[2] https://2019.splashcon.org/details/splash-2019-Onward-papers...
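To make "filesystem-like, but doesn't require a filesystem" a bit more concrete, here is a rough sketch in Python. It is not taken from the paper: `DictStore`, `CachingStore`, and the `get`/`put`/`delete` method names are my own assumptions, chosen only to show path-like references over plain memory and one store wrapping another.

```python
from typing import Dict, Optional


class DictStore:
    """In-memory store addressed by path-like references; no filesystem involved."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, ref: str) -> Optional[bytes]:
        return self._data.get(ref)

    def put(self, ref: str, value: bytes) -> None:
        self._data[ref] = value

    def delete(self, ref: str) -> None:
        self._data.pop(ref, None)


class CachingStore:
    """Combinator (hypothetical): same interface, composed from two other stores."""

    def __init__(self, source, cache) -> None:
        self.source = source
        self.cache = cache

    def get(self, ref: str) -> Optional[bytes]:
        hit = self.cache.get(ref)
        if hit is not None:
            return hit
        value = self.source.get(ref)
        if value is not None:
            self.cache.put(ref, value)  # remember it for the next read
        return value

    def put(self, ref: str, value: bytes) -> None:
        self.source.put(ref, value)
        self.cache.put(ref, value)

    def delete(self, ref: str) -> None:
        self.source.delete(ref)
        self.cache.delete(ref)
```

Because every store exposes the same three operations, a store backed by a real directory or a remote API could be swapped in for `DictStore` without the caller noticing.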
Implementing a database abstraction as a file system for an LLM feels like an extra layer of indirection for indirection's sake: just have the LLM write some views/queries/stored procs and give it sane access permissions.
LLMs are smart enough to use databases, email, etc. without needing a FUSE layer to do so, and permissions/views/etc. will keep them from doing or seeing stuff they shouldn't. You'll be keeping access and permissions where they belong, and not in a FUSE layer, and you won't have to maintain a weird abstraction that's annoying or hampered by licensing issues if you want to deploy it cross-platform.
Also, your simplified FUSE abstraction will not map accurately to the state of the world unless you're really comprehensive with your implementation, and at that point, you might as well be interacting directly in order to handle that state accurately.
I think there is a gap between “real file systems” and “non-file things in a database” where mapping your application's representation of things to a filesystem is useful. Basically all those platforms that let users upload files for different purposes and work with them (e.g. Google Drive, Notion, etc.). In those cases, representing files to an agent via a filesystem is the more intuitive and powerful interface compared to some home-grown tools that the model never saw during training.
https://github.com/nalgeon/sqlean/blob/main/docs/fileio.md
fileio_read - Read file contents as a blob.
fileio_scan - Read a file line by line.
fileio_write - Write a blob to a file.
fileio_append - Append a string to a file.
fileio_mkdir - Create a directory.
fileio_symlink - Create a symlink.
fileio_ls - List files in a directory.
If one only exposes sqlite query access and limits certain aspects of this sqlite extension depending on the use case, I feel like this might be a good alternative as well?
The file system as an abstraction is actually not that good beyond the basic use cases. Imagine you need to find an email. If you grep (via FUSE) you will end up opening lots of files, each of which results in a fetch to some API, and it will be slow. You can optimise this, and caching helps after the first fetch, but the method is still slow. The alternative is to leverage the existing API, which will be orders of magnitude faster. Now you could also create some kind of special file via FUSE that acts like a search, but it is weird and I don't think the models will do well with something so obscure.
We went as far as implementing this idea in Rust to really test it out, and ultimately it was ditched because, well, it sucks.
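A toy way to see the cost difference described above, in Python. `FakeMailApi` is a made-up stand-in for a remote mail service (nothing here is a real API); it only counts round trips so the two approaches can be compared.

```python
from typing import Dict, List


class FakeMailApi:
    """Hypothetical remote mail API; counts round trips instead of doing network I/O."""

    def __init__(self, messages: Dict[str, str]) -> None:
        self._messages = messages
        self.calls = 0

    def list_ids(self) -> List[str]:
        self.calls += 1
        return sorted(self._messages)

    def fetch(self, msg_id: str) -> str:
        self.calls += 1
        return self._messages[msg_id]

    def search(self, term: str) -> List[str]:
        # Assumes the service can search on its own side in one request.
        self.calls += 1
        return sorted(i for i, body in self._messages.items() if term in body)


def grep_via_files(api: FakeMailApi, term: str) -> List[str]:
    """What grep over a mounted mailbox amounts to: open (fetch) every file."""
    return [i for i in api.list_ids() if term in api.fetch(i)]
```

With N messages the grep-style scan costs N + 1 round trips on a cold cache, while the search call costs one, regardless of mailbox size.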
Unrelated to FUSE and MCP[1] agents, this scenario reminded me of using nmh[0] as an email client. One of the biggest reasons why nmh[0] is appealing is to script email handling, such as being able to use awk/find/grep/sed and friends.
This is a limitation of the POSIX filesystem interface. If there were a grep() system call, it could delegate searches to the filesystem, which could use full-text indices, run them on a remote server, etc.
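A sketch of what that delegation could look like, in Python rather than as a real system call. Everything here is hypothetical: `PlainStore`, `IndexedStore`, and the `grep` function are illustrative names, and matching is by whole word only to keep the index trivial.

```python
from typing import Dict, List, Set


class PlainStore:
    """Backend with only list/read operations, like a plain POSIX filesystem."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = files

    def names(self) -> List[str]:
        return sorted(self.files)

    def read(self, name: str) -> str:
        return self.files[name]


class IndexedStore(PlainStore):
    """Backend that also keeps a word index and can answer searches itself."""

    def __init__(self, files: Dict[str, str]) -> None:
        super().__init__(files)
        self.index: Dict[str, Set[str]] = {}
        for name, text in files.items():
            for word in text.split():
                self.index.setdefault(word, set()).add(name)

    def search(self, word: str) -> List[str]:
        return sorted(self.index.get(word, ()))


def grep(store, word: str) -> List[str]:
    """Hypothetical grep() entry point: delegate when the backend can search."""
    search = getattr(store, "search", None)
    if callable(search):
        return search(word)
    # Fallback: read every file, which is all the POSIX interface allows.
    return [n for n in store.names() if word in store.read(n).split()]
```

The caller uses `grep` the same way in both cases; only the backend decides whether the search is a full scan or an index lookup.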
To learn FUSE, however, I started just making everything into filesystems that I could mount. I wrote a FUSE driver for Cassandra, I wrote a FUSE driver for CouchDB, I wrote a FUSE driver for a thing that just wrote JSON files with Base64 encoding.
None of these performed very well, and I'm sort of embarrassed at how terrible the code is, which is why I haven't published them (they were also just learning projects), but I did find FUSE to be extremely fun and easy to write against. I encourage everyone to play with it.
FUSE makes me think that the Plan 9 people were on to something. Filesystems actually can be a really nice abstraction; sort of surreal that I could make an application so accessible that I could seriously have it directly linked with Vim or something.
I feel like building a FUSE driver would be a pretty interesting way to provide a "library" for a service I write. I have no idea how I'd pitch this to a boss to pay me to do it, but pretending that I could, I could see it being pretty interesting to do a message broker or something that worked entirely by "writing a file to a folder". That way you could easily use that broker from basically anything that has file IO, even something like bash.
I always have a dozen projects going on concurrently, so maybe I should add that one to the queue.
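For what it's worth, the "writing a file to a folder" broker idea can be approximated without FUSE at all. Below is a minimal sketch in Python, assuming a single shared directory on one local filesystem; `publish`, `consume`, and the file naming scheme are made up for illustration, and ordering is only approximately oldest-first.

```python
import os
import time
import uuid
from typing import Optional


def publish(queue_dir: str, payload: bytes) -> str:
    """Publish a message by writing a temp file, then renaming it into place."""
    os.makedirs(queue_dir, exist_ok=True)
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.msg"
    tmp = os.path.join(queue_dir, "." + name + ".tmp")
    with open(tmp, "wb") as f:
        f.write(payload)
    final = os.path.join(queue_dir, name)
    os.replace(tmp, final)  # readers never see a half-written .msg file
    return final


def consume(queue_dir: str) -> Optional[bytes]:
    """Take one message (roughly oldest first), or return None if none is ready."""
    for name in sorted(os.listdir(queue_dir)):
        if not name.endswith(".msg"):
            continue  # skip temp files and messages claimed by someone else
        path = os.path.join(queue_dir, name)
        claimed = path + ".claimed"
        try:
            os.rename(path, claimed)  # claim it; fails if another consumer won
        except FileNotFoundError:
            continue
        with open(claimed, "rb") as f:
            data = f.read()
        os.remove(claimed)
        return data
    return None
```

The rename-into-place step is what lets something as simple as a shell script act as a producer: write the file under a temporary name, then move it into the queue directory.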
I built the original version in Python for a job years ago. But the version above is almost entirely vibe-coded in Rust in a lazy afternoon for fun.
However, I disagree that the filesystem is the right abstraction in general. It works for git, because git is essentially structured like a filesystem already.
More generally, filesystems are roughly equivalent to hierarchical databases, or at most graph databases. And while you can make that work, many collections of data are actually better organised and accessed by other means. See https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf for a particularly interesting and useful model that has found widespread application and success.
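A small illustration of the point, using Python's built-in `sqlite3`. The `mail` table and its columns are invented for the example. In a directory tree you have to pick one hierarchy up front (say, by folder); in a relational table the same rows can be sliced by sender or by folder without restructuring anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mail (
        id INTEGER PRIMARY KEY,
        sender TEXT,
        folder TEXT,
        subject TEXT
    );
    INSERT INTO mail (sender, folder, subject) VALUES
        ('alice@example.com', 'inbox',   'Invoice March'),
        ('bob@example.com',   'inbox',   'Lunch?'),
        ('alice@example.com', 'archive', 'Invoice February');
""")


def subjects_by_sender(sender):
    """Cuts across folders; a folder-based tree would need a full walk for this."""
    rows = conn.execute(
        "SELECT subject FROM mail WHERE sender = ? ORDER BY subject", (sender,)
    )
    return [row[0] for row in rows]


def count_by_folder():
    """The same rows grouped the other way, with no change to how they are stored."""
    return dict(conn.execute("SELECT folder, COUNT(*) FROM mail GROUP BY folder"))
```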
Also, looks like my message queue idea has already been done: https://github.com/pehrs/kafkafs
No new ideas under the sun I suppose.
Maybe the most mainstream incarnation is its use in the Windows Subsystem for Linux (WSL).
You can test it here ==> https://ainiro.io/natural-language-api