Filesystems are having a moment

Posted by malgamves 11 hours ago

Filesystems are having a moment (madalitso.me)

160 points | 94 comments

staplung 3 hours ago|

Not knocking the article in any way but from the headline I was expecting - perhaps hoping - this would be about some innovation in filesystems research like it was the 90's again. That's not what this is.

It's about how filesystems as they are (and have been for decades) are proving to be powerful tools for LLMs/agents.

alecco 1 hour ago||

And by filesystem they mean CLI (command line interface) and a full *nix system. Like the hundreds of similar articles about it for the past year said.

Gigachad 51 minutes ago|||

I feel like every article on HN now disguises itself as interesting but the content is just the same boring AI slop.

palata 29 minutes ago||

I have been reading HN for a few years, and my feeling is that I find fewer and fewer interesting articles. Maybe it's just me, and the average articles are the same quality.

Now I tend to skim through it to see if a title looks like it may bring interesting discussions, and then I skim through the discussions. Because there are very knowledgeable people who sometimes share valuable insights.

Interestingly, last time I asked a question, hoping to get interesting people to share insights, I was answered that I "should learn how to use an LLM instead of asking questions" :-).

fragmede 1 hour ago|||

Yeah, none of it was really about file systems. There was a brief mention that file systems look like a graph, and that you build roughly an index so it looks like a graph and thus database-y, but for all the detail about file systems this post went into, you could store it all in a SQLite database with a column called filename and a column called content. I too was expecting something more in depth about file systems. For instance, cluster file systems have made little to no advancement. ZFS is not a cluster file system, and we've been needing a good one of those for decades, ever since VMs became feasible on consumer-grade hardware. Still, files on disk is better than having to pay Oracle a fee per skill on today's modern, open Internet. That was never going to happen.
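
For illustration, here is a minimal sketch of that two-column SQLite idea. The table name `files`, the sample paths, and the helpers `list_dir` and `read_file` are all made up for the example; this is not how the article or any agent harness actually stores anything.

```python
import sqlite3

# Hypothetical schema: one row per "file", keyed by its path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (filename TEXT PRIMARY KEY, content TEXT)")

# Made-up sample data standing in for a small directory tree.
notes = {
    "notes/todo.md": "- write report\n",
    "notes/ideas/fs.md": "filesystems as agent memory\n",
    "README.md": "top-level readme\n",
}
conn.executemany("INSERT INTO files VALUES (?, ?)", notes.items())


def list_dir(prefix):
    """Emulate a recursive directory listing with a prefix match on filename."""
    rows = conn.execute(
        "SELECT filename FROM files WHERE filename LIKE ? ORDER BY filename",
        (prefix + "%",),
    )
    return [name for (name,) in rows]


def read_file(filename):
    """Return the stored content, or None if no such 'file' exists."""
    row = conn.execute(
        "SELECT content FROM files WHERE filename = ?", (filename,)
    ).fetchone()
    return None if row is None else row[0]
```

Calling `list_dir("notes/")` returns both paths under `notes/`, which is roughly all the "hierarchy" amounts to in this representation: a string prefix on the filename column.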

mangogogo 3 hours ago||

i was hoping the same, but then it turned out to be another article about LLMs.

tacitusarc 7 hours ago||

Does everyone just use AI to write these days? Or is the style so infectious that I just see it everywhere? I swear there needs to be some convention around labeling a post with how much AI was used in its creation.

heavyset_go 4 hours ago||

I'd be embarrassed to put my name on AI prose without a disclaimer and I'd also be annoyed to read it as a reader.

IMO it's insulting to the audience, it says your time and attention is not worthy of the author's own time and attention spent putting their own thoughts in their own words.

If you're going to do that at least mention it's LLM output or just give me your outline prompts. I don't care what your LLM has to say, I'm capable of prompting your outline in my own model myself if I feel like it.

josephg 2 hours ago|||

> If you're going to do that at least mention it's LLM output

Yes, this! Please label AI generated content. Pull request written by an AI? Label it as ai generated. Blog post? Article generated with AI? Say so! It’s ok to use AI models. Especially if English is your second language. But put a disclaimer in. Don’t make the reader guess.

Eg:

> This content was partially generated by chatgpt

> Blog post text written entirely by human hand, code examples by Claude code

fragmede 45 minutes ago||||

Have any outlines you'd care to share?

coliveira 4 hours ago|||

I'm not a fan of AI and try to avoid it, but there is a difference between AI output published by someone knowledgeable and any other AI output that you run by yourself. If an expert looked at the result and found it to be ok, then you can have some assurance that it at least makes sense. Your own AI run doesn't mean anything: it could be 100% hallucination and a non-expert will buy it as truth.

Joel_Mckay 3 hours ago||

Unfortunately, LLM slop now makes up >53% of the web, and is growing.

It is easy to spot the compacted token distribution unique to each model, but search engines still seem to promote nonsense content. =3

"Bad Bot Problem - Computerphile"

https://www.youtube.com/watch?v=AjQNDCYL5Rg

"A Day in the Life of an Ensh*ttificator "

https://www.youtube.com/watch?v=T4Upf_B9RLQ

sethev 6 hours ago|||

LLMs were trained on stuff that people wrote. I get there are "tells", but don't really think people are as good at identifying AI generated text as they think they are...

afro88 4 hours ago|||

I wouldn't have picked this article as AI until I got an agent to do some writing for me and read a bunch of it to figure out if I can stand behind it. Now I see the tells everywhere. "It's not this. It's that." is particularly common and I can't unsee it. (FWIW I rewrote most of the writing it generated, but it did help me figure out my structure and narrative.)

The problem I think with AI generated posts is that you feel like you can't trust the content once it's AI. It could be partly hallucinated, or misrepresented.

sethev 1 hour ago||

Yeah, but "it's not X. It's Y" is a common idiom that LLMs picked up from people. That's the point I was making. And it's starting to feel like every post has at least one comment claiming that it was AI generated.

antonvs 4 hours ago||||

Good chunks of the article don't trigger this for me, but I would bet money on the final paragraph involving AI:

> That's not a technical argument. It's a values argument. And it's one that the filesystem, for all its age and simplicity, is uniquely positioned to serve. Not because it's the best technology. But because it's the one technology that already belongs to you.

adi_kurian 1 hour ago||||

Contractions

computably 2 hours ago|||

You don't have to be good at identifying AI generated text to detect low-effort slop.

malgamves 4 hours ago|||

As the author I can assure you there’s a human behind these words. Interesting times we live in though. I find myself questioning what’s AI and what’s not often too, and at the moment we’ve offloaded that responsibility to the good will of authors or platform policy, which might have to change soon.

green-salt 1 hour ago|||

Nice dodge! Unfortunately, this made it more obvious.

jonmagic 18 minutes ago||||

I thought it was a great post tying a lot of things I’ve been reading and thinking about together. Could care less if you used AI if it helps my brain expand and/or make connections I wouldn’t have otherwise.

meindnoch 3 hours ago||||

"there’s a human behind these words"

That's a bit vague. Was the article written without the aid of LLMs? Yes or no.

torginus 2 hours ago||

Well, if the words were 100% hand-written, I assume he'd have said that.

lovecg 3 hours ago|||

As in, you used 0 AI to write or edit this text? Or some AI? I’d like to calibrate myself.

grey-area 2 hours ago||

We all know the answer to that.

q3k 6 hours ago|||

Everyone's trying to be the new thought leader enlightened technical essayist. So much fluff everywhere.

orsorna 6 hours ago||

What's wild is that with a few minutes of manual editing it would give exponential return. For instance, a lead sentence in your section saying "here's why X" that was already described by your subheading is unnecessary and could have been wholly removed.

amarant 4 hours ago|||

Exponential return? This is the front page of HN! What does exponential returns even look like?

Are you saying this post is a few edits away from becoming a New York Times bestseller?

orsorna 3 hours ago||

No, I guess I meant editing to approach a text that doesn't look rushed over (LLM generation is a subset of such poor writings)

But you're right, it did hit the front page, and that says more about my sensibilities not lining up with whoever is voting the article up.

gzread 5 hours ago||||

You'd have to have a good idea of how you want the document to read, which is half (or more) of the process of writing it.

antonvs 4 hours ago|||

IME many people aren't very capable of editing their own work effectively. It's why "editor" exists as a profession.

idiotsecant 5 hours ago|||

This doesn't seem particularly AI slopped to me.

einr 3 hours ago||

"Not bigger than databases. Different from databases.

It's not a website you go to — it's a little spirit that lives on your machine.

Not a chatbot. A tool that reads and writes files on your filesystem.

That's not a technical argument. It's a values argument."

goodmythical 5 hours ago|||

Does everyone just complain about people using the tools they like to use these days? Or is the style so infectious that I just see it everywhere? I swear there needs to be some convention around labeling a post with how much whining was used in its creation.

panarky 3 hours ago||

Does everyone just easily accuse genuine, literate humans of "cheating" with AI when there's no way they could know that?

There are a lot of unique aspects of the writing in this post that LLMs don't typically generate on their own.

And there's not a "delve" or "tapestry" or even a bullet point to be found.

Also, accusations and complaints like this are off-topic and uninteresting.

We should be talking about filesystems here, not your gut instinct AI detector that has a sky-high false-positive rate.

I swear there needs to be some convention around throwing wild accusations at people you don't know based exclusively on vibes and with zero actual evidence.

korbatz 7 hours ago||

I had the exact same observation, albeit from a slightly different perspective: SaaS. Whereas the code tends to be temporary and very domain specific, the data (files) must strive to be boring standards.

The problem today is that we build specific, short-lived apps that lock data into formats only they can read. If you don't use universal formats, your system is fragile. We can still open JPEGs from 1995 because the files don't depend on the software used to make them. Using obscure or proprietary formats is just technical debt that will eventually kill your project. File or forget.

Gigachad 17 minutes ago||

The frustrating thing about photo management these days is how every major photo library app/cloud service stores every edit / tag / album externally. If you crop a photo, change the taken-at date, etc., the original file never gets touched but an external bit of metadata is created. So any time you move platform, all of these edits and your albums are erased.

It is convenient to be able to undo crops or filters, but I wish the industry would standardize so these changes are portable.

jmathai 6 hours ago||

My 10+ year old photo management system [1] relies on the file system and EXIF as the source of truth for my entire photo library.

It’s proven several times over that it’s the correct approach. Abstractions (formerly Google photos, currently Immich) should just be built on top - but these proprietary databases are only for convenience.

For work, I’m having the same experience as the author and everything is just markdown and csv files for Claude Code (for research and document writing).

[1] https://github.com/jmathai/elodie

whartung 4 hours ago|||

I know some systems leverage the modern file meta data (extended attributes), but it's clearly not successful enough that folks can use them for an application like this.

Ostensibly, things like MacOS Spotlight can bring real utility and value to the file system, and extended attributes through the sidecar indexing, etc. But Spotlight is infamous for its unreliability.

The other issue with file systems is simply that the user (potentially) has "direct access" to them, in that they can readily move files in and up and around whimsically. The "structure" is laid bare for them to potentially interfere with, or, such as the case with the extended attributes, drag a file to a USB fob, and then copy it back -- inadvertently removing those attributes.

And that's how we end up with everything being stuffed into a SQLite DB.

zenoprax 4 hours ago||||

I have your repo starred from a post/comment you made a few weeks ago but haven't had time to actually use/integrate it with my own stuff.

What are your thoughts on XMP sidecar files? I'm torn right now between digital negative + external metadata versus all-in-one image with mutable properties. Portability vs. Durability etc.

jmathai 17 minutes ago||

I've avoided using XMP sidecars. Mostly because I don't want to have to worry about two files for every photo. And I don't think they're ubiquitously supported like EXIF.

Thanks for starring the repo and let me know if you need any help.

alanbernstein 4 hours ago|||

Thanks for sharing, I might have too much NIH syndrome to use it but I'd love to check it out.

jmathai 17 minutes ago||

Ha! I totally get it. Use it for inspiration though!

hmokiguess 5 hours ago||

Notable mention: Plan 9 from Bell Labs.

https://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs

mieubrisse 2 hours ago|

I'm building an agent orchestrator (plug: https://github.com/mieubrisse/agenc) and asked Claude what prior art exists.

It pulled back Plan 9, and I was shocked: this is exactly what we need today, as I'm convinced we need to think about minimizing agent permissions the exact same way companies do. Plan 9 was just too early.

packetlost 2 hours ago||

We once again discover that Plan9 and UNIX were right. The most powerful, lowest common denominator interface is text files exposed over a file system. Now to get back to making 9p2026.

The article gets some fundamentals completely wrong though: file systems are full graphs, not strict trees and are definitely not acyclic
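
To make the "not acyclic" point concrete, here is a small sketch that builds a plain directory tree, then adds one symlink pointing back at an ancestor and checks for a cycle. The helper `has_cycle` and the directory names are invented for the example, and it assumes a platform where creating symlinks is permitted (typical Linux/macOS).

```python
import os
import tempfile


def has_cycle(root):
    """Walk root, following symlinks, and report whether any directory
    is reached again along its own path (i.e. the graph has a cycle)."""

    def visit(path, ancestors):
        real = os.path.realpath(path)
        if real in ancestors:
            return True
        with os.scandir(path) as entries:
            # is_dir(follow_symlinks=True) treats a symlink to a directory as a directory.
            children = [e.path for e in entries if e.is_dir(follow_symlinks=True)]
        return any(visit(child, ancestors | {real}) for child in children)

    return visit(root, frozenset())


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    tree_only = has_cycle(tmp)  # plain tree, no links yet

    # One symlink from a/b back up to a is enough to create a cycle.
    os.symlink(os.path.join(tmp, "a"), os.path.join(tmp, "a", "b", "back_to_a"))
    with_link = has_cycle(tmp)
```

This only demonstrates the symlink case; hard links to files are another way the structure stops being a strict tree.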

andai 9 minutes ago|

So what are Plan 9's killer features, and can they be bolted on with FUSE or is there a deeper magic at play?

largbae 2 hours ago||

I think this article just speaks to the immaturity of our use of AI at this "moment."

Production grade systems might be written by agents running on filesystem skills, but the production systems themselves will run on consistent and scalable data structures.

Meanwhile the UI of AI agents will almost certainly evolve away from desktop computers and toward audio/visual interfaces. An agent might get more context from a zoom call with you, once tone and body language can be used to increase the bandwidth between you.

andai 7 minutes ago||

https://www.youtube.com/watch?v=GH9-EmgtABw

Saw this video recently, by an AI company working to get contextual cues from tone and body language. I think they're converting it to text and feeding it into a LLM, so not natively multimodal, but I still thought it was really cool.

fragmede 31 minutes ago||

I don't think written prompting will ever go away. Writing helps you organize your thoughts in a way that speaking, umm, ah, wait no, hang on, does not. With writing I can go back and change what I've already written before I hit send. Anybody who's prompted with speech for any length of time has been "wait no nevermind start over". So STT will get better, sure, it's already quite good. I just don't see text entry entirely going away because Human Intelligence (HI) just doesn't work in a way that speech would be the only interface.

MarkMarine 4 hours ago||

Over a set of files that is well organized, like a codebase, the coding agents and harnesses are quite good at finding information. They are clearly trained on them, so they will only improve.

The challenge is how to structure messy data as a filesystem the agent can use. That is a lot harder than querying a vector db for a semantic query.

The code bases we’ve been using agents in had been pruned and maintained over years, we’ve got principles like DRY that pushed us to put the answer in one place… implicitly building and maintaining that graph with all the actors in the system invested in maintaining this. This is not the case for messy data, so while I see the author’s point and agree that a filesystem is a better structure for context over time, we haven’t supplanted search yet for non-code data.

dzello 6 hours ago||

Resonates deeply with me. I’ve moved personal data out of ~10 SaaS systems into a single directory structure in the last year. Agents pay a higher price for fragmentation than humans. A well-organized system of files eliminates that fragmentation. It’s enough for single player. I suspect we’ll see new databases emerge that enable low multi-player (safe writes etc) scenarios without making the filesystem data more opaque. Not unlike what QMD is for search.

_pdp_ 50 minutes ago|

In other words, file systems are an excellent way to organise information. I mean, yeah - we've been using them forever.

File systems are not a good abstraction mechanism for remote procedure calls, though. I think it's important to distinguish between the two, since I find there are a lot of articles conflating both - comparing MCPs to SKILLs, which are completely different things.

I think the confusion comes from the fact that MCP came before SKILLs, and there's a mental model where SKILLs are somehow "better than" MCPs. This is like saying local Word documents are better than a fully integrated collaborative office suite. It's just not the same thing.

The reason SKILLs work so well is because there's 50 years of accumulated knowledge of how to run rudimentary Unix tools.

the TLDR

File systems - organising information

MCP/APIs - remote procedure calls
