Node.js needs a virtual file system

Posted by voctor 5 hours ago

Node.js needs a virtual file system (blog.platformatic.dev)

143 points | 126 comments

giancarlostoro 4 minutes ago|

> I pointed the AI at the tedious parts, the stuff that makes a 14k-line PR possible but no human wants to hand-write: implementing every fs method variant (sync, callback, promises), wiring up test coverage, and generating docs.

This is the biggest takeaway for me for AI. It's not even that nobody wants to do these things, it's that by the time you finish your tasks, you have no time to do these things, because your manager / scrum master / powers that be want you to work on the next task.

indutny 4 hours ago||

Setting aside the question of whether this would be a useful addition to Node.js core, it must be noted that this 19k LoC PR was mostly generated by Claude Code and manually reviewed by the submitter, which in my opinion is against the spirit of the project and directly violates the terms of the Developer's Certificate of Origin set out in the project's CONTRIBUTING.md

syrusakbary 1 hour ago||

Fully disagree with this take. Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.

As a side note, the OpenJS executive director mentioned it's OK to use AI assistance on Node.js contributions:

  I checked with legal and the foundation is fine with the DCO on AI-assisted contributions. We’ll work on getting this documented.

[1]: https://github.com/nodejs/node/pull/61478#issuecomment-40772...

indutny 56 minutes ago|||

I appreciate hearing your point of view on this. In my opinion the future of Open Source and AI assisted coding is a much bigger issue, and different people have different levels of confidence in both positive and negative outcomes of LLM impact on our industry.

It is great to have a legal perspective on compliance of LLM generated code with DCO terms, and I feel safer knowing that at least it doesn't expose Node.js to legal risk. However it doesn't address the well known unresolved ethical concerns over the sourcing of the code produced by LLM tooling.

szmarczak 33 minutes ago|||

> Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.

It's not an AI issue. Node.js itself is lots of legacy code and many projects depend on that code. When Deno and Bun were in early development, AI wasn't involved.

Yes, you can speed up the development a bit but it will never reach the quality of newer runtimes.

It's like comparing C to C++. Those languages are from different eras (relative to each other).

mixologic 2 hours ago|||

Worth noting that mcollina is a member of the Node.js Technical Steering Committee

everlier 2 hours ago||

We call it a slip slop at work, it's ok to slip some slop if it's "our" slop :-)

giancarlostoro 3 minutes ago||

Is it slop if it is carefully calculated? I tire of hearing people use slop to mean anything AI, even when it is carefully reviewed.

digikata 3 hours ago|||

Large PRs could follow the practices that the Linux kernel dev lists follow. Sometimes large subsystem changes could be carried separately for a while by the submitter for testing and maintenance before being accepted in theory, reviewed, and if ready, then merged.

While the large code changes were maintained, they were often split up into a set of semantically meaningful commits for purposes of review and maintenance.

With AI blowing up the line counts on PRs, it's a skill set that more developers need to mature. It's good for their own review to take the mass changes, ask themselves how would they want to systematically review it in parts, then split the PR up into meaningful commits: e.g. interfaces, docs, subsets of changed implementations, etc.

dakiol 2 hours ago|||

Nobody wants to review AI-generated code (unless we are paid for doing so). Open source is fun, that's why people do it for free... adding AI to the mix is just insulting to some, and boring to others.

Like, why on earth would I spend hours reviewing your PR that you/Claude took 5 minutes to write? I couldn't care less if it improves (best case scenario) my open source codebase, I simply don't enjoy the imbalance.

goalieca 3 hours ago|||

> With AI blowing up the line counts on PRs,

Well, the process you’re describing is mature and intentionally slows things down. The LLM push has almost the opposite philosophy. Everyone talks about going faster and no one believes it is about higher quality.

digikata 2 hours ago|||

Go slow to go fast. Breaking up the PR this way also allows later humans and AI alike to understand the codebase. Slowing down the PR process with standards lets the project move faster overall.

If some bug slips by review, having the PR broken down semantically allows quicker analysis and recovery later, for one. Even if you have AI reviewing new Node.js releases to decide whether you want to take in the new version, the commit log will be more analyzable by the AI with semantic commits.

Treating the code as throwaway is valid in a few small contexts, but that is not the case for PRs going into maintained projects like Node.js.

tracker1 1 hour ago||||

TBF, most of the AI code I've reviewed isn't significantly different than code I've seen from people... in fact, I've seen significantly worse from real people.

The fact is, it's useful as a tool, but you still should review what's going on/in. That isn't always easy though, and I get that. I've been working on a TS/JS driver for MS-SQL so I can use some features not in other libraries, mostly bridging a Rust driver (first Tiberius, then mssql-client), the clean abstraction made the switch pretty quick... a fairly thorough test suite for Deno/Node/Bun kept the sanity in check. Rust C-style library with FFI access in TS/JS server environment.

My hardest part is actually having to set up a Windows Server to test the passwordless auth path (basically a connection string with integrated Windows auth). I've got about 80 hours of real time into this project so far. And I'll probably be doing 2 followups.. one will be a generic ODBC adapter with a similar set of interfaces. And a final third adapter that will provide the same methods, but using the native SQLite underneath while smoothing over the differences.

I'm leveraging using/dispose (async) instead of explicit close/rollback patterns, similar to .Net as well as Dapper-like methods for "Typed" results, though no actual type validation... I'd considered trying to adapt Zod to check at least the first record or all records, and may still add the option.
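
To make the using/dispose idea concrete, here is a minimal sketch, not the driver's actual code. `FakeTransaction` and `withTransaction` are hypothetical names made up for illustration; the only real API assumed is the well-known `Symbol.asyncDispose` symbol, with a fallback for runtimes that lack it. The helper spells out with try/finally roughly what `await using` does for you in runtimes that support that syntax:

```javascript
// Use the standard symbol where available; the fallback is only for older runtimes.
const asyncDispose = Symbol.asyncDispose ?? Symbol.for('Symbol.asyncDispose');

// Hypothetical transaction object; it records what happened into a log array.
class FakeTransaction {
  constructor(log) {
    this.log = log;
    this.committed = false;
  }

  async commit() {
    this.committed = true;
    this.log.push('commit');
  }

  // Called when the scope ends: roll back unless commit() already ran.
  async [asyncDispose]() {
    if (!this.committed) this.log.push('rollback');
  }
}

// Roughly what `await using tx = new FakeTransaction(log)` expands to.
async function withTransaction(log, work) {
  const tx = new FakeTransaction(log);
  try {
    return await work(tx);
  } finally {
    await tx[asyncDispose]();
  }
}
```

The point of the pattern is that the caller never writes an explicit rollback: leaving the scope without committing is enough.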

All said though, I wouldn't have been able to do so much with so relatively little time without the use of AI. You don't have to sacrifice quality to gain efficiency with AI, but you do need to take the time to do it.

dotancohen 2 hours ago|||

  > Everyone talks about going faster and no one believes it is about higher quality.

Go Fast And Break Things was considered a virtue in the JavaScript community long before LLMs became widely available.

epolanski 3 hours ago|||

Do as I say, not as I do.

On a more serious note, I think that this will be thoroughly reviewed before it gets merged, and Node has an entire security team that oversees these.

indutny 3 hours ago||

As someone who was a part of the aforementioned security team I'm not sure I'd be interested in reviewing such a volume of machine generated code, expecting a trap at every corner. The implicit assumption that I observed at many OSS projects I've been involved with is that first time contributions are rarely accepted if they are too large in volume, and "core contributor" designation exists to signal "I put effort into this code, stand by it, and respect everyone's time in reviewing it". The PR in the post violates this social contract.

epolanski 2 hours ago|||

For free, you can decide to do what you want. If it's your job, it's a bit different and you may have to do so, especially considering Collina is one of the largest contributors to the project and a member of the technical committee.

exe34 2 hours ago||

> if it's your job, it's a bit different and you may have to do so

Oh I'd use an llm to generate large amounts of feedback and request changes!

epolanski 1 hour ago||

Imagine if every profession reasoned like that when doing something they don't enjoy.

kruffalon 19 minutes ago||

What a wonderful world we would have, or possibly at least better than the current shit show :)

lemagedurage 3 hours ago|||

[dead]

athorax 3 hours ago||

How exactly does it violate the Developer's Certificate of Origin clause?

indutny 3 hours ago||

The submitted code must adhere to either of (a), (b), (c), and separately a (d) clause of: https://github.com/nodejs/node/blob/main/CONTRIBUTING.md#dev...

If submitter picks (a) they assert that they wrote the code themselves and have right to submit it under project's license. If (b) the code was taken from another place with clear license terms compatible with the project's license. If (c) contribution was written by someone else who asserted (a) or (b) and is submitted without changes.

Since LLM generated output is based on public code, but lacks attribution and the license of the original it is not possible to pick (b). (a) and (c) cannot be picked based on the submitter disclaimer in the PR body.

athorax 1 hour ago|||

Not sure if you are intentionally misrepresenting (a), but here is the full text

(a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

Dylan16807 31 minutes ago||||

If there's a "the original" the LLM is copying then there's a problem.

If there isn't, then (b) works fine, the code is taken from the LLM with no preexisting license. And it would be very strange if a mix of (a) and (b) is a problem; almost any (b) code will need some (a) code to adapt it.

benatkin 41 minutes ago||||

To many, it qualifies under either A or B, and therefore C as well. Under A, you can think of the LLM as augmenting your own intelligence. Under B, the license terms of LLM output are essentially that you can do whatever you want with it. The alternative is avoiding use of AI because of copyright or plagiarism concerns.

charcircuit 3 hours ago|||

It would be considered (a) since the author would own the copyright on the code.

lacoolj 2 hours ago|||

Owning copyright of something and writing it are very different things

crote 3 hours ago|||

Citation needed.

Whether AI output can fall under copyright at all is still up for debate - with some early rulings indicating that the fact that you prompted the AI does not automatically grant you authorship.

Even if it does, it hasn't been settled yet what the impact of your AI having been trained on copyrighted material is on its output. You can make a not-completely-unreasonable argument that AI inference output is a derivative work of AI training input.

Fact is, the matter isn't settled yet, which means any open-source project should assume the worst possible outcome - which in practice means a massive AI-generated PR like this should be treated like a nuke which could go off at any moment.

phendrenad2 1 hour ago|||

Why write open-source software at all, when the government could outlaw open-source entirely? What if an asteroid destroys Earth and there are no humans left to enjoy your work? At some point, you have to agree that a risk isn't worth worrying about. And your "worst possible outcome" is just the arbitrary outcome that you think has some subjective risk threshold. And it's certainly not one I agree with. Furthermore, calling it a "nuke" is a bad analogy because that implies that it can't be put back in the bottle once opened. In reality, we're dealing with legal definitions, which can be redefined as easily as defined.

charcircuit 3 hours ago|||

The two main points are that:

1. Copyright cannot be assigned to an AI agent.

2. Copyrighted works require human creativity to be applied in order to be copyrighted.

For point 2 this would apply to times where AI one-shots a generic prompt. But for these large PRs where multiple prompts are used and a human has decided what the design should be and how the API should look you get the human creativity required for copyright.

In regards to being a derivative work I think it would be hard to argue that an LLM is copying or modifying an existing original work. Even if it came up with an exact duplicate of a piece of code it would be hard to prove that it was a copy and not an independent recreation from scratch.

>the worst possible outcome

The worst possible outcome is they get sued and Anthropic defends them from the copyright infringement claim due to Anthropic's indemnity clause when using Claude Code.

monocularvision 1 hour ago||

That indemnity clause is only for Team, Enterprise and API users. Do you know what was used here?

Also the commercial version is limited to “…Customer and its personnel, successors, and assigns…”. I am very much not a lawyer and couldn’t find definitions of these in the agreement but I am not sure how transferable this indemnity would be to an open source project.

charcircuit 41 minutes ago||

I reviewed it and it looks like personal Claude Code subscriptions are not covered, so it's riskier than I claimed.

lacoolj 2 hours ago||

Using Claude for code you use yourself or at your own company internally is one thing, but when you start injecting it into widely-shared projects like this (or, the linux kernel, or Debian, etc) there will always be a lingering feeling of the project being tainted.

Just my opinion, probably not a popular one. But I will be avoiding an upgrade to Node.js after 24.14 for a while if this is becoming an acceptable precedent.

wccrawford 4 hours ago||

I'm not convinced that allowing Node to import "code generated at runtime" is actually a good thing. I think it should have to go through the hoops to get loaded, for security reasons.

I like the idea of it mocking the file system for tests, but I feel like that should probably be part of the test suite, not Node.

The example towards the end that stores data in a sqlite provider and then saves it as a JSON file is mind-boggling to me. Especially for a system that's supposed to be about not saving to the disk. Perhaps it's just a bad example, but I'm really trying to figure out how this isn't just adding complexity.

Normal_gaussian 3 hours ago||

    node -e "new Function('console.log(\"hi\")')()"

or more to the point

    node -e "fetch('https://unpkg.com/cowsay/build/cowsay.umd.js').then((r) => r.text()).then(c => new Function(c + 'console.log(exports.say({ text: \"like this\"}))')())"

that one is particularly bad, because umd messes with the global object - so this works

    node -e "fetch('https://unpkg.com/cowsay/build/cowsay.umd.js').then((r) => r.text()).then(c => new Function(c)()).then(() => console.log(exports.say({ text: 'oh no'})))"

phendrenad2 1 hour ago||

Well there you have it.

I had to laugh, because the post you're replying to STRONGLY reminds me of this story, https://news.ycombinator.com/item?id=31778490 , in which some people on the GNOME project objected to thumbnails in the file-open dialog box because it might be a "Security issue" (even though thumbnails were available in the normal file browser, something those commenters probably should have known about, but didn't, but they just had to chime in anyway).

TheRealPomax 4 hours ago||

But then you go "hang on, doesn't ESM exist?" and you realize that argument 4 isn't even true. You can literally do what this argument says you can't, by creating a blob instead of "writing a temp file" and then importing that using the same dynamic import we've had available since <checks his watch> 2020.

dfabulich 2 hours ago|||

A virtual filesystem makes it possible for the ESM you import to statically import other files in the virtual filesystem, which isn't possible by just dynamically importing a blob. Anything your blob module imports has to be updated to dynamically import its dependencies via blobs.
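
A minimal sketch of that rewriting burden, assuming `data:` URLs in place of blobs because those are what Node's `import()` is documented to accept. The module sources and the `toDataUrl` and `loadMain` helpers are hypothetical. Since a relative specifier like `./dep.js` has nothing to resolve against, the dependency's full URL has to be spliced into the importer's source by hand:

```javascript
// Encode module source as a data: URL that import() accepts.
function toDataUrl(source) {
  return 'data:text/javascript;base64,' + Buffer.from(source).toString('base64');
}

// Hypothetical in-memory "files"; the contents are for illustration only.
const depSource = 'export const greet = (name) => `hello ${name}`;';

// The importer cannot say "./dep.js", so it gets the dependency's absolute URL.
const depUrl = toDataUrl(depSource);
const mainSource = `import { greet } from ${JSON.stringify(depUrl)};
export const message = greet('vfs');`;

// Load the entry module; its static import pulls in the dependency.
async function loadMain() {
  return import(toDataUrl(mainSource));
}
```

With two modules this is tolerable; with a real dependency graph every importer has to be regenerated whenever anything below it changes, which is the cost a virtual filesystem avoids.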

notnullorvoid 4 hours ago|||

There's also a module expression proposal, that would remove the need to use blob imports.

https://github.com/tc39/proposal-module-expressions

gnarbarian 42 minutes ago||

one of the reasons I prefer deno is the availability of indexeddb (and all the other great stuff that comes with it out of the box)

butz 38 minutes ago||

How about trying to reduce dependencies? 11ty is going in the correct direction, dropping a significant chunk of various dependencies or replacing them with packages with no dependencies, or using platform features that become readily available.

PaulHoule 5 hours ago||

Would be nice if node packages could be packed up in ZIP files so as to avoid the security/metadata tax for small file access on Windows.

MarleTangible 4 hours ago||

The number of files in the node modules folder is crazy, any amount of organization that can tame that chaos is welcomed.

koolba 4 hours ago||

And if you thought malware hiding in a mess of files was bad, just wait till you see it in two layers of container files.

PaulHoule 3 hours ago||

Or worse yet, the performance load of anti-malware software that has to look inside ZIP files.

Look, most of us realized around 2004 or so that if you had a choice between Norton and the virus you would pick the virus. In the Windows world we standardized around Defender because there is some bound on how much Defender degrades the performance of your machine which was not the case with competitive antivirus software.

I've done a few projects which involved getting container file formats like ZIP and PDF (e.g. you know it's a graph of resources in which some of those resources are containers that contain more resources, right?) and now that I think of it you ought to be able to virus scan ZIP files quickly and intelligently but the whole problem with the antivirus industry is that nobody ever considers the cost.

ronsor 2 hours ago||

Now we'll have to encrypt the files to prevent the performance hit of antivirus peeking inside.

Oh, wait...

Dangeranger 4 hours ago|||

There are alternative package managers like Yarn that use zip files as a way to store each Node package.[0]

[0] https://yarnpkg.com/advanced/pnp-spec#zip-access

chrisweekly 3 hours ago|||

Strong recommendation to use PNPM instead of yarn or npm. IME (webdev since 1998) it's the only sane tool for stewardship of an npm dependency graph.

See https://pnpm.io/motivation

Also, while popularity isn't necessarily a great indicator of quality, a quick comparison shows that the community has decided on pnpm:

https://www.npmcharts.com/compare/pnpm,yarn,npm

Normal_gaussian 2 hours ago||

yarn with zero-installs removes an awful lot of pain present in npm and pnpm. It's practically the whole point of yarn berry.

Firstly - with yarn pnp zero-installs, you don't have to run an `install` every time you switch branch, just in case a dep changed. So much dev time is wasted due to this.

Secondly - "it worked on my machine" is eliminated. CI and deploy use the exact same files - this is particularly important for deeply nested range satisfied dependencies.

Thirdly - packages committed to the repo allows for meaningful retrospectives and automated security reviews. When working in ops, packages changing is hell.

All of this is facilitated by the zip files that the comment you replied to was discussing, that you tangented away from.

The graph you have linked is fundamentally odd. Firstly - there is no good explanation of what it is actually showing. I've had claude spin on it and it reckons it's npm download counts. This leads to it being a completely flawed graph! Yarn berry is typically installed either via corepack or bootstrapped via package.json and the system yarn binary. Yarn even saves itself into your repo. pnpm is never (I believe) bundled with the system node, whereas yarn and npm typically are.

Your graph doesn't show what you claim it does.

PaulHoule 3 hours ago|||

... and of course JAR files in Java are just ZIP files with a little extra metadata and the JVM can unpack them in realtime just fine.

buttsack 1 hour ago|||

When npm decided to have per-project node_modules (rather than shared like ruby and others) and human readable configs and library files I think the goal was to be developer friendly and highly configurable, which it is. And package.json became a lot more than that as a result, it’s been a great system IMO.

Combined with a hackable IDE like Atom (Pulsar) made with the same tech it’s a pretty great dev exp for web devs

fmorel 4 hours ago|||

I remember when Firefox started putting everything into jars for similar reasons.

https://web.archive.org/web/20161003115800/https://blog.mozi...

zadikian 3 hours ago|||

Would accessing deps directly from a zip really be faster? I'd be a little surprised but not terribly, given that it's readonly on an fs designed for RW. If not, maybe just tar?

pie_flavor 23 minutes ago||

You just cat the exe with the zip file, then it is all loaded into memory at the same time on process init. This is how e.g. LÖVE does game code packaging. (It can't be tar, because this trick only works because the PKZIP descriptor is at the end of the file.)
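
A rough sketch of why the trick works: a ZIP reader starts from the end-of-central-directory record near the end of the file, so whatever precedes the archive does not stop it from being found. The function name and the fake executable bytes are made up for illustration, and a real reader would also bound the scan and account for the prefix when following offsets:

```javascript
// Signature of the ZIP end-of-central-directory (EOCD) record: "PK\x05\x06".
const EOCD_SIGNATURE = 0x06054b50;
// Size of the fixed part of the record, without the optional trailing comment.
const EOCD_MIN_SIZE = 22;

// Scan backwards from the end of the buffer for the EOCD record.
// Returns its byte offset, or -1 if no record is found.
function findEndOfCentralDirectory(buf) {
  for (let pos = buf.length - EOCD_MIN_SIZE; pos >= 0; pos--) {
    if (buf.readUInt32LE(pos) === EOCD_SIGNATURE) return pos;
  }
  return -1;
}

// Simulate "cat game.exe game.zip": arbitrary leading bytes plus an empty archive.
const fakeExe = Buffer.from('MZ... pretend this is an executable ...');
const emptyZip = Buffer.alloc(EOCD_MIN_SIZE); // an empty archive is just an EOCD
emptyZip.writeUInt32LE(EOCD_SIGNATURE, 0);
const combined = Buffer.concat([fakeExe, emptyZip]);

// The record is found right after the executable bytes.
const eocdOffset = findEndOfCentralDirectory(combined);
```

A format whose reader starts from the first byte gets no such second chance, because the executable occupies the front of the file.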

pverheggen 2 hours ago|||

You can always use virtualized Linux to avoid the NTFS penalty (WSL2, VS Code dev containers, etc.)

hrmtst93837 2 hours ago||

Moving your whole workflow into WSL or nested containers just to dodge NTFS is a band-aid. Then you get flaky file watchers, odd perms, and a dev setup that feels like a workaround piled on top of another workaround. A fast Node VFS would remove a lot of this nonsense.

pverheggen 1 hour ago||

Oh it's a workaround for sure, didn't mean to suggest otherwise.

MBCook 4 hours ago|||

It’s insane to me that node works how it does. Zip files make so much more sense, I really liked that about Yarn.

sheept 4 hours ago||

Would it work to run a bundler over your code, so all (static) imports are inlined and tree shaken?

torginus 1 hour ago||

Why do people keep reinventing OS features?

There's Docker, OverlayFS, FUSE, ZFS or Btrfs snapshots?

Do you not trust your OS to do this correctly, or do you think you can do better?

A lot of this stuff existed 5, 10, 15 years ago...

Somehow there's been a trend for every effing program to grow and absorb the features and responsibilities of every other program.

Actually, I have a brilliant idea, what if we used nodejs, and added html display capabilities, and browser features? After all Cursor has already proven you can vibecode a browser, why not just do it?

I'm just tired at this point

williamstein 1 hour ago||

This exact thing solves a huge problem with SEA binaries as he points out in his post. You can include complicated assets easily and skip an ugly unpack step entirely. This is very useful.

ryandrake 1 hour ago||

One of the worst is media players that all insist on grafting their own "library" on top of my already-working OS filesystem. So I can't just run the media player and play files. No, that would be too simple. I have to first "import" my media into a "library" abstraction and then store that library somewhere else on my filesystem. Terrible!

mg 4 hours ago||

    You can’t import or require() a module
    that only exists in memory.

You can convert it into a data url and import that, can't you?
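
Something like this, as a minimal sketch; `importFromMemory` is a made-up helper name and the module source is a toy:

```javascript
// Turn module source held in memory into a data: URL and import it.
function importFromMemory(source) {
  const url = 'data:text/javascript;base64,' + Buffer.from(source).toString('base64');
  return import(url);
}

// Usage: the module never touches the disk.
importFromMemory('export const answer = 6 * 7;').then((mod) => {
  console.log(mod.answer);
});
```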

afavour 3 hours ago||

What happens to relative imports?

doctorpangloss 4 hours ago||

Yeah but Claude didn't suggest that when it wrote this blog post and did all the work so...

austin-cheney 4 hours ago|

Most of the 4 justifications mentioned sound like mitigations of otherwise bad design decisions. JavaScript in the browser went down this path for the longest time where new standards were introduced only to solve for stupid people instead of actually introducing new capabilities that were otherwise unachievable.

I do see some original benefits to a VFS though, bad application decisions aside, but they are exceedingly minor.

As an aside I think JavaScript would benefit from an in-memory database. This would be more of a language enhancement than a Node.js enhancement. Imagine the extended application capabilities of an object/array store native to the language that takes queries using JS logic to return one or more objects/records. No SQL language and no third party databases for stuff that you don't want to keep in offline storage on a disk.

iainmerrick 2 hours ago||

Why would you want a language enhancement for that, rather than just writing it in JS code? (or perhaps WASM)

dotancohen 2 hours ago|||

  > I think JavaScript would benefit from an in-memory database.

That database would probably look a lot like a JSON object. What are you suggesting, that a global JSON object does not solve?

austin-cheney 1 hour ago|||

Whether it is an object, array, something else, or a combination thereof is a design decision. It is not so much about the design of the structure, which should be determined by execution performance considerations, but how information is added, removed and retrieved. Gathering one or more records from a JSON object, or array index, by value of some child property somewhere in a descendant structure of the instance index always feels like a one-off based upon the shape of the data. That could just be a query which is more elegant to read and yet still achieves superior execution performance compared to a bunch of nested loops or string of function array methods.

The more structures you have in a given application and the larger those structures become in their schemas the more valuable a uniform storage and retrieval solution becomes.
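
As a rough userland sketch of the interface I mean, with the caveat that `MemoryStore` and its methods are hypothetical names and that this version has no indexes, so it is still a linear scan underneath:

```javascript
// Hypothetical store: one uniform way to add, query, and remove records.
class MemoryStore {
  #records = [];

  insert(record) {
    this.#records.push(record);
    return this;
  }

  // Query with plain JS logic: any predicate over a record.
  where(predicate) {
    return this.#records.filter(predicate);
  }

  // Remove matching records and report how many were dropped.
  remove(predicate) {
    const before = this.#records.length;
    this.#records = this.#records.filter((record) => !predicate(record));
    return before - this.#records.length;
  }
}

const users = new MemoryStore()
  .insert({ name: 'ada', address: { city: 'London' } })
  .insert({ name: 'linus', address: { city: 'Portland' } });

// Select by a nested property without hand-writing loops at the call site.
const londoners = users.where((user) => user.address.city === 'London');
```

The calling code stays the same shape no matter how the records are nested; a native version could keep that interface and optimize what happens behind it.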

curtisblaine 1 hour ago|||

sorted maps with log(n) access.
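
A minimal sketch of what that could look like, assuming keys that compare with `<`. `SortedMap` is a hypothetical class, and only lookups are logarithmic here; inserts pay for an array splice:

```javascript
// Hypothetical sorted map: keys are kept in order so lookups can binary-search.
class SortedMap {
  #keys = [];
  #values = [];

  // Index of the first key >= target, found in O(log n) steps.
  #lowerBound(key) {
    let lo = 0;
    let hi = this.#keys.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.#keys[mid] < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  set(key, value) {
    const i = this.#lowerBound(key);
    if (this.#keys[i] === key) {
      this.#values[i] = value;
      return this;
    }
    // Array insertion is O(n); a balanced tree would avoid that cost.
    this.#keys.splice(i, 0, key);
    this.#values.splice(i, 0, value);
    return this;
  }

  get(key) {
    const i = this.#lowerBound(key);
    return this.#keys[i] === key ? this.#values[i] : undefined;
  }

  // All values whose keys fall in [from, to], in key order.
  range(from, to) {
    const out = [];
    for (let i = this.#lowerBound(from); i < this.#keys.length && this.#keys[i] <= to; i++) {
      out.push(this.#values[i]);
    }
    return out;
  }
}
```

The range query is the part a plain object does not give you without sorting its keys first.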

duped 2 hours ago||

> As an aside I think JavaScript would benefit from an in-memory database.

isn't that just global state, or do you mean you want that to be persistent?
