Posted by jakobem 19 hours ago

FUSE is All You Need – Giving agents access to anything via filesystems(jakobemmerling.de)
173 points | 59 comments (page 2)
everlier 18 hours ago|
I've implemented an agentic framework exactly like this for my current employer.

It opens up absolutely bonkers capabilities.

disdi89 12 hours ago||
Do you know how it impacts the RAG use case? Do I still need those vector databases if I use this FUSE layer instead?
mickael-kerjean 17 hours ago||
> My prediction is that one of the many sandbox providers will come up with a nice API on top of this that lets you do something like ... No worrying about FUSE, the sandbox, where things are executed, etc. This will be a huge differentiator and make virtual filesystems easily accessible to everyone.

I've done exactly that with Filestash [1] using its virtual filesystem plugin [2], which exposes arbitrary systems as a filesystem. It turns out the filesystem abstraction works extremely well even for systems that are not filesystems at all. There are connectors for literally every possible storage (SFTP, S3, GDrive, Dropbox, FTP, Sharepoint, GCP, Azure Cloud, IPFS, ...), but also things like MySQL and Postgres (where the first-level folders represent the list of databases, the second level is the tables that belong to a database, and each row is represented as a form file generated from the schema), LDAP (where tree nodes are represented as folders and leaves are form files), and so on.
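
To make the database-as-folders mapping concrete, here is a minimal sketch in Python. It is not Filestash's implementation: the function names (`build_demo`, `list_dir`, `read_file`), the `.txt` naming of row files, and the use of in-memory SQLite in place of MySQL/Postgres are all assumptions made for illustration.

```python
import sqlite3


def build_demo():
    """Hypothetical demo data: one 'database' named shop with a users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])
    return {"shop": conn}


def _tables(conn):
    """List table names; used to validate names before building SQL."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


def list_dir(databases, path):
    """Level 0 lists databases, level 1 lists tables, level 2 lists rows."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return sorted(databases)
    if parts[0] not in databases:
        raise FileNotFoundError(path)
    conn = databases[parts[0]]
    if len(parts) == 1:
        return _tables(conn)
    if len(parts) == 2 and parts[1] in _tables(conn):
        rows = conn.execute(f'SELECT rowid FROM "{parts[1]}" ORDER BY rowid')
        return [f"{r[0]}.txt" for r in rows]
    raise FileNotFoundError(path)


def read_file(databases, path):
    """Render one row as 'column: value' lines, a stand-in for a form file."""
    db, table, name = [p for p in path.split("/") if p]
    conn = databases[db]
    if table not in _tables(conn):
        raise FileNotFoundError(path)
    stem = name[: -len(".txt")] if name.endswith(".txt") else name
    cur = conn.execute(f'SELECT * FROM "{table}" WHERE rowid = ?', (int(stem),))
    row = cur.fetchone()
    if row is None:
        raise FileNotFoundError(path)
    columns = [d[0] for d in cur.description]
    return "".join(f"{c}: {v}\n" for c, v in zip(columns, row))
```

A real FUSE or virtual-filesystem layer would wire functions like these into its readdir and read callbacks; the sketch only shows the path-to-query translation.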

The whole filesystem is available to agents via MCP [3] and has been published to the OpenAI marketplace since around Christmas, currently pending review.

ref:

[1]: https://github.com/mickael-kerjean/filestash

[2]: https://www.filestash.app/docs/guide/virtual-filesystem.html

[3]: https://www.filestash.app/docs/guide/mcp-gateway.html https://github.com/mickael-kerjean/filestash/tree/master/ser...

Eikon 18 hours ago||
For ZeroFS [0], I went an alternate route with NFS/9P. I am surprised that it’s not more common, as this approach has various advantages [1] while being much more workable than FUSE.

[0] https://github.com/Barre/ZeroFS

[1] https://github.com/Barre/ZeroFS?tab=readme-ov-file#why-nfs-a...

jakobem 18 hours ago|
Interesting! The network first point makes a lot of sense, especially bc you will most likely not access your actual datastore within the process running in the sandbox and instead just call some server that handles db access, access control etc.
glemmaPaul 7 hours ago||
Wouldn't GraphQL work as well?
ohnoesjmr 16 hours ago||
Why not just MCP? It feels easier to implement and doesn't need a filesystem or root/admin permissions.
dkdcio 16 hours ago|
a few reasons:

- agents tend to need (already have) a filesystem anyway to be useful (not technically required but generally true, they’re already running somewhere with a filesystem)

- LLMs have a ton of CLI/filesystem stuff in their training data, while MCP is still pretty new (FUSE is old and boring)

- MCP tends to bloat context (not necessarily true but generally true)

UNIX philosophy is really compelling (more so than MCP being bad). if you can turn your context into files, agents likely “just work” for your use case
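
A minimal sketch of what "turning context into files" could look like, assuming hypothetical email-like records with `folder`, `id`, `subject` and `body` fields. The helper names `materialize` and `grep` are made up for illustration; an agent would use its ordinary shell tools (`ls`, `cat`, `grep`) rather than this Python search.

```python
from pathlib import Path


def materialize(records, root):
    """Write each record to root/<folder>/<id>.txt as plain text."""
    for rec in records:
        folder = Path(root) / rec["folder"]
        folder.mkdir(parents=True, exist_ok=True)
        text = f"Subject: {rec['subject']}\n\n{rec['body']}\n"
        (folder / f"{rec['id']}.txt").write_text(text, encoding="utf-8")


def grep(root, needle):
    """Case-insensitive substring search over every .txt file under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        if needle.lower() in path.read_text(encoding="utf-8").lower():
            hits.append(path.relative_to(root).as_posix())
    return hits
```

Note that this copies data onto disk up front, whereas a FUSE layer would serve the same paths lazily from the source of truth.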

mbreese 12 hours ago||
I’m sympathetic to this idea, but there is no LLM training data for how to access random data like this using a filesystem through a FUSE interface.

Yes, it should be able to generically use a filesystem, but there has to be a better way to find an email than grepping through each email as a file.

So, I see merit in the idea in theory, I’m just skeptical in practice.

AmazingTurtle 4 hours ago||
Yet another prime example of "All You Need is All You Need". Useless clutter in a headline.
moonlet 18 hours ago|
I am so sick of the ‘sandboxed’ AI-infra meme. A container is not a sandbox. A chroot is not a sandbox. A VM is also not a sandbox. A filesystem is also also not a sandbox. You can sandbox an application, you can run an application in a secure context, but this is not a secure context the author is describing, firstly, and secondly they haven’t described any techniques for sandboxing unless that part of the page didn’t load for me somehow.
jakobem 18 hours ago||
Didn’t mean to say this is a sandbox, it certainly isn’t. This is just an illustration of how to bridge the gap and make things available in a file system from the source of truth of your application.

There is a ton more complexity to sandboxing, I agree!

moonlet 14 hours ago||
No worries! And I definitely appreciate you taking time to write up your work, it’s a good blog.
tptacek 18 hours ago|||
Wait, can you provide the positive definition for "sandbox" you're relying on here?
moonlet 14 hours ago||
To me ‘a sandbox’ is a secured context, which is specific to whatever is in it. It is not a generic thing unless we are literally referring to a real-world box with sand in it, and I’ve kinda hit the breaking point with the term in tech. ‘A sandboxed application’ to me is an instrumented and controlled deployment of an application that can only make the sys/network/ipc calls the deployer expects and appreciates, which are then themselves filtered and monitored. A sandboxed deployment of an application? Sure. That’s a thing to me. But each application needs different privileges and does different things. Sandboxing an application may involve lots of different technologies. Eg the way I think about it, things like seccomp, apparmor, et al also aren’t themselves ‘sandboxes’, they’re enforcement mechanisms which rely on knowing and configuring them to monitor and enforce what the app should and shouldn’t do. A lot of things that assist with sandboxing may also be combined in different ways to get to a more secure environment, in which the app is sandboxed.
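
As a toy model of the "only the calls the deployer expects, filtered and monitored" idea, the sketch below keeps a per-application allowlist and an audit log. It enforces nothing at the OS level; real enforcement would rely on mechanisms like seccomp or AppArmor, and the class name `SandboxPolicy` and the call names are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SandboxPolicy:
    """Per-application allowlist plus a record of every decision."""
    allowed: frozenset
    audit_log: list = field(default_factory=list)

    def check(self, call):
        """Return True if the call is expected; log the decision either way."""
        ok = call in self.allowed
        self.audit_log.append((call, "allow" if ok else "deny"))
        return ok
```

The point of the model is that the policy is specific to one application: a different app would get a different `allowed` set.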
akerl_ 14 hours ago||
You may just be using a personalized definition of that word, that differs from what it means.

https://en.wikipedia.org/wiki/Sandbox_(computer_security)

Notably, a sandbox exists to separate one thing from other things. Limiting/filtering/monitoring what the sandboxed thing can do are often components of that, but the underlying premise is about separation.

Containers, VMs, etc. are 100% examples of sandboxing based on the actual industry definition of the term.

moonlet 14 hours ago|||
I’m saying I don’t think sandbox is a noun, I think it’s a verb. I also don’t get why this is such an issue to you? A container simply is not a sandbox by itself. The collection of technologies that can sandbox can be used to sandbox a container, or an app running in a container, or whatever you want. A door lock isn’t security, a door lock is used to lock your door, which gives you part of a security strategy. Same principle.
saagarjha 4 hours ago|||
A door lock is a lock and you can lock a door lock. A container can be a sandbox and you can use a container to sandbox.
akerl_ 14 hours ago|||
> I’m saying I don’t think sandbox is a noun, I think it’s a verb.

You are incorrect.

moonlet 13 hours ago||
What background or context do you have that you base this claim on?
tptacek 13 hours ago|||
He's obviously right about the noun/verb thing. You can just look this up on Google Scholar. I think you're sort of broadly wrong about how fussy the definition of a "sandbox" is, but you're at least saying something coherent there, even if it's an idiosyncratic definition.
akerl_ 13 hours ago||||
I already gave you a link above with a definition of sandbox, the noun, and a list of example technologies that it applies to.

If you’re going to get fired up about people you feel are misusing this term, and then ignore citations about its actual definition, I think the ball’s in your court to back up your claim.

moonlet 12 hours ago||
I mean… I’m flattered you think I’m making some kind of statement here but there is no claim. I literally stated an opinion I hold in a comment on HN, I didn’t write you a thesis. I then explained the details of that opinion further.

I’ve asked what background leads to your conclusion, because if you have eg written some sandboxing tooling, I’d be curious to give it a look. Always up to learn things, and I am more than a little baffled by how upset the comments I’m replying to here sound. You’ve linked me to Wikipedia, and another commenter asserts I can ‘just look it up on google scholar’. That seems pretty dismissive and reductive overall.

antonvs 9 hours ago|||
[flagged]
eyberg 12 hours ago|||
No they are not. The "industry" totally disagrees with this statement as well.
tptacek 1 hour ago||
This is definitely false. "The industry" calls everything a sandbox.
Imustaskforhelp 15 hours ago|||
I recently had a question about what AI sandboxes use. I think Modal uses gVisor under the hood, and I think others use Firecracker or generally favour it as well.

Firecracker kind of ends up in the VM category, and I would place gVisor in a similar category too.

So in my opinion, VMs are sandboxes.

Of course there is also libriscv https://github.com/libriscv/libriscv which is a sandbox (The fastest RISC-V sandbox)

There is also https://github.com/Zouuup/landrun Run any Linux process in a secure, unprivileged sandbox using Landlock. Think firejail, but lightweight, user-friendly, and baked into the kernel.

Your mileage may vary but I usually consider Firecracker to be the AI sandbox. Other times they may abstract over a cloud provider and spin up servers there or similar (I feel E2B does this on top of GCP).

eyberg 12 hours ago||
A lot of these "ai sandbox" conversations target code that is already running in a public cloud. Running firecracker doesn't give you magical isolation properties vs running an application in ec2 - it's the same boundary. If you're trying to compare to running multi-tenant workloads in containers on the same vm vs different tenants on different vms - sure that's an improvement but no one said you had to run containers to begin with.

Furthermore, running lots of random 3rd party programs in the same instance, be it a container, or an ec2 vm, or a firecracker vm all have the same issues - it is inherently totally unsafe. If you want to "sandbox" something you need to detail what exactly you are wanting to isolate.

A lot of people might suggest not being able to write to the filesystem, read env vars, or talk over the network but these are table stakes for a lot of the workloads that people want to "isolate" to begin with.

So not only is there this incorrect view that you are isolating anything at all, but I'm not convinced that the most important things, like being able to run arbitrary 3rd party programs, are even being considered.

lagniappe 17 hours ago|||
Please brother may i have some pledge unveil
ape4 12 hours ago||
By your definition, a physical child's sandbox isn't a sandbox.