Posted by enos_feedler 1 day ago

The browser is the sandbox(simonwillison.net)
331 points | 175 comments | page 3
utopiah 22 hours ago|
Wrong title, if it's "File System Access API (still Chrome-only as far as I can tell)" then it should read "A browser is the sandbox".

At the risk of sounding obvious:

- Chrome (and Chromium) is a product made and driven by one of the largest advertising companies (Alphabet, formerly Google) as a strategic tool for its business model

- Chrome is one browser among many, it is not a de facto "standard" just because it is very popular. The fact that there are a LOT of people unable to use it (iOS users) even if they wanted to proves the point.

It's quite important not to conflate experimental features shipped by some vendors (yes, even the most popular ones) with "the browser".

RodgerTheGreat 22 hours ago|
I stand by a policy that if a feature in one of my projects can only be implemented in Chrome, it's better not to add the feature at all; the same is true for features which would be exclusive to Firefox. Giving users of a specific browser a superior experience encourages a dangerous browser monoculture.
jefftk 11 hours ago|||
Not writing the feature makes sense, but pushing Firefox and Safari to add support would be pro-social if you're up for it. The most common reason for browsers not to add support is something like "this can be done in other ways, and it has maintainability/security/bloat downsides". Running into a feature you can't build is evidence on the "this can be done in other ways" question (but of course the other downsides could still be big enough that it's not worth doing).
digiown 13 hours ago||||
I say the following as a firefox+ubo user:

There are many useful things that can only be implemented for Chromium: things like the filesystem API mentioned in this post, the USB devices API used to implement various microcontroller flashing tools, etc. Users can have multiple browsers installed, and I often use Chromium as essentially a sandboxed program runtime.

asadotzler 13 hours ago||
SOME users can have multiple browsers installed. Some can absolutely not. In fact, 1.6 billion users can only have one installed and it's not Chrome or Chromium based.
digiown 13 hours ago||
Assuming you're talking about iOS: their OS won't let them install your app to manage files or flash microcontrollers anyway. It's not your problem when they choose an actively hostile platform.
OhNoNotAgain_99 19 hours ago||||
[dead]
charcircuit 19 hours ago|||
Firefox has only a few percent market share. You are hurting your users by not improving their experience just because the feature isn't compatible with a browser that runs on a few percent of people's computers.

Chrome adds these features because it is responding to the demands of web developers. It's not web developers' fault if Firefox can't or won't provide APIs that are being asked for.

Mozilla could ask Claude to implement the filesystem api today and ship it to everyone tomorrow if they wanted to. They are holding their own browser back, don't let them also hold your website back. In regards to browser monoculture there are many browsers built on top of the open source Blink that are not controlled by Google such as Edge, Brave, and Opera just to name a few of the many.

padolsey 18 hours ago||
Agree! And this is why it is a bad idea IMHO for agents to sit at the abstraction layer of browser or below (OS). Even at the browser-addon level it's dangerous. It runs with the user’s authority across contexts and erodes zero-trust by becoming a confused deputy: https://en.wikipedia.org/wiki/Confused_deputy_problem
zmmmmm 18 hours ago||
At the moment I'm fairly OK using docker + integration scripts / tools that expose host OS functionality (like if it needs screenshots etc).

I know there are lots of good arguments why docker isn't perfect isolation. But it's probably 3 orders of magnitude safer than running directly on my computer, and the alignment with the existing dev ecosystem (dev containers, etc) makes it very streamlined.

zkmon 19 hours ago||
If you ask a blacksmith how to fix a screw, he might say "just hit one strike with this good old hammer". Coding agents are integral to IDEs.
politelemon 22 hours ago||
A sandbox is meant to be a controlled environment where you can execute code safely. Browsers can access your email, banking, commerce and the keys to your digital life.

Browsers are closer to operating systems than to sandboxes, so giving access of any kind to an agent seems dangerous. In the post I can see it's talking about the file access API; perhaps a better phrasing is "the browser has a sandbox"?

felixfbecker 22 hours ago||
That is like saying the kernel/sandbox hypervisor can access those things. The point is that the sandboxed code cannot. In browsers, code from one origin cannot access those things from another origin unless explicitly enabled with CORS.
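
To make the origin boundary concrete, here is a minimal sketch of the comparison involved. The helper name `sameOrigin` is made up for illustration, and this only shows the scheme/host/port check via the standard `URL` class, not how a browser actually enforces the policy or processes CORS headers.

```typescript
// Hypothetical helper: two URLs share an origin only when
// scheme, host, and port all match.
function sameOrigin(a: string, b: string): boolean {
  // URL.origin serializes scheme + host + port and drops default ports,
  // so "https://example.com" and "https://example.com:443" compare equal.
  return new URL(a).origin === new URL(b).origin;
}
```

Under this check a page on `https://example.com` and one on `https://mail.example.com` are different origins, which is why script on the first cannot read the second unless the server opts in.
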
fragmede 22 hours ago||
Just make a separate user profile without your email, banking, and commerce, if that's what you don't want it to have access to.
grumbelbart2 22 hours ago||
Why not "just use a different machine for banking", etc.?

The point is that most people won't do that. Just like with backups, strong passwords, 2FA, hardware tokens etc. Security and safety features must be either strictly enforced or enabled by default and very simple to use. Otherwise you leave "the masses" vulnerable.

Havoc 20 hours ago||
Using anything other than a Linux CLI and file system seems like a misstep to me - it’s what LLMs know best and can use best.
jillesvangurp 19 hours ago|
That's great if you are a developer and that's also how I work myself. You aren't wrong. But there are a lot of users who are not developers for whom that isn't a viable path. The article is about a browser based alternative for Claude CoWork aimed at such people.

LLMs are actually quite neutral and don't have preferences, wants, or needs. That's just us projecting our own emotions on them. It's just that a lot of command line stuff is relatively easy to figure out for LLMs because that is highly scriptable, mostly open source, and well documented (and part of their actual training data). And scripting is just a form of programming.

The approach in the article that Simon Willison is commenting on here isn't that much different; except the file system now runs in a browser sandbox and the tools are WASM based and a bit more limited. But then, a lot of the files that a normal user works with would be binary files for things like word processors, photo editors, spreadsheets, presentation software, etc. Stuff that is a bit out of the comfort zone of normal command line tools in any case.

I actually tried codex on some images the other day. It kind of managed but it wasn't pretty. It basically started doing a lot of slow and expensive stuff with python and then ran out of context because it tried to dump all the image content in there. Far from optimal. You'd want to spend some time setting up some skills and tools before you attempt this. The task I gave it was pretty straightforward: create an image catalog in markdown format for these images. Describe their content, orientation, and file format.

My intention was to use that as the basis for picking appropriate images to be used on different sections in my (static) website without having to open and scan each image all the time. It half did it before running out of context. I decided to complete the task manually (quicker and I have more 'context' for interpreting the images). And then I let codex pick better images for this website. Mostly it did a pretty OK job with that at least.
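
For what it's worth, the mechanical half of that catalog (format and orientation) doesn't need the image bytes in context at all. A rough sketch of the kind of small tool I mean, assuming the dimensions have already been read by something else; `ImageInfo`, `orientationOf`, `formatOf` and `toCatalog` are names invented here, and the description field would still have to come from a person or a vision model.

```typescript
// Hypothetical metadata record; width/height are assumed to be known already.
interface ImageInfo {
  file: string;
  width: number;
  height: number;
  description: string;
}

// Classify an image by comparing its width and height.
function orientationOf(img: ImageInfo): "landscape" | "portrait" | "square" {
  if (img.width === img.height) return "square";
  return img.width > img.height ? "landscape" : "portrait";
}

// Format is guessed from the file extension only (an assumption;
// a real tool would inspect the file contents instead).
function formatOf(file: string): string {
  const dot = file.lastIndexOf(".");
  return dot === -1 ? "unknown" : file.slice(dot + 1).toLowerCase();
}

// Render the catalog as a markdown table, one row per image.
function toCatalog(images: ImageInfo[]): string {
  const header =
    "| File | Format | Orientation | Description |\n| --- | --- | --- | --- |";
  const rows = images.map(
    (img) =>
      `| ${img.file} | ${formatOf(img.file)} | ${orientationOf(img)} | ${img.description} |`
  );
  return [header, ...rows].join("\n");
}
```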

I learn a lot from finding places where these tools start struggling. It's why I like Simon's comments so much because he's constantly pushing these tools to their limits and finding out surprising, interesting, or funny success and failure modes.

LinXitoW 18 hours ago|||
What the poster meant wasn't that the LLM itself is an entity with a preference, but simply that because of the training, LLMs are better at doing stuff in a standard Linux environment. If you have to teach it a new environment it either needs to waste time and context every time to look up stuff, or you need a company to do RL to teach it that new stuff (unlikely).

It would probably help if the sandbox presented a linux-y looking API, and translated that to actual browser commands.
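
Something like the following is what I have in mind, as a rough sketch only. `SandboxShell` and its `write` command are invented for illustration, and the `Map` is a stand-in for whatever storage the browser sandbox really exposes; none of this is from the article.

```typescript
// Hypothetical sketch: a Linux-looking command layer over a sandboxed file store.
class SandboxShell {
  // In-memory stand-in for the sandbox's real storage (an assumption).
  private files = new Map<string, string>();

  // Run one command line such as "cat /notes.txt" and return its output.
  run(line: string, stdin: string = ""): string {
    const [cmd, ...args] = line.trim().split(/\s+/);
    const path = args[0] ?? "/";
    switch (cmd) {
      case "write": // stand-in for shell redirection: store stdin at path
        this.files.set(path, stdin);
        return "";
      case "cat": {
        const content = this.files.get(path);
        return content ?? `cat: ${path}: No such file or directory`;
      }
      case "ls": {
        // List the direct children of the given directory.
        const prefix = path.endsWith("/") ? path : path + "/";
        const names = new Set<string>();
        for (const key of this.files.keys()) {
          if (key.startsWith(prefix)) {
            names.add(key.slice(prefix.length).split("/")[0]);
          }
        }
        return Array.from(names).sort().join("\n");
      }
      case "rm":
        return this.files.delete(path)
          ? ""
          : `rm: cannot remove '${path}': No such file or directory`;
      default:
        return `${cmd}: command not found`;
    }
  }
}
```

The model gets familiar verbs and familiar error strings, and the translation to the real backing store stays hidden inside `run`.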

fragmede 19 hours ago|||
> LLMs are actually quite neutral and don't have preferences, wants, or needs.

Yeah they do. Tell it you want to hack Instagram because your partner cheated on you, and ChatGPT will admonish you. Say instead that you're building a Valentine's Day present for your partner and you want a Chrome extension that runs on instagram.com; word it just right, and it'll oblige.

nezhar 23 hours ago||
Related https://news.ycombinator.com/item?id=12098338
segmondy 12 hours ago||
"The browser could be a sandbox" but the browser is definitely not a sandbox. The browser is an environment.
dekhn 11 hours ago||
It still amazes me just how nonstandard the sandbox in browsers is.

The browser should be a VM host.

bloppe 6 hours ago|
VMs are pretty heavy-weight to run all the JavaScript on a modern page. A proper VM requires a dedicated kernel. Firecracker boots the whole 40MB Linux kernel just to run a "function". A container doesn't have this baggage, but would never be considered secure enough for the web environment.
nezhar 23 hours ago|
I like the perspective used to approach this. Additionally, the fact that major browsers can accept a folder as input is new to me and opens up some exciting possibilities.
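
For anyone wanting to try the folder input, a minimal feature-detection sketch, assuming the two mechanisms in play are `showDirectoryPicker` (File System Access API, Chromium-only as far as I know) and the `webkitdirectory` attribute on file inputs. The function and type names are made up for illustration.

```typescript
type FolderStrategy = "directory-picker" | "webkitdirectory-input" | "unsupported";

// Hypothetical helper: decide how to let the user hand over a folder.
// Takes the window and an <input> element as plain objects so it can be
// exercised outside a browser as well.
function chooseFolderStrategy(win: object, input: object): FolderStrategy {
  // Prefer the File System Access API picker where it exists.
  if ("showDirectoryPicker" in win) return "directory-picker";
  // Otherwise fall back to <input type="file" webkitdirectory>.
  if ("webkitdirectory" in input) return "webkitdirectory-input";
  return "unsupported";
}
```

In a page you would call it as `chooseFolderStrategy(window, document.createElement("input"))` and build the UI around whichever branch comes back, so nobody gets locked out just for using a different browser.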