Posted by enos_feedler 1 day ago
At the risk of sounding obvious:
- Chrome (and Chromium) is a product made and driven by one of the largest advertising companies (Alphabet, formerly Google) as a strategic tool for its business model
- Chrome is one browser among many; it is not a de facto "standard" just because it is very popular. The fact that there are a LOT of people unable to use it (iOS users) even if they wanted to proves the point.
It's quite important not to conflate experimental features put in place by some vendors (yes, even the most popular ones) with "the browser".
There are many useful things that can only be implemented for Chromium: things like the filesystem API mentioned in this post, the USB devices API used to implement various microcontroller flashing tools, etc. Users can have multiple browsers installed, and I often use Chromium as essentially a sandboxed program runtime.
Chrome adds these features because it is responding to the demands of web developers. It's not web developers' fault if Firefox can't or refuses to provide APIs that are being asked for.
Mozilla could ask Claude to implement the filesystem API today and ship it to everyone tomorrow if they wanted to. They are holding their own browser back; don't let them also hold your website back. As for browser monoculture, there are many browsers built on top of the open source Blink that are not controlled by Google, such as Edge, Brave, and Opera, to name a few.
I know there are lots of good arguments why Docker isn't perfect isolation. But it's probably 3 orders of magnitude safer than running directly on my computer, and the alignment with the existing dev ecosystem (dev containers, etc.) makes it very streamlined.
Browsers are closer to operating systems than to sandboxes, so giving an agent access of any kind seems dangerous. In the post I can see it's talking about the file access API; perhaps a better phrasing is "the browser has a sandbox"?
The point is that most people won't do that. Just like with backups, strong passwords, 2FA, hardware tokens, etc. Security and safety features must be either strictly enforced or enabled by default and very simple to use. Otherwise you leave "the masses" vulnerable.
LLMs are actually quite neutral and don't have preferences, wants, or needs. That's just us projecting our own emotions onto them. It's just that a lot of command line stuff is relatively easy for LLMs to figure out because it is highly scriptable, mostly open source, and well documented (and part of their actual training data). And scripting is just a form of programming.
The approach in the article that Simon Willison is commenting on here isn't that different, except that the file system now runs in a browser sandbox and the tools are WASM-based and a bit more limited. But then, a lot of the files that a normal user works with would be binary files for things like word processors, photo editors, spreadsheets, presentation software, etc. Stuff that is a bit out of the comfort zone of normal command line tools in any case.
I actually tried Codex on some images the other day. It kind of managed but it wasn't pretty. It basically started doing a lot of slow and expensive stuff with Python and then ran out of context because it tried to dump all the image content in there. Far from optimal. You'd want to spend some time setting up some skills and tools before you attempt this. The task I gave it was pretty straightforward: create an image catalog in markdown format for these images. Describe their content, orientation, and file format.
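For what it's worth, the format and orientation columns of a catalog like that don't need the image content in context at all. A minimal sketch, assuming only PNG and GIF files (JPEG needs a marker scan that is left out here): it reads the first 32 bytes of each file and leaves the description column empty for a human or a vision model to fill in. The names `probe`, `orientation`, and `catalog` are made up for illustration, not part of any tool mentioned in the thread.

```python
import struct
from pathlib import Path

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def probe(header: bytes):
    """Return (format, width, height) from a file's leading bytes, or None."""
    # PNG: 8-byte signature, 4-byte chunk length, "IHDR", then big-endian width/height.
    if len(header) >= 24 and header[:8] == PNG_MAGIC and header[12:16] == b"IHDR":
        width, height = struct.unpack(">II", header[16:24])
        return "PNG", width, height
    # GIF: 6-byte signature, then little-endian 16-bit width/height.
    if len(header) >= 10 and header[:6] in (b"GIF87a", b"GIF89a"):
        width, height = struct.unpack("<HH", header[6:10])
        return "GIF", width, height
    return None  # unsupported or not an image; skipped by catalog()


def orientation(width: int, height: int) -> str:
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"


def catalog(folder: str) -> str:
    """Build a markdown table for the PNG/GIF files directly inside folder."""
    rows = [
        "| File | Format | Size | Orientation | Description |",
        "| --- | --- | --- | --- | --- |",
    ]
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            info = probe(fh.read(32))  # header only, never the pixel data
        if info is None:
            continue
        fmt, width, height = info
        # Description is left blank on purpose; it needs eyes or a vision model.
        rows.append(
            f"| {path.name} | {fmt} | {width}x{height} | {orientation(width, height)} | |"
        )
    return "\n".join(rows)
```

Calling `print(catalog("images"))` on a hypothetical `images` folder would give a table the agent can read cheaply, so the expensive part (describing content) can be done one image at a time.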
My intention was to use that as the basis for picking appropriate images to be used in different sections of my (static) website without having to open and scan each image all the time. It half did it before running out of context. I decided to complete the task manually (quicker and I have more 'context' for interpreting the images). And then I let Codex pick better images for this website. Mostly it did a pretty OK job with that at least.
I learn a lot from finding places where these tools start struggling. It's why I like Simon's comments so much: he's constantly pushing these tools to their limits and finding out surprising, interesting, or funny success and failure modes.
It would probably help if the sandbox presented a linux-y looking API, and translated that to actual browser commands.
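Roughly what that translation layer could look like, as a minimal sketch: a tiny dispatcher that accepts `ls` and `cat` style command lines and forwards them to a storage backend. Everything here is hypothetical. `InMemoryBackend` stands in for whatever the browser sandbox actually exposes (in a real page it would wrap the browser's file handles), and Python is used only for brevity.

```python
import shlex


class InMemoryBackend:
    """Hypothetical stand-in for the sandbox's storage; maps full paths to text."""

    def __init__(self, files):
        self.files = dict(files)

    def list_names(self, directory):
        # Direct children only: strip the prefix, keep the first path segment.
        prefix = directory.rstrip("/") + "/"
        names = {
            path[len(prefix):].split("/", 1)[0]
            for path in self.files
            if path.startswith(prefix)
        }
        return sorted(names)

    def read_text(self, path):
        return self.files[path]  # raises KeyError if the file is missing


class Shell:
    """Translates Linux-looking command lines into backend calls."""

    def __init__(self, backend):
        self.backend = backend

    def run(self, line):
        args = shlex.split(line)
        if not args:
            return ""
        handler = getattr(self, "cmd_" + args[0], None)
        if handler is None:
            return f"{args[0]}: command not found"
        return handler(args[1:])

    def cmd_ls(self, args):
        directory = args[0] if args else "/"
        return "\n".join(self.backend.list_names(directory))

    def cmd_cat(self, args):
        chunks = []
        for path in args:
            try:
                chunks.append(self.backend.read_text(path))
            except KeyError:
                return f"cat: {path}: No such file or directory"
        return "".join(chunks)


shell = Shell(InMemoryBackend({"/readme.md": "hello", "/notes/a.txt": "A"}))
listing = shell.run("ls /")
```

The error strings copy the shape of the usual shell messages on purpose, since the idea is that the model already knows how to react to those.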
Yeah they do. Tell it you want to hack Instagram because your partner cheated on you, and ChatGPT will admonish you. Say that you're building a Valentine's Day present for your partner and you want a Chrome extension that runs on instagram.com; word it just right, and it'll oblige.
The browser should be a VM host.