Posted by enos_feedler 22 hours ago
I would like to humbly propose that we simply provision another computer for the agent to use.
I don't know why this needs to be complicated. A nano EC2 instance is like $5/m. I suspect many of us currently have the means to do this on prem without resorting to virtualization.
Could this be used with arbitrary local tools as well? I could be missing something but I don't see how you could use a non remote MCP server with this setup.
So if you can work within the constraints there are a lot of benefits you get as a platform: latency goes down a lot, performance may go up depending on user hardware (usually more powerful than the type of VM you'd use for this), bandwidth can go down significantly if you design this right, and your uptime and costs as a platform will improve if you don't need to make sure you can run thousands of VMs at once (or pay a premium for a platform that does it for you)[1]
All that said I'm not sure trying to put an entire OS or something like WebContainers in the user's browser is the way, I think you need to build a slightly custom runtime for this type of local agentic environment. But I'm convinced it's the best way to get the smoothest user experience and smoothest platform growth. We did this at Framer to be able to recompile any part of a website into React code at 60+ frames per second, which meant less tricks necessary to make the platform both feel snappy and be able to publish in a second.
[1] For big model providers like OpenAI and Anthropic there's an interesting edge they have in that they run a tremendous amount of GPU-heavy loads and have a lot of CPUs available for this purpose.
Browsers as agent environment opens up a ton of exciting possibilities. For example, agents now have an instant way to offer UIs based on tech governed by standards(HTML/CSS) instead of platform specific UI bindings. A way to run third party code safely in wasm containers. A way to store information in disk with enough confidence that it won't explode the user's disk drive. All this basically for free.
My bet is that eventually we'll end up with a powerful agentic tool that uses the browser environment to plan and execute personal agents or to deploy business agents that doesn't access system resources any more than browsers do at the moment.
Generated via ChatGPT, this canvas shows a basic pyramid and has sliders that you can use to change the pyramid, and download the glTF to your local machine. You can also click the edit w/ ChatGPT and tweak the UI however you're able to prompt it into doing.
https://chatgpt.com/canvas/shared/697743f616d4819184aef28e70...
It's easily explained by the fact that all the javascript code is exposed in a browser and all the network connections are trivially inspectable and blockable. It's much harder to collect data and do shady things with that level of inspectability. And it's much harder to ban alternative clients for the main paid offer. Especially if AI companies want to leave the door open to pushing ads to your conversations.
- https://vermaden.wordpress.com/2021/12/15/secure-containeriz...
Browser sandboxes are swiss cheese. In 2024 alone, Google reported 75 zero-day exploits that break out of their browser's sandbox.
Browsers are the worst security paradigm. They have tens of millions of lines of code, far more than operating system kernels. The more lines of code, the more bugs. They include features you don't need, with no easy way to disable them or opt-in on a case-by-case basis. The more features, the more an attacker can chain them into a usable attack. It's a smorgasbord of attack surface. The ease with which the sandbox gets defeated every year is proof.
So why is everyone always using browsers, anyway? Because they mutated into an application platform that's easy to use and easy to deploy. But it's a dysfunctional one. You can't download and verify the application via signature, like every other OS's application platform. There's no published, vetted list of needed permissions. The "stack" consists of a mess of RPC calls to random remote hosts, often hundreds if not thousands required to render a single page. If any one of them gets compromised, or is just misconfigured, in any number of ways, so does the entire browser and everything it touches. Oh, and all the security is tied up in 350 different organizations (CAs) around the world, which if any are compromised, there goes all the security. But don't worry, Google and Apple are hard at work to control them (which they can do, because they control the application platform) to give them more control over us.
This isn't secure, and there's really no way to secure it. And Google knows that. But it's the instrument making them hundreds of billions of dollars.
That's enough reason for me to say, f** no. Google will try as hard as possible to promote this even if it's not technically the best solution.
Something more like a TEE inside the browser of sorts. Not sure if there is anything like this.
But then you'd also want the frame content to use `integrity` on nested resoures.
CSP frame-src can help for now.