The next one would be to also decouple the visual part of a website from the data/interactions: let users tell their in-browser agent how to render, or even offer different views on the same data. (And possibly also what to render: your LLM could then work as an in-website adblocker, for example, similar to browser extensions like LinkedIn/Facebook feed blockers.)
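A minimal sketch of that split, assuming a hypothetical registration hook like `navigator.modelContext.registerTool` (the names here are illustrative, not spec; `window.__feedItems` is a made-up page-internal store): the page hands over structured feed data and leaves rendering, including what to skip, to the agent.

```js
// Sketch only: navigator.modelContext.registerTool stands in for
// whatever registration hook the final WebMCP spec lands on.
navigator.modelContext?.registerTool({
  name: "get_feed",
  description: "Return the feed as structured data; rendering is left to the agent",
  inputSchema: { type: "object", properties: {} },
  async execute() {
    // window.__feedItems is a hypothetical page-internal store.
    // The agent decides how (and what) to render, e.g. dropping
    // items flagged as sponsored: the in-website adblocker case.
    const items = window.__feedItems ?? [];
    return { content: [{ type: "text", text: JSON.stringify(items) }] };
  },
});
```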
I really like the way you can expose your schema by adding fields to a web form; that feels like a really nice extension and a great way to piggyback on your existing logic.
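Roughly, the piggybacking could look like this (a sketch, using the same hypothetical `registerTool` hook as above; the `#signup` form id is made up): the tool's input schema is derived from the form's own fields, and execution goes through the form's existing submit handling.

```js
// Sketch: derive a tool's input schema from an existing form's fields,
// so the tool reuses validation/submit logic already on the page.
function schemaFromForm(form) {
  const properties = {};
  for (const el of form.elements) {
    if (!el.name) continue;
    properties[el.name] = {
      type: el.type === "number" ? "number" : "string",
      description: el.labels?.[0]?.textContent?.trim() ?? el.name,
    };
  }
  return { type: "object", properties };
}

const form = document.querySelector("#signup"); // hypothetical form
navigator.modelContext?.registerTool({          // hypothetical hook, as above
  name: "submit_signup",
  description: "Fill and submit the signup form",
  inputSchema: schemaFromForm(form),
  async execute(args) {
    for (const [name, value] of Object.entries(args)) form.elements[name].value = value;
    form.requestSubmit(); // goes through the page's own submit handling
    return { content: [{ type: "text", text: "submitted" }] };
  },
});
```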
To me this seems much more promising than either needing an MCP server or the MCP Apps proposal.
It's great they are working on standardizing this so websites don't have to build one-off integrations for each LLM. The real opportunity seems to be automatically generating the tool calls / MCP schema by inspecting the website offline - I automated this using Playwright MCP.
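The commenter drove this through Playwright MCP; a rough offline equivalent with the plain Playwright library might look like the sketch below. It crawls a page, turns each form it finds into a draft MCP tool definition, and prints them. The heuristics (form id as tool name, everything typed as string) are deliberately crude and would need human review.

```js
// Sketch: emit draft MCP tool definitions from a page's forms.
const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com"); // placeholder URL

  const tools = await page.$$eval("form", (forms) =>
    forms.map((form, i) => ({
      name: form.id || `form_${i}`,
      description: form.getAttribute("aria-label") ?? "auto-extracted form",
      inputSchema: {
        type: "object",
        properties: Object.fromEntries(
          [...form.elements]
            .filter((el) => el.name)
            .map((el) => [el.name, { type: "string" }])
        ),
      },
    }))
  );

  console.log(JSON.stringify(tools, null, 2));
  await browser.close();
})();
```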
Every generation needs its own acronyms and specifications. If a new one looks like an old one, the old one was likely ahead of its time.
Instead of parsing or screenshotting the current page to understand the context, an AI agent running in the browser can query the page's tools to extract data or execute actions without dealing with API authentication.
It's a pragmatic solution. In theory, an AI agent can use the accessibility DOM (or some HTML data annotation) to make sense of the page; however, that doesn't give it straightforward information about the actions it can take on the current page.
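To make the contrast concrete, the agent side might look something like this. Everything here is hypothetical (`agent.page.listTools` / `callTool` and the `add_to_cart` tool are invented for illustration); the point is that the agent asks the page what actions exist instead of inferring them from the DOM or a screenshot.

```js
// Sketch of the agent side, with hypothetical listTools/callTool methods.
const tools = await agent.page.listTools();
// e.g. [{ name: "add_to_cart", inputSchema: {...} }, ...]

// No API keys involved: the call runs inside the page, under the user's
// existing session and the browser's same-origin rules.
const result = await agent.page.callTool("add_to_cart", { sku: "A123", qty: 1 });
```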
I see two major roadblocks with this idea:
1. Security: Who has access to these MCPs? This makes it easier for browser plugins to act on your behalf, but end users often don't understand the scope of what granting a plugin access to their pages allows (a minimal consent-gate sketch follows this list).
2. Incentive: Exposing these tools makes accessing website data extremely easy for AI agents. While that's great for end users, many businesses will be reluctant to spend time implementing it (that's the same reason social networks and media websites killed RSS... more flexibility for end users, but not aligned with their business incentives)
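On the security point, one obvious (if blunt) mitigation is to gate every tool call behind explicit user confirmation. A sketch, again using the hypothetical `registerTool` hook from above; the `clear_cart` tool is made up for illustration:

```js
// Sketch: wrap a tool so each invocation needs an explicit user OK.
function withConsent(tool) {
  return {
    ...tool,
    async execute(args) {
      const ok = window.confirm(
        `Allow the agent to run "${tool.name}" with ${JSON.stringify(args)}?`
      );
      if (!ok) return { content: [{ type: "text", text: "denied by user" }] };
      return tool.execute(args);
    },
  };
}

const clearCartTool = {
  name: "clear_cart",
  description: "Remove every item from the cart",
  inputSchema: { type: "object", properties: {} },
  async execute() {
    // ...page-specific logic would go here
    return { content: [{ type: "text", text: "cart cleared" }] };
  },
};

navigator.modelContext?.registerTool(withConsent(clearCartTool));
```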
But I'd happily add a little MCP server to it in JS, if that means someone else can point their LLM at it and be taught how to play sudoku.
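For the sudoku case, the page-side tool could be tiny. A sketch, with the same hypothetical `registerTool` hook as above and `game.tryPlace` standing in for whatever move logic the page already has:

```js
// Sketch: expose the sudoku page's moves as a single tool.
navigator.modelContext?.registerTool({
  name: "place_number",
  description: "Place a digit on the sudoku board",
  inputSchema: {
    type: "object",
    properties: {
      row:   { type: "integer", minimum: 0, maximum: 8 },
      col:   { type: "integer", minimum: 0, maximum: 8 },
      value: { type: "integer", minimum: 1, maximum: 9 },
    },
    required: ["row", "col", "value"],
  },
  async execute({ row, col, value }) {
    const ok = game.tryPlace(row, col, value); // hypothetical existing game logic
    return { content: [{ type: "text", text: ok ? "placed" : "illegal move" }] };
  },
});
```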
The browser has tons of functionality baked in, everything from web workers to persistence.
This would also allow for interesting ways of authenticating/manipulating data on existing sites. Say I'm logged into image-website-x: I can then use WebMCP to let agents interact with the images I've stored there. The WebMCP tools become a much more intuitive interface than having the agent interpret DOM elements.
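A sketch of the image-website-x idea, leaning on the browser functionality mentioned above: the tool rides the session the user already has (cookies travel with `credentials: "include"`) and caches results with the browser's built-in persistence. The endpoint and tool name are made up, and `registerTool` is the same hypothetical hook as before.

```js
// Sketch: list the logged-in user's images via the existing session.
navigator.modelContext?.registerTool({
  name: "list_my_images",
  description: "List images stored in the logged-in user's account",
  inputSchema: { type: "object", properties: {} },
  async execute() {
    // Hypothetical endpoint; the browser attaches the session cookies,
    // so no separate API authentication is needed.
    const res = await fetch("/api/my/images", { credentials: "include" });
    const images = await res.json();
    localStorage.setItem("imageCache", JSON.stringify(images)); // persistence for free
    return { content: [{ type: "text", text: JSON.stringify(images) }] };
  },
});
```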
Instead of having the agent call out to a remote server (MCP), the page ships JavaScript tool handlers that the agent invokes right in the browser (WebMCP).