
Posted by Alifatisk 7 hours ago

WebMCP Proposal (webmachinelearning.github.io)
117 points | 62 comments
Garlef 5 hours ago|
I think this is a good idea.

The next one would be to also decouple the visual part of a website from the data/interactions: Let the users tell their in-browser agent how to render - or even offer different views on the same data. (And possibly also WHAT to render: So your LLM could work as an in-website adblocker for example; Similar to browser extensions such as a LinkedIn/Facebook feed blocker)

Raed667 4 hours ago|
Why would Facebook or LinkedIn ever give you this?
mcintyre1994 6 hours ago||
Wes Bos has a pretty cool demo of this: https://www.youtube.com/watch?v=sOPhVSeimtI

I really like the way you can expose your schema through adding fields to a web form, that feels like a really nice extension and a great way to piggyback on your existing logic.

To me this seems much more promising than either needing an MCP server or the MCP Apps proposal.
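A rough sketch of the form-based idea from the demo: derive a tool's input schema from an existing form's fields, so the agent-facing tool piggybacks on the page's own validation and submit path. The `navigator.modelContext.registerTool` shape and the `checkout-form` id are assumptions; the WebMCP API surface is still a draft and may differ from this.

```javascript
// Build a JSON-Schema-style input schema from a form's named fields.
// (Pure function, so it works on any object exposing an `elements` array.)
function schemaFromForm(form) {
  const properties = {};
  for (const el of form.elements) {
    if (!el.name) continue;
    properties[el.name] = {
      type: el.type === 'number' ? 'number' : 'string',
      description: el.labels?.[0]?.textContent ?? el.name,
    };
  }
  return { type: 'object', properties };
}

// Hypothetical registration, guarded so the page still works without WebMCP.
if (typeof navigator !== 'undefined' && navigator.modelContext?.registerTool) {
  const form = document.getElementById('checkout-form');
  navigator.modelContext.registerTool({
    name: 'submit_checkout',
    description: 'Fill and submit the checkout form on this page',
    inputSchema: schemaFromForm(form),
    async execute(args) {
      for (const [name, value] of Object.entries(args)) {
        form.elements[name].value = value;
      }
      form.requestSubmit(); // reuses the site's existing validation/submit logic
      return { content: [{ type: 'text', text: 'submitted' }] };
    },
  });
}
```

The appeal is that the schema and the behavior both come from markup the site already maintains, so there's no second code path to keep in sync.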

innagadadavida 5 hours ago|
Demo I built 5 months ago: https://www.youtube.com/watch?v=02O2OaNsLIk This exposes ecommerce-specific tool calls as regular JavaScript functions, as it is more lightweight than going the MCP route.

It's great they are working on standardizing this so websites don't have to integrate with LLMs. The real opportunity seems to be the ability to automatically generate the tool calls / MCP schema by inspecting the website offline; I automated this using Playwright MCP.
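The "regular JavaScript functions" approach can be sketched as a plain registry that doubles as a tool manifest; the function names (`addToCart`, `getCart`) are hypothetical stand-ins, not from the demo:

```javascript
// In-page tool registry: plain functions with enough metadata to later
// advertise them in an MCP-style tool list, without a separate server.
const cart = [];

const tools = {
  addToCart: {
    description: 'Add a product to the cart by SKU',
    params: { sku: 'string', qty: 'number' },
    fn: ({ sku, qty }) => { cart.push({ sku, qty }); return cart.length; },
  },
  getCart: {
    description: 'Return the current cart contents',
    params: {},
    fn: () => cart,
  },
};

// Turn the registry into MCP-shaped tool descriptors (name, description,
// inputSchema), so the same functions could be exposed via WebMCP later.
function toolManifest(registry) {
  return Object.entries(registry).map(([name, t]) => ({
    name,
    description: t.description,
    inputSchema: {
      type: 'object',
      properties: Object.fromEntries(
        Object.entries(t.params).map(([p, ty]) => [p, { type: ty }])
      ),
    },
  }));
}
```

The lightweight part is that the functions are callable directly by page code today, and the manifest is just a derived view for agents.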

jayd16 6 hours ago||
Have any sickos tried to point AI at SOAP APIs with WSDL definitions, yet?
chopete3 6 hours ago|
Likely no.

Every generation needs its own acronyms and specifications. If a new one looks like an old one likely the old one was ahead of its time.

DevKoala 3 hours ago||
Most teams that want their data to be operated on programmatically expose an API. For whom does this solve a problem?
diegof79 2 hours ago|
Mainly for web browser plugin authors implementing AI assistants (Gemini/Claude/OpenAI/Copilot).

Instead of parsing or screenshotting the current page to understand the context, an AI agent running in the browser can query the page's tools to extract data or execute actions without dealing with API authentication.

It's a pragmatic solution. An AI agent can, in theory, use the accessibility DOM (or some HTML data annotation) to improve access to the page; however, that doesn't give the agent straightforward information about the actions it can take on the current page.

I see two major roadblocks with this idea:

1. Security: Who has access to these MCPs? This makes it easier for browser plugins to act on your behalf, but end users often don't understand the scope of granting plugins access to their pages.

2. Incentive: Exposing these tools makes accessing website data extremely easy for AI agents. While that's great for end users, many businesses will be reluctant to spend time implementing it (that's the same reason social networks and media websites killed RSS... more flexibility for end users, but not aligned with their business incentives)

DevKoala 27 minutes ago||
But think about it. Will you do it for your web property? Is someone else going to do it for my web property when I have clearly blocked robots? Will I do it for another web property for my agent to work and hope they don’t update their design or protect themselves against it?
datadrivenangel 5 hours ago||
The problem with agents browsing the web is that most interesting things on the web are either information or actions. For mostly static information (resources that change on the scale of days), the format doesn't matter, so MCP is pointless; and for actions, the owner of the system will likely want to run the MCP server as an external API. So this is cool, but it doesn't have much room.
OtherShrezzing 4 hours ago|
I disagree. I run a sudoku site. It’s completely static, and it gets a few tens of thousands of hits per day, as users only download the js bundle & a tiny html page. It costs me a rounding error on my monthly hosting to keep it running. To add an api or hosted mcp server to this app would massively overcomplicate it, double the hosting costs (at least), and create a needless attack surface.

But I’d happily add a little mcp server to it in js, if that means someone else can point their LLM at it and be taught how to play sudoku.
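For a static site like that, the whole "server" can be one dependency-free function in the page bundle. A sketch, assuming a hypothetical `check_placement` tool name and pending the final WebMCP registration API:

```javascript
// Core sudoku rule check: is `val` legal at (row, col) in a 9x9 grid
// where 0 means an empty cell? Pure function, no server required.
function isValidPlacement(grid, row, col, val) {
  for (let i = 0; i < 9; i++) {
    if (grid[row][i] === val || grid[i][col] === val) return false;
  }
  const br = Math.floor(row / 3) * 3;
  const bc = Math.floor(col / 3) * 3;
  for (let r = br; r < br + 3; r++) {
    for (let c = bc; c < bc + 3; c++) {
      if (grid[r][c] === val) return false;
    }
  }
  return true;
}

// Hypothetical in-page registration, guarded so non-agent browsers ignore it.
if (typeof navigator !== 'undefined' && navigator.modelContext?.registerTool) {
  navigator.modelContext.registerTool({
    name: 'check_placement',
    description: 'Check whether a digit can legally be placed at a cell',
    inputSchema: {
      type: 'object',
      properties: {
        grid: { type: 'array' },
        row: { type: 'number' },
        col: { type: 'number' },
        val: { type: 'number' },
      },
    },
    async execute({ grid, row, col, val }) {
      const ok = isValidPlacement(grid, row, col, val);
      return { content: [{ type: 'text', text: ok ? 'valid' : 'invalid' }] };
    },
  });
}
```

No extra hosting cost, no new attack surface beyond the JS already shipped, which is the trade-off the comment is pointing at.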

baalimago 6 hours ago||
Very cool! I imagine it'll be possible to start a static webserver + WebMCP app, then use the browser as the virtualization layer instead of npm/uvx.

The browser has tons of functionality baked in, everything from web workers to persistence.

This would also allow for interesting ways of authenticating/manipulating data from existing sites. Say I'm logged into image-website-x. I can then use the WebMCP to allow agents to interact with the images I've stored there. The WebMCP becomes a much more intuitive interface than interpreting the DOM elements.

dvt 5 hours ago||
I’m working on a DOM agent and I think MCP is overkill. You have a few “layers” you can infer by just executing some simple JS (eg: visible text, clickable surfaces, forms, etc). 90% of the time, the agent can infer the full functionality, except for the obvious edge cases (which trip up even humans): infinite scrolling, hijacked navigation, etc.
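The "layers" idea can be sketched as a small classifier over element descriptors; the tag/role sets here are illustrative, and a real DOM agent would build the descriptor list by walking `document.querySelectorAll('*')`:

```javascript
// Classify an element descriptor ({ tag, role, onclick, text }) into one of
// the surfaces a DOM agent cares about: clickable, form input, or visible text.
const CLICKABLE_TAGS = new Set(['a', 'button']);
const FORM_TAGS = new Set(['input', 'select', 'textarea']);

function classify(el) {
  const tag = el.tag.toLowerCase();
  if (CLICKABLE_TAGS.has(tag) || el.role === 'button' || el.onclick) {
    return 'clickable';
  }
  if (FORM_TAGS.has(tag)) return 'form';
  return el.text?.trim() ? 'text' : 'other';
}

// One pass groups a page snapshot into the layers the agent reasons over.
function layers(elements) {
  const out = { clickable: [], form: [], text: [], other: [] };
  for (const el of elements) out[classify(el)].push(el);
  return out;
}
```

This is the sense in which a couple of DOM passes can substitute for an explicit tool list: the layers are cheap to compute even when ids and classes are mangled.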
Garlef 5 hours ago||
Question: Are you writing this under the assumption that the proposed WebMCP is for navigating websites? If so: It is not. From what I've gathered, this is an alternative to providing an MCP server.

Instead of letting the agent call a server (MCP), the agent downloads javascript and executes it itself (WebMCP).

0x696C6961 5 hours ago|||
In what world is this simpler than just giving the agent a list of functions it can call?
Mic92 5 hours ago|||
MCP tool calls are usually sequential and therefore waste a lot of tokens. There is some research from Anthropic (I think there was also a blog post from Cloudflare) on how code sandboxes are actually a more efficient interface for LLM agents, because they are really good at writing code and at combining multiple "calls" into one piece of code. Another data point: code is more deterministic and reliable, so you reduce LLM hallucination.
foota 5 hours ago||
What do the calls being sequential have to do with tokens? Do you just mean that the LLM has to think every time it gets a response (as opposed to being able to compose them)?
zozbot234 5 hours ago||
LLMs can use CLI interfaces to compose multiple tool calls, filter the outputs etc. instead of polluting their own context with a full response they know they won't care about. Command line access ends up being cleaner than the usual MCP-and-tool-calls workflow. It's not just Anthropic, the Moltbot folks found this to be the case too.
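The composition point can be made concrete. With sequential tool calls, every intermediate response re-enters the model's context; with a composed script, the model writes the chaining logic once and only the final result comes back. `searchOrders` and `getOrder` below are hypothetical stand-ins for MCP tools:

```javascript
// Stand-ins for two MCP tools (in reality these would be RPC round trips
// whose full responses would land in the model's context).
async function searchOrders(status) {
  return [{ id: 1, status }, { id: 2, status }];
}
async function getOrder(id) {
  return { id, total: id * 10 };
}

// Composed version: one script chains both tools, fans out the lookups,
// and reduces locally, so only a single number reaches the model.
async function totalOfOpenOrders() {
  const open = await searchOrders('open');
  const orders = await Promise.all(open.map((o) => getOrder(o.id)));
  return orders.reduce((sum, o) => sum + o.total, 0);
}
```

Sequentially, the same task costs three tool-call turns plus two full order payloads in context; composed, it costs one turn and one integer.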
foota 4 hours ago||
That makes sense! The only flaw here imo is that sometimes that thinking is useful. Sub-agents for tool calls imo make a nice sort of middle ground where they can both be flexible and save context. Maybe we need some tool call composing feature, a la io_uring :)
dvt 5 hours ago|||
Who implements those functions? E.g., store.order has to have its logic somewhere.
Mic92 5 hours ago||
Do you expose the accessibility tree of a website to LLMs? What do you do with websites that lack one? Some agents I saw use screenshots, but that seems also kind of wasteful. Something in between would be interesting.
dvt 5 hours ago||
I actually do use cross-platform accessibility shenanigans, but for websites this is rarely as good as just doing like two passes on the DOM; it even figures out hard stuff like Google search (where ids/classes are mangled).
cedws 3 hours ago||
You could get rid of the need for the browser completely just by publishing an OpenAPI spec for the API your frontend calls. Why introduce this and add a massive dependency on a browser with a JavaScript engine and all the security nightmares that come with it?
curtisblaine 2 hours ago|
Because the nightmares associated with having an API, authentication, database, persistent server etc. are worse. If all you have is an SPA you shouldn't be forced to set up an API just to be called by an LLM.
kekqqq 6 hours ago||
Finally, I was hoping for this to be implemented in 2026. Rendered DOM is for humans, not for agents.
TZubiri 4 hours ago|
What problem does this solve?