Posted by theredsix 5 hours ago
ABP is designed to keep the acting agent synchronized with the browser at every step. After each action (click, type, etc.), it freezes JavaScript execution and rendering, then captures the resulting state. It also compiles the notable events that occurred during that action loop, such as navigation, file pickers, permission prompts, alerts, and downloads, and sends them back to the agent along with a screenshot of the frozen page state.
The result is that browser interaction starts to feel more like a multimodal chat loop. The agent takes an action, gets back a fresh visual state and a structured summary of what happened, then decides what to do next from there. That fits much better with how LLMs already work.
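To make the loop concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `StepResult`, `FakeBrowser`, `run_agent`, and the event dictionaries are made-up names, not ABP's actual API or wire format. It only shows the shape of the cycle: resume, act, freeze, then hand back a screenshot plus the events from that step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StepResult:
    """What the agent sees after one action (hypothetical shape)."""
    screenshot: bytes
    events: List[dict] = field(default_factory=list)


class FakeBrowser:
    """Stand-in for a browser that stays frozen between actions (assumption)."""

    def __init__(self) -> None:
        self.frozen = True
        self.url = "about:blank"
        self._pending: List[dict] = []

    def act(self, action: dict) -> StepResult:
        self.frozen = False  # resume JS and rendering only while the action runs
        if action["type"] == "navigate":
            self.url = action["url"]
            self._pending.append({"kind": "navigation", "url": self.url})
        elif action["type"] == "click" and action.get("target") == "download-link":
            self._pending.append({"kind": "download", "state": "completed"})
        self.frozen = True  # freeze again before capturing state
        events, self._pending = self._pending, []
        return StepResult(screenshot=f"frame@{self.url}".encode(), events=events)


def run_agent(
    browser: FakeBrowser,
    policy: Callable[[StepResult], Optional[dict]],
    max_steps: int = 10,
) -> List[StepResult]:
    """Chat-style loop: the policy sees the last result and picks the next action."""
    history: List[StepResult] = []
    result = StepResult(screenshot=b"")
    for _ in range(max_steps):
        action = policy(result)
        if action is None:  # the policy decides it is done
            break
        result = browser.act(action)
        history.append(result)
    return history
```

In a real setup the policy would be the LLM call, and the page cannot change between the screenshot and the next action because the browser is frozen in between.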
A few common browser-use failures ABP helps eliminate:
* A modal appears after the last Playwright screenshot and blocks the input the agent was about to use
* Dynamic filters cause the page to reflow between steps
* An autocomplete dropdown opens and covers the element the agent intended to click
* alert() / confirm() interrupts the flow
* Downloads are triggered, but the agent has no reliable way to know when they’ve completed
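As a rough illustration of why a per-step event summary helps with the dialog and download cases, here is a small Python sketch of how an agent harness might react to it. The event kinds, field names, and action dictionaries are assumptions made up for this example, not ABP's real schema.

```python
from typing import List


def next_move(events: List[dict], planned_action: dict) -> dict:
    """Adjust the planned action based on events from the previous step.

    All event and action shapes here are hypothetical.
    """
    for event in events:
        kind = event.get("kind")
        if kind in ("alert", "confirm"):
            # A dialog is blocking the page: deal with it before anything else.
            return {"type": "dialog", "accept": True}
        if kind == "download" and event.get("state") != "completed":
            # The download has started but not finished: wait instead of guessing.
            return {"type": "wait"}
    # Nothing is blocking, so carry on with what the agent intended to do.
    return planned_action
```

The point is that the agent does not have to infer these situations from pixels alone; they arrive as structured data alongside the screenshot.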
As proof, ABP with Opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark. I think modern LLMs already understand websites; they just need a better tool to interact with them. Happy to answer questions about the architecture, forking Chrome, or anything else in the comments below.
Try it out: `claude mcp add browser -- npx -y agent-browser-protocol --mcp` (Codex/OpenCode instructions in the docs)
Demo video: https://www.loom.com/share/387f6349196f417d8b4b16a5452c3369
Worth flagging: a forked Chromium with frozen-state semantics will behave differently from stock Chrome, and fingerprinting libraries will notice. We run browser automation on ephemeral cloud desktops at Cyqle (https://cyqle.in) and keeping the browser environment consistent across sessions was its own fight. Curious if ABP exposes hooks for controlling OS-level context — display resolution, font rendering, timezone — since those affect agent reliability almost as much as DOM staleness.
I'm so sick of reading OpenClaw comments! No activity for 7 months, and then in the past day, five comments from an LLM pitching your tool. What are you doing, man? This degrades the quality of HN so badly.
And what does opus score with "regular" browser harnesses?