The thing I am working on is improving at the moment agentic tool usage success rates for my research and I use this as a proxy to access everything with the cookies I allow in the session.
Yes, I know it likely breaks everybody's terms of service but at the same time I'm not loading gigabytes of ads, images, markup, to accomplish things.
If anyone is interested I can take some time and publish it this week.
I also use it to automatically retrieve page responsiveness behavior in complex web apps. It uses playwright to adjust the width and monitor entire trees for exact changes which it writes structured data that includes the complete cascade of styles relevant with screenshots to support the snapshots.
There are tools you can buy that let you do this kind of inspection manually, but they are designed for humans. So, lots of clickety-clackety and human speed results.
---
My first reaction to seeing this FP was why are people still releasing MCPs? So far I've managed to completely avoid that hype loop and went straight to building custom CLIs even before skills were a thing.
I think people are still not realizing the power and efficiency of direct access to things you want and skills to guide the AI in using the access effectively.
Maybe I'm missing something in this particular use case?
It's God's gift to them when it lets them bypass ads and dl copyrighted material. But it's Satan's curse on humanity when the Zuck does it to train his llm and dl copyrighted material.
My excuse for not keeping up is that I'm in so deep that Claude Code can predict the stock market.
I'll still publish mine and see if has any value but agent browser looks very complete.
Thank you for sharing!
Here is what actually fixed it: https://github.com/yt-dlp/ejs/pull/53/changes
yt-dlp is relatively stable, but still occasionally breaks for long periods. I get the sense YouTube is becoming increasingly adversarial to yt-dlp as well.
I don't know the details, but it doesn't seem like yt-dlp is running the entire YouTube JS+DOM environment. Something like a real headless browser seems like it would break less often, but be much heavier weight. And Youtube might have all sorts of other mitigations against this approach.
I don't know anything about yt-dlp.
It would probably help people who want to go to a concert and have a chance to beat the scalpers cornering the market on an event in 30 seconds hitting the marketplace services with 20,000 requests.
I can try to see if can bypass yt-dlp. But that is always a cat and mouse game.
Never tried that level of autonomy though. How long is your iteration cycle?
If I had to guess, mine was maybe 10-20 minutes over a few prompts.
I use Patchright + Ghostery and I have a cleaver tool that uses web sockets to pass 1 second interval screenshots to the a dashboard and pointer / keyboard events to the server which allow interacting with websites so that a user can create authentication that is stored in the chrome user profile with all the cookies, history, local storage, ect.. in the cloud on a server.
Can you list some websites that don't require subscription that you would like to me to test against? I used this for Robinhood and I think Linked in would be a good example for people to use.
As someone that does heavy agentic coding (using basically all the tools), this is so far from the truth. People claiming this have probably never worked in large enterprise environments where things like authentication, RBAC, rate limiting, abuse detection, centralized management/updates/ops, etc. are a huge part of the development and deployment workflow.
In these situations you can't just use skills and cli tools without a gigantic amount of retooling and increased operational and security complexity. MCP is really useful here, and allows centralized eng and ops teams to manage their services in a way that aligns with the organizations overall posture, policies, and infrastructure.
> Google is so far behind agentic cli coding. Gemini CLI is awful.
This part I totally agree. It's really hard to express how bad it is (and it's really disappointing.)
When folks say MCP is dead, I don't get it. What other alternatives exist in place of MCP? Arbitrary code via curl/sdks to call a remote endpoint?
cli?
for example aws cli. It's a full interface to aws API. Why would you need mcp for that?
and if you have any doubts, agents use it with a great effect even without any relevant skill. "aws help" is fully discoverable.
MCP's remaining moats I think are:
- No-install product integrations (just paste in mcp config into app)
- Non-developer end users / no shell needed (no terminal)
- Multi-tenant auth (many users, dynamic OAuth)
- Security sandboxing (restrict what agents can do), credential sandboxing (agents never see secrets)
- Compliance/audit (structured logs, schema enforcement)?
If you're a developer building for developers though, CLI seems to be a clear winner right
But since when has this industry done the right thing informed by wisdom and hindsight?
I use it extensively, many of my colleagues do. I get a ton of value out of it. Some prefer Antigravity, but I prefer Gemini CLI. I get fairly long trajectories out of it, and some of my colleagues are getting day-long trajectories out of it. It has improved massively since I started using it when it first came out.
Some people will push back on this. They are holding out hope that the recent improvements Anthropic has made in this regard have improved the context rot problem with MCP. Anthropic's changes improve things a little. But it is akin to putting lipstick on a pig. It helps, but not much.
The reason MCP is dying/dead is because MCP servers, once configured, bloat up context even when they are not being used. Why would anybody want that?
Use agent skills. And say goodbye to MCP. We need to move on from MCP.
It is hard to say nowadays, when things change so quickly
What about all the CLI tools not baked into the model's priors?
Every time someone says "extensibility mechanism X is dead!", I think "Well, I guess that guy isn't doing anything that needs to extend the statistical average of 2010s-era Reddit"
https://github.com/pasky/chrome-cdp-skill
For example, I use codex to manage a local music library, and it was able to use the skill to open a YT Music tab in my browser, search for each album, and get the URL to pass to yt-dlp.
Do note that it only works for Chrome browsers rn, so you have to edit the script to point to a different Chromium browser's binary (e.g. I use Helium) but it's simple enough
edit: upon rereading, I now realize the (different) prompt injection risk you were calling out re: the handoff to yt-dlp. Separate profiles won't save you from that, though there are other approaches.
But if I understood the original commenter's use case, they're just searching YT Music to get the URL to a given song. This appears[0] to work fine without being logged in. So you could parameterize or wrap the call to yt-dlp and only have your cookie jar usable there.
Chrome's 'allow pasting' gets ignored reflexively by most users anyway. If this agent can touch DevTools the attack surface expands far faster than most people realize or will ever audit.
I think using the Pi coding agent really got me used to this way of thinking: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to...
Also. AAarrgh, my new thing to be annoyed at is AI drivel written slop.
"No browser automation framework, no separate browser instance, no re-login."
Oh really, nice. No separate computer either? No separate power station, no house, no star wars? No something else we didn't ask for? Just one a toggle and you go? Whoaaaaaa.
Edit: lol even the skill itself is vibe coded:
Lightweight Chrome DevTools Protocol CLI. Connects directly via WebSocket — no Puppeteer, works with 100+ tabs, instant connection.
I feel like there's nothing fucking left on the internet anymore that is not some mean of whatever the LLM is trained to talk like now.
HN is becoming close to unusable, and this isn’t like the previous times where people say it’s like reddit or something. It is inundated with bot spam, it just happens the bot spam is sufficiently engaging and well-written that it is really hard to address.
DevTools MCP and its new CLI are maintained by the team behind Chrome DevTools & Puppeteer and it certainly has a more comprehensive feature set. I'd expect it to be more reliable, but.. hey open source competition breeds innovation and I love that. :)
(I used to work on the DevTools team. And I still do, too)
Great news to all of us keenly aware of MCP's wild token costs. ;)
The CLI hasn't been announced yet (sorry guys!), but it is shipping in the latest v0.20.0 release. (Disclaimer: I used to work on the DevTools team. And I still do, too)
This is incorrect. Plenty of people have run the numbers. Tool search does not fix all problems with MCP.
What we actually need is a standard for websites to expose a machine-readable interaction layer alongside the human one. Something like robots.txt but for agent capabilities — declaring available actions, their parameters, authentication requirements, and response schemas. Not scraping the DOM and hoping the AI figures out which button to click.
The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces. A lightweight standard — call it agents.json or whatever — where sites declare "here are the actions you can take, here are the endpoints, here's the auth flow" would eliminate 90% of the token waste, security concerns, and fragility discussed in this thread.
Until that exists, we'll keep building increasingly clever hacks on top of a 30-year-old document format that was never designed for machine consumption.
For example, you can get a markdown out of most OpenAI documentation by appending .md like this: https://developers.openai.com/api/docs/libraries.md
Not definitive, but still useful.
Citation needed.
> The web already went through this evolution once: we went from screen-scraping HTML to structured APIs. Now we're regressing back to scraping because agents need to interact with sites that only have human interfaces.
To me, sites that "only have human interfaces" are more likely that not be that way totally on purpose, attempting to maximize human retention/engagement and are more likely to require strict anti-bot measures like Proof-of-Work to be usable at all.
We had this 20 years ago with the Semantic Web movement, XHTML, and microformats. Sadly, it didn't pan out for various reasons, most of them non-technical. There's remnants of it today with RSS feeds, which is either unsupported or badly supported by most web sites.
Once advertising became the dominant business model on the web, it wasn't in publishers' interest to provide a machine-readable format of their content. Adtech corporations took control of the web, and here we are. Nowadays even API access is tightly controlled (see Reddit, Twitter, etc.).
So your idea will never pan out in practice. We'll have to continue to rely on hacks and scraping will continue to be a gray area. These new tools make automated scraping easier, for better or worse, but publishers will find new ways to mitigate it. And so it goes.
Besides, if these new tools are "superintelligent", surely they're able to navigate a web site. Captchas are broken and bot detection algorithms (or "AI" themselves) are unreliable. So I'd say the leverage is on the consumer side, for now.
Which is called ARIA and has been a thing forever.
Favourite unexpected use case for me was telling gemini to use it as a SVG editing repl, where it was able to produce some fantastic looking custom icons for me after 3-4 generate/refresh/screenshot iterations.
Also works very nicely with electron apps, both reverse engineering and extending.
Will check this out to see if they’ve solved the token burn problem.