Top
Best
New

Posted by namukang 4/7/2025

Show HN: Browser MCP – Automate your browser using Cursor, Claude, VS Code(browsermcp.io)
616 points | 217 comments
rmac 4/8/2025|
[!warning!]

1) this projects' chrome extension sends detailed telemetry to posthog and amplitude:

- https://storage.googleapis.com/cobrowser-images/telemetry.pn...

- https://storage.googleapis.com/cobrowser-images/pings.png

2) this project includes source for the local mcp server, but not for its chrome extension, which is likely bundling https://github.com/ruifigueira/playwright-crx without attribution

super suss

namukang 4/8/2025||
Hey, creator of Browser MCP here.

1. Yes, the extension uses an anonymous device ID and sends an analytics event when a tool call is used. You can inspect the network traffic to verify that zero personalized or identifying information is sent.

I collect anonymized usage data to get an idea of how often people are using the extension in the same way that websites count visitors. I split my time between many projects and having a sense of how many active users there are is helpful for deciding which ones to focus on.

2. The extension is completely written by me, and I wrote in this GitHub issue why the repo currently only contains the MCP server (in short, I use a monorepo that contains code used by all my extensions and extracting this extension and maintaining multiple monorepos while keeping them in sync would require quite a bit of work): https://github.com/BrowserMCP/mcp/issues/1#issuecomment-2784...

I understand that you're frustrated with the way I've built this project, but there's really nothing nefarious going on here. Cheers!

asaddhamani 4/8/2025|||
Hey, as a maker, I get it. You spent time building something, and you want to understand how it gets used. If you're not collecting personal info, there is nothing wrong with this.

Knee-jerk reactions aren't helpful. Yes, too much tracking is not good, but some tracking is definitely important to improving a product over time and focusing your efforts.

Trias11 4/8/2025|||
When people see “I collect” they won’t even bother reading further.

This is showstopper.

Noble reasons won’t matter.

Spyware perception.

wyldberry 4/8/2025||
This seems to be the opposite of what happens in reality.
nlarew 4/8/2025|||
"detailed" is an anonymized deviceId and a counter of tool calls? Heaven forbid an app want to get some basic insights into how people use it.
tomrod 4/8/2025|||
Correct. Telemetry should _always_ be opt-in and explicitly an easy choice to not engage.

Any other mode of operation is morally bankrupt.

nlarew 4/8/2025||
Really? The hyperbole does not help anyone here.

I don't sign a term sheet when I order at McDonalds but you can be damn sure they count how many big macs I order. Does that make them morally bankrupt? Or is it just a normal business operation that is actually totally reasonable?

genevra 4/14/2025|||
McDonald's is probably one of the most morally bankrupt companies out there, but I see your point.
tomrod 4/9/2025|||
> Does that make them morally bankrupt?

Yes, it does.

observationist 4/8/2025|||
This automatic sense of entitlement to surveil users is the absolute embodiment of the banality of evil.

It's 2025 - we want informed consent and voluntary participation with the default assumption that no, we do not want you watching over our shoulders, and no, you are not entitled to covertly harvest all the data you want and monetize that without notifying users or asking permissions. The whole ToS gotcha game is bullshit, and it's way past time for this behavior to stop.

Ignorance and inertia bolstering the status quo doesn't make it any less wrong to pile more bullshit like this onto the existing massive pile of bullshit we put up with. It's still bullshit.

nlarew 4/8/2025||
You're making a huge jump from "gathering anonymous counters to understand how many people use the thing" to "harvest all the data you want and monetize it".

If they were tracking my identity across sites and actually selling it to the highest bidder that's one thing that we'll definitely agree on. This is so so far from that.

You're welcome to build and use your own MCP browser automation if you're so hostile to the developer that built something cool and free for you to use.

observationist 4/9/2025||
The supply chain vulnerability in any extension is obvious. The problems with telemetry - any at all - are wide ranging and it's crazy to me that people don't see this.

Any covert, involuntary, automatic surveillance of a person for any reason whatsoever should have a court order and legal authority behind it - it's gross and exposes the target to vulnerabilities they're not cognizant of.

For telemetry tracking user behavior to be useful at all, it's got to be associated with a user. The idea of telemetry anonymization is marketing speak for "we obfuscated it, we know deanonymization is trivial, but people are stupid, especially regulators."

Any anonymization done is sufficiently obfuscated such that corporate asses get covered in the case of any regulatory investigation. There's no legitimate, mathematically valid anonymization of user data that you could do without destroying the information that you're trying to get in the first place through these tools. This means that any aggregation of user data useful to a malicious actor will inevitably be compromised - the second Posthog or Amplitude become a desirable target, they'll get pwned and breached, and much handwringing will be done, and there will be no recourse or recompense for damages done.

The only strategy to prevent the dissemination of surveillance data is not to collect it in the first place. It should be illegal to collect the data without voluntary, user initiated participation, and any information collected should be ephemeral with regular inspection to ensure compliance. Any violation of user privacy should result in crippling fines, something like 5% of the value of the company per user per day of violation - if you can't responsibly manage the data, you shouldn't be collecting it.

This means all the automatic continuous development a/b testing intrusive corner cutting corporate bullshit would have to stop. Continually leaking surveillance data to malicious actors year over year with no repercussions has thoroughly demonstrated that people cannot be trusted with safekeeping data.

I will build and use my own automation if I need to, based on products that don't covertly, involuntarily, ignorantly surveil their users, without even being aware of potential for harm, and I'll continue to point it out when it shows up in random projects and products, because it's wrong and it should stop.

We should stop embracing the things that enshittify the world, and stop sacrificing things like "other people's privacy" for convenience or profit.

payneio 4/18/2025||
Thanks for pushing for a realignment of product expectations. I agree.
bn-l 4/8/2025||
The only chrome extensions you should install are ones you can build yourself from source.
neycoda 4/8/2025||
... And have reviewed and understand completely
EGreg 4/8/2025||
So ... pretty much none

Keep in mind, extensions can update themselves at any time, including when they're bought out by someone else. In fact, I bet that's a huge draw... imagine buying an extension that "can read and modify data on all your websites" and then pushing an update that, oh I dunno, exfiltrates everyone's passwords from their gmail. How would most people even catch that?

DO NOT have any extensions running by default except "on click".

There should be at least some kind of static checker of extensions for their calls to fetch or other network APIs. The Web is just too permissive with updating code, you've got eval and much more. It would be great if browsers had only a narrow bottleneck through which code could be updated, and would ask the user first.

(That wouldn't really solve everything since there can be sleeper code that is "switched on" with certain data coming over the wire, but better than what we have now.)

metadat 4/8/2025|||
It would be interesting if you could easily install browser extensions via a source repository URL (e.g. GitHub, or any git URL), then at least there would be more transparency about who/what you are trusting by installing it. Blindly trusting a mostly anonymous chrome store "install" button seems insane, since they don't do any significant policing. Wasn't the promise of safety one of the primary reasons Google started the chrome store?
econ 4/8/2025||
Like user.script/grease monkey. It use to be that you could publish a reasonably large script and someone would review it. Even better was to start out simple then gradually update it so that existing users can continue reviewing by looking at the changes.

I think the permission system should be much more complicated so that the user gets a prompt that explains what is needed and why.

Furthermore there should be [paid] independent reviewers to sign off on extensions. This adds a lot of credibility, specially to a first time publication without users. That would also give app stores someone to talk to before deleting something. Nefarious actors working for app stores can have their credibility questioned.

rahimnathwani 4/8/2025||||

  Keep in mind, extensions can update themselves at any time
GP suggested only installing extensions you can build yourself from source. Most extensions that auto update do so via the Chrome store. If you install an extension from source, that won't happen.
bn-l 4/8/2025|||
> So ... pretty much none

You’d be surprised. It describes all the extensions I use.

bhouston 4/7/2025||
So the website claims:

"Avoids bot detection and CAPTCHAs by using your real browser fingerprint."

Yeah, not really.

I've used a similar system a few weeks back (one I wrote myself), having AI control my browser using my logged in session, and I started to get Captcha's during my human sessions in the browser and eventually I got blocked from a bunch of websites. Now that I've stopped using my browser session in that way, the blocks eventually went away, but be warned, you'll lose access yourself to websites doing this, it isn't a silver bullet.

tempest_ 4/7/2025||
The caveat with these things is usually "when used with high quality proxies".

Also I assume this extension is pretty obvious so it wont take long for CF bot detection to see it the same as playwrite or whatever else.

unixfox 4/8/2025|||
The extension enable debugging in your browser (a banner appears telling you about automation). It's possible to detect that in JavaScript.

Hence why projects like this exist: https://github.com/Kaliiiiiiiiii-Vinyzu/patchright. They hide the debugging part from JavaScript.

DeathArrow 4/7/2025|||
It might depend on the speed with which you click on the elements on the website.
SSLy 4/7/2025||
it does, CF bans my own honest to God clicks if I do them too fast.
omgwtfbyobbq 4/7/2025|||
About five years ago, maybe more, Google started sending me captchas if I ran too many repetitive searches. I could be wrong, but it feel like most large platforms have fairly sophisticated anti-bot/scraping stuff in place.
SubiculumCode 4/8/2025|||
Google does the same to me: Don't they know, I keep modifying my searches because their results sucked so bad I had to try 30 times to find the piece of information I needed?
what 4/8/2025||||
GitHub regularly blocks me for some reason. They tell me to slow down and I’m blocked for hours. I don’t get it.
Tepix 4/8/2025|||
Remember when github disabled searches for users who aren‘t logged in? Well, they just set the threshold for searches to 0 these days so they have de-facto disabled them again, this time avoiding the shitstorm.
rcakebread 4/8/2025|||
Make sure you are logged in. It was blocking me after just a couple searches if not logged in.
clown_strike 4/8/2025|||
Yandex does the same.
michaelbuckbee 4/7/2025||||
I use Vimium (Chrome extension for using keyboard control of the browser) and this happens to me as well since the behavior looks "unnatural".
sitkack 4/7/2025||
Must suck for people with assistive software. I get blocked on CF for now damn reason.
verve_rat 4/8/2025||
Yeah, I do wonder if there are any ADA implications with that?
TeMPOraL 4/8/2025||
I really really hope there are. Not just because of people who need these provisions, but also for everyone else, as accessibility is the last line of defense for preserving end-user interoperability.

Screen readers need to see a de-bullshittified, machine-readable version of the site + this is required by law sometimes, and generally considered a nice thing to enable -> the site becomes not just screen-reader friendly, but end user automation-friendly in general.

(I don't know how long this will hold, though. LLMs are already capable of becoming a screen reader without any special provisions - they can make sense of the UI the same way a sighted person can. I wouldn't trust them much now, but they'll only get better.)

wordofx 4/8/2025||||
I wish people would stop using CF. It’s just making the internet worse.
fastball 4/8/2025||
How so?
bombela 4/7/2025||||
Same here. And I am also using vimium.
PantaloonFlames 4/7/2025|||
SSLy the speed clicker
SkyBelow 4/7/2025|||
What do you think they might be looking for that could be detected pretty quickly? I'm wondering if it is something like they can track mouse movement and calculate when a mouse is moving too cleanly, so adding some more human like noise to the mouse movement can better bypass the system. Others have mentioned doing too many actions too fast, but what about potential timing between actions. Even if every click isn't that fast, if they have a very consistent delay that would be another non-human sign.
tempoponet 4/7/2025|||
Modern captchas use a number of tools including many of the approaches you mentioned. This why you might sometimes see a CloudFlare "I am not a robot" checkbox that checks itself and moves along before you have much time to even react. It's looking at a number of signals to determine that you're probably human before you've even checked the box.
dalemhurley 4/7/2025||
When I am using keyboard navigation, shortcuts and autofills, I seem to get mistaken for a bot a lot. These Captchas are really bad at detecting bots and really good at falsely labelling humans as bots.
Quarrel 4/8/2025|||
With AI feeding / scraping traffic to sites growing ridiculously fast, I think captchas & their equivalent are only going to be on the rise, and given the rise in so many people selling residential proxies I see, I don't doubt that measures and counter-measures on both sides are getting more and more sophisticated.

> These Captchas are really bad at detecting bots and really good at falsely labelling humans as bots.

As a human it feels that way to you. I suspect their false-positive rate is very low.

Of course, you may well be right that you get pinged more because of your style of browsing, which sux.

diatone 4/8/2025||||
Given the volume of bots they tend to be remarkably good at detecting bots

source: I work in a team that uses this kind of bot detection and yes, it works. And yes we do our best to keep false positives down

magicalhippo 4/8/2025||||
They're detecting patterns predominantly bots use. The fact that some humans also use them doesn't change that.

Back when I was playing Call of Duty 4, I got routinely accused of cheating because some people didn't think it was possible to click the mouse button as fast as I did.

To them it looked like I had some auto-trigger bot or Xbox controller.

I did in fact just have a good mouse and a quick finger.

animuchan 4/8/2025||
What's different is the badness of the outcome: if children mislabel you as a cheater in CoD, you may get kicked from the server.

If CloudFlare mislabels you as a bot, however, you may be unable to access medical services, or your bank account, or unable to check in for a flight, stuff like that. Actual important things.

So yes, I think it's not unreasonable to expect more from CF. The fact that some humans are routinely mischaracterized as bots should be a blocker level issue.

magicalhippo 4/8/2025||
Does it suck? Yes, absolutely. Should CF continuously work to reduce false positives? Yes, absolutely.

I've never failed the CF bot test so don't know how that feels. Though I have managed to get to level 8 or 9 on Google's ReCaptcha in recent times, and actually given up a couple of times.

Though my point was just it's gonna boil down to a duck test, so if you walk like a duck and quack like a duck, CF might just think you're a duck.

willsmith72 4/8/2025|||
Well you have to have false positives or negatives. Maybe they prefer positives
kmacdough 4/8/2025||||
> I'm wondering if it is something like they can track mouse movement

Yes, this is a big signal they use.

> adding some more human like noise to the mouse

Yes, this is a standard avoidance strategy. Easier said than done. For every new noise generation method, they work on detection. They also detect more global usage patterns and other signals, so you'd need to immitate the entire workflow of being human. At least within the noise of their current models.

econ 4/8/2025|||
Have a lot of small things count towards the result. Users behave quite linearly, extra points if they act differently all of a sudden.
mrweasel 4/8/2025||
There's also the whole issue of captchas being in place because people cannot be trusted to behave appropriately with automation tools.

"Avoids bot detection and CAPTCHAs" - Sure asshole, but understand that's only in place because of people like you. If you truly need access to something, ask for an API, may you need to pay for it, maybe you don't. May you get it, maybe the site owner tells you to go pound sand and you should take that as you're behaviour and/or use case is not wanted.

TeMPOraL 4/8/2025||
Actually, the CAPTCHAs are in place mostly because of assholes like you abusing other assholes like you[0].

Most of the automated misbehavior is businesses doing it to other businesses - in many cases, it's direct competition, or a third party the competition outsources it to. Hell, your business is probably doing it to them too (ask the marketing agency you're outsourcing to).

> If you truly need access to something, ask for an API, may you need to pay for it, maybe you don't.

Like you'd give it to me when you know I want it to skip your ads, or plug it to some automation or a streamlined UI, so I don't have to waste minutes of my life navigating your bloated, dog-slow SPA? But no, can't have users be invisible in analytics and operate outside your carefully designed sales funnel.

> May you get it, maybe the site owner tells you to go pound sand and you should take that as you're behaviour and/or use case is not wanted.

Like they have a final say in this.

This is an evergreen discussion, and well-trodden ground. There is a reason the browser is also called "user agent"; there is a well-established separation between user's and server's zone of controls, so as a site owner, stop poking your nose where it doesn't belong.

--

[0] - Not "you" 'mrweasel personally, but "you" the imaginary speaker of your second paragraph.

mrweasel 4/8/2025||
It seems that we have very different types of businesses in mind. I really didn't consider tracking users and displaying ads, but I also don't think this is where these types of tools would be used. Well, they might, but that's as part of some content farm, undesirable bots and downright scams, so nothing of value is really lost if this didn't exist.

If you have a sales funnel, as in you take orders and ship something to a customer, consumer or business, I almost guarantee you that you can request an API, if the company you want to purchase from is large enough. They'll probably give you the API access for free, or as part of a signup fee and give you access to discounts. Sometimes that API might be an email, or a monthly Excel dump, but it's an API.

When we're talking site that purely survive on tracking users and reselling their data, then yes, they aren't going to give you API access. Some sites, like Reddit does offer it I think, but the price is going to be insane, reflecting their unwillingness to interact with users in this way.

> Not "you" 'mrweasel personally

Understood, but thank you :-)

TeMPOraL 4/8/2025||
> It seems that we have very different types of businesses in mind. I really didn't consider tracking users and displaying ads, but I also don't think this is where these types of tools would be used.

I wasn't thinking primarily about tracking and ads here either, when it comes to B2B automation. What I meant was e.g. shops automatically scrapping competing stores on a continued basis, to adjust their own prices - a modern version of the old "send your employees incognito to the nearby stores and have them secretly note down prices". Then you also have comparison-shopping (pricing aggregators) sites that are after the same data, too.

And then of course there's automated reviews (reading and writing), trying to improve your standing and/or sabotage competition. There's all kinds of more or less legit business intelligence happening, etc. Then there's wholesale copying of sites (or just their data) for SEO content farms, and... I could go on.

Point being, it's not the people who want to streamline their own work, make access more convenient for themselves, etc. that are the badly-behaving actors and reasons for anti-bot defenses.

> If you have a sales funnel, as in you take orders and ship something to a customer, consumer or business, I almost guarantee you that you can request an API, if the company you want to purchase from is large enough. They'll probably give you the API access for free, or as part of a signup fee and give you access to discounts. Sometimes that API might be an email, or a monthly Excel dump, but it's an API.

The problem from a POV of a regular users like me is, I'm not in this for business directly; the services I use are either too small to bother providing me special APIs, or I am too small for them to care. All I need is to streamline my access patterns to services I already use, perhaps consolidate it with other services (that's what MCP is doing, with LLM being the glue), but otherwise not doing anything disruptive to their operations. And I'm denied that, because... Bots Bad, AI Bad, Also Pay Us For Privilege?

> When we're talking site that purely survive on tracking users and reselling their data, then yes, they aren't going to give you API access. Some sites, like Reddit does offer it I think, but the price is going to be insane, reflecting their unwillingness to interact with users in this way.

Reddit is an interesting case because the changes to their API and 3rd-party client policies happened recently, and clearly in response to the rise of LLMs. A lot of companies suddenly realized the vast troves of user-generated content they host are valuable beyond just building marketing profiles, and now they try to lock it all up in order to extort rent for it.

StevenNunez 4/7/2025||
I feel like I slept for a day and now MCPs are everywhere... I don't know what MCPs are and at this point I'm too afraid to ask.
oulipo 4/7/2025||
It's just a way to provide a "library of methods" / API that the LLM models can "call", so basically giving them method names, their parameters, the type of the output, and what they are for,

and then the LLM model will ask the MCP server to call the functions, check the result, call the next function if needed, etc

Right now if you go to ChatGPT you can't really tell it "open Google maps with my account, search for bike shops near NYC, and grab their phone numbers", because all he can do is reply in text or make images

with a "browser MCP" it is now possible: ChatGPT has a way to tell your browser "open Google maps", "show me a screenshot", "click at that position", etc

mattfrommars 4/8/2025|||
Isn't the idea of AI agent talking to each by telling LLM model to reply say in, JSON and with some parameter value map to, say function in Python code? That in retrospect, given context {prompt} to LLM will be able to call said function code?

Is this what 'calling' is?

oulipo 4/8/2025||
Yes exactly. MCP just formalize this a bit better
throwaway314155 4/7/2025||||
> with a "browser MCP" it is now possible: ChatGPT has a way to tell your browser "open Google maps", "show me a screenshot", "click at that position", etc

It seems strange to me to focus on this sort of standard well in advance of models being reliable enough to, ya know, actually be able perform these operations on behalf of the user with any sort of strong reliability that you would need for widespread adoption to be successful.

Cryptocurrency "if you build it they'll come" vibes.

taberiand 4/8/2025|||
I think MCPs compensate for the unreliability issue by providing a minimal and well defined interface to a controlled set of actions. That way, the llm doesn't have to be as reliable thinking what it needs to do and in acting, just in choosing what to do from a short list.
throwaway314155 4/8/2025||
You can provide an MCP for Pokemon Red, but Claude will still flounder for weeks, making absurd mistakes on a game literally designed for children.

Believe me. It's not there yet.

taberiand 4/8/2025||
Is there an MCP for pokemon red?
throwaway314155 4/8/2025||
Not that im aware of, but that actually would be an interesting project.

I was referring more broadly to ClaudePlaysPokemon, a twitch stream where claude is given tool calling into a Gameboy Color emulator in order to try to play Pokemon. It has slowly made progress and i recommend looking at the stream to see just how flawed LLM's are currently for even the shortest of timelines w.r.t. planning.

I compared the two because the tool calling API here is a similar enough to an MCP configuration with the same hooks/tools (happy to be corrected on that though)

acedTrex 4/7/2025|||
The speed that every major LLM foundational model provider has jumped on this bandwagon feels VERY artificial and astro turfy...
XCSme 4/7/2025||
Maybe because the LLM improvements haven't been that good in the last year, they needed some new thing to hype it/market it.

EDIT: Don't get me wrong, the benchmark scores are indeed higher, but in my personal experience, LLMs make as many mistakes as they did before, still too unreliable to use for cases where you actually need a factually correct answer.

acedTrex 4/8/2025||
This is in my opinion exactly what it is. A bunch of people throwing stuff at the wall trying to show "impact."
dimitri-vs 4/8/2025|||
You actually can, its called Operator and its a complete waste of time, just like 99% of agents/MCPs.
oulipo 4/8/2025||
Operator is basically MCP...
jastuk 4/7/2025|||
And the worst part is that it opens a pandora's box of potential exploits; https://elenacross7.medium.com/%EF%B8%8F-the-s-in-mcp-stands...
TeMPOraL 4/8/2025|||
That's not fault of MCP though, that's the fault of vendors peddling their MCPs while clinging to the SaaS model.

Yes, MCP is a way to streamline giving LLMs ability to run arbitrary code on your machine, however indirectly. It's meant to be used on "your side of the airlock", where you trust the things that run. Obviously it's too powerful for it to be used with third-party tools you neither trust nor control; it's not that different than downloading random binaries from the Internet.

I suppose it's good to spell out the risks, but it doesn't make sense blaming MCP itself, because those risks are fundamental aspects of the features it provides.

kmacdough 4/8/2025||
It's not blame, but it's a striking reality that needs to be kept at the forefront.

It introduces a substantial set of novel failure modes, like cross-tool shadowing, which aren't obvious to most folks. Making use of any externally developed tooling — even open source tools on internal architecture — requires more careful consideration and analysis than most would expect. Despite the warnings, there will certainly be major breaches on these lines.

joshwarwick15 4/7/2025||||
Most of these are not a real concern with remote servers with Oauth. If you install the PayPal MCP MCP server from im-deffo-not-hacking-you.com than https://mcp.paypal.com/sse its the same sec model as anything else online...

The article also reeks of LLM ironically

tuananh 4/8/2025||
it still is. if user has 1 bad tool, it's done!

https://invariantlabs.ai/blog/mcp-security-notification-tool...

joshwarwick15 4/8/2025||
Its the same security model as NPM/left pad yep, but consumers still use electron apps? It's a novel attack method, but its not a novel attack surface
halJordan 4/7/2025|||
At the risk of it sounding like i support theft; the automobile, you know, enabled the likes of Bonnie and Clyde and that whole era of lawlessness. Until the fbi and crossing county lines became a thing.

So im not sure id give up the sum total progress of the automobile just because the first decade was a bad one

orbital-decay 4/8/2025|||
MCP is a standard to plug useful tools into AI models so they can use them. The concept looks confusingly reversed and non-obvious to a normal person, although devs don't see this because it looks like their tooling.
hedgehog-ai 4/8/2025|||
I know what you mean, I think MCP is being widely adopted but it's not grassroots.. its a quick entry to this market by an established AI company trying to dominate the mind/market share of developers before consensus can be reached developers.
whalesalad 4/8/2025||
It’s RPC specifically for an LLM. But yes it’s the new soup de jour trend sweeping the globe.
andy_ppp 4/7/2025||
When I go to a shopping website I want to be able to tell my browser "hey please go through all the sideboards on this list and filter out for the ones that are larger than 155cm and smaller than 100cm, prioritise the ones with dark wood and space for vinyl records which are 31.43cm tall" for example.

Is there any browser that can do this yet as it seems extremely useful to be able to extract details from the page!

mfkhalil 4/7/2025||
Hey, we’re working on MatterRank which is pretty similar to this but currently works on web search. (e.g. I want to prioritize results that talk about X and have Y bias and I want to deprioritize those that are trying to sell me something). Feel free to try it out at https://matterrank.ai

Would also be interested in hearing more about what you’re envisioning for your use case. Are you thinking a browser extension that acts on sites you’re already on, or some sort of shopping aggregator that lets you do this, or something else entirely?

Niksko 4/8/2025||
Not OP but I definitely sympathise with them. I don't know how practical it is to implement or how profitable it would be, but the problem I often have is this: * I have something I want to buy and have specific needs for it (height, color, shape, other properties) * I know that there's a good chance the website I'm on sells a product that meets those needs (or possibly several such that I'd want to choose from) * my criteria are more specific than the filters available on the site e.g. I want a specific length down to a few cm because I want the biggest thing that will fit in a fixed space * crucially for an AI use case: the information exists on the individual product pages. They all list dimensions and specifications. I just don't want to have to go through them all.

Example: find me all of the desks on IKEA that come in light coloured wood, are 55 inches wide, and rank them from deepest to shallowest. Oh, and make sure they're in stock at my nearest IKEA, or are delivering within the next week.

unixfox 4/8/2025|||
You could do that with browser-use: https://browser-use.com/
bravura 4/7/2025||
When doing interior decoration, I am definitely interested in finding objects that fit very specific prompts.
neilellis 4/7/2025||
Well done, just tested on Claude Desktop and it worked smoothly and a lot less clunky than playwright. This is the right direction to go in.

I don't know if you've done it already, but it would be great to pause automation when you detect a captcha on the page and then notify the user that the automation needs attention. Playwright keeps trying to plough through captchas.

thenaturalist 4/7/2025||
Crazy, in looking up some info on the web and creating a Spreadsheet on Google Sheets to insert the results, it worked almost perfectly the first time and completely failed subsequently on 8-10 different tries.

Is there an issue with the lag between what is happening in the browser and the MCP app (in my case Claude Desktop)?

I have a feeling the first time I tried it, I was fast enough clicking the "Allow for this chat" permissions, whereas by the time I clicked the permission on subsequent chats, the LLM just reports "It seems we had an issue with the click. Let me try again with a different reference.".

Actions which worked flawlessly the first time (rename a Google spreadsheet by clicking on the title and inputting the name) fail 100% of subsequent attempts.

Same with identifying cells A1, B1, etc. and inserting into the rows.

Almost perfect on 1st try, not reproducible in 100% of attempts afterwards.

Kudos to how smooth this experience is though, very nice setup & execution!

EDIT 2: The lag & speed to click the allow action make it seemingly unusable in Claude Desktop. :(

otherayden 4/7/2025||
Such a rich UI like google sheets seems like a bad use case for such a general "browser automation" MCP server. Would be cool to see an MCP server like this, but with specific tools that let the LLM read and write to google sheets cells. I'm sure it would knock these tasks out of the park if it had a more specific abstraction instead of generally interacting with a webpage
mkummer 4/7/2025||
Agreed, I'd been working on a Google Sheets specific MCP last week – just got it published here: https://github.com/mkummer225/google-sheets-mcp
rahimnathwani 4/7/2025||
This is cool. You should submit this as a 'Show HN'.

Also consider publishing it so people can use it without having to use git.

freeone3000 4/8/2025||
Publishing it where? It can’t be a github page, it’s too complex; anything else incurs real costs.
rahimnathwani 4/8/2025||
I mean publish it on the npm registry (https://www.npmjs.com/signup). That way, it would be easy to install, just by adding some lines to claude_desktop_config.json:

  {
    "mcpServers": {
      "ragdocs": {
        "command": "npx",
        "args": [
          "-y",
          "@qpd-v/mcp-server-ragdocs"
        ],
        "env": {
          "QDRANT_URL": "http://127.0.0.1:6333",
          "EMBEDDING_PROVIDER": "ollama",
          "OLLAMA_URL": "http://localhost:11434"
        }
      },
     }
    }
  }
xingwu 4/10/2025|||
I have worked on a google sheets MCP, for data scraping it worked pretty well leveraging Claude's built-in search functionalities.

example: https://x.com/xing101/status/1903391600040083488 set up: https://github.com/xing5/mcp-google-sheets

throwaway314155 4/8/2025|||
What you're experiencing is commonly referred to as "luck". It's the same reason people consistently think newer versions of ChatGPT are nerfed in some way. In reality, people just got lucky originally and have unrealistic expectations based on this originally positive outcome.

There's no bug or glitch happening. It's just statistically unlikely to perform the action you wanted and you landed a good dice roll on your first turn.

weq 4/9/2025||
haha yeh as someone who has built automation for years i can agree with this. You cant just click on something in a script, you need to reliably click on something. As a user, its very easy for you to make adjustments like clicking twice on a link if it doesnt load in time. Thats pretty much what your automation suite needs to end up with. A series of a functions to emulate user actions. You then combine that together with your scripts to create reliable scripts that can run in different conditions. LLMs wont do that for you, u need to instruct them specifically.
lizardking 4/8/2025||
For me it can't click anywhere on google sheets. I get the following error

--Error: Cannot access a chrome-extension:// URL of different extension

nonethewiser 4/7/2025||
Stuff like this makes me giddy for manual tasks like reimbursement requests. Its such a chore (and it doesnt help our process isnt great).

Every month, go to service providers, log in, find and download statement, create google doc with details filled in, download it, write new email and upload all the files. Maybe double chek the attachments are right but that requires downloading them again instead of being able to view in email).

Automating this is already possible (and a real expense tracking app can eliminate about half of this work) but I think AI tools have the potential to elminate a lot of the nittier-grittier specification of it. This is especially important because these sorts of workflows are often subject to little changes.

doug_life 4/8/2025||
This may be obvious to most here, but you need Node.js installed for the MCP server to run. This critical detail is not in the set up instructions.
namukang 4/8/2025||
Added!

https://docs.browsermcp.io/setup-server#node-js

wetpaws 4/8/2025||
[dead]
serverlessmania 4/7/2025||
Did something similar but controls a hardware synth, allowing me to do sound design without touching the physical knobs: https://github.com/zerubeus/elektron-mcp
dmix 4/7/2025|
Oh good idea.

Imagine it controlling plugins remotely, have an LLM do mastering and sound shaping with existing tools. The complex overly-graphical UIs of VSTs might be a barrier to performance there, but you could hook into those labeled midi mapping interfaces to control the knobs and levels.

Gehinnn 4/7/2025|
Would be nice if it could use the Accessibility Tree from chrome dev tools to navigate the page instead of relying on screenshots (https://developer.chrome.com/blog/full-accessibility-tree)
mgraczyk 4/8/2025|
In fact you have it backwards. It has no screenshots at the moment, only the accessibility tree
More comments...