Show HN: MCP-Shield – Detect security issues in MCP servers

Posted by nick_wolf 4/15/2025

Show HN: MCP-Shield – Detect security issues in MCP servers(github.com)

I noticed the growing security concerns around MCP (https://news.ycombinator.com/item?id=43600192) and built an open source tool that can detect several patterns of tool poisoning attacks, exfiltration channels and cross-origin manipulations.

MCP-Shield scans your installed servers (Cursor, Claude Desktop, etc.) and shows what each tool is trying to do at the instruction level, beyond just the API surface. It catches hidden instructions that try to read sensitive files, shadow other tools' behavior, or exfiltrate data.

Example of what it detects:

- Hidden instructions attempting to access ~/.ssh/id_rsa

- Cross-origin manipulations between server that can redirect WhatsApp messages

- Tool shadowing that overrides behavior of other MCP tools

- Potential exfiltration channels through optional parameters

I've included clear examples of detection outputs in the README and multiple example vulnerabilities in the repo so you can see the kinds of things it catches.

This is an early version, but I'd appreciate feedback from the community, especially around detection patterns and false positives.

134 points | 39 commentspage 2

khafra 4/15/2025|

Nice! This is a much-needed space for security tooling, and I appreciate that you've put some thought into the new attack vectors. I also like the combination of signature-based analysis, and having an LLM do its own deep dive.

I expect a lot of people to refine the tool as they use it; one big challenge in maintaining the project is going to be incorporating pull requests that improve the prompt in different directions.

nick_wolf 4/15/2025|

Thanks for the kind words – really appreciate you taking the time to look it over and get what we're trying to do here.

Yeah, combining the regex/pattern checks with having Claude take a look felt like the right balance... catch the low-hanging fruit quickly but also get a deeper dive for the trickier stuff. Glad that resonates.

Maintaining the core prompt quality as people contribute improvements... that's going to be interesting. Keeping it effective and preventing it from becoming a kitchen sink of conflicting instructions will be key. Definitely something we'll need to figure out as we go.

deadbabe 4/15/2025||

Instead of bending over backwards to secure an MCP server why not just run it as an OS user with very limited minimal permissions?

stpedgwdgfhgdd 4/15/2025||

Suggestion: Integrate with https://kgateway.dev/

marcfisc 4/15/2025||

Cool work! Thanks for citing our (InvariantLabs) blog posts! I really like the identify-as feature!

We recently launched a similar tool ourselfs, called mcp-scan: https://github.com/invariantlabs-ai/mcp-scan

nick_wolf 4/15/2025|

Thanks! Glad identify-as makes sense. Your prior research was definitely valuable context, appreciate you putting that out there.

Checked out mcp-scan yesterday, nice work! Good to see more tools emerging for MCP security. Feels like these kinds of tools are essential right now for highlighting the risks. Long term, hopefully the insights gained push the protocol itself, or the big wrappers like Claude/Cursor, towards building in more robust, integrated verification deeper down as the ecosystem matures.

NicolaiS 4/15/2025||

Sorry, but this will never work very well.

The tool contains a bunch of "denylist regexes", i.e.

    `user (should not|must not|cannot) see`

But these can easily be bypassed. Any real security tool should use allowlists, but that is ofc much harder with natural languages.

MCP-Shield can also analyse using Claude, but that code contains an easy to exploit prompt injection: https://github.com/riseandignite/mcp-shield/blob/19de96efe5e...

pcwelder 4/15/2025||

Cool.

If I'm not wrong you don't detect prompt injection done in the tool results? Any plans for that?

nick_wolf 4/15/2025|

Hmm, yeah, that's a fair point. You're right, we're looking at the tool definitions – the descriptions, schemas, etc. – not the stuff that comes back after a tool runs.

It's tricky, because actually running the tools... that's where things get hairy. We'd have to invoke potentially untrusted code during a scan, figure out how to generate valid inputs for who-knows-what schemas, and deal with whatever side effects happen.

So, honestly, no solid plans for that right now. The focus is squarely on the static analysis side – what the server claims it can do. Trying to catch vulnerabilities in those definitions feels like the right scope for this particular tool.

I think that analyzing the actual results is more about a runtime concern. Like, something the client needs to be responsible for when it gets the data back, or maybe a different kind of monitoring tool altogether. Still feels like an open question where that kind of check really fits best. It's definitely a gap, though. Something to chew on.

bosky101 4/15/2025||

I'd like to remind you that tools is a json array to any modern llm inference api. That rather than returning text, tells you which function to call.

I'm all for abstraction of a level of indirection. But this is pushing things too far.

We now have an entire ecosystem, layers of unneeded engineering, cohorts of talent and capital going to create man in the middle servers that forces us to get this array from around the world + maintain a server with several gb of deps to get a json array that you should't trust.

2) It makes sense if every server has a tools.txt equivalent of their own swagger. Eg i would trust google photos to maintain and document their tools rather than the 10,000 MCP servers possibly alive for no reason and already out of date by the time you are done reading this comment. In addition to being over engineered, to trust a random server as a proxy never made any sense.

3) nobody wants to run servers. Can't find this meme, but found it here on HN several times.

Sorry but I would rather not wait a year for this industry to crash and burn and take down genai apps galore or worse, start leaking this data and your bills.

Kudos to document any security gaps though.

emsign 4/15/2025||

New! Snakeoil now AI enhanced

puliczek 4/15/2025||

Looking promising, need to scan my servers :) Just added your tool to https://github.com/Puliczek/awesome-mcp-security

nurettin 4/15/2025|

You install a service that gives access to a random language generator, then you try to secure it with a project that is literally a few hours old. This is like tripping over your own slippers.

More comments...