Hacker News | new | past | comments | ask | show | jobs | submit | login

What’s to stop an attacker from using prompt injection against this firewall? I don’t understand how your AI is any more secure than the AI it’s protecting.


I may be missing something, but in addition to this threat of prompt injection, you also have to trade trusting an arbitrary MCP server for trusting MCP Defender.

In the default mode, the app interposes on the communication between, say, Claude and a local MCP server. It sends the contents of each message (which may include the very sensitive information it is trying to protect) to a remote LLM, which you have to trust. The "scans" are stored in a log on the server. Not to mention the potential extra delay for every MCP exchange.

This may be more secure, but is it really?
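For what it's worth, the interposition pattern described above can be sketched roughly like this. This is illustrative only: `scan_fn` stands in for whatever remote (or local) LLM verdict MCP Defender actually uses, and none of these names come from its real API.

```python
import json

def firewall_forward(raw_message: str, scan_fn, forward_fn):
    """Scan an MCP JSON-RPC message before forwarding it to the real server.

    raw_message: the JSON-RPC payload exchanged between client and server.
    scan_fn:     hypothetical scanner; returns True if the message looks safe.
    forward_fn:  delivers the (unmodified) message to the real MCP server.
    """
    msg = json.loads(raw_message)  # parse so the scanner sees structure
    if not scan_fn(msg):           # verdict from the (possibly remote) scanner
        raise PermissionError("blocked: possible prompt injection")
    return forward_fn(raw_message)
```

Note that in this shape the scanner sees the full message contents, which is exactly the trust trade-off being pointed out.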


We'll be adding the ability to run MCP Defender through a local LLM soon, so with that approach no data will leave your computer to perform a scan.

Yes, there is a delay for each MCP exchange, but I imagine that most MCP calls in the future will be made in "YOLO" mode, where the user prompts a large task and an agent makes thousands of MCP calls over hours to accomplish it. This would add some time to the overall task, but IMO that's a small price to pay for added security. Also, the delay will decrease over time.


While Cursor and other apps can include security checks in their system prompt, MCP Defender provides an extra unified layer of security across all apps. Also, we're going to be adding the ability to have multiple models perform the scan in parallel so any prompt injection attack would have to work against all of the models you select.


It's turtles all the way down!


Obviously you run MCP Defender on traffic sent to MCP Defender to protect MCP Defender from prompt injection.


> What’s to stop an attacker from using prompt injection against this firewall?

Clearly you need a firewall-firewall.

..defense in depth?


We'll soon be adding the ability to have multiple models perform the scan in parallel, so any attack would have to bypass all of the models.
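A minimal sketch of that unanimous-allow scheme, assuming each scanner is an independent callable wrapping a different model (hypothetical names, not MCP Defender's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_scan(message: str, scanners) -> bool:
    """Run several independent scanners concurrently.

    The message passes only if every scanner approves, so a prompt
    injection would have to fool all of the selected models at once.
    """
    with ThreadPoolExecutor(max_workers=len(scanners)) as pool:
        verdicts = list(pool.map(lambda scan: scan(message), scanners))
    return all(verdicts)  # unanimous allow required
```

The design choice here is "fail closed": any single model flagging the message blocks it, trading more false positives for fewer misses.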


So literally a firewall-firewall?



