Hacker News | new | past | comments | ask | show | jobs | submit | login

What’s to stop an attacker from using prompt injection against this firewall? I don’t understand how your AI is any more secure than the AI it’s protecting.


I may be missing something, but in addition to this threat of prompt injection, you also have to trade trusting an arbitrary MCP server for trusting MCP Defender.

In the default mode, the app interposes on the communication between, say, Claude and a local MCP server. It sends the contents of each message (which may include the very sensitive information it is trying to protect) to a remote LLM, which you have to trust. The "scans" are stored in a log on the server. Not to mention the potential extra delay for every MCP exchange.

This may be more secure, but is it really?
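For what it's worth, the interposition pattern described above can be sketched roughly like this. This is illustrative only: `scan_fn` stands in for whatever remote (or local) LLM verdict MCP Defender actually uses, and none of these names come from its real API.

```python
import json

def firewall_forward(raw_message: str, scan_fn, forward_fn):
    """Scan an MCP JSON-RPC message before forwarding it to the real server.

    raw_message: the JSON-RPC payload exchanged between client and server.
    scan_fn:     hypothetical scanner; returns True if the message looks safe.
    forward_fn:  delivers the (unmodified) message to the real MCP server.
    """
    msg = json.loads(raw_message)  # parse so the scanner sees structure
    if not scan_fn(msg):           # verdict from the (possibly remote) scanner
        raise PermissionError("blocked: possible prompt injection")
    return forward_fn(raw_message)
```

Note that in this shape the scanner sees the full message contents, which is exactly the trust trade-off being pointed out.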


We'll be adding the ability to run MCP Defender through a local LLM soon, so with that approach no data will leave your computer to perform a scan.

Yes, there is a delay for each MCP exchange, but I imagine that most MCP calls in the future will be made in "YOLO" mode, where the user prompts a large task and an agent makes thousands of MCP calls over hours to accomplish it. This would add some time to the overall task, but IMO that's a small price to pay for added security. Also, the delay will decrease over time.


While Cursor and other apps can include security checks in their system prompt, MCP Defender provides an extra unified layer of security across all apps. Also, we're going to be adding the ability to have multiple models perform the scan in parallel so any prompt injection attack would have to work against all of the models you select.


It's turtles all the way down!


Obviously you run MCP Defender on traffic sent to MCP Defender to protect MCP Defender from prompt injection.


> What’s to stop an attacker from using prompt injection against this firewall?

Clearly you need a firewall-firewall.

..defense in depth?


We'll soon be adding the ability to have multiple models perform the scan in parallel, so any attack would have to bypass all of the models.
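A minimal sketch of that unanimous-allow scheme, assuming each scanner is an independent callable wrapping a different model (hypothetical names, not MCP Defender's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_scan(message: str, scanners) -> bool:
    """Run several independent scanners concurrently.

    The message passes only if every scanner approves, so a prompt
    injection would have to fool all of the selected models at once.
    """
    with ThreadPoolExecutor(max_workers=len(scanners)) as pool:
        verdicts = list(pool.map(lambda scan: scan(message), scanners))
    return all(verdicts)  # unanimous allow required
```

The design choice here is "fail closed": any single model flagging the message blocks it, trading more false positives for fewer misses.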


So literally a firewall-firewall?



