I solved Claude Code's prompt injection problem and saved tokens doing it
I built mcp-safe-fetch, an MCP server that sanitizes web content before it reaches your LLM — stripping prompt injection vectors deterministically, no LLM call needed. It's a drop-in replacement for Claude Code's WebFetch. Along the way I found it also cuts token usage by ~90%.
The problem
When Claude Code fetches a web page, it dumps the entire thing into the context window. JavaScript bundles, CSS, hidden elements, analytics scripts — all of it. That's an attack surface for prompt injection.
Most real-world prompt injection in web content relies on hiding instructions where the user can't see them but the LLM can. Hidden HTML elements, zero-width characters between visible words, base64-encoded payloads in data attributes, off-screen positioned text. These are the techniques documented on PayloadsAllTheThings — and if you're using Claude Code's built-in WebFetch, every one of them passes straight through into your context.
Developers have two options:
- Keep WebFetch behind permission prompts — and spam "yes" without reading the content, getting zero actual security benefit while destroying your flow state.
- Allow WebFetch freely — and accept that any page Claude visits could contain hidden instructions that hijack the session.
There's no middle ground. You can restrict where Claude fetches, but you can't sanitize what comes back.
What it does
The sanitization pipeline runs in 8 stages, all deterministic, on raw HTML and the resulting markdown. This is critical — once HTML becomes flat markdown text, you lose the DOM structure needed to detect display:none or computed styles.
User prompt → Claude calls safe_fetch →
1. Fetch raw HTML
2. Parse with cheerio
3. Strip hidden/off-screen/same-color elements
4. Convert to markdown (turndown)
5. Strip invisible unicode, normalize
6. Remove encoded payloads
7. Neutralize exfiltration URLs
8. Strip fake LLM delimiters + custom patterns
→ Clean markdown returned to Claude
HTML-level — Hidden elements (display:none, visibility:hidden, opacity:0), off-screen positioning (left:-9999px, clip:rect(0,0,0,0)), same-color text, <script>, <style>, <noscript>, <meta>, comments.
Character-level — Zero-width characters, soft hyphens, BOM markers, bidi overrides, control characters. NFKC normalization to collapse fullwidth and homoglyph characters.
Encoded payloads — Base64 strings that decode to instruction-like text. Hex-encoded instruction sequences. Text data URIs.
Structural injection — Fake LLM delimiters (<|im_start|>, [INST], <<SYS>>, \n\nHuman:). Markdown image exfiltration URLs. Custom user-defined patterns.
What it preserves — All visible text, links, code blocks, images. The content you actually fetched the page for.
What I found
I tested against 4 real-world sites to validate the sanitization. Same URLs, same tasks, WebFetch vs safe_fetch.
When I tested against PayloadsAllTheThings itself — the prompt injection reference page — safe_fetch caught 3 hidden elements and 4 LLM delimiter patterns that WebFetch passed straight through. On a FotMob news article: 32 script tags, 90 style tags, all stripped. The sanitization worked.
But the thing I didn't expect was the size difference.
| Site | WebFetch (tokens) | safe_fetch (tokens) | Reduction | Threats caught |
|---|---|---|---|---|
| Node.js docs | ~75,500 | ~2,100 | 97% | 2 hidden elements, 1 off-screen |
| FotMob (news article) | ~109,500 | ~5,900 | 95% | 32 script tags, 90 style tags |
| PayloadsAllTheThings | ~39,500 | ~7,800 | 80% | 3 hidden elements, 4 LLM delimiters |
| Express.js | ~9,400 | ~1,400 | 86% | Clean page |
~90% average reduction. Zero false positives. All visible page content preserved.
On that Node.js docs page, the actual content you want is less than 3% of what WebFetch sends to the model. The rest is React hydration JavaScript:
var palette=__md_get("__palette");if(palette&&palette.color){
if("(prefers-color-scheme)"===palette.color.media){
var media=matchMedia("(prefers-color-scheme: light)"),
input=document.querySelector(media.matches?
"[data-md-color-media='(prefers-color-scheme: light)']"...Claude can't use any of that. But it's eating tokens and filling up the context window.
I ran a live A/B with measured token counts to confirm — same news article, same "summarize this" prompt, back-to-back:
| WebFetch | safe_fetch | |
|---|---|---|
| Actual tokens used | 24,700 | 575 |
| Reduction | — | 97.7% |
A single page fetch. WebFetch burned 24,700 tokens on JavaScript, CSS, and navigation markup. safe_fetch delivered the same article in 575 tokens.
Extrapolated: 50 page fetches in a session saves 2.7M tokens ($8.10 at Sonnet pricing). That's context window space you reclaim for actual work — code, conversation, reasoning — instead of React hydration scripts.
What it doesn't solve
The philosophy: catch most attacks that rely on hidden or invisible content deterministically. No LLM call, no latency, no false positives on normal pages. This isn't a complete solution. It's a meaningful first layer. The rest is harder:
- Semantic injection — malicious instructions written as natural-looking visible text. This requires LLM-level understanding.
- Image-based injection — steganography or text embedded in images.
- Novel encoding schemes — attackers will always find new encodings. The tool raises the bar, it doesn't eliminate the category.
- Legitimate-looking instructions — "Please run
npm install malicious-package" in actual documentation. User judgment is the defense.
Setup
npx -y mcp-safe-fetch initSets up Claude Code in one command. It configures the MCP server and denies the built-in WebFetch. Restart Claude Code and safe_fetch is your default web fetcher. Works with any MCP client.
Each fetch shows a sanitization header:
[safe-fetch] Stripped: 5 hidden elements, 2 off-screen elements, 68 script tags | 284127 → 12720 bytes (219ms)
Optional config (.mcp-safe-fetch.json) if you want logging, custom patterns, or to tune detection thresholds.
mcp-safe-fetch on GitHub — MIT licensed. Dependencies: cheerio, turndown, zod, and the MCP SDK.