
Mocking and Stubbing in MCP App Tests: Simulations, Fixtures, and Patterns

Abe Wheeler
Tags: MCP Apps, MCP App Testing, MCP App Framework, ChatGPT Apps, ChatGPT App Testing, Claude Connectors, Claude Connector Testing, Claude Connector Framework
Testing MCP Apps with mocks, stubs, and simulation files.

Testing MCP Apps means testing across multiple layers: the host runtime (ChatGPT or Claude), the MCP protocol, your tool handlers, your resource components, and whatever external APIs your tools call. Each layer needs its own mocking strategy. If you mock at the wrong level, your tests either take forever or miss real bugs.

TL;DR: Use simulation files for deterministic e2e states, vi.mock("sunpeak") for unit testing resource components, vi.mock() on your API modules for tool handler tests, and the inspector fixture from sunpeak/test for full e2e tests that mock the host runtime. All of this runs locally and in CI without paid accounts.

The Four Layers You Need to Mock

An MCP App has four layers that interact during a tool call, and each one needs a different mocking approach:

  1. Host runtime (ChatGPT or Claude renders your resource component in an iframe)
  2. MCP protocol (the host calls your tool, your tool returns structured content)
  3. Tool handler (your server-side function that processes arguments and calls APIs)
  4. External services (databases, third-party APIs, your own backend)

Testing all four layers together against real hosts is slow, expensive, and flaky. Mocking lets you isolate each layer and test it independently.

Simulation Files: Deterministic Tool States

Simulation files are the foundation of MCP App testing. They’re JSON files that define a complete tool invocation with controlled inputs and outputs, so your resource component always gets the same data.

Put them in tests/simulations/ (project-level) or src/resources/<name>/simulations/ (resource-level). sunpeak auto-discovers any *.json file in these directories.

Here’s a basic simulation file:

{
  "tool": "show-dashboard",
  "userMessage": "Show me the sales dashboard for Q1",
  "toolInput": {
    "quarter": "Q1",
    "year": 2026
  },
  "toolResult": {
    "content": [{ "type": "text", "text": "Dashboard loaded for Q1 2026" }],
    "structuredContent": {
      "quarter": "Q1",
      "year": 2026,
      "revenue": 142000,
      "deals": 47,
      "topProduct": "Enterprise Plan"
    }
  }
}

The fields map directly to the MCP protocol:

  • tool references a tool file in your src/tools/ directory
  • toolInput is what the host sends as tool arguments
  • toolResult.structuredContent is the data your resource component receives via useToolData()
  • userMessage gives the inspector conversational context for display

When you run pnpm dev, the inspector loads these simulations and lets you toggle between them in the UI. No tool handler runs. No external API gets called. You see exactly what your component renders for a given data shape.
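
Typed out, the simulation shape described above can be sketched as a TypeScript interface. The field names follow this post; sunpeak's actual type may be richer:

```typescript
// Sketch of the simulation-file shape described above.
// Field names come from this post; sunpeak's real type may include more.
interface ContentBlock {
  type: 'text';
  text: string;
}

interface SimulationFile {
  tool: string;                              // matches a file in src/tools/
  userMessage?: string;                      // conversational context for the inspector
  toolInput: Record<string, unknown>;        // arguments the host sends
  toolResult: {
    content?: ContentBlock[];                // optional, as in the empty-data example
    structuredContent: Record<string, unknown>; // what useToolData() receives
  };
}

// The Q1 dashboard example from above, typed:
const q1Simulation: SimulationFile = {
  tool: 'show-dashboard',
  userMessage: 'Show me the sales dashboard for Q1',
  toolInput: { quarter: 'Q1', year: 2026 },
  toolResult: {
    content: [{ type: 'text', text: 'Dashboard loaded for Q1 2026' }],
    structuredContent: { quarter: 'Q1', year: 2026, revenue: 142000 },
  },
};
```

Keeping a type like this next to your fixtures makes it harder for a simulation file to silently drift out of shape.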

Simulation Files for Edge Cases

The real value of simulation files is edge-case coverage. Create separate files for each state you need to test:

{
  "tool": "show-dashboard",
  "userMessage": "Show me the dashboard for a quarter with no data",
  "toolInput": { "quarter": "Q4", "year": 2027 },
  "toolResult": {
    "structuredContent": {
      "quarter": "Q4",
      "year": 2027,
      "revenue": 0,
      "deals": 0,
      "topProduct": null
    }
  }
}

Good edge cases to cover in simulation files:

  • Empty arrays and zero values
  • Null or missing optional fields
  • Very long strings (product names, descriptions)
  • Large data sets (100+ items in a list)
  • Unicode and special characters
  • Error content (the tool handler returned an error message)

Each simulation file becomes a test fixture you can reference in e2e tests and inspect visually during development.
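
If you maintain many edge-case files, you can generate them from a happy-path base instead of hand-editing JSON. A small sketch (the helper name and types here are hypothetical, not part of sunpeak):

```typescript
// Hypothetical helper: derive edge-case simulations from a happy-path base.
type Simulation = {
  tool: string;
  userMessage?: string;
  toolInput: Record<string, unknown>;
  toolResult: { structuredContent: Record<string, unknown> };
};

// Returns a copy of the base with selected structuredContent fields replaced.
function withStructuredContent(
  base: Simulation,
  overrides: Record<string, unknown>,
): Simulation {
  return {
    ...base,
    toolResult: {
      structuredContent: { ...base.toolResult.structuredContent, ...overrides },
    },
  };
}

const base: Simulation = {
  tool: 'show-dashboard',
  toolInput: { quarter: 'Q1', year: 2026 },
  toolResult: {
    structuredContent: { quarter: 'Q1', revenue: 142000, deals: 47, topProduct: 'Enterprise Plan' },
  },
};

// Zero values and a null optional field:
const empty = withStructuredContent(base, { revenue: 0, deals: 0, topProduct: null });
// A very long string:
const longName = withStructuredContent(base, { topProduct: 'X'.repeat(500) });
```

Write each variant out with `JSON.stringify(sim, null, 2)` into `tests/simulations/` and sunpeak picks them up like any hand-written fixture.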

Mocking Server Tools in Simulations

If your MCP App uses server tools (tools the resource component can call back to the server), you can mock those responses in simulation files too. Add a serverTools field:

{
  "tool": "review-purchase",
  "userMessage": "Buy some wireless headphones",
  "toolInput": { "cartId": "cart_abc123", "items": [{ "name": "Headphones", "price": 79 }] },
  "toolResult": {
    "structuredContent": {
      "title": "Confirm Your Order",
      "total": 79,
      "items": [{ "name": "Headphones", "price": 79 }]
    }
  },
  "serverTools": {
    "complete_purchase": [
      {
        "when": { "confirmed": true },
        "result": {
          "content": [{ "type": "text", "text": "Order confirmed" }],
          "structuredContent": { "orderId": "ORD-001", "status": "confirmed" }
        }
      },
      {
        "when": { "confirmed": false },
        "result": {
          "content": [{ "type": "text", "text": "Order cancelled" }],
          "structuredContent": { "status": "cancelled" }
        }
      }
    ]
  }
}

The inspector matches the when object against the arguments your component passes to callServerTool(). This lets you test multi-step interactions (confirm/cancel flows, pagination, form submissions) without a running server.
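
Conceptually, the matching is a first-match-wins lookup: a rule applies when every key in its when object equals the corresponding call argument. A sketch of that logic (an illustration of the idea, not sunpeak's actual implementation):

```typescript
// Illustration only: how a first-match-wins `when` lookup might work.
type ServerToolRule = {
  when: Record<string, unknown>;
  result: { content?: unknown[]; structuredContent?: Record<string, unknown> };
};

// Returns the result of the first rule whose `when` entries all
// strictly equal the corresponding call arguments.
function matchServerTool(
  rules: ServerToolRule[],
  args: Record<string, unknown>,
): ServerToolRule['result'] | undefined {
  return rules.find((rule) =>
    Object.entries(rule.when).every(([key, value]) => args[key] === value),
  )?.result;
}

const rules: ServerToolRule[] = [
  { when: { confirmed: true }, result: { structuredContent: { status: 'confirmed' } } },
  { when: { confirmed: false }, result: { structuredContent: { status: 'cancelled' } } },
];

matchServerTool(rules, { confirmed: true }); // → { structuredContent: { status: 'confirmed' } }
```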

Unit Testing Resource Components with vi.mock

Simulation files handle e2e states. For unit tests, you need to mock sunpeak’s React hooks directly so your component renders in happy-dom without the host runtime.

Here’s the pattern. At the top of your test file, replace the sunpeak module with mocks:

import { render, screen } from '@testing-library/react';
import { describe, it, expect, vi, beforeEach } from 'vitest';
// These imports resolve to the mocks defined below, so individual
// tests can override them with vi.mocked(...)
import { useToolData, useDisplayMode } from 'sunpeak';
import { DashboardResource } from './dashboard';

// Module-level mock state — change between tests
let mockToolOutput: Record<string, unknown> = {};
let mockState: Record<string, unknown> = {};
const mockSetState = vi.fn();
const mockCallServerTool = vi.fn();

vi.mock('sunpeak', () => ({
  // Wrap each hook in vi.fn() so tests can override it with
  // vi.mocked(...).mockReturnValueOnce(...)
  useToolData: vi.fn(() => ({
    output: mockToolOutput,
    input: null,
    inputPartial: null,
    isError: false,
    isLoading: false,
    isCancelled: false,
    cancelReason: null,
  })),
  useAppState: vi.fn(() => [mockState, mockSetState]),
  useCallServerTool: vi.fn(() => mockCallServerTool),
  useDisplayMode: vi.fn(() => 'inline'),
  useRequestDisplayMode: vi.fn(() => ({
    requestDisplayMode: vi.fn(),
    availableModes: ['inline', 'fullscreen'],
  })),
  useDeviceCapabilities: vi.fn(() => ({ hover: true, touch: false })),
  useHostInfo: vi.fn(() => ({ hostVersion: undefined, hostCapabilities: { serverTools: true } })),
  useUpdateModelContext: vi.fn(() => vi.fn()),
  useTimeZone: vi.fn(() => 'America/New_York'),
  useLocale: vi.fn(() => 'en-US'),
  SafeArea: ({ children, ...props }: { children: React.ReactNode; [key: string]: unknown }) => (
    <div data-testid="safe-area" {...props}>{children}</div>
  ),
}));

describe('DashboardResource', () => {
  beforeEach(() => {
    vi.clearAllMocks();
    mockToolOutput = { quarter: 'Q1', revenue: 142000, deals: 47 };
    mockState = {};
  });

  it('renders revenue', () => {
    render(<DashboardResource />);
    expect(screen.getByText('$142,000')).toBeInTheDocument();
  });

  it('handles zero revenue', () => {
    mockToolOutput = { quarter: 'Q4', revenue: 0, deals: 0 };
    render(<DashboardResource />);
    expect(screen.getByText('$0')).toBeInTheDocument();
  });
});

A few things to note about this pattern:

Mock every hook your component uses. If your component calls useDisplayMode() and you don’t mock it, the test crashes. Start with the full set of hooks and remove ones you don’t need.

Use module-level variables for mock return values. This lets you change the data between tests without redefining the entire mock. Set defaults in beforeEach and override in individual tests.

Mock SafeArea as a plain div. The real SafeArea component uses host-specific padding calculations. In unit tests, a div with a test ID is enough.

Testing Loading and Error States

Change the mock return values to test non-happy-path states:

it('shows loading spinner', () => {
  // mockReturnValueOnce applies to the next call only, so the default
  // implementation from vi.mock() comes back for later tests
  vi.mocked(useToolData).mockReturnValueOnce({
    output: null,
    input: null,
    inputPartial: null,
    isError: false,
    isLoading: true,
    isCancelled: false,
    cancelReason: null,
  });
  render(<DashboardResource />);
  expect(screen.getByTestId('loading-spinner')).toBeInTheDocument();
});

it('shows error message', () => {
  vi.mocked(useToolData).mockReturnValueOnce({
    output: null,
    input: null,
    inputPartial: null,
    isError: true,
    isLoading: false,
    isCancelled: false,
    cancelReason: null,
  });
  render(<DashboardResource />);
  expect(screen.getByText(/something went wrong/i)).toBeInTheDocument();
});

Testing Display Mode Behavior

If your component renders differently in fullscreen vs. inline mode:

it('shows expanded view in fullscreen', () => {
  vi.mocked(useDisplayMode).mockReturnValueOnce('fullscreen');
  mockToolOutput = { quarter: 'Q1', revenue: 142000, deals: 47, chart: true };
  render(<DashboardResource />);
  expect(screen.getByTestId('chart-container')).toBeInTheDocument();
});

it('hides chart in inline mode', () => {
  vi.mocked(useDisplayMode).mockReturnValueOnce('inline');
  mockToolOutput = { quarter: 'Q1', revenue: 142000, deals: 47, chart: true };
  render(<DashboardResource />);
  expect(screen.queryByTestId('chart-container')).not.toBeInTheDocument();
});

Mocking External APIs in Tool Handler Tests

Tool handlers are server-side functions that run when the host calls your tool. They typically fetch data from external APIs or databases and return structuredContent for your resource component.

Mock the data layer, not the handler:

// tests/tools/search-tickets.test.ts
import { describe, it, expect, vi } from 'vitest';
import handler from '../../src/tools/search-tickets';

vi.mock('../../src/lib/api', () => ({
  searchTickets: vi.fn().mockResolvedValue([
    { id: 'TICK-1', title: 'Login broken', status: 'open', priority: 'high' },
    { id: 'TICK-2', title: 'Slow dashboard', status: 'in_progress', priority: 'medium' },
  ]),
}));

describe('search-tickets handler', () => {
  it('returns structuredContent with matching tickets', async () => {
    const result = await handler(
      { query: 'login', status: 'open' },
      {} as any // extras object — mock as needed
    );

    expect(result.structuredContent).toBeDefined();
    expect(result.structuredContent.tickets).toHaveLength(2);
    expect(result.structuredContent.tickets[0].id).toBe('TICK-1');
  });

  it('returns empty results for no matches', async () => {
    const { searchTickets } = await import('../../src/lib/api');
    vi.mocked(searchTickets).mockResolvedValueOnce([]);

    const result = await handler(
      { query: 'nonexistent', status: 'open' },
      {} as any
    );

    expect(result.structuredContent.tickets).toHaveLength(0);
  });

  it('handles API errors gracefully', async () => {
    const { searchTickets } = await import('../../src/lib/api');
    vi.mocked(searchTickets).mockRejectedValueOnce(new Error('API timeout'));

    const result = await handler(
      { query: 'login', status: 'open' },
      {} as any
    );

    expect(result.isError).toBe(true);
  });
});

This pattern works for any external dependency: REST APIs, GraphQL clients, database queries, or SDK calls. Mock at the module boundary, not inside the handler.

What to Assert on Tool Handlers

Beyond basic return values, test these things in your tool handlers:

  • structuredContent shape: Does the return value match what your resource component expects?
  • Token size: The structuredContent payload has a 25,000 token limit. Test with large data sets to make sure you’re filtering or paginating before hitting the limit.
  • Error handling: What happens when the API returns a 500? When the response is malformed?
  • Input validation: Does the handler reject bad arguments before making API calls?
  • Annotations: If you’re submitting to the Claude Connector Directory, every tool needs readOnlyHint or destructiveHint. Assert on annotations in a unit test.
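
Token limits are easy to check mechanically. The exact tokenizer varies by host, so a rough character-based estimate (roughly four characters per token is a common rule of thumb) is enough to catch runaway payloads. A hedged sketch:

```typescript
// Rough token estimate: ~4 characters per token is a common heuristic.
// The real host tokenizer differs, so leave generous headroom.
const TOKEN_LIMIT = 25_000;

function estimateTokens(structuredContent: unknown): number {
  return Math.ceil(JSON.stringify(structuredContent).length / 4);
}

// Example: a large result set that should be paginated before return.
const bigResult = {
  tickets: Array.from({ length: 500 }, (_, i) => ({
    id: `TICK-${i}`,
    title: 'Example ticket with a reasonably long title',
    status: 'open',
  })),
};

const estimate = estimateTokens(bigResult);
const underLimit = estimate < TOKEN_LIMIT * 0.8; // keep 20% headroom
```

In a handler test, assert `underLimit` on your largest fixture so CI fails before the host truncates anything.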

E2E Testing with the inspector Fixture

The inspector fixture from sunpeak/test is the e2e equivalent of simulation files. It calls your tool, renders the resource component inside the sunpeak inspector in a real Chromium browser, and gives you a Playwright frame locator for assertions.

import { test, expect } from 'sunpeak/test';

test('dashboard renders revenue for Q1', async ({ inspector }) => {
  const result = await inspector.renderTool('show-dashboard', {
    quarter: 'Q1',
    year: 2026,
  });
  const app = result.app();

  await expect(app.locator('text=$142,000')).toBeVisible();
});

The inspector fixture mocks the host runtime. Your tool handler still runs for real (against whatever mocks you’ve set up in your test environment), but the host rendering, iframe sandboxing, and MCP protocol transport are all handled by the local inspector.

Testing Across Hosts and Display Modes

The inspector.renderTool() method accepts options for display mode and theme:

test('dashboard shows chart in fullscreen dark mode', async ({ inspector }) => {
  const result = await inspector.renderTool('show-dashboard', { quarter: 'Q1' }, {
    displayMode: 'fullscreen',
    theme: 'dark',
  });
  const app = result.app();

  await expect(app.locator('[data-testid="chart-container"]')).toBeVisible();
});

Tests automatically run against both ChatGPT and Claude hosts via Playwright projects. The defineConfig() from sunpeak/test/config sets up both host projects, so every test runs twice: once in the ChatGPT runtime and once in the Claude runtime. You don’t loop over hosts manually.
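
In practice the setup lives in one config file. Something like the following, though the exact options defineConfig() accepts are not documented in this post, so treat it as a sketch and check sunpeak's docs:

```typescript
// playwright.config.ts — sketch; defineConfig() wires up the ChatGPT and
// Claude host projects for you, so every test runs against both.
import { defineConfig } from 'sunpeak/test/config';

export default defineConfig();
```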

Choosing the Right Mock for the Job

Here’s when to use each approach:

Simulation files when you want to:

  • Test resource component rendering with specific data shapes
  • Cover edge cases (empty data, nulls, large payloads)
  • Test server tool interactions (confirm/cancel flows)
  • Visually inspect states in the inspector during development
  • Share test fixtures between the inspector UI and e2e tests

vi.mock("sunpeak") when you want to:

  • Unit test resource components fast (milliseconds, no browser)
  • Test loading, error, and cancelled states
  • Test display mode behavior
  • Test component logic in isolation from the MCP protocol
  • Run tests in happy-dom without Playwright

vi.mock() on API modules when you want to:

  • Unit test tool handlers without hitting real APIs
  • Test error handling for API failures
  • Verify structuredContent shape and token size
  • Test input validation

The inspector fixture when you want to:

  • E2E test the full rendering pipeline in a real browser
  • Test cross-host differences (ChatGPT vs. Claude rendering)
  • Test iframe sandboxing behavior
  • Test display mode transitions and theme switching
  • Run visual regression tests with --visual

Most MCP App projects use all four. Simulation files and vi.mock("sunpeak") cover the fast, frequent tests. The inspector fixture catches integration issues. Tool handler mocks keep your server logic honest.

Common Mistakes

Mocking too much. If you mock every dependency, your tests pass but don’t catch real bugs. Use simulation files and the inspector fixture for integration coverage. Save vi.mock() for the layers you actually need to isolate.

Not resetting mocks between tests. Call vi.clearAllMocks() in beforeEach. If one test sets mockToolOutput to an error state and the next test forgets to reset it, you get confusing failures.

Forgetting to mock all hooks. If your component calls useDisplayMode() and your mock doesn’t include it, the test crashes with a cryptic error. Start with the full set of sunpeak hooks and remove the ones your component doesn’t use.

Testing implementation instead of behavior. Don’t assert that mockSetState was called with specific arguments. Assert that the UI changed. If you click “Confirm” and the component should show “Order confirmed”, assert on the text, not the state update.

Hardcoding mock data that drifts from real data. When your API response shape changes, update your simulation files and mock data. Stale mocks are worse than no mocks because they give you false confidence.
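
One way to catch drift automatically is to run your fixtures through the same shape check your component relies on. A minimal sketch with a hand-rolled checker (in a real project you might reuse your tool's zod or JSON Schema input schema instead; the helper here is hypothetical):

```typescript
// Hypothetical guard: validate fixture structuredContent against the shape
// the dashboard component expects, so stale fixtures fail loudly in CI.
type FieldType = 'string' | 'number';

// Returns one error message per field whose runtime type doesn't match.
function validateShape(
  data: Record<string, unknown>,
  shape: Record<string, FieldType>,
): string[] {
  const errors: string[] = [];
  for (const [field, expected] of Object.entries(shape)) {
    const actual = typeof data[field];
    if (actual !== expected) {
      errors.push(`${field}: expected ${expected}, got ${actual}`);
    }
  }
  return errors;
}

const dashboardShape: Record<string, FieldType> = {
  quarter: 'string',
  revenue: 'number',
  deals: 'number',
};

// A fixture that drifted: revenue became a formatted string.
const staleFixture = { quarter: 'Q1', revenue: '$142,000', deals: 47 };
validateShape(staleFixture, dashboardShape); // → ['revenue: expected number, got string']
```

Loop this over every JSON file in `tests/simulations/` inside a unit test and stale fixtures stop slipping through.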

Putting It Together

A well-tested MCP App project looks like this:

tests/
  simulations/
    show-dashboard-q1.json          # Happy path
    show-dashboard-empty.json       # Zero values
    show-dashboard-large.json       # 100+ items
    review-purchase.json            # With serverTools mock
  e2e/
    dashboard.spec.ts               # inspector fixture tests
    review.spec.ts                  # Multi-step flow tests
src/
  resources/
    dashboard/
      dashboard.test.tsx            # vi.mock("sunpeak") unit tests
  tools/
    show-dashboard.test.ts          # vi.mock() on API module

Run everything with pnpm test. Use pnpm test:unit or pnpm test:e2e to run them separately. Add pnpm test:visual for screenshot comparison. All of it works locally and in GitHub Actions CI/CD without paid host accounts or API credits.

sunpeak’s testing framework handles the hard parts: starting the dev server, setting up Playwright projects for both hosts, discovering simulation files, and providing the inspector fixture. You write the mocks and assertions.

Get Started

npx sunpeak new

Frequently Asked Questions

How do I mock MCP tool calls in tests?

For unit tests, use vi.mock() to replace sunpeak hooks like useToolData with functions that return controlled data. For e2e tests, use simulation files that define deterministic toolInput and toolResult values. The inspector fixture from sunpeak/test loads these simulations and renders your resource components with the mock data in a real browser.

What are simulation files in MCP App testing?

Simulation files are JSON files in your tests/simulations/ directory that define deterministic tool states. Each file specifies the tool name, a userMessage, toolInput (the arguments your tool receives), and toolResult (the structuredContent your resource component renders). sunpeak auto-discovers all *.json files in the simulations directory and uses them for both the inspector UI and e2e tests.

How do I mock external API calls in MCP App tool handler tests?

Use vi.mock() to replace your API client module with a mock that returns controlled responses. Import the mock, set return values with mockResolvedValue(), and assert that your tool handler returns the expected structuredContent. This keeps tool handler tests fast and deterministic without hitting real APIs.

How do I stub sunpeak hooks like useToolData and useAppState in unit tests?

Call vi.mock("sunpeak", () => ({ useToolData: () => ({ output: mockData, isError: false, isLoading: false }), useAppState: () => [mockState, mockSetState] })) at the top of your test file. Set module-level variables for your mock return values and change them between tests to cover different states.

Can I mock server tool responses in MCP App simulation files?

Yes. Add a serverTools field to your simulation file. Each key is a server tool name, and the value is an array of objects with a when condition and a result. The inspector matches the when condition against the tool arguments and returns the matching result, so you can test multi-step interactions without a real backend.

Do I need a ChatGPT or Claude account to test MCP Apps with mocks?

No. sunpeak runs a local inspector that replicates both the ChatGPT and Claude host runtimes. All mocking, from simulation files to the inspector fixture, runs locally without any paid subscriptions, API keys, or AI credits. Tests run the same way in CI/CD.

What is the inspector fixture in sunpeak testing?

The inspector fixture is a Playwright test fixture exported from sunpeak/test. It provides a renderTool() method that invokes a tool with mock arguments, renders the resource component inside the sunpeak inspector, and returns a frame locator for assertions. Tests automatically run against both ChatGPT and Claude hosts via Playwright projects.

How do I test error states and edge cases in MCP App resource components?

Create simulation files with edge-case data: empty arrays, null fields, very long strings, missing optional fields, or isError set to true. In unit tests, set your mock useToolData to return isError: true or isLoading: true to test loading and error states. Cover cancelled states by setting isCancelled: true.