Mocking and Stubbing in MCP App Tests: Simulations, Fixtures, and Patterns (June 2026)

June 10, 2026 Abe Wheeler


Testing MCP Apps with mocks, stubs, and simulation files.

Mocking an MCP App is different from mocking a normal React app because your UI is only one part of the system. The host chooses and calls a tool, the MCP server returns a result, the host renders a UI resource in a sandboxed iframe, and the app talks back to the host through the MCP Apps bridge.

That gives you more places where a bug can hide. It also gives you cleaner places to test. A good mock setup isolates the level you care about without pretending the whole protocol does not exist.

TL;DR: Use simulation files for deterministic app states, vi.mock("sunpeak") for fast resource component unit tests, vi.mock() on API modules for tool handler tests, and the inspector fixture from sunpeak/test for browser tests against replicated ChatGPT and Claude runtimes. Keep structuredContent, content, and _meta separate in fixtures because hosts and models treat them differently. Use live host tests only for the few flows that local simulations cannot prove.

What Changed Since Early 2026

The biggest testing change is that MCP Apps are now the portable path for interactive AI-host UI. The official MCP Apps specification describes the core pattern: a tool declares a UI resource with _meta.ui.resourceUri, the host fetches that resource, renders it in a sandboxed iframe, and communicates with it over JSON-RPC messages using postMessage.

OpenAI’s current Apps SDK reference points developers toward the MCP Apps standard bridge by default. ChatGPT still exposes window.openai for compatibility and ChatGPT-specific features, but the standard ui/* bridge is the better contract to design around when you want the same app to run across hosts.

That matters for mocks because you should test the shared contract first:

toolInput and partial input from the host
toolResult.structuredContent for app-readable data
toolResult.content for model-visible or transcript-visible content
toolResult._meta for widget-only data
tools/call for UI-triggered server tool calls
Display mode, theme, locale, viewport, and host context

The MCP extension support matrix tracks host support for MCP Apps. For developers, the practical takeaway is simple: write tests around the protocol-shaped data, then add host-specific tests only where your app uses host-specific features.
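
As a rough sketch of what "protocol-shaped data" can look like in a test suite, the types below model the shared contract listed above. The type names (ToolResult, HostContext, ProtocolFixture) and the overlappingKeys helper are hypothetical and are not part of any SDK; only the field names come from the contract itself.

```typescript
// Hypothetical type names for illustration; only the field names
// (content, structuredContent, _meta, isError) come from the contract above.
type ContentBlock = { type: 'text'; text: string };

interface ToolResult<TData extends object> {
  content?: ContentBlock[]; // model-visible or transcript-visible
  structuredContent?: TData; // app-readable data
  _meta?: Record<string, unknown>; // widget-only data
  isError?: boolean;
}

interface HostContext {
  displayMode: 'inline' | 'pip' | 'fullscreen';
  theme: 'light' | 'dark';
  locale: string;
}

interface ProtocolFixture<TInput extends object, TData extends object> {
  toolInput: TInput;
  toolResult: ToolResult<TData>;
  hostContext: HostContext;
}

const q1Fixture: ProtocolFixture<
  { quarter: string; year: number },
  { revenue: number; deals: number }
> = {
  toolInput: { quarter: 'Q1', year: 2026 },
  toolResult: {
    content: [{ type: 'text', text: 'Dashboard loaded for Q1 2026.' }],
    structuredContent: { revenue: 142000, deals: 47 },
    _meta: { traceId: 'sim_q1_dashboard' },
  },
  hostContext: { displayMode: 'inline', theme: 'dark', locale: 'en-US' },
};

// Returns keys that appear in both structuredContent and _meta, which
// usually means a field was copied into the wrong channel.
function overlappingKeys<TData extends object>(result: ToolResult<TData>): string[] {
  const metaKeys = new Set(Object.keys(result._meta ?? {}));
  return Object.keys(result.structuredContent ?? {}).filter((key) => metaKeys.has(key));
}
```

A fixture typed this way contains nothing host-specific, so the same object can feed a unit test, a simulation file, or a browser test.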

The Four Mock Boundaries

An MCP App has four boundaries you can mock. Pick one boundary per test.

Host runtime: ChatGPT, Claude, or another host renders your resource, sends tool data, enforces iframe sandboxing, and handles bridge calls.
MCP protocol contract: The server exposes tools and resources. Tool calls return content, structuredContent, _meta, isError, and annotations.
Tool handler: Your server-side function validates input, calls APIs, and shapes data for the UI and model.
External services: Databases, REST APIs, GraphQL clients, queues, storage, and vendor SDKs.

Do not mock all four in one test. If you mock the host, the protocol, the handler, and the API at once, the test only proves that your mock can render your mock. Instead, decide what you are trying to learn:

Can the resource render a known state? Use a simulation file or mocked hooks.
Does the tool handler return the shape the resource expects? Mock the API module, not the handler.
Does the app render correctly in a real iframe with host chrome? Use the inspector fixture.
Does the real host accept the server and display the app? Use a small live test suite.

Simulation Files Are Your Main Fixture Format

Simulation files are JSON fixtures for the host and protocol boundary. In a sunpeak project, put them in tests/simulations/ for project-wide states or next to a resource when the fixture only applies there. The inspector auto-discovers them and lets you switch states from the sidebar.

Here is a current simulation shape:

{
  "tool": "show-dashboard",
  "userMessage": "Show me the sales dashboard for Q1",
  "toolInput": {
    "quarter": "Q1",
    "year": 2026
  },
  "toolResult": {
    "content": [{ "type": "text", "text": "Dashboard loaded for Q1 2026." }],
    "structuredContent": {
      "quarter": "Q1",
      "year": 2026,
      "revenue": 142000,
      "deals": 47,
      "topProduct": "Enterprise Plan"
    },
    "_meta": {
      "traceId": "sim_q1_dashboard",
      "rawRows": 47
    }
  }
}

The fields map to the production contract:

tool references the tool filename without the extension.
userMessage gives the inspector conversation context.
toolInput is what the model or host sends to your tool.
toolResult.content is the text response the host can show in the transcript or feed to the model.
toolResult.structuredContent is the data your resource usually reads through useToolData().
toolResult._meta is for widget-only data that should stay out of model context.

That last split is easy to miss. If a field is needed for UI rendering but should not be model-visible, keep it in _meta. If the model needs to reason over it later, put it in structuredContent or summarize it in content.
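
One way to make that split hard to get wrong is to shape the result in a single function. The sketch below is an assumption about how a handler might do it; buildDashboardResult and DashboardRow are hypothetical names, not sunpeak APIs.

```typescript
// Hypothetical helper: decides in one place which fields the model can see.
interface DashboardRow {
  id: string;
  amount: number;
}

function buildDashboardResult(rows: DashboardRow[], traceId: string) {
  const revenue = rows.reduce((sum, row) => sum + row.amount, 0);
  return {
    // Short summary the host can show in the transcript or pass to the model.
    content: [{ type: 'text' as const, text: `Dashboard loaded with ${rows.length} deals.` }],
    // Data the resource renders and the model may reason over.
    structuredContent: { revenue, deals: rows.length },
    // UI-only backing data that should stay out of model context.
    _meta: { traceId, rawRows: rows },
  };
}

const dashboardResult = buildDashboardResult(
  [
    { id: 'deal_1', amount: 100 },
    { id: 'deal_2', amount: 50 },
  ],
  'trace_demo'
);
```

Keeping the split in one function means a test can assert on it directly instead of inspecting every handler.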

Edge Cases Worth Turning Into Simulations

Write one simulation per meaningful state. Do not make one giant fixture with every edge case mixed together because it becomes hard to tell what failed.

Good MCP App simulation cases include:

Happy path with realistic production-shaped data
Empty arrays and zero values
Missing optional fields
Null values that the schema allows
Long names, long descriptions, and long unbroken strings
Large paginated data, not an unbounded dump
Unicode, right-to-left text, emoji, and special characters
Loading or partial-input states if your component handles streamed input
Tool errors with isError
Cancelled tool calls
Permission-denied and unauthenticated states
Host-specific states, only when your UI branches by host capability

For a dashboard, you might keep fixtures like this:

tests/simulations/
  show-dashboard-q1.json
  show-dashboard-empty.json
  show-dashboard-large-page.json
  show-dashboard-api-error.json
  show-dashboard-permission-denied.json

The file names become documentation. A teammate or agent can scan the directory and understand which states your app claims to support.
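
If the file names are meant to act as documentation, a small check can flag missing states. This is a hypothetical helper that assumes the `<tool>-<state>.json` naming used in the listing above; the required state names are examples, not a sunpeak convention.

```typescript
// Hypothetical helper: reports which required states have no fixture file,
// assuming fixtures are named `<tool>-<state>.json`.
function missingStates(files: string[], tool: string, required: string[]): string[] {
  const present = new Set(
    files
      .filter((file) => file.startsWith(`${tool}-`) && file.endsWith('.json'))
      .map((file) => file.slice(tool.length + 1, -'.json'.length))
  );
  return required.filter((state) => !present.has(state));
}

const simulationFiles = [
  'show-dashboard-q1.json',
  'show-dashboard-empty.json',
  'show-dashboard-large-page.json',
  'show-dashboard-api-error.json',
  'show-dashboard-permission-denied.json',
];

// With the listing above, only the cancelled state is missing.
const gaps = missingStates(simulationFiles, 'show-dashboard', [
  'empty',
  'api-error',
  'permission-denied',
  'cancelled',
]);
```

In a real project the file list would come from reading the simulations directory; it is hard-coded here to keep the sketch self-contained.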

Mock Server Tools in Simulation Files

Interactive MCP Apps often call server tools from inside the UI. A purchase review app might call complete_purchase. A table might call load_next_page. A dashboard might call export_report.

Do not make those calls hit a real backend in routine UI tests. Mock them in the simulation:

{
  "tool": "review-purchase",
  "userMessage": "Buy the wireless headphones in my cart",
  "toolInput": {
    "cartId": "cart_abc123"
  },
  "toolResult": {
    "content": [{ "type": "text", "text": "Review this order before purchase." }],
    "structuredContent": {
      "title": "Confirm Your Order",
      "total": 79,
      "items": [{ "name": "Headphones", "price": 79 }]
    }
  },
  "serverTools": {
    "complete_purchase": [
      {
        "when": { "confirmed": true },
        "result": {
          "content": [{ "type": "text", "text": "Order confirmed." }],
          "structuredContent": { "orderId": "ORD-001", "status": "confirmed" }
        }
      },
      {
        "when": { "confirmed": false },
        "result": {
          "content": [{ "type": "text", "text": "Order cancelled." }],
          "structuredContent": { "status": "cancelled" }
        }
      }
    ]
  }
}

The inspector matches the when object against the arguments your UI passes through useCallServerTool(). This lets you test multi-step flows without a live payment provider, database, or queue.
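
To reason about which mock a call will receive, it can help to picture the matching as a subset check. The function below is an illustrative sketch only: it assumes the first entry whose when keys all strictly equal the call arguments wins. The real inspector's matching rules may differ, and findMock is a hypothetical name.

```typescript
type ToolArgs = Record<string, unknown>;

interface ServerToolMock {
  when?: ToolArgs;
  result: {
    content?: { type: 'text'; text: string }[];
    structuredContent?: ToolArgs;
  };
}

// Assumption: the first mock whose `when` keys all strictly equal the
// call arguments is returned; extra call arguments are ignored.
function findMock(mocks: ServerToolMock[], args: ToolArgs): ServerToolMock | undefined {
  return mocks.find((mock) =>
    Object.entries(mock.when ?? {}).every(([key, expected]) => args[key] === expected)
  );
}

const completePurchaseMocks: ServerToolMock[] = [
  {
    when: { confirmed: true },
    result: { structuredContent: { orderId: 'ORD-001', status: 'confirmed' } },
  },
  {
    when: { confirmed: false },
    result: { structuredContent: { status: 'cancelled' } },
  },
];

const confirmedMock = findMock(completePurchaseMocks, { confirmed: true, cartId: 'cart_abc123' });
const cancelledMock = findMock(completePurchaseMocks, { confirmed: false });
```

Under this assumption, a call that matches no when object gets no mock, so it is worth adding a simulation entry for every argument combination the UI can send.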

JSON is a good interchange format, but hand-written JSON can drift. For complex apps, keep fixture builders in TypeScript and export JSON simulations from those builders. Use the same builders in unit tests.

// tests/fixtures/dashboard.ts
import { z } from 'zod';

export const dashboardSchema = z.object({
  quarter: z.string(),
  year: z.number(),
  revenue: z.number(),
  deals: z.number(),
  topProduct: z.string().nullable(),
});

export function dashboardFixture(
  overrides: Partial<z.infer<typeof dashboardSchema>> = {}
) {
  const data = {
    quarter: 'Q1',
    year: 2026,
    revenue: 142000,
    deals: 47,
    topProduct: 'Enterprise Plan',
    ...overrides,
  };

  return dashboardSchema.parse(data);
}

Then your component unit test and your tool handler test can both import dashboardFixture(). Your simulation files should be generated from or checked against the same schema. The goal is not to create a big fixture framework. The goal is to stop updating five copies of the same fake payload by hand.
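
Exporting a simulation from a builder might look like the sketch below. It drops Zod to stay dependency-free, and dashboardData and toSimulation are hypothetical names; the output mirrors the simulation shape shown earlier.

```typescript
interface DashboardData {
  quarter: string;
  year: number;
  revenue: number;
  deals: number;
  topProduct: string | null;
}

// Plain builder with the same defaults as the happy-path fixture.
function dashboardData(overrides: Partial<DashboardData> = {}): DashboardData {
  return {
    quarter: 'Q1',
    year: 2026,
    revenue: 142000,
    deals: 47,
    topProduct: 'Enterprise Plan',
    ...overrides,
  };
}

// Wraps builder output in the simulation shape (tool, userMessage,
// toolInput, toolResult) so JSON files never need hand editing.
function toSimulation(data: DashboardData, userMessage: string) {
  return {
    tool: 'show-dashboard',
    userMessage,
    toolInput: { quarter: data.quarter, year: data.year },
    toolResult: {
      content: [{ type: 'text', text: `Dashboard loaded for ${data.quarter} ${data.year}.` }],
      structuredContent: data,
    },
  };
}

const emptySimulation = toSimulation(
  dashboardData({ revenue: 0, deals: 0, topProduct: null }),
  'Show me the sales dashboard for Q1'
);

// The string a build script could write to tests/simulations/show-dashboard-empty.json.
const emptySimulationJson = JSON.stringify(emptySimulation, null, 2);
```

Writing the string to disk is left to a build script, so the builder stays usable in unit tests without touching the file system.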

Unit Test Resource Components With Hook Mocks

Use unit tests for component logic that does not need a browser: branching, formatting, disabled states, empty states, and data validation. Mock sunpeak’s hooks at the module boundary.

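// Registers toBeInTheDocument(); drop this line if your Vitest setup file already imports it.
import '@testing-library/jest-dom/vitest';
import { render, screen } from '@testing-library/react';
import { beforeEach, describe, expect, it, vi } from 'vitest';
import { DashboardResource } from './dashboard';

// Mutable state read by the mocked hooks below; each test overrides what it needs.
let mockOutput: Record<string, unknown> | null = null;
let mockInput: Record<string, unknown> | null = null;
let mockIsError = false;
let mockIsLoading = false;
let mockDisplayMode = 'inline';
const mockSetState = vi.fn();
const mockCallServerTool = vi.fn();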

vi.mock('sunpeak', () => ({
  useToolData: () => ({
    input: mockInput,
    inputPartial: null,
    output: mockOutput,
    isError: mockIsError,
    isLoading: mockIsLoading,
    isCancelled: false,
    cancelReason: null,
  }),
  useAppState: () => [{}, mockSetState],
  useCallServerTool: () => mockCallServerTool,
  useDisplayMode: () => mockDisplayMode,
  useRequestDisplayMode: () => ({
    availableModes: ['inline', 'pip', 'fullscreen'],
    requestDisplayMode: vi.fn(),
  }),
  useHostInfo: () => ({
    hostVersion: undefined,
    hostCapabilities: { serverTools: true },
  }),
  SafeArea: ({ children }: { children: React.ReactNode }) => (
    <div data-testid="safe-area">{children}</div>
  ),
}));

describe('DashboardResource', () => {
  beforeEach(() => {
    vi.clearAllMocks();
    mockInput = { quarter: 'Q1', year: 2026 };
    mockOutput = { quarter: 'Q1', revenue: 142000, deals: 47 };
    mockIsError = false;
    mockIsLoading = false;
    mockDisplayMode = 'inline';
  });

  it('renders revenue', () => {
    render(<DashboardResource />);
    expect(screen.getByText('$142,000')).toBeInTheDocument();
  });

  it('renders an empty state', () => {
    mockOutput = { quarter: 'Q4', revenue: 0, deals: 0 };
    render(<DashboardResource />);
    expect(screen.getByText(/no deals/i)).toBeInTheDocument();
  });
});

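Mock every hook your component calls. The vi.mock() factory replaces the whole module, so a hook you leave out, such as useDisplayMode() or useCallServerTool(), is simply missing from the mock. The test then fails on the missing export instead of on the behavior you meant to check.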

Test Loading, Error, and Cancelled States

MCP Apps have more transient states than a normal data card because the host can stream inputs, delay approval-gated input, return an error, or cancel a tool call. Those states deserve tests.

it('shows loading copy while output is unavailable', () => {
  mockOutput = null;
  mockIsLoading = true;

  render(<DashboardResource />);
  expect(screen.getByText(/loading dashboard/i)).toBeInTheDocument();
});

it('shows an error message when the tool fails', () => {
  mockOutput = null;
  mockIsError = true;

  render(<DashboardResource />);
  expect(screen.getByText(/could not load dashboard/i)).toBeInTheDocument();
});

If your UI reads partial input, test that separately. Partial input is preview data. Treat it as incomplete until the host sends final tool input or a tool result.
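
One way to keep these transient states testable is to derive a single view state from the tool data in a pure function. The sketch below assumes a data shape like the mocked useToolData() return value above; resolveViewState and the precedence order are hypothetical choices, not sunpeak behavior.

```typescript
// Field names mirror the mocked useToolData() return value; the shape is an assumption.
interface ToolData {
  input: Record<string, unknown> | null;
  inputPartial: Record<string, unknown> | null;
  output: Record<string, unknown> | null;
  isError: boolean;
  isLoading: boolean;
  isCancelled: boolean;
}

type ViewState = 'error' | 'cancelled' | 'ready' | 'preview' | 'loading';

// Hypothetical precedence: terminal states first, then final output,
// then partial input as an incomplete preview, otherwise loading.
function resolveViewState(data: ToolData): ViewState {
  if (data.isError) return 'error';
  if (data.isCancelled) return 'cancelled';
  if (data.output !== null) return 'ready';
  if (data.input === null && data.inputPartial !== null) return 'preview';
  return 'loading';
}

const baseToolData: ToolData = {
  input: null,
  inputPartial: null,
  output: null,
  isError: false,
  isLoading: true,
  isCancelled: false,
};

const previewState = resolveViewState({ ...baseToolData, inputPartial: { quarter: 'Q' } });
const readyState = resolveViewState({
  ...baseToolData,
  input: { quarter: 'Q1' },
  output: { revenue: 142000 },
  isLoading: false,
});
```

A pure function like this can be unit tested without rendering anything, and the component only needs one switch over the returned state.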

Mock the Standard Bridge Before Host Globals

For ChatGPT Apps, it is tempting to mock window.openai in every test because older examples used that global directly. That still has a place when you are testing ChatGPT-only capabilities such as file uploads or ChatGPT-specific extensions.

For portable MCP Apps, start with the standard contract instead:

useToolData() for tool input and output
useCallServerTool() for tools/call
useSendMessage() or the relevant action hook for follow-up messages
useUpdateModelContext() for model-visible UI state
useDisplayMode() and useRequestDisplayMode() for layout
useHostInfo() or capability hooks for feature detection

This keeps tests aligned with ChatGPT, Claude, and other MCP App hosts. Add window.openai mocks only in files that import ChatGPT-specific APIs.

Mock External APIs in Tool Handler Tests

Tool handlers are server-side functions. They should be tested close to production because they own schema validation, auth checks, API calls, result shaping, and errors.

Mock the API client, not the handler:

// tests/tools/search-tickets.test.ts
import { describe, expect, it, vi } from 'vitest';
import handler from '../../src/tools/search-tickets';

vi.mock('../../src/lib/api', () => ({
  searchTickets: vi.fn().mockResolvedValue([
    { id: 'TICK-1', title: 'Login broken', status: 'open', priority: 'high' },
    { id: 'TICK-2', title: 'Slow dashboard', status: 'in_progress', priority: 'medium' },
  ]),
}));

describe('search-tickets handler', () => {
  it('returns structuredContent with matching tickets', async () => {
    const result = await handler(
      { query: 'login', status: 'open' },
      {} as never
    );

    expect(result.structuredContent).toBeDefined();
    expect(result.structuredContent.tickets).toHaveLength(2);
    expect(result.content?.[0]?.type).toBe('text');
  });

  it('handles API errors', async () => {
    const { searchTickets } = await import('../../src/lib/api');
    vi.mocked(searchTickets).mockRejectedValueOnce(new Error('API timeout'));

    const result = await handler(
      { query: 'login', status: 'open' },
      {} as never
    );

    expect(result.isError).toBe(true);
  });
});

These tests should assert the protocol-level result, not just the raw data:

Does structuredContent match the resource schema?
Does content give the model a concise useful summary?
Does _meta avoid leaking hidden UI-only data into model context?
Does the handler set isError for recoverable failures?
Does validation reject bad input before calling the API?
Do tool annotations match the action, such as read-only versus destructive?
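
The checklist above is easier to satisfy when the handler validates first and converts failures into protocol results. The sketch below is a hypothetical handler, not sunpeak's handler signature: it takes the API call as a parameter so the example stays self-contained.

```typescript
interface Ticket {
  id: string;
  title: string;
  status: string;
}

type SearchFn = (query: string) => Promise<Ticket[]>;

interface HandlerResult {
  content: { type: 'text'; text: string }[];
  structuredContent?: { tickets: Ticket[] };
  isError?: boolean;
}

// Hypothetical handler shape: the API client is injected for illustration.
async function searchTicketsHandler(
  input: { query?: unknown },
  search: SearchFn
): Promise<HandlerResult> {
  // Reject bad input before any API call is made.
  if (typeof input.query !== 'string' || input.query.trim() === '') {
    return { content: [{ type: 'text', text: 'A non-empty query is required.' }], isError: true };
  }
  try {
    const tickets = await search(input.query);
    return {
      content: [{ type: 'text', text: `Found ${tickets.length} tickets.` }],
      structuredContent: { tickets },
    };
  } catch (error) {
    // Recoverable failure: report it through isError instead of throwing.
    const message = error instanceof Error ? error.message : 'Unknown error';
    return { content: [{ type: 'text', text: `Ticket search failed: ${message}` }], isError: true };
  }
}
```

With this shape, a test can pass a counting stub as search and assert that invalid input never reaches it.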

Add Contract Tests Between Tools and Resources

A common MCP App bug is a resource expecting tickets while the tool returns items. Unit tests can miss this if the resource mock uses the old shape.

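Add a small contract test that calls the real handler with mocked services and validates the result against the resource schema. To go one step further, render the resource with that exact output.

import { expect, test, vi } from 'vitest';
import handler from '../../src/tools/search-tickets';
import { ticketListSchema } from '../../src/resources/tickets/schema';

// Stub the API module so the real handler runs without network access.
vi.mock('../../src/lib/api', () => ({
  searchTickets: vi.fn().mockResolvedValue([
    { id: 'TICK-1', title: 'Login broken', status: 'open', priority: 'high' },
  ]),
}));

test('search-tickets output matches TicketListResource input', async () => {
  const result = await handler({ query: 'login' }, {} as never);

  // Validate the handler output against the schema the resource reads.
  const parsed = ticketListSchema.safeParse(result.structuredContent);
  expect(parsed.success).toBe(true);
});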

This test is small, but it catches the drift that makes mocks dangerous.

E2E Test With the Inspector Fixture

The inspector fixture from sunpeak/test renders your app in a real browser inside the local sunpeak inspector. It handles the host runtime, iframe traversal, and Playwright locator setup.

import { expect, test } from 'sunpeak/test';

test('dashboard renders revenue for Q1', async ({ inspector }) => {
  const result = await inspector.renderTool('show-dashboard', {
    quarter: 'Q1',
    year: 2026,
  });
  const app = result.app();

  await expect(app.getByText('$142,000')).toBeVisible();
});

Use E2E tests for behavior that needs a browser:

CSS layout in inline, PiP, and fullscreen modes
Dark and light themes
Safe area and viewport behavior
Button clicks that call server tools
Keyboard and focus behavior
File upload or download UI, if your host supports it
iframe sandbox restrictions

sunpeak’s testing framework can run these tests against replicated ChatGPT and Claude runtimes without connecting to either host. For existing MCP servers that are not built with sunpeak, use:

npx sunpeak test init --server http://localhost:8000/mcp
npx sunpeak test

That scaffolds tests around the server you already have.

Cover Host, Theme, and Display Mode

Display mode and theme bugs are common because the component is the same but the container changes.

import { expect, test } from 'sunpeak/test';

for (const displayMode of ['inline', 'pip', 'fullscreen'] as const) {
  test(`dashboard works in ${displayMode}`, async ({ inspector }) => {
    const result = await inspector.renderTool(
      'show-dashboard',
      { quarter: 'Q1' },
      { displayMode, theme: 'dark' }
    );

    await expect(result.app().getByTestId('dashboard')).toBeVisible();
  });
}

If your Playwright config uses defineConfig() from sunpeak/test/config, tests can run against both ChatGPT and Claude host projects. That gives you coverage for host chrome, CSS variables, and bridge behavior while keeping the test code host-agnostic.

When to Use Live Host Tests

Local mocks and simulations should handle most of your test suite. Live host tests are still useful, but they are too slow and account-dependent to run for every case.

Save live tests for:

First connection to ChatGPT or Claude after deploy
OAuth and account linking
App submission or directory review flows
ChatGPT-only or Claude-only APIs
Real model tool selection behavior
Final pre-release smoke tests

Keep the live suite small. One live test per core resource is usually more useful than trying to mirror your whole local test matrix against production hosts.

Common Mistakes

Putting everything in structuredContent. Large hidden records, tokens, trace IDs, and UI-only backing data belong in _meta, not in model-visible structured data.

Only testing the happy path. Every resource should have fixtures for empty, error, cancelled, long text, and large data states. These states break more often than the happy path.

Mocking host globals for portable code. If your component only uses standard sunpeak hooks, mock the hooks. Save window.openai or host-specific mocks for host-specific modules.

Letting fixtures drift. Validate simulation files against schemas. Reuse fixture builders. Add a contract test between each tool handler and its resource.

Skipping server tool mocks. If the UI calls useCallServerTool(), simulate those responses. Otherwise your E2E test only covers the first screen.

Testing implementation details. Avoid asserting that setState was called with an exact object unless that is the behavior. Prefer user-visible assertions: the row appears, the button disables, the confirmation message renders.

A Practical Test Matrix

Most MCP App projects do well with this shape:

tests/
  fixtures/
    dashboard.ts                    # Shared fixture builders and schemas
  simulations/
    show-dashboard-q1.json          # Happy path
    show-dashboard-empty.json       # Empty data
    show-dashboard-large.json       # Large payload
    show-dashboard-error.json       # Tool error
    review-purchase.json            # serverTools mock
  e2e/
    dashboard.spec.ts               # Inspector fixture tests
    review.spec.ts                  # Multi-step server tool flow
  visual/
    dashboard.visual.spec.ts        # Theme and display mode screenshots
src/
  resources/
    dashboard/
      dashboard.test.tsx            # Hook mocks for component unit tests
      schema.ts                     # Resource input/output schema
  tools/
    show-dashboard.test.ts          # API client mocks and contract tests

Run the fast tests all the time (the script names here are examples; map them to the scripts in your own package.json):

pnpm test:unit
pnpm test:e2e

Run visual tests when UI changes:

pnpm test:visual

Run live tests before release:

pnpm test:live

Where sunpeak Fits

You can hand-roll most of this with Vitest, Playwright, an MCP client, and your own iframe host. That is fine for a narrow prototype.

For production MCP Apps, the repetitive work is the host test harness: starting the MCP server, loading simulations, rendering the resource in ChatGPT-like and Claude-like runtimes, switching display modes, crossing iframes, and running the same states in CI.

sunpeak handles that loop. The local MCP App Inspector can inspect any MCP server:

npx sunpeak inspect --server http://localhost:8000/mcp

The testing framework can scaffold tests for any server:

npx sunpeak test init --server http://localhost:8000/mcp

For new projects, start with:

npx sunpeak new

Then keep your mocks honest: protocol-shaped simulation files, schema-checked fixture builders, API-boundary stubs, and a small live suite for the real-host checks that matter.

Frequently Asked Questions

How do I mock MCP tool calls in tests?

Use simulation files for browser and E2E tests, and use vi.mock() for unit tests. A simulation file defines the tool name, user message, tool input, tool result, and optional server tool responses. Unit tests can mock sunpeak hooks such as useToolData, useAppState, useCallServerTool, and useDisplayMode so the resource component renders with controlled data.

What should an MCP App simulation file include?

A useful simulation includes the tool filename, userMessage, toolInput, and toolResult. The toolResult should include structuredContent for app data, content for model-visible text, and _meta for widget-only data that should not enter model context. Add serverTools when the app UI calls additional tools through the host bridge.

How do I mock ChatGPT App data without using window.openai?

Mock the standard MCP Apps data contract first. ChatGPT implements the MCP Apps UI bridge, including ui/notifications/tool-input, ui/notifications/tool-result, tools/call, ui/message, and ui/update-model-context. Framework hooks such as useToolData and useCallServerTool wrap that bridge, so your tests can mock the hooks rather than hardcoding ChatGPT-only globals.

How do I mock external API calls in MCP App tool handler tests?

Mock the module that talks to the API, database, or SDK, not the tool handler itself. In Vitest, vi.mock() the client module, set mockResolvedValue or mockRejectedValue per test, then assert that the handler returns the expected structuredContent, content, isError value, and annotations.

How do I test server tools called from an MCP App UI?

Add a serverTools object to the simulation file. Each key is a server tool name, and each value lists when conditions and results. The inspector matches the UI call arguments against the when object and returns the matching mock result, which lets you test confirm flows, pagination, edits, and retries without a live backend.

Can I test MCP Apps without a ChatGPT or Claude account?

Yes. sunpeak runs a local inspector that replicates ChatGPT and Claude app runtimes. You can load simulation files, switch hosts, themes, display modes, and device sizes, then run the same states in Playwright E2E and visual regression tests. No paid host account, tunnel, deployment, or AI credits are required for local inspector tests.

What is the difference between simulation tests and live host tests?

Simulation tests use controlled data in a local inspector, so they are fast, repeatable, and good for CI. Live host tests connect to a real host such as ChatGPT or Claude, so they are slower and should be saved for pre-release checks that only the real host can prove, such as account configuration, submission behavior, or host-specific UI extensions.

How do I keep MCP App mocks from drifting away from production data?

Create shared fixture builders, validate simulation files with the same Zod schemas used by the tool and resource, and add contract tests that call the real tool handler. Treat every simulation as a documented production state: happy path, empty state, loading, error, cancelled, large payload, long text, and permission-denied.