Mocking and Stubbing in MCP App Tests: Simulations, Fixtures, and Patterns (June 2026)
Testing MCP Apps with mocks, stubs, and simulation files.
Mocking an MCP App is different from mocking a normal React app because your UI is only one part of the system. The host chooses and calls a tool, the MCP server returns a result, the host renders a UI resource in a sandboxed iframe, and the app talks back to the host through the MCP Apps bridge.
That gives you more places where a bug can hide. It also gives you cleaner places to test. A good mock setup isolates the level you care about without pretending the whole protocol does not exist.
TL;DR: Use simulation files for deterministic app states, vi.mock("sunpeak") for fast resource component unit tests, vi.mock() on API modules for tool handler tests, and the inspector fixture from sunpeak/test for browser tests against replicated ChatGPT and Claude runtimes. Keep structuredContent, content, and _meta separate in fixtures because hosts and models treat them differently. Use live host tests only for the few flows that local simulations cannot prove.
What Changed Since Early 2026
The biggest testing change is that MCP Apps are now the portable path for interactive AI-host UI. The official MCP Apps specification describes the core pattern: a tool declares a UI resource with _meta.ui.resourceUri, the host fetches that resource, renders it in a sandboxed iframe, and communicates with it over JSON-RPC messages using postMessage.
OpenAI’s current Apps SDK reference points developers toward the MCP Apps standard bridge by default. ChatGPT still exposes window.openai for compatibility and ChatGPT-specific features, but the standard ui/* bridge is the better contract to design around when you want the same app to run across hosts.
That matters for mocks because you should test the shared contract first:
- toolInput and partial input from the host
- toolResult.structuredContent for app-readable data
- toolResult.content for model-visible or transcript-visible content
- toolResult._meta for widget-only data
- tools/call for UI-triggered server tool calls
- Display mode, theme, locale, viewport, and host context
The MCP extension support matrix tracks host support for MCP Apps. For developers, the practical takeaway is simple: write tests around the protocol-shaped data, then add host-specific tests only where your app uses host-specific features.
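As a rough illustration of "protocol-shaped data", the sketch below models a tool result and wraps it in a JSON-RPC 2.0 notification of the kind a host posts into the iframe. The type and helper names are hypothetical, and using the whole tool result as the notification params is an assumption, so check the MCP Apps specification for the exact shapes.

```typescript
// Sketch only: type and helper names are illustrative, not from a published SDK.

/** Protocol-shaped tool result, using the field names discussed in this article. */
interface ToolResult {
  content?: Array<{ type: "text"; text: string }>;
  structuredContent?: Record<string, unknown>;
  _meta?: Record<string, unknown>;
  isError?: boolean;
}

/** A JSON-RPC 2.0 notification: it carries a method and params but no id. */
interface JsonRpcNotification<P> {
  jsonrpc: "2.0";
  method: string;
  params: P;
}

/** Wrap a tool result in a tool-result notification (params shape is assumed). */
function toolResultNotification(result: ToolResult): JsonRpcNotification<ToolResult> {
  return { jsonrpc: "2.0", method: "ui/notifications/tool-result", params: result };
}

/** Type guard a test can run before trusting a payload received over postMessage. */
function isNotification(value: unknown): value is JsonRpcNotification<unknown> {
  if (typeof value !== "object" || value === null) return false;
  const msg = value as Record<string, unknown>;
  return msg.jsonrpc === "2.0" && typeof msg.method === "string" && !("id" in msg);
}

const message = toolResultNotification({
  content: [{ type: "text", text: "Dashboard loaded for Q1 2026." }],
  structuredContent: { quarter: "Q1", revenue: 142000 },
  _meta: { traceId: "sim_q1_dashboard" },
});
```

Typing fixtures this way means a test that builds a message by hand fails at compile time when a field is misplaced, rather than at render time inside the iframe.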
The Four Mock Boundaries
An MCP App has four boundaries you can mock. Pick one boundary per test.
- Host runtime: ChatGPT, Claude, or another host renders your resource, sends tool data, enforces iframe sandboxing, and handles bridge calls.
- MCP protocol contract: The server exposes tools and resources. Tool calls return content, structuredContent, _meta, isError, and annotations.
- Tool handler: Your server-side function validates input, calls APIs, and shapes data for the UI and model.
- External services: Databases, REST APIs, GraphQL clients, queues, storage, and vendor SDKs.
Do not mock all four in one test. If you mock the host, the protocol, the handler, and the API at once, the test only proves that your mock can render your mock. Instead, decide what you are trying to learn:
- Can the resource render a known state? Use a simulation file or mocked hooks.
- Does the tool handler return the shape the resource expects? Mock the API module, not the handler.
- Does the app render correctly in a real iframe with host chrome? Use the inspector fixture.
- Does the real host accept the server and display the app? Use a small live test suite.
Simulation Files Are Your Main Fixture Format
Simulation files are JSON fixtures for the host and protocol boundary. In a sunpeak project, put them in tests/simulations/ for project-wide states or next to a resource when the fixture only applies there. The inspector auto-discovers them and lets you switch states from the sidebar.
Here is a current simulation shape:
{
  "tool": "show-dashboard",
  "userMessage": "Show me the sales dashboard for Q1",
  "toolInput": {
    "quarter": "Q1",
    "year": 2026
  },
  "toolResult": {
    "content": [{ "type": "text", "text": "Dashboard loaded for Q1 2026." }],
    "structuredContent": {
      "quarter": "Q1",
      "year": 2026,
      "revenue": 142000,
      "deals": 47,
      "topProduct": "Enterprise Plan"
    },
    "_meta": {
      "traceId": "sim_q1_dashboard",
      "rawRows": 47
    }
  }
}
The fields map to the production contract:
- tool references the tool filename without the extension.
- userMessage gives the inspector conversation context.
- toolInput is what the model or host sends to your tool.
- toolResult.content is the text response the host can show in the transcript or feed to the model.
- toolResult.structuredContent is the data your resource usually reads through useToolData().
- toolResult._meta is for widget-only data that should stay out of model context.
That last split is easy to miss. If a field is needed for UI rendering but should not be model-visible, keep it in _meta. If the model needs to reason over it later, put it in structuredContent or summarize it in content.
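One way to keep that split honest is to make it in a single function on the server and assert on it in tests. The sketch below is a minimal example under assumed names: DealRow and shapeDashboardResult are hypothetical, and the choice to keep raw rows in _meta is only an illustration of the rule above.

```typescript
// Sketch only: `DealRow` and `shapeDashboardResult` are hypothetical names.

interface DealRow {
  id: string;
  amount: number;
  internalNote: string; // UI-only detail that should stay out of model context
}

interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  structuredContent: Record<string, unknown>;
  _meta: Record<string, unknown>;
}

/** Split one backend response into the three channels of a tool result. */
function shapeDashboardResult(quarter: string, year: number, rows: DealRow[]): ToolResult {
  const revenue = rows.reduce((sum, row) => sum + row.amount, 0);
  return {
    // Short summary for the transcript and the model.
    content: [{ type: "text", text: `Dashboard loaded for ${quarter} ${year}.` }],
    // Aggregates the UI renders and the model may reason over.
    structuredContent: { quarter, year, revenue, deals: rows.length },
    // Raw backing rows for the widget only.
    _meta: { rows },
  };
}

const shaped = shapeDashboardResult("Q1", 2026, [
  { id: "d1", amount: 100000, internalNote: "renewal at risk" },
  { id: "d2", amount: 42000, internalNote: "upsell candidate" },
]);
```

A test can then serialize structuredContent and check that no UI-only field name appears in it, which catches the most common leak without inspecting every key by hand.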
Edge Cases Worth Turning Into Simulations
Write one simulation per meaningful state. Do not make one giant fixture with every edge case mixed together because it becomes hard to tell what failed.
Good MCP App simulation cases include:
- Happy path with realistic production-shaped data
- Empty arrays and zero values
- Missing optional fields
- Null values that the schema allows
- Long names, long descriptions, and long unbroken strings
- Large paginated data, not an unbounded dump
- Unicode, right-to-left text, emoji, and special characters
- Loading or partial-input states if your component handles streamed input
- Tool errors with isError
- Cancelled tool calls
- Permission-denied and unauthenticated states
- Host-specific states, only when your UI branches by host capability
For a dashboard, you might keep fixtures like this:
tests/simulations/
  show-dashboard-q1.json
  show-dashboard-empty.json
  show-dashboard-large-page.json
  show-dashboard-api-error.json
  show-dashboard-permission-denied.json
The file names become documentation. A teammate or agent can scan the directory and understand which states your app claims to support.
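If several of these states share the same tool and input, you can derive them from one base object instead of copying JSON by hand. The sketch below assumes a Simulation type that mirrors the JSON shape shown earlier; the withToolResult helper is hypothetical, and placing isError inside toolResult is an assumption based on the MCP tool result shape.

```typescript
// Sketch only: `Simulation` mirrors the JSON fixture shape from this article.
// The position of `isError` inside `toolResult` is an assumption.

interface Simulation {
  tool: string;
  userMessage: string;
  toolInput: Record<string, unknown>;
  toolResult: {
    content: Array<{ type: "text"; text: string }>;
    structuredContent?: Record<string, unknown>;
    _meta?: Record<string, unknown>;
    isError?: boolean;
  };
}

const happyPath: Simulation = {
  tool: "show-dashboard",
  userMessage: "Show me the sales dashboard for Q1",
  toolInput: { quarter: "Q1", year: 2026 },
  toolResult: {
    content: [{ type: "text", text: "Dashboard loaded for Q1 2026." }],
    structuredContent: { quarter: "Q1", year: 2026, revenue: 142000, deals: 47 },
  },
};

/** Copy the base simulation and replace only its tool result. */
function withToolResult(base: Simulation, toolResult: Simulation["toolResult"]): Simulation {
  return { ...base, toolResult };
}

const emptyState = withToolResult(happyPath, {
  content: [{ type: "text", text: "No deals found for Q1 2026." }],
  structuredContent: { quarter: "Q1", year: 2026, revenue: 0, deals: 0 },
});

const apiError = withToolResult(happyPath, {
  content: [{ type: "text", text: "Could not load dashboard." }],
  isError: true,
});

// One entry per file, ready to serialize with JSON.stringify.
const simulationFiles: Record<string, Simulation> = {
  "show-dashboard-q1.json": happyPath,
  "show-dashboard-empty.json": emptyState,
  "show-dashboard-api-error.json": apiError,
};
```

Each variant still ends up as its own file, so the one-state-per-fixture rule holds; only the shared parts are written once.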
Mock Server Tools in Simulation Files
Interactive MCP Apps often call server tools from inside the UI. A purchase review app might call complete_purchase. A table might call load_next_page. A dashboard might call export_report.
Do not make those calls hit a real backend in routine UI tests. Mock them in the simulation:
{
  "tool": "review-purchase",
  "userMessage": "Buy the wireless headphones in my cart",
  "toolInput": {
    "cartId": "cart_abc123"
  },
  "toolResult": {
    "content": [{ "type": "text", "text": "Review this order before purchase." }],
    "structuredContent": {
      "title": "Confirm Your Order",
      "total": 79,
      "items": [{ "name": "Headphones", "price": 79 }]
    }
  },
  "serverTools": {
    "complete_purchase": [
      {
        "when": { "confirmed": true },
        "result": {
          "content": [{ "type": "text", "text": "Order confirmed." }],
          "structuredContent": { "orderId": "ORD-001", "status": "confirmed" }
        }
      },
      {
        "when": { "confirmed": false },
        "result": {
          "content": [{ "type": "text", "text": "Order cancelled." }],
          "structuredContent": { "status": "cancelled" }
        }
      }
    ]
  }
}
The inspector matches the when object against the arguments your UI passes through useCallServerTool(). This lets you test multi-step flows without a live payment provider, database, or queue.
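To reason about which mocked result a call will receive, it can help to picture the matching as a small function. The sketch below is not the inspector's implementation. It assumes a shallow subset match where every key in when must strictly equal the same key in the call arguments, and that the first matching entry wins. Verify the real rules against the sunpeak documentation before relying on them.

```typescript
// Sketch only: assumes shallow, strict-equality subset matching with
// first-match-wins ordering. The inspector's actual rules may differ.

type Args = Record<string, unknown>;

interface MockedCall<R> {
  when: Args;
  result: R;
}

/** True when every key in `when` has a strictly equal value in `args`. */
function matchesWhen(when: Args, args: Args): boolean {
  return Object.keys(when).every((key) => args[key] === when[key]);
}

/** Return the first mocked result whose `when` matches, or undefined if none do. */
function resolveMock<R>(mocks: MockedCall<R>[], args: Args): R | undefined {
  for (const mock of mocks) {
    if (matchesWhen(mock.when, args)) return mock.result;
  }
  return undefined;
}

const completePurchase: MockedCall<{ status: string }>[] = [
  { when: { confirmed: true }, result: { status: "confirmed" } },
  { when: { confirmed: false }, result: { status: "cancelled" } },
];

// Extra arguments such as cartId do not prevent a match under these assumptions.
const confirmed = resolveMock(completePurchase, { confirmed: true, cartId: "cart_abc123" });
const cancelled = resolveMock(completePurchase, { confirmed: false });
const unmatched = resolveMock(completePurchase, { cartId: "cart_abc123" });
```

Under this model, a call that omits confirmed matches nothing, so it is worth adding a simulation entry for every argument combination your UI can actually send.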
Share Fixture Builders Between Tests
JSON is a good interchange format, but hand-written JSON can drift. For complex apps, keep fixture builders in TypeScript and export JSON simulations from those builders. Use the same builders in unit tests.
// tests/fixtures/dashboard.ts
import { z } from 'zod';

export const dashboardSchema = z.object({
  quarter: z.string(),
  year: z.number(),
  revenue: z.number(),
  deals: z.number(),
  topProduct: z.string().nullable(),
});

export function dashboardFixture(
  overrides: Partial<z.infer<typeof dashboardSchema>> = {}
) {
  const data = {
    quarter: 'Q1',
    year: 2026,
    revenue: 142000,
    deals: 47,
    topProduct: 'Enterprise Plan',
    ...overrides,
  };
  return dashboardSchema.parse(data);
}
Then your component unit test and your tool handler test can both import dashboardFixture(). Your simulation files should be generated from or checked against the same schema. The goal is not to create a big fixture framework. The goal is to stop updating five copies of the same fake payload by hand.
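A drift check over simulation files can be a few lines. In the sketch below, a hand-written type guard stands in for dashboardSchema so the example has no dependencies; in a real project you would call the Zod schema's safeParse instead. The findDriftedSimulations helper and the stale file name are hypothetical.

```typescript
// Sketch only: `isDashboard` stands in for the Zod schema so this example runs
// without dependencies. `findDriftedSimulations` is a hypothetical helper.

interface Dashboard {
  quarter: string;
  year: number;
  revenue: number;
  deals: number;
  topProduct: string | null;
}

function isDashboard(value: unknown): value is Dashboard {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.quarter === "string" &&
    typeof v.year === "number" &&
    typeof v.revenue === "number" &&
    typeof v.deals === "number" &&
    (typeof v.topProduct === "string" || v.topProduct === null)
  );
}

interface LoadedSimulation {
  file: string;
  toolResult: { structuredContent?: unknown };
}

/** List the files whose structuredContent no longer matches the schema. */
function findDriftedSimulations(simulations: LoadedSimulation[]): string[] {
  return simulations
    .filter((sim) => !isDashboard(sim.toolResult.structuredContent))
    .map((sim) => sim.file);
}

const drifted = findDriftedSimulations([
  {
    file: "show-dashboard-q1.json",
    toolResult: {
      structuredContent: { quarter: "Q1", year: 2026, revenue: 142000, deals: 47, topProduct: "Enterprise Plan" },
    },
  },
  {
    // `year` was written as a string by hand, so this fixture has drifted.
    file: "show-dashboard-stale.json",
    toolResult: {
      structuredContent: { quarter: "Q1", year: "2026", revenue: 142000, deals: 47, topProduct: null },
    },
  },
]);
```

Error simulations that intentionally omit structuredContent would be flagged by this check, so skip them or give them their own schema.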
Unit Test Resource Components With Hook Mocks
Use unit tests for component logic that does not need a browser: branching, formatting, disabled states, empty states, and data validation. Mock sunpeak’s hooks at the module boundary.
// Registers matchers such as toBeInTheDocument(); drop this if your setup file already does it.
import '@testing-library/jest-dom/vitest';
import { render, screen } from '@testing-library/react';
import { beforeEach, describe, expect, it, vi } from 'vitest';
import { DashboardResource } from './dashboard';

// Mutable state read lazily by the mocked hooks, so each test can change it.
let mockOutput: Record<string, unknown> | null = null;
let mockInput: Record<string, unknown> | null = null;
let mockIsError = false;
let mockIsLoading = false;
let mockDisplayMode = 'inline';
const mockSetState = vi.fn();
const mockCallServerTool = vi.fn();

vi.mock('sunpeak', () => ({
  useToolData: () => ({
    input: mockInput,
    inputPartial: null,
    output: mockOutput,
    isError: mockIsError,
    isLoading: mockIsLoading,
    isCancelled: false,
    cancelReason: null,
  }),
  useAppState: () => [{}, mockSetState],
  useCallServerTool: () => mockCallServerTool,
  useDisplayMode: () => mockDisplayMode,
  useRequestDisplayMode: () => ({
    availableModes: ['inline', 'pip', 'fullscreen'],
    requestDisplayMode: vi.fn(),
  }),
  useHostInfo: () => ({
    hostVersion: undefined,
    hostCapabilities: { serverTools: true },
  }),
  SafeArea: ({ children }: { children: React.ReactNode }) => (
    <div data-testid="safe-area">{children}</div>
  ),
}));

describe('DashboardResource', () => {
  beforeEach(() => {
    // Reset mocks and hook state so tests do not leak into each other.
    vi.clearAllMocks();
    mockInput = { quarter: 'Q1', year: 2026 };
    mockOutput = { quarter: 'Q1', revenue: 142000, deals: 47 };
    mockIsError = false;
    mockIsLoading = false;
    mockDisplayMode = 'inline';
  });

  it('renders revenue', () => {
    render(<DashboardResource />);
    expect(screen.getByText('$142,000')).toBeInTheDocument();
  });

  it('renders an empty state', () => {
    mockOutput = { quarter: 'Q4', revenue: 0, deals: 0 };
    render(<DashboardResource />);
    expect(screen.getByText(/no deals/i)).toBeInTheDocument();
  });
});
Mock every hook your component calls. If you forget useDisplayMode() or useCallServerTool(), the test will fail for the wrong reason.
Test Loading, Error, and Cancelled States
MCP Apps have more transient states than a normal data card because the host can stream inputs, delay approval-gated input, return an error, or cancel a tool call. Those states deserve tests.
it('shows loading copy while output is unavailable', () => {
  mockOutput = null;
  mockIsLoading = true;
  render(<DashboardResource />);
  expect(screen.getByText(/loading dashboard/i)).toBeInTheDocument();
});

it('shows an error message when the tool fails', () => {
  mockOutput = null;
  mockIsError = true;
  render(<DashboardResource />);
  expect(screen.getByText(/could not load dashboard/i)).toBeInTheDocument();
});
If your UI reads partial input, test that separately. Partial input is preview data. Treat it as incomplete until the host sends final tool input or a tool result.
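One way to make the preview rule explicit is to resolve input in a single helper that always reports whether the value is final. The sketch below is a hypothetical helper, not a sunpeak API. The input and inputPartial names follow the mocked useToolData() shape used in the unit test earlier in this article.

```typescript
// Sketch only: `resolveInput` is a hypothetical helper. The `input` and
// `inputPartial` names follow the mocked useToolData() shape in this article.

interface DashboardInput {
  quarter: string;
  year: number;
}

interface ToolInputState {
  input: DashboardInput | null;
  inputPartial: Partial<DashboardInput> | null;
}

interface ResolvedInput {
  value: Partial<DashboardInput> | null;
  isFinal: boolean;
}

/** Prefer final input; otherwise fall back to partial input flagged as a preview. */
function resolveInput(state: ToolInputState): ResolvedInput {
  if (state.input !== null) {
    return { value: state.input, isFinal: true };
  }
  return { value: state.inputPartial, isFinal: false };
}

// While the host is still streaming, only a preview is available.
const streaming = resolveInput({ input: null, inputPartial: { quarter: "Q1" } });

// Once final input arrives, it wins over any earlier partial value.
const settled = resolveInput({
  input: { quarter: "Q1", year: 2026 },
  inputPartial: { quarter: "Q1" },
});
```

A component can then disable actions such as submit buttons whenever isFinal is false, and a unit test can cover both branches without a browser.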
Mock the Standard Bridge Before Host Globals
For ChatGPT Apps, it is tempting to mock window.openai in every test because older examples used that global directly. That still has a place when you are testing ChatGPT-only capabilities such as file uploads or ChatGPT-specific extensions.
For portable MCP Apps, start with the standard contract instead:
- useToolData() for tool input and output
- useCallServerTool() for tools/call
- useSendMessage() or the relevant action hook for follow-up messages
- useUpdateModelContext() for model-visible UI state
- useDisplayMode() and useRequestDisplayMode() for layout
- useHostInfo() or capability hooks for feature detection
This keeps tests aligned with ChatGPT, Claude, and other MCP App hosts. Add window.openai mocks only in files that import ChatGPT-specific APIs.
Mock External APIs in Tool Handler Tests
Tool handlers are server-side functions. They should be tested close to production because they own schema validation, auth checks, API calls, result shaping, and errors.
Mock the API client, not the handler:
// tests/tools/search-tickets.test.ts
import { describe, expect, it, vi } from 'vitest';
import handler from '../../src/tools/search-tickets';

vi.mock('../../src/lib/api', () => ({
  searchTickets: vi.fn().mockResolvedValue([
    { id: 'TICK-1', title: 'Login broken', status: 'open', priority: 'high' },
    { id: 'TICK-2', title: 'Slow dashboard', status: 'in_progress', priority: 'medium' },
  ]),
}));

describe('search-tickets handler', () => {
  it('returns structuredContent with matching tickets', async () => {
    const result = await handler(
      { query: 'login', status: 'open' },
      {} as never
    );
    expect(result.structuredContent).toBeDefined();
    expect(result.structuredContent.tickets).toHaveLength(2);
    expect(result.content?.[0]?.type).toBe('text');
  });

  it('handles API errors', async () => {
    const { searchTickets } = await import('../../src/lib/api');
    vi.mocked(searchTickets).mockRejectedValueOnce(new Error('API timeout'));
    const result = await handler(
      { query: 'login', status: 'open' },
      {} as never
    );
    expect(result.isError).toBe(true);
  });
});
These tests should assert the protocol-level result, not just the raw data:
- Does structuredContent match the resource schema?
- Does content give the model a concise useful summary?
- Does _meta avoid leaking hidden UI-only data into model context?
- Does the handler set isError for recoverable failures?
- Does validation reject bad input before calling the API?
- Do tool annotations match the action, such as read-only versus destructive?
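Two of those checks, validation before the API call and isError for recoverable failures, are easier to assert when the handler is structured for them. The sketch below is a minimal handler under assumed names: createSearchHandler, TicketApi, and the result type are hypothetical, and the API client is injected so a test can pass a stub instead of mocking a module.

```typescript
// Sketch only: `createSearchHandler`, `TicketApi`, and `ToolResult` are
// hypothetical. The API client is injected so tests can pass a stub.

interface Ticket {
  id: string;
  title: string;
  status: string;
}

interface TicketApi {
  searchTickets(query: string): Promise<Ticket[]>;
}

interface ToolResult {
  content: Array<{ type: "text"; text: string }>;
  structuredContent?: { tickets: Ticket[] };
  isError?: boolean;
}

function createSearchHandler(api: TicketApi) {
  return async (input: { query?: unknown }): Promise<ToolResult> => {
    const query = input.query;
    // Validate first so bad input never reaches the backend.
    if (typeof query !== "string" || query.trim() === "") {
      return { content: [{ type: "text", text: "A non-empty query is required." }], isError: true };
    }
    try {
      const tickets = await api.searchTickets(query);
      return {
        content: [{ type: "text", text: `Found ${tickets.length} tickets.` }],
        structuredContent: { tickets },
      };
    } catch (error) {
      // Report recoverable failures through isError instead of throwing.
      const reason = error instanceof Error ? error.message : "unknown error";
      return { content: [{ type: "text", text: `Search failed: ${reason}` }], isError: true };
    }
  };
}

// A stub that counts calls and always fails, for exercising the error path.
let apiCalls = 0;
const failingApi: TicketApi = {
  searchTickets: async () => {
    apiCalls += 1;
    throw new Error("API timeout");
  },
};
const searchHandler = createSearchHandler(failingApi);
```

With this structure, a test can assert that an empty query returns isError without incrementing the stub's call count, which demonstrates that validation ran before any API work.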
Add Contract Tests Between Tools and Resources
A common MCP App bug is a resource expecting tickets while the tool returns items. Unit tests can miss this if the resource mock uses the old shape.
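Add a small contract test that calls the real handler with mocked services and validates the result with the resource schema. If you also render the resource in the same test, feed it that exact output rather than a hand-written copy.
import { expect, test, vi } from 'vitest';
import handler from '../../src/tools/search-tickets';
import { ticketListSchema } from '../../src/resources/tickets/schema';

// Stub the API module so the real handler runs without a live backend.
vi.mock('../../src/lib/api', () => ({
  searchTickets: vi.fn().mockResolvedValue([
    { id: 'TICK-1', title: 'Login broken', status: 'open', priority: 'high' },
  ]),
}));

test('search-tickets output matches TicketListResource input', async () => {
  const result = await handler({ query: 'login' }, {} as never);
  // Validate the handler's real output against the schema the resource reads.
  const parsed = ticketListSchema.safeParse(result.structuredContent);
  expect(parsed.success).toBe(true);
});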
This test is small, but it catches the drift that makes mocks dangerous.
E2E Test With the Inspector Fixture
The inspector fixture from sunpeak/test renders your app in a real browser inside the local sunpeak inspector. It handles the host runtime, iframe traversal, and Playwright locator setup.
import { expect, test } from 'sunpeak/test';

test('dashboard renders revenue for Q1', async ({ inspector }) => {
  const result = await inspector.renderTool('show-dashboard', {
    quarter: 'Q1',
    year: 2026,
  });
  const app = result.app();
  await expect(app.getByText('$142,000')).toBeVisible();
});
Use E2E tests for behavior that needs a browser:
- CSS layout in inline, PiP, and fullscreen modes
- Dark and light themes
- Safe area and viewport behavior
- Button clicks that call server tools
- Keyboard and focus behavior
- File upload or download UI, if your host supports it
- iframe sandbox restrictions
sunpeak’s testing framework can run these tests against replicated ChatGPT and Claude runtimes without connecting to either host. For existing MCP servers that are not built with sunpeak, use:
npx sunpeak test init --server http://localhost:8000/mcp
npx sunpeak test
That scaffolds tests around the server you already have.
Cover Host, Theme, and Display Mode
Display mode and theme bugs are common because the component is the same but the container changes.
import { expect, test } from 'sunpeak/test';

for (const displayMode of ['inline', 'pip', 'fullscreen'] as const) {
  test(`dashboard works in ${displayMode}`, async ({ inspector }) => {
    const result = await inspector.renderTool(
      'show-dashboard',
      { quarter: 'Q1' },
      { displayMode, theme: 'dark' }
    );
    await expect(result.app().getByTestId('dashboard')).toBeVisible();
  });
}
If your Playwright config uses defineConfig() from sunpeak/test/config, tests can run against both ChatGPT and Claude host projects. That gives you coverage for host chrome, CSS variables, and bridge behavior while keeping the test code host-agnostic.
When to Use Live Host Tests
Local mocks and simulations should handle most of your test suite. Live host tests are still useful, but they are too slow and account-dependent for every branch.
Save live tests for:
- First connection to ChatGPT or Claude after deploy
- OAuth and account linking
- App submission or directory review flows
- ChatGPT-only or Claude-only APIs
- Real model tool selection behavior
- Final pre-release smoke tests
Keep the live suite small. One live test per core resource is usually more useful than trying to mirror your whole local test matrix against production hosts.
Common Mistakes
Putting everything in structuredContent. Large hidden records, tokens, trace IDs, and UI-only backing data belong in _meta, not in model-visible structured data.
Only testing the happy path. Every resource should have fixtures for empty, error, cancelled, long text, and large data states. These states get far less manual exercise than the happy path, so they are where regressions tend to go unnoticed.
Mocking host globals for portable code. If your component only uses standard sunpeak hooks, mock the hooks. Save window.openai or host-specific mocks for host-specific modules.
Letting fixtures drift. Validate simulation files against schemas. Reuse fixture builders. Add a contract test between each tool handler and its resource.
Skipping server tool mocks. If the UI calls useCallServerTool(), simulate those responses. Otherwise your E2E test only covers the first screen.
Testing implementation details. Avoid asserting that setState was called with an exact object unless that is the behavior. Prefer user-visible assertions: the row appears, the button disables, the confirmation message renders.
A Practical Test Matrix
Most MCP App projects do well with this shape:
tests/
  fixtures/
    dashboard.ts                  # Shared fixture builders and schemas
  simulations/
    show-dashboard-q1.json        # Happy path
    show-dashboard-empty.json     # Empty data
    show-dashboard-large.json     # Large payload
    show-dashboard-error.json     # Tool error
    review-purchase.json          # serverTools mock
  e2e/
    dashboard.spec.ts             # Inspector fixture tests
    review.spec.ts                # Multi-step server tool flow
  visual/
    dashboard.visual.spec.ts      # Theme and display mode screenshots
src/
  resources/
    dashboard/
      dashboard.test.tsx          # Hook mocks for component unit tests
      schema.ts                   # Resource input/output schema
  tools/
    show-dashboard.test.ts        # API client mocks and contract tests
Run the fast tests all the time:
pnpm test:unit
pnpm test:e2e
Run visual tests when UI changes:
pnpm test:visual
Run live tests before release:
pnpm test:live
Where sunpeak Fits
You can hand-roll most of this with Vitest, Playwright, an MCP client, and your own iframe host. That is fine for a narrow prototype.
For production MCP Apps, the repetitive work is the host test harness: starting the MCP server, loading simulations, rendering the resource in ChatGPT and Claude-like runtimes, switching display modes, crossing iframes, and running the same states in CI.
sunpeak handles that loop. The local MCP App Inspector can inspect any MCP server:
npx sunpeak inspect --server http://localhost:8000/mcp
The testing framework can scaffold tests for any server:
npx sunpeak test init --server http://localhost:8000/mcp
For new projects, start with:
npx sunpeak new
Then keep your mocks honest: protocol-shaped simulation files, schema-checked fixture builders, API-boundary stubs, and a small live suite for the real-host checks that matter.
Further Reading
- MCP testing framework - sunpeak testing tools for any MCP server
- MCP App Inspector - local ChatGPT and Claude runtime replicas
- Complete guide to testing ChatGPT Apps and MCP Apps
- E2E testing MCP Apps - Playwright and the inspector fixture
- Unit testing MCP Apps, ChatGPT Apps, and Claude Connectors
- Testing MCP App data flow - content, structuredContent, _meta, and host bridge state
- MCP Apps specification - official protocol overview
- OpenAI Apps SDK reference - MCP Apps UI bridge and ChatGPT extensions
- MCP extension support matrix - current host support
Frequently Asked Questions
How do I mock MCP tool calls in tests?
Use simulation files for browser and E2E tests, and use vi.mock() for unit tests. A simulation file defines the tool name, user message, tool input, tool result, and optional server tool responses. Unit tests can mock sunpeak hooks such as useToolData, useAppState, useCallServerTool, and useDisplayMode so the resource component renders with controlled data.
What should an MCP App simulation file include?
A useful simulation includes the tool filename, userMessage, toolInput, and toolResult. The toolResult should include structuredContent for app data, content for model-visible text, and _meta for widget-only data that should not enter model context. Add serverTools when the app UI calls additional tools through the host bridge.
How do I mock ChatGPT App data without using window.openai?
Mock the standard MCP Apps data contract first. ChatGPT implements the MCP Apps UI bridge, including ui/notifications/tool-input, ui/notifications/tool-result, tools/call, ui/message, and ui/update-model-context. Framework hooks such as useToolData and useCallServerTool wrap that bridge, so your tests can mock the hooks rather than hardcoding ChatGPT-only globals.
How do I mock external API calls in MCP App tool handler tests?
Mock the module that talks to the API, database, or SDK, not the tool handler itself. In Vitest, vi.mock() the client module, set mockResolvedValue or mockRejectedValue per test, then assert that the handler returns the expected structuredContent, content, isError value, and annotations.
How do I test server tools called from an MCP App UI?
Add a serverTools object to the simulation file. Each key is a server tool name, and each value lists when conditions and results. The inspector matches the UI call arguments against the when object and returns the matching mock result, which lets you test confirm flows, pagination, edits, and retries without a live backend.
Can I test MCP Apps without a ChatGPT or Claude account?
Yes. sunpeak runs a local inspector that replicates ChatGPT and Claude app runtimes. You can load simulation files, switch hosts, themes, display modes, and device sizes, then run the same states in Playwright E2E and visual regression tests. No paid host account, tunnel, deployment, or AI credits are required for local inspector tests.
What is the difference between simulation tests and live host tests?
Simulation tests use controlled data in a local inspector, so they are fast, repeatable, and good for CI. Live host tests connect to a real host such as ChatGPT or Claude, so they are slower and should be saved for pre-release checks that only the real host can prove, such as account configuration, submission behavior, or host-specific UI extensions.
How do I keep MCP App mocks from drifting away from production data?
Create shared fixture builders, validate simulation files with the same Zod schemas used by the tool and resource, and add contract tests that call the real tool handler. Treat every simulation as a documented production state: happy path, empty state, loading, error, cancelled, large payload, long text, and permission-denied.