Mocking and Stubbing in MCP App Tests: Simulations, Fixtures, and Patterns
Testing MCP Apps with mocks, stubs, and simulation files.
Testing MCP Apps means testing across multiple layers: the host runtime (ChatGPT or Claude), the MCP protocol, your tool handlers, your resource components, and whatever external APIs your tools call. Each layer needs its own mocking strategy. If you mock at the wrong level, your tests either take forever or miss real bugs.
TL;DR: Use simulation files for deterministic e2e states, vi.mock("sunpeak") for unit testing resource components, vi.mock() on your API modules for tool handler tests, and the inspector fixture from sunpeak/test for full e2e tests that mock the host runtime. All of this runs locally and in CI without paid accounts.
The Four Layers You Need to Mock
An MCP App has four layers that interact during a tool call, and each one needs a different mocking approach:
- Host runtime (ChatGPT or Claude renders your resource component in an iframe)
- MCP protocol (the host calls your tool, your tool returns structured content)
- Tool handler (your server-side function that processes arguments and calls APIs)
- External services (databases, third-party APIs, your own backend)
Testing all four layers together against real hosts is slow, expensive, and flaky. Mocking lets you isolate each layer and test it independently.
Simulation Files: Deterministic Tool States
Simulation files are the foundation of MCP App testing. They’re JSON files that define a complete tool invocation with controlled inputs and outputs, so your resource component always gets the same data.
Put them in tests/simulations/ (project-level) or src/resources/<name>/simulations/ (resource-level). sunpeak auto-discovers any *.json file in these directories.
Here’s a basic simulation file:
```json
{
  "tool": "show-dashboard",
  "userMessage": "Show me the sales dashboard for Q1",
  "toolInput": {
    "quarter": "Q1",
    "year": 2026
  },
  "toolResult": {
    "content": [{ "type": "text", "text": "Dashboard loaded for Q1 2026" }],
    "structuredContent": {
      "quarter": "Q1",
      "year": 2026,
      "revenue": 142000,
      "deals": 47,
      "topProduct": "Enterprise Plan"
    }
  }
}
```
The fields map directly to the MCP protocol:
- tool references a tool file in your src/tools/ directory
- toolInput is what the host sends as tool arguments
- toolResult.structuredContent is the data your resource component receives via useToolData()
- userMessage gives the inspector conversational context for display
When you run pnpm dev, the inspector loads these simulations and lets you toggle between them in the UI. No tool handler runs. No external API gets called. You see exactly what your component renders for a given data shape.
Simulation Files for Edge Cases
The real value of simulation files is edge-case coverage. Create separate files for each state you need to test:
```json
{
  "tool": "show-dashboard",
  "userMessage": "Show me the dashboard for a quarter with no data",
  "toolInput": { "quarter": "Q4", "year": 2027 },
  "toolResult": {
    "structuredContent": {
      "quarter": "Q4",
      "year": 2027,
      "revenue": 0,
      "deals": 0,
      "topProduct": null
    }
  }
}
```
Good edge cases to cover in simulation files:
- Empty arrays and zero values
- Null or missing optional fields
- Very long strings (product names, descriptions)
- Large data sets (100+ items in a list)
- Unicode and special characters
- Error content (the tool handler returned an error message)
Each simulation file becomes a test fixture you can reference in e2e tests and inspect visually during development.
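Because these files double as fixtures, a small guard test keeps them honest. The sketch below is hypothetical (the field names follow this article's conventions, and isValidSimulation is not a sunpeak API); in a real project you would run it over every file in tests/simulations/ with a vitest it.each():

```typescript
// Hypothetical guard for simulation fixtures: verifies that a parsed
// simulation object carries the fields the protocol mapping relies on.
type Simulation = {
  tool?: unknown;
  toolInput?: unknown;
  toolResult?: { structuredContent?: unknown };
};

function isValidSimulation(sim: Simulation): boolean {
  return (
    typeof sim.tool === 'string' &&
    typeof sim.toolInput === 'object' &&
    sim.toolInput !== null &&
    typeof sim.toolResult?.structuredContent === 'object' &&
    sim.toolResult?.structuredContent !== null
  );
}

isValidSimulation({
  tool: 'show-dashboard',
  toolInput: { quarter: 'Q1', year: 2026 },
  toolResult: { structuredContent: { revenue: 142000 } },
}); // → true

isValidSimulation({ tool: 'show-dashboard' }); // → false: missing toolInput and toolResult
```

A check like this catches a renamed or mistyped field the moment the fixture drifts, rather than when a rendering test fails with a blank component.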
Mocking Server Tools in Simulations
If your MCP App uses server tools (tools the resource component can call back to the server), you can mock those responses in simulation files too. Add a serverTools field:
```json
{
  "tool": "review-purchase",
  "userMessage": "Buy some wireless headphones",
  "toolInput": { "cartId": "cart_abc123", "items": [{ "name": "Headphones", "price": 79 }] },
  "toolResult": {
    "structuredContent": {
      "title": "Confirm Your Order",
      "total": 79,
      "items": [{ "name": "Headphones", "price": 79 }]
    }
  },
  "serverTools": {
    "complete_purchase": [
      {
        "when": { "confirmed": true },
        "result": {
          "content": [{ "type": "text", "text": "Order confirmed" }],
          "structuredContent": { "orderId": "ORD-001", "status": "confirmed" }
        }
      },
      {
        "when": { "confirmed": false },
        "result": {
          "content": [{ "type": "text", "text": "Order cancelled" }],
          "structuredContent": { "status": "cancelled" }
        }
      }
    ]
  }
}
```
The inspector matches the when object against the arguments your component passes to callServerTool(). This lets you test multi-step interactions (confirm/cancel flows, pagination, form submissions) without a running server.
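The matching itself happens inside sunpeak's inspector, but the behavior described above can be modeled in a few lines: a first-match, shallow-equality subset check. This sketch is a mental model, not the framework's actual code:

```typescript
type ServerToolMock = { when: Record<string, unknown>; result: unknown };

// Illustrative sketch of the inspector's matching: the first entry whose
// `when` fields all equal the call arguments wins.
function matchServerTool(
  mocks: ServerToolMock[],
  args: Record<string, unknown>,
): unknown {
  const hit = mocks.find((m) =>
    Object.entries(m.when).every(([key, value]) => args[key] === value),
  );
  return hit?.result;
}

const completePurchase: ServerToolMock[] = [
  { when: { confirmed: true }, result: { status: 'confirmed' } },
  { when: { confirmed: false }, result: { status: 'cancelled' } },
];

matchServerTool(completePurchase, { confirmed: true }); // → { status: 'confirmed' }
```

The practical consequence: order your mock entries from most to least specific, and make sure the arguments your component actually passes to callServerTool() match a when clause exactly, or no mock will fire.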
Unit Testing Resource Components with vi.mock
Simulation files handle e2e states. For unit tests, you need to mock sunpeak’s React hooks directly so your component renders in happy-dom without the host runtime.
Here’s the pattern. At the top of your test file, replace the sunpeak module with mocks:
```tsx
import { render, screen } from '@testing-library/react';
import { describe, it, expect, vi, beforeEach } from 'vitest';
import type { ReactNode } from 'react';
// These resolve to the mocks below, so vi.mocked() can override them per test
import { useToolData, useDisplayMode } from 'sunpeak';
import { DashboardResource } from './dashboard';

// Module-level mock state: change between tests
let mockToolOutput: Record<string, unknown> = {};
let mockState: Record<string, unknown> = {};
const mockSetState = vi.fn();
const mockCallServerTool = vi.fn();

vi.mock('sunpeak', () => ({
  // Wrapped in vi.fn() so individual tests can override with vi.mocked()
  useToolData: vi.fn(() => ({
    output: mockToolOutput,
    input: null,
    inputPartial: null,
    isError: false,
    isLoading: false,
    isCancelled: false,
    cancelReason: null,
  })),
  useAppState: () => [mockState, mockSetState],
  useCallServerTool: () => mockCallServerTool,
  useDisplayMode: vi.fn(() => 'inline'),
  useRequestDisplayMode: () => ({
    requestDisplayMode: vi.fn(),
    availableModes: ['inline', 'fullscreen'],
  }),
  useDeviceCapabilities: () => ({ hover: true, touch: false }),
  useHostInfo: () => ({ hostVersion: undefined, hostCapabilities: { serverTools: true } }),
  useUpdateModelContext: () => vi.fn(),
  useTimeZone: () => 'America/New_York',
  useLocale: () => 'en-US',
  SafeArea: ({ children, ...props }: { children: ReactNode; [key: string]: unknown }) => (
    <div data-testid="safe-area" {...props}>{children}</div>
  ),
}));

describe('DashboardResource', () => {
  beforeEach(() => {
    vi.clearAllMocks();
    mockToolOutput = { quarter: 'Q1', revenue: 142000, deals: 47 };
    mockState = {};
  });

  it('renders revenue', () => {
    render(<DashboardResource />);
    expect(screen.getByText('$142,000')).toBeInTheDocument();
  });

  it('handles zero revenue', () => {
    mockToolOutput = { quarter: 'Q4', revenue: 0, deals: 0 };
    render(<DashboardResource />);
    expect(screen.getByText('$0')).toBeInTheDocument();
  });
});
```
A few things to note about this pattern:
Mock every hook your component uses. If your component calls useDisplayMode() and you don’t mock it, the test crashes. Start with the full set of hooks and remove ones you don’t need.
Use module-level variables for mock return values. This lets you change the data between tests without redefining the entire mock. Set defaults in beforeEach and override in individual tests.
Mock SafeArea as a plain div. The real SafeArea component uses host-specific padding calculations. In unit tests, a div with a test ID is enough.
Testing Loading and Error States
Change the mock return values to test non-happy-path states:
```tsx
it('shows loading spinner', () => {
  // Works because the mock factory wraps useToolData in vi.fn() and
  // useToolData is imported from 'sunpeak' at the top of the test file.
  // mockReturnValueOnce keeps the override from leaking into later tests.
  vi.mocked(useToolData).mockReturnValueOnce({
    output: null,
    input: null,
    inputPartial: null,
    isError: false,
    isLoading: true,
    isCancelled: false,
    cancelReason: null,
  });
  render(<DashboardResource />);
  expect(screen.getByTestId('loading-spinner')).toBeInTheDocument();
});

it('shows error message', () => {
  vi.mocked(useToolData).mockReturnValueOnce({
    output: null,
    input: null,
    inputPartial: null,
    isError: true,
    isLoading: false,
    isCancelled: false,
    cancelReason: null,
  });
  render(<DashboardResource />);
  expect(screen.getByText(/something went wrong/i)).toBeInTheDocument();
});
```
Testing Display Mode Behavior
If your component renders differently in fullscreen vs. inline mode:
```tsx
it('shows expanded view in fullscreen', () => {
  // Requires useDisplayMode to be wrapped in vi.fn() in the mock factory
  // and imported from 'sunpeak' so vi.mocked() can override it
  vi.mocked(useDisplayMode).mockReturnValueOnce('fullscreen');
  mockToolOutput = { quarter: 'Q1', revenue: 142000, deals: 47, chart: true };
  render(<DashboardResource />);
  expect(screen.getByTestId('chart-container')).toBeInTheDocument();
});

it('hides chart in inline mode', () => {
  vi.mocked(useDisplayMode).mockReturnValueOnce('inline');
  mockToolOutput = { quarter: 'Q1', revenue: 142000, deals: 47, chart: true };
  render(<DashboardResource />);
  expect(screen.queryByTestId('chart-container')).not.toBeInTheDocument();
});
```
Mocking External APIs in Tool Handler Tests
Tool handlers are server-side functions that run when the host calls your tool. They typically fetch data from external APIs or databases and return structuredContent for your resource component.
Mock the data layer, not the handler:
```ts
// tests/tools/search-tickets.test.ts
import { describe, it, expect, vi } from 'vitest';
import handler from '../../src/tools/search-tickets';

vi.mock('../../src/lib/api', () => ({
  searchTickets: vi.fn().mockResolvedValue([
    { id: 'TICK-1', title: 'Login broken', status: 'open', priority: 'high' },
    { id: 'TICK-2', title: 'Slow dashboard', status: 'in_progress', priority: 'medium' },
  ]),
}));

describe('search-tickets handler', () => {
  it('returns structuredContent with matching tickets', async () => {
    const result = await handler(
      { query: 'login', status: 'open' },
      {} as any // extras object; mock as needed
    );
    expect(result.structuredContent).toBeDefined();
    expect(result.structuredContent.tickets).toHaveLength(2);
    expect(result.structuredContent.tickets[0].id).toBe('TICK-1');
  });

  it('returns empty results for no matches', async () => {
    const { searchTickets } = await import('../../src/lib/api');
    vi.mocked(searchTickets).mockResolvedValueOnce([]);
    const result = await handler(
      { query: 'nonexistent', status: 'open' },
      {} as any
    );
    expect(result.structuredContent.tickets).toHaveLength(0);
  });

  it('handles API errors gracefully', async () => {
    const { searchTickets } = await import('../../src/lib/api');
    vi.mocked(searchTickets).mockRejectedValueOnce(new Error('API timeout'));
    const result = await handler(
      { query: 'login', status: 'open' },
      {} as any
    );
    expect(result.isError).toBe(true);
  });
});
```
This pattern works for any external dependency: REST APIs, GraphQL clients, database queries, or SDK calls. Mock at the module boundary, not inside the handler.
What to Assert on Tool Handlers
Beyond basic return values, test these things in your tool handlers:
- structuredContent shape: Does the return value match what your resource component expects?
- Token size: The structuredContent payload has a 25,000 token limit. Test with large data sets to make sure you're filtering or paginating before hitting the limit.
- Error handling: What happens when the API returns a 500? When the response is malformed?
- Input validation: Does the handler reject bad arguments before making API calls?
- Annotations: If you're submitting to the Claude Connector Directory, every tool needs readOnlyHint or destructiveHint. Assert on annotations in a unit test.
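The token-size check is easy to approximate in a unit test. A heuristic of roughly four characters per token is commonly used; the helper below is an illustrative sketch (estimateTokens is hypothetical, not a sunpeak API, and the 25,000-token figure is the limit quoted above):

```typescript
// Rough token estimate for a structuredContent payload (~4 chars per token).
// estimateTokens is a hypothetical helper for illustration only.
function estimateTokens(payload: unknown): number {
  return Math.ceil(JSON.stringify(payload).length / 4);
}

const STRUCTURED_CONTENT_TOKEN_LIMIT = 25_000;

// A deliberately large result set, as the checklist above suggests testing
const largeResult = {
  tickets: Array.from({ length: 500 }, (_, i) => ({
    id: `TICK-${i}`,
    title: `Ticket number ${i} with a reasonably long description`,
    status: 'open',
  })),
};

// In a vitest test you would assert:
//   expect(estimateTokens(largeResult)).toBeLessThan(STRUCTURED_CONTENT_TOKEN_LIMIT);
```

A character-based estimate undercounts some tokenizers and overcounts others, so treat it as a smoke test: if you are anywhere near the limit, paginate or filter in the handler rather than trusting the heuristic.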
E2E Testing with the inspector Fixture
The inspector fixture from sunpeak/test is the e2e equivalent of simulation files. It calls your tool, renders the resource component inside the sunpeak inspector in a real Chromium browser, and gives you a Playwright frame locator for assertions.
```ts
import { test, expect } from 'sunpeak/test';

test('dashboard renders revenue for Q1', async ({ inspector }) => {
  const result = await inspector.renderTool('show-dashboard', {
    quarter: 'Q1',
    year: 2026,
  });

  const app = result.app();
  await expect(app.locator('text=$142,000')).toBeVisible();
});
```
The inspector fixture mocks the host runtime. Your tool handler still runs for real (against whatever mocks you’ve set up in your test environment), but the host rendering, iframe sandboxing, and MCP protocol transport are all handled by the local inspector.
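Since the handler runs for real under this fixture, its external calls still need a strategy. One common approach, which is an assumption of this sketch rather than a sunpeak feature, is to make the API base URL configurable through an environment variable so CI can point the handler at a local mock server (API_BASE_URL and ticketsUrl are hypothetical names):

```typescript
// src/lib/api.ts (sketch): the tool handler stays real in e2e runs, but the
// base URL is swappable so tests can target a local mock server instead of
// the production API. API_BASE_URL is a hypothetical variable name.
const BASE_URL = process.env.API_BASE_URL ?? 'https://api.example.com';

export function ticketsUrl(query: string): string {
  const url = new URL('/tickets', BASE_URL);
  url.searchParams.set('q', query);
  return url.toString();
}
```

With this in place, your e2e setup can start a stub HTTP server, set API_BASE_URL to its address, and exercise the full tool-to-component pipeline without touching production.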
Testing Across Hosts and Display Modes
The inspector.renderTool() method accepts options for display mode and theme:
```ts
test('dashboard shows chart in fullscreen dark mode', async ({ inspector }) => {
  const result = await inspector.renderTool('show-dashboard', { quarter: 'Q1' }, {
    displayMode: 'fullscreen',
    theme: 'dark',
  });

  const app = result.app();
  await expect(app.locator('[data-testid="chart-container"]')).toBeVisible();
});
```
Tests automatically run against both ChatGPT and Claude hosts via Playwright projects. The defineConfig() from sunpeak/test/config sets up both host projects, so every test runs twice: once in the ChatGPT runtime and once in the Claude runtime. You don’t loop over hosts manually.
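Because defineConfig() generates the host projects for you, the Playwright config file can stay tiny. A sketch, assuming defineConfig() works with no arguments as the description above implies (check the sunpeak docs for supported overrides):

```ts
// playwright.config.ts: sketch only. defineConfig() from sunpeak/test/config
// is described above as generating one Playwright project per host runtime
// (ChatGPT and Claude), so every test file runs once per host.
import { defineConfig } from 'sunpeak/test/config';

export default defineConfig();
```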
Choosing the Right Mock for the Job
Here’s when to use each approach:
Simulation files when you want to:
- Test resource component rendering with specific data shapes
- Cover edge cases (empty data, nulls, large payloads)
- Test server tool interactions (confirm/cancel flows)
- Visually inspect states in the inspector during development
- Share test fixtures between the inspector UI and e2e tests
vi.mock("sunpeak") when you want to:
- Unit test resource components fast (milliseconds, no browser)
- Test loading, error, and cancelled states
- Test display mode behavior
- Test component logic in isolation from the MCP protocol
- Run tests in happy-dom without Playwright
vi.mock() on API modules when you want to:
- Unit test tool handlers without hitting real APIs
- Test error handling for API failures
- Verify structuredContent shape and token size
- Test input validation
The inspector fixture when you want to:
- E2E test the full rendering pipeline in a real browser
- Test cross-host differences (ChatGPT vs. Claude rendering)
- Test iframe sandboxing behavior
- Test display mode transitions and theme switching
- Run visual regression tests with --visual
Most MCP App projects use all four. Simulation files and vi.mock("sunpeak") cover the fast, frequent tests. The inspector fixture catches integration issues. Tool handler mocks keep your server logic honest.
Common Mistakes
Mocking too much. If you mock every dependency, your tests pass but don’t catch real bugs. Use simulation files and the inspector fixture for integration coverage. Save vi.mock() for the layers you actually need to isolate.
Not resetting mocks between tests. Call vi.clearAllMocks() in beforeEach, and prefer mockReturnValueOnce over mockReturnValue for per-test overrides: clearAllMocks clears recorded calls but not return values. If one test sets mockToolOutput to an error state and the next test forgets to reset it, you get confusing failures.
Forgetting to mock all hooks. If your component calls useDisplayMode() and your mock doesn’t include it, the test crashes with a cryptic error. Start with the full set of sunpeak hooks and remove the ones your component doesn’t use.
Testing implementation instead of behavior. Don’t assert that mockSetState was called with specific arguments. Assert that the UI changed. If you click “Confirm” and the component should show “Order confirmed”, assert on the text, not the state update.
Hardcoding mock data that drifts from real data. When your API response shape changes, update your simulation files and mock data. Stale mocks are worse than no mocks because they give you false confidence.
Putting It Together
A well-tested MCP App project looks like this:
```
tests/
  simulations/
    show-dashboard-q1.json      # Happy path
    show-dashboard-empty.json   # Zero values
    show-dashboard-large.json   # 100+ items
    review-purchase.json        # With serverTools mock
  e2e/
    dashboard.spec.ts           # inspector fixture tests
    review.spec.ts              # Multi-step flow tests
src/
  resources/
    dashboard/
      dashboard.test.tsx        # vi.mock("sunpeak") unit tests
  tools/
    show-dashboard.test.ts      # vi.mock() on API module
```
Run everything with pnpm test. Use pnpm test:unit or pnpm test:e2e to run them separately. Add pnpm test:visual for screenshot comparison. All of it works locally and in GitHub Actions CI/CD without paid host accounts or API credits.
sunpeak’s testing framework handles the hard parts: starting the dev server, setting up Playwright projects for both hosts, discovering simulation files, and providing the inspector fixture. You write the mocks and assertions.
Get Started
```bash
npx sunpeak new
```
Further Reading
- Complete guide to testing ChatGPT Apps and MCP Apps
- How to test Claude Connectors - unit tests, inspector, and CI/CD
- MCP App CI/CD - run your tests in GitHub Actions
- Live testing Claude Connectors and ChatGPT Apps with Playwright
- MCP App tutorial - build and test your first MCP App
- MCP App framework
- ChatGPT App framework
- Claude Connector framework
- Testing framework
Frequently Asked Questions
How do I mock MCP tool calls in tests?
For unit tests, use vi.mock() to replace sunpeak hooks like useToolData with functions that return controlled data. For e2e tests, use simulation files that define deterministic toolInput and toolResult values. The inspector fixture from sunpeak/test loads these simulations and renders your resource components with the mock data in a real browser.
What are simulation files in MCP App testing?
Simulation files are JSON files in your tests/simulations/ directory that define deterministic tool states. Each file specifies the tool name, a userMessage, toolInput (the arguments your tool receives), and toolResult (the structuredContent your resource component renders). sunpeak auto-discovers all *.json files in the simulations directory and uses them for both the inspector UI and e2e tests.
How do I mock external API calls in MCP App tool handler tests?
Use vi.mock() to replace your API client module with a mock that returns controlled responses. Import the mock, set return values with mockResolvedValue(), and assert that your tool handler returns the expected structuredContent. This keeps tool handler tests fast and deterministic without hitting real APIs.
How do I stub sunpeak hooks like useToolData and useAppState in unit tests?
Call vi.mock("sunpeak", () => ({ useToolData: () => ({ output: mockData, isError: false, isLoading: false }), useAppState: () => [mockState, mockSetState] })) at the top of your test file. Set module-level variables for your mock return values and change them between tests to cover different states.
Can I mock server tool responses in MCP App simulation files?
Yes. Add a serverTools field to your simulation file. Each key is a server tool name, and the value is an array of objects with a when condition and a result. The inspector matches the when condition against the tool arguments and returns the matching result, so you can test multi-step interactions without a real backend.
Do I need a ChatGPT or Claude account to test MCP Apps with mocks?
No. sunpeak runs a local inspector that replicates both the ChatGPT and Claude host runtimes. All mocking, from simulation files to the inspector fixture, runs locally without any paid subscriptions, API keys, or AI credits. Tests run the same way in CI/CD.
What is the inspector fixture in sunpeak testing?
The inspector fixture is a Playwright test fixture exported from sunpeak/test. It provides a renderTool() method that invokes a tool with mock arguments, renders the resource component inside the sunpeak inspector, and returns a frame locator for assertions. Tests automatically run against both ChatGPT and Claude hosts via Playwright projects.
How do I test error states and edge cases in MCP App resource components?
Create simulation files with edge-case data: empty arrays, null fields, very long strings, missing optional fields, or isError set to true. In unit tests, set your mock useToolData to return isError: true or isLoading: true to test loading and error states. Cover cancelled states by setting isCancelled: true.