Snapshot Testing MCP Apps, ChatGPT Apps, and Claude Connectors (June 2026)

Abe Wheeler
Snapshot testing MCP App resource components and tool output.

MCP App resource components are React components, but the thing you are testing is bigger than a React tree. A tool returns content, structuredContent, and sometimes private _meta. The tool points at a ui:// resource. The host renders that resource in an iframe, then passes display mode, theme, safe area, and host context through the bridge.
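To make those layers concrete, here is a minimal sketch of the shapes involved. The type names (`ToolResult`, `ToolDefinition`) and the sample values are assumptions for illustration, not types exported by any SDK. Only the field names `content`, `structuredContent`, `_meta`, and `_meta.ui.resourceUri` come from this article.

```typescript
// Hypothetical shapes for illustration; real SDK types may differ.
type TextContent = { type: 'text'; text: string };

interface ToolResult {
  content: TextContent[]; // text fallback for the model and non-UI clients
  structuredContent?: Record<string, unknown>; // data the resource component renders
  _meta?: Record<string, unknown>; // private data for the UI
}

interface ToolDefinition {
  name: string;
  _meta?: { ui?: { resourceUri?: string } }; // points at the ui:// resource
}

const tool: ToolDefinition = {
  name: 'show-dashboard',
  _meta: { ui: { resourceUri: 'ui://dashboard' } },
};

const result: ToolResult = {
  content: [{ type: 'text', text: 'Dashboard for Q2 2026: $142,000 revenue across 47 deals.' }],
  structuredContent: { quarter: 'Q2', year: 2026, revenue: 142000, deals: 47 },
};

// The three things a snapshot can pin down: the link, the fallback, and the data shape.
const contract = {
  resourceUri: tool._meta?.ui?.resourceUri,
  fallbackText: result.content[0]?.text,
  dataKeys: Object.keys(result.structuredContent ?? {}).sort(),
};
```

Each property of `contract` maps to one of the snapshot targets covered in the rest of this post.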

Snapshot testing is useful because those contracts are easy to break with a small refactor. A renamed field in structuredContent, a missing _meta.ui.resourceUri, or an accidental loading-state change can keep an MCP App, ChatGPT App, or Claude Connector from rendering the data the user expects.

TL;DR: Use toMatchSnapshot() for resource component markup and toMatchInlineSnapshot() for small tool result contracts. Snapshot structuredContent, selected _meta, resource links, and important host states. Normalize timestamps and random IDs before asserting. Then use sunpeak E2E, visual, live-host, and eval tests for the behavior and browser rendering snapshots cannot cover.

What Changed Since the First Version

The core snapshot idea has not changed: render a stable output, save it, and fail when the output changes. The MCP App ecosystem has moved, though, so the better 2026 snapshot target is no longer just “React component HTML.”

As of the June 2026 refresh, the MCP Apps specification is the shared UI layer for MCP-compatible hosts. OpenAI’s current Apps SDK docs describe tool results in terms of structuredContent, content, and _meta, and the portable MCP Apps fields should be the source of truth for new cross-host apps. sunpeak now positions the local Inspector and test runner around replicated ChatGPT and Claude runtimes, simulation fixtures, visual regression tests, live host tests, and multi-model evals.

That means snapshot tests should protect the contracts between those layers:

  • The backend tool contract your model and UI consume.
  • The resource metadata that tells the host which UI to render.
  • The React resource markup for each meaningful state.
  • The host-state branches for display mode, theme, safe area, and host capabilities.

Snapshots are still a fast unit-test layer. They are most useful when they prove that the data and resource contract did not drift before you pay the cost of browser tests.

What to Snapshot in an MCP App

For most MCP Apps, snapshot testing works best as a small set of contract tests instead of one giant render snapshot. Start with these layers.

| Layer | Snapshot | Why it matters |
| --- | --- | --- |
| Tool output | structuredContent and selected _meta | Catches data-shape changes before the resource breaks |
| Tool definition | inputSchema, outputSchema, annotations, _meta.ui.resourceUri | Catches broken host wiring and missing resource links |
| Resource metadata | URI, MIME type, CSP, permissions, visibility | Catches iframe and host bridge config drift |
| Resource component | Focused HTML subtree | Catches markup changes that affect the UI contract |
| Host state | Display mode, theme, safe area, capabilities | Catches branches that only render in specific hosts or modes |

Do not snapshot everything just because you can. A snapshot should help a reviewer answer one question: “Did the contract change in a way we meant to change?”

Snapshot Testing Resource Components

Start with a focused resource component snapshot. Mock the sunpeak hooks, render the component, and snapshot the smallest meaningful subtree.

import { render } from '@testing-library/react';
import { beforeEach, describe, expect, it, vi } from 'vitest';
import { DashboardResource } from './dashboard';

let mockToolOutput: Record<string, unknown> = {};
let mockDisplayMode: 'inline' | 'fullscreen' | 'pip' = 'inline';

vi.mock('sunpeak', () => ({
  useToolData: () => ({
    output: mockToolOutput,
    input: null,
    inputPartial: null,
    isError: false,
    isLoading: false,
    isCancelled: false,
    cancelReason: null,
  }),
  useAppState: () => [{}, vi.fn()],
  useDisplayMode: () => mockDisplayMode,
  useRequestDisplayMode: () => ({
    availableModes: ['inline', 'fullscreen', 'pip'],
    requestDisplayMode: vi.fn(),
  }),
  useHostInfo: () => ({
    hostName: 'chatgpt',
    hostVersion: undefined,
    hostCapabilities: { serverTools: true },
  }),
  SafeArea: ({ children }: { children: React.ReactNode }) => <div>{children}</div>,
}));

describe('DashboardResource snapshots', () => {
  beforeEach(() => {
    vi.clearAllMocks();
    mockDisplayMode = 'inline';
    mockToolOutput = {
      quarter: 'Q2',
      year: 2026,
      revenue: 142000,
      deals: 47,
      topProduct: 'Enterprise Plan',
    };
  });

  it('renders dashboard summary', () => {
    const { container } = render(<DashboardResource />);
    const summary = container.querySelector('[data-testid="dashboard-summary"]');

    expect(summary).toMatchSnapshot();
  });

  it('renders empty state', () => {
    mockToolOutput = {
      quarter: 'Q2',
      year: 2026,
      revenue: 0,
      deals: 0,
      topProduct: null,
    };

    const { container } = render(<DashboardResource />);
    expect(container.querySelector('[data-testid="empty-state"]')).toMatchSnapshot();
  });
});

The first run writes a .snap file next to the test:

src/resources/dashboard/
  dashboard.tsx
  dashboard.test.tsx
  __snapshots__/
    dashboard.test.tsx.snap

On later runs, Vitest compares the current output to the saved snapshot. If someone changes a class, removes an element, or changes the copy, the test fails with a text diff.

- Snapshot  - 1
+ Received  + 1

  <section data-testid="dashboard-summary">
    <h2>Q2 2026</h2>
-   <span class="revenue">$142,000</span>
+   <span class="revenue-amount">$142,000</span>
    <p>47 deals</p>
  </section>

That kind of diff is useful. It tells you exactly what changed without launching a browser.
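If the write-then-compare cycle is new to you, the following conceptual sketch shows the idea in a few lines. It is not how Vitest stores or diffs snapshots internally. `checkSnapshot` and the in-memory `store` are hypothetical stand-ins for the .snap file.

```typescript
// Conceptual sketch only; Vitest's real snapshot storage and diffing are more involved.
type SnapshotOutcome = 'written' | 'passed' | 'failed';

function checkSnapshot(
  store: Map<string, string>,
  snapshotName: string,
  value: unknown,
): SnapshotOutcome {
  const serialized = JSON.stringify(value, null, 2);
  const saved = store.get(snapshotName);
  if (saved === undefined) {
    store.set(snapshotName, serialized); // first run: save the baseline
    return 'written';
  }
  // later runs: any serialized difference is a failure until someone updates the baseline
  return saved === serialized ? 'passed' : 'failed';
}

const store = new Map<string, string>();
const first = checkSnapshot(store, 'summary', { className: 'revenue', text: '$142,000' });
const second = checkSnapshot(store, 'summary', { className: 'revenue', text: '$142,000' });
const third = checkSnapshot(store, 'summary', { className: 'revenue-amount', text: '$142,000' });
// first is 'written', second is 'passed', third is 'failed'
```

The third call fails for the same reason as the diff above: the class name changed, and nobody has approved the new baseline yet.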

Snapshot Tool Results, Not Just Markup

The most valuable MCP App snapshots often live on the backend side. Your resource component depends on structuredContent. The host and model depend on content. The UI may depend on private _meta. If those fields drift, a pretty React snapshot will not save you.

Snapshot the tool result after removing fields that should change on every run:

import { describe, expect, it, vi } from 'vitest';
import handler from '../../src/tools/show-dashboard';

vi.mock('../../src/lib/api', () => ({
  getDashboardData: vi.fn().mockResolvedValue({
    generatedAt: '2026-06-17T12:05:02.331Z',
    revenue: 142000,
    deals: 47,
    topProduct: 'Enterprise Plan',
  }),
}));

function stableToolResult(result: Awaited<ReturnType<typeof handler>>) {
  return {
    content: result.content,
    structuredContent: {
      ...result.structuredContent,
      generatedAt: '<iso timestamp>',
    },
    meta: {
      resourceUri: result._meta?.ui?.resourceUri,
      visibility: result._meta?.ui?.visibility,
    },
  };
}

describe('show-dashboard tool result', () => {
  it('returns the UI contract', async () => {
    const result = await handler({ quarter: 'Q2', year: 2026 }, {} as any);

    expect(stableToolResult(result)).toMatchInlineSnapshot(`
      {
        "content": [
          {
            "text": "Dashboard for Q2 2026: $142,000 revenue across 47 deals.",
            "type": "text",
          },
        ],
        "meta": {
          "resourceUri": "ui://dashboard",
          "visibility": "model-and-app",
        },
        "structuredContent": {
          "deals": 47,
          "generatedAt": "<iso timestamp>",
          "quarter": "Q2",
          "revenue": 142000,
          "topProduct": "Enterprise Plan",
          "year": 2026,
        },
      }
    `);
  });
});

This catches changes that matter to MCP Apps:

  • structuredContent.revenue was renamed to amount.
  • The text fallback disappeared, so non-UI clients get a blank result.
  • _meta.ui.resourceUri points at the wrong resource.
  • _meta.ui.visibility changed and the resource can no longer call the tool it needs.

Pair this with a schema assertion when the tool declares an outputSchema. The schema tells you the value is valid. The snapshot tells you whether the reviewed contract changed.

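// DashboardOutput is assumed to be the schema object the tool declares as its outputSchema.
it('matches output schema and reviewed snapshot', async () => {
  const result = await handler({ quarter: 'Q2', year: 2026 }, {} as any);

  expect(() => DashboardOutput.parse(result.structuredContent)).not.toThrow();
  // Reuse stableToolResult so the timestamp stays normalized, as in the previous test.
  expect(stableToolResult(result).structuredContent).toMatchInlineSnapshot(`
    {
      "deals": 47,
      "generatedAt": "<iso timestamp>",
      "quarter": "Q2",
      "revenue": 142000,
      "topProduct": "Enterprise Plan",
      "year": 2026,
    }
  `);
});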
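The text fallback deserves its own explicit check, because an empty string still satisfies a snapshot once someone approves it. Here is a minimal sketch, assuming the result shape used in the tests above. `hasTextFallback` is a hypothetical helper, not part of sunpeak or any SDK.

```typescript
// Hypothetical helper; the result shape mirrors the fields used in the tests above.
type ContentBlock = { type: string; text?: string };

function hasTextFallback(result: { content?: ContentBlock[] }): boolean {
  // At least one text block must carry non-whitespace text for non-UI clients.
  return (result.content ?? []).some(
    (block) =>
      block.type === 'text' &&
      typeof block.text === 'string' &&
      block.text.trim().length > 0,
  );
}

const ok = hasTextFallback({ content: [{ type: 'text', text: 'Dashboard for Q2 2026.' }] });
const blank = hasTextFallback({ content: [{ type: 'text', text: '   ' }] });
const missing = hasTextFallback({});
// ok is true, blank is false, missing is false
```

In a test you could assert `expect(hasTextFallback(result)).toBe(true)` next to the snapshot, so the fallback cannot be blanked out by a careless snapshot update.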

The tool result is only one side of the contract. A UI-capable MCP tool also needs to point at the resource the host should render. In portable MCP Apps, that usually means _meta.ui.resourceUri on the tool definition and text/html;profile=mcp-app on the resource.

Snapshot the small metadata object that wires the pieces together:

import { describe, expect, it } from 'vitest';
import { getToolDefinition, getResource } from '../test/mcp-introspection';

describe('dashboard MCP App wiring', () => {
  it('keeps tool and resource metadata stable', async () => {
    const tool = await getToolDefinition('show-dashboard');
    const resource = await getResource('ui://dashboard');

    expect({
      tool: {
        name: tool.name,
        title: tool.title,
        annotations: tool.annotations,
        outputSchema: tool.outputSchema,
        resourceUri: tool._meta?.ui?.resourceUri,
      },
      resource: {
        uri: resource.uri,
        mimeType: resource.mimeType,
        meta: resource._meta?.ui,
      },
    }).toMatchInlineSnapshot(`
      {
        "resource": {
          "meta": {
            "csp": {
              "connectDomains": [
                "https://api.example.com",
              ],
              "resourceDomains": [
                "https://cdn.example.com",
              ],
            },
            "domain": "https://dashboard.example.com",
          },
          "mimeType": "text/html;profile=mcp-app",
          "uri": "ui://dashboard",
        },
        "tool": {
          "annotations": {
            "openWorldHint": false,
            "readOnlyHint": true,
          },
          "name": "show-dashboard",
          "outputSchema": {
            "type": "object",
          },
          "resourceUri": "ui://dashboard",
          "title": "Show dashboard",
        },
      }
    `);
  });
});

This is also a good place to protect Claude Connector annotations. Directory review and host UX can depend on hints such as readOnlyHint, destructiveHint, and openWorldHint, so accidental changes should be visible in review.
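When an app has several UI tools, it can help to build one compact summary for all of them and snapshot that. The sketch below is one possible approach. `ToolDef`, `ResourceDef`, and `summarizeWiring` are hypothetical names, and the shapes are assumptions you would adapt to whatever your introspection helper returns.

```typescript
// Hypothetical shapes; adapt them to whatever your introspection helper returns.
interface ToolDef {
  name: string;
  annotations?: { readOnlyHint?: boolean; destructiveHint?: boolean; openWorldHint?: boolean };
  _meta?: { ui?: { resourceUri?: string } };
}

interface ResourceDef {
  uri: string;
  mimeType: string;
}

const MCP_APP_MIME = 'text/html;profile=mcp-app';

function summarizeWiring(tools: ToolDef[], resources: ResourceDef[]) {
  const byUri = new Map<string, ResourceDef>();
  for (const resource of resources) byUri.set(resource.uri, resource);

  return [...tools]
    .sort((a, b) => a.name.localeCompare(b.name)) // stable order keeps the snapshot quiet
    .map((tool) => {
      const resourceUri = tool._meta?.ui?.resourceUri ?? null;
      const resource = resourceUri === null ? undefined : byUri.get(resourceUri);
      return {
        name: tool.name,
        annotations: tool.annotations ?? {},
        resourceUri,
        // null: tool has no UI. false: link is dangling or has the wrong MIME type.
        resourceOk:
          resourceUri === null
            ? null
            : resource !== undefined && resource.mimeType === MCP_APP_MIME,
      };
    });
}

const summary = summarizeWiring(
  [
    {
      name: 'show-dashboard',
      annotations: { readOnlyHint: true, openWorldHint: false },
      _meta: { ui: { resourceUri: 'ui://dashboard' } },
    },
    {
      name: 'delete-report',
      annotations: { destructiveHint: true },
      _meta: { ui: { resourceUri: 'ui://missing' } },
    },
  ],
  [{ uri: 'ui://dashboard', mimeType: MCP_APP_MIME }],
);
```

Here `delete-report` sorts first and reports `resourceOk: false` because `ui://missing` is not registered, which is exactly the kind of drift a reviewer should see in a diff.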

Snapshot Display Modes and Host State

MCP Apps can render inline, fullscreen, or picture-in-picture. They also react to the host, theme, safe-area insets, and host capabilities. A component may show a compact summary inline and a richer table in fullscreen. Snapshot those branches directly.

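// mockHostInfo and mockTheme are assumed to be module-level variables that the
// vi.mock factory reads, the same way it reads mockDisplayMode.
const cases = [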
  { hostName: 'chatgpt', theme: 'light', displayMode: 'inline' },
  { hostName: 'chatgpt', theme: 'dark', displayMode: 'fullscreen' },
  { hostName: 'claude', theme: 'light', displayMode: 'inline' },
  { hostName: 'claude', theme: 'dark', displayMode: 'pip' },
] as const;

it.each(cases)('renders %o', ({ hostName, theme, displayMode }) => {
  mockHostInfo = { hostName, hostCapabilities: { serverTools: true } };
  mockTheme = theme;
  mockDisplayMode = displayMode;

  const { container } = render(<DashboardResource />);
  expect(container.querySelector('[data-testid="dashboard-shell"]')).toMatchSnapshot();
});

Keep the matrix intentional. You do not need to snapshot every permutation if most permutations produce the same markup. Use snapshots for branches that change the DOM. Use browser tests for CSS-only differences, safe-area layout, real focus behavior, and iframe sizing.
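One way to keep the matrix intentional is to generate every permutation and then keep a single case per DOM branch. The sketch below assumes a component where theme is CSS-only and picture-in-picture renders the same markup as inline. `domBranchKey` and `intentionalCases` are hypothetical helpers, and you would encode your own component's branching rules in the key function.

```typescript
type HostName = 'chatgpt' | 'claude';
type Theme = 'light' | 'dark';
type DisplayMode = 'inline' | 'fullscreen' | 'pip';

interface HostCase {
  hostName: HostName;
  theme: Theme;
  displayMode: DisplayMode;
}

// Hypothetical: describes which inputs actually change this component's DOM.
// Assumption for this sketch: theme is CSS-only, and pip renders like inline.
function domBranchKey(c: HostCase): string {
  const layout = c.displayMode === 'fullscreen' ? 'full' : 'compact';
  return `${c.hostName}:${layout}`;
}

// Keep the first case seen for each distinct DOM branch.
function intentionalCases(all: HostCase[], keyOf: (c: HostCase) => string): HostCase[] {
  const seen = new Set<string>();
  return all.filter((c) => {
    const key = keyOf(c);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const hosts: HostName[] = ['chatgpt', 'claude'];
const themes: Theme[] = ['light', 'dark'];
const modes: DisplayMode[] = ['inline', 'fullscreen', 'pip'];

const allCases: HostCase[] = [];
for (const hostName of hosts) {
  for (const theme of themes) {
    for (const displayMode of modes) {
      allCases.push({ hostName, theme, displayMode });
    }
  }
}

const snapshotCases = intentionalCases(allCases, domBranchKey);
// 12 permutations collapse to 4 cases: one compact and one fullscreen per host
```

The resulting array can be passed to `it.each`, and the key function doubles as documentation of which host states the component actually branches on.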

With sunpeak, the same state matrix can move up into E2E tests through the Inspector. Unit snapshots verify the React contract. Inspector tests verify the rendered iframe inside replicated ChatGPT and Claude runtimes.

Snapshot Loading, Error, Empty, and Cancelled States

Non-happy paths are easy to miss because they often require exact tool or host timing. Snapshot them once so they cannot disappear quietly.

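import { useToolData } from 'sunpeak';

// vi.mocked(...).mockReturnValue only works when the vi.mock factory defines
// useToolData as a mock function, for example useToolData: vi.fn(() => ({ ... })),
// rather than the plain arrow function used in the first example.
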
it('renders loading state', () => {
  vi.mocked(useToolData).mockReturnValue({
    output: null,
    input: null,
    inputPartial: null,
    isError: false,
    isLoading: true,
    isCancelled: false,
    cancelReason: null,
  });

  const { container } = render(<DashboardResource />);
  expect(container.querySelector('[data-testid="loading"]')).toMatchSnapshot();
});

it('renders cancelled state', () => {
  vi.mocked(useToolData).mockReturnValue({
    output: null,
    input: null,
    inputPartial: null,
    isError: false,
    isLoading: false,
    isCancelled: true,
    cancelReason: 'User cancelled the request',
  });

  const { container } = render(<DashboardResource />);
  expect(container.querySelector('[data-testid="cancelled"]')).toMatchSnapshot();
});

These tests are small, but they protect real user experience. A spinner, empty table, auth error, or cancelled state is still part of the app contract.
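It is easier to confirm that every state has a snapshot when the component resolves its state in one place. The sketch below is one way to do that, not something sunpeak provides. `ToolData` mirrors the fields mocked for `useToolData` above, and `resolveViewState` and `Dashboard` are hypothetical names.

```typescript
// Hypothetical shape, mirroring the fields mocked for useToolData in the tests above.
interface ToolData<T> {
  output: T | null;
  isError: boolean;
  isLoading: boolean;
  isCancelled: boolean;
}

type ViewState = 'loading' | 'cancelled' | 'error' | 'empty' | 'ready';

// One branch per state, so the list of snapshots to write is the ViewState union.
function resolveViewState<T>(data: ToolData<T>, isEmpty: (output: T) => boolean): ViewState {
  if (data.isLoading) return 'loading';
  if (data.isCancelled) return 'cancelled';
  if (data.isError) return 'error';
  if (data.output === null || isEmpty(data.output)) return 'empty';
  return 'ready';
}

interface Dashboard {
  deals: number;
}

const isEmptyDashboard = (output: Dashboard): boolean => output.deals === 0;
const idle: ToolData<Dashboard> = {
  output: null,
  isError: false,
  isLoading: false,
  isCancelled: false,
};

const loadingState = resolveViewState({ ...idle, isLoading: true }, isEmptyDashboard);
const cancelledState = resolveViewState({ ...idle, isCancelled: true }, isEmptyDashboard);
const emptyState = resolveViewState({ ...idle, output: { deals: 0 } }, isEmptyDashboard);
const readyState = resolveViewState({ ...idle, output: { deals: 47 } }, isEmptyDashboard);
```

With a union like `ViewState`, a reviewer can check the snapshot file against five names instead of reading the component to discover which branches exist.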

Normalize Nondeterministic Values

Snapshots fail when any serialized value changes. That is useful for reviewed output and painful for unstable output. Normalize or remove values that are expected to change.

Good candidates for normalization:

  • ISO timestamps.
  • Random IDs.
  • Request IDs and trace IDs.
  • Build hashes.
  • OAuth tokens, session IDs, and user-specific private data.
  • Relative ordering from APIs that do not guarantee order.

One pattern is to make a small serializer for the exact value you are snapshotting.

function stableDashboardSnapshot(result: DashboardResult) {
  return {
    ...result,
    generatedAt: '<iso timestamp>',
    requestId: '<request id>',
    rows: [...result.rows].sort((a, b) => a.id.localeCompare(b.id)),
  };
}

expect(stableDashboardSnapshot(result)).toMatchInlineSnapshot();

Do not hide real instability with too much normalization. If order matters to the resource component, do not sort it away. If an ID appears in the DOM and a click handler depends on it, test the stable contract that the UI actually needs.
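If many tools return timestamps or generated IDs, a generic pass can replace anything that looks unstable before the value reaches the snapshot. The following is a minimal sketch under that assumption. `normalizeForSnapshot` and the two patterns are hypothetical, and they only cover ISO 8601 timestamps and UUID-shaped strings. Array order is left untouched on purpose, in line with the warning above.

```typescript
const ISO_TIMESTAMP = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Recursively replaces timestamp-like and UUID-like strings with placeholders.
function normalizeForSnapshot(value: unknown): unknown {
  if (typeof value === 'string') {
    if (ISO_TIMESTAMP.test(value)) return '<iso timestamp>';
    if (UUID.test(value)) return '<uuid>';
    return value;
  }
  if (Array.isArray(value)) {
    // Order is preserved: sorting is a per-snapshot decision, not a default.
    return value.map((item) => normalizeForSnapshot(item));
  }
  if (value !== null && typeof value === 'object') {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = normalizeForSnapshot(source[key]);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through unchanged
}

const normalized = normalizeForSnapshot({
  generatedAt: '2026-06-17T12:05:02.331Z',
  requestId: '3f2b8c1e-9a4d-4c6b-8f1e-2d7a9b0c5e31',
  revenue: 142000,
  rows: [{ id: 'b' }, { id: 'a' }],
});
```

Pattern-based normalization is convenient, but it can also mask a field that should be stable, so prefer the explicit per-field serializer when the contract is small.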

When Snapshots Help

Snapshots are a good fit when the output is structured, reviewed, and cheap to serialize.

  • Complex resource markup with tables, nested cards, filters, or grouped data.
  • Tool handlers that return structuredContent used by the UI.
  • Metadata that links tools, resources, domains, CSP, and host permissions.
  • Display-mode branches where the DOM changes.
  • Loading, error, empty, and cancelled states.
  • Claude Connector annotations and other tool discovery fields.

They are less useful for tiny components. A component that renders one label is better covered by a normal assertion:

expect(screen.getByText('No results')).toBeInTheDocument();

They are also the wrong tool for visual bugs. A CSS change can break the layout while the HTML snapshot stays exactly the same. For that, use visual regression tests.

How Snapshots Fit in a sunpeak Test Suite

sunpeak gives you several test layers for MCP Apps, ChatGPT Apps, and Claude Connectors:

  • Unit tests for pure functions, tools, hooks, and resource components.
  • Snapshot tests for reviewed HTML, structuredContent, and metadata contracts.
  • Inspector E2E tests for user behavior in replicated ChatGPT and Claude runtimes.
  • Visual regression tests for screenshots across hosts, themes, display modes, and viewport sizes.
  • Live host tests when you need to confirm behavior in the real ChatGPT or Claude host.
  • Multi-model evals when you need to check whether models choose and call tools correctly.

Use snapshots early in that stack. They fail fast, often in milliseconds, and tell you whether a contract changed before a browser opens. Then use the Inspector and Playwright tests for behavior the snapshot cannot prove.

For example, a dashboard resource might use this split:

| Test | What it proves |
| --- | --- |
| Tool result snapshot | structuredContent still matches the reviewed UI contract |
| Resource metadata snapshot | The tool still points at ui://dashboard with the right CSP |
| Component snapshot | The summary DOM still has the expected structure |
| Inspector E2E test | A user can open fullscreen and filter rows |
| Visual test | The table still fits in inline mode and dark theme |
| Live test | The deployed app still loads in the real host |

That division keeps each test honest. A text snapshot cannot see layout, so if you rely on it to guard layout, it will miss the bug. If a browser test starts checking every JSON field, it becomes slow and hard to review.

Managing Snapshots in Practice

The long-term problem with snapshots is review discipline. They only help if people read the diff.

Update intentionally. When a snapshot fails, read the diff before running -u. If the change was expected, update the snapshot. If the change was accidental, fix the code.

Commit snapshots with the code change. A snapshot update should sit next to the tool, metadata, or resource change that caused it. Reviewers need both sides to judge the change.

Keep snapshots focused. Prefer the table, summary, or metadata object over the whole page. Smaller snapshots are easier to review and less likely to churn.

Delete stale snapshots. When you remove or rename a test, its old entry can remain in the .snap file. Vitest reports these entries as obsolete, so clean them up periodically and keep the file limited to snapshots that a test still checks.

Name test states like a reviewer. "renders dashboard" is vague. "renders inline empty state without actions" tells the reviewer what contract the snapshot protects.

Running Snapshot Tests

Snapshot tests run as part of your unit test suite.

# Run all unit tests, including snapshots
pnpm test:unit

# Update snapshots after an intentional change.
# In Vitest, an update run also removes obsolete snapshot entries.
pnpm test:unit -- -u

In CI/CD, pnpm test should run the unit layer before slower E2E and visual tests. That gives you a fast failure when a contract changed, then broader coverage when the contract is still stable. The MCP App CI/CD guide covers the full pipeline.

Snapshot testing is not the whole MCP App testing strategy. It is the cheap contract layer that keeps structuredContent, metadata, and resource markup from drifting. Use it with sunpeak simulation fixtures, mocks, Inspector E2E tests, and visual regression tests, and you get fast reviewable diffs plus real browser confidence.

Get Started

npx sunpeak new

Frequently Asked Questions

What is snapshot testing for MCP Apps?

Snapshot testing serializes the output of your MCP App resource component, tool handler, resource metadata, or rendered HTML and compares it to a saved baseline. In MCP Apps, snapshots are useful because a small change to structuredContent, _meta.ui.resourceUri, display mode handling, or iframe markup can break the UI even when the tool still returns data.

What should I snapshot test in an MCP App?

Start with three things: resource component markup, tool handler structuredContent, and the metadata that connects a tool to a UI resource. Add snapshots for loading, error, empty, cancelled, inline, fullscreen, picture-in-picture, light theme, and dark theme states when those states change the output.

Should MCP App snapshots include structuredContent or _meta?

Snapshot structuredContent when your resource component depends on its exact shape. Snapshot _meta when it contains app-only data, resource links, CSP, visibility, or host bridge configuration. Do not snapshot secrets, OAuth tokens, request IDs, timestamps, randomized IDs, or user-specific private data.

How do I keep MCP App snapshots from becoming noisy?

Snapshot the smallest output that proves the contract, normalize nondeterministic fields before asserting, and avoid whole-page snapshots for simple components. A useful snapshot tells a reviewer what changed. A noisy snapshot makes every refactor look risky.

What is the difference between snapshot testing and visual regression testing for MCP Apps?

Snapshot testing compares serialized HTML, JSON, or metadata as text. Visual regression testing compares screenshots in a real browser. Snapshots are fast and catch structural or data-shape changes. Visual regression tests catch CSS, layout, theme, safe-area, and rendering bugs that text snapshots cannot see.

Can I snapshot test ChatGPT Apps and Claude Connectors the same way?

Yes. ChatGPT Apps and interactive Claude Connectors both use MCP App-style resources, tool results, and host bridge state. Keep the core snapshots host-neutral, then add host-specific snapshots only when ChatGPT or Claude changes the resource output, metadata, or available display modes.

How do I update snapshots after an intentional MCP App change?

Run your unit snapshot command with -u, usually pnpm test:unit -- -u. Review the diff before committing it. The code change and the matching snapshot update should be reviewed together so accidental structuredContent, metadata, or markup changes do not slip through.

Do snapshot tests replace MCP App E2E tests?

No. Snapshot tests are a fast contract layer. They do not prove that user clicks work, that the iframe renders correctly inside a host, or that CSS holds up across themes and display modes. Use snapshots with E2E tests, visual regression tests, live host tests, and tool-calling evals.