MCP App Conformance Testing for ChatGPT Apps and Codex Connectors
Conformance testing verifies that MCP App tools, resources, and host bridge behavior are wired correctly.
Most MCP App test plans jump straight to browser automation. That makes sense once the app renders, but it skips the first failure point: can the host discover the app, read the resource, trust the metadata, and get useful fallback content if the UI does not render?
That is the job of conformance testing. It is the fast test layer that proves your MCP App surface is wired correctly before you spend time on E2E tests, visual regression tests, or live host testing.
TL;DR: Add conformance tests that call tools/list, read every ui:// resource referenced by an app tool, validate text/html;profile=mcp-app, check tool schemas and annotations, call each tool with fixture input, and verify the result has useful content plus valid structuredContent. Run these tests in CI before slower rendered tests. Conformance tests do not prove the whole app works, but they catch the broken wiring that prevents ChatGPT Apps and Codex Connectors from loading at all.
What Conformance Testing Means
For MCP Apps, conformance testing means verifying your server exposes the app contract that MCP hosts expect.
The MCP Apps overview describes the core pattern: a tool declares a UI resource, the host fetches that resource, and the resource renders interactive HTML inside a sandboxed iframe. The same app can still return normal MCP tool output, so clients that do not support app UI should get useful text.
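The pattern above can be sketched as three plain data shapes. This is a simplified illustration, not the SDK's real types: the interface names, the `isUiCapable` helper, and the example values are assumptions. Only `_meta.ui.resourceUri`, the `ui://` scheme, the MIME type, `content`, and `structuredContent` come from the article.

```typescript
// Hypothetical, simplified shapes; real SDK types carry more fields.
interface ToolDescriptor {
  name: string;
  description?: string;
  inputSchema?: { type: string };
  _meta?: { ui?: { resourceUri?: string } };
}

interface ResourceContent {
  uri: string;
  mimeType: string;
  text: string;
}

interface ToolResult {
  content: Array<{ type: string; text?: string }>;
  structuredContent?: Record<string, unknown>;
}

const MCP_APP_MIME = 'text/html;profile=mcp-app';

// A tool is UI-capable when it declares a ui:// resource for the host to fetch.
function isUiCapable(tool: ToolDescriptor): boolean {
  return tool._meta?.ui?.resourceUri?.startsWith('ui://') ?? false;
}

// Example values, made up for this sketch.
const showReport: ToolDescriptor = {
  name: 'show-report',
  description: 'Render a revenue report for a given report ID.',
  inputSchema: { type: 'object' },
  _meta: { ui: { resourceUri: 'ui://report' } },
};

// What the host gets back when it reads the declared resource.
const reportResource: ResourceContent = {
  uri: 'ui://report',
  mimeType: MCP_APP_MIME,
  text: '<!doctype html><html><body><div id="root"></div></body></html>',
};

// The same tool call still carries text for clients that do not render UI.
const reportResult: ToolResult = {
  content: [{ type: 'text', text: 'Displayed the Q2 revenue report.' }],
  structuredContent: { title: 'Q2 revenue', rows: [] },
};
```

Every check later in this article is an assertion about one of these three shapes.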
A conformance test does not click every button or compare screenshots. It answers narrower questions:
- Can the host list the tool?
- Does the tool have a useful name, title, description, input schema, and annotations?
- Does the tool point at a readable ui:// resource?
- Does that resource return HTML with the MCP App MIME type?
- Does the resource declare the CSP, permissions, and metadata it needs?
- Does the tool return valid content, structuredContent, and _meta?
- Does the app degrade cleanly when the client does not render UI?
Those checks sound basic, but they catch a lot of production bugs. A renamed resource folder, missing outputSchema, broken annotation, stale ui:// URI, or empty fallback message can make an app fail before your React code ever runs.
The Conformance Checklist
Use this checklist as the first test file in an MCP App project.
| Layer | What to verify | Why it fails |
|---|---|---|
| Tool discovery | Tool appears in tools/list with stable name, title, description, input schema, and annotations | Missing export, renamed file, bad build output |
| UI resource link | UI-capable tool points at a ui:// resource | _meta.ui.resourceUri drifted from the real resource name |
| Resource read | resources/read returns one HTML resource | Host can list the tool but cannot fetch the iframe |
| MIME type | Resource uses text/html;profile=mcp-app | Host treats the resource as plain HTML or a generic resource |
| Resource metadata | CSP, permissions, border hints, and other _meta.ui fields are present when needed | External assets, tool calls, or host framing fail at runtime |
| Tool result | Tool returns useful content, valid structuredContent, and UI-only _meta | UI renders, but the model or fallback client gets bad data |
| Graceful fallback | Non-UI clients get a short text summary | The app works only in UI hosts |
| Host bridge smoke | Resource initializes without console errors in each target host mode | window bridge assumptions break before user interaction |
Run conformance tests before deeper tests. If conformance fails, the E2E failure it causes is usually noisier and points less directly at the broken wiring.
Test Tool Discovery First
Start with tools/list. Every host discovers your tools through this metadata, so missing or vague metadata is the cheapest problem to catch.
import { test, expect } from 'sunpeak/test';

test('app tools are discoverable and documented', async ({ mcp }) => {
  const tools = await mcp.listTools();
  // UI-capable tools are the ones that declare a UI resource.
  const appTools = tools.filter((tool) => tool._meta?.ui?.resourceUri);

  expect(appTools.length).toBeGreaterThan(0);

  for (const tool of appTools) {
    expect(tool.name).toMatch(/^[a-z0-9_-]+$/);
    expect(tool.title ?? tool.annotations?.title).toBeTruthy();
    // Guard against a missing description so the failure is an assertion, not a TypeError.
    expect(tool.description?.length ?? 0).toBeGreaterThan(20);
    expect(tool.inputSchema?.type).toBe('object');
    expect(tool.annotations).toBeDefined();
  }
});
This test catches three common problems:
- The tool never registered because a file was renamed or not exported.
- The tool exists but has a thin description, so the model has poor discovery context.
- The tool has no annotations, which makes host review and confirmation behavior harder to reason about.
Pair this with deeper tool annotation tests for readOnlyHint, destructiveHint, idempotentHint, and openWorldHint.
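As a starting point for those annotation tests, here is a minimal sketch of a pure helper you could call from any test runner. The four hint names come from the MCP tool annotations. The `annotationProblems` function and its two policy rules (always declare `readOnlyHint`, never combine read-only with destructive) are assumptions for illustration, not part of sunpeak or the MCP SDK.

```typescript
// Hint names come from MCP tool annotations; the policy rules are a hypothetical sketch.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

const HINT_KEYS = ['readOnlyHint', 'destructiveHint', 'idempotentHint', 'openWorldHint'] as const;

// Returns a list of human-readable problems; an empty list means the annotations pass.
function annotationProblems(toolName: string, annotations: ToolAnnotations | undefined): string[] {
  if (!annotations) return [`${toolName}: annotations are missing`];

  const problems: string[] = [];

  // Wire data is untyped at runtime, so check the value types explicitly.
  for (const key of HINT_KEYS) {
    const value: unknown = annotations[key];
    if (value !== undefined && typeof value !== 'boolean') {
      problems.push(`${toolName}: ${key} must be a boolean`);
    }
  }

  // Assumed project policy: every tool states whether it is read-only.
  if (annotations.readOnlyHint === undefined) {
    problems.push(`${toolName}: readOnlyHint is not declared`);
  }

  // A tool cannot sensibly be both read-only and destructive.
  if (annotations.readOnlyHint === true && annotations.destructiveHint === true) {
    problems.push(`${toolName}: read-only tool is also marked destructive`);
  }

  return problems;
}
```

In a test, assert that `annotationProblems(tool.name, tool.annotations)` is an empty array for each tool, so a failure lists every problem at once.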
Validate UI Resource Links
The next check is the ui:// resource. In the MCP Apps flow, a UI-capable tool declares a UI resource URI. The host fetches that URI before or after the tool call, depending on the host and app state.
If the resource URI is stale, your tool can still work as a backend MCP tool while the app UI fails to load.
test('ui tools reference readable MCP App resources', async ({ mcp }) => {
  const tools = await mcp.listTools();
  const appTools = tools.filter((tool) => tool._meta?.ui?.resourceUri);

  for (const tool of appTools) {
    const uri = tool._meta.ui.resourceUri;
    expect(uri, `${tool.name} resource URI`).toMatch(/^ui:\/\//);

    const resource = await mcp.readResource(uri);
    expect(resource.contents).toHaveLength(1);

    const html = resource.contents[0];
    expect(html.mimeType).toBe('text/html;profile=mcp-app');
    expect(html.text).toContain('<html');
  }
});
The exact helper names vary by test runner, but the shape is the same: discover tools, collect resource URIs, read each resource, and assert the MIME type plus body.
This test belongs in CI because it protects against ordinary refactors. A developer can rename src/resources/report/report.tsx to src/resources/dashboard/dashboard.tsx and forget to update the tool metadata. The build can pass. The conformance test fails.
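The same check can be written without any test-runner fixtures. The sketch below assumes you have already fetched the tool list and the resource list as plain arrays. The `findBrokenUiLinks` name and the entry shapes are hypothetical and simplified.

```typescript
// Hypothetical, simplified shapes for data already fetched over MCP.
interface ToolEntry {
  name: string;
  _meta?: { ui?: { resourceUri?: string } };
}

interface ResourceEntry {
  uri: string;
  mimeType?: string;
}

const MCP_APP_MIME_TYPE = 'text/html;profile=mcp-app';

// Cross-checks every declared UI resource URI against the resources the server serves.
function findBrokenUiLinks(tools: ToolEntry[], resources: ResourceEntry[]): string[] {
  const byUri = new Map<string, ResourceEntry>();
  for (const resource of resources) byUri.set(resource.uri, resource);

  const problems: string[] = [];
  for (const tool of tools) {
    const uri = tool._meta?.ui?.resourceUri;
    if (uri === undefined) continue; // not a UI-capable tool

    if (!uri.startsWith('ui://')) {
      problems.push(`${tool.name}: "${uri}" is not a ui:// URI`);
      continue;
    }

    const resource = byUri.get(uri);
    if (!resource) {
      problems.push(`${tool.name}: ${uri} is not served by the server`);
    } else if (resource.mimeType !== MCP_APP_MIME_TYPE) {
      problems.push(`${tool.name}: ${uri} has MIME type ${String(resource.mimeType)}`);
    }
  }
  return problems;
}
```

With the rename described above, the tool would still declare the old URI while the server serves only the new one, so the helper would report the old URI as not served.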
Check Extension Fallbacks
MCP Apps are an extension to the core protocol. The MCP extensions overview calls out graceful fallback: if one side supports an extension and the other does not, the app should fall back to core protocol behavior or reject the request clearly.
For app tools, the simplest fallback is useful content.
test('ui tools return fallback text content', async ({ mcp }) => {
  const result = await mcp.callTool('show-report', {
    reportId: 'demo-report',
  });

  expect(result.isError).toBeFalsy();
  expect(result.content?.[0]).toMatchObject({
    type: 'text',
  });
  expect(result.content[0].text.length).toBeGreaterThan(20);
});
Do not make the fallback text a duplicate of the whole UI payload. Keep it short and useful:
content: [
  {
    type: 'text',
    text: 'Displayed the Q2 revenue report with 14 regions and 3 flagged anomalies.',
  },
];
That gives the model and non-UI clients enough context without stuffing the conversation with rows the resource can render from structuredContent.
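To enforce "short and useful" in both directions, one option is a small helper with a lower and an upper bound. This is only a sketch: the `fallbackTextProblems` name is hypothetical, and the 500-character ceiling is an arbitrary assumption you should tune. Only the 20-character floor mirrors the test above.

```typescript
interface ContentBlock {
  type: string;
  text?: string;
}

// Bounds are illustrative defaults; the upper bound in particular is an assumption.
function fallbackTextProblems(
  content: ContentBlock[] | undefined,
  minChars = 20,
  maxChars = 500,
): string[] {
  const first = content?.[0];
  if (!first || first.type !== 'text' || typeof first.text !== 'string') {
    return ['first content block is not text'];
  }

  const length = first.text.trim().length;
  const problems: string[] = [];
  // Too short: non-UI clients and the model get no usable summary.
  if (length <= minChars) problems.push(`fallback text is too short (${length} chars)`);
  // Too long: the text is probably duplicating the UI payload.
  if (length > maxChars) problems.push(`fallback text is too long (${length} chars)`);
  return problems;
}
```

The upper bound is the part the earlier test does not cover: it flags a tool that dumps every row into the text content instead of summarizing.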
Validate Tool Result Shape
Conformance testing should also verify that each tool result matches the contract the resource expects.
If you use Zod or another schema library, export the output schema and use it in the test:
import { z } from 'zod';
import { test, expect } from 'sunpeak/test';

// Defined inline to keep the example short. In a real project, import the
// schema exported by the tool module so the test and the tool cannot drift.
const ReportOutput = z.object({
  title: z.string(),
  rows: z.array(
    z.object({
      label: z.string(),
      value: z.number(),
    })
  ),
});

test('show-report returns valid structuredContent', async ({ mcp }) => {
  const result = await mcp.callTool('show-report', {
    reportId: 'demo-report',
  });

  expect(result.isError).toBeFalsy();
  expect(() => ReportOutput.parse(result.structuredContent)).not.toThrow();
});
Then add a leak check for fields that should stay out of model-visible data:
test('ui-only fields stay out of model-visible output', async ({ mcp }) => {
  const result = await mcp.callTool('show-report', {
    reportId: 'demo-report',
  });

  const modelVisible = JSON.stringify({
    content: result.content,
    structuredContent: result.structuredContent,
  });

  expect(modelVisible).not.toMatch(/internal/i);
  expect(modelVisible).not.toMatch(/token/i);
  expect(modelVisible).not.toMatch(/cursor/i);
  expect(modelVisible).not.toMatch(/secret/i);
});
This overlaps with MCP App data-flow testing, which goes deeper on content, structuredContent, _meta, and host bridge state. For conformance, keep the check blunt. You want a fast signal that the result can load and does not expose obvious UI-only fields.
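A regex over the serialized result also matches ordinary prose, so a summary that mentions "tokens" would fail the test. If that becomes noisy, one variation is to inspect only property names. The sketch below is an assumption-laden alternative, not the article's method: `findBlockedKeys` is a hypothetical helper and the blocked-word list simply reuses the four words from the test above.

```typescript
// Same four words as the regex checks above; adjust to your own naming conventions.
const BLOCKED_KEY_PATTERN = /internal|token|cursor|secret/i;

// Recursively collects the paths of property names that match the blocked pattern.
// String values are ignored on purpose, so prose that mentions "token" does not trip it.
function findBlockedKeys(value: unknown, path: string = '$'): string[] {
  const hits: string[] = [];

  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      hits.push(...findBlockedKeys(item, `${path}[${index}]`));
    });
  } else if (value !== null && typeof value === 'object') {
    for (const key of Object.keys(value)) {
      const childPath = `${path}.${key}`;
      if (BLOCKED_KEY_PATTERN.test(key)) hits.push(childPath);
      hits.push(...findBlockedKeys((value as Record<string, unknown>)[key], childPath));
    }
  }

  return hits;
}
```

The returned paths, such as `$.rows[0].nextCursor`, make the failure message point at the exact field. The trade-off is that this version does not notice a secret leaked inside a string value, so it complements the blunt regex rather than replacing it.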
Resource Metadata Checks
Resource metadata is where many app loading bugs hide. The MCP App resource needs enough metadata for the host to frame it safely.
At minimum, test the MIME type. Then add checks for the metadata your app actually uses:
test('app resources declare required UI metadata', async ({ mcp }) => {
  const resource = await mcp.readResource('ui://report');
  const html = resource.contents[0];
  const meta = html._meta?.ui;

  expect(html.mimeType).toBe('text/html;profile=mcp-app');
  expect(meta?.csp?.connectDomains ?? []).toEqual(
    expect.arrayContaining(['https://api.example.com'])
  );
  expect(meta?.csp?.resourceDomains ?? []).toEqual(
    expect.arrayContaining(['https://cdn.example.com'])
  );
});
Do not add fake domains to make the example pass. In your app, assert the exact origins your resource needs. If the resource fetches from https://api.yourcompany.com, test that origin. If the resource does not call external APIs, assert that the domain lists are empty.
This keeps CSP changes reviewable. A pull request that adds a new external domain should change a conformance test, which makes the security impact visible in code review.
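An inclusion check such as `arrayContaining` passes even when the resource declares extra origins nobody reviewed. To assert the exact set, a two-way diff works. The `diffOrigins` helper below is a hypothetical sketch, and the origins in the usage note are placeholders.

```typescript
interface OriginDiff {
  missing: string[]; // expected by the test but not declared by the resource
  unexpected: string[]; // declared by the resource but not approved in the test
}

// Compares declared CSP origins against the approved list in both directions.
function diffOrigins(declared: string[] | undefined, expected: string[]): OriginDiff {
  const declaredList = declared ?? [];
  const declaredSet = new Set(declaredList);
  const expectedSet = new Set(expected);

  return {
    missing: expected.filter((origin) => !declaredSet.has(origin)),
    unexpected: declaredList.filter((origin) => !expectedSet.has(origin)),
  };
}
```

In a test, compare `diffOrigins(meta?.csp?.connectDomains, ['https://api.example.com'])` against `{ missing: [], unexpected: [] }`. A pull request that adds a new origin then fails until someone adds that origin to the approved list.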
Host Bridge Smoke Tests
After server-side conformance passes, add one rendered smoke test per app resource. This is still conformance testing, not full workflow testing. You only want to prove the iframe can initialize, receive tool data, and render a stable root element.
import { test, expect } from 'sunpeak/test';

test('report app initializes in the host runtime', async ({ inspector }) => {
  const result = await inspector.renderTool('show-report', {
    reportId: 'demo-report',
  });

  const app = result.app();
  await expect(app.getByTestId('report-root')).toBeVisible();
  await expect(app.getByRole('heading', { name: /revenue report/i })).toBeVisible();
});
Keep this test boring. It should fail only when the app cannot mount. Put interaction details in E2E tests and visual details in visual regression tests.
Good smoke assertions:
- Root element is visible.
- Main heading renders.
- No fatal error boundary appears.
- A required action button is enabled.
- An empty, loading, or error fixture still mounts.
Weak smoke assertions:
- Exact pixel layout.
- Every row in a large table.
- A multi-step workflow.
- Model tool selection.
Those belong in later test layers.
Add a CI Conformance Job
Run conformance tests on every pull request. They should be fast enough to run before the rest of the test suite.
name: MCP App conformance
on:
  pull_request:
  push:
    branches: [main]
jobs:
  conformance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm test:e2e tests/e2e/conformance.spec.ts
For an existing MCP server that is not built with sunpeak, scaffold the test harness once:
npx sunpeak test init --server http://localhost:8000/mcp
Then keep your conformance spec under version control. The tests can call your server through MCP even if the server is written in Python, Go, Rust, or another stack.
What Conformance Tests Should Not Cover
Conformance tests should stay small. Do not turn them into a second E2E suite.
Skip these in conformance:
- Full user workflows across multiple tool calls.
- Visual regression baselines.
- Browser-specific layout checks.
- LLM tool-selection accuracy.
- Third-party API behavior.
- Core MCP protocol behavior that your SDK already owns.
Cover those elsewhere. Use integration tests for tool behavior, E2E tests for rendered workflows, evals for model tool selection, and security tests for auth, CSP, and data exposure.
The point of conformance is fast failure. If a pull request breaks discovery, resource loading, MIME types, annotations, or fallback text, you want a clear failure in the first minute of CI.
Where sunpeak Fits
You can write these tests with any MCP SDK and any test runner. The pattern is protocol-level: list tools, read resources, call tools, and render one host smoke test.
sunpeak makes that workflow easier because the same test runner gives you:
- The mcp fixture for tools/list, resources/read, and tools/call.
- The inspector fixture for one rendered host smoke test.
- Simulation files for stable tool inputs and outputs.
- CI-friendly tests that do not need paid host accounts or host credits.
If you already have an MCP server, start with:
npx sunpeak test init --server http://localhost:8000/mcp
Then add tests/e2e/conformance.spec.ts with the checks above. Once conformance is green, add the deeper tests that prove the app actually behaves correctly.
Get Started
npx sunpeak new
Further Reading
- MCP App testing strategy - which tests to write first
- Integration testing MCP Apps - call tools through the MCP protocol
- E2E testing MCP Apps - render resources in host runtimes
- Testing MCP tool annotations
- Testing MCP App data flow
- Pre-submission testing for MCP Apps
- MCP App CI/CD with GitHub Actions
- Testing framework
- MCP App framework
- ChatGPT App framework
- MCP Apps overview - Model Context Protocol
- MCP extensions overview - capability negotiation and graceful fallback
- Apps SDK testing guide - OpenAI
Frequently Asked Questions
What is MCP App conformance testing?
MCP App conformance testing verifies that an MCP server exposes the app surface a host expects. It checks that tools are listed with valid schemas and annotations, UI-capable tools point at readable ui:// resources, resources use text/html;profile=mcp-app, tool results match their declared output shape, and the app still returns useful text for clients that do not render UI.
Is conformance testing the same as E2E testing for MCP Apps?
No. Conformance testing proves that the protocol surface is valid enough for a host to discover, fetch, and render the app. E2E testing proves that user workflows behave correctly in a browser. Run conformance tests first because they fail faster and explain wiring bugs more clearly than a full rendered UI test.
What should an MCP App conformance test check first?
Start with tools/list and resources/read. Verify every UI-capable tool has a title, description, inputSchema, outputSchema when useful, annotations, and a _meta.ui.resourceUri that starts with ui://. Then read that resource and verify it returns HTML with the text/html;profile=mcp-app MIME type.
How do I test ui:// resources in an MCP App?
Call resources/read for each ui:// URI referenced by your tools. Assert that the resource exists, has exactly the MIME type your target hosts expect, includes the app HTML, and declares any CSP or permission metadata it needs. Also test that stale, renamed, or missing resource URIs fail the build.
How do I test MCP App fallback behavior for non-UI clients?
Call each UI-capable tool and assert that content includes a short text summary even when structuredContent drives the UI. Clients that do not support MCP Apps can still show content, so the tool should not return an empty text response just because the app UI exists.
Should conformance tests run in CI?
Yes. Conformance tests are good CI smoke tests because they do not need a live host account, a browser session, or API credits. Run them on every pull request before slower E2E, visual regression, and live-host tests.
Can conformance testing catch ChatGPT App and Codex Connector bugs?
Yes. ChatGPT Apps and Codex Connectors both depend on the MCP server exposing clean tools, schemas, resources, and tool results. Conformance tests catch missing resources, broken schemas, incorrect annotations, and empty fallback content before those bugs reach a host-specific test.