Testing MCP App Data Flow: content, structuredContent, _meta, and Host Bridge State
Every MCP App has a data contract. A tool returns a result, the host sends that result to a sandboxed resource, the resource renders UI, and the model uses some part of the result as context for the next answer. Most rendering bugs come from treating that contract as “some JSON” instead of testing where each field goes.
This matters more now because MCP Apps and ChatGPT Apps have more than one data lane. content, structuredContent, _meta, app state, widget state, and host bridge globals all look like places to put data. They are not interchangeable.
TL;DR: Test content, structuredContent, and _meta as separate contracts. Keep content short and model-readable. Put render data in structuredContent. Put UI-only metadata in _meta. Test useAppState or setWidgetState updates as model-visible state. Add sentinel tests so internal IDs, cursors, tokens, and UI hints do not leak into fields the model can read.
The Data Lanes in an MCP App
Under the MCP Apps protocol, the host renders interactive HTML in a sandboxed iframe and communicates with that iframe over a postMessage bridge. When a tool result arrives, the host can deliver several pieces of data to the resource.
| Field | Primary reader | Use it for | Do not use it for |
|---|---|---|---|
| content | Model | Short text summary, citations, user-readable status | Full UI payloads, internal IDs, secrets |
| structuredContent | Resource and sometimes model context | Typed data the UI renders | Private UI-only state, large hidden payloads |
| _meta | Resource | UI-only metadata, cursors, cache keys, view hints | Anything required for model reasoning |
| App state or widget state | Host and model context | User selections the model should know about | Private component internals |
| Host globals | Resource | Theme, display mode, locale, safe area | Business data |
OpenAI’s Apps SDK reference maps the same ideas onto window.openai: toolOutput is your structuredContent, toolResponseMetadata is _meta, and setWidgetState stores UI state between renders. If you build against the standard MCP Apps bridge, the names differ, but the testing problem is the same.
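As a rough illustration of that mapping, the sketch below types the three tool-result lanes and projects them onto the two window.openai property names mentioned above. `ToolResultLanes`, `OpenAiView`, and `toOpenAiView` are hypothetical names invented for this article; they are not exported by the MCP or Apps SDK packages.

```typescript
// Hypothetical types for illustration; not taken from any SDK package.
interface TextBlock {
  type: 'text';
  text: string;
}

interface ToolResultLanes<TData, TMeta> {
  content: TextBlock[]; // model-readable summary
  structuredContent: TData; // typed render data
  _meta?: TMeta; // UI-only metadata
}

// The two window.openai properties named in the paragraph above.
interface OpenAiView<TData, TMeta> {
  toolOutput: TData;
  toolResponseMetadata: TMeta | undefined;
}

// Project a tool result onto the names a ChatGPT-hosted resource would read.
function toOpenAiView<TData, TMeta>(
  result: ToolResultLanes<TData, TMeta>,
): OpenAiView<TData, TMeta> {
  return {
    toolOutput: result.structuredContent,
    toolResponseMetadata: result._meta,
  };
}

const exampleView = toOpenAiView({
  content: [{ type: 'text', text: 'Displayed 1 invoice for April 2026.' }],
  structuredContent: { period: '2026-04' },
  _meta: { nextCursor: 'cursor_next_page' },
});
```

Note that content has no counterpart in the projected view: the resource only ever sees the render data and the metadata.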
What to Test First
Start with the highest-risk boundary: the tool result. Your tool handler should prove four things before the UI ever renders:
- content is present when the model needs a summary.
- structuredContent matches the resource component’s expected schema.
- _meta contains only UI-only fields.
- No sensitive or internal-only field appears in the wrong lane.
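Two of those checks, a present summary and an allowlisted _meta, are simple enough to express as a pure function. The following is a minimal sketch under assumptions: `LaneResult` and `auditLanes` are hypothetical names, and the allowlist of _meta keys is something your team would define. Schema and leak checks are not covered here.

```typescript
// Hypothetical result shape and helper name, for illustration only.
interface LaneResult {
  content?: { type: string; text?: string }[];
  structuredContent?: unknown;
  _meta?: Record<string, unknown>;
}

// Returns human-readable problems; an empty list means both checks passed.
function auditLanes(result: LaneResult, allowedMetaKeys: string[]): string[] {
  const problems: string[] = [];

  // The model needs at least one non-empty text block.
  const hasSummary = (result.content || []).some(
    (block) =>
      block.type === 'text' &&
      typeof block.text === 'string' &&
      block.text.trim() !== '',
  );
  if (!hasSummary) problems.push('content has no text summary');

  // _meta may only carry keys that were agreed to be UI-only.
  for (const key of Object.keys(result._meta || {})) {
    if (allowedMetaKeys.indexOf(key) === -1) {
      problems.push(`unexpected _meta key: ${key}`);
    }
  }
  return problems;
}
```

Returning a list instead of throwing lets a test print every problem at once, which tends to be more useful when a refactor breaks several lanes together.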
Here is a small tool result for an invoice viewer:
return {
  // Model-readable summary: short and factual.
  content: [
    {
      type: 'text',
      text: 'Displayed 12 invoices for April 2026.',
    },
  ],
  // Render data: only the fields the UI shows.
  structuredContent: {
    period: '2026-04',
    invoices: invoices.map((invoice) => ({
      id: invoice.publicId,
      customer: invoice.customerName,
      total: invoice.total,
      status: invoice.status,
    })),
  },
  // UI-only metadata: pagination and internal identifiers.
  _meta: {
    nextCursor: cursor,
    viewId: view.id,
    internalAccountId: account.id,
  },
};
That split gives the model a concise summary, gives the resource the rows it needs to render, and keeps pagination and internal IDs in the UI-only lane.
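The invoices.map call in the example above copies fields one at a time, and that is what keeps the split honest. Here is a minimal sketch of the same idea as a standalone mapper. `InvoiceRecord`, `InvoiceRow`, and `toInvoiceRow` are hypothetical names, and the record's field types are assumptions.

```typescript
// Hypothetical database record; the field types are assumptions.
interface InvoiceRecord {
  publicId: string;
  customerName: string;
  total: string;
  status: string;
  internalAccountId: string; // must never reach structuredContent
}

// The row shape the resource renders.
interface InvoiceRow {
  id: string;
  customer: string;
  total: string;
  status: string;
}

// Copy fields explicitly so a new column on the record cannot leak by accident.
function toInvoiceRow(record: InvoiceRecord): InvoiceRow {
  return {
    id: record.publicId,
    customer: record.customerName,
    total: record.total,
    status: record.status,
  };
}

const sampleRow = toInvoiceRow({
  publicId: 'inv_001',
  customerName: 'Acme Co',
  total: '$1,200',
  status: 'paid',
  internalAccountId: 'acct_9',
});
```

Because the return type is `InvoiceRow`, adding an internal field to the object literal is a compile-time error rather than a silent leak.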
Integration Tests for Tool Results
Use an integration test to call the tool through the MCP layer and assert the result shape. With sunpeak, the mcp fixture exercises the real MCP server instead of a mocked handler:
import { test, expect } from 'sunpeak/test';

test('invoice tool returns clean data lanes', async ({ mcp }) => {
  const result = await mcp.callTool('list-invoices', {
    period: '2026-04',
  });

  expect(result.isError).toBeFalsy();
  expect(result.content?.[0]).toMatchObject({
    type: 'text',
    text: expect.stringContaining('Displayed'),
  });
  expect(result.structuredContent).toMatchObject({
    period: '2026-04',
    invoices: expect.any(Array),
  });

  const firstInvoice = result.structuredContent.invoices[0];
  expect(firstInvoice).toHaveProperty('id');
  expect(firstInvoice).toHaveProperty('customer');
  expect(firstInvoice).toHaveProperty('total');

  expect(result._meta).toMatchObject({
    nextCursor: expect.any(String),
    viewId: expect.any(String),
  });
});
That test checks the happy path, but it does not yet protect against leaks. Add a second test that fails when internal fields drift into model-visible data:
test('internal fields stay out of model-visible data', async ({ mcp }) => {
  const result = await mcp.callTool('list-invoices', {
    period: '2026-04',
  });

  const modelVisible = JSON.stringify({
    content: result.content,
    structuredContent: result.structuredContent,
  });

  expect(modelVisible).not.toMatch(/internalAccountId/i);
  expect(modelVisible).not.toMatch(/nextCursor/i);
  expect(modelVisible).not.toMatch(/session/i);
  expect(modelVisible).not.toMatch(/token/i);
});
This is a cheap test, and it catches a common refactor bug: someone adds { ...invoice } to structuredContent and accidentally exposes database IDs, cursors, or raw provider payloads.
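One caveat with matching against the stringified payload: the patterns also match values, so a customer named "Token Labs" would trip /token/i. If that becomes noisy, a key-based scan is one alternative. The sketch below is a hypothetical helper (`findForbiddenKeys` is not part of sunpeak or any SDK) that walks a JSON-like value and reports the paths of exactly matching forbidden keys.

```typescript
// Hypothetical helper: collect paths whose key is on the forbidden list.
function findForbiddenKeys(
  value: unknown,
  forbidden: string[],
  path = '$',
): string[] {
  if (Array.isArray(value)) {
    let hits: string[] = [];
    value.forEach((item, index) => {
      hits = hits.concat(findForbiddenKeys(item, forbidden, `${path}[${index}]`));
    });
    return hits;
  }
  if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>;
    let hits: string[] = [];
    for (const key of Object.keys(record)) {
      const childPath = `${path}.${key}`;
      // Keys are compared exactly (case-sensitive); values are never inspected.
      if (forbidden.indexOf(key) !== -1) hits.push(childPath);
      hits = hits.concat(findForbiddenKeys(record[key], forbidden, childPath));
    }
    return hits;
  }
  return []; // primitives have no keys
}

const leakPaths = findForbiddenKeys(
  { invoices: [{ id: 'inv_001', customer: 'Token Labs', internalAccountId: 'acct_9' }] },
  ['internalAccountId', 'nextCursor'],
);
// leakPaths is ['$.invoices[0].internalAccountId']
```

The trade-off is that a key scan will not notice a secret pasted into a text value, so it complements the string-matching test rather than replacing it.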
Sentinel Tests for _meta
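When you need to prove _meta stays UI-only, use a harmless sentinel value. A sentinel is just a string that should never appear in model-readable fields. The test below assumes the tool accepts a test-only debugSentinel argument and copies it into _meta.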
const UI_ONLY_SENTINEL = 'UI_ONLY_SENTINEL_DO_NOT_ECHO';

test('ui-only metadata is not copied into content or structuredContent', async ({ mcp }) => {
  const result = await mcp.callTool('list-invoices', {
    period: '2026-04',
    debugSentinel: UI_ONLY_SENTINEL,
  });

  expect(JSON.stringify(result._meta)).toContain(UI_ONLY_SENTINEL);
  expect(JSON.stringify(result.content)).not.toContain(UI_ONLY_SENTINEL);
  expect(JSON.stringify(result.structuredContent)).not.toContain(UI_ONLY_SENTINEL);
});
Do not use a real secret as a sentinel. Use a fake value that is easy to grep in logs. If the test fails, you have proof that UI-only data is being copied into a field the model can see.
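When a sentinel test does fail, it helps to know which lane carried the value. The following sketch reports that directly; `SentinelResult` and `lanesContaining` are hypothetical names used only for illustration.

```typescript
type LaneName = 'content' | 'structuredContent' | '_meta';

// Hypothetical loose result shape; each lane is optional.
interface SentinelResult {
  content?: unknown;
  structuredContent?: unknown;
  _meta?: unknown;
}

// Returns the names of the lanes whose serialized form contains the sentinel.
function lanesContaining(result: SentinelResult, sentinel: string): LaneName[] {
  const lanes: LaneName[] = ['content', 'structuredContent', '_meta'];
  return lanes.filter((lane) => {
    const serialized = JSON.stringify(result[lane]);
    // JSON.stringify(undefined) returns undefined, so guard before searching.
    return typeof serialized === 'string' && serialized.indexOf(sentinel) !== -1;
  });
}
```

In a test, asserting that the returned list equals `['_meta']` covers all three expectations in one comparison and names the offending lane in the failure output.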
E2E Tests for Resource Rendering
The integration test proves the server returned the right lanes. The E2E test proves the resource reads the right lane.
import { test, expect } from 'sunpeak/test';

test('invoice resource renders structured content and uses meta for pagination', async ({
  inspector,
}) => {
  const result = await inspector.renderTool('list-invoices', {
    input: { period: '2026-04' },
    output: {
      content: [{ type: 'text', text: 'Displayed 2 invoices for April 2026.' }],
      structuredContent: {
        period: '2026-04',
        invoices: [
          { id: 'inv_001', customer: 'Acme Co', total: '$1,200', status: 'paid' },
          { id: 'inv_002', customer: 'Northwind', total: '$840', status: 'open' },
        ],
      },
      _meta: {
        nextCursor: 'cursor_next_page',
        viewId: 'view_invoice_list',
      },
    },
  });

  const app = result.app();
  await expect(app.getByRole('heading', { name: 'April 2026 invoices' })).toBeVisible();
  await expect(app.getByText('Acme Co')).toBeVisible();
  await expect(app.getByText('$1,200')).toBeVisible();

  await app.getByRole('button', { name: 'Load more' }).click();
  await expect(result.lastToolCall()).resolves.toMatchObject({
    name: 'list-invoices',
    args: { cursor: 'cursor_next_page' },
  });
});
The important part is that the UI renders invoice rows from structuredContent, while pagination reads nextCursor from _meta. If the component starts reading rows from _meta, or stores the cursor in structuredContent, this test should fail.
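One way to make that split harder to break is to derive the "Load more" call from _meta in a small pure function that the component calls. This is a sketch under assumptions: `InvoiceListMeta`, `ToolCallRequest`, and `buildLoadMoreCall` are hypothetical names, while the tool name and the cursor argument follow the test above.

```typescript
// Hypothetical shapes for illustration.
interface InvoiceListMeta {
  nextCursor?: string;
  viewId?: string;
}

interface ToolCallRequest {
  name: string;
  args: { cursor: string };
}

// Returns the follow-up call for "Load more", or null when there is no next page.
function buildLoadMoreCall(meta: InvoiceListMeta | undefined): ToolCallRequest | null {
  if (!meta || !meta.nextCursor) return null;
  return { name: 'list-invoices', args: { cursor: meta.nextCursor } };
}
```

A component could hide the button whenever this returns null, which also covers hosts that deliver no _meta at all.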
Testing ChatGPT window.openai Access
If you use ChatGPT-specific APIs directly, avoid scattering window.openai calls through your components. Put them behind a small adapter so unit tests can mock one module.
export function getToolOutput<T>() {
  return window.openai?.toolOutput as T | undefined;
}

export function getToolResponseMetadata<T>() {
  return window.openai?.toolResponseMetadata as T | undefined;
}

export function persistWidgetState(state: unknown) {
  window.openai?.setWidgetState?.(state);
}
Then unit test your component against the adapter:
import { render, screen, fireEvent } from '@testing-library/react';
import { vi, test, expect } from 'vitest';
import { InvoiceWidget } from './InvoiceWidget';
import * as bridge from './chatgptBridge';

test('renders toolOutput and persists selected row', () => {
  vi.spyOn(bridge, 'getToolOutput').mockReturnValue({
    invoices: [{ id: 'inv_001', customer: 'Acme Co' }],
  });
  vi.spyOn(bridge, 'getToolResponseMetadata').mockReturnValue({
    viewId: 'view_invoice_list',
  });
  const persist = vi.spyOn(bridge, 'persistWidgetState').mockImplementation(() => {});

  render(<InvoiceWidget />);
  fireEvent.click(screen.getByText('Acme Co'));

  expect(persist).toHaveBeenCalledWith({
    selectedInvoiceId: 'inv_001',
  });
});
Also test that your component does not crash when window.openai is missing:
test('renders fallback outside ChatGPT', () => {
  vi.spyOn(bridge, 'getToolOutput').mockReturnValue(undefined);
  vi.spyOn(bridge, 'getToolResponseMetadata').mockReturnValue(undefined);

  render(<InvoiceWidget />);

  expect(screen.getByText('No invoice data available')).toBeVisible();
});
That fallback matters because MCP Apps are meant to run across hosts. Host-specific bridge APIs should be optional, not required for basic rendering.
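Feature detection is easier to test when the global object is passed in rather than read directly. The sketch below takes that approach. `OpenAiBridge`, `HostLike`, `BridgeSnapshot`, and `readBridge` are hypothetical names, and the bridge interface lists only the three members named in this article, not the full Apps SDK surface.

```typescript
// Hypothetical minimal view of the host global.
interface OpenAiBridge {
  toolOutput?: unknown;
  toolResponseMetadata?: unknown;
  setWidgetState?: (state: unknown) => void;
}

interface HostLike {
  openai?: OpenAiBridge;
}

interface BridgeSnapshot {
  available: boolean;
  toolOutput: unknown;
  toolResponseMetadata: unknown;
  // Returns true only when setWidgetState existed and was called.
  persist: (state: unknown) => boolean;
}

// Accepts the global as a parameter so tests can pass a plain object.
function readBridge(host: HostLike | undefined): BridgeSnapshot {
  const bridge = host ? host.openai : undefined;
  return {
    available: bridge !== undefined,
    toolOutput: bridge ? bridge.toolOutput : undefined,
    toolResponseMetadata: bridge ? bridge.toolResponseMetadata : undefined,
    persist: (state) => {
      if (!bridge || typeof bridge.setWidgetState !== 'function') return false;
      bridge.setWidgetState(state);
      return true;
    },
  };
}
```

In a browser build the caller would pass the real global object; in a unit test, `readBridge(undefined)` or `readBridge({})` exercises the non-ChatGPT path without any mocking.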
Testing App State
State is different from _meta. _meta is UI-only tool result metadata. App state or widget state is user interaction state the host may preserve and expose back to the model.
Use state for things the user did that the model should know:
- Selected a row
- Applied a filter
- Chose a date range
- Completed a step in a multi-step form
Do not use it for private component internals:
- Cache keys
- API cursors
- DOM measurements
- Internal account IDs
- Temporary tokens
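The two lists above can be enforced with a projection from full component state to the subset that gets persisted. A minimal sketch, assuming hypothetical `InvoiceUiState` and `InvoiceAppState` shapes and a hypothetical `toAppState` helper:

```typescript
// Hypothetical component state: user choices mixed with private internals.
interface InvoiceUiState {
  selectedInvoiceId: string | null;
  statusFilter: 'open' | 'paid' | 'all';
  nextCursor: string | null; // private: API cursor
  cacheKey: string; // private: cache internals
}

// Only what the user chose; this is the part the host may expose to the model.
interface InvoiceAppState {
  selectedInvoiceId: string | null;
  statusFilter: 'open' | 'paid' | 'all';
}

// Copy user choices explicitly instead of spreading the whole UI state.
function toAppState(ui: InvoiceUiState): InvoiceAppState {
  return {
    selectedInvoiceId: ui.selectedInvoiceId,
    statusFilter: ui.statusFilter,
  };
}
```

Passing `toAppState(uiState)` to the state setter, rather than the raw component state, keeps cursors and cache keys out of anything the host stores.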
For portable MCP Apps with sunpeak, test useAppState like any other hook:
import { render, screen, fireEvent } from '@testing-library/react';
import { vi, test, expect } from 'vitest';
import { InvoiceFilters } from './InvoiceFilters';

const setAppState = vi.fn();
let appState = { status: 'open' };

vi.mock('sunpeak', () => ({
  useAppState: () => [appState, setAppState],
}));

test('status filter syncs to app state', () => {
  render(<InvoiceFilters />);
  fireEvent.click(screen.getByRole('button', { name: 'Paid' }));

  expect(setAppState).toHaveBeenCalledWith({
    status: 'paid',
  });
});
Pair that with a render test for restored state:
test('restored app state controls selected filter', () => {
  appState = { status: 'paid' };

  render(<InvoiceFilters />);

  expect(screen.getByRole('button', { name: 'Paid' })).toHaveAttribute(
    'aria-pressed',
    'true'
  );
});
The first test proves user action writes state. The second proves the component can restore from state after the host re-renders it.
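Restored state comes from outside the component, so it is worth validating before use. The sketch below assumes the { status } shape used in the tests above and falls back to a default when the value is missing or malformed. `StatusFilter`, `DEFAULT_STATUS`, and `restoreStatus` are hypothetical names.

```typescript
type StatusFilter = 'open' | 'paid';

const DEFAULT_STATUS: StatusFilter = 'open';

// Hypothetical helper: accept whatever the host hands back, return a safe filter.
function restoreStatus(raw: unknown): StatusFilter {
  if (raw !== null && typeof raw === 'object') {
    const candidate = (raw as { status?: unknown }).status;
    if (candidate === 'open' || candidate === 'paid') return candidate;
  }
  return DEFAULT_STATUS; // missing, malformed, or unknown values fall back
}
```

A pure function like this can be unit tested without rendering anything, which makes the malformed-state cases cheap to cover.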
A Practical Data-Flow Checklist
Use this checklist for every tool that renders an MCP App resource:
- content gives the model a short, useful summary.
- content does not duplicate large structuredContent arrays.
- structuredContent validates against a schema the resource owns.
- structuredContent contains only fields the model may safely see.
- _meta carries UI-only metadata such as cursors, internal IDs, and view hints.
- _meta is optional from the resource’s point of view, or the resource renders a clear fallback.
- App state contains user choices the model should know.
- App state does not contain secrets, cache internals, or raw provider payloads.
- Host-specific bridge APIs are feature-detected.
- Tests cover at least one missing-field, empty-state, and malformed-data case.
If you do nothing else, add two tests: one schema test for structuredContent, and one leak test that fails when known internal field names appear in content or structuredContent.
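For the schema test, a hand-written type guard is enough to get started if you would rather not add a validation library. The sketch below assumes the invoice payload shape from the examples above, with total as a display string as in the E2E fixture. The names `InvoiceRowPayload`, `InvoiceListPayload`, and the guard functions are hypothetical.

```typescript
interface InvoiceRowPayload {
  id: string;
  customer: string;
  total: string;
  status: string;
}

interface InvoiceListPayload {
  period: string;
  invoices: InvoiceRowPayload[];
}

// Narrow unknown to a plain object so properties can be inspected safely.
function isRecord(value: unknown): value is Record<string, unknown> {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

function isInvoiceRow(value: unknown): value is InvoiceRowPayload {
  return (
    isRecord(value) &&
    typeof value.id === 'string' &&
    typeof value.customer === 'string' &&
    typeof value.total === 'string' &&
    typeof value.status === 'string'
  );
}

// An empty invoices array is valid; a missing or non-array one is not.
function isInvoiceListPayload(value: unknown): value is InvoiceListPayload {
  return (
    isRecord(value) &&
    typeof value.period === 'string' &&
    Array.isArray(value.invoices) &&
    value.invoices.every((row) => isInvoiceRow(row))
  );
}
```

The same guard can run in the integration test against the tool result and in the resource before rendering, so both sides agree on one definition of a valid payload.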
Where sunpeak Helps
You can test this contract with any MCP test harness, but sunpeak makes the loop short because it gives you both sides: the mcp fixture for tool-result contract tests and the inspector fixture for iframe rendering tests. The same simulation file can define content, structuredContent, _meta, input, theme, and display mode, then run locally and in CI.
That means you can test the full data path without opening ChatGPT, burning credits, or relying on manual prompts. For a new project, run npx sunpeak new. For an existing MCP server, use npx sunpeak test init --server http://localhost:8000/mcp and start with the data-flow tests above.
Get Started
npx sunpeak new
Further Reading
- MCP App TypeScript types - type the tool-to-resource contract
- Regression testing MCP Apps - catch structuredContent shape changes
- E2E testing MCP Apps - render resources with simulation files
- Mocking and stubbing MCP App tests
- Testing multi-tool MCP Apps
- MCP App error handling - test tool-result states
- MCP App framework
- Testing framework
- sunpeak docs - MCP App resources, tools, and simulations
- MCP Apps overview - official Model Context Protocol docs
- Apps SDK reference - OpenAI
- Apps SDK testing guide - OpenAI
Frequently Asked Questions
What is the difference between content, structuredContent, and _meta in an MCP App?
content is the human-readable tool result the model can read. structuredContent is typed JSON that your MCP App resource renders and that some hosts may also expose to the model. _meta is resource-only metadata for UI concerns such as internal IDs, pagination cursors, prefetched payloads, or view hints. Test all three separately so model-visible data stays concise and UI-only data does not leak into model context.
How do I test structuredContent in an MCP App?
Write an integration test that calls the tool through the MCP layer and asserts the structuredContent schema, required fields, and serializability. Then write an E2E test that renders the same tool result in the inspector and asserts the resource displays that data correctly. This catches both backend contract bugs and frontend rendering bugs.
Should secrets go in structuredContent or _meta?
Neither field should contain long-lived secrets. If the resource needs a short-lived UI token, keep it scoped, short-lived, and app-specific, then pass it through the narrowest field your host supports. Use _meta for UI-only values that the model should never see. Do not put API keys, OAuth refresh tokens, session cookies, or private conversation data in any tool result.
How do I test that _meta stays UI-only?
Create a test payload with a sentinel value in _meta, render the app, and assert the value is available only through the resource code path. Pair that with an integration assertion that content does not include the sentinel and that structuredContent does not copy it. For live host checks, use a harmless sentinel such as UI_ONLY_SENTINEL and verify the model never repeats it.
How do I test window.openai toolOutput and toolResponseMetadata?
For ChatGPT-specific components, wrap window.openai access in a small adapter and mock that adapter in unit tests. Assert that toolOutput feeds your render path, toolResponseMetadata feeds only UI internals, and setWidgetState is called after meaningful user interactions. Also test the fallback path where window.openai is undefined so the component does not crash in non-ChatGPT hosts.
What should I put in content when the UI renders the full answer?
Keep content short and factual. It should tell the model what the resource showed, not duplicate the entire UI payload. For example, use content like "Displayed 12 invoices for April 2026" and put the invoice rows in structuredContent. This gives the model enough context while keeping token use and data exposure under control.
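A summary like that can be generated from the row count and the period instead of being written by hand. A minimal sketch, assuming the period uses the YYYY-MM format from the examples above; `MONTH_NAMES` and `buildSummary` are hypothetical names.

```typescript
const MONTH_NAMES = [
  'January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December',
];

// Hypothetical helper: turn a row count and a 'YYYY-MM' period into one line of text.
function buildSummary(count: number, period: string): string {
  const match = /^(\d{4})-(\d{2})$/.exec(period);
  const monthIndex = match ? Number(match[2]) - 1 : -1;
  const label =
    match && monthIndex >= 0 && monthIndex < 12
      ? `${MONTH_NAMES[monthIndex]} ${match[1]}`
      : period; // fall back to the raw period when it is not a valid YYYY-MM
  const noun = count === 1 ? 'invoice' : 'invoices';
  return `Displayed ${count} ${noun} for ${label}.`;
}

const summaryText = buildSummary(12, '2026-04');
// summaryText is 'Displayed 12 invoices for April 2026.'
```

The summary carries only a count and a period, so it stays the same length no matter how many rows the UI renders.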
Can I test MCP App host bridge state in CI?
Yes. Use simulation files and Playwright tests against a local MCP App inspector. Render the resource with controlled tool input, structuredContent, _meta, theme, and display mode values. Then assert the DOM, state transitions, and tool calls. This lets you test host bridge behavior in CI without a paid ChatGPT account or manual browser session.
What is the most common MCP App data-flow bug?
The most common bug is mixing model-visible data and UI-only data. Developers often put every field into structuredContent because it is easy for the resource to read, then the model sees internal IDs, large payloads, or UI hints it should not reason about. Contract tests should fail when internal fields appear in content or structuredContent.