
Testing MCP App Data Flow: content, structuredContent, _meta, and Host Bridge State

Abe Wheeler
Testing content, structuredContent, _meta, and host bridge state in MCP Apps.


Every MCP App has a data contract. A tool returns a result, the host sends that result to a sandboxed resource, the resource renders UI, and the model uses some part of the result as context for the next answer. Most rendering bugs come from treating that contract as “some JSON” instead of testing where each field goes.

This matters more now because MCP Apps and ChatGPT Apps have more than one data lane. content, structuredContent, _meta, app state, widget state, and host bridge globals all look like places to put data. They are not interchangeable.

TL;DR: Test content, structuredContent, and _meta as separate contracts. Keep content short and model-readable. Put render data in structuredContent. Put UI-only metadata in _meta. Test useAppState or setWidgetState updates as model-visible state. Add sentinel tests so internal IDs, cursors, tokens, and UI hints do not leak into fields the model can read.

The Data Lanes in an MCP App

In an MCP App, the host renders the resource's interactive HTML in a sandboxed iframe and communicates with that iframe over a postMessage bridge. When a tool result arrives, the host can deliver several pieces of data to the resource.

Field | Primary reader | Use it for | Do not use it for
content | Model | Short text summary, citations, user-readable status | Full UI payloads, internal IDs, secrets
structuredContent | Resource, and sometimes model context | Typed data the UI renders | Private UI-only state, large hidden payloads
_meta | Resource | UI-only metadata, cursors, cache keys, view hints | Anything required for model reasoning
App state or widget state | Host and model context | User selections the model should know about | Private component internals
Host globals | Resource | Theme, display mode, locale, safe area | Business data

OpenAI’s Apps SDK reference maps the same ideas onto window.openai: toolOutput is your structuredContent, toolResponseMetadata is _meta, and setWidgetState stores UI state between renders. If you build against the standard MCP Apps bridge, the names differ, but the testing problem is the same.

What to Test First

Start with the highest-risk boundary: the tool result. Before the UI ever renders, your tests should prove four things about what the tool handler returns:

  • content is present when the model needs a summary.
  • structuredContent matches the resource component’s expected schema.
  • _meta contains only UI-only fields.
  • No sensitive or internal-only field appears in the wrong lane.
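You can collect those four checks into one reusable helper. The sketch below is plain TypeScript with no test-framework dependency. It is not part of sunpeak or the MCP SDK. The ToolResult type covers only the three lanes discussed here, and checkDataLanes, LaneRules, and the key lists are hypothetical names chosen for this example.

```typescript
// Minimal shape of a tool result, limited to the three lanes discussed in this post.
interface TextBlock {
  type: 'text';
  text: string;
}

interface ToolResult {
  content?: TextBlock[];
  structuredContent?: Record<string, unknown>;
  _meta?: Record<string, unknown>;
}

// Hypothetical per-tool rules.
interface LaneRules {
  requiredStructuredKeys: string[]; // fields the resource renders
  allowedMetaKeys: string[]; // UI-only fields permitted in _meta
  forbiddenModelKeys: string[]; // fields that must never reach model-visible lanes
}

// Collects every object key at any depth, including keys of objects inside arrays.
function collectKeys(value: unknown, keys: Set<string> = new Set()): Set<string> {
  if (Array.isArray(value)) {
    for (const item of value) collectKeys(item, keys);
  } else if (typeof value === 'object' && value !== null) {
    for (const [key, child] of Object.entries(value)) {
      keys.add(key);
      collectKeys(child, keys);
    }
  }
  return keys;
}

// Returns human-readable violations; an empty array means the lanes look clean.
function checkDataLanes(result: ToolResult, rules: LaneRules): string[] {
  const problems: string[] = [];

  // 1. content carries a non-empty text summary.
  const summary = result.content?.find((block) => block.type === 'text')?.text ?? '';
  if (summary.trim() === '') problems.push('content: missing text summary');

  // 2. structuredContent has every field the resource renders.
  const structured = result.structuredContent ?? {};
  for (const key of rules.requiredStructuredKeys) {
    if (!(key in structured)) problems.push(`structuredContent: missing "${key}"`);
  }

  // 3. _meta carries only allowlisted UI-only fields.
  for (const key of Object.keys(result._meta ?? {})) {
    if (!rules.allowedMetaKeys.includes(key)) problems.push(`_meta: unexpected "${key}"`);
  }

  // 4. No forbidden field appears anywhere in content or structuredContent.
  const modelKeys = collectKeys([result.content, result.structuredContent]);
  for (const key of rules.forbiddenModelKeys) {
    if (modelKeys.has(key)) problems.push(`model-visible: forbidden "${key}"`);
  }

  return problems;
}
```

Inside a test, you would then assert that checkDataLanes(result, rules) returns an empty array.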

Here is a small tool result for an invoice viewer:

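// Returned from the list-invoices tool handler.
return {
  // Model-readable: a one-line summary, not the rows.
  content: [
    {
      type: 'text',
      text: 'Displayed 12 invoices for April 2026.',
    },
  ],
  // Render data: only public fields that are safe for the model to see.
  structuredContent: {
    period: '2026-04',
    invoices: invoices.map((invoice) => ({
      id: invoice.publicId,
      customer: invoice.customerName,
      total: invoice.total,
      status: invoice.status,
    })),
  },
  // UI-only: pagination state and internal identifiers for the resource.
  _meta: {
    nextCursor: cursor,
    viewId: view.id,
    internalAccountId: account.id,
  },
};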

That split gives the model a concise summary, gives the resource the rows it needs to render, and keeps pagination and internal IDs in the UI-only lane.
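The summary in content can be generated from a count and a period rather than from the rows themselves, so it stays short however many invoices there are. This is a minimal sketch that assumes period is always formatted as YYYY-MM, as in the example above. summarizeInvoices is a hypothetical helper name.

```typescript
const MONTHS = [
  'January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December',
];

// Builds the short, model-readable summary for the content lane.
// Only the count and period are used; invoice rows never enter this string.
function summarizeInvoices(period: string, count: number): string {
  const match = /^(\d{4})-(\d{2})$/.exec(period);
  if (!match) throw new Error(`Unexpected period format: ${period}`);
  const monthIndex = Number(match[2]) - 1;
  if (monthIndex < 0 || monthIndex > 11) throw new Error(`Unexpected month: ${match[2]}`);
  const noun = count === 1 ? 'invoice' : 'invoices';
  return `Displayed ${count} ${noun} for ${MONTHS[monthIndex]} ${match[1]}.`;
}
```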

Integration Tests for Tool Results

Use an integration test to call the tool through the MCP layer and assert the result shape. With sunpeak, the mcp fixture exercises the real MCP server instead of a mocked handler:

import { test, expect } from 'sunpeak/test';

test('invoice tool returns clean data lanes', async ({ mcp }) => {
  const result = await mcp.callTool('list-invoices', {
    period: '2026-04',
  });

  expect(result.isError).toBeFalsy();

  expect(result.content?.[0]).toMatchObject({
    type: 'text',
    text: expect.stringContaining('Displayed'),
  });

  expect(result.structuredContent).toMatchObject({
    period: '2026-04',
    invoices: expect.any(Array),
  });

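  // structuredContent is loosely typed on a tool result, so narrow it before checking rows.
  const { invoices } = result.structuredContent as {
    invoices: Array<Record<string, unknown>>;
  };
  expect(invoices.length).toBeGreaterThan(0);

  const firstInvoice = invoices[0];
  expect(firstInvoice).toHaveProperty('id');
  expect(firstInvoice).toHaveProperty('customer');
  expect(firstInvoice).toHaveProperty('total');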

  expect(result._meta).toMatchObject({
    nextCursor: expect.any(String),
    viewId: expect.any(String),
  });
});

That test checks the happy path, but it does not yet protect against leaks. Add a second test that fails when internal fields drift into model-visible data:

test('internal fields stay out of model-visible data', async ({ mcp }) => {
  const result = await mcp.callTool('list-invoices', {
    period: '2026-04',
  });

  const modelVisible = JSON.stringify({
    content: result.content,
    structuredContent: result.structuredContent,
  });

  expect(modelVisible).not.toMatch(/internalAccountId/i);
  expect(modelVisible).not.toMatch(/nextCursor/i);
  expect(modelVisible).not.toMatch(/session/i);
  expect(modelVisible).not.toMatch(/token/i);
});

This is a cheap test, and it catches a common refactor bug: someone adds { ...invoice } to structuredContent and accidentally exposes database IDs, cursors, or raw provider payloads.
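To prevent that bug at the source, you can project rows through an explicit runtime allowlist instead of spreading records. New fields on the source object are then dropped by default. This sketch assumes a hypothetical InvoiceRecord that already uses public field names alongside internal ones. pick, PUBLIC_INVOICE_FIELDS, and toPublicInvoice are illustrative names.

```typescript
// Hypothetical source record: public fields plus internals that must not reach the model.
interface InvoiceRecord {
  id: string;
  customer: string;
  total: string;
  status: 'paid' | 'open';
  accountId: string;
  providerPayload: unknown;
}

// The allowlist is the single place that decides what structuredContent exposes.
const PUBLIC_INVOICE_FIELDS = ['id', 'customer', 'total', 'status'] as const;

// Copies only the listed keys; every other property on the source is dropped.
function pick<T extends object, K extends keyof T>(source: T, keys: readonly K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) {
    out[key] = source[key];
  }
  return out;
}

function toPublicInvoice(invoice: InvoiceRecord) {
  return pick(invoice, PUBLIC_INVOICE_FIELDS);
}
```

In a handler, this would look like structuredContent: { period, invoices: records.map(toPublicInvoice) }. Exposing a new field then requires a deliberate edit to the allowlist.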

Sentinel Tests for _meta

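When you need to prove _meta stays UI-only, use a harmless sentinel value. A sentinel is just a string that should never appear in model-readable fields. The test below assumes list-invoices accepts a test-only debugSentinel argument and copies it into _meta.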

const UI_ONLY_SENTINEL = 'UI_ONLY_SENTINEL_DO_NOT_ECHO';

test('ui-only metadata is not copied into content or structuredContent', async ({ mcp }) => {
  const result = await mcp.callTool('list-invoices', {
    period: '2026-04',
    debugSentinel: UI_ONLY_SENTINEL,
  });

  expect(JSON.stringify(result._meta)).toContain(UI_ONLY_SENTINEL);
  expect(JSON.stringify(result.content)).not.toContain(UI_ONLY_SENTINEL);
  expect(JSON.stringify(result.structuredContent)).not.toContain(UI_ONLY_SENTINEL);
});

Do not use a real secret as a sentinel. Use a fake value that is easy to grep in logs. If the test fails, you have proof that UI-only data is being copied into a field the model can see.
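When a sentinel test fails, it helps to know which lane the value leaked into. The minimal helper below reports every lane whose serialized form contains the sentinel, so a passing result returns exactly ['_meta']. The lane names come from the table above. lanesContaining is a hypothetical helper, not a sunpeak API.

```typescript
type Lane = 'content' | 'structuredContent' | '_meta';

interface ToolResultLanes {
  content?: unknown;
  structuredContent?: unknown;
  _meta?: unknown;
}

// Returns the lanes whose JSON serialization contains the sentinel string.
function lanesContaining(result: ToolResultLanes, sentinel: string): Lane[] {
  const lanes: Lane[] = ['content', 'structuredContent', '_meta'];
  return lanes.filter((lane) => {
    // JSON.stringify(undefined) yields undefined, so a missing lane never matches.
    const serialized = JSON.stringify(result[lane]) ?? '';
    return serialized.includes(sentinel);
  });
}
```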

E2E Tests for Resource Rendering

The integration test proves the server returned the right lanes. The E2E test proves the resource reads the right lane.

import { test, expect } from 'sunpeak/test';

test('invoice resource renders structured content and uses meta for pagination', async ({
  inspector,
}) => {
  const result = await inspector.renderTool('list-invoices', {
    input: { period: '2026-04' },
    output: {
      content: [{ type: 'text', text: 'Displayed 2 invoices for April 2026.' }],
      structuredContent: {
        period: '2026-04',
        invoices: [
          { id: 'inv_001', customer: 'Acme Co', total: '$1,200', status: 'paid' },
          { id: 'inv_002', customer: 'Northwind', total: '$840', status: 'open' },
        ],
      },
      _meta: {
        nextCursor: 'cursor_next_page',
        viewId: 'view_invoice_list',
      },
    },
  });

  const app = result.app();

  await expect(app.getByRole('heading', { name: 'April 2026 invoices' })).toBeVisible();
  await expect(app.getByText('Acme Co')).toBeVisible();
  await expect(app.getByText('$1,200')).toBeVisible();

  await app.getByRole('button', { name: 'Load more' }).click();
  await expect(result.lastToolCall()).resolves.toMatchObject({
    name: 'list-invoices',
    args: { cursor: 'cursor_next_page' },
  });
});

The important part is that the UI renders invoice rows from structuredContent, while pagination reads nextCursor from _meta. If the component starts reading rows from _meta, or looks for the cursor in structuredContent, this test should fail.
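Inside the resource, the same lane discipline can live in one pure function that turns a tool result into a view model. You can unit test that function without an iframe. This sketch assumes the invoice shapes used in this post. toInvoiceView, InvoiceView, and the null-cursor convention are hypothetical. Row elements are not validated here; a schema check would do that.

```typescript
interface InvoiceRow {
  id: string;
  customer: string;
  total: string;
  status: string;
}

interface InvoiceView {
  period: string | null;
  rows: InvoiceRow[];
  nextCursor: string | null; // null means the "Load more" button is hidden
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Rows come only from structuredContent; the cursor comes only from _meta.
// A missing lane, or a lane of the wrong type, degrades to empty values instead of throwing.
function toInvoiceView(structuredContent: unknown, meta: unknown): InvoiceView {
  const sc: Record<string, unknown> = isRecord(structuredContent) ? structuredContent : {};
  const m: Record<string, unknown> = isRecord(meta) ? meta : {};
  return {
    period: typeof sc.period === 'string' ? sc.period : null,
    rows: Array.isArray(sc.invoices) ? (sc.invoices as InvoiceRow[]) : [],
    nextCursor: typeof m.nextCursor === 'string' ? m.nextCursor : null,
  };
}
```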

Testing ChatGPT window.openai Access

If you use ChatGPT-specific APIs directly, avoid scattering window.openai calls through your components. Put them behind a small adapter so unit tests can mock one module.

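// chatgptBridge.ts: typing for the window.openai members used here (absent on other hosts).
declare global {
  interface Window {
    openai?: {
      toolOutput?: unknown;
      toolResponseMetadata?: unknown;
      setWidgetState?: (state: unknown) => unknown;
    };
  }
}

export function getToolOutput<T>() {
  return window.openai?.toolOutput as T | undefined;
}

export function getToolResponseMetadata<T>() {
  return window.openai?.toolResponseMetadata as T | undefined;
}

export function persistWidgetState(state: unknown) {
  window.openai?.setWidgetState?.(state);
}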

Then unit test your component against the adapter:

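// Registers DOM matchers such as toBeVisible with Vitest's expect.
import '@testing-library/jest-dom/vitest';
import { render, screen, fireEvent } from '@testing-library/react';
import { vi, test, expect } from 'vitest';
import { InvoiceWidget } from './InvoiceWidget';
import * as bridge from './chatgptBridge';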

test('renders toolOutput and persists selected row', () => {
  vi.spyOn(bridge, 'getToolOutput').mockReturnValue({
    invoices: [{ id: 'inv_001', customer: 'Acme Co' }],
  });
  vi.spyOn(bridge, 'getToolResponseMetadata').mockReturnValue({
    viewId: 'view_invoice_list',
  });
  const persist = vi.spyOn(bridge, 'persistWidgetState').mockImplementation(() => {});

  render(<InvoiceWidget />);

  fireEvent.click(screen.getByText('Acme Co'));

  expect(persist).toHaveBeenCalledWith({
    selectedInvoiceId: 'inv_001',
  });
});

Also test that your component does not crash when window.openai is missing:

test('renders fallback outside ChatGPT', () => {
  vi.spyOn(bridge, 'getToolOutput').mockReturnValue(undefined);
  vi.spyOn(bridge, 'getToolResponseMetadata').mockReturnValue(undefined);

  render(<InvoiceWidget />);

  expect(screen.getByText('No invoice data available')).toBeVisible();
});

That fallback matters because MCP Apps are meant to run across hosts. Host-specific bridge APIs should be optional, not required for basic rendering.
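One way to keep host APIs optional is to have components depend on a small host-neutral interface and choose an implementation once, at startup. This sketch takes the global object as a parameter so it also runs outside a browser. It assumes only the three window.openai members named in this post. HostBridge and createHostBridge are hypothetical names.

```typescript
// The subset of window.openai this post uses; every member is optional.
interface OpenAIGlobals {
  toolOutput?: unknown;
  toolResponseMetadata?: unknown;
  setWidgetState?: (state: unknown) => unknown;
}

// Host-neutral surface that components depend on.
interface HostBridge {
  host: 'chatgpt' | 'none';
  toolOutput(): unknown;
  toolResponseMetadata(): unknown;
  persistState(state: unknown): void;
}

// Feature-detects the ChatGPT globals and falls back to a no-op bridge elsewhere.
function createHostBridge(globalObject: { openai?: OpenAIGlobals } | undefined): HostBridge {
  const openai = globalObject?.openai;
  if (!openai) {
    return {
      host: 'none',
      toolOutput: () => undefined,
      toolResponseMetadata: () => undefined,
      persistState: () => {},
    };
  }
  return {
    host: 'chatgpt',
    toolOutput: () => openai.toolOutput,
    toolResponseMetadata: () => openai.toolResponseMetadata,
    // Guard the call in case setWidgetState is not exposed.
    persistState: (state) => {
      openai.setWidgetState?.(state);
    },
  };
}
```

In the browser you would call it once with the real window object, for example createHostBridge(window as unknown as { openai?: OpenAIGlobals }). In tests you can pass a plain object or undefined.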

Testing App State

State is different from _meta. _meta is UI-only tool result metadata. App state or widget state is user interaction state the host may preserve and expose back to the model.

Use state for things the user did that the model should know:

  • Selected a row
  • Applied a filter
  • Chose a date range
  • Completed a step in a multi-step form

Do not use it for private component internals:

  • Cache keys
  • API cursors
  • DOM measurements
  • Internal account IDs
  • Temporary tokens

For portable MCP Apps with sunpeak, test useAppState like any other hook:

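// Registers DOM matchers such as toHaveAttribute with Vitest's expect.
import '@testing-library/jest-dom/vitest';
import { render, screen, fireEvent } from '@testing-library/react';
import { vi, test, expect, beforeEach } from 'vitest';
import { InvoiceFilters } from './InvoiceFilters';

// vi.mock is hoisted above the imports, so shared mock state is created with vi.hoisted.
const mocks = vi.hoisted(() => ({
  setAppState: vi.fn(),
  appState: { status: 'open' },
}));

vi.mock('sunpeak', () => ({
  useAppState: () => [mocks.appState, mocks.setAppState],
}));

// Reset shared state so each test starts from the same app state.
beforeEach(() => {
  mocks.appState = { status: 'open' };
  mocks.setAppState.mockClear();
});

test('status filter syncs to app state', () => {
  render(<InvoiceFilters />);

  fireEvent.click(screen.getByRole('button', { name: 'Paid' }));

  expect(mocks.setAppState).toHaveBeenCalledWith({
    status: 'paid',
  });
});

Pair that with a render test for restored state:

test('restored app state controls selected filter', () => {
  // Simulate the host restoring previously saved state.
  mocks.appState = { status: 'paid' };

  render(<InvoiceFilters />);

  expect(screen.getByRole('button', { name: 'Paid' })).toHaveAttribute(
    'aria-pressed',
    'true'
  );
});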

The first test proves user action writes state. The second proves the component can restore from state after the host re-renders it.

A Practical Data-Flow Checklist

Use this checklist for every tool that renders an MCP App resource:

  • content gives the model a short, useful summary.
  • content does not duplicate large structuredContent arrays.
  • structuredContent validates against a schema the resource owns.
  • structuredContent contains only fields the model may safely see.
  • _meta carries UI-only metadata such as cursors, internal IDs, and view hints.
  • _meta is optional from the resource’s point of view, or the resource renders a clear fallback.
  • App state contains user choices the model should know.
  • App state does not contain secrets, cache internals, or raw provider payloads.
  • Host-specific bridge APIs are feature-detected.
  • Tests cover at least one missing-field, empty-state, and malformed-data case.
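The schema and edge-case items in that checklist can be covered by one parser that the resource owns. This sketch is a hand-written guard so that it stays dependency-free; a schema library would fill the same role. The row shape and the 'ok' / 'empty' / 'invalid' result kinds are hypothetical choices for this example.

```typescript
interface InvoiceRow {
  id: string;
  customer: string;
  total: string;
  status: 'paid' | 'open';
}

type ParseResult =
  | { kind: 'ok'; period: string; invoices: InvoiceRow[] }
  | { kind: 'empty'; period: string }
  | { kind: 'invalid'; reason: string };

function isInvoiceRow(value: unknown): value is InvoiceRow {
  if (typeof value !== 'object' || value === null) return false;
  const row = value as Record<string, unknown>;
  return (
    typeof row.id === 'string' &&
    typeof row.customer === 'string' &&
    typeof row.total === 'string' &&
    (row.status === 'paid' || row.status === 'open')
  );
}

// Validates structuredContent and classifies missing, empty, and malformed data,
// so the UI can render a clear fallback instead of crashing.
function parseInvoiceList(structuredContent: unknown): ParseResult {
  if (typeof structuredContent !== 'object' || structuredContent === null) {
    return { kind: 'invalid', reason: 'structuredContent is missing' };
  }
  const { period, invoices } = structuredContent as Record<string, unknown>;
  if (typeof period !== 'string') return { kind: 'invalid', reason: 'period must be a string' };
  if (!Array.isArray(invoices)) return { kind: 'invalid', reason: 'invoices must be an array' };
  if (invoices.length === 0) return { kind: 'empty', period };
  const badIndex = invoices.findIndex((row) => !isInvoiceRow(row));
  if (badIndex !== -1) return { kind: 'invalid', reason: `invoices[${badIndex}] is malformed` };
  return { kind: 'ok', period, invoices: invoices as InvoiceRow[] };
}
```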

If you do nothing else, add two tests: one schema test for structuredContent, and one leak test that fails when known internal field names appear in content or structuredContent.

Where sunpeak Helps

You can test this contract with any MCP test harness, but sunpeak makes the loop short because it gives you both sides: the mcp fixture for tool-result contract tests and the inspector fixture for iframe rendering tests. The same simulation file can define content, structuredContent, _meta, input, theme, and display mode, then run locally and in CI.

That means you can test the full data path without opening ChatGPT, burning credits, or relying on manual prompts. For a new project, run npx sunpeak new. For an existing MCP server, use npx sunpeak test init --server http://localhost:8000/mcp and start with the data-flow tests above.

Frequently Asked Questions

What is the difference between content, structuredContent, and _meta in an MCP App?

content is the human-readable tool result the model can read. structuredContent is typed JSON that your MCP App resource renders and that some hosts may also expose to the model. _meta is resource-only metadata for UI concerns such as internal IDs, pagination cursors, prefetched payloads, or view hints. Test all three separately so model-visible data stays concise and UI-only data does not leak into model context.

How do I test structuredContent in an MCP App?

Write an integration test that calls the tool through the MCP layer and asserts the structuredContent schema, required fields, and serializability. Then write an E2E test that renders the same tool result in the inspector and asserts the resource displays that data correctly. This catches both backend contract bugs and frontend rendering bugs.

Should secrets go in structuredContent or _meta?

Neither field should contain long-lived secrets. If the resource needs a short-lived UI token, keep it scoped, short-lived, and app-specific, then pass it through the narrowest field your host supports. Use _meta for UI-only values that the model should never see. Do not put API keys, OAuth refresh tokens, session cookies, or private conversation data in any tool result.

How do I test that _meta stays UI-only?

Create a test payload with a sentinel value in _meta, render the app, and assert that the resource receives the value through _meta. Pair that with an integration assertion that neither content nor structuredContent contains the sentinel. For live host checks, use a harmless sentinel such as UI_ONLY_SENTINEL and verify the model never repeats it.

How do I test window.openai toolOutput and toolResponseMetadata?

For ChatGPT-specific components, wrap window.openai access in a small adapter and mock that adapter in unit tests. Assert that toolOutput feeds your render path, toolResponseMetadata feeds only UI internals, and setWidgetState is called after meaningful user interactions. Also test the fallback path where window.openai is undefined so the component does not crash in non-ChatGPT hosts.

What should I put in content when the UI renders the full answer?

Keep content short and factual. It should tell the model what the resource showed, not duplicate the entire UI payload. For example, use content like "Displayed 12 invoices for April 2026" and put the invoice rows in structuredContent. This gives the model enough context while keeping token use and data exposure under control.

Can I test MCP App host bridge state in CI?

Yes. Use simulation files and Playwright tests against a local MCP App inspector. Render the resource with controlled tool input, structuredContent, _meta, theme, and display mode values. Then assert the DOM, state transitions, and tool calls. This lets you test host bridge behavior in CI without a paid ChatGPT account or manual browser session.

What is the most common MCP App data-flow bug?

The most common bug is mixing model-visible data and UI-only data. Developers often put every field into structuredContent because it is easy for the resource to read, then the model sees internal IDs, large payloads, or UI hints it should not reason about. Contract tests should fail when internal fields appear in content or structuredContent.