The Complete Guide to Testing ChatGPT Apps and MCP Apps

Abe Wheeler
ChatGPT Apps MCP Apps Testing ChatGPT App Testing MCP App Testing Tutorial
The sunpeak ChatGPT App simulator with testing capabilities.

[Updated 2026-03-04] Testing ChatGPT Apps and MCP Apps is brutal. Your App needs to work properly with all kinds of states: it needs to account for host runtime state, host theme state, MCP server state, backend state, and now it needs to work across ChatGPT and Claude (each with its own UI chrome, color palette, rendering behavior, and runtime APIs).

Without proper testing infrastructure, you’re either deploying blind or testing by hand: burning credits on every check, paying for subscriptions your whole team barely uses, and losing hours in the process.

TL;DR: Use sunpeak’s built-in testing with Vitest for unit tests (pnpm test) and Playwright for e2e tests (pnpm test:e2e). sunpeak’s local host simulator ships both a ChatGPT host and a Claude host, so you test across hosts, display modes, runtimes, and themes locally and in CI. No paid accounts. No AI credits. Define states in simulation files and run everything automatically.

localhost:3000?host=chatgpt&theme=dark

This guide covers everything you need to test ChatGPT Apps with confidence.

Why Testing ChatGPT Apps is Different

ChatGPT Apps run in a specialized runtime environment. Your React components don’t just render in a browser. They render inside the ChatGPT App runtime with:

  • Host frontend state - Inline, picture-in-picture, and fullscreen display modes, light or dark theme, etc.
  • Tool invocations - The AI host calls your app’s tools with specific inputs
  • Backend state - Various possible states for users and sessions in your database
  • App state - Persistent state that survives across invocations
  • Multiple hosts - ChatGPT and Claude each have their own UI chrome, color palette, layout conventions, and rendering behavior

Testing each combination manually isn’t feasible; the combinatorics get out of hand fast.
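To get a feel for the scale, here is a quick back-of-the-envelope sketch using only the host-side dimensions the simulator exposes (the counts are illustrative, and real apps multiply in backend and app-state dimensions on top):

```typescript
// Illustrative state space using only host-side dimensions;
// backend state and app state multiply this further.
const hosts = ['chatgpt', 'claude'];
const themes = ['light', 'dark'];
const displayModes = ['inline', 'pip', 'fullscreen'];
const deviceTypes = ['mobile', 'tablet', 'desktop', 'unknown'];

const combinations =
  hosts.length * themes.length * displayModes.length * deviceTypes.length;

console.log(`${combinations} host/theme/mode/device combinations`); // 48
```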

The Cross-Host Problem

MCP Apps run on ChatGPT, Claude, and other hosts. Each host renders your app differently. Your app needs to look right on every one of them.

Testing manually against the real hosts means:

  • A ChatGPT Plus subscription ($20/mo per team member)
  • A Claude Pro subscription ($20/mo per team member)
  • Burning AI credits every time you test, because the model processes your tool call each time you want to see your UI
  • Waiting for the model to respond before you can see your component render
  • No way to run these tests in CI/CD, since you can’t automate real ChatGPT or Claude interactions

During active development, you might test dozens of times a day. Across a team of five, that’s $200/month in subscriptions alone, plus whatever credits you burn. And you still can’t run automated regression tests.

sunpeak’s simulator ships both a ChatGPT host and a Claude host built-in. Switch between them with the host dropdown in the sidebar, or pass ?host=claude in the URL. Your automated tests run against both hosts on every push, on your CI/CD runners, with zero external dependencies. No paid accounts, no API keys, no credits.

Setting Up Your Testing Environment

If you’re using the sunpeak ChatGPT App framework, testing is pre-configured. Start with:

pnpm add -g sunpeak && sunpeak new
cd sunpeak-app

Your project includes:

  • Vitest configured with jsdom, React Testing Library, and jest-dom matchers
  • Playwright configured to test against the ChatGPT App simulator
  • Simulation files in tests/simulations/ for deterministic states
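For reference, a Vitest setup along those lines typically looks like the following sketch. The exact config file sunpeak generates may differ, and the setup-file path here is an assumption:

```typescript
// vitest.config.ts - a sketch of the configuration described above;
// the file sunpeak generates for you may differ in detail.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    environment: 'jsdom',             // DOM APIs for React Testing Library
    globals: true,                    // describe/it/expect without imports
    setupFiles: ['./tests/setup.ts'], // e.g. imports '@testing-library/jest-dom'
  },
});
```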

Unit Testing with Vitest

Unit tests validate individual components in isolation. Run them with:

pnpm test

Create tests alongside your components in src/resources with the .test.tsx extension:

import { describe, expect, it } from 'vitest';
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { Counter } from './counter';

describe('Counter', () => {
  it('renders the initial count', () => {
    render(<Counter />);
    expect(screen.getByText('0')).toBeInTheDocument();
  });

  it('increments when the button is clicked', async () => {
    render(<Counter />);
    await userEvent.click(screen.getByRole('button', { name: /increment/i }));
    expect(screen.getByText('1')).toBeInTheDocument();
  });
});

Unit tests run fast and catch component-level bugs early. They’re ideal for testing:

  • Component rendering logic
  • User interactions within a component
  • Props and state handling

End-to-End Testing with Playwright

E2E tests validate your ChatGPT App running in the simulator. Run them with:

pnpm test:e2e

Create tests in tests/e2e/ with the .spec.ts extension:

import { test, expect } from '@playwright/test';
import { createSimulatorUrl } from 'sunpeak';

test('counter increments in fullscreen mode', async ({ page }) => {
  await page.goto(createSimulatorUrl({
    simulation: 'show-counter',
    displayMode: 'fullscreen',
    theme: 'dark',
  }));

  await page.getByRole('button', { name: /increment/i }).click();
  await expect(page.getByText('1')).toBeVisible();
});

The createSimulatorUrl utility generates URLs with your test configuration:

  • simulation - Your simulation file name (mocks tool calls and responses)
  • displayMode - inline, pip, or fullscreen (tests display adaptation)
  • theme - light or dark (tests theme handling)
  • deviceType - mobile, tablet, desktop, or unknown (tests responsive behavior)
  • touch / hover - Enable or disable touch/hover capabilities
  • safeAreaTop, safeAreaBottom, etc. - Simulate device notches and insets
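Under the hood this is just query-string assembly. Here is a rough, self-contained sketch of what such a helper does; it is an illustration only, not sunpeak’s actual implementation, so use the real createSimulatorUrl in your tests:

```typescript
// Hypothetical re-implementation of a simulator-URL builder, for
// illustration only. Use sunpeak's real createSimulatorUrl in tests.
type SimulatorOptions = {
  simulation: string;
  host?: 'chatgpt' | 'claude';
  displayMode?: 'inline' | 'pip' | 'fullscreen';
  theme?: 'light' | 'dark';
  deviceType?: 'mobile' | 'tablet' | 'desktop' | 'unknown';
};

function buildSimulatorUrl(base: string, opts: SimulatorOptions): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(opts)) {
    if (value !== undefined) params.set(key, String(value));
  }
  return `${base}?${params.toString()}`;
}

console.log(
  buildSimulatorUrl('http://localhost:3000', {
    simulation: 'show-counter',
    host: 'claude',
    theme: 'dark',
  }),
);
// → http://localhost:3000?simulation=show-counter&host=claude&theme=dark
```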

Creating Simulation Files

Simulation files define deterministic states for testing. Create them in tests/simulations/:

{
  "tool": "show_counter",
  "userMessage": "Show me a counter starting at 5",
  "toolInput": {
    "arguments": { "initialCount": 5 }
  },
  "toolResult": {
    "content": [{ "type": "text", "text": "Counter displayed" }],
    "structuredContent": {
      "count": 5
    }
  }
}

This simulation:

  • References the tool file to mock by name (matches src/tools/show_counter.ts)
  • Shows userMessage in the simulator chat interface
  • Sets toolInput with mock input accessible via useToolData()
  • Provides toolResult with mock output data passed to your component via useToolData()
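If you want type safety when authoring simulations, you can mirror this shape in TypeScript. The field names below are inferred from the example above, not taken from sunpeak’s exported types, which may differ:

```typescript
// Simulation shape as shown in this guide; field names assumed from the
// example above, not from sunpeak's exported types.
interface Simulation {
  tool: string;
  userMessage: string;
  toolInput: { arguments: Record<string, unknown> };
  toolResult: {
    content: { type: 'text'; text: string }[];
    structuredContent?: Record<string, unknown>;
  };
}

// Minimal runtime check before trusting a parsed JSON file.
function parseSimulation(json: string): Simulation {
  const sim = JSON.parse(json) as Simulation;
  if (typeof sim.tool !== 'string' || !sim.toolResult) {
    throw new Error('invalid simulation file');
  }
  return sim;
}

const sim = parseSimulation(
  '{"tool":"show_counter","userMessage":"Show me a counter starting at 5",' +
  '"toolInput":{"arguments":{"initialCount":5}},' +
  '"toolResult":{"content":[],"structuredContent":{"count":5}}}',
);
console.log(sim.toolResult.structuredContent); // { count: 5 }
```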

Use simulations to test specific states without manual setup:

// Test the counter with toolResult.structuredContent.count = 5
await page.goto(createSimulatorUrl({ simulation: 'show-counter' }));
await expect(page.getByText('5')).toBeVisible();

// Test a different initial state
await page.goto(createSimulatorUrl({ simulation: 'counter-initial' }));
await expect(page.getByText('0')).toBeVisible();

Testing Across Display Modes

ChatGPT Apps appear in three display modes. Test all of them:

const displayModes = ['inline', 'pip', 'fullscreen'] as const;

for (const displayMode of displayModes) {
  test(`renders correctly in ${displayMode} mode`, async ({ page }) => {
    await page.goto(createSimulatorUrl({
      simulation: 'show-counter',
      displayMode,
    }));

    await expect(page.getByRole('button')).toBeVisible();
  });
}

Each mode has different constraints:

  • Inline - Embedded in chat
  • Picture-in-picture - Floating window
  • Fullscreen - Maximum space, modal overlay

Your app should adapt gracefully to each.
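What “adapt gracefully” means is app-specific, but a common pattern is to gate content density on the display mode. A toy sketch, with thresholds made up purely for illustration:

```typescript
// Toy density policy keyed on display mode; the numbers are illustrative,
// not recommendations from sunpeak or the host platforms.
type DisplayMode = 'inline' | 'pip' | 'fullscreen';

function maxVisibleItems(mode: DisplayMode): number {
  switch (mode) {
    case 'inline':
      return 3; // embedded in chat: stay compact
    case 'pip':
      return 6; // floating window: a little more room
    case 'fullscreen':
      return 50; // modal overlay: show everything
  }
}

console.log(maxVisibleItems('pip')); // 6
```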

Testing Theme Adaptation

Test both light and dark themes:

test('adapts to dark theme', async ({ page }) => {
  await page.goto(createSimulatorUrl({
    simulation: 'show-counter',
    theme: 'dark',
  }));

  // Verify dark theme styles are applied
  const button = page.getByRole('button');
  await expect(button).toHaveCSS('background-color', 'rgb(255, 184, 0)');
});

Testing Across Hosts

sunpeak’s simulator ships a ChatGPT host and a Claude host. Both are registered automatically when the simulator starts. In the browser, you switch hosts with the sidebar dropdown. In tests, pass the host parameter to createSimulatorUrl.

The simplest pattern loops over both hosts:

import { createSimulatorUrl } from 'sunpeak';

const hosts = ['chatgpt', 'claude'] as const;

for (const host of hosts) {
  test(`counter renders correctly on ${host}`, async ({ page }) => {
    await page.goto(createSimulatorUrl({
      simulation: 'show-counter',
      displayMode: 'fullscreen',
      theme: 'dark',
      host,
    }));

    await expect(page.getByRole('button', { name: /increment/i })).toBeVisible();
  });
}

This generates separate test cases for each host. When a test fails on Claude but passes on ChatGPT (or vice versa), you’ll know immediately which host has the issue.

You can combine host testing with display mode and theme testing for full coverage:

const hosts = ['chatgpt', 'claude'] as const;
const themes = ['light', 'dark'] as const;
const displayModes = ['inline', 'pip', 'fullscreen'] as const;

for (const host of hosts) {
  for (const theme of themes) {
    for (const displayMode of displayModes) {
      test(`renders on ${host} / ${theme} / ${displayMode}`, async ({ page }) => {
        await page.goto(createSimulatorUrl({
          simulation: 'show-counter',
          host,
          theme,
          displayMode,
        }));

        await expect(page.getByRole('button')).toBeVisible();
      });
    }
  }
}

That’s 12 test cases (2 hosts x 2 themes x 3 display modes) from a few lines of code. Each runs against the local simulator in seconds, with no network requests, no paid accounts, and no AI credits.
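If the nested loops grow unwieldy as you add dimensions, the same matrix can be generated with a small cartesian-product helper. This is a generic sketch, not a sunpeak API:

```typescript
// Generic cartesian product: flattens N nested loops into one list of tuples.
function cartesian<T>(...sets: T[][]): T[][] {
  return sets.reduce<T[][]>(
    (acc, set) => acc.flatMap((tuple) => set.map((item) => [...tuple, item])),
    [[]],
  );
}

const cases = cartesian<string>(
  ['chatgpt', 'claude'],
  ['light', 'dark'],
  ['inline', 'pip', 'fullscreen'],
);

console.log(cases.length); // 12 test cases
```

Each tuple then feeds one test(...) call with createSimulatorUrl, exactly as in the nested-loop version.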

These same tests run on your CI/CD runners. A GitHub Actions workflow doesn’t need ChatGPT Plus credentials or Claude API keys. The simulator is self-contained.

Running Tests in CI/CD

Add testing to your GitHub Actions workflow:

name: Test
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: pnpm/action-setup@v4
        with:
          version: 10
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'pnpm'

      - run: pnpm install
      - run: pnpm test
      - run: pnpm exec playwright install chromium --with-deps
      - run: pnpm test:e2e

Playwright tests automatically:

  1. Start the sunpeak dev server
  2. Wait for it to be ready
  3. Run tests against both the ChatGPT and Claude hosts in the simulator
  4. Shut down when complete

No API keys, paid subscriptions, or AI credits are needed on your CI runners. The simulator is entirely self-contained. Your team gets automated cross-host regression testing on every push without any external dependencies.

Debugging Failing Tests

When tests fail, use these debugging techniques:

Playwright Debug Mode

pnpm test:e2e --ui

Opens a visual debugger where you can:

  • Step through tests
  • Inspect the DOM at each step
  • See screenshots and traces

Vitest Verbose Output

pnpm test --reporter=verbose

Shows detailed output including:

  • Individual assertion results
  • Component render output
  • Error stack traces

Screenshot on Failure

Playwright automatically captures screenshots on failure. Find them in test-results/.
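If you also want traces and video on failure, Playwright exposes these as standard use options in its config; nothing here is sunpeak-specific:

```typescript
// playwright.config.ts - failure artifacts. These are standard Playwright
// options; adjust retention policies to taste.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    screenshot: 'only-on-failure', // saved under test-results/
    trace: 'on-first-retry',       // inspect with: pnpm exec playwright show-trace
    video: 'retain-on-failure',
  },
});
```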

Testing Best Practices

One assertion per test. Keep tests focused and easy to debug:

// Good: focused test
test('increment button is visible', async ({ page }) => {
  await page.goto(createSimulatorUrl({ simulation: 'show-counter' }));
  await expect(page.getByRole('button', { name: /increment/i })).toBeVisible();
});

// Avoid: multiple unrelated assertions
test('counter works', async ({ page }) => {
  // Too many things being tested at once
});

Test behavior, not implementation. Focus on what users see:

// Good: tests user-visible behavior
await expect(page.getByText('5')).toBeVisible();

// Avoid: tests implementation details
await expect(component.state.count).toBe(5);

Use descriptive test names. Make failures self-explanatory:

// Good: clear failure message
test('displays error message when API call fails', ...)

// Avoid: vague description
test('handles error', ...)

Clean up between tests. Reset state to avoid test pollution:

afterEach(async () => {
  // Reset any global state
});

Get Started

Documentation →
pnpm add -g sunpeak && sunpeak new

Frequently Asked Questions

How do I test a ChatGPT App locally without a paid ChatGPT account?

Use sunpeak, the ChatGPT App framework. Run "sunpeak dev" to start a local simulator at localhost:3000 that ships both a ChatGPT host and a Claude host. You can test all display modes, themes, tool invocations, and host-specific rendering without any paid subscription or burning AI credits.

What testing frameworks work with ChatGPT Apps?

sunpeak includes pre-configured support for Vitest (unit testing) and Playwright (end-to-end testing). Run "pnpm test" for unit tests and "pnpm test:e2e" for end-to-end tests. Both frameworks integrate with the sunpeak ChatGPT App simulator for deterministic UI testing.

How do I run ChatGPT App tests in CI/CD pipelines?

sunpeak projects include testing infrastructure ready for CI/CD. Add "pnpm test" and "pnpm test:e2e" to your pipeline. Playwright tests automatically start the dev server, run against both the ChatGPT and Claude hosts in the simulator, and shut down when complete. No paid accounts, API keys, or AI credits needed on your CI runners.

What are simulation files in ChatGPT App testing?

Simulation files are JSON files in tests/simulations/ that define deterministic UI states for testing. They specify a tool name (referencing a tool file), toolInput (mock input), toolResult (mock output), and a userMessage. The sunpeak framework auto-discovers any *.json file in the simulations directory.

Can I test different ChatGPT App display modes with sunpeak?

Yes. Use the createSimulatorUrl utility to test inline, picture-in-picture, and fullscreen display modes. Pass displayMode as a parameter along with theme (light/dark) and device type to validate your ChatGPT App (built as an MCP App) across all configurations.

How do I test my MCP App on both ChatGPT and Claude without paid accounts?

sunpeak's simulator ships both a ChatGPT host and a Claude host built-in. Switch hosts with the sidebar dropdown or pass ?host=claude in the URL. In Playwright tests, pass the host parameter to createSimulatorUrl and loop over both hosts. All tests run locally and in CI with zero external dependencies, no paid subscriptions, and no AI credits burned.

How much does it cost to test a ChatGPT App against the real ChatGPT?

Manual testing against real ChatGPT requires a ChatGPT Plus subscription ($20/month per team member) and burns AI credits on every test. During active development, you might test dozens of times a day. sunpeak eliminates this cost entirely. The local simulator replicates the ChatGPT and Claude runtimes, so you test for free, locally and in CI/CD.

What is the difference between unit tests and e2e tests for ChatGPT Apps?

Unit tests (Vitest) test individual React components in isolation using jsdom. E2E tests (Playwright) test the full ChatGPT App running in the sunpeak simulator, including user interactions, tool calls, and display mode transitions.

How do I debug failing ChatGPT App tests?

Run "pnpm test:e2e --ui" to open Playwright in debug mode with a visual interface. You can step through tests, inspect the DOM, and see screenshots at each step. For unit tests, use "pnpm test --reporter=verbose" for detailed output. sunpeak's testing infrastructure makes debugging straightforward.