
Overview

sunpeak provides two levels of automated Playwright testing for MCP Apps:
  1. E2E tests against the inspector — the inspector replicates ChatGPT and Claude runtimes locally, and simulations (JSON fixtures) define reproducible tool states. Playwright loads a simulation in the inspector via URL and asserts against the rendered resource. Test every combination of host, theme, display mode, and device type without deploying or burning API credits.
  2. Live tests against real hosts — sunpeak/test provides Playwright fixtures that open real ChatGPT (and future hosts), send messages, wait for app iframes, and let you assert against the rendered result. All host DOM interaction (auth, selectors, iframe access) is maintained by sunpeak — you only write resource assertions.
Command          What it tests                     Runtime
pnpm test        Unit tests (Vitest)               jsdom
pnpm test:e2e    E2E tests against the inspector   Playwright + inspector
pnpm test:live   Live tests against real ChatGPT   Playwright + real host

E2E Testing

E2E tests are Playwright specs in tests/e2e/*.spec.ts. The dev server starts automatically — Playwright launches it before running tests.
pnpm test:e2e                              # Run all
pnpm test:e2e --ui                         # Playwright UI mode
pnpm test:e2e tests/e2e/albums.spec.ts     # Single file

Writing E2E Tests

Use createInspectorUrl to load a simulation in the inspector with specific host/theme/display mode settings:
import { test, expect } from '@playwright/test';
import { createInspectorUrl } from 'sunpeak/inspector';

test('should render in light mode', async ({ page }) => {
  await page.goto(createInspectorUrl({
    simulation: 'show-albums',
    theme: 'light',
  }));

  const albumCard = page.locator('button:has-text("Summer Slice")');
  await expect(albumCard).toBeVisible();
});

test('should render in fullscreen mode', async ({ page }) => {
  await page.goto(createInspectorUrl({
    simulation: 'show-albums',
    theme: 'dark',
    displayMode: 'fullscreen',
  }));

  // Test fullscreen-specific behavior
});

URL Parameters

[Image: Inspector set to fullscreen dark mode via URL params]
[Image: Inspector set to inline light mode via URL params]
The createInspectorUrl function accepts parameters for configuring host, theme, display mode, device type, safe area insets, and more. See the Inspector API Reference for the complete list.
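To cover several theme and display mode combinations without copying test bodies, one option is to generate the parameter sets up front. The following is a minimal sketch, not part of sunpeak: combinations is a hypothetical helper, and the values used ('light', 'dark', 'inline', 'fullscreen') are the ones that appear on this page. Other parameters, such as host or device type, would need the names and values listed in the Inspector API Reference.

```typescript
// Hypothetical helper (not a sunpeak API): builds every combination of the
// option values it is given, one object per combination.
function combinations(
  matrix: Record<string, readonly string[]>,
): Array<Record<string, string>> {
  let results: Array<Record<string, string>> = [{}];
  for (const key of Object.keys(matrix)) {
    const next: Array<Record<string, string>> = [];
    for (const partial of results) {
      for (const value of matrix[key]) {
        next.push({ ...partial, [key]: value });
      }
    }
    results = next;
  }
  return results;
}

// Two themes x two display modes = four parameter sets.
const cases = combinations({
  theme: ['light', 'dark'],
  displayMode: ['inline', 'fullscreen'],
});

// In a spec file, each case could become its own test (untested outline):
// for (const params of cases) {
//   test(`renders (${params.theme}, ${params.displayMode})`, async ({ page }) => {
//     await page.goto(createInspectorUrl({ simulation: 'show-albums', ...params }));
//   });
// }
```

Generating the cases outside the test bodies keeps each combination visible in the test title, which makes a failure in one specific mode easier to spot in the report. If createInspectorUrl types its parameters as string unions, the spread in the outline may need a narrower type than Record<string, string>.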

Testing Backend-Only Tools

If your resource calls backend tools via useCallServerTool, define mock responses using the serverTools field in the simulation JSON. The inspector resolves these mocks based on the tool call arguments:
// tests/simulations/review-purchase.json
{
  "tool": "review-purchase",
  "toolResult": { "structuredContent": { "..." } },
  "serverTools": {
    "review": [
      {
        "when": { "confirmed": true },
        "result": {
          "content": [{ "type": "text", "text": "Completed." }],
          "structuredContent": { "status": "success", "message": "Completed." }
        }
      },
      {
        "when": { "confirmed": false },
        "result": {
          "content": [{ "type": "text", "text": "Cancelled." }],
          "structuredContent": { "status": "cancelled", "message": "Cancelled." }
        }
      }
    ]
  }
}
import { test, expect } from '@playwright/test';
import { createInspectorUrl } from 'sunpeak/inspector';

test('should show success when server confirms', async ({ page }) => {
  await page.goto(createInspectorUrl({
    simulation: 'review-purchase',
  }));

  const iframe = page.frameLocator('iframe').frameLocator('iframe');
  await iframe.locator('button:has-text("Place Order")').click();

  // The serverTools mock matches { confirmed: true } and returns success
  await expect(iframe.locator('text=Completed.')).toBeVisible();
});
The serverTools field supports both simple (single result) and conditional (when/result array) forms. See the Simulations API Reference for details.
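As a mental model for the conditional form, the sketch below shows one plausible way a when/result array could be resolved against tool call arguments. It is an illustration only, not the inspector's implementation: resolveMock is a hypothetical function, and it assumes that every key in when must strictly equal the matching argument and that the first matching entry wins. The actual matching rules are defined in the Simulations API Reference.

```typescript
// Assumed shapes, modeled on the review-purchase simulation above.
type ToolResult = {
  content: Array<{ type: string; text: string }>;
  structuredContent: Record<string, unknown>;
};
type ConditionalMock = { when: Record<string, unknown>; result: ToolResult };

// Hypothetical resolver: returns the result of the first entry whose `when`
// keys all strictly equal the corresponding call arguments (assumption).
function resolveMock(
  mocks: ConditionalMock[],
  args: Record<string, unknown>,
): ToolResult | undefined {
  for (const mock of mocks) {
    const matches = Object.keys(mock.when).every(
      (key) => args[key] === mock.when[key],
    );
    if (matches) return mock.result;
  }
  return undefined; // no entry matched
}

const reviewMocks: ConditionalMock[] = [
  {
    when: { confirmed: true },
    result: {
      content: [{ type: 'text', text: 'Completed.' }],
      structuredContent: { status: 'success', message: 'Completed.' },
    },
  },
  {
    when: { confirmed: false },
    result: {
      content: [{ type: 'text', text: 'Cancelled.' }],
      structuredContent: { status: 'cancelled', message: 'Cancelled.' },
    },
  },
];

const confirmed = resolveMock(reviewMocks, { confirmed: true });
const cancelled = resolveMock(reviewMocks, { confirmed: false });
```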

Example E2E Test Structure

A typical e2e test file tests a resource across different modes:
import { test, expect } from '@playwright/test';
import { createInspectorUrl } from 'sunpeak/inspector';

test.describe('Albums Resource', () => {
  test.describe('Light Mode', () => {
    test('should render album cards', async ({ page }) => {
      await page.goto(createInspectorUrl({
        simulation: 'show-albums',
        theme: 'light',
      }));

      await expect(page.locator('button:has-text("Summer Slice")')).toBeVisible();
    });
  });

  test.describe('Dark Mode', () => {
    test('should render with dark theme', async ({ page }) => {
      await page.goto(createInspectorUrl({
        simulation: 'show-albums',
        theme: 'dark',
      }));

      // Test dark mode specific behavior
    });
  });

  test.describe('Fullscreen Mode', () => {
    test('should render in fullscreen', async ({ page }) => {
      await page.goto(createInspectorUrl({
        simulation: 'show-albums',
        theme: 'light',
        displayMode: 'fullscreen',
      }));

      // Test fullscreen specific behavior
    });
  });
});

Live Testing

Live tests validate your MCP Apps inside real ChatGPT — not the inspector. They open a browser, navigate to ChatGPT, send messages that trigger tool calls against your MCP server, and verify the rendered app using Playwright assertions. This catches issues that inspector tests can’t: real MCP connection behavior, actual LLM tool invocation, host-specific iframe rendering, and production resource loading.

Prerequisites

  • ChatGPT account with MCP/Apps support
  • Tunnel tool: ngrok, Cloudflare Tunnel, or similar
  • Browser session: logged into chatgpt.com in Chrome, Arc, Brave, or Edge

One-Time Setup

  1. Go to Settings > Apps > Create in ChatGPT
  2. Set the app name to match your package.json name exactly. Live tests type /{appName} ... to invoke your app, and ChatGPT matches on this name.
  3. Enter your tunnel URL with the /mcp path (e.g., https://abc123.ngrok.io/mcp)
  4. Save the connection
This only needs to be done once per tunnel URL pattern.
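The two values from the setup steps can be written out as small functions to make their expected form explicit. This is a sketch for illustration: toMcpUrl and toChatGptPrompt are hypothetical helpers, not sunpeak APIs, and the single space between /{appName} and the prompt is an assumption. sunpeak formats the prompt itself during live tests.

```typescript
// Hypothetical helper: appends the /mcp path to a tunnel base URL,
// tolerating a trailing slash on the base.
function toMcpUrl(tunnelBaseUrl: string): string {
  return tunnelBaseUrl.replace(/\/+$/, '') + '/mcp';
}

// Hypothetical helper: the /{appName} prefix described above.
// The separator between the prefix and the prompt is an assumption.
function toChatGptPrompt(appName: string, prompt: string): string {
  return `/${appName} ${prompt}`;
}

const mcpUrl = toMcpUrl('https://abc123.ngrok.io/');
const prompt = toChatGptPrompt('my-app', 'show me the weather in Austin');
```

Here 'my-app' stands in for the name field of your package.json. If the app name registered in ChatGPT differs from it, the prefix will not match and the app will not be invoked.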

Running Live Tests

# Terminal 1: Start a tunnel to your MCP server
ngrok http 8000

# Terminal 2: Run live tests
pnpm test:live
The test runner:
  1. Imports your ChatGPT session from your browser (Chrome, Arc, Brave, or Edge). Falls back to a manual login window if no session is found.
  2. Starts sunpeak dev --prod-resources automatically
  3. Refreshes the MCP server connection in ChatGPT settings (once in globalSetup, before all workers)
  4. Runs tests/live/*.spec.ts files fully in parallel — each test gets its own chat window
Live tests always run with a visible browser window, because chatgpt.com uses bot detection that blocks headless browsers.

Writing Live Tests

Import test and expect from sunpeak/test to get a live fixture that handles auth, message sending, and iframe access automatically:
// tests/live/weather.spec.ts
import { test, expect } from 'sunpeak/test';

test('weather tool renders forecast', async ({ live }) => {
  // invoke() starts a new chat, sends the prompt, and returns the app iframe
  const app = await live.invoke('show me the weather in Austin');
  await expect(app.locator('h1')).toBeVisible();
});
The live fixture provides:
  • invoke(prompt) — starts a new chat, sends the prompt (with host-specific formatting like /{appName} for ChatGPT), waits for the app iframe, and returns a FrameLocator
  • startNewChat() — opens a fresh conversation (for multi-step flows)
  • sendMessage(text) — sends a message with host-appropriate formatting
  • waitForAppIframe() — waits for the MCP app iframe to render and returns a FrameLocator
  • sendRawMessage(text) — sends a message without any prefix
  • setColorScheme(scheme, appFrame?) — switches the host to 'light' or 'dark' theme; optionally pass an app FrameLocator to wait for it to update
  • page — raw Playwright Page object for advanced assertions
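For multi-step flows, the lower-level methods can be called in sequence instead of invoke(). The sketch below shows one possible ordering. LiveLike is a hypothetical interface that mirrors the method names listed above; it assumes each method is asynchronous, and it uses a generic Frame placeholder where the real fixture returns a Playwright FrameLocator. The real types come from sunpeak/test.

```typescript
// Hypothetical shape mirroring the fixture methods listed above (assumption:
// every method returns a promise). Frame stands in for Playwright's FrameLocator.
interface LiveLike<Frame> {
  startNewChat(): Promise<void>;
  sendMessage(text: string): Promise<void>;
  waitForAppIframe(): Promise<Frame>;
  setColorScheme(scheme: 'light' | 'dark', appFrame?: Frame): Promise<void>;
}

// Opens a fresh chat, sends the prompt, waits for the app, then switches the
// host to dark mode and waits for the app frame to update.
async function openAppInDarkMode<Frame>(
  live: LiveLike<Frame>,
  prompt: string,
): Promise<Frame> {
  await live.startNewChat();
  await live.sendMessage(prompt);
  const app = await live.waitForAppIframe();
  await live.setColorScheme('dark', app);
  return app;
}
```

In a spec, the live fixture from the test callback would be passed in, and the returned frame could then be used with expect(...) as in the example above.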
The Playwright config is a one-liner:
// tests/live/playwright.config.ts
import { defineLiveConfig } from 'sunpeak/test/config';
export default defineLiveConfig();
The config generates one Playwright project per host (by default, just chatgpt). When new hosts are supported, add them with a one-line change:
export default defineLiveConfig({ hosts: ['chatgpt', 'claude'] });
All host DOM interaction (selectors, login, settings navigation, iframe access) is maintained by sunpeak — you only write resource assertions. The same test code runs across all hosts.

Troubleshooting

On first run, a browser window opens for you to log in to ChatGPT. The session is saved to .auth/chatgpt.json but typically lasts only a few hours, because Cloudflare’s cf_clearance cookie is HttpOnly and cannot be persisted across runs. When the session expires, re-authenticate in the browser window that opens. If authentication keeps failing, delete the .auth/ directory and run pnpm test:live again.
If the test cannot reach your MCP server, verify that your tunnel is running and that the URL is correct. The test checks the tunnel’s /health endpoint before proceeding.
ChatGPT occasionally updates its UI. sunpeak checks selector health at startup. If selectors are stale, please file an issue.
Live tests use specific prompts like “Use the show-albums tool to…” to reliably trigger tool calls. If a tool isn’t called, the test retries once. Persistent failures may indicate the tool isn’t properly connected — check ChatGPT settings.
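To make the "retries once" behavior concrete, here is a generic sketch of that pattern. It is not sunpeak's implementation: retryOnce is a hypothetical helper, and it assumes that a missed tool call surfaces as a rejected promise. sunpeak already applies its own retry, so this is only useful as an illustration or for custom steps in your own tests.

```typescript
// Hypothetical helper: runs the action, and if it rejects, runs it exactly
// one more time. A second failure propagates to the caller.
async function retryOnce<T>(action: () => Promise<T>): Promise<T> {
  try {
    return await action();
  } catch {
    return await action();
  }
}

// Example: an action that fails on its first attempt and succeeds on the second.
let attempts = 0;
const flakyInvoke = async (): Promise<string> => {
  attempts += 1;
  if (attempts === 1) throw new Error('tool was not called');
  return 'app rendered';
};

const outcome = retryOnce(flakyInvoke);
```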

Dive Deeper

Inspector

The inspector that powers E2E tests.

Simulations API Reference

JSON schema, conventions, and auto-discovery.

Inspector API Reference

createInspectorUrl parameters and Inspector component props.