Live Testing - sunpeak

Overview

Live tests validate your MCP Apps inside real ChatGPT — not the inspector. They open your browser, navigate to ChatGPT, send messages that trigger tool calls against your MCP server, and verify the rendered app using Playwright assertions. This catches issues that inspector tests can’t: real MCP connection behavior, actual LLM tool invocation, host-specific iframe rendering, and production resource loading.

Prerequisites

ChatGPT account — You need a ChatGPT account with MCP/Apps support
Tunnel tool — ngrok, Cloudflare Tunnel, or similar
Browser session — Logged into chatgpt.com in Chrome, Arc, Brave, or Edge

One-Time Setup

Add your MCP server in ChatGPT settings:

Go to Settings > Apps > Create in ChatGPT
Enter your tunnel URL with the /mcp path (e.g., https://abc123.ngrok.io/mcp)
Save the connection

This only needs to be done once per tunnel URL pattern.
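The connector URL is the tunnel origin with /mcp as its path. The small TypeScript sketch below builds that URL from a tunnel base URL. toMcpEndpoint is a hypothetical helper, not part of sunpeak. It assumes only what the step above states: the MCP endpoint is served at /mcp on the tunnel's origin.

```typescript
// Hypothetical helper (not a sunpeak API): build the ChatGPT connector URL
// from a tunnel base URL by making sure the path ends in /mcp.
function toMcpEndpoint(tunnelUrl: string): string {
  const url = new URL(tunnelUrl);
  // Remove trailing slashes so "https://host/" and "https://host" behave the same.
  const path = url.pathname.replace(/\/+$/, "");
  // Append /mcp only if it is not already the last path segment.
  url.pathname = path.endsWith("/mcp") ? path : `${path}/mcp`;
  // Query strings and fragments are not part of the connector URL.
  url.search = "";
  url.hash = "";
  return url.toString();
}

// Usage: paste the result into Settings > Apps > Create.
console.log(toMcpEndpoint("https://abc123.ngrok.io"));
```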

Running Live Tests

# Terminal 1: Start a tunnel to your MCP server
ngrok http 8000

# Terminal 2: Run live tests with pnpm, npm, or yarn
pnpm test:live
# or
npm run test:live
# or
yarn test:live

The test runner:

Imports your ChatGPT session from your browser (Chrome, Arc, Brave, or Edge). Falls back to a manual login window if no session is found. Sessions typically last a few hours — Cloudflare’s HttpOnly cf_clearance cookie cannot be persisted, so re-authentication is needed when it expires.
Starts sunpeak dev --prod-resources automatically
Refreshes the MCP server connection in ChatGPT settings (once in globalSetup, before all workers)
Runs tests/live/*.spec.ts files fully in parallel — each test gets its own chat window

Live tests always run with a visible browser window. chatgpt.com uses bot detection that blocks headless browsers, so a visible browser is required for reliable results.

Running via Validate

You can also run live tests as part of the full validation pipeline:

sunpeak validate --live

Writing Live Tests

Live test specs live in tests/live/ — one file per resource, just like e2e tests. Import test and expect from sunpeak/test/live to get a live fixture that handles login, MCP server refresh, and host-specific message formatting automatically.

// tests/live/weather.spec.ts
import { test, expect } from 'sunpeak/test/live';

test('weather tool renders forecast', async ({ live }) => {
  // invoke() starts a new chat, sends the prompt, and returns the app iframe
  const app = await live.invoke('show me the weather in Austin');
  await expect(app.locator('h1')).toBeVisible();
});

The `live` Fixture

The live fixture provides:

invoke(prompt) — one-liner: starts a new chat, sends the prompt (with host-specific formatting like /{appName} for ChatGPT), waits for the app iframe, and returns a FrameLocator
startNewChat() — opens a fresh conversation (for multi-step flows)
sendMessage(text) — sends a message with host-appropriate formatting (read from your package.json)
waitForAppIframe() — waits for the MCP app iframe to render and returns a FrameLocator
sendRawMessage(text) — sends a message without any prefix
setColorScheme(scheme, appFrame?) — switches the host to 'light' or 'dark' theme; optionally pass an app FrameLocator to wait for it to update
page — raw Playwright Page object for advanced assertions
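The minimal sketch below illustrates the host-specific formatting that invoke() and sendMessage() apply. It rests on two assumptions: the app name comes from the name field of package.json, and hosts other than ChatGPT receive the text unchanged. appNameFromPackageJson and formatPrompt are hypothetical names, not sunpeak exports.

```typescript
type Host = "chatgpt" | "claude";

// Assumption: the app name is the package.json "name" field.
function appNameFromPackageJson(pkg: { name?: string }): string {
  if (!pkg.name) throw new Error("package.json has no name field");
  return pkg.name;
}

// Sketch of sendMessage(): ChatGPT gets a "/{appName} " prefix;
// other hosts are assumed to take the text as-is.
// sendRawMessage() corresponds to skipping this function entirely.
function formatPrompt(host: Host, appName: string, text: string): string {
  return host === "chatgpt" ? `/${appName} ${text}` : text;
}

// Usage
const appName = appNameFromPackageJson({ name: "weather" });
console.log(formatPrompt("chatgpt", appName, "show me the weather in Austin"));
// prints "/weather show me the weather in Austin"
```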

Configuration

The Playwright config is a one-liner:

// tests/live/playwright.config.ts
import { defineLiveConfig } from 'sunpeak/test/live/config';
export default defineLiveConfig();

The config generates one Playwright project per host (by default, just chatgpt). Tests switch themes internally using live.setColorScheme(). When new hosts are supported, add them with a one-line change:

import { defineLiveConfig } from 'sunpeak/test/live/config';
export default defineLiveConfig({ hosts: ['chatgpt', 'claude'] });

All host DOM interaction (selectors, login, settings navigation, iframe access) is maintained by sunpeak — you only write resource assertions. The same test code runs across all hosts.
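"One Playwright project per host" can be modeled as a plain mapping from host names to project entries. The LiveProject shape and projectsForHosts function below are hypothetical and much simpler than what defineLiveConfig actually produces. Only two settings come from this page: the chatgpt default and the visible, non-headless browser.

```typescript
// Hypothetical, trimmed-down Playwright project entry.
interface LiveProject {
  name: string;
  use: { headless: boolean };
}

// One project per host; duplicates are dropped while keeping the original order.
function projectsForHosts(hosts: readonly string[] = ["chatgpt"]): LiveProject[] {
  return Array.from(new Set(hosts)).map(
    (host): LiveProject => ({
      name: host,
      // Live tests always use a visible browser because chatgpt.com blocks headless ones.
      use: { headless: false },
    }),
  );
}

// Usage
console.log(projectsForHosts(["chatgpt", "claude"]).map((p) => p.name));
```

Each test switches themes itself with live.setColorScheme(), so separate light and dark projects are unnecessary. The project list grows only with hosts.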

Troubleshooting

'Not logged into ChatGPT' error

On first run, a browser window opens for you to log in to ChatGPT. The session is saved to .auth/chatgpt.json but typically only lasts a few hours because Cloudflare’s cf_clearance cookie is HttpOnly and cannot be persisted across runs. When you see this error, just re-authenticate in the browser window that opens. If it keeps failing, delete the .auth/ directory and run pnpm test:live again.
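Before deleting .auth/, you can inspect the saved file to see why a session stopped working. This sketch assumes .auth/chatgpt.json uses Playwright's storageState format, which is a cookies array whose expires values are Unix seconds, with -1 marking session cookies. That file format is an assumption, and expiredChatGptCookies is a hypothetical helper. Load the file with JSON.parse and pass in the result. The cf_clearance cookie will not appear in the file at all, because, as explained above, it cannot be persisted.

```typescript
// Assumed subset of Playwright's storageState JSON.
interface StoredCookie {
  name: string;
  domain: string;
  expires: number; // Unix time in seconds; -1 means a session cookie
}
interface StorageState {
  cookies: StoredCookie[];
}

// Returns the names of chatgpt.com cookies whose expiry is already in the past.
function expiredChatGptCookies(state: StorageState, nowMs: number = Date.now()): string[] {
  const nowSec = nowMs / 1000;
  return state.cookies
    .filter((c) => {
      const domain = c.domain.replace(/^\./, "");
      return domain === "chatgpt.com" || domain.endsWith(".chatgpt.com");
    })
    .filter((c) => c.expires !== -1 && c.expires < nowSec)
    .map((c) => c.name);
}

// Usage with a made-up state object
const sampleState: StorageState = {
  cookies: [
    { name: "old-cookie", domain: ".chatgpt.com", expires: 1000 },
    { name: "session-cookie", domain: "chatgpt.com", expires: -1 },
  ],
};
console.log(expiredChatGptCookies(sampleState));
```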

Tunnel not reachable

Verify your tunnel is running and the URL is correct. The test checks the tunnel’s /health endpoint before proceeding.
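You can run a similar preflight check yourself when debugging. The sketch assumes /health sits at the root of the tunnel origin, next to /mcp, and treats any 2xx response as healthy. healthUrlFor and isTunnelHealthy are hypothetical names. The fetch function is passed in so the check can run without a network. In Node 18+ you can pass the global fetch.

```typescript
// Minimal shape of the fetch function needed; Node 18+'s global fetch satisfies it.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Assumption: /health lives at the tunnel root, so ".../mcp" becomes ".../health".
function healthUrlFor(mcpUrl: string): string {
  return new URL("/health", mcpUrl).toString();
}

// Resolves to true on a 2xx response; false on an error status, a network error, or a timeout.
async function isTunnelHealthy(mcpUrl: string, fetchFn: FetchLike, timeoutMs = 5000): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  const probe = fetchFn(healthUrlFor(mcpUrl))
    .then((res) => res.ok)
    .catch(() => false);
  try {
    return await Promise.race([probe, timeout]);
  } finally {
    // Clear the timer so a fast answer does not keep the process alive.
    clearTimeout(timer);
  }
}

// Usage (Node 18+): isTunnelHealthy("https://abc123.ngrok.io/mcp", fetch).then(console.log);
```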

'ChatGPT DOM may have changed' warning

ChatGPT occasionally updates its UI. The ChatGPTPage class checks selector health at startup. If selectors are stale, update the SELECTORS constant in chatgpt-page.mjs.
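A startup selector health check of this kind comes down to counting matches for each selector. In the sketch below, the SelectorMap entries are made-up placeholders and findStaleSelectors is hypothetical. The count function is injected; with Playwright you could pass (sel) => page.locator(sel).count().

```typescript
// Placeholder shape of a SELECTORS map; the real keys and values live in chatgpt-page.mjs.
type SelectorMap = Record<string, string>;

// Returns the keys whose selector matches nothing on the current page.
// countMatches abstracts the DOM query so the logic can run without a browser.
async function findStaleSelectors(
  selectors: SelectorMap,
  countMatches: (selector: string) => Promise<number>,
): Promise<string[]> {
  const stale: string[] = [];
  for (const [key, selector] of Object.entries(selectors)) {
    if ((await countMatches(selector)) === 0) stale.push(key);
  }
  return stale;
}

// Usage with a fake page where only the composer exists (selectors are made up).
findStaleSelectors(
  { composer: "#composer", sendButton: "button.send" },
  async (sel) => (sel === "#composer" ? 1 : 0),
).then((stale) => console.log(stale));
```

Selectors for elements that appear only later, such as the app iframe, match nothing on a fresh page, so this style of check would have to run after the step that renders them.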

Tool not called by ChatGPT

Live tests use specific prompts like “Use the show-albums tool to…” to reliably trigger tool calls. If a tool isn’t called, the test retries once. Persistent failures may indicate the tool isn’t properly connected — check ChatGPT settings.
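The "retries once" behavior can be expressed as a small wrapper. In this sketch, withOneRetry is hypothetical, and the attempt callback stands in for whatever sends the prompt and reports whether the tool call was observed.

```typescript
// attempt() sends the prompt and resolves to true if the expected tool call was observed.
async function withOneRetry(attempt: () => Promise<boolean>): Promise<boolean> {
  if (await attempt()) return true;
  // Tool selection by the model is nondeterministic, so one more try can succeed.
  return attempt();
}

// Usage with a fake attempt that fails once and then succeeds.
let calls = 0;
withOneRetry(async () => ++calls >= 2).then((ok) => console.log(ok, calls));
```

Capping retries at one keeps flaky prompts from hiding a real problem. If both attempts fail, the connection in ChatGPT settings is the more likely cause.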
