MCP App Testing Framework

sunpeak is an open-source testing framework for MCP Apps. Unit tests, E2E tests, visual regression, and live tests against real ChatGPT and Claude. Add it to any MCP server in any language. No paid accounts, no AI credits.

pnpm add -g sunpeak && sunpeak test init

Definition

The sunpeak testing framework provides Playwright E2E tests, Vitest unit tests, visual regression testing, and live tests against real hosts for MCP Apps. Tests run against replicated ChatGPT and Claude runtimes locally and in CI/CD.

Why MCP Apps Need Their Own Testing Framework

MCP Apps run inside AI hosts like ChatGPT and Claude, not in a browser you control. You can't open DevTools. You can't write a Cypress test against chatgpt.com. Every code change means deploying, opening the host, starting a conversation, triggering the tool, and checking the result manually. Across two hosts, two themes, and three display modes, that's 12 combinations per change.
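The size of that manual matrix can be checked with a short enumeration. This is an illustrative sketch only: the host, theme, and display-mode labels come from this page, while buildMatrix is a hypothetical helper and not part of sunpeak.

```typescript
// Hypothetical helper: enumerate every host/theme/display-mode combination
// that would otherwise have to be checked by hand.
const hosts = ["chatgpt", "claude"] as const;
const themes = ["light", "dark"] as const;
const displayModes = ["inline", "fullscreen", "pip"] as const;

type Combination = {
  host: (typeof hosts)[number];
  theme: (typeof themes)[number];
  displayMode: (typeof displayModes)[number];
};

function buildMatrix(): Combination[] {
  const combos: Combination[] = [];
  for (const host of hosts) {
    for (const theme of themes) {
      for (const displayMode of displayModes) {
        combos.push({ host, theme, displayMode });
      }
    }
  }
  return combos;
}

const matrix = buildMatrix();
console.log(matrix.length); // 2 hosts x 2 themes x 3 display modes = 12
```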

sunpeak replicates those host runtimes locally. Your tests call tools, render resources in simulated ChatGPT and Claude, and assert against the result with Playwright. Same display modes, themes, safe areas, and conversation chrome as the real hosts.

The testing framework works with any MCP server. Run sunpeak test init to scaffold tests into an existing project, or run sunpeak test in a sunpeak project, where everything is preconfigured.

Three Levels of MCP App Testing

Unit Tests

Vitest with happy-dom for testing component logic, data transformations, and utilities without a browser.

sunpeak test --unit
  • Fast, no browser required
  • Vitest-compatible API
  • Test logic independent of host rendering

E2E Tests

Playwright tests against the sunpeak inspector. Call tools, render resources in simulated hosts, and assert against the rendered output.

sunpeak test --e2e
  • Tests against ChatGPT and Claude runtimes
  • Simulation fixtures for deterministic states
  • Visual regression with --visual flag

Live Tests

Playwright tests against real ChatGPT. sunpeak handles auth, message sending, and iframe access. You write assertions.

sunpeak test --live
  • Real host validation
  • Catches host-specific iframe behavior
  • Host DOM managed by sunpeak

Test CLI

Command               | What it runs                    | Runtime
sunpeak test          | Unit + E2E tests                | happy-dom / Playwright + inspector
sunpeak test --unit   | Unit tests only                 | Vitest + happy-dom
sunpeak test --e2e    | E2E tests only                  | Playwright + inspector
sunpeak test --visual | E2E + visual regression         | Playwright + inspector + screenshots
sunpeak test --live   | Live tests against real ChatGPT | Playwright + real host
sunpeak test init     | Scaffold test infrastructure    | Adds Playwright config, tests, simulations

How It Works

1. Scaffold Tests

Run sunpeak test init in your project. It detects your project type (JS/TS, Python, Go) and creates Playwright config, test files, and simulation fixtures. For non-JS projects, it creates a self-contained tests/sunpeak/ directory.

2. Define Simulations

Create JSON fixtures in tests/simulations/ that define tool input, tool result, and server tool mocks. Each simulation is a reproducible state your resource can render. The inspector loads them automatically.
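As a rough illustration of what such a fixture might hold, the sketch below models one in TypeScript. The field names (tool, toolInput, toolResult, serverTools), the get-album tool, and the album data are assumptions made for this example, not sunpeak's documented schema; the testing documentation is the authority on the real format.

```typescript
// Illustrative only: field names are assumptions, not sunpeak's real schema.
interface ToolResult {
  content: { type: "text"; text: string }[];
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

interface Simulation {
  tool: string;                             // tool the simulation stands in for
  toolInput: Record<string, unknown>;       // arguments the host would send
  toolResult: ToolResult;                   // canned result the resource renders
  serverTools?: Record<string, ToolResult>; // mocked follow-up server tool calls
}

const showAlbums: Simulation = {
  tool: "show-albums",
  toolInput: {},
  toolResult: {
    content: [{ type: "text", text: "Found 1 album" }],
    structuredContent: { albums: [{ title: "Summer Slice" }] },
  },
  serverTools: {
    // "get-album" is a hypothetical backend tool used only for illustration.
    "get-album": { content: [{ type: "text", text: "Summer Slice" }] },
  },
};

// Serialized form, as it might be saved under tests/simulations/.
const fixtureJson = JSON.stringify(showAlbums, null, 2);
console.log(fixtureJson);
```

Modeling the fixture as a type first makes it easier to spot a missing tool result or a misspelled mock name before the inspector ever loads the file.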

3. Write Tests

Import { test, expect } from sunpeak/test. Use the mcp fixture to call tools, set themes and display modes, and assert against the rendered resource with Playwright locators and MCP-specific matchers.

4. Run in CI/CD

Add sunpeak test to your pipeline. It starts the dev server, runs unit and E2E tests against both ChatGPT and Claude runtimes, and shuts down when complete. No accounts, keys, or credits on your CI runners.

import { test, expect } from 'sunpeak/test';

// Call the tool with the light theme, then assert on the rendered resource.
test('albums render in light mode', async ({ mcp }) => {
  const result = await mcp.callTool('show-albums', {}, { theme: 'light' });
  const app = result.app(); // scope locators to the rendered app
  await expect(app.locator('button:has-text("Summer Slice")')).toBeVisible();
});

// Same tool, rendered in the fullscreen display mode.
test('albums render in fullscreen', async ({ mcp }) => {
  const result = await mcp.callTool('show-albums', {}, { displayMode: 'fullscreen' });
  const app = result.app();
  await expect(app.locator('button:has-text("Summer Slice")')).toBeVisible();
  // Compare against saved baseline (only runs with --visual flag)
  await mcp.screenshot('albums-fullscreen');
});

What You Can Test

  • Multi-Host Rendering

    Tests run against both ChatGPT and Claude runtimes automatically via Playwright projects. One test file covers both hosts.

  • Themes & Display Modes

    Test light/dark themes and inline/fullscreen/pip display modes. Use setTheme() and setDisplayMode() or pass options to callTool().

  • Visual Regression

    Capture screenshots with mcp.screenshot() and compare against baselines. Configure thresholds and max diff pixel ratios in defineConfig().

  • Backend Tool Mocking

    Simulation files can mock callServerTool responses with simple or conditional matching. Test interactive flows without a real backend.

  • MCP-Specific Assertions

    Custom matchers: toHaveTextContent(), toHaveStructuredContent(), toBeError() alongside standard Playwright locators.

  • Any MCP Server, Any Language

    Use sunpeak test init with any MCP server. Configure the server via HTTP URL or startup command. Python, Go, TypeScript, anything.
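To make the "simple or conditional matching" mentioned under Backend Tool Mocking more concrete, here is one way such matching could work. This is a sketch under assumptions: MockRule, when, and resolveMock are hypothetical names, and sunpeak's actual matching rules may differ.

```typescript
// Hypothetical model of conditional mock matching; not sunpeak's implementation.
type Args = Record<string, unknown>;

interface MockRule {
  when?: Args;     // if present, every listed key must equal the call's argument
  result: unknown; // response returned when the rule matches
}

// Return the result of the first rule whose `when` clause matches the arguments.
// A rule without `when` matches any call, so it acts as a simple (unconditional) mock.
function resolveMock(rules: MockRule[], args: Args): unknown {
  for (const rule of rules) {
    const conditions = Object.entries(rule.when ?? {});
    if (conditions.every(([key, value]) => args[key] === value)) {
      return rule.result;
    }
  }
  throw new Error("No mock rule matched the call");
}

const rules: MockRule[] = [
  { when: { albumId: "missing" }, result: { error: "not found" } },
  { result: { title: "Summer Slice" } }, // fallback for every other call
];

console.log(resolveMock(rules, { albumId: "missing" })); // conditional rule
console.log(resolveMock(rules, { albumId: "42" }));      // fallback rule
```

Ordering rules from most to least specific keeps the fallback from shadowing the conditional cases.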

Who It's For

MCP App Developers

Stop manually refreshing ChatGPT and Claude after every code change. Write tests once, run them against both hosts automatically. Catch regressions before they ship.

MCP Server Authors

Test MCP servers written in any language. Run sunpeak test init --server URL to add test infrastructure to Python, Go, or TypeScript servers.

Coding Agents

Agents like Claude Code, Codex, and Cursor can run sunpeak test to validate MCP Apps without manual testing in a real host. Automated testing in the agent loop.

Getting Started

Add sunpeak testing to any MCP project:

pnpm add -g sunpeak && sunpeak test init

Or for an external MCP server:

sunpeak test init --server URL

Then run sunpeak test to execute unit and E2E tests. See the testing documentation for the full guide.


Frequently Asked Questions

Do I need a sunpeak project to use the testing framework?

No. Run "sunpeak test init" in any JavaScript, TypeScript, Python, or Go project. It scaffolds Playwright config and a starter test file. For non-JS projects, it creates a self-contained tests/sunpeak/ directory with everything included.

What test runners does sunpeak use?

Unit tests use Vitest with happy-dom. E2E tests use Playwright against the sunpeak inspector (replicated ChatGPT and Claude runtimes). Live tests use Playwright against real ChatGPT. You write standard Playwright assertions plus MCP-specific matchers like toHaveTextContent and toHaveStructuredContent.
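For intuition about what the MCP-specific matchers look at, the sketch below writes comparable checks as plain functions over an MCP tool result (text content, structuredContent, and the isError flag). The function names and logic are assumptions for illustration; sunpeak's matchers may behave differently.

```typescript
// Illustrative predicates over an MCP tool result; not sunpeak's matcher code.
interface ToolResult {
  content: { type: string; text?: string }[];
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// True when any text block in the result contains the expected substring.
function hasTextContent(result: ToolResult, expected: string): boolean {
  return result.content.some(
    (block) => block.type === "text" && (block.text ?? "").includes(expected),
  );
}

// True when every expected key is present with an equal value.
// Values are compared via JSON serialization, so object key order matters.
function hasStructuredContent(result: ToolResult, expected: Record<string, unknown>): boolean {
  const actual = result.structuredContent ?? {};
  return Object.entries(expected).every(
    ([key, value]) => JSON.stringify(actual[key]) === JSON.stringify(value),
  );
}

// True when the tool flagged the call as failed.
function isErrorResult(result: ToolResult): boolean {
  return result.isError === true;
}

const sample: ToolResult = {
  content: [{ type: "text", text: "Found 1 album: Summer Slice" }],
  structuredContent: { count: 1 },
};

console.log(hasTextContent(sample, "Summer Slice"));     // true
console.log(hasStructuredContent(sample, { count: 1 })); // true
console.log(isErrorResult(sample));                      // false
```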

How do simulation files work?

Simulation files are JSON fixtures in tests/simulations/ that define a tool call scenario: tool input, tool result, and optional server tool mocks. The inspector loads them to render your MCP App in a specific state. Each simulation is a reproducible test scenario you can assert against.

Can I test across ChatGPT and Claude automatically?

Yes. The sunpeak test runner uses Playwright projects to run each test against both ChatGPT and Claude host runtimes automatically. One test file, both hosts. Configure which hosts to test in defineConfig().

What is visual regression testing?

Run "sunpeak test --visual" to capture screenshots of your MCP App and compare them against saved baselines. If the UI changes unexpectedly, the test fails with a diff image. Run "sunpeak test --visual --update" to update baselines after intentional changes.
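The pass/fail decision behind such a comparison can be pictured as a pixel diff checked against a ratio limit. The sketch below is a simplified model, assuming RGBA byte buffers of equal size and hypothetical names (diffRatio, matchesBaseline, channelTolerance); real screenshot comparison is more sophisticated and is not shown here.

```typescript
// Simplified model of a screenshot comparison; real tools use perceptual color distance.
function diffRatio(baseline: Uint8Array, actual: Uint8Array, channelTolerance = 0): number {
  if (baseline.length !== actual.length || baseline.length % 4 !== 0) {
    throw new Error("Images must be RGBA buffers of the same size");
  }
  const pixelCount = baseline.length / 4;
  if (pixelCount === 0) return 0;
  let differing = 0;
  for (let p = 0; p < pixelCount; p++) {
    // A pixel differs when any of its four channels is off by more than the tolerance.
    for (let c = 0; c < 4; c++) {
      const i = p * 4 + c;
      if (Math.abs(baseline[i] - actual[i]) > channelTolerance) {
        differing++;
        break;
      }
    }
  }
  return differing / pixelCount;
}

// The comparison passes when the share of differing pixels stays within the limit.
function matchesBaseline(baseline: Uint8Array, actual: Uint8Array, maxDiffPixelRatio: number): boolean {
  return diffRatio(baseline, actual) <= maxDiffPixelRatio;
}

// Four white pixels; the "actual" image has one pixel turned black.
const baselineImg = new Uint8Array(16).fill(255);
const actualImg = Uint8Array.from(baselineImg);
actualImg.set([0, 0, 0, 255], 4);

console.log(diffRatio(baselineImg, actualImg));            // 0.25
console.log(matchesBaseline(baselineImg, actualImg, 0.1)); // false
```

A small nonzero ratio limit tolerates anti-aliasing noise between runs, while a limit of zero fails on any changed pixel.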

How do live tests differ from E2E tests?

E2E tests run against the local inspector with simulation fixtures. They are fast, deterministic, and free. Live tests run against real ChatGPT using Playwright. sunpeak handles auth, message sending, and iframe access. You only write assertions against the rendered app.

Does sunpeak testing work in CI/CD?

Yes. Add "sunpeak test" to your CI pipeline. It starts the dev server automatically, runs unit and E2E tests, and shuts down when complete. No paid host accounts, API keys, or AI credits needed on CI runners.

Is sunpeak testing free?

Yes. sunpeak is MIT licensed and open source. The testing framework, inspector, CLI, and all tooling are free to use.

Open Source & MIT Licensed

sunpeak is free to use, modify, and distribute.

Want to inspect MCP Apps interactively? See the Inspector page. Building MCP Apps? See the MCP App Framework page.