MCP App Testing Framework
sunpeak is an open-source testing framework for MCP Apps. Unit tests, E2E tests, visual regression, and live tests against real ChatGPT and Claude. Add it to any MCP server in any language. No paid accounts, no AI credits.
pnpm add -g sunpeak && sunpeak test init
Definition
The sunpeak testing framework provides Playwright E2E tests, Vitest unit tests, visual regression testing, and live tests against real hosts for MCP Apps. Tests run against replicated ChatGPT and Claude runtimes locally and in CI/CD.
Why MCP Apps Need Their Own Testing Framework
MCP Apps run inside AI hosts like ChatGPT and Claude, not in a browser you control. You can't open DevTools. You can't write a Cypress test against chatgpt.com. Every code change means deploying, opening the host, starting a conversation, triggering the tool, and checking the result manually. Across two hosts, two themes, and three display modes, that's 12 combinations per change.
sunpeak replicates those host runtimes locally. Your tests call tools, render resources in simulated ChatGPT and Claude, and assert against the result with Playwright. Same display modes, themes, safe areas, and conversation chrome as the real hosts.
The testing framework works with any MCP server. Run sunpeak test init to scaffold tests into an existing project, or sunpeak test in a sunpeak project where everything is preconfigured.
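The manual test matrix described above is the product of its three dimensions. A minimal sketch that enumerates it; the lowercase labels are illustrative, and the display mode names are the ones listed under What You Can Test:

```typescript
// Dimensions named on this page; the string values are illustrative labels.
const hosts = ['chatgpt', 'claude'] as const;
const themes = ['light', 'dark'] as const;
const displayModes = ['inline', 'fullscreen', 'pip'] as const;

type Combination = {
  host: (typeof hosts)[number];
  theme: (typeof themes)[number];
  displayMode: (typeof displayModes)[number];
};

// Cartesian product: every host paired with every theme and display mode.
const combinations: Combination[] = [];
for (const host of hosts) {
  for (const theme of themes) {
    for (const displayMode of displayModes) {
      combinations.push({ host, theme, displayMode });
    }
  }
}

// The count is hosts.length * themes.length * displayModes.length.
console.log(combinations.length);
```

Each entry in `combinations` is one manual check that an automated run would otherwise have to replace.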
Three Levels of MCP App Testing
Unit Tests
Vitest with happy-dom for testing component logic, data transformations, and utilities without a browser.
sunpeak test --unit
- Fast, no browser required
- Vitest-compatible API
- Test logic independent of host rendering
E2E Tests
Playwright tests against the sunpeak inspector. Call tools, render resources in simulated hosts, and assert against the rendered output.
sunpeak test --e2e
- Tests against ChatGPT and Claude runtimes
- Simulation fixtures for deterministic states
- Visual regression with --visual flag
Live Tests
Playwright tests against real ChatGPT. sunpeak handles auth, message sending, and iframe access. You write assertions.
sunpeak test --live
- Real host validation
- Catches host-specific iframe behavior
- Host DOM managed by sunpeak
Test CLI
| Command | What it runs | Runtime |
|---|---|---|
| sunpeak test | Unit + E2E tests | happy-dom / Playwright + inspector |
| sunpeak test --unit | Unit tests only | Vitest + happy-dom |
| sunpeak test --e2e | E2E tests only | Playwright + inspector |
| sunpeak test --visual | E2E + visual regression | Playwright + inspector + screenshots |
| sunpeak test --live | Live tests against real ChatGPT | Playwright + real host |
| sunpeak test init | Scaffold test infrastructure | Adds Playwright config, tests, simulations |
How It Works
Scaffold Tests
Run sunpeak test init in your project. It detects your project type (JS/TS, Python, Go) and creates Playwright
config, test files, and simulation fixtures. For non-JS projects, it creates a self-contained
tests/sunpeak/ directory.
Define Simulations
Create JSON fixtures in tests/simulations/ that define tool input, tool result, and server tool mocks. Each simulation is a reproducible
state your resource can render. The inspector loads them automatically.
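This page does not show a fixture, so the following is only a sketch of what a simulation might contain, written as a typed TypeScript object. Every field name here (`tool`, `toolInput`, `toolResult`, `serverTools`) and the `get-album-photos` tool are assumptions, not sunpeak's actual schema; the testing documentation is the authority on the real format. The `toolResult` part follows the general shape of an MCP tool result (`content`, `structuredContent`, `isError`).

```typescript
// Hypothetical fixture shape: field names are assumptions, not sunpeak's schema.
interface TextBlock {
  type: 'text';
  text: string;
}

interface ToolResult {
  content: TextBlock[];
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

interface SimulationSketch {
  tool: string;                             // tool the scenario calls
  toolInput: Record<string, unknown>;       // arguments passed to the tool
  toolResult: ToolResult;                   // data the resource renders
  serverTools?: Record<string, ToolResult>; // canned callServerTool replies
}

const showAlbums: SimulationSketch = {
  tool: 'show-albums',
  toolInput: {},
  toolResult: {
    content: [{ type: 'text', text: 'Found 1 album' }],
    structuredContent: { albums: [{ title: 'Summer Slice' }] },
  },
  serverTools: {
    // 'get-album-photos' is a made-up backend tool name.
    'get-album-photos': { content: [{ type: 'text', text: '3 photos' }] },
  },
};

// Serialized, this is the kind of JSON that could live in tests/simulations/.
const fixtureJson = JSON.stringify(showAlbums, null, 2);
console.log(fixtureJson);
```

Keeping input, result, and mocks in one file is what makes each simulation a self-contained, reproducible state.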
Write Tests
Import { test, expect } from sunpeak/test. Use the mcp fixture to call tools, set themes and display modes, and assert against the rendered
resource with Playwright locators and MCP-specific matchers.
Run in CI/CD
Add sunpeak test to your pipeline. It starts the dev server, runs unit and E2E tests against both
ChatGPT and Claude runtimes, and shuts down when complete. No accounts, keys, or credits
on your CI runners.

    import { test, expect } from 'sunpeak/test';

    test('albums render in light mode', async ({ mcp }) => {
      // Call the tool and render its resource with the light theme.
      const result = await mcp.callTool('show-albums', {}, { theme: 'light' });
      const app = result.app();
      await expect(app.locator('button:has-text("Summer Slice")')).toBeVisible();
    });

    test('albums render in fullscreen', async ({ mcp }) => {
      // Same tool call, rendered in the fullscreen display mode.
      const result = await mcp.callTool('show-albums', {}, { displayMode: 'fullscreen' });
      const app = result.app();
      await expect(app.locator('button:has-text("Summer Slice")')).toBeVisible();
      // Compare against saved baseline (only runs with --visual flag)
      await mcp.screenshot('albums-fullscreen');
    });

What You Can Test
- Multi-Host Rendering
  Tests run against both ChatGPT and Claude runtimes automatically via Playwright projects. One test file covers both hosts.
- Themes & Display Modes
  Test light/dark themes and inline/fullscreen/pip display modes. Use setTheme() and setDisplayMode() or pass options to callTool().
- Visual Regression
  Capture screenshots with mcp.screenshot() and compare against baselines. Configure thresholds and max diff pixel ratios in defineConfig().
- Backend Tool Mocking
  Simulation files can mock callServerTool responses with simple or conditional matching. Test interactive flows without a real backend.
- MCP-Specific Assertions
  Custom matchers: toHaveTextContent(), toHaveStructuredContent(), toBeError() alongside standard Playwright locators.
- Any MCP Server, Any Language
  Use sunpeak test init with any MCP server. Configure the server via HTTP URL or startup command. Python, Go, TypeScript, anything.
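To make "simple or conditional matching" for mocked callServerTool responses more concrete, here is one way such matching could work: a rule without a condition always matches, and a conditional rule matches only when its `when` arguments all appear in the call. The `MockRule` type, the `when` field, and the `resolveMock` helper are hypothetical and are not sunpeak's API.

```typescript
type Args = Record<string, unknown>;

// Hypothetical rule shape; `when` is omitted for an unconditional ("simple") mock.
interface MockRule {
  when?: Args;
  result: { text: string; isError?: boolean };
}

// True when every key in `expected` has a strictly equal value in `actual`.
function argsMatch(expected: Args, actual: Args): boolean {
  return Object.keys(expected).every((key) => actual[key] === expected[key]);
}

// Returns the result of the first matching rule, so specific rules go first.
function resolveMock(rules: MockRule[], callArgs: Args): MockRule['result'] | undefined {
  for (const rule of rules) {
    if (rule.when === undefined || argsMatch(rule.when, callArgs)) {
      return rule.result;
    }
  }
  return undefined;
}

const likeAlbumRules: MockRule[] = [
  { when: { albumId: 'missing' }, result: { text: 'Album not found', isError: true } },
  { result: { text: 'Liked' } }, // fallback for any other arguments
];

console.log(resolveMock(likeAlbumRules, { albumId: 'missing' })); // the error result
console.log(resolveMock(likeAlbumRules, { albumId: 'a1' }));      // the fallback result
```

First-match-wins ordering keeps the rules easy to read: error cases at the top, a catch-all at the bottom.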
Who It's For
MCP App Developers
Stop manually refreshing ChatGPT and Claude after every code change. Write tests once, run them against both hosts automatically. Catch regressions before they ship.
MCP Server Authors
Test MCP servers written in any language. Run sunpeak test init --server URL to add test infrastructure to Python, Go, or TypeScript servers.
Coding Agents
Agents like Claude Code, Codex, and Cursor can run sunpeak test to validate MCP Apps without manual testing in a real host. Automated testing in the agent
loop.
Getting Started
Add sunpeak testing to any MCP project:
pnpm add -g sunpeak && sunpeak test init
Or for an external MCP server:
sunpeak test init --server URL
Then run sunpeak test to execute unit and E2E tests. See the testing documentation for the full guide.
Frequently Asked Questions
Do I need a sunpeak project to use the testing framework?
No. Run "sunpeak test init" in any JavaScript, TypeScript, Python, or Go project. It scaffolds Playwright config and a starter test file. For non-JS projects, it creates a self-contained tests/sunpeak/ directory with everything included.
What test runners does sunpeak use?
Unit tests use Vitest with happy-dom. E2E tests use Playwright against the sunpeak inspector (replicated ChatGPT and Claude runtimes). Live tests use Playwright against real ChatGPT. You write standard Playwright assertions plus MCP-specific matchers like toHaveTextContent and toHaveStructuredContent.
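As a rough illustration of what MCP-specific matchers like these inspect, the sketch below checks the fields an MCP tool result carries (`content`, `structuredContent`, `isError`) using plain functions. The helper names and the top-level-key comparison rule are assumptions; sunpeak's matchers may behave differently.

```typescript
// Shape of an MCP tool result, reduced to the fields used here.
interface McpToolResult {
  content: Array<{ type: string; text?: string }>;
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// Roughly what a toHaveTextContent-style check might look for.
function hasTextContent(result: McpToolResult, expected: string): boolean {
  return result.content.some(
    (block) => block.type === 'text' && (block.text ?? '').includes(expected),
  );
}

// Roughly what a toHaveStructuredContent-style check might look for:
// every expected top-level key is present with an equal JSON value.
function hasStructuredContent(
  result: McpToolResult,
  expected: Record<string, unknown>,
): boolean {
  const actual = result.structuredContent ?? {};
  return Object.keys(expected).every(
    (key) => JSON.stringify(actual[key]) === JSON.stringify(expected[key]),
  );
}

// Roughly what a toBeError-style check might look for.
function isErrorResult(result: McpToolResult): boolean {
  return result.isError === true;
}

const sampleResult: McpToolResult = {
  content: [{ type: 'text', text: 'Found 1 album' }],
  structuredContent: { albums: [{ title: 'Summer Slice' }] },
};

console.log(hasTextContent(sampleResult, 'album'));
console.log(hasStructuredContent(sampleResult, { albums: [{ title: 'Summer Slice' }] }));
console.log(isErrorResult(sampleResult));
```

Checks like these assert on the data the tool returned, while Playwright locators assert on what the resource rendered from it.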
How do simulation files work?
Simulation files are JSON fixtures in tests/simulations/ that define a tool call scenario: tool input, tool result, and optional server tool mocks. The inspector loads them to render your MCP App in a specific state. Each simulation is a reproducible test scenario you can assert against.
Can I test across ChatGPT and Claude automatically?
Yes. The sunpeak test runner uses Playwright projects to run each test against both ChatGPT and Claude host runtimes automatically. One test file, both hosts. Configure which hosts to test in defineConfig().
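Playwright's `projects` option is what lets one test file run under several named configurations. The sketch below models that idea with a plain object instead of a real config file; the `host` option, the project names, and the `planRuns` helper are assumptions for illustration, not what sunpeak's defineConfig() produces.

```typescript
// Mirrors the shape of Playwright's `projects` array: a name plus per-project options.
interface HostProject {
  name: string;
  use: { host: 'chatgpt' | 'claude' }; // `host` is a hypothetical option
}

const hostProjects: HostProject[] = [
  { name: 'chatgpt', use: { host: 'chatgpt' } },
  { name: 'claude', use: { host: 'claude' } },
];

// One test definition is scheduled once per project.
function planRuns(testTitles: string[], projects: HostProject[]): string[] {
  const runs: string[] = [];
  for (const project of projects) {
    for (const title of testTitles) {
      runs.push(`[${project.name}] ${title}`);
    }
  }
  return runs;
}

console.log(planRuns(['albums render in light mode'], hostProjects));
```

Restricting which hosts are tested then amounts to changing the list of projects rather than touching any test file.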
What is visual regression testing?
Run "sunpeak test --visual" to capture screenshots of your MCP App and compare them against saved baselines. If the UI changes unexpectedly, the test fails with a diff image. Run "sunpeak test --visual --update" to update baselines after intentional changes.
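The pass/fail decision behind a "max diff pixel ratio" threshold can be sketched as follows. This is a simplified illustration over raw RGBA buffers, not the comparison sunpeak or Playwright actually performs (real comparers typically also apply a per-pixel color tolerance); `diffPixelRatio` and `withinThreshold` are hypothetical names.

```typescript
// Fraction of pixels that differ between two same-sized RGBA buffers.
function diffPixelRatio(baseline: Uint8Array, current: Uint8Array): number {
  if (baseline.length !== current.length || baseline.length % 4 !== 0) {
    throw new Error('Buffers must be RGBA data of identical dimensions');
  }
  const totalPixels = baseline.length / 4;
  if (totalPixels === 0) return 0;

  let differing = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    // A pixel differs if any of its four channels differs.
    if (
      baseline[i] !== current[i] ||
      baseline[i + 1] !== current[i + 1] ||
      baseline[i + 2] !== current[i + 2] ||
      baseline[i + 3] !== current[i + 3]
    ) {
      differing++;
    }
  }
  return differing / totalPixels;
}

// The screenshot passes when the differing fraction stays within the limit.
function withinThreshold(
  baseline: Uint8Array,
  current: Uint8Array,
  maxDiffPixelRatio: number,
): boolean {
  return diffPixelRatio(baseline, current) <= maxDiffPixelRatio;
}

// Four white pixels; the "current" image has its second pixel turned black.
const baselinePixels = new Uint8Array(16).fill(255);
const currentPixels = new Uint8Array(baselinePixels);
currentPixels.set([0, 0, 0, 255], 4);

console.log(diffPixelRatio(baselinePixels, currentPixels));        // 0.25
console.log(withinThreshold(baselinePixels, currentPixels, 0.01)); // false
```

A small nonzero ratio tolerates rendering noise such as font anti-aliasing while still failing on real layout changes.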
How do live tests differ from E2E tests?
E2E tests run against the local inspector with simulation fixtures. They are fast, deterministic, and free. Live tests run against real ChatGPT using Playwright. sunpeak handles auth, message sending, and iframe access. You only write assertions against the rendered app.
Does sunpeak testing work in CI/CD?
Yes. Add "sunpeak test" to your CI pipeline. It starts the dev server automatically, runs unit and E2E tests, and shuts down when complete. No paid host accounts, API keys, or AI credits needed on CI runners.
Is sunpeak testing free?
Yes. sunpeak is MIT licensed and open source. The testing framework, inspector, CLI, and all tooling are free to use.
Open Source & MIT Licensed
sunpeak is free to use, modify, and distribute.
Want to inspect MCP Apps interactively? See the Inspector page. Building MCP Apps? See the MCP App Framework page.