Quickstart
No sunpeak project required. Scaffold tests for any running MCP server.

Testing Levels
1. E2E Tests
Playwright specs that call your MCP tools and render them in simulated ChatGPT and Claude runtimes. The `mcp` fixture from `sunpeak/test` handles inspector navigation, iframe traversal, and host switching. Simulations (JSON fixtures) define reproducible tool states, so you can test every combination of host, theme, display mode, and device without deploying or burning API credits.
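As a rough picture of what a simulation captures, a fixture pins a tool call to a fixed result. The schema below is illustrative only; the field names (`tool`, `input`, `result`) are assumptions, not sunpeak's documented format, though `structuredContent` mirrors the MCP tool-result shape:

```json
{
  "tool": "get_forecast",
  "input": { "city": "Lisbon" },
  "result": {
    "structuredContent": { "tempC": 21, "condition": "sunny" }
  }
}
```

Because the result is pinned in JSON, the same tool state renders identically across hosts, themes, and devices on every run.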
Visual regression is built in. Pass `--visual` to compare screenshots against baselines, or `--visual --update` to regenerate them.
E2E Testing
Write Playwright tests against simulated ChatGPT and Claude runtimes.
Visual Regression
Screenshot comparison across themes, display modes, and hosts.
2. Live Tests
Playwright specs that run against real ChatGPT (and future hosts). They open a browser, send messages that trigger tool calls against your MCP server, and verify the rendered app. This catches issues that inspector tests cannot: real MCP connection behavior, actual LLM tool invocation, host-specific iframe rendering, and production resource loading.

Live Testing
Validate your MCP Apps inside real AI chat hosts.
3. Evals
Multi-model tool calling tests. Evals connect to your MCP server via the MCP protocol, discover its tools, and send prompts to multiple LLM models (GPT-4o, Claude, Gemini, etc.). Each eval case runs N times per model and reports statistical pass/fail counts, so you can measure whether your tool descriptions work reliably across models.

Evals
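The per-model report reduces to simple pass-rate arithmetic over the N runs. A minimal sketch of that aggregation (hypothetical names, not sunpeak's implementation):

```typescript
// Aggregate eval run outcomes into per-model pass/fail statistics.
type RunResult = { model: string; passed: boolean };
type ModelStats = { passed: number; total: number; rate: number };

function passRates(runs: RunResult[]): Map<string, ModelStats> {
  const stats = new Map<string, ModelStats>();
  for (const run of runs) {
    // Fetch or initialize the running tally for this model.
    const s = stats.get(run.model) ?? { passed: 0, total: 0, rate: 0 };
    s.passed += run.passed ? 1 : 0;
    s.total += 1;
    s.rate = s.passed / s.total;
    stats.set(run.model, s);
  }
  return stats;
}
```

A case that passes 4 of 5 runs on one model but 1 of 5 on another points at a tool description that only some models interpret correctly.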
Test tool calling reliability across GPT-4o, Claude, Gemini, and more.
CLI Commands
| Command | What it runs | Runtime |
|---|---|---|
| `sunpeak test` | Unit (if configured) + E2E tests | Vitest + Playwright |
| `sunpeak test --e2e` | E2E tests only | Playwright + inspector |
| `sunpeak test --visual` | E2E with visual regression | Playwright + inspector |
| `sunpeak test --visual --update` | Update visual baselines | Playwright + inspector |
| `sunpeak test --live` | Live tests against real hosts | Playwright + real host |
| `sunpeak test --eval` | Evals against multiple models | Vitest + Vercel AI SDK |
| `sunpeak test --unit` | Unit tests (app framework only) | Vitest + happy-dom |
`sunpeak test --e2e --live --eval` runs all three.

`--eval` and `--live` are not included in the default `sunpeak test` run because they require API keys and cost money. You must opt in explicitly.

Scaffolding
For existing MCP servers (not built with sunpeak), run `npx sunpeak test init` to generate all the test infrastructure:
- `tests/e2e/` with smoke and visual regression test specs
- `tests/evals/` with eval config, `.env.example`, and example eval specs
- `tests/live/` with live test config and example specs
- a `tests/sunpeak/` directory with its own `package.json`
For sunpeak framework projects, `sunpeak new` scaffolds all of this automatically.
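Since evals call hosted models through the Vercel AI SDK, the generated `.env.example` will list provider API keys. The variable names below are the AI SDK's standard defaults; whether sunpeak's scaffold uses exactly these names is an assumption:

```dotenv
# Provider keys read by the Vercel AI SDK (standard env var names)
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
GOOGLE_GENERATIVE_AI_API_KEY=
```

Copy it to `tests/evals/.env` and fill in only the providers your eval specs actually target.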
Learn More
Inspector
The multi-host inspector that powers E2E tests.
Simulations
JSON fixtures for reproducible tool states.
E2E Testing
Playwright specs against simulated hosts.
Visual Regression
Screenshot baselines and comparison.
Live Testing
Tests against real ChatGPT and Claude.
Evals
Multi-model tool calling reliability.