Standalone Usage
You do not need to build your MCP server with sunpeak to use the testing framework. Point it at any MCP server URL: `sunpeak test init` generates the test directory structure, Playwright configs, example specs, and eval boilerplate. From there, configure `defineConfig()` with your server URL and write tests against your tools and resources.
For sunpeak framework projects, the dev server starts automatically and no server URL is needed.
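As a sketch of what that configuration might look like, assuming a `defineConfig` export from `sunpeak/test` and an option name like `server.url` (the exact shape is an assumption, not confirmed API):

```typescript
// sunpeak.test.config.ts — hypothetical sketch, option names assumed
import { defineConfig } from "sunpeak/test";

export default defineConfig({
  // Point the framework at any running MCP server. sunpeak framework
  // projects can omit this: the dev server starts automatically.
  server: { url: "http://localhost:3001/mcp" },
});
```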
Testing Levels
1. Unit Tests
Standard Vitest with happy-dom for component and hook logic testing. No special framework integration required.
Unit Testing
Fast component and hook tests with Vitest.
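A minimal Vitest sketch of a unit test; `formatPrice` is a hypothetical helper standing in for your own component or hook logic, and happy-dom is only needed once a test touches the DOM:

```typescript
// tests/unit/format-price.test.ts — plain Vitest, no sunpeak APIs
import { describe, it, expect } from "vitest";

// Hypothetical helper standing in for your own logic.
function formatPrice(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

describe("formatPrice", () => {
  it("renders cents as dollars", () => {
    expect(formatPrice(1999)).toBe("$19.99");
  });
});
```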
2. E2E Tests
Playwright specs that call your MCP tools and render them in simulated ChatGPT and Claude runtimes. The `mcp` fixture from `sunpeak/test` handles inspector navigation, iframe traversal, and host switching. Simulations (JSON fixtures) define reproducible tool states so you can test every combination of host, theme, display mode, and device without deploying or burning API credits.
Visual regression is built in. Pass `--visual` to compare screenshots against baselines, or `--visual --update` to regenerate them.
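An E2E spec using the `mcp` fixture might look like the following sketch; the method names (`loadSimulation`, `selectHost`, `app`) are assumptions for illustration, not confirmed sunpeak API:

```typescript
// tests/e2e/order.spec.ts — hypothetical sketch, fixture methods assumed
import { test, expect } from "sunpeak/test";

test("renders the tool result in simulated ChatGPT", async ({ mcp }) => {
  // Load a reproducible tool state from a simulation fixture.
  await mcp.loadSimulation("order-confirmed");
  await mcp.selectHost("chatgpt");

  // The fixture traverses the inspector iframe to the rendered app.
  const app = await mcp.app();
  await expect(app.getByText("Order confirmed")).toBeVisible();
});
```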
E2E Testing
Write Playwright tests against simulated ChatGPT and Claude runtimes.
Visual Regression
Screenshot comparison across themes, display modes, and hosts.
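A simulation fixture could be shaped roughly like this; the schema below is an assumption for illustration, with the `result` following the MCP tool-result content format:

```json
{
  "name": "order-confirmed",
  "tool": "create_order",
  "input": { "pizza": "margherita", "quantity": 2 },
  "result": {
    "content": [{ "type": "text", "text": "Order confirmed" }]
  }
}
```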
3. Live Tests
Playwright specs that run against real ChatGPT (and future hosts). They open a browser, send messages that trigger tool calls against your MCP server, and verify the rendered app. This catches issues that inspector tests cannot: real MCP connection behavior, actual LLM tool invocation, host-specific iframe rendering, and production resource loading.
Live Testing
Validate your MCP Apps inside real AI chat hosts.
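The flow above (open a browser, send a message, verify the rendered app) might read as follows; the `live` fixture and its methods are assumed names, not confirmed sunpeak API:

```typescript
// tests/live/chatgpt.spec.ts — hypothetical sketch, fixture methods assumed
import { test, expect } from "sunpeak/test";

test("real ChatGPT invokes the tool and renders the app", async ({ live }) => {
  // Sends a real chat message that should trigger an MCP tool call.
  await live.sendMessage("Order two margherita pizzas");

  // Wait for the host to render the app, then assert on its content.
  const app = await live.app();
  await expect(app.getByText("Order confirmed")).toBeVisible();
});
```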
4. Evals
Multi-model tool calling tests. Evals connect to your MCP server via the MCP protocol, discover its tools, and send prompts to multiple LLM models (GPT-4o, Claude, Gemini, etc.). Each eval case runs N times per model and reports statistical pass/fail counts, so you can measure whether your tool descriptions work reliably across models.
Evals
Test tool calling reliability across GPT-4o, Claude, Gemini, and more.
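An eval case might be declared along these lines; `defineEval` and every option name below are assumptions sketched for illustration, not confirmed sunpeak API:

```typescript
// tests/evals/order.eval.ts — hypothetical sketch, API shape assumed
import { defineEval } from "sunpeak/test";

export default defineEval({
  models: ["gpt-4o", "claude-sonnet-4", "gemini-2.5-pro"],
  runs: 5, // each case runs N times per model for statistical pass/fail
  cases: [
    {
      prompt: "Order two margherita pizzas",
      // Pass when the model calls the expected tool with matching args.
      expectTool: { name: "create_order", args: { quantity: 2 } },
    },
  ],
});
```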
CLI Commands
| Command | What it runs | Runtime |
|---|---|---|
| `sunpeak test` | Unit + E2E tests | Vitest + Playwright |
| `sunpeak test --unit` | Unit tests only | Vitest + happy-dom |
| `sunpeak test --e2e` | E2E tests only | Playwright + inspector |
| `sunpeak test --visual` | E2E with visual regression | Playwright + inspector |
| `sunpeak test --visual --update` | Update visual baselines | Playwright + inspector |
| `sunpeak test --live` | Live tests against real hosts | Playwright + real host |
| `sunpeak test --eval` | Evals against multiple models | Vitest + Vercel AI SDK |
Combining `--unit --e2e --live --eval` runs all four.
`--eval` and `--live` are not included in the default `sunpeak test` run because they require API keys and cost money. You must opt in explicitly.
Scaffolding
For existing MCP servers (not built with sunpeak), run `sunpeak test init` to generate all the test infrastructure:
- `tests/e2e/` with example Playwright specs and config
- `tests/simulations/` with example simulation JSON fixtures
- `tests/evals/` with eval config, `.env.example`, and example eval specs
- `tests/live/` with live test config and example specs
`sunpeak new` scaffolds all of this automatically.
Learn More
Inspector
The multi-host inspector that powers E2E tests.
Simulations
JSON fixtures for reproducible tool states.
Unit Testing
Fast component and hook tests with Vitest.
E2E Testing
Playwright specs against simulated hosts.
Visual Regression
Screenshot baselines and comparison.
Live Testing
Tests against real ChatGPT and Claude.
Evals
Multi-model tool calling reliability.