
Prerequisites

  • Node.js 20+ is required, even if your MCP server is written in Python, Go, or another language. The testing framework runs on Node.js and Playwright.
  • Your MCP server running locally (HTTP or stdio)
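The Node.js requirement can be checked before installing anything. The following is a minimal sketch of such a preflight check; the helper name meetsNodeRequirement is hypothetical and not part of sunpeak, and it only compares the major version against the 20+ requirement stated above.

```typescript
// Hypothetical preflight helper (not part of sunpeak).
// Returns true when a Node.js version string has a major version >= minMajor.
function meetsNodeRequirement(version: string, minMajor = 20): boolean {
  // Accept both "20.11.1" and "v20.11.1".
  const match = /^v?(\d+)\./.exec(version);
  if (!match) return false; // not a recognizable version string
  return Number(match[1]) >= minMajor;
}

// Typical use inside a Node.js script:
//   meetsNodeRequirement(process.versions.node)
```

From a shell, node --version prints the same version string if you prefer to check by eye.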

1. Install sunpeak

pnpm add -g sunpeak

2. Try the inspector

Before writing tests, try the inspector to verify sunpeak can connect to your server:
sunpeak inspect --server http://localhost:8000/mcp
This opens the inspector at http://localhost:3000, where you can call your tools and see them rendered in simulated ChatGPT and Claude runtimes. Browse your tools, switch hosts and themes, and verify everything connects.

3. Scaffold test infrastructure

Once the inspector works, scaffold automated tests:
sunpeak test init --server http://localhost:8000/mcp
Or with a stdio command:
sunpeak test init --server "python server.py"
This creates test files for all four testing levels. For non-JS projects, everything goes into a self-contained tests/sunpeak/ directory with its own package.json. Install dependencies:
cd tests/sunpeak
npm install
npx playwright install chromium

4. Run the smoke test

sunpeak test
The scaffolded smoke test verifies that the inspector can connect to your server and load. You should see one passing test.

5. Write your first real test

Open the scaffolded smoke test (smoke.test.ts) and add a test for one of your tools. Replace your-tool and the { key: 'value' } arguments with an actual tool name and valid input from your server:
import { test, expect } from 'sunpeak/test';

test('server is reachable and inspector loads', async ({ mcp }) => {
  await expect(mcp.page.locator('#root')).not.toBeEmpty();
});

test('my tool returns a result', async ({ mcp }) => {
  const result = await mcp.callTool('your-tool', { key: 'value' });
  expect(result).not.toBeError();
});

// If your tool renders a UI, you can interact with it:
test('my tool renders a UI', async ({ mcp }) => {
  const result = await mcp.callTool('your-tool', { key: 'value' });
  const app = result.app();
  await expect(app.getByText('Expected text')).toBeVisible();
});
The mcp fixture handles all the plumbing: starting the inspector, connecting to your server, navigating to the tool, and traversing the double-iframe sandbox. Each test runs automatically against both ChatGPT and Claude hosts.
Run sunpeak inspect --server <url> to browse your tools interactively and find the right tool names and arguments to use in tests.
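Once you have more than one set of arguments to check, a small case table keeps the tests readable. This is a sketch only: the ToolCase type, the toolCases array, and testTitle are hypothetical names, the tool name and arguments are placeholders, and the commented loop assumes nothing beyond the mcp.callTool and toBeError calls shown in the example above.

```typescript
// Hypothetical case table; tool names and arguments are placeholders.
interface ToolCase {
  title: string;
  tool: string;
  args: Record<string, unknown>;
}

const toolCases: ToolCase[] = [
  { title: 'default arguments', tool: 'your-tool', args: { key: 'value' } },
  { title: 'alternate arguments', tool: 'your-tool', args: { key: 'other-value' } },
];

// Builds a readable test title for each case.
function testTitle(c: ToolCase): string {
  return `${c.tool}: ${c.title}`;
}

// In smoke.test.ts, using only the fixture calls shown earlier:
//
// for (const c of toolCases) {
//   test(testTitle(c), async ({ mcp }) => {
//     const result = await mcp.callTool(c.tool, c.args);
//     expect(result).not.toBeError();
//   });
// }
```

Keeping the titles distinct matters because test reports identify each case by its title.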

6. Add more test levels

The scaffolded files include templates for all four testing levels:
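| Level  | File                  | Command               | Cost         |
|--------|-----------------------|-----------------------|--------------|
| E2E    | smoke.test.ts         | sunpeak test          | Free         |
| Visual | visual.test.ts        | sunpeak test --visual | Free         |
| Live   | live/example.test.ts  | sunpeak test --live   | Host credits |
| Evals  | evals/example.eval.ts | sunpeak test --eval   | API keys     |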
Start with E2E tests (free, fast, local). Add visual regression when you want to catch CSS regressions. Add live tests and evals when you need production host validation and multi-model reliability testing.
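If you script your test runs (for example in CI), the same levels can be written down as data so that only the free ones run by default. This is an illustrative sketch: the TestLevel type and freeCommands helper are hypothetical, and the entries simply restate the levels, files, commands, and costs listed above.

```typescript
// Hypothetical representation of the four testing levels listed above.
interface TestLevel {
  name: string;
  file: string;
  command: string;
  cost: 'Free' | 'Host credits' | 'API keys';
}

const levels: TestLevel[] = [
  { name: 'E2E', file: 'smoke.test.ts', command: 'sunpeak test', cost: 'Free' },
  { name: 'Visual', file: 'visual.test.ts', command: 'sunpeak test --visual', cost: 'Free' },
  { name: 'Live', file: 'live/example.test.ts', command: 'sunpeak test --live', cost: 'Host credits' },
  { name: 'Evals', file: 'evals/example.eval.ts', command: 'sunpeak test --eval', cost: 'API keys' },
];

// Returns the commands that cost nothing to run, in the order listed.
function freeCommands(all: TestLevel[]): string[] {
  return all.filter((level) => level.cost === 'Free').map((level) => level.command);
}
```

A script could run the free commands on every change and reserve the paid levels for scheduled or manual runs.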

Language-specific tips

Python

For stdio servers, pass the full command, including whatever selects your virtual environment (for example uv run, or the venv's own Python interpreter):
// playwright.config.ts
import { defineConfig } from 'sunpeak/test/config';
export default defineConfig({
  server: {
    // Option 1: uv (recommended)
    command: 'uv', args: ['run', 'python', 'server.py'],

    // Option 2: path to the venv's Python interpreter
    // command: '.venv/bin/python', args: ['server.py'],

    // Option 3: HTTP server (no shell needed)
    // url: 'http://localhost:8000/mcp',
  },
});
HTTP servers (FastAPI, Flask) are the simplest option because you start them separately and sunpeak just connects to the URL.

Go

// playwright.config.ts
import { defineConfig } from 'sunpeak/test/config';
export default defineConfig({
  server: {
    command: 'go', args: ['run', './cmd/server'],

    // Or connect to a running HTTP server:
    // url: 'http://localhost:8000/mcp',
  },
});

Rust

// playwright.config.ts
import { defineConfig } from 'sunpeak/test/config';
export default defineConfig({
  server: {
    command: 'cargo', args: ['run', '--release'],
    // url: 'http://localhost:8000/mcp',
  },
});
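The configs above express a stdio server as a command plus an args array, and an HTTP server as a url. As a rough illustration of how a single string such as the one passed to --server maps onto those two shapes, here is a minimal sketch. The ServerConfig type and toServerConfig helper are hypothetical and not part of sunpeak, and the splitting is deliberately naive: it breaks on whitespace and does not handle quoted arguments.

```typescript
// Hypothetical type mirroring the two `server` shapes used in the configs above.
type ServerConfig =
  | { url: string }
  | { command: string; args: string[] };

// Converts "http://..." into { url } and "cmd arg1 arg2" into { command, args }.
// Assumption: arguments contain no quoted whitespace.
function toServerConfig(target: string): ServerConfig {
  const trimmed = target.trim();
  if (/^https?:\/\//i.test(trimmed)) {
    return { url: trimmed };
  }
  const [command, ...args] = trimmed.split(/\s+/);
  if (!command) {
    throw new Error('Empty server command');
  }
  return { command, args };
}
```

For example, 'uv run python server.py' becomes command 'uv' with args 'run', 'python', 'server.py', matching Option 1 in the Python config.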

Next steps

  • E2E Testing: Write Playwright tests against simulated hosts.
  • Visual Regression: Screenshot comparison across themes and hosts.
  • Live Testing: Test against real ChatGPT and Claude.
  • Evals: Multi-model tool calling reliability.