Getting Started

Prerequisites

Node.js 20+ (required even if your MCP server is written in Python, Go, or another language, because the testing framework runs on Node.js and Playwright)
Your MCP server, either running locally over HTTP or startable locally as a stdio command
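If you want a setup script or CI job to fail early on an older runtime, the Node.js requirement can be checked in a few lines of TypeScript. This is a minimal sketch, not something sunpeak ships: the file name check-node.ts and the helper meetsNodeRequirement are hypothetical, and it assumes you run it with a TypeScript runner such as tsx.

```typescript
// check-node.ts (hypothetical helper, not part of sunpeak)
// Returns true when a Node.js version string such as "20.11.1" or "v22.3.0"
// has a major version of at least `minMajor`.
function meetsNodeRequirement(version: string, minMajor = 20): boolean {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isFinite(major) && major >= minMajor;
}

// process.versions.node holds the running version without the leading "v".
const current = process.versions.node;
console.log(
  meetsNodeRequirement(current)
    ? `Node.js ${current} meets the 20+ requirement`
    : `Node.js ${current} is too old; install Node.js 20 or later`,
);
```

For example, run it with npx tsx check-node.ts before installing the test dependencies.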

1. Try the inspector (optional)

Before writing tests, try the inspector to verify sunpeak can connect to your server:

HTTP server:

npx sunpeak inspect --server http://localhost:8000/mcp

Python (stdio):

npx sunpeak inspect --server "python server.py"

# With uv:
npx sunpeak inspect --server "uv run python server.py"

Go (stdio):

npx sunpeak inspect --server "go run ./cmd/server"

Node.js (stdio):

npx sunpeak inspect --server "node server.js"

This opens the inspector at http://localhost:3000, where you can call your tools and see them rendered in simulated ChatGPT and Claude runtimes. Browse your tools, switch hosts and themes, and verify everything connects.
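If the inspector cannot reach an HTTP server, it can help to confirm that the server answers MCP requests at all, independently of sunpeak. The sketch below is a hypothetical helper (probeMcpHttp is not a sunpeak API) that POSTs an MCP initialize request with the headers the MCP Streamable HTTP transport expects and reports whether it got a 2xx response. It assumes Node.js 18 or later for the global fetch, and the protocolVersion value is an assumption you may need to change to a revision your server supports. It only checks reachability and does not complete the full MCP handshake.

```typescript
// probe-mcp.ts (hypothetical helper, not part of sunpeak)
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Builds the JSON-RPC "initialize" request a client sends first.
function buildInitializeRequest(id = 1): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26", // assumption: use a revision your server supports
      capabilities: {},
      clientInfo: { name: "connectivity-probe", version: "0.0.1" },
    },
  };
}

// Resolves to true if the server answered with a 2xx status, false otherwise.
async function probeMcpHttp(url: string, timeoutMs = 5000): Promise<boolean> {
  try {
    const res = await fetch(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Streamable HTTP clients list both JSON and SSE as acceptable responses.
        Accept: "application/json, text/event-stream",
      },
      body: JSON.stringify(buildInitializeRequest()),
      signal: AbortSignal.timeout(timeoutMs),
    });
    // Release the body, which may be JSON or an open SSE stream.
    await res.body?.cancel();
    return res.ok;
  } catch {
    // Connection refused, DNS failure, or timeout.
    return false;
  }
}

probeMcpHttp("http://localhost:8000/mcp").then((ok) =>
  console.log(ok ? "MCP server responded" : "No MCP response; is the server running?"),
);
```

A false result means the request failed or returned a non-2xx status, for example because nothing is listening at that URL or the path is wrong.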

2. Scaffold test infrastructure

Once the inspector works, scaffold automated tests:

npx sunpeak test init --server http://localhost:8000/mcp

Or with a stdio command:

npx sunpeak test init --server "python server.py"

This creates test files for all four testing levels. For non-JS projects, everything goes into a self-contained tests/sunpeak/ directory with its own package.json. Install dependencies:

Non-JS project:

cd tests/sunpeak
npm install
npx playwright install chromium

JS/TS project:

npm add -D sunpeak @playwright/test
npx playwright install chromium

3. Run the smoke test

npx sunpeak test

The scaffolded smoke test verifies that the inspector loads and can connect to your server. You should see one passing test. For non-JS projects, sunpeak test auto-discovers tests/sunpeak/playwright.config.ts when no root-level config exists, so you can run it from your project root without cd-ing into the test directory.
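The lookup order described above can be pictured as a small function. This is a hypothetical illustration, not sunpeak's implementation, and it assumes the root-level config file is named playwright.config.ts; the real tool may recognize other file names.

```typescript
// Hypothetical sketch of the config lookup order (not sunpeak's source).
import { existsSync } from "node:fs";
import { join } from "node:path";

function resolveTestConfig(
  projectRoot: string,
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  const candidates = [
    join(projectRoot, "playwright.config.ts"), // a root-level config wins
    join(projectRoot, "tests", "sunpeak", "playwright.config.ts"), // non-JS scaffold
  ];
  // First existing candidate, or undefined when neither file exists.
  return candidates.find((candidate) => exists(candidate));
}

console.log(resolveTestConfig(process.cwd()) ?? "no Playwright config found");
```

Injecting the exists check keeps the function testable without touching the file system.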

4. Write your first real test

Open the scaffolded smoke test (smoke.test.ts) and add a test for one of your tools. Replace your-tool with an actual tool name from your server, and { key: 'value' } with arguments that tool accepts:

import { test, expect } from 'sunpeak/test';

// Smoke test: the inspector starts, connects to your server, and renders.
test('server is reachable and inspector loads', async ({ inspector }) => {
  await expect(inspector.page.locator('#root')).not.toBeEmpty();
});

// Protocol-level test: call the tool directly and check it did not report an error.
test('my tool returns a result', async ({ mcp }) => {
  const result = await mcp.callTool('your-tool', { key: 'value' });
  expect(result.isError).toBeFalsy();
});

// If your tool renders a UI, you can interact with it:
test('my tool renders a UI', async ({ inspector }) => {
  const result = await inspector.renderTool('your-tool', { key: 'value' });
  // Locators from app() target the rendered app rather than the inspector page.
  const app = result.app();
  await expect(app.getByText('Expected text')).toBeVisible();
});

The two fixtures, mcp and inspector, handle all the plumbing: starting the inspector, connecting to your server, navigating to the tool, and traversing the double-iframe sandbox. Use mcp for protocol-level testing (callTool, listTools, and so on, which return raw MCP data) and inspector for UI testing (renderTool, which renders the result in the inspector). Each test runs automatically against both ChatGPT and Claude hosts.

When you pass input to renderTool, the tool is called on your real server and the result is rendered. Without input, the tool uses pre-baked simulation fixture data (if available) for fast, deterministic tests. See Simulations for more on when to use each approach.
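Because the mcp fixture returns raw MCP data, assertions on a tool's text output need to read the result's content array. The helper below is a minimal sketch: the types are a hand-written subset of the MCP specification's CallToolResult (each content item has a type such as "text" or "image", and text items carry a text field), and textOf is a hypothetical name, not part of sunpeak. If the fixture's return type differs from this shape, adjust the types accordingly.

```typescript
// Hand-written subset of the MCP CallToolResult shape (only the fields used here).
interface TextContent {
  type: "text";
  text: string;
}

interface OtherContent {
  type: "image" | "audio" | "resource" | "resource_link";
  [key: string]: unknown;
}

interface CallToolResult {
  content: Array<TextContent | OtherContent>;
  isError?: boolean;
}

// Joins the text of every text item, skipping images, audio, and resources.
function textOf(result: CallToolResult): string {
  return result.content
    .filter((item): item is TextContent => item.type === "text")
    .map((item) => item.text)
    .join("\n");
}
```

In a test, you might call mcp.callTool(...) and then assert with expect(textOf(result)).toContain('...'), casting the result if TypeScript needs it.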

Run npx sunpeak inspect --server <url or command> to browse your tools interactively and find the right tool names and arguments to use in tests.

5. Add more test levels

The scaffolded files include templates for all four testing levels:

Level	File	Command	Cost
E2E	`smoke.test.ts`	`sunpeak test`	Free
Visual	`visual.test.ts`	`sunpeak test --visual`	Free
Live	`live/example.test.ts`	`sunpeak test --live`	Consumes host credits
Evals	`evals/example.eval.ts`	`sunpeak test --eval`	Model API usage (requires API keys)

Start with E2E tests (free, fast, local). Add visual regression when you want to catch CSS regressions. Add live tests and evals when you need production host validation and multi-model reliability testing.
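If CI should run only the levels a given job can afford, the Cost column translates directly into a selection rule. The sketch below is hypothetical (commandsToRun and the Plan shape are illustrative names, not sunpeak options); only the commands themselves come from the table.

```typescript
// Hypothetical CI helper: picks sunpeak test commands based on what a job can afford.
type Level = "e2e" | "visual" | "live" | "evals";

// Commands taken from the testing-levels table.
const COMMANDS: Record<Level, string> = {
  e2e: "npx sunpeak test",
  visual: "npx sunpeak test --visual",
  live: "npx sunpeak test --live",
  evals: "npx sunpeak test --eval",
};

interface Plan {
  visual: boolean; // catch CSS regressions (free)
  hostCredits: boolean; // live tests consume host credits
  apiKeys: boolean; // evals call models through your API keys
}

function commandsToRun(plan: Plan): string[] {
  const levels: Level[] = ["e2e"]; // free and fast, so always included
  if (plan.visual) levels.push("visual");
  if (plan.hostCredits) levels.push("live");
  if (plan.apiKeys) levels.push("evals");
  return levels.map((level) => COMMANDS[level]);
}

console.log(commandsToRun({ visual: true, hostCredits: false, apiKeys: false }).join("\n"));
```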

Language-specific tips

Python

For stdio servers, pass the full command, including how to reach your virtual environment, either through uv or the venv's Python interpreter:

// playwright.config.ts
import { defineConfig } from 'sunpeak/test/config';
export default defineConfig({
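  server: {
    // Use one approach: a stdio command (Option 1 or 2) or an HTTP url (Option 3).

    // Option 1: uv (recommended)
    command: 'uv', args: ['run', 'python', 'server.py'],

    // Option 2: the venv's Python interpreter
    // command: '.venv/bin/python', args: ['server.py'],

    // Option 3: HTTP server you start separately (sunpeak only connects to the URL)
    // url: 'http://localhost:8000/mcp',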

    // Pass environment variables to the server process
    env: { PYTHONPATH: './src', DATABASE_URL: 'sqlite:///test.db' },

    // Set the working directory
    cwd: './my-python-server',
  },
});

HTTP servers (FastAPI, Flask) are the simplest option because you start them separately and sunpeak just connects to the URL.
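The CLI takes a server command as one string (for example "uv run python server.py"), while the config expects it split into command and args. If you generate configs from such a string, a small helper can do the split. This is a hypothetical sketch, not a sunpeak API: it understands single and double quotes but not escapes, environment variables, or other shell syntax, so write the array by hand for anything more complex.

```typescript
// Hypothetical helper (not part of sunpeak): "uv run python server.py"
// becomes { command: "uv", args: ["run", "python", "server.py"] }.
interface ServerCommand {
  command: string;
  args: string[];
}

function splitCommand(input: string): ServerCommand {
  const tokens: string[] = [];
  let current = "";
  let quote: '"' | "'" | null = null;
  let inToken = false;

  for (const ch of input) {
    if (quote) {
      if (ch === quote) quote = null; // closing quote ends the quoted run
      else current += ch; // whitespace inside quotes is kept
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      inToken = true; // "" still yields an (empty) argument
    } else if (/\s/.test(ch)) {
      if (inToken) {
        tokens.push(current);
        current = "";
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (quote) throw new Error(`Unterminated ${quote} quote in: ${input}`);
  if (inToken) tokens.push(current);
  if (tokens.length === 0) throw new Error("Empty command");

  const [command, ...args] = tokens;
  return { command, args };
}

console.log(splitCommand("uv run python server.py"));
```

For example, server: { ...splitCommand('uv run python server.py'), cwd: './my-python-server' } yields the same command and args as Option 1 above.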

Go

import { defineConfig } from 'sunpeak/test/config';
export default defineConfig({
  server: {
    command: 'go', args: ['run', './cmd/server'],
    env: { GO_ENV: 'test' },

    // Or connect to a running HTTP server:
    // url: 'http://localhost:8000/mcp',
  },
});

Rust

import { defineConfig } from 'sunpeak/test/config';
export default defineConfig({
  server: {
    command: 'cargo', args: ['run', '--release'],
    // Or connect to a running HTTP server:
    // url: 'http://localhost:8000/mcp',
  },
});

Next steps

E2E Testing

Write Playwright tests against simulated hosts.

Visual Regression

Screenshot comparison across themes and hosts.

Live Testing

Test against real ChatGPT and Claude.

Evals

Multi-model tool calling reliability.
