E2E Testing

E2E tests are Playwright specs in tests/e2e/*.spec.ts. Playwright starts the dev server automatically before running tests, and each test runs against both the ChatGPT and Claude hosts via Playwright projects.
sunpeak test                               # Run unit + e2e
sunpeak test --e2e                         # E2E only
sunpeak test --e2e --ui                    # Playwright UI mode
sunpeak test --e2e tests/e2e/albums.spec.ts  # Single file

Writing E2E Tests

Import test and expect from sunpeak/test. The mcp fixture handles inspector navigation, double-iframe traversal, and host selection:
import { test, expect } from 'sunpeak/test';

test('should render album cards in light mode', async ({ mcp }) => {
  const result = await mcp.callTool('show-albums', {}, { theme: 'light' });
  const app = result.app();
  await expect(app.locator('button:has-text("Summer Slice")')).toBeVisible();
});

test('should render in fullscreen mode', async ({ mcp }) => {
  const result = await mcp.callTool('show-albums', {}, { displayMode: 'fullscreen' });
  const app = result.app();
  await expect(app.locator('button:has-text("Summer Slice")')).toBeVisible();
});

test('prod tools empty state', async ({ mcp }) => {
  await mcp.openTool('show-albums');
  await expect(mcp.page.locator('text=Press Run to call the tool')).toBeVisible();
});
The config is a one-liner:
// playwright.config.ts
import { defineConfig } from 'sunpeak/test/config';
export default defineConfig();
This auto-detects sunpeak projects and creates per-host Playwright projects (chatgpt, claude). Each test runs once per host automatically — no host loops needed.
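Conceptually, the generated configuration behaves like a plain Playwright config with one project per host. The sketch below is illustrative only (the actual per-host settings that defineConfig() generates are internal to sunpeak):

```typescript
// Illustrative sketch, not the real implementation: defineConfig() from
// 'sunpeak/test/config' produces something equivalent to per-host projects.
import { defineConfig as definePlaywrightConfig } from '@playwright/test';

export default definePlaywrightConfig({
  projects: [
    // Each spec file runs once under each project, so every test
    // executes against both hosts without explicit loops.
    { name: 'chatgpt', use: { /* host-specific inspector settings */ } },
    { name: 'claude', use: { /* host-specific inspector settings */ } },
  ],
});
```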
For non-sunpeak projects, pass a server option to defineConfig:
import { defineConfig } from 'sunpeak/test/config';
export default defineConfig({
  server: 'http://localhost:8000/mcp',
});

URL Parameters

[Screenshots: the Inspector set to fullscreen dark mode and to inline light mode via URL params.]

The mcp.callTool() method accepts options for theme, displayMode, and prodResources. For advanced URL parameters, see the Inspector API Reference.
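For example, prodResources can be combined with the other options in the third argument to callTool(). Treat this particular combination as illustrative (the boolean value for prodResources is an assumption based on the option name; the Inspector API Reference is authoritative):

```typescript
import { test, expect } from 'sunpeak/test';

test('renders against production resources in dark fullscreen', async ({ mcp }) => {
  const result = await mcp.callTool('show-albums', {}, {
    theme: 'dark',
    displayMode: 'fullscreen',
    prodResources: true, // assumption: a boolean toggle, per the option name
  });
  await expect(result.app().locator('button:has-text("Summer Slice")')).toBeVisible();
});
```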

Testing Backend-Only Tools

If your resource calls backend tools via useCallServerTool, define mock responses using the serverTools field in the simulation JSON. The inspector resolves these mocks based on the tool call arguments:
// tests/simulations/review-purchase.json
{
  "tool": "review-purchase",
  "toolResult": { "structuredContent": { /* ... */ } },
  "serverTools": {
    "review": [
      {
        "when": { "confirmed": true },
        "result": {
          "content": [{ "type": "text", "text": "Completed." }],
          "structuredContent": { "status": "success", "message": "Completed." }
        }
      },
      {
        "when": { "confirmed": false },
        "result": {
          "content": [{ "type": "text", "text": "Cancelled." }],
          "structuredContent": { "status": "cancelled", "message": "Cancelled." }
        }
      }
    ]
  }
}
import { test, expect } from 'sunpeak/test';

test('should show success when server confirms', async ({ mcp }) => {
  const result = await mcp.callTool('review-purchase');
  const app = result.app();

  await app.locator('button:has-text("Place Order")').evaluate((el) => (el as HTMLElement).click());

  // The serverTools mock matches { confirmed: true } and returns success
  await expect(app.locator('text=Completed.')).toBeVisible({ timeout: 10000 });
});

test('should show cancel when user rejects', async ({ mcp }) => {
  const result = await mcp.callTool('review-purchase');
  const app = result.app();

  await app.locator('button:has-text("Cancel")').evaluate((el) => (el as HTMLElement).click());

  // The serverTools mock matches { confirmed: false } and returns cancelled
  await expect(app.locator('text=Cancelled.')).toBeVisible({ timeout: 10000 });
});
The serverTools field supports both simple (single result) and conditional (when/result array) forms. See Simulation API Reference for details.
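As a sketch of the simple form (a hypothetical mock; the exact shape is an assumption based on "single result" and the conditional example above, so check the Simulation API Reference), a tool that always returns the same response can map the tool name directly to one result object instead of a when/result array:

```json
{
  "tool": "review-purchase",
  "toolResult": { "structuredContent": {} },
  "serverTools": {
    "review": {
      "content": [{ "type": "text", "text": "Completed." }],
      "structuredContent": { "status": "success", "message": "Completed." }
    }
  }
}
```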

Example E2E Test Structure

A typical e2e test file tests a resource across different modes. Each test runs automatically against both ChatGPT and Claude hosts:
import { test, expect } from 'sunpeak/test';

test('should render album cards with correct styles', async ({ mcp }) => {
  const result = await mcp.callTool('show-albums', {}, { theme: 'light' });
  const app = result.app();

  const albumCard = app.locator('button:has-text("Summer Slice")');
  await expect(albumCard).toBeVisible();

  const styles = await albumCard.evaluate((el) => {
    const computed = window.getComputedStyle(el);
    return { cursor: computed.cursor, borderRadius: computed.borderRadius };
  });
  expect(styles.cursor).toBe('pointer');
  expect(styles.borderRadius).toBe('12px');
});

test('should render with dark theme', async ({ mcp }) => {
  const result = await mcp.callTool('show-albums', {}, { theme: 'dark' });
  const app = result.app();
  await expect(app.locator('button:has-text("Summer Slice")')).toBeVisible();
});

test('should render in fullscreen', async ({ mcp }) => {
  const result = await mcp.callTool('show-albums', {}, { displayMode: 'fullscreen' });
  const app = result.app();
  await expect(app.locator('button:has-text("Summer Slice")')).toBeVisible();
});

test('pip mode works (ChatGPT only)', async ({ mcp }) => {
  test.skip(mcp.host === 'claude', 'Claude does not support PiP');
  const result = await mcp.callTool('show-albums');
  await mcp.setDisplayMode('pip');
  await expect(result.app().locator('button:has-text("Summer Slice")')).toBeVisible();
});

Best Practices

Test one thing per test case. Clear tests are maintainable tests.
// Good
it('displays error message when API fails', () => {})

// Bad
it('test 1', () => {})
Test what users see and interact with, not implementation details:
// Good
expect(screen.getByText('Submit')).toBeInTheDocument();

// Bad
expect(component.props.buttonLabel).toBe('Submit');
Use mcp.callTool() options to test your resources in different configurations. Tests run across ChatGPT and Claude hosts automatically via Playwright projects. Pass theme and displayMode as options:
await mcp.callTool('show-weather', {}, { theme: 'dark', displayMode: 'fullscreen' });
See MCP Apps Display Modes for how hosts handle inline, fullscreen, and PiP views.
Use mcp.screenshot() in tests that cover important visual states (light/dark theme, fullscreen, empty states). Visual tests catch CSS regressions that functional assertions miss:
await mcp.screenshot('albums-dark');
await mcp.screenshot('albums-fullscreen', { target: 'page' });
Use afterEach to reset state between tests:
afterEach(() => {
  // Clean up mocks, reset state, etc.
});

Learn More

Visual Regression Testing

Screenshot comparison and baseline management.

Inspector

The runtime that powers E2E tests.

Simulations

JSON schema, conventions, and auto-discovery.