How to Test Claude Connectors: Unit Tests, Local Inspector, and CI/CD

Abe Wheeler
Testing Claude Connectors locally with the sunpeak inspector.

TL;DR: Test Claude Connectors locally without a Claude account using sunpeak’s inspector (sunpeak dev). Unit test tool handlers with Vitest, test full UI rendering with Playwright, and use simulation files for deterministic edge-case coverage. Run the same tests in GitHub Actions CI/CD. Save live Claude testing for pre-release validation only.

Testing Claude Connectors by hand is slow and expensive. Every test cycle means opening Claude, typing a prompt, waiting for the model to respond, checking the result, and doing it again when something breaks. If your connector renders UI, you also need to verify the iframe loads, the data displays correctly, and the component handles edge cases. A Claude Pro subscription costs $20/month per team member, and every test burns AI credits.

There is a better way. This post covers how to test Claude Connectors at every stage of development: unit tests for tool handlers, local inspector tests for UI rendering, simulation files for edge cases, Playwright e2e tests for full integration, and CI/CD for automated regression testing. All of it runs locally and in your pipeline without a Claude account.

The Claude Connector Testing Pyramid

Think of Claude Connector testing in three layers:

  1. Unit tests (fast, cheap, run in milliseconds). Test tool handlers, schemas, annotations, and utility functions in isolation with Vitest.
  2. Inspector tests (medium speed, full UI). Test your complete connector in sunpeak’s local inspector, which replicates the Claude runtime. Simulation files give you deterministic data. Playwright automates these tests.
  3. Live tests (slow, requires accounts). Test against the real Claude for final validation before shipping. Reserve this for pre-release checks.

Most of your testing should happen in layers 1 and 2. Layer 3 is a safety net, not a daily workflow.

Unit Testing Tool Handlers

Your tool handler is a function that takes arguments and returns content. Test it like any other function.

// tests/tools/search-tickets.test.ts

import { describe, it, expect, vi } from 'vitest';
import handler from '../../src/tools/search-tickets';

// Mock your data layer
vi.mock('../../src/lib/api', () => ({
  searchTickets: vi.fn().mockResolvedValue([
    { id: 'TICK-1', title: 'Login broken', status: 'open', priority: 'high' },
    { id: 'TICK-2', title: 'Slow dashboard', status: 'in_progress', priority: 'medium' },
  ]),
}));

describe('search-tickets handler', () => {
  it('returns structuredContent with matching tickets', async () => {
    const result = await handler(
      { query: 'login', status: 'open' },
      {} as any
    );

    expect(result.structuredContent).toBeDefined();
    expect(result.structuredContent.tickets).toHaveLength(2);
    expect(result.structuredContent.tickets[0].id).toBe('TICK-1');
  });

  it('handles empty results', async () => {
    const { searchTickets } = await import('../../src/lib/api');
    (searchTickets as any).mockResolvedValueOnce([]);

    const result = await handler({ query: 'nonexistent' }, {} as any);
    expect(result.structuredContent.tickets).toHaveLength(0);
  });
});

Run with pnpm test. These tests finish in under a second because there is no browser, no server, and no network.

What to Unit Test

Focus on the logic your handler contains:

  • Return shape. Does the handler return structuredContent or content with the right fields? Your resource component will break silently if the data shape is wrong.
  • Input handling. Does the handler use defaults for optional parameters? Does it validate inputs before making API calls?
  • Error paths. What happens when your external API returns a 500? When the database query times out? When the user passes an ID that does not exist?
  • Data transformation. If you are transforming API responses (and you should), test that the transformation produces the right shape.
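As an example of the error-path bullet, one pattern that makes upstream failures easy to test is taking the API client as a parameter so a failing one can be injected. The sketch below is illustrative, not the post's actual handler: the `isError`/`content` result shape follows MCP conventions, and `searchTickets` stands in for the hypothetical API from the example above.

```typescript
// Sketch: a handler that converts upstream failures into a structured
// error result instead of an unhandled rejection. Names are illustrative.
type ToolResult =
  | { structuredContent: { tickets: unknown[] } }
  | { isError: true; content: Array<{ type: 'text'; text: string }> };

async function searchTicketsHandler(
  args: { query: string },
  api: { searchTickets: (q: string) => Promise<unknown[]> }
): Promise<ToolResult> {
  try {
    const tickets = await api.searchTickets(args.query);
    return { structuredContent: { tickets } };
  } catch (err) {
    // An upstream 500 becomes an error message the model can relay.
    return {
      isError: true,
      content: [{ type: 'text', text: `Ticket search failed: ${(err as Error).message}` }],
    };
  }
}

// Inject a failing API and assert on the error shape.
const failingApi = {
  searchTickets: async () => {
    throw new Error('HTTP 500');
  },
};

searchTicketsHandler({ query: 'login' }, failingApi).then((result) => {
  console.log('isError' in result); // prints true
});
```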

Unit Testing Tool Configs and Annotations

Tool annotations are required for Connectors Directory submission, and missing annotations cause 30% of rejections. A quick unit test catches this before you deploy:

// tests/tools/annotations.test.ts

import { describe, it, expect } from 'vitest';
import { tool as searchTool } from '../../src/tools/search-tickets';
import { tool as updateTool } from '../../src/tools/update-ticket-status';

describe('tool annotations', () => {
  it('search tool is marked read-only', () => {
    expect(searchTool.annotations?.readOnlyHint).toBe(true);
  });

  it('update tool is marked destructive', () => {
    expect(updateTool.annotations?.destructiveHint).toBe(true);
  });
});

You can also test that every tool file in your src/tools/ directory has annotations:

// tests/tools/all-annotations.test.ts

import { describe, it, expect } from 'vitest';
import { readdirSync } from 'fs';
import { join } from 'path';

const toolsDir = join(__dirname, '../../src/tools');
const toolFiles = readdirSync(toolsDir).filter((f) => f.endsWith('.ts'));

describe('all tools have annotations', () => {
  toolFiles.forEach((file) => {
    it(`${file} has readOnlyHint or destructiveHint`, async () => {
      const mod = await import(join(toolsDir, file));
      const annotations = mod.tool?.annotations;
      expect(annotations).toBeDefined();

      const hasHint =
        annotations?.readOnlyHint === true ||
        annotations?.destructiveHint === true;
      expect(hasHint).toBe(true);
    });
  });
});

This test auto-discovers tool files, so it catches new tools that ship without annotations. Add it once and forget about it.

Testing with the Local Inspector

Unit tests cover logic. The inspector covers rendering. Run sunpeak dev and you get a local Claude replica at localhost:3000 that loads your connector’s tools and resources without any network calls or Claude account.

sunpeak dev

Select Claude from the Host dropdown in the inspector sidebar. Your tools appear in the tool list. Click a tool, provide mock input, and see your resource component render with real data.

This is where simulation files come in.

Simulation Files

Simulation files are JSON files that define deterministic tool states. Each file specifies a title and the output data your resource component will receive:

// src/resources/ticket-list/simulations/open-tickets.json
{
  "title": "Three open tickets",
  "output": {
    "tickets": [
      { "id": "TICK-1", "title": "Login page 500 error", "status": "open", "priority": "high" },
      { "id": "TICK-2", "title": "Dashboard load time", "status": "open", "priority": "medium" },
      { "id": "TICK-3", "title": "Email notifications delayed", "status": "open", "priority": "low" }
    ]
  }
}
// src/resources/ticket-list/simulations/empty-results.json
{
  "title": "No matching tickets",
  "output": {
    "tickets": []
  }
}
// src/resources/ticket-list/simulations/long-list.json
{
  "title": "20 tickets with pagination",
  "output": {
    "tickets": [
      { "id": "TICK-1", "title": "Issue one", "status": "open", "priority": "high" },
      { "id": "TICK-2", "title": "Issue two", "status": "closed", "priority": "low" }
    ],
    "nextCursor": "abc123",
    "totalCount": 20
  }
}

The inspector auto-discovers these files and lets you switch between them in the sidebar. You see exactly what your users see when Claude returns each tool response.

Edge Cases to Cover with Simulations

Create simulations for the states that break UIs:

  • Empty data. Empty arrays, null fields, zero values. Does your component show a helpful empty state or crash?
  • Long strings. Ticket titles with 200 characters, descriptions with paragraphs of text. Does your layout overflow or truncate?
  • Missing optional fields. If assignee is optional in your tool schema, what happens when the tool result omits it?
  • Single item vs many items. A list with one item and a list with 50 items look different. Test both.
  • Error states. Tool results with an error field or unexpected shapes. Your component should fail gracefully.
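For instance, an error-state simulation might look like the file below. The `error` field shape is illustrative; use whatever shape your tool actually returns on failure.

```json
// src/resources/ticket-list/simulations/api-error.json
{
  "title": "Upstream API failure",
  "output": {
    "tickets": [],
    "error": "Ticket service returned HTTP 500"
  }
}
```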

E2E Testing with Playwright

Playwright tests automate what you do manually in the inspector. They start the dev server, open the inspector in a real browser, load a simulation, and assert on the rendered UI.

// tests/e2e/ticket-list.spec.ts

import { test, expect } from '@playwright/test';
import { createInspectorUrl } from 'sunpeak/inspector';

test('ticket list renders open tickets', async ({ page }) => {
  const url = createInspectorUrl({
    resource: 'ticket-list',
    simulation: 'open-tickets',
    host: 'claude',
  });

  await page.goto(url);

  const tickets = page.locator('[data-testid="ticket-row"]');
  await expect(tickets).toHaveCount(3);
  await expect(tickets.first()).toContainText('Login page 500 error');
  await expect(tickets.first()).toContainText('high');
});

test('ticket list shows empty state', async ({ page }) => {
  const url = createInspectorUrl({
    resource: 'ticket-list',
    simulation: 'empty-results',
    host: 'claude',
  });

  await page.goto(url);
  await expect(page.getByText('No tickets found')).toBeVisible();
});

Run with:

pnpm test:e2e

Playwright starts the sunpeak dev server automatically, runs the tests, and shuts down. No manual setup required.
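That automatic startup comes from Playwright's standard `webServer` option. A minimal config sketch, assuming the inspector is started with `sunpeak dev` on port 3000 (adjust the command, port, and test directory to your project):

```typescript
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  testDir: 'tests/e2e',
  webServer: {
    // Start the local inspector before the tests; reuse a running
    // instance during local development, but not in CI.
    command: 'sunpeak dev',
    url: 'http://localhost:3000',
    reuseExistingServer: !process.env.CI,
  },
});
```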

Testing Across Hosts

If your connector should work in both Claude and ChatGPT (and it can, since both support MCP), test both hosts:

// tests/e2e/cross-host.spec.ts

import { test, expect } from '@playwright/test';
import { createInspectorUrl } from 'sunpeak/inspector';

const hosts = ['claude', 'chatgpt'] as const;

for (const host of hosts) {
  test(`ticket detail renders on ${host}`, async ({ page }) => {
    const url = createInspectorUrl({
      resource: 'ticket-detail',
      simulation: 'open-ticket',
      host,
    });

    await page.goto(url);
    await expect(page.getByText('TICK-1')).toBeVisible();
  });
}

This catches host-specific rendering differences. ChatGPT and Claude have different iframe dimensions, CSS variables, and dark mode behavior. Testing both locally is free and fast.

Testing Display Modes

Claude renders connector UI in different display modes: inline (embedded in the chat), full (expanded view), and pip (picture-in-picture). Your component should look right in all of them:

// tests/e2e/display-modes.spec.ts

import { test, expect } from '@playwright/test';
import { createInspectorUrl } from 'sunpeak/inspector';

const modes = ['inline', 'full', 'pip'] as const;

for (const displayMode of modes) {
  test(`dashboard renders in ${displayMode} mode`, async ({ page }) => {
    const url = createInspectorUrl({
      resource: 'metrics-dashboard',
      simulation: 'q4-metrics',
      host: 'claude',
      displayMode,
    });

    await page.goto(url);
    await expect(page.getByText('Page Views')).toBeVisible();
  });
}

Running Tests in CI/CD

Add both unit tests and e2e tests to your GitHub Actions workflow:

# .github/workflows/test.yml

name: Test Claude Connector
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm

      - run: pnpm install

      # Unit tests
      - run: pnpm test

      # Install Playwright browsers
      - run: pnpm exec playwright install --with-deps chromium

      # E2E tests against the local inspector
      - run: pnpm test:e2e

Every push runs both test suites. No Claude account, no API keys, no AI credits on your CI runners. If a tool handler breaks, a resource component crashes on empty data, or an annotation goes missing, the pipeline catches it.

For a deeper look at CI/CD configuration, see the MCP App GitHub Actions guide.

Live Testing Against Real Claude

After your connector passes local and CI tests, you can run a final round of tests against the real Claude runtime. This validates things the local inspector cannot fully replicate: actual LLM tool selection, real OAuth flows, and production-specific session handling.

The live testing guide covers this in detail. The short version: Playwright opens a real Claude conversation, sends a message that should trigger your tool, and asserts on the result:

// tests/live/claude-live.spec.ts

import { test, expect } from '@playwright/test';

test('Claude calls search-tickets tool', async ({ page }) => {
  // Navigate to Claude and authenticate (setup in globalSetup)
  await page.goto('https://claude.ai/new');

  // Send a message that should trigger the tool
  await page.getByRole('textbox').fill('Search for open support tickets');
  await page.keyboard.press('Enter');

  // Wait for the connector UI to render
  const iframe = page.frameLocator('iframe[title*="ticket"]');
  await expect(iframe.locator('[data-testid="ticket-row"]')).toBeVisible({
    timeout: 15_000,
  });
});

This requires a Claude account and burns credits, so run it sparingly. A good workflow: local inspector tests on every commit, live tests on release branches or as a manual CI trigger.
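The manual trigger can be sketched as a separate workflow gated behind `workflow_dispatch` and release branches. The secret names and the `test:live` script are assumptions about your project, not sunpeak conventions:

```yaml
# .github/workflows/live-test.yml

name: Live Claude Tests
on:
  workflow_dispatch:
  push:
    branches: ['release/**']

jobs:
  live:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install
      - run: pnpm exec playwright install --with-deps chromium
      # Credentials come from repository secrets, never from the repo itself.
      - run: pnpm test:live
        env:
          CLAUDE_EMAIL: ${{ secrets.CLAUDE_EMAIL }}
          CLAUDE_PASSWORD: ${{ secrets.CLAUDE_PASSWORD }}
```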

Testing Checklist

Before shipping your Claude Connector, make sure you have covered:

  • Every tool handler has unit tests for happy path and error paths
  • Every tool has readOnlyHint or destructiveHint annotations
  • Resource components have simulation files for empty, single, and many-item states
  • Playwright e2e tests load each simulation and check the UI
  • Tests run on both Claude and ChatGPT hosts (if cross-platform)
  • Display modes (inline, full, pip) render without layout breakage
  • CI/CD runs pnpm test and pnpm test:e2e on every push
  • Token payload from structuredContent stays under 25,000 tokens (test with large simulations)
  • Tool schemas reject invalid inputs (test with bad arguments in unit tests)

If you are submitting to the Connectors Directory, the annotation and token limit items are hard requirements. Better to catch them in tests than in a rejection email two weeks later.
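The token-budget item can be smoke-tested without a real tokenizer: a rough heuristic of about four characters per token is enough to catch payloads that are wildly over budget. Both the heuristic and the `estimateTokens` helper below are illustrative, not a sunpeak API:

```typescript
// Rough token estimate: ~4 characters per token for English JSON.
// Good enough for a budget smoke test, not for exact accounting.
function estimateTokens(payload: unknown): number {
  return Math.ceil(JSON.stringify(payload).length / 4);
}

// A deliberately large payload, similar to a worst-case simulation file.
const bigPayload = {
  tickets: Array.from({ length: 500 }, (_, i) => ({
    id: `TICK-${i}`,
    title: `Ticket number ${i} with a reasonably long title`,
    status: 'open',
    priority: 'high',
  })),
};

const estimate = estimateTokens(bigPayload);
console.log(estimate < 25_000); // prints true
```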

Get Started

pnpm add -g sunpeak && sunpeak new

Frequently Asked Questions

How do I test a Claude Connector without a Claude account?

Use sunpeak to run a local inspector that replicates the Claude runtime. Run sunpeak dev to start the inspector at localhost:3000, select Claude from the Host dropdown, and test your connector tools and UI locally. No Claude subscription, no network calls, no AI credits burned. Simulation files provide deterministic mock data so your tests produce the same result every time.

What testing frameworks work with Claude Connectors?

sunpeak projects come with Vitest for unit tests and Playwright for end-to-end tests. Vitest tests individual tool handlers and resource components in isolation. Playwright tests the full connector running inside the sunpeak inspector, including tool calls, UI rendering, and user interactions. Both frameworks run locally and in CI/CD.

How do I unit test a Claude Connector tool handler?

Import your tool handler function directly and call it with mock arguments and an extras object. Assert on the returned content or structuredContent. For handlers that call external APIs, mock the fetch calls with vi.fn() or msw. Vitest runs these tests in milliseconds with no server or browser required.

What are simulation files in Claude Connector testing?

Simulation files are JSON files that define deterministic tool states for testing. Each file specifies a title and output (the structuredContent your resource component receives). Place them in your resource directory and the sunpeak inspector auto-discovers them. Use simulations for visual testing during development and as fixtures for Playwright e2e tests.

How do I run Claude Connector tests in GitHub Actions?

Add pnpm test for unit tests and pnpm test:e2e for Playwright tests to your GitHub Actions workflow. Playwright tests start the sunpeak dev server automatically, run against the local inspector (no Claude account needed), and shut down when complete. The full test suite runs in CI with zero external dependencies.

How do I test Claude Connector annotations like readOnlyHint and destructiveHint?

Import the tool config object from your tool file and assert that the annotations field contains the expected values. Every tool submitted to the Connectors Directory must have either readOnlyHint: true or destructiveHint: true. A unit test that checks annotations catches missing values before you deploy.

Can I test my Claude Connector against the real Claude?

Yes. After local testing, you can run Playwright tests against the real Claude runtime for final validation. The sunpeak live testing setup uses Playwright to open a real Claude conversation, trigger your connector tools, and assert on the rendered UI. This requires a Claude account and costs AI credits, so reserve it for pre-release validation rather than everyday development.

How do I test Claude Connector error handling?

Create simulation files with edge-case data: empty arrays, null fields, very long strings, missing optional fields. Write Playwright tests that load these simulations and verify your resource component handles them gracefully. For tool handler errors, unit test that your handler returns meaningful error messages when external APIs fail or input validation catches bad arguments.