How to Test Claude Connectors: Unit Tests, Local Inspector, and CI/CD (June 2026)

June 15, 2026 · Abe Wheeler


Figure: Testing Claude Connectors locally with the sunpeak inspector.

Testing Claude Connectors by hand still breaks down fast. A realistic test cycle now includes more than “does Claude call my tool?” You need to know whether the remote MCP server is reachable, whether tool schemas are clear, whether structuredContent matches outputSchema, whether read and write actions ask for the right confirmation, whether an interactive resource renders inside Claude, and whether the same connector still works in ChatGPT or another MCP Apps host.

TL;DR: Test Claude Connectors in layers. Unit test tool handlers, contract test MCP schemas and annotations, use the sunpeak inspector to render resources in a local Claude runtime, cover edge cases with simulations, run Playwright E2E and visual tests in CI, and save real Claude live tests for OAuth, tool selection, public network reachability, and pre-release checks. Local tests should do most of the work because they are deterministic and do not need a paid host account.

Claude Connectors are remote MCP servers. Claude can use them to read data, take actions, and, for interactive connectors, render live interfaces such as dashboards, task boards, or document views in the conversation. Anthropic now documents custom connectors using remote MCP across Claude, Cowork, and Claude Desktop, with Free users limited to one custom connector. The important testing change is that your connector is no longer just a local script. It is a networked product boundary with user permissions, OAuth, host UI behavior, and model-selected tools.

This guide covers a testing workflow that fits that reality.

The Claude Connector Testing Pyramid

Use four layers:

1. Unit tests. Test tool handlers, input validation, data transforms, access checks, and error branches without a server or browser.
2. Protocol contract tests. Test MCP tool descriptors, annotations, inputSchema, outputSchema, structuredContent, content, _meta, and resource links.
3. Inspector and E2E tests. Render the connector in a local Claude runtime, switch host states, load simulations, and assert against the actual iframe UI.
4. Live tests. Connect the deployed or tunneled server to the real Claude app and test host-specific behavior before release.

Most regressions should fail in layers 1 through 3. Live tests are useful, but they are slower, less deterministic, and tied to real accounts, settings, and rate limits.

Start with Tool Handler Unit Tests

Your tool handler is normal application code. Test it before you involve MCP.

// tests/tools/search-tickets.test.ts

import { describe, expect, it, vi } from 'vitest';
import handler from '../../src/tools/search-tickets';

vi.mock('../../src/lib/tickets', () => ({
  searchTickets: vi.fn().mockResolvedValue([
    { id: 'TICK-1', title: 'Login error', status: 'open', priority: 'high' },
    { id: 'TICK-2', title: 'Slow dashboard', status: 'open', priority: 'medium' },
  ]),
}));

describe('search-tickets handler', () => {
  it('returns structuredContent with the expected shape', async () => {
    const result = await handler({ query: 'login', status: 'open' }, {} as any);

    expect(result.structuredContent).toEqual({
      tickets: [
        { id: 'TICK-1', title: 'Login error', status: 'open', priority: 'high' },
        { id: 'TICK-2', title: 'Slow dashboard', status: 'open', priority: 'medium' },
      ],
    });
  });

  it('returns a useful empty state payload', async () => {
    const { searchTickets } = await import('../../src/lib/tickets');
    vi.mocked(searchTickets).mockResolvedValueOnce([]);

    const result = await handler({ query: 'nothing' }, {} as any);

    expect(result.structuredContent).toEqual({ tickets: [] });
    expect(result.content?.[0]?.type).toBe('text');
  });
});

Good unit coverage answers a few plain questions:

Does the handler return the shape the resource expects?
Does it reject invalid inputs before calling external APIs?
Does it handle empty results, missing records, and upstream errors?
Does it enforce the current user’s permissions before returning data?
Does it avoid putting secrets, access tokens, or UI-only data in model-visible fields?

For write tools, unit test idempotency. If Claude retries an operation or a user repeats a request, your handler should not create duplicate records unless that is the intended behavior.
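One way to make a write tool safe to retry is to key each logical request and return the original record when the same key arrives again. The sketch below is illustrative only: InMemoryTicketStore, CreateTicketInput, and the idempotencyKey field are hypothetical names, and the in-memory map stands in for whatever storage your connector really uses.

```typescript
// Hypothetical types for illustration; not part of any SDK.
interface CreateTicketInput {
  title: string;
  // Assumed to be supplied once per logical request by the caller.
  idempotencyKey: string;
}

interface Ticket {
  id: string;
  title: string;
}

// In-memory stand-in for a real ticket store.
class InMemoryTicketStore {
  private byKey = new Map<string, Ticket>();
  private nextId = 1;

  create(input: CreateTicketInput): Ticket {
    const existing = this.byKey.get(input.idempotencyKey);
    // A retry with the same key returns the original record.
    if (existing) return existing;

    const ticket: Ticket = { id: `TICK-${this.nextId++}`, title: input.title };
    this.byKey.set(input.idempotencyKey, ticket);
    return ticket;
  }

  // Number of distinct tickets created so far.
  count(): number {
    return this.byKey.size;
  }
}
```

A unit test can call create twice with the same key, then assert that both calls return the same id and that count() is still 1.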

Contract Test the MCP Surface

MCP clients discover your connector through tools/list and invoke it through tools/call. That means your tests should cover the protocol surface, not just the imported handler.

At minimum, assert that every tool has:

A stable name with no spaces.
A useful description that tells the model when to call it.
An inputSchema with required fields and clear descriptions.
Correct annotations, especially readOnlyHint and destructiveHint.
An outputSchema when the tool returns structuredContent.
A resource link or app metadata when the tool renders UI.

// tests/contracts/tools.test.ts

import { describe, expect, it } from 'vitest';
import { server } from '../../src/server';

describe('MCP tool contracts', () => {
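  it('defines explicit safety annotations', async () => {
    // Assumes server.listTools() resolves to the same descriptors that tools/list returns.
    const tools = await server.listTools();

    for (const tool of tools) {
      // Tool names must be stable identifiers with no spaces.
      expect(tool.name).toMatch(/^[A-Za-z0-9_.-]+$/);
      // A missing description counts as length 0 and fails here.
      expect((tool.description ?? '').length).toBeGreaterThan(30);
      expect(tool.inputSchema).toBeDefined();

      // Both hints must be set explicitly rather than left to host defaults.
      const annotations = tool.annotations ?? {};
      expect(typeof annotations.readOnlyHint).toBe('boolean');
      expect(typeof annotations.destructiveHint).toBe('boolean');
    }
  });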
});

Do not copy that exact assertion into every project without thinking. It only checks that both hints are set, not that they are right for each tool. A non-destructive write tool, for example, should use readOnlyHint: false and destructiveHint: false. The better version is a policy table that maps tool names to expected behavior:

const expectedToolSafety = {
  search_tickets: { readOnlyHint: true, destructiveHint: false },
  create_ticket: { readOnlyHint: false, destructiveHint: false },
  delete_ticket: { readOnlyHint: false, destructiveHint: true },
} as const;

Then fail the test when a tool’s annotations drift from the policy.
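A minimal sketch of that drift check follows. It assumes your tool descriptors expose name and annotations fields; findAnnotationDrift, ToolDescriptor, and SafetyHints are hypothetical names introduced here, not SDK exports.

```typescript
// Hypothetical shapes for illustration.
interface SafetyHints {
  readOnlyHint: boolean;
  destructiveHint: boolean;
}

interface ToolDescriptor {
  name: string;
  annotations?: Partial<SafetyHints>;
}

// Returns one message per mismatch between registered tools and the policy table.
function findAnnotationDrift(
  tools: ToolDescriptor[],
  policy: Record<string, SafetyHints>,
): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();

  for (const tool of tools) {
    seen.add(tool.name);
    const expected = policy[tool.name];
    if (!expected) {
      problems.push(`${tool.name}: missing from policy table`);
      continue;
    }
    for (const hint of ['readOnlyHint', 'destructiveHint'] as const) {
      const actual = tool.annotations?.[hint];
      if (actual !== expected[hint]) {
        problems.push(
          `${tool.name}: ${hint} is ${String(actual)}, expected ${String(expected[hint])}`,
        );
      }
    }
  }

  // A policy entry with no registered tool is also drift.
  for (const name of Object.keys(policy)) {
    if (!seen.has(name)) problems.push(`${name}: in policy but not registered`);
  }
  return problems;
}
```

In a contract test, pass the listed tools and the policy table to this function and expect the result to be an empty array. Returning messages instead of a boolean keeps the failure output readable.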

Validate outputSchema and structuredContent

The MCP tools specification defines structuredContent as the structured result of a tool call and says tools may provide outputSchema for validation. That matters for interactive Claude Connectors because the same payload often feeds the model and the rendered UI.

If your resource expects tickets, but the handler returns items, the host might still show an iframe. Your app will be empty or broken. Test the contract directly.

// tests/contracts/search-tickets-output.test.ts

import { describe, expect, it } from 'vitest';
import Ajv from 'ajv';
import { searchTicketsTool, searchTicketsHandler } from '../../src/tools/search-tickets';

const ajv = new Ajv();

describe('search_tickets output', () => {
  it('matches outputSchema', async () => {
    const result = await searchTicketsHandler({ query: 'login' }, {} as any);
    const validate = ajv.compile(searchTicketsTool.outputSchema);
    const valid = validate(result.structuredContent);

    // Assert on errors first so a failure prints the Ajv error details.
    expect(validate.errors).toBeNull();
    expect(valid).toBe(true);
  });
});

Cover these result shapes:

Empty arrays.
One item.
Many items.
Long text fields.
Missing optional fields.
Pagination cursors.
Permission-denied results.
Upstream API failures.
Expired OAuth tokens.

If you use TypeScript types or Zod schemas, generate both the runtime validation and UI types from the same source where possible. The goal is one contract, not three hand-maintained copies.

Use the Local Claude Inspector

Unit and contract tests prove the server side. The inspector proves the rendered connector.

In a sunpeak project:

pnpm dev

Then open the inspector at localhost:3000 and select Claude from the Host dropdown. For an existing MCP server, point the inspector at it:

npx sunpeak inspect --server http://localhost:8000/mcp

The sunpeak inspector replicates Claude and ChatGPT host runtimes locally. You can switch host, theme, device width, display mode, tool input, tool result, and app context without deploying or signing into a real host. That makes it the right place to test interactive resource states during development.

Build Simulations for Every UI State

Simulation files turn hard-to-reproduce host states into fixtures. A useful simulation describes the tool input, tool result, and mock server data needed to render one state.

// tests/simulations/open-tickets.json
{
  "tool": "search_tickets",
  "title": "Open tickets",
  "userMessage": "Show open login tickets",
  "toolInput": {
    "arguments": {
      "query": "login",
      "status": "open"
    }
  },
  "toolResult": {
    "content": [
      {
        "type": "text",
        "text": "Found 3 open tickets."
      }
    ],
    "structuredContent": {
      "tickets": [
        { "id": "TICK-1", "title": "Login page error", "status": "open", "priority": "high" },
        { "id": "TICK-2", "title": "SSO callback timeout", "status": "open", "priority": "medium" },
        { "id": "TICK-3", "title": "Password reset email delay", "status": "open", "priority": "low" }
      ]
    }
  }
}

Create simulations for states that break UI:

Empty result.
Loading skeleton.
Error result.
Permission denied.
OAuth expired.
Long translated strings.
Many rows with pagination.
Mobile viewport.
Dark theme.
Write confirmation pending.
Write success.
Write failure after confirmation.

The important part is reuse. The same simulation should help a developer inspect the UI manually and help Playwright assert the state in CI.
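Because the same fixture feeds both manual inspection and CI, it can help to check each file's shape before anything tries to render it. The sketch below mirrors only the fields shown in the example above (tool, toolInput.arguments, toolResult); it is an assumption about the shape, not sunpeak's documented schema, and simulationProblems is a hypothetical helper.

```typescript
// True for plain JSON objects, false for arrays, null, and primitives.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns a list of shape problems; an empty list means the fixture looks usable.
function simulationProblems(sim: unknown): string[] {
  if (!isRecord(sim)) return ['simulation must be a JSON object'];
  const problems: string[] = [];

  const tool = sim['tool'];
  if (typeof tool !== 'string' || tool.length === 0) {
    problems.push('tool must be a non-empty string');
  }

  const toolInput = sim['toolInput'];
  if (!isRecord(toolInput) || !isRecord(toolInput['arguments'])) {
    problems.push('toolInput.arguments must be an object');
  }

  const toolResult = sim['toolResult'];
  if (!isRecord(toolResult)) {
    problems.push('toolResult must be an object');
  } else if (
    !Array.isArray(toolResult['content']) &&
    !isRecord(toolResult['structuredContent'])
  ) {
    problems.push('toolResult needs content or structuredContent');
  }

  return problems;
}
```

Run it over every parsed file in tests/simulations as a fast unit test, so a malformed fixture fails with a clear message instead of a confusing E2E failure.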

Write E2E Tests Against the Inspector

The inspector fixture from sunpeak/test automates the local host runtime. It renders the connector resource in an iframe and gives you a scoped locator for assertions.

// tests/e2e/ticket-list.spec.ts

import { expect, test } from 'sunpeak/test';

test('ticket list renders open tickets in Claude', async ({ inspector }) => {
  const result = await inspector.renderTool('search_tickets', {
    query: 'login',
    status: 'open',
  });

  const app = result.app();

  await expect(app.getByText('Login page error')).toBeVisible();
  await expect(app.getByText('SSO callback timeout')).toBeVisible();
  await expect(app.getByText('Password reset email delay')).toBeVisible();
});

test('ticket list handles empty results', async ({ inspector }) => {
  const result = await inspector.renderTool('search_tickets', { query: 'nothing' });

  await expect(result.app().getByText('No tickets found')).toBeVisible();
});

Run the tests:

pnpm test

For cross-host apps, run the same simulation against Claude and ChatGPT. OpenAI documents ChatGPT Apps as MCP servers plus optional iframe UI, and the MCP Apps UI standard is intended to work across compatible hosts. That does not mean every host has identical chrome, safe areas, or display behavior. It means your tests should prove the differences you care about.

Test Host States, Not Just Happy Paths

Interactive connectors fail in host-specific places. Add tests for:

Theme. Light and dark mode should keep contrast and chart colors readable.
Viewport. Mobile widths should not clip buttons, tables, or form labels.
Display mode. Test the display modes your target hosts expose, such as inline and fullscreen for Claude interactive connectors, and pip or fullscreen where another host supports them.
Safe areas. Sticky footers and floating action bars should not hide behind host chrome.
Tool output size. Large structuredContent should not make the UI unusable or flood model context.
Resource load failure. The user should see a useful error if assets, CSP, or API calls fail.
Cancelled actions. A user who cancels a write action should not leave partial state in the UI.

This is where visual regression tests help. A screenshot diff will catch a clipped approval button faster than a text assertion.

// tests/visual/ticket-list.visual.spec.ts

import { expect, test } from 'sunpeak/test';

test('ticket list visual state is stable in Claude dark mode', async ({ inspector }) => {
  const result = await inspector.renderTool('search_tickets', {
    query: 'login',
    status: 'open',
  }, {
    theme: 'dark',
    viewport: { width: 390, height: 844 },
  });

  await expect(result.app()).toHaveScreenshot('ticket-list-claude-dark-mobile.png');
});
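The tool output size item from the list above is easier to check with a plain assertion than with a screenshot. This sketch measures the serialized payload in UTF-8 bytes. The helper names are hypothetical, and the budget is a number you choose for your own connector, not a limit published by any host.

```typescript
// Size of the JSON-serialized payload in UTF-8 bytes.
function structuredContentBytes(structuredContent: unknown): number {
  return new TextEncoder().encode(JSON.stringify(structuredContent)).length;
}

// True when the payload is larger than the budget you picked.
function exceedsBudget(structuredContent: unknown, maxBytes: number): boolean {
  return structuredContentBytes(structuredContent) > maxBytes;
}
```

Call exceedsBudget on the handler result for your largest realistic fixture and fail the test when it returns true. Counting bytes rather than string length keeps the check honest for non-ASCII text.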

Test OAuth and Public Network Reachability Separately

Local inspector tests should not depend on a real OAuth provider. Mock the token state in simulations and unit tests:

No token.
Expired token.
Token without required scopes.
User lacks permission in the source system.
Refresh token succeeds.
Refresh token fails.
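The first few token states can be modeled as a small pure function, which keeps them easy to unit test and easy to mirror in simulations. This is a sketch under assumptions: StoredToken, TokenState, and classifyToken are hypothetical names, and a real connector would also handle the refresh paths.

```typescript
type TokenState = 'missing' | 'expired' | 'insufficient_scope' | 'valid';

// Hypothetical stored token shape for illustration.
interface StoredToken {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
  scopes: string[];
}

// Classifies a token without calling the OAuth provider.
// `now` is passed in so tests stay deterministic.
function classifyToken(
  token: StoredToken | undefined,
  requiredScopes: string[],
  now: number,
): TokenState {
  if (!token) return 'missing';
  if (token.expiresAt <= now) return 'expired';

  const granted = new Set(token.scopes);
  const hasAllScopes = requiredScopes.every((scope) => granted.has(scope));
  return hasAllScopes ? 'valid' : 'insufficient_scope';
}
```

Each state maps to one simulation and one handler branch, so the UI and the server agree on what an expired or under-scoped token looks like.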

Then test the real OAuth flow in live or staging tests. Anthropic documents that custom connectors using remote MCP connect from Anthropic cloud infrastructure, so a server that works from your laptop may still fail when Claude tries to reach it. For live checks, verify:

The MCP endpoint is public over HTTPS.
The URL includes the right path, usually /mcp.
OAuth callback URLs match the deployed environment.
Team or Enterprise owners can add the connector at the organization level when needed.
Individual users still authenticate with the source service.
Disconnect and reconnect work.

Keep these tests outside the fast CI path. They are integration checks against the real world.
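Before a live run starts, a cheap pre-flight check can catch the most common URL mistakes from the list above. The sketch below only inspects the URL string and does not make a network request. checkConnectorUrl is a hypothetical helper, and the /mcp default reflects the "usually" above rather than a requirement.

```typescript
// Parses a URL, returning undefined instead of throwing on bad input.
function tryParseUrl(raw: string): URL | undefined {
  try {
    return new URL(raw);
  } catch {
    return undefined;
  }
}

// Returns a list of problems; an empty list means the URL passes the static checks.
function checkConnectorUrl(raw: string, expectedPath = '/mcp'): string[] {
  const url = tryParseUrl(raw);
  if (!url) return [`not a valid URL: ${raw}`];

  const problems: string[] = [];
  if (url.protocol !== 'https:') problems.push('endpoint must use HTTPS');

  // A loopback address is never reachable from outside your machine.
  const localHosts = ['localhost', '127.0.0.1', '[::1]'];
  if (localHosts.includes(url.hostname)) problems.push('endpoint points at a local host');

  // Ignore trailing slashes when comparing the path.
  const path = url.pathname.replace(/\/+$/, '');
  if (path !== expectedPath) problems.push(`path is ${url.pathname}, expected ${expectedPath}`);

  return problems;
}
```

Run it against CONNECTOR_URL as the first step of the live workflow and stop early if it reports anything. Reachability itself still has to be confirmed from the real host.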

Run Tests in GitHub Actions

A normal CI workflow should run unit, contract, E2E, and visual tests against the local inspector:

# .github/workflows/test.yml

name: Test Claude Connector

on:
  pull_request:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: pnpm

      - run: pnpm install --frozen-lockfile
      - run: pnpm exec playwright install --with-deps chromium
      - run: pnpm test

If your repo uses pnpm validate, run that instead. The point is to make the complete local test suite block merges.

Put live Claude tests in a separate workflow:

name: Live Claude Connector Tests

on:
  workflow_dispatch:

jobs:
  live:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm exec playwright install --with-deps chromium
      - run: pnpm test:live
        env:
          CLAUDE_EMAIL: ${{ secrets.CLAUDE_EMAIL }}
          CLAUDE_PASSWORD: ${{ secrets.CLAUDE_PASSWORD }}
          CONNECTOR_URL: ${{ secrets.CONNECTOR_URL }}

Do not make live tests the only signal. They should confirm the release, not replace deterministic tests.

A Practical Pre-Release Checklist

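Before shipping a Claude Connector, confirm that each layer described above passes: unit tests, protocol contract tests, inspector E2E and visual tests, and live checks for OAuth and public network reachability.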

If you are submitting to the Claude Connectors Directory, also test the submission-facing details: connector description, read/write capabilities, auth setup, reviewer test account, and any interactive UI states you expect reviewers to try.

Where sunpeak Fits

You can build these tests with plain Vitest, Playwright, and an MCP SDK. sunpeak bundles the workflow because Claude Connectors and ChatGPT Apps share the same hard parts: tools, resources, host runtimes, simulations, display modes, and CI.

Use npx sunpeak new when you want a structured project with the inspector and tests built in. Use npx sunpeak inspect --server URL when you already have an MCP server and need a local Claude or ChatGPT runtime for inspection. Either way, the best testing loop is the same: make host behavior reproducible locally, run it in CI, and use real Claude only for the checks that only the real host can answer.

Get Started


npx sunpeak new

Frequently Asked Questions

How do I test a Claude Connector without a Claude account?

Use a local MCP App inspector such as sunpeak to run your connector tools and render your resources in a replicated Claude runtime. In a sunpeak project, run pnpm dev and select Claude from the Host dropdown. For an existing MCP server, run npx sunpeak inspect --server URL. This tests tool results, UI states, themes, display modes, and edge cases without connecting to the real Claude service.

What should I test first in a Claude Connector?

Start with tool handler unit tests and protocol contract tests. Handlers should validate input, enforce access checks, and return predictable error results. Each tool should have a clear inputSchema, safe annotations, and an outputSchema when it returns structuredContent. These tests are fast, run without a browser, and catch the bugs that later make Claude call the wrong tool or render the wrong data.

How do I test Claude Connector structuredContent?

Call the tool handler or MCP tools/call endpoint with representative arguments, validate result.structuredContent against the tool outputSchema, and render the linked resource with the same result. Test empty data, long strings, missing optional fields, pagination cursors, and error payloads. The goal is to prove that the server contract and UI resource agree.

What are simulation files in Claude Connector testing?

Simulation files are deterministic fixtures that describe a tool state, including tool input, tool result, and mock server data. The sunpeak inspector loads simulations so you can switch between states during development and reuse those states in Playwright E2E tests. They are useful for empty states, permission-denied states, large result sets, OAuth-expired states, and destructive-action confirmations.

How do I run Claude Connector tests in GitHub Actions?

Run your normal package test command, usually pnpm test or pnpm validate, after installing dependencies and Playwright browsers. Local inspector tests do not require a Claude account because they run against a replicated host runtime. Keep live Claude tests in a separate workflow or manual job because they need real credentials and may consume usage.

Do Claude Connectors need live tests against the real Claude app?

Yes, but only as a final confidence check. Local and CI tests should cover most contract, UI, and regression risk. Live tests are best for host-specific behavior such as real tool selection, OAuth redirects, per-conversation connector enablement, organization permission settings, and production network reachability.

How do I test Claude Connector permissions and write actions?

Separate read-only tools from write tools, set readOnlyHint and destructiveHint correctly, and test the visible approval path for write operations. For Team and Enterprise connectors, test organization-level action restrictions if your connector exposes read and write tools. A write-action test should verify the preview, user confirmation, side effect, and rollback or idempotency behavior.

How do I test a remote MCP custom connector locally?

Test the server locally with unit, contract, inspector, and E2E tests first. For a real Claude custom connector, expose a development server over HTTPS, make sure Anthropic can reach it from the public internet, configure OAuth if required, and run a small live test plan that covers connect, enable, call, render, disconnect, and error recovery.