Test-Driven Development for MCP Apps, ChatGPT Apps, and Claude Connectors (April 2026)

Abe Wheeler
Test-driven development for MCP Apps: write simulations and tests first, then build.

MCP Apps have an architecture that maps cleanly to test-driven development. The tool handler produces data. The resource component renders it. A simulation file defines the contract between them. Because these pieces are separate, you can write the contract and the tests before you write any of the actual code, and that’s exactly what TDD asks you to do.

TL;DR: Write simulation files first to define your data contract, then write tests that assert against that data, then build the resource component and tool handler to make the tests pass. The full TDD cycle runs locally with no paid accounts, no API keys, and no AI credits. You get a test suite as a side effect of how you build.

Why TDD Fits MCP Apps

Traditional TDD works well for pure functions: define input, assert output, build the function. It gets harder with UI because the “expected output” is visual and subjective.

MCP Apps sit in a sweet spot between the two. Your resource component receives structured data from the tool handler through useToolData(), and the data shape is defined by your tool’s Zod schema. You know exactly what data your component will get before you write it. That means you can:

  1. Define the data shape in a simulation file
  2. Write a test that renders the component with that data and checks the output
  3. Build the component to make the test pass

The data contract between tool and resource is the specification that TDD needs. You don’t have to guess what your component should render because the simulation file tells you what data it will receive.

This is different from testing a typical React app where the data comes from an API you might still be designing. In an MCP App, the tool schema pins down the data shape early, which makes writing tests first practical rather than aspirational.
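To make "the schema pins down the data shape" concrete, here is a minimal sketch of a runtime contract check, using the weather shape from the example developed later in this post. The `isWeatherData` guard is a hypothetical stand-in written by hand so the snippet has no dependencies; in a real project the tool's Zod schema would play this role.

```typescript
// Hypothetical contract: the fields mirror the weather example used in this post.
interface WeatherData {
  city: string;
  temperature: number;
  unit: string;
  condition: string;
  humidity: number;
  wind: string;
}

// Narrow an unknown value to WeatherData by checking each field's runtime type.
function isWeatherData(value: unknown): value is WeatherData {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.city === 'string' &&
    typeof v.temperature === 'number' &&
    typeof v.unit === 'string' &&
    typeof v.condition === 'string' &&
    typeof v.humidity === 'number' &&
    typeof v.wind === 'string'
  );
}

const good: unknown = {
  city: 'Seattle',
  temperature: 58,
  unit: 'F',
  condition: 'Cloudy',
  humidity: 72,
  wind: '12 mph NW',
};
// Wrong type for temperature, and several fields missing.
const bad: unknown = { city: 'Seattle', temperature: '58' };

console.log(isWeatherData(good)); // true
console.log(isWeatherData(bad)); // false
```

Because the check is a pure function of the data, it can be written and tested before any component or handler exists.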

The TDD Cycle for MCP Apps

The classic red-green-refactor loop adapts to MCP App development with one addition: the simulation file comes before the test.

  1. Write a simulation file that defines expected tool input and output
  2. Write a test that asserts against that data. It fails because the code doesn’t exist yet (red)
  3. Build the minimum code to make the test pass (green)
  4. Refactor while keeping tests green

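Here’s how the cycle plays out in practice, using a weather app as the running example. The walkthrough runs the red-green loop twice, once for the resource component (Steps 2 and 3) and once for the tool handler (Step 4), before refactoring in Step 5.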

Step 1: Write the Simulation File

Start with the data. Before you write a React component or a tool handler, create a simulation file that describes what a successful tool call looks like.

Create tests/simulations/show-weather.json:

{
  "tool": "show-weather",
  "userMessage": "What's the weather in Seattle?",
  "toolInput": {
    "city": "Seattle"
  },
  "toolResult": {
    "structuredContent": {
      "city": "Seattle",
      "temperature": 58,
      "unit": "F",
      "condition": "Cloudy",
      "humidity": 72,
      "wind": "12 mph NW"
    }
  }
}

This file is your specification. It says: when the model calls show-weather with { city: "Seattle" }, the tool returns this structured content, and the resource component should render it.
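If you want the specification to be checked rather than just read, a small loader can reject malformed simulation files early. The sketch below is an assumption-laden illustration: the `Simulation` type models only the fields shown in the JSON above, `parseSimulation` is a hypothetical helper, and sunpeak's real simulation format may allow more than this.

```typescript
// Hypothetical types: only the fields shown in the simulation above are modeled.
interface Simulation {
  tool: string;
  userMessage: string;
  toolInput: Record<string, unknown>;
  toolResult: { structuredContent: unknown };
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Parse raw JSON text and fail loudly if a top-level field is missing or mistyped.
function parseSimulation(json: string): Simulation {
  const raw: unknown = JSON.parse(json);
  if (!isRecord(raw)) throw new Error('simulation must be a JSON object');

  const { tool, userMessage, toolInput, toolResult } = raw;
  if (typeof tool !== 'string') throw new Error('missing "tool"');
  if (typeof userMessage !== 'string') throw new Error('missing "userMessage"');
  if (!isRecord(toolInput)) throw new Error('missing "toolInput"');
  if (!isRecord(toolResult) || !('structuredContent' in toolResult)) {
    throw new Error('missing "toolResult.structuredContent"');
  }

  return {
    tool,
    userMessage,
    toolInput,
    toolResult: { structuredContent: toolResult.structuredContent },
  };
}

// The same data as show-weather.json, inlined so the snippet runs on its own.
const exampleJson = JSON.stringify({
  tool: 'show-weather',
  userMessage: "What's the weather in Seattle?",
  toolInput: { city: 'Seattle' },
  toolResult: { structuredContent: { city: 'Seattle', temperature: 58, unit: 'F' } },
});

const simulation = parseSimulation(exampleJson);
console.log(simulation.tool, simulation.toolInput.city); // show-weather Seattle
```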

You can run pnpm dev right now and see this simulation in the inspector, even though no component or handler exists yet. The inspector shows the raw JSON because there’s nothing to render it. That’s your red state, visible in the browser.

Step 2: Write a Failing Test

Now write a test that asserts what the rendered output should look like. For the fastest TDD feedback loop, start with a unit test:

// tests/unit/weather.test.tsx
import { render, screen } from '@testing-library/react';
import { describe, it, expect, vi, beforeEach } from 'vitest';

let mockToolData: Record<string, unknown> = {};

vi.mock('sunpeak', () => ({
  useToolData: () => mockToolData,
  SafeArea: ({ children }: { children: React.ReactNode }) => <div>{children}</div>,
}));

// This import will fail because the component doesn't exist yet
// import Weather from '../../src/resources/weather/weather';

describe('Weather resource', () => {
  beforeEach(() => {
    mockToolData = {
      output: {
        city: 'Seattle',
        temperature: 58,
        unit: 'F',
        condition: 'Cloudy',
        humidity: 72,
        wind: '12 mph NW',
      },
      isError: false,
      isLoading: false,
      isCancelled: false,
    };
  });

  it('renders city and temperature', () => {
    // render(<Weather />);
    // expect(screen.getByText('Seattle')).toBeDefined();
    // expect(screen.getByText(/58°F/)).toBeDefined();
    // expect(screen.getByText('Cloudy')).toBeDefined();
  });
});

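As written, the test passes vacuously because the import and assertions are commented out. Uncomment them and the run fails immediately, because the component file doesn’t exist. That’s the red phase: you’ve defined what the component should do before writing it.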

Step 3: Build the Resource Component

Now write the minimum component to make the test pass:

// src/resources/weather/weather.tsx
import { useToolData, SafeArea } from 'sunpeak';
import type { ResourceConfig } from 'sunpeak';

export const resource: ResourceConfig = {
  description: 'Display current weather conditions',
};

interface WeatherData {
  city: string;
  temperature: number;
  unit: string;
  condition: string;
  humidity: number;
  wind: string;
}

export default function Weather() {
  const { output, isLoading, isError } = useToolData<WeatherData>();

  if (isLoading) return <p>Loading weather...</p>;
  if (isError || !output) return <p>Could not load weather data.</p>;

  return (
    <SafeArea>
      <h2>{output.city}</h2>
      <p>{output.temperature}°{output.unit}</p>
      <p>{output.condition}</p>
      <p>Humidity: {output.humidity}%</p>
      <p>Wind: {output.wind}</p>
    </SafeArea>
  );
}

With the import and assertions uncommented, the test file reads:

import Weather from '../../src/resources/weather/weather';

describe('Weather resource', () => {
  // ... beforeEach stays the same

  it('renders city and temperature', () => {
    render(<Weather />);
    expect(screen.getByText('Seattle')).toBeDefined();
    expect(screen.getByText(/58°F/)).toBeDefined();
    expect(screen.getByText('Cloudy')).toBeDefined();
  });
});

Run pnpm test:unit. The test passes. Green.
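The unit test above repeats the simulation's data by hand, so the two can drift apart. One way to avoid that, sketched here under assumptions, is to derive the mock from the simulation. `toolDataFromSimulation` is a hypothetical helper, and the `ToolData` shape is copied from the mock object used in the test, not from sunpeak's published types.

```typescript
// Hypothetical shape: copied from the mock object used in the unit test above.
interface ToolData<T> {
  output: T | null;
  isError: boolean;
  isLoading: boolean;
  isCancelled: boolean;
}

// The part of a simulation's toolResult that the mock depends on.
interface SimulationResult<T> {
  structuredContent: T | null;
  isError?: boolean;
}

// Build the settled (not loading, not cancelled) mock state for a simulation result.
function toolDataFromSimulation<T>(result: SimulationResult<T>): ToolData<T> {
  return {
    output: result.structuredContent,
    isError: result.isError === true,
    isLoading: false,
    isCancelled: false,
  };
}

const happyPath = toolDataFromSimulation({
  structuredContent: { city: 'Seattle', temperature: 58, unit: 'F' },
});
const failure = toolDataFromSimulation<{ city: string }>({
  structuredContent: null,
  isError: true,
});

console.log(happyPath.output?.city, happyPath.isError); // Seattle false
console.log(failure.output, failure.isError); // null true
```

In a test file, you could import the simulation JSON (TypeScript's `resolveJsonModule` option allows this) and assign `toolDataFromSimulation(simulation.toolResult)` to `mockToolData` inside `beforeEach`.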

Step 4: TDD the Tool Handler

The resource component works with mock data. Now write the tool handler that produces real data with the same shape. Again, start with a failing test:

// tests/unit/show-weather-handler.test.ts
import { describe, it, expect, vi, beforeEach } from 'vitest';

vi.mock('../../src/lib/weather-api', () => ({
  fetchWeather: vi.fn(),
}));

import { handler } from '../../src/tools/show-weather/handler';
import { fetchWeather } from '../../src/lib/weather-api';
const mockFetchWeather = vi.mocked(fetchWeather);

describe('show-weather handler', () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

  it('returns structured content with weather data', async () => {
    mockFetchWeather.mockResolvedValue({
      city: 'Seattle',
      temperature: 58,
      unit: 'F',
      condition: 'Cloudy',
      humidity: 72,
      wind: '12 mph NW',
    });

    const result = await handler({ city: 'Seattle' });

    expect(result.structuredContent).toEqual({
      city: 'Seattle',
      temperature: 58,
      unit: 'F',
      condition: 'Cloudy',
      humidity: 72,
      wind: '12 mph NW',
    });
  });
});

The test fails because handler doesn’t exist. Write it:

// src/tools/show-weather/handler.ts
import { fetchWeather } from '../../lib/weather-api';

export async function handler(input: { city: string }) {
  const weather = await fetchWeather(input.city);

  return {
    content: [{ type: 'text' as const, text: `Weather for ${weather.city}: ${weather.temperature}°${weather.unit}` }],
    structuredContent: weather,
  };
}

Run the test. It passes. The handler returns the same data shape as the simulation file, so when you wire it up to the tool definition, the resource component renders it the same way.
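The claim that the handler and the simulation share a shape can itself be tested. The following is a rough sketch of a drift check: `shapeOf` and `sameShape` are hypothetical helpers that compare keys and runtime types while ignoring the actual values, so a live result for another city still matches the Seattle simulation.

```typescript
// Hypothetical drift check: describe a value by its keys and runtime types only.
type Shape = string | Shape[] | { [key: string]: Shape };

function shapeOf(value: unknown): Shape {
  if (value === null) return 'null';
  // Arrays are compared element by element, so their lengths must match too.
  if (Array.isArray(value)) return value.map((item) => shapeOf(item));
  if (typeof value === 'object') {
    const record = value as Record<string, unknown>;
    const shape: { [key: string]: Shape } = {};
    // Sort keys so that property order does not affect the comparison.
    for (const key of Object.keys(record).sort()) {
      shape[key] = shapeOf(record[key]);
    }
    return shape;
  }
  return typeof value;
}

function sameShape(a: unknown, b: unknown): boolean {
  return JSON.stringify(shapeOf(a)) === JSON.stringify(shapeOf(b));
}

// structuredContent from the simulation file.
const simulated = {
  city: 'Seattle',
  temperature: 58,
  unit: 'F',
  condition: 'Cloudy',
  humidity: 72,
  wind: '12 mph NW',
};
// A handler result with different values and key order, but the same contract.
const live = {
  wind: '5 mph S',
  humidity: 40,
  condition: 'Sunny',
  unit: 'F',
  temperature: 71,
  city: 'Portland',
};
// A handler result that drifted: temperature became a string.
const drifted = { ...live, temperature: '71' };

console.log(sameShape(simulated, live)); // true
console.log(sameShape(simulated, drifted)); // false
```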

Step 5: Refactor

Now you can refactor freely. Add Tailwind classes, extract sub-components, restructure the layout. After each change, run the tests. If they stay green, you haven’t broken anything.

TDD for Edge Cases and Error States

The biggest payoff of TDD in MCP Apps is edge case coverage. Without TDD, it’s tempting to build the happy path first and handle errors later (or never). With TDD, you write the edge case simulation and test before you write the code that handles it.

Empty and Missing Data

Create tests/simulations/show-weather-unknown-city.json:

{
  "tool": "show-weather",
  "userMessage": "What's the weather on Mars?",
  "toolInput": { "city": "Mars" },
  "toolResult": {
    "structuredContent": null,
    "content": [{ "type": "text", "text": "No weather data found for Mars" }],
    "isError": true
  }
}

Write the unit test before you add error handling to your component:

it('shows error message when tool fails', () => {
  mockToolData = {
    output: null,
    isError: true,
    isLoading: false,
    isCancelled: false,
  };

  render(<Weather />);
  expect(screen.getByText('Could not load weather data.')).toBeDefined();
});

If your component already handles this case (because you built it into the initial version), the test passes immediately. If it doesn’t, you add the handling now. Either way, the test documents the expected behavior.
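The handler needs a matching error path to produce the result in that simulation. The sketch below isolates that decision in a pure function. `toToolResult` is a hypothetical helper, and the convention that the weather lookup yields `null` for an unknown city is an assumption, not something the weather API in this post specifies.

```typescript
interface WeatherData {
  city: string;
  temperature: number;
  unit: string;
  condition: string;
  humidity: number;
  wind: string;
}

interface ToolResult {
  content: { type: 'text'; text: string }[];
  structuredContent: WeatherData | null;
  isError?: boolean;
}

// Hypothetical helper: assumes the weather lookup yields null for an unknown city.
function toToolResult(city: string, weather: WeatherData | null): ToolResult {
  if (weather === null) {
    // Mirrors show-weather-unknown-city.json: no structured content, a text message, isError set.
    return {
      structuredContent: null,
      content: [{ type: 'text', text: `No weather data found for ${city}` }],
      isError: true,
    };
  }
  return {
    content: [
      { type: 'text', text: `Weather for ${weather.city}: ${weather.temperature}°${weather.unit}` },
    ],
    structuredContent: weather,
  };
}

const unknownCity = toToolResult('Mars', null);
console.log(unknownCity.isError, unknownCity.content[0].text); // true No weather data found for Mars
```

With this in place, the handler body could reduce to `return toToolResult(input.city, await fetchWeather(input.city))`, and the error branch can be tested without mocking anything.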

Loading and Cancelled States

The same pattern applies to every state your component can be in. Write the test, then build the UI:

it('shows loading state', () => {
  mockToolData = {
    output: null,
    isLoading: true,
    isError: false,
    isCancelled: false,
  };

  render(<Weather />);
  expect(screen.getByText('Loading weather...')).toBeDefined();
});

it('shows cancelled state', () => {
  mockToolData = {
    output: null,
    isLoading: false,
    isError: false,
    isCancelled: true,
  };

  render(<Weather />);
  expect(screen.getByText(/stopped/i)).toBeDefined();
});

Each test forces you to decide what the component shows for that state before you write it. For a deeper look at handling these states, see MCP App error handling.
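Note that the Step 3 component has no `isCancelled` branch, so with `output: null` it falls through to the error message and the cancelled test stays red until you add one. One way to make the order of those checks explicit is to decide the state in a pure function and keep the JSX as a simple switch. `resolveView` below is a hypothetical helper, and the `ToolData` shape is again copied from the test mocks.

```typescript
// Hypothetical shape: copied from the mock objects used in the tests above.
interface ToolData<T> {
  output: T | null;
  isLoading: boolean;
  isError: boolean;
  isCancelled: boolean;
}

type View<T> =
  | { kind: 'loading' }
  | { kind: 'cancelled' }
  | { kind: 'error' }
  | { kind: 'ready'; data: T };

// Order matters: loading and cancelled are checked before the generic error
// fallback, because both arrive with a null output.
function resolveView<T>(state: ToolData<T>): View<T> {
  if (state.isLoading) return { kind: 'loading' };
  if (state.isCancelled) return { kind: 'cancelled' };
  if (state.isError || state.output === null) return { kind: 'error' };
  return { kind: 'ready', data: state.output };
}

const cancelled = resolveView<{ city: string }>({
  output: null,
  isLoading: false,
  isError: false,
  isCancelled: true,
});
const ready = resolveView({
  output: { city: 'Seattle' },
  isLoading: false,
  isError: false,
  isCancelled: false,
});

console.log(cancelled.kind); // cancelled
console.log(ready.kind); // ready
```

The component would then render one element per `kind`, and each branch maps to exactly one of the unit tests.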

TDD for Server Tool Interactions

If your resource component calls back to the server using callServerTool (for example, a confirmation button that triggers a purchase), you can TDD the interaction with simulation file mocking.

Write the simulation with serverTools first:

{
  "tool": "review-order",
  "userMessage": "Buy the headphones",
  "toolInput": { "itemId": "hp-100" },
  "toolResult": {
    "structuredContent": {
      "item": "Wireless Headphones",
      "price": 79,
      "status": "pending"
    }
  },
  "serverTools": {
    "confirm-order": [
      {
        "when": { "confirmed": true },
        "result": {
          "structuredContent": { "status": "confirmed", "orderId": "ord_123" }
        }
      },
      {
        "when": { "confirmed": false },
        "result": {
          "structuredContent": { "status": "cancelled" }
        }
      }
    ]
  }
}

Then write an e2e test that uses this simulation to test the interaction:

import { test, expect } from 'sunpeak/test';

test('confirming order shows confirmation', async ({ inspector }) => {
  const result = await inspector.renderTool('review-order');
  const app = result.app();

  await app.getByRole('button', { name: /confirm/i }).click();
  await expect(app.getByText(/confirmed/i)).toBeVisible();
  await expect(app.getByText('ord_123')).toBeVisible();
});

test('declining order shows cancellation', async ({ inspector }) => {
  const result = await inspector.renderTool('review-order');
  const app = result.app();

  await app.getByRole('button', { name: /cancel/i }).click();
  await expect(app.getByText(/cancelled/i)).toBeVisible();
});

Both tests fail until you build the order review component with the confirm and cancel buttons. The simulation file defines how the server responds to each action, so you can test the full interaction without a real backend.
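To reason about which mocked response a click will receive, it helps to picture how the `when` clauses could be matched. The post does not describe sunpeak's matching rules, so the sketch below is only one plausible reading: `pickMock` is a hypothetical function that returns the first entry whose `when` keys all strictly equal the call arguments.

```typescript
// Hypothetical matcher: assumes `when` means "every listed key must equal the call argument".
interface ServerToolMock {
  when: Record<string, unknown>;
  result: { structuredContent: Record<string, unknown> };
}

// Return the first mock whose `when` clause is satisfied, or undefined if none match.
function pickMock(
  mocks: ServerToolMock[],
  args: Record<string, unknown>,
): ServerToolMock | undefined {
  return mocks.find((mock) =>
    Object.entries(mock.when).every(([key, expected]) => args[key] === expected),
  );
}

// The confirm-order entries from the simulation above.
const confirmOrderMocks: ServerToolMock[] = [
  {
    when: { confirmed: true },
    result: { structuredContent: { status: 'confirmed', orderId: 'ord_123' } },
  },
  {
    when: { confirmed: false },
    result: { structuredContent: { status: 'cancelled' } },
  },
];

const onConfirm = pickMock(confirmOrderMocks, { confirmed: true });
const onDecline = pickMock(confirmOrderMocks, { confirmed: false });

console.log(onConfirm?.result.structuredContent.status); // confirmed
console.log(onDecline?.result.structuredContent.status); // cancelled
```

Under this reading, a call whose arguments match no `when` clause gets no mocked result, which is worth covering with its own simulation entry if your component can send such a call.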

TDD Across Hosts

E2e tests written with the inspector fixture run against both the simulated ChatGPT and Claude runtimes by default. The defineConfig() from sunpeak/test/config creates separate Playwright projects for each host, so every test runs twice: once on the ChatGPT runtime and once on the Claude runtime.

This matters for TDD because you catch host-specific differences during the red phase, not after you’ve shipped. If your component’s layout breaks in Claude’s iframe because of different CSS variables or viewport constraints, the failing test tells you before you’ve committed to a design that only works on one host.

For visual differences across hosts, combine TDD with visual regression testing to catch layout shifts that text assertions miss.

When to Skip TDD

TDD works best when you know the data shape upfront. For MCP Apps, that’s most of the time because the tool schema defines the contract. But there are cases where building first makes more sense:

  • Early prototyping. If you’re exploring what the UI should look like and the data shape is still fluid, hardcode some data and iterate on the design. Once you settle on a shape, extract it into a simulation file and backfill tests.
  • Purely visual work. Adjusting colors, spacing, and layout is faster with the inspector than with test assertions. Write visual regression tests after the design is stable.
  • Third-party API exploration. If you don’t yet know what shape an external API returns, call it first, then write simulations based on real responses.

The key is that once you know the data shape, switching to TDD for the remaining work (error states, edge cases, new features) gives you coverage you’d otherwise skip.

Unit Tests vs E2E Tests in the TDD Loop

Both test types fit into TDD, but they serve different purposes in the loop.

Unit tests are faster. They run in milliseconds with Vitest and happy-dom, so you get immediate feedback when iterating on component logic. Use them for the tight red-green-refactor cycle: mock useToolData, render the component, assert against the DOM. They’re the right choice when you’re testing data rendering, state transitions, and handler logic.

E2e tests are slower but catch more. They render your component in a real browser inside a simulated host runtime, which means they test iframe rendering, CSS variable resolution, and display mode behavior. Use them for interaction tests (button clicks, form submissions via callServerTool) and for verifying your component works across hosts.

A practical TDD workflow uses both: unit tests for the fast inner loop, e2e tests as a verification step before committing.

Putting It Together

Here’s the full sequence for TDD-ing a new feature in an MCP App:

  1. Write a simulation file with the expected structuredContent
  2. Write a unit test that mocks useToolData with that data and asserts the expected UI
  3. Run the test and watch it fail
  4. Build the resource component to make it pass
  5. Write edge case simulations (empty data, errors, large datasets)
  6. Write unit tests for each edge case
  7. Build the edge case handling
  8. Write a tool handler unit test that asserts the structuredContent shape
  9. Build the handler to make it pass
  10. Write an e2e test with the inspector fixture to verify rendering in a simulated host runtime
  11. Run npx sunpeak test to check everything across hosts
  12. Refactor

Every feature you build this way ships with simulation files, unit tests, and e2e tests already written. There’s no “add tests later” backlog because the tests came first.

sunpeak’s simulation files, inspector, and test fixtures make this loop fast. You write a simulation, write a test, run it, see it fail, build the code, and see it pass. All on localhost in seconds, with no accounts and no credits burned.

Get started with npx sunpeak new and try writing your first simulation file before your first component.

Frequently Asked Questions

What is test-driven development for MCP Apps?

TDD for MCP Apps means writing simulation files and tests before building your resource components and tool handlers. You define the expected data shape in a simulation file, write a test that asserts against that shape, then build the component and handler to make the test pass. The MCP architecture naturally supports this because tool output (data) and resource rendering (UI) are separate concerns connected by a defined contract.

How do simulation files support TDD in MCP App development?

Simulation files are JSON files that define a complete tool invocation with controlled inputs and outputs. In a TDD workflow, you write them first to define the data contract between your tool handler and resource component. They specify the tool name, input arguments, and the structuredContent your component will receive. Once written, you can render your component against this data in the inspector and write automated tests against it, all before building the actual tool handler.

Can I practice TDD for MCP Apps without a ChatGPT or Claude account?

Yes. The entire TDD cycle runs locally. Simulation files provide mock data, unit tests run with Vitest and happy-dom, and e2e tests run against a local inspector that replicates ChatGPT and Claude host runtimes. You do not need paid accounts, API keys, or AI credits at any point in the TDD loop.

What is the TDD cycle for an MCP App resource component?

First, write a simulation file with the expected structuredContent. Second, write a test that renders the component and asserts it displays the right output for that data. The test fails because the component does not exist yet. Third, build the component to make the test pass. Fourth, refactor while keeping tests green. Repeat for each new feature, edge case, or error state.

How do I TDD a tool handler for an MCP App?

Write a unit test that imports your tool handler function and calls it with test arguments, then asserts on the returned structuredContent shape. The test fails because the handler does not exist yet. Build the handler to return the expected shape and make the test pass. Mock external API calls with vi.mock() so tests stay fast and deterministic.

How does TDD help with MCP App edge cases and error states?

Write separate simulation files for each edge case: empty data, null fields, error responses, cancelled states, and large datasets. Write tests for each one before building the corresponding UI. This forces you to design error and empty states upfront rather than discovering them after launch. Each simulation file becomes both a test fixture and a visual preview in the inspector.

Does TDD work for cross-host MCP App testing?

Yes. E2e tests written with the inspector fixture from sunpeak/test run against both ChatGPT and Claude host runtimes by default via Playwright projects. When you write a failing test first, it fails on both hosts. When you make it pass, it passes on both hosts. This catches host-specific rendering differences early in the development cycle.

When should I skip TDD for MCP App development?

Skip TDD when you are prototyping a new UI and do not yet know what data shape you need. In that case, build the component first with hardcoded data, then extract the data shape into a simulation file and write tests after. TDD works best when the data contract is clear upfront, which is most of the time for MCP Apps because tool schemas define the contract explicitly.