Testing Authentication in MCP Apps, ChatGPT Apps, and Claude Connectors (May 2026)

May 20, 2026 Abe Wheeler

Testing OAuth and authentication in MCP Apps, ChatGPT Apps, and Claude Connectors.

TL;DR: Test authentication in layers. Unit test your auth() function and token validation with locally minted JWTs (valid, expired, wrong issuer, wrong audience, missing scopes). Test tool handlers by passing a mock extra.authInfo so you can check scoped queries, scope enforcement, and user isolation. Test authenticated and unauthenticated UI states with simulation files and the inspector fixture. Test the discovery document at /.well-known/oauth-protected-resource. Save the full OAuth redirect flow for occasional live tests with ngrok. Almost all of this runs locally and in CI with no host account and no real OAuth provider.

Authentication is the part of an MCP App that developers test the least and worry about the most. The flow has a lot of moving parts: a discovery document, dynamic client registration, an OAuth consent screen, token exchange, token validation, and a refresh cycle. Most of that machinery runs inside the AI host, not your code, which makes it tempting to skip testing and hope the connection works in production.

The good news: you don’t own most of that machinery, so you don’t have to test most of it. The host (ChatGPT, Claude, VS Code) runs the OAuth client. Your identity provider issues the tokens. Your job is narrow: validate the token you receive, attach the right identity to each request, and return only the data that identity is allowed to see. That narrow job is exactly what you can and should test, and almost all of it runs locally without a single network call to an OAuth provider.

This post covers how to test authenticated MCP Apps, ChatGPT Apps, and Claude Connectors at every layer: unit tests for token validation, handler tests with mocked identity, UI tests for authenticated and unauthenticated states, the discovery document, token refresh, and the full OAuth flow when you actually need to run it. If you haven’t added auth yet, read MCP App Authentication first, then come back here to test it.

What You Actually Need to Test

Before writing a single test, draw the line between your code and the host’s code. You only test your side.

The host handles the OAuth flow: it fetches your discovery document, registers as a client, runs the authorization code + PKCE flow, exchanges the code for tokens, refreshes expired tokens, and attaches the access token to every request as Authorization: Bearer <token>. You do not test ChatGPT’s OAuth client. You do not test your identity provider’s token endpoint. Those are someone else’s contract.

Your side is four things, and each maps to a test layer:

Token validation. Your auth() function takes a request, validates the Bearer token, and returns an identity or null. Unit test it.
Authenticated tool behavior. Your handlers read the identity from extra.authInfo and scope their work to that user. Unit test handlers with a mocked authInfo.
Authenticated UI. Your resource component renders differently depending on whether data came back, whether the user is connected, and whether a request returned a 401. Test with simulation files and the inspector fixture.
The discovery document. Your server serves /.well-known/oauth-protected-resource so the host can find your authorization server. Test the JSON shape.

The full redirect flow sits on top of all four, and you verify it with a live test once in a while, not on every commit. Get layers one through four solid and the live test rarely surprises you.

Layer 1: Unit Test Token Validation

In sunpeak, token validation lives in the auth() function you export from src/server.ts. It runs on every MCP request before any tool handler, and it returns an AuthInfo (request allowed) or null (request rejected with a 401). This is the highest-value thing to test because a bug here either locks out every user or lets in requests it should reject.

The challenge: your auth() validates real JWTs signed by your identity provider, and you don’t want your test suite calling Auth0 or Okta. The fix is to mint your own test tokens with a local signing key and point validation at that key during tests.

Here’s an auth() function and the unit tests that cover it:

// src/server.ts
import type { IncomingMessage } from 'node:http';
import type { AuthInfo } from 'sunpeak/mcp';
import { jwtVerify, createRemoteJWKSet } from 'jose';

const JWKS = createRemoteJWKSet(
  new URL(`${process.env.AUTH_ISSUER}/.well-known/jwks.json`)
);

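export async function auth(req: IncomingMessage): Promise<AuthInfo | null> {
  // Anything other than "Authorization: Bearer <token>" is unauthenticated.
  const header = req.headers.authorization;
  if (!header?.startsWith('Bearer ')) return null;

  const token = header.slice('Bearer '.length);
  try {
    // jwtVerify checks the signature and exp, plus the issuer and audience given here.
    const { payload } = await jwtVerify(token, JWKS, {
      issuer: process.env.AUTH_ISSUER,
      audience: process.env.AUTH_AUDIENCE,
    });
    return {
      token,
      clientId: payload.sub as string,
      // The scope claim is a space-delimited string; a missing claim means no scopes.
      scopes: (payload.scope as string)?.split(' ') ?? [],
    };
  } catch {
    // Any verification failure (bad signature, expired, wrong iss or aud) becomes a 401.
    return null;
  }
}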

For tests, generate a key pair once and mint tokens with jose. Then stub the JWKS so verification uses your test key instead of fetching a remote one:

// tests/auth.test.ts
import { describe, it, expect, beforeAll, vi } from 'vitest';
import { generateKeyPair, SignJWT } from 'jose';
import type { IncomingMessage } from 'node:http';

const ISSUER = 'https://test-issuer.example.com';
const AUDIENCE = 'https://your-mcp-app.example.com';

// vi.mock is hoisted above everything else in the file, so its factory cannot
// close over keys created in beforeAll. vi.hoisted provides a holder that
// beforeAll fills in before any test runs.
const testKeys = vi.hoisted(() => ({ publicKey: undefined as unknown }));

// Replace the remote JWKS lookup with one that resolves to the local public key.
vi.mock('jose', async (importOriginal) => {
  const actual = await importOriginal<typeof import('jose')>();
  return {
    ...actual,
    createRemoteJWKSet: () => async () => testKeys.publicKey,
  };
});

// Claims passed in override the default iss and aud, so a test can mint a token
// with a wrong claim that is still signed by the trusted key.
let signToken: (claims: Record<string, unknown>, expSec?: number) => Promise<string>;

beforeAll(async () => {
  process.env.AUTH_ISSUER = ISSUER;
  process.env.AUTH_AUDIENCE = AUDIENCE;

  const { publicKey, privateKey } = await generateKeyPair('RS256');
  testKeys.publicKey = publicKey;

  signToken = (claims, expSec = 3600) =>
    new SignJWT({ iss: ISSUER, aud: AUDIENCE, ...claims })
      .setProtectedHeader({ alg: 'RS256' })
      // Absolute exp in seconds; a negative expSec yields an already-expired token.
      .setExpirationTime(Math.floor(Date.now() / 1000) + expSec)
      .sign(privateKey);
});

function reqWith(token?: string): IncomingMessage {
  return {
    headers: token ? { authorization: `Bearer ${token}` } : {},
  } as IncomingMessage;
}

describe('auth()', () => {
  it('rejects requests with no Authorization header', async () => {
    const { auth } = await import('../src/server');
    expect(await auth(reqWith())).toBeNull();
  });

  it('rejects a non-Bearer Authorization header', async () => {
    const { auth } = await import('../src/server');
    const req = { headers: { authorization: 'Basic abc123' } } as IncomingMessage;
    expect(await auth(req)).toBeNull();
  });

  it('returns AuthInfo for a valid token', async () => {
    const { auth } = await import('../src/server');
    const token = await signToken({ sub: 'user-123', scope: 'read write' });
    const info = await auth(reqWith(token));
    expect(info?.clientId).toBe('user-123');
    expect(info?.scopes).toEqual(['read', 'write']);
  });

  it('rejects an expired token', async () => {
    const { auth } = await import('../src/server');
    const token = await signToken({ sub: 'user-123' }, -10); // expired 10s ago
    expect(await auth(reqWith(token))).toBeNull();
  });

  it('rejects a token from the wrong issuer', async () => {
    const { auth } = await import('../src/server');
    // Signed by the trusted key, so only the issuer check can reject it.
    const token = await signToken({ sub: 'user-123', iss: 'https://evil-issuer.example.com' });
    expect(await auth(reqWith(token))).toBeNull();
  });

  it('rejects a token minted for a different audience', async () => {
    const { auth } = await import('../src/server');
    const token = await signToken({ sub: 'user-123', aud: 'https://other-app.example.com' });
    expect(await auth(reqWith(token))).toBeNull();
  });
});

The cases that matter are the ones that are easy to get wrong:

No header and malformed header. A request with no token and a request with a non-Bearer header should both return null.
Expired token. Set exp in the past and confirm rejection. This is the case that breaks token refresh if you handle it wrong.
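Wrong issuer. A correctly signed JWT with a different iss claim should fail. Sign the test token with the key your server trusts so that the issuer check, not the signature check, is what rejects it. Without an issuer check, a token from any issuer whose signing keys your server accepts could pass.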
Wrong audience. A token minted for a different application at the same provider should fail. Audience validation is the check developers skip most often, and skipping it is a real vulnerability.
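Missing or insufficient scopes. If a tool needs write, a read-only token must not be able to use it. Check that auth() reports exactly the scopes the token carries; the refusal itself is a handler test, covered in Layer 2.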

These tests run in milliseconds, need no network, and catch the auth bugs that are hardest to notice in manual testing because a token that works today silently stops working when it expires or when you tighten a claim.
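The header and scope parsing inside auth() can also be pulled into pure functions and checked against a table of inputs, with no JWT involved. The following is a minimal sketch; extractBearerToken and parseScopes are hypothetical helper names, not part of sunpeak or jose.

```typescript
// Hypothetical helpers for illustration; not part of sunpeak or jose.

/** Returns the token from an "Authorization: Bearer <token>" header, or null. */
function extractBearerToken(header: string | undefined): string | null {
  if (!header || !header.startsWith('Bearer ')) return null;
  const token = header.slice('Bearer '.length).trim();
  return token.length > 0 ? token : null;
}

/** Splits a space-delimited scope claim, dropping empty entries. */
function parseScopes(claim: unknown): string[] {
  if (typeof claim !== 'string') return [];
  return claim.split(' ').filter((scope) => scope.length > 0);
}
```

The filter step matters because splitting an empty string on a space yields an array containing one empty string, so a token with an empty scope claim would otherwise appear to carry a single blank scope.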

Layer 2: Test Authenticated Tool Handlers

Once auth() returns an AuthInfo, sunpeak makes it available to every tool handler at extra.authInfo. Your handler reads extra.authInfo.clientId (or scopes) and uses it to scope its work. You test this without any OAuth at all, because you build the authInfo object yourself and pass it straight in.

A tool handler is a function that takes (args, extra). Mock extra:

// tests/tools/list-invoices.test.ts
import { describe, it, expect, vi } from 'vitest';
import handler from '../../src/tools/list-invoices';
import type { ToolHandlerExtra } from 'sunpeak/mcp';

vi.mock('../../src/lib/db', () => ({
  getInvoicesForUser: vi.fn(async (userId: string) =>
    userId === 'user-a'
      ? [{ id: 'INV-1', amount: 100 }]
      : [{ id: 'INV-9', amount: 999 }]
  ),
}));

function extraFor(clientId: string, scopes: string[] = ['read']): ToolHandlerExtra {
  return { authInfo: { token: 'test', clientId, scopes } } as ToolHandlerExtra;
}

describe('list-invoices handler', () => {
  it('returns invoices scoped to the authenticated user', async () => {
    const result = await handler({}, extraFor('user-a'));
    expect(result.structuredContent.invoices).toEqual([{ id: 'INV-1', amount: 100 }]);
  });

  it('rejects an unauthenticated call', async () => {
    const result = await handler({}, { authInfo: undefined } as ToolHandlerExtra);
    expect(result.content?.[0].text).toMatch(/not authenticated/i);
  });
});
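For reference, a handler that would satisfy the tests above might look roughly like the sketch below. The type shapes and the makeListInvoices factory are assumptions made for illustration: the real types come from sunpeak/mcp, and the tests above mock the database module rather than injecting the lookup.

```typescript
// Assumed shapes for illustration; the real types come from sunpeak/mcp.
type AuthInfo = { token: string; clientId: string; scopes: string[] };
type Extra = { authInfo?: AuthInfo };
type Invoice = { id: string; amount: number };
type ToolResult = {
  content?: { type: 'text'; text: string }[];
  structuredContent?: { invoices: Invoice[] };
};

// Stand-in for the getInvoicesForUser function mocked in the test file.
type InvoiceLookup = (userId: string) => Promise<Invoice[]>;

// Hypothetical factory: takes the lookup as a parameter so the sketch is self-contained.
function makeListInvoices(getInvoicesForUser: InvoiceLookup) {
  return async function listInvoices(_args: unknown, extra: Extra): Promise<ToolResult> {
    const userId = extra.authInfo?.clientId;
    if (!userId) {
      // No identity: answer with a message instead of dereferencing undefined.
      return {
        content: [{ type: 'text', text: 'Not authenticated. Connect your account and try again.' }],
      };
    }
    // The query is always keyed by the authenticated user, never by tool arguments.
    const invoices = await getInvoicesForUser(userId);
    return { structuredContent: { invoices } };
  };
}
```

The point of the sketch is that the user ID comes only from extra.authInfo. A handler that accepted a user ID in its arguments would let one user request another user’s invoices.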

Test user isolation

The most dangerous auth bug is a handler that ignores the authenticated identity and returns everyone’s data. It passes a naive test because the test data looks right. Catch it by calling the same handler with two different users and asserting each sees only its own data:

it('does not leak one user\'s data to another', async () => {
  const a = await handler({}, extraFor('user-a'));
  const b = await handler({}, extraFor('user-b'));

  expect(a.structuredContent.invoices).toEqual([{ id: 'INV-1', amount: 100 }]);
  expect(b.structuredContent.invoices).toEqual([{ id: 'INV-9', amount: 999 }]);
  expect(a.structuredContent.invoices).not.toEqual(b.structuredContent.invoices);
});

If a refactor ever drops the clientId from the database query, this test fails immediately. That’s a leak you want a CI failure to catch, not a support ticket.

Test scope enforcement

If a tool performs a write or returns sensitive data, it should check scopes and refuse a token that lacks them:

it('refuses to delete with a read-only token', async () => {
  const result = await deleteHandler({ id: 'INV-1' }, extraFor('user-a', ['read']));
  expect(result.content?.[0].text).toMatch(/insufficient scope|forbidden/i);
});
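One way to keep scope checks consistent across tools is a small guard that each handler calls first. This is a sketch under assumptions: requireScopes is a hypothetical helper, and the result shape (a text content item plus an isError flag) is an assumed stand-in for whatever your handlers return.

```typescript
// Assumed shapes for illustration; the real types come from sunpeak/mcp.
type AuthInfo = { token: string; clientId: string; scopes: string[] };
type ErrorResult = { isError: true; content: { type: 'text'; text: string }[] };

function errorResult(text: string): ErrorResult {
  return { isError: true, content: [{ type: 'text', text }] };
}

/**
 * Hypothetical guard: returns an error result when the caller is unauthenticated
 * or lacks any required scope, and null when the call may proceed.
 */
function requireScopes(authInfo: AuthInfo | undefined, required: string[]): ErrorResult | null {
  if (!authInfo) return errorResult('Not authenticated.');
  const missing = required.filter((scope) => !authInfo.scopes.includes(scope));
  if (missing.length > 0) {
    return errorResult(`Insufficient scope: missing ${missing.join(', ')}.`);
  }
  return null;
}
```

A delete handler would start with something like const denied = requireScopes(extra.authInfo, ['write']); if (denied) return denied; and the test above would then match on the “insufficient scope” message.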

For more on token handling and trust boundaries, see Security Testing for MCP Apps. The unit-testing patterns here build on the basics in Unit Testing MCP Apps.

Layer 3: Test Authenticated and Unauthenticated UI States

Your resource component runs in a sandboxed iframe and renders whatever your tool returns. With auth in play, it has more states than an unauthenticated app: connected with data, connected with no data, and a request that came back unauthorized. Each state needs a UI, and each UI needs a test.

The component never sees the OAuth token directly (the host sends it to your server, not your iframe). So from the component’s point of view, “authenticated” just means the data shape it receives. That makes these states easy to drive with simulation files, the JSON fixtures the sunpeak Inspector auto-discovers. Create one simulation per state:

// src/resources/invoice-list/simulations/authenticated.json
{
  "title": "Connected with invoices",
  "output": {
    "invoices": [
      { "id": "INV-1", "amount": 100, "status": "paid" },
      { "id": "INV-2", "amount": 250, "status": "due" }
    ]
  }
}

// src/resources/invoice-list/simulations/needs-auth.json
{
  "title": "Not connected",
  "output": {
    "error": "unauthenticated",
    "invoices": []
  }
}

Run pnpm dev and switch between these in the Inspector sidebar to confirm each renders the right thing: a real list or a “connect your account” prompt. A third simulation with an empty invoices array and no error covers the connected-but-empty state. Then lock the behavior in with the inspector fixture so it runs on every commit:

// tests/e2e/invoice-list.spec.ts
import { test, expect } from 'sunpeak/test';

test('shows the invoice list when authenticated', async ({ inspector }) => {
  const result = await inspector.renderTool('list-invoices', undefined, {
    simulation: 'authenticated',
  });
  const app = result.app();
  await expect(app.locator('[data-testid="invoice-row"]')).toHaveCount(2);
});

test('shows a connect prompt when unauthenticated', async ({ inspector }) => {
  const result = await inspector.renderTool('list-invoices', undefined, {
    simulation: 'needs-auth',
  });
  const app = result.app();
  await expect(app.locator('text=Connect your account')).toBeVisible();
});

This matters because the unauthenticated state is the one users hit first, before they’ve connected, and it’s the one developers forget to design. If your component throws on missing data instead of showing a prompt, the user sees a broken iframe on their very first interaction. Test the empty and error states as carefully as the happy path. The full set of loading, error, and 401 states is covered in MCP App Error Handling.
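Inside the component, it can help to reduce the tool output to a single view state before rendering, so that every branch has a name and none of them can throw on missing data. A minimal sketch, assuming the output shape used in the simulation files above; viewStateFor and the state names are hypothetical.

```typescript
// Output shape assumed from the simulation files above.
type Invoice = { id: string; amount: number; status: string };
type InvoiceListOutput = { invoices?: Invoice[]; error?: string };
type ViewState = 'connect-prompt' | 'empty' | 'list';

/** Hypothetical helper: maps whatever the tool returned to one of three UI states. */
function viewStateFor(output: InvoiceListOutput | null | undefined): ViewState {
  // Missing output is treated like "not connected" so the user sees a prompt, not a crash.
  if (!output || output.error === 'unauthenticated') return 'connect-prompt';
  if (!output.invoices || output.invoices.length === 0) return 'empty';
  return 'list';
}
```

Because the function is pure, the state logic can be unit tested on its own, which leaves the inspector tests to confirm only that each state renders the right elements.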

Layer 4: Test the Discovery Document

Hosts find your authorization server by fetching /.well-known/oauth-protected-resource from your MCP server. If that document is missing a field or malformed, the host can’t complete discovery, and the failure usually shows up as a vague “couldn’t connect” with no clue why. A tiny test removes the guesswork.

If your server builds the metadata from a function or constant, unit test the shape:

// tests/discovery.test.ts
import { describe, it, expect } from 'vitest';
import { protectedResourceMetadata } from '../src/well-known';

describe('protected resource metadata', () => {
  it('has the fields hosts require', () => {
    const doc = protectedResourceMetadata();
    expect(doc.resource).toMatch(/^https:\/\//);
    expect(doc.authorization_servers?.length).toBeGreaterThan(0);
    expect(doc.scopes_supported).toContain('read');
    expect(doc.bearer_methods_supported).toContain('header');
  });
});

If the document is served dynamically, add an integration test that fetches it from the running dev server and validates the same fields. Either way, you’re confirming that resource, authorization_servers, scopes_supported, and bearer_methods_supported are present and well-formed before a host ever tries to read them.
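The test above imports protectedResourceMetadata without showing it. One possible shape is sketched below, along with a validator that can be reused by both the unit test and an integration test. The field names are the ones the test checks; the default URLs, the scope list, and the metadataProblems helper are placeholders.

```typescript
// The four fields checked by the test above; all values here are placeholders.
type ProtectedResourceMetadata = {
  resource: string;
  authorization_servers: string[];
  scopes_supported: string[];
  bearer_methods_supported: string[];
};

function protectedResourceMetadata(
  resource: string = 'https://your-mcp-app.example.com',
  issuer: string = 'https://test-issuer.example.com',
): ProtectedResourceMetadata {
  return {
    resource,
    authorization_servers: [issuer],
    scopes_supported: ['read', 'write'],
    bearer_methods_supported: ['header'],
  };
}

/** Hypothetical validator: lists what is wrong with a document; an empty array means it passed. */
function metadataProblems(doc: Partial<ProtectedResourceMetadata>): string[] {
  const problems: string[] = [];
  if (!doc.resource || !doc.resource.startsWith('https://')) {
    problems.push('resource must be an https URL');
  }
  if (!doc.authorization_servers || doc.authorization_servers.length === 0) {
    problems.push('authorization_servers must list at least one issuer');
  }
  if (!doc.scopes_supported || doc.scopes_supported.length === 0) {
    problems.push('scopes_supported must not be empty');
  }
  if (!doc.bearer_methods_supported || !doc.bearer_methods_supported.includes('header')) {
    problems.push('bearer_methods_supported must include "header"');
  }
  return problems;
}
```

Returning a list of problems rather than a boolean means a failing CI run can report which field is wrong.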

Testing Token Expiry and Refresh

Access tokens expire, and the host refreshes them automatically using the refresh token. You don’t implement refresh, so don’t test the refresh exchange. What you do own is the rejection: when an expired token arrives, your server must return a clean 401, not a 500 and not a confusing payload. The host reads the 401 as “refresh and retry,” so a clean rejection is what keeps the user’s session alive.

You already have the expired-token case in your auth() unit tests from Layer 1. Add one assertion that the rejection surfaces as a 401 and not an exception. Handlers need the same care: if a handler ever finds extra.authInfo undefined, it should respond with a clear unauthenticated message rather than dereferencing a missing object and throwing. Test that path explicitly, because a crash here turns a routine token refresh into a broken tool call.

Testing the Full OAuth Flow

Everything above runs in code with no real provider. The one thing you can’t fully fake is the redirect chain, because the host runs the OAuth client and you can’t script ChatGPT’s or Claude’s browser. To verify discovery, dynamic client registration, consent, token exchange, and validation as one connected chain, you need a real host pointed at a reachable server.

Expose your local server with a tunnel:

ngrok http 8000

Add the ngrok URL as a custom connector in ChatGPT (Settings > Apps & Connectors > Create) or Claude, then walk through the consent screen yourself. Your local server receives real Bearer tokens, so you exercise the exact validation path production will use. For Claude specifically, allowlist both https://claude.ai/api/mcp/auth_callback and https://claude.com/api/mcp/auth_callback in your provider, since missing the claude.com URL is a common reason the flow breaks. The Claude Connector OAuth guide has the Claude-specific details.

You can automate a live check against the real runtime with pnpm test:live, which uses Playwright to drive a real conversation:

// tests/live/auth-live.spec.ts
import { test, expect } from 'sunpeak/test/live';

test('authenticated user sees their invoices', async ({ live }) => {
  const app = await live.invoke('Show my invoices');
  await expect(app.locator('[data-testid="invoice-row"]')).toBeVisible({
    timeout: 15_000,
  });
});

Live tests need a real account and a connected session, and they burn AI credits, so run them on release branches or as a manual trigger, not on every push. They’re a safety net for the one layer you can’t reproduce locally, not part of the daily loop.

Running Auth Tests in CI

The point of testing auth in layers is that almost everything runs in CI for free. Your auth() unit tests mint their own JWTs with a key generated in beforeAll, your handler tests build authInfo by hand, and your inspector e2e tests drive UI states from simulation files. None of that needs a real OAuth provider, a host account, or AI credits.

# .github/workflows/test.yml
name: Test
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install
      - run: pnpm exec playwright install --with-deps chromium
      - run: pnpm test

Keep the live OAuth test out of this workflow. Put it on a separate manual or scheduled job so a flaky network or an expired test account never blocks a normal merge. See MCP App CI/CD for the full pipeline.

Common Auth Testing Mistakes

Only testing the happy path. A valid token returning data proves almost nothing. The expired token, the wrong audience, and the unauthenticated request are where the bugs hide.

Skipping audience validation in tests. If your test only mints tokens with the right audience, you never confirm that a wrong-audience token is rejected. Add the negative case so the check can’t silently disappear in a refactor.

No user-isolation test. A handler that returns the same data for every clientId passes a single-user test and leaks data in production. Always test with two users.

Testing the host’s job. Don’t write tests for ChatGPT’s OAuth client or your provider’s token endpoint. They’re not your code, and you can’t change them. Test your auth(), your handlers, your UI, and your discovery document.

Calling a real provider in unit tests. Hitting Auth0 or Okta from your test suite makes tests slow and flaky and ties CI to an external service. Mint test tokens locally and stub the JWKS.

Forgetting the unauthenticated UI. The first thing a new user sees is the not-connected state. If your component crashes on missing data, that’s the first impression. Test it like the happy path.

Authenticated MCP Apps fail in quiet ways: a token that expires next week, a query that forgets the user ID, an iframe that throws before the user has connected. Layered tests turn those quiet failures into loud CI failures you catch on the way in. The sunpeak Inspector and the inspector fixture let you run the whole authenticated app locally, so you can build and test auth without a host account or a live OAuth provider until the one moment you actually need one.

Get Started


npx sunpeak new

Frequently Asked Questions

How do I test authentication in an MCP App without a real OAuth provider?

Test in layers. Unit test your auth() function and token validation logic by minting test JWTs with a local signing key and asserting that valid, expired, wrong-issuer, wrong-audience, and missing-scope tokens are handled correctly. Test tool handlers by passing a mock extra.authInfo object directly. Test UI states with simulation files that include authenticated and unauthenticated data. You only need a real OAuth provider for the full end-to-end redirect flow, which you run sparingly with ngrok or live tests.

How do I mock authInfo in MCP App tool handler tests?

Tool handlers receive an extra object as their second argument, and the authenticated identity lives at extra.authInfo. In a unit test, build that object yourself: pass { authInfo: { token: "test", clientId: "user-123", scopes: ["read"] } } as the second argument. This lets you test scoped data queries, scope enforcement, and unauthenticated rejection without running the OAuth flow.

How do I test that one user cannot see another user's data?

Write a unit test that calls your tool handler twice with two different authInfo.clientId values backed by mock data for each user, then assert each call only returns the data belonging to that user. User isolation bugs come from a handler that ignores the authenticated identity or shares a query path, and a two-user test catches them before they leak data in production.

Can I test the OAuth redirect flow locally?

Yes, but you cannot reproduce it purely in code because the AI host (ChatGPT or Claude) runs the OAuth client. Expose your local server with ngrok, register the ngrok URL as a custom connector in ChatGPT or Claude, and walk through the real consent screen. This is the only way to verify discovery, dynamic client registration, consent, token exchange, and validation as a chain. Reserve it for pre-release checks, not everyday development.

How do I test the protected resource metadata document?

Write a test that fetches /.well-known/oauth-protected-resource from your running server (or asserts on the object your server returns) and checks the required fields: resource, authorization_servers, scopes_supported, and bearer_methods_supported. A missing or malformed discovery document makes hosts fail to discover your authorization server, often with a generic error, so a single test that validates the JSON shape saves hours of debugging.

How do I test token expiry and refresh handling?

The host refreshes access tokens automatically using the refresh token, so your job is to reject expired tokens cleanly rather than crash. Unit test that your auth() function returns null for a token whose exp claim is in the past, and confirm an expired token produces a 401 rather than a 500. The host treats the 401 as a signal to refresh and retry, so a clean rejection is what keeps the flow working.

Do I need a Claude or ChatGPT account to test authenticated tools?

Not for most of it. Auth logic, tool handler behavior, scope enforcement, user isolation, and authenticated UI rendering all test locally with no host account by mocking authInfo and using simulation files. The sunpeak Inspector runs your full app locally so you can develop authenticated tools and resources without a subscription. You only need a real account for live tests that exercise the actual OAuth handshake.

How do I run authentication tests in CI/CD?

Generate a test signing key in the test setup, mint JWTs with known claims, and point your token validation at the test key (or inject a test JWKS). Your auth() unit tests, handler tests with mock authInfo, and inspector e2e tests all run in GitHub Actions with no real OAuth provider, no host account, and no AI credits. Keep live OAuth tests on a manual or release-only trigger so CI stays fast and free of external dependencies.