
Accessibility Testing for MCP Apps, ChatGPT Apps, and Claude Connectors (May 2026)

Abe Wheeler
MCP Apps · MCP App Testing · MCP App Framework · ChatGPT Apps · ChatGPT App Testing · ChatGPT App Framework · Claude Connectors · Claude Connector Testing · Claude Connector Framework · Accessibility Testing · Accessibility
Accessibility testing MCP App components across hosts, themes, and display modes.

MCP Apps render inside iframes that the host controls. Your code does not decide when focus enters the iframe, how the iframe resizes during a display mode transition, or what CSS variables the host injects for colors and fonts. That makes accessibility harder to get right and harder to verify manually, because you’d need to test across multiple hosts, themes, display modes, and interaction patterns.

California’s AB-331 took effect in January 2026 and requires accessibility assessments for public-facing AI systems. The EU Accessibility Act applies to products distributed in EU markets. These regulations apply to MCP Apps just like any other web interface. If your app is listed in the ChatGPT App Store or Claude Connector Directory, users with disabilities will use it.

TL;DR: Use @axe-core/playwright to scan your rendered resource component for WCAG violations. Write keyboard navigation tests that Tab through every interactive element. Test both themes, multiple display modes, and both hosts. Run everything in CI with pnpm test:e2e. No paid accounts, no manual checking.

What Makes MCP App Accessibility Different

Regular web apps control their entire page. You set the document language, manage focus on route changes, and own the full DOM tree. MCP Apps don’t have any of that. Your component mounts inside a sandboxed iframe that the host manages.

This creates specific accessibility challenges:

  • Focus entry and exit are controlled by the host. Your app can’t intercept when the user tabs into or out of the iframe.
  • Display mode transitions move and resize the iframe. A fullscreen transition might happen mid-interaction, shifting the layout under the user’s focus.
  • Host CSS variables set your colors and fonts. If you override them with hardcoded values, you may break contrast ratios that the host’s theme was designed to maintain.
  • Viewport constraints differ between inline (narrow, embedded in chat) and fullscreen (full-width). Tab order that works in one may not work in the other.
  • Cross-host differences mean your component renders in different iframe configurations on ChatGPT versus Claude.

You cannot manually test all of these permutations on every change. Automated tests handle it.

Setting Up axe-core for MCP App Tests

axe-core is the most widely used accessibility engine. The Playwright integration lets you run WCAG checks against rendered pages. Since sunpeak’s inspector fixture renders your resource in a real browser, you can point axe at the rendered output.

Install the dependency:

pnpm add -D @axe-core/playwright

Then write a test that renders your resource and scans it:

import { test, expect } from 'sunpeak/test';
import AxeBuilder from '@axe-core/playwright';

test('resource passes WCAG 2.1 AA', async ({ inspector }) => {
  const result = await inspector.renderTool('get-dashboard', {
    input: { userId: 'test-user' },
    output: { metrics: [{ label: 'Revenue', value: '$12,400' }] },
  });

  const appFrame = result.app();
  const axeResults = await new AxeBuilder({ page: appFrame }).analyze();

  expect(axeResults.violations).toEqual([]);
});

If violations exist, axe reports the exact element, the WCAG rule it breaks, and the impact level (critical, serious, moderate, minor). A common first run on an existing component surfaces 5-10 issues, mostly missing labels on icon buttons and insufficient color contrast on custom colors.
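If you are adopting axe on an existing component with a backlog of violations, one pragmatic option is to gate CI on the higher impact levels first and ratchet down as you fix issues. A minimal sketch, assuming the `violations` array has the shape that `AxeBuilder.analyze()` returns (the `blockingViolations` helper and the `serious` threshold are choices of this sketch, not part of axe):

```typescript
// Sketch: fail CI only on violations at or above a chosen impact level,
// so a backlog of minor issues doesn't block every build while you fix them.
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';

interface Violation {
  id: string; // axe rule id, e.g. 'color-contrast'
  impact: Impact | null;
}

function blockingViolations(
  violations: Violation[],
  minImpact: Impact = 'serious'
): Violation[] {
  const order: Impact[] = ['minor', 'moderate', 'serious', 'critical'];
  const threshold = order.indexOf(minImpact);
  // Keep only violations whose impact meets or exceeds the threshold.
  return violations.filter(
    (v) => v.impact !== null && order.indexOf(v.impact) >= threshold
  );
}
```

You would then assert that `blockingViolations(axeResults.violations)` is empty, and lower `minImpact` toward `'minor'` as the backlog shrinks.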

Testing Across Themes and Display Modes

A component that passes axe in light mode might fail in dark mode if you use custom colors that don’t adapt. Similarly, a component that works in fullscreen might break in inline mode where the narrower viewport wraps content into a different layout.

Test every combination your app supports:

const themes = ['light', 'dark'] as const;
const displayModes = ['inline', 'fullscreen'] as const;

for (const theme of themes) {
  for (const displayMode of displayModes) {
    test(`passes WCAG AA in ${theme} ${displayMode}`, async ({ inspector }) => {
      const result = await inspector.renderTool('get-dashboard', {
        input: { userId: 'test-user' },
        output: { metrics: [{ label: 'Revenue', value: '$12,400' }] },
        theme,
        displayMode,
      });

      const axeResults = await new AxeBuilder({ page: result.app() })
        .withTags(['wcag2a', 'wcag2aa'])
        .analyze();

      expect(axeResults.violations).toEqual([]);
    });
  }
}

The withTags filter limits checks to WCAG 2.1 A and AA criteria, which is the standard most regulations reference. You can add 'wcag2aaa' if you want stricter checks, but AA is the practical target.

Keyboard Navigation Testing

Screen reader and keyboard-only users navigate entirely with Tab, Shift+Tab, Enter, Escape, and arrow keys. Your component must support this without trapping focus or skipping interactive elements.

Testing Tab Order

test('tab order follows visual layout', async ({ inspector }) => {
  const result = await inspector.renderTool('get-contacts', {
    input: {},
    output: {
      contacts: [
        { name: 'Alice', email: 'alice@example.com' },
        { name: 'Bob', email: 'bob@example.com' },
      ],
    },
  });

  const app = result.app();

  // Tab into the first interactive element
  await app.keyboard.press('Tab');
  const first = await app.evaluate(() => document.activeElement?.textContent);
  expect(first).toContain('Alice');

  // Tab to next
  await app.keyboard.press('Tab');
  const second = await app.evaluate(() => document.activeElement?.textContent);
  expect(second).toContain('Bob');
});

Testing Focus Traps

A focus trap is when Tab cycles within a component and the user cannot escape. This is correct behavior inside a modal dialog but a bug everywhere else. MCP Apps should never trap focus because the host manages iframe focus boundaries.

test('focus does not get trapped', async ({ inspector }) => {
  const result = await inspector.renderTool('get-dashboard', {
    input: { userId: 'test-user' },
    output: { metrics: [] },
  });

  const app = result.app();
  const interactiveElements = await app.locator(
    'button, a, input, select, textarea, [tabindex]:not([tabindex="-1"])'
  ).count();

  // Tab through all interactive elements plus one more
  for (let i = 0; i <= interactiveElements; i++) {
    await app.keyboard.press('Tab');
  }

  // Focus should have left the component (moved to iframe boundary)
  const activeTag = await app.evaluate(() => document.activeElement?.tagName);
  expect(activeTag).toBe('BODY');
});

Testing Keyboard Interactions

Custom widgets like tabs, dropdowns, and carousels need to support arrow key navigation per WAI-ARIA patterns:

test('tab panel switches with arrow keys', async ({ inspector }) => {
  const result = await inspector.renderTool('get-analytics', {
    input: {},
    output: { tabs: ['Overview', 'Details', 'History'] },
  });

  const app = result.app();

  // Focus the tab list
  await app.keyboard.press('Tab');

  // Arrow right should move to next tab
  await app.keyboard.press('ArrowRight');
  const activeTab = await app.evaluate(
    () => document.activeElement?.getAttribute('aria-selected')
  );
  expect(activeTab).toBe('true');

  const tabLabel = await app.evaluate(() => document.activeElement?.textContent);
  expect(tabLabel).toBe('Details');
});

Color Contrast Testing

The host CSS variables (--color-text-primary, --color-background-primary, etc.) are designed by the host’s design team to meet contrast requirements. If you use them, contrast is handled for you.

The problem comes when you use custom colors for status indicators, charts, badges, or branding elements. These need to meet the 4.5:1 ratio for normal text and 3:1 for large text (WCAG 2.1 SC 1.4.3).

axe-core catches most contrast issues automatically. But if you’re rendering canvas elements or SVG charts, axe can’t inspect them. Test those manually or with a custom contrast check:

test('status badges meet contrast requirements', async ({ inspector }) => {
  const result = await inspector.renderTool('get-status', {
    input: {},
    output: {
      items: [
        { name: 'API', status: 'healthy' },
        { name: 'DB', status: 'degraded' },
        { name: 'Cache', status: 'down' },
      ],
    },
    theme: 'dark',
  });

  const app = result.app();
  const badges = app.locator('[data-testid="status-badge"]');
  const count = await badges.count();

  for (let i = 0; i < count; i++) {
    const badge = badges.nth(i);
    const color = await badge.evaluate((el) => getComputedStyle(el).color);
    const bg = await badge.evaluate(
      (el) => getComputedStyle(el).backgroundColor
    );
    // Parse RGB values and calculate contrast ratio
    const ratio = calculateContrastRatio(parseRGB(color), parseRGB(bg));
    expect(ratio).toBeGreaterThanOrEqual(4.5);
  }
});
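The `parseRGB` and `calculateContrastRatio` helpers above are not provided by axe or Playwright; you would define them yourself. Here is a minimal sketch using the WCAG 2.1 relative luminance formula, assuming `getComputedStyle` returns `rgb()`/`rgba()` strings (which browsers do for computed colors):

```typescript
// Sketch: WCAG 2.1 contrast helpers for the status-badge test above.
type RGB = [number, number, number];

// Parse the "rgb(r, g, b)" / "rgba(r, g, b, a)" strings getComputedStyle returns.
function parseRGB(value: string): RGB {
  const match = value.match(/rgba?\((\d+),\s*(\d+),\s*(\d+)/);
  if (!match) throw new Error(`Cannot parse color: ${value}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Relative luminance per WCAG 2.1 (sRGB channel linearization).
function luminance([r, g, b]: RGB): number {
  const [lr, lg, lb] = [r, g, b].map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : ((s + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * lr + 0.7152 * lg + 0.0722 * lb;
}

// Contrast ratio = (L_lighter + 0.05) / (L_darker + 0.05), range 1:1 to 21:1.
function calculateContrastRatio(fg: RGB, bg: RGB): number {
  const [hi, lo] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}
```

Note this assumes opaque colors; a badge with a semi-transparent background would need to be composited against what's behind it before the ratio is meaningful.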

The safer approach: use the host’s semantic color tokens (--color-status-success, --color-status-danger, --color-status-warning) which maintain correct contrast in both themes.

ARIA Attribute Testing

Unit tests are the fastest way to verify ARIA attributes because they don’t need a browser. Test that your components output correct attributes directly:

import { render, screen } from '@testing-library/react';
import { describe, it, expect } from 'vitest';
import { Dashboard } from '../src/resources/dashboard/Dashboard';

describe('Dashboard ARIA attributes', () => {
  it('labels icon buttons', () => {
    render(<Dashboard metrics={[]} />);
    const refreshBtn = screen.getByRole('button', { name: /refresh/i });
    expect(refreshBtn).toHaveAttribute('aria-label', 'Refresh dashboard');
  });

  it('marks decorative images as hidden', () => {
    render(<Dashboard metrics={[{ label: 'Users', value: '1,234' }]} />);
    const icons = screen.getAllByRole('img', { hidden: true });
    icons.forEach((icon) => {
      expect(icon).toHaveAttribute('aria-hidden', 'true');
    });
  });

  it('announces dynamic content updates', () => {
    render(<Dashboard metrics={[{ label: 'Users', value: '1,234' }]} />);
    const liveRegion = screen.getByRole('status');
    expect(liveRegion).toHaveAttribute('aria-live', 'polite');
  });

  it('uses correct heading hierarchy', () => {
    render(<Dashboard metrics={[{ label: 'Users', value: '1,234' }]} />);
    const headings = screen.getAllByRole('heading');
    const levels = headings.map((h) => parseInt(h.tagName.replace('H', '')));
    // Verify no heading levels are skipped
    for (let i = 1; i < levels.length; i++) {
      expect(levels[i] - levels[i - 1]).toBeLessThanOrEqual(1);
    }
  });
});

These tests run with pnpm test:unit in milliseconds. They catch regressions immediately when someone removes an aria-label or changes a semantic element to a div.

Screen Reader Announcements

When your component updates dynamically (new data loads, a status changes, an action completes), screen reader users won’t see the visual change. You need aria-live regions to announce updates.

Test that dynamic updates are announced:

test('announces loading completion', async ({ inspector }) => {
  const result = await inspector.renderTool('get-report', {
    input: { reportId: 'q1-2026' },
    output: { title: 'Q1 Report', data: [] },
  });

  const app = result.app();
  const liveRegion = app.locator('[aria-live]');
  const announcement = await liveRegion.textContent();
  expect(announcement).toContain('Report loaded');
});

Common mistakes with live regions:

  • Using aria-live="assertive" for non-urgent updates (use polite unless the information is time-critical)
  • Putting the entire component inside a live region (only the changing text should be in the live region)
  • Announcing every intermediate loading state (announce the final result, not each progress tick)
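One way to avoid announcing every intermediate state is to debounce writes to the live region so only the last message within a quiet window is spoken. A sketch of that idea — `setText` is a stand-in for however your component writes into its `aria-live` element, and the 500 ms window is an arbitrary choice, not a spec value:

```typescript
// Sketch: coalesce rapid status updates so only the final message is announced.
function createAnnouncer(setText: (msg: string) => void, delayMs = 500) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pending: string | undefined;

  // Force any pending message out immediately (also useful in tests).
  const flush = () => {
    if (timer) clearTimeout(timer);
    timer = undefined;
    if (pending !== undefined) setText(pending);
    pending = undefined;
  };

  // Each call resets the window, so only the last message in a burst survives.
  const announce = (message: string) => {
    pending = message;
    if (timer) clearTimeout(timer);
    timer = setTimeout(flush, delayMs);
  };

  return { announce, flush };
}
```

With this in place, a burst of "Loading…", "Still loading…", "Report loaded" produces a single announcement of the final state.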

Focus Management During Display Mode Changes

When the host transitions your component from inline to fullscreen (or back), the iframe resizes and your layout changes. If focus was on an element that moves or disappears during the transition, the user loses their place.

Test that focus stays logical after transitions:

test('focus remains on active element after display mode change', async ({
  inspector,
}) => {
  const result = await inspector.renderTool('get-editor', {
    input: {},
    output: { content: 'Hello world' },
    displayMode: 'inline',
  });

  const app = result.app();

  // Focus an input
  const input = app.locator('input[type="text"]');
  await input.focus();

  // Simulate display mode transition
  await result.setDisplayMode('fullscreen');

  // Verify focus is still on the same logical element
  const focused = await app.evaluate(() => document.activeElement?.tagName);
  expect(focused).toBe('INPUT');
});

If the focused element is removed from the DOM during a layout shift (for example, a mobile-only button that doesn’t render in fullscreen), move focus to the nearest logical alternative. Don’t let it fall to <body> where the user has to Tab through everything again.
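The fallback decision can live in a small helper so both your transition handler and your tests exercise the same logic. A sketch, typed generically so it doesn't assume a particular DOM API — in a real component, `stillMounted` would be `document.contains` (or your framework's equivalent) and the arguments would be elements, not ids:

```typescript
// Sketch: decide where focus should land after a display mode transition.
function focusAfterTransition<T>(
  previous: T | null,
  fallback: T,
  stillMounted: (el: T) => boolean
): T {
  // Keep focus where it was if the element survived the layout change...
  if (previous && stillMounted(previous)) return previous;
  // ...otherwise move to the nearest logical alternative, never <body>.
  return fallback;
}
```

The caller is responsible for actually calling `.focus()` on whatever this returns, typically in a display-mode-change handler.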

Cross-Host Accessibility Testing

ChatGPT and Claude render your iframe differently. They inject different CSS variables, manage focus entry differently, and support different display modes. Test your component on both:

const hosts = ['chatgpt', 'claude'] as const;

for (const host of hosts) {
  test(`passes WCAG AA on ${host}`, async ({ inspector }) => {
    const result = await inspector.renderTool('get-dashboard', {
      input: { userId: 'test-user' },
      output: { metrics: [{ label: 'Revenue', value: '$12,400' }] },
      host,
    });

    const axeResults = await new AxeBuilder({ page: result.app() })
      .withTags(['wcag2a', 'wcag2aa'])
      .analyze();

    expect(axeResults.violations).toEqual([]);
  });
}

Common cross-host issues:

  • Font size differences that push content below minimum size thresholds
  • Border radius and padding differences that affect touch target sizes
  • Background color differences that change contrast ratios for custom colors

Semantic HTML Checklist

Before writing accessibility tests, make sure your component uses semantic HTML. This gets you most of the way without extra work:

  • <div onClick={...}> → <button onClick={...}>
  • <div class="link"> → <a href="...">
  • <div class="heading"> → <h2>, <h3>, etc.
  • <div class="list"> → <ul> or <ol> with <li>
  • <div class="input"> → <input> with <label>
  • <span class="nav"> → <nav> with aria-label
  • <div class="table"> → <table> with <th scope>

Every native HTML element comes with built-in keyboard support and screen reader semantics. A <button> is focusable, activates with Enter and Space, and announces as “button” to screen readers. A <div> with an onClick handler does none of those things unless you manually add tabIndex, role, onKeyDown, and ARIA attributes.

Running Accessibility Tests in CI

Add accessibility checks to your existing test commands. They run alongside your other tests without extra configuration:

{
  "scripts": {
    "test:a11y": "playwright test --grep @a11y",
    "test:e2e": "playwright test"
  }
}

Tag your accessibility tests with @a11y if you want to run them separately:

test('passes WCAG AA @a11y', async ({ inspector }) => {
  // ...
});

In GitHub Actions, these tests run in the same workflow as your other tests:

- name: Run tests
  run: pnpm test:e2e

No special configuration needed. The sunpeak test runner handles browser setup, host replication, and theme injection. axe-core runs inside the same browser instance. A full accessibility test suite across both hosts, both themes, and two display modes typically finishes in under 30 seconds.

Common Accessibility Bugs in MCP Apps

From reviewing apps submitted to the ChatGPT App Store and Claude Connector Directory, these are the most common accessibility failures:

Icon buttons without labels. A button with only an SVG icon inside it announces as “button” with no context. Add aria-label:

<button aria-label="Close panel" onClick={onClose}>
  <XIcon aria-hidden="true" />
</button>

Custom colors that fail in one theme. A green #22c55e on white (#ffffff) has a contrast ratio of roughly 2.3:1, which fails WCAG AA for normal text. The same green on dark mode’s #1a1a1a lands around 7.6:1 and passes. A color can pass in one theme and fail in the other, so test both.

Div-based clickable elements. A <div onClick={...}> is invisible to keyboard users and screen readers. Always use <button> or <a>.

Missing heading hierarchy. Jumping from <h2> to <h4> confuses screen reader users who navigate by heading level. Don’t skip levels.

Auto-playing content. MCP Apps that start animations or auto-scroll on load violate WCAG 2.2.2. Provide a way to pause, or don’t autoplay.

Hover-only interactions. Tooltips or menus that only appear on hover are inaccessible to keyboard and touch users. Add focus triggers too.

A Complete Accessibility Test File

Here’s a full test file you can adapt for your project:

import { test, expect } from 'sunpeak/test';
import AxeBuilder from '@axe-core/playwright';

const themes = ['light', 'dark'] as const;
const displayModes = ['inline', 'fullscreen'] as const;
const hosts = ['chatgpt', 'claude'] as const;

const toolFixture = {
  input: { userId: 'test-user' },
  output: {
    metrics: [
      { label: 'Active Users', value: '2,341' },
      { label: 'Revenue', value: '$48,200' },
    ],
    lastUpdated: '2026-05-06T10:00:00Z',
  },
};

// WCAG automated checks across all combinations
for (const host of hosts) {
  for (const theme of themes) {
    for (const displayMode of displayModes) {
      test(`WCAG AA: ${host}/${theme}/${displayMode} @a11y`, async ({
        inspector,
      }) => {
        const result = await inspector.renderTool('get-dashboard', {
          ...toolFixture,
          host,
          theme,
          displayMode,
        });

        const axeResults = await new AxeBuilder({ page: result.app() })
          .withTags(['wcag2a', 'wcag2aa'])
          .analyze();

        expect(axeResults.violations).toEqual([]);
      });
    }
  }
}

// Keyboard navigation
test('all interactive elements reachable via Tab @a11y', async ({
  inspector,
}) => {
  const result = await inspector.renderTool('get-dashboard', toolFixture);
  const app = result.app();

  const expectedCount = await app
    .locator(
      'button, a[href], input, select, textarea, [tabindex]:not([tabindex="-1"])'
    )
    .count();

  const focusedElements = new Set<string>();
  for (let i = 0; i < expectedCount + 2; i++) {
    await app.keyboard.press('Tab');
    const id = await app.evaluate(
      () => document.activeElement?.id || document.activeElement?.textContent
    );
    if (id) focusedElements.add(id);
  }

  expect(focusedElements.size).toBe(expectedCount);
});

// Focus management after display mode change
test('focus preserved after display mode transition @a11y', async ({
  inspector,
}) => {
  const result = await inspector.renderTool('get-dashboard', {
    ...toolFixture,
    displayMode: 'inline',
  });

  const app = result.app();
  const firstButton = app.locator('button').first();
  await firstButton.focus();
  const buttonText = await firstButton.textContent();

  await result.setDisplayMode('fullscreen');

  const focusedText = await app.evaluate(
    () => document.activeElement?.textContent
  );
  expect(focusedText).toBe(buttonText);
});

This file generates 8 WCAG tests (2 hosts x 2 themes x 2 display modes), plus keyboard navigation and focus management tests. It runs in CI in under 30 seconds and catches the majority of accessibility regressions.

Start With the Host Variables

The easiest way to make your MCP App accessible is to use the host’s design system. The host CSS variables were designed to meet accessibility requirements: correct contrast ratios, readable font sizes, appropriate spacing for touch targets. When you use var(--color-text-primary) on var(--color-background-primary), the contrast math is already done.

Layer automated tests on top to catch the places where you diverge from the host system: custom colors, dynamic content, keyboard interactions, and display mode transitions. That combination covers the full WCAG surface without manual testing against real hosts.

Get Started

Documentation →
npx sunpeak new

Frequently Asked Questions

Do MCP Apps need to be WCAG compliant?

Yes. MCP Apps render as web content inside iframes, so they fall under the same accessibility regulations as any web application. In the US, Section 508 and ADA case law apply. California AB-331 (effective January 2026) requires accessibility assessments for public-facing AI systems. The EU Accessibility Act requires compliance for products sold in EU markets. If your MCP App is publicly distributed through the ChatGPT App Store or Claude Connector Directory, it should meet WCAG 2.1 AA at minimum.

How do I run accessibility tests for MCP Apps in CI/CD?

Install @axe-core/playwright as a dev dependency and import it in your test files. Use the sunpeak inspector fixture to render your resource component, then run axe against the rendered iframe. These tests run in seconds with pnpm test:e2e and work in GitHub Actions without paid accounts or API keys. Add tests for each host, theme, and display mode combination your app supports.

How do I test keyboard navigation in an MCP App?

Use Playwright actions (page.keyboard.press Tab, Shift+Tab, Enter, Escape, Arrow keys) against your rendered resource in the sunpeak inspector. Assert that focus moves in a logical order, that all interactive elements are reachable via keyboard, and that focus never gets trapped inside a component. Test in both inline and fullscreen display modes because layout changes can break tab order.

How do I test color contrast in MCP App dark mode?

Render your resource with theme set to dark in your test options and run axe-core color contrast checks against the rendered output. If you use host CSS variables (--color-text-primary, --color-background-primary) the host handles contrast automatically. But custom colors, status indicators, and data visualizations need explicit testing in both themes. The sunpeak inspector fixture lets you test both themes in the same test file.

What ARIA attributes should MCP App components use?

Use aria-label on icon buttons without visible text, aria-hidden on decorative elements, role attributes on custom interactive widgets, aria-live regions for dynamic content updates, and aria-expanded/aria-controls for collapsible sections. MCP Apps that use native HTML elements (button, a, input, select) inherit correct roles automatically. Only add ARIA when you are building custom widgets that lack native semantics.

How does display mode affect MCP App accessibility?

Display mode transitions (inline to fullscreen, or to picture-in-picture) change the layout and available space, which can break focus order, hide interactive elements, or create scroll traps. Test that focus stays on a logical element after a display mode transition, that all interactive elements remain reachable, and that the component does not depend solely on hover interactions that are unavailable in inline or PiP modes.

Can I use axe-core to test MCP Apps?

Yes. Install @axe-core/playwright and run it against the iframe rendered by the sunpeak inspector fixture. Call new AxeBuilder({ page: result.app() }).analyze() to get a list of WCAG violations. You can filter by impact level (critical, serious, moderate, minor) and by specific rules. This catches most automated accessibility issues including missing labels, invalid ARIA, insufficient contrast, and incorrect heading structure.

What accessibility issues are unique to MCP Apps compared to regular web apps?

MCP Apps run inside host-controlled iframes, which means focus management at the iframe boundary is handled by the host. Your app cannot control how focus enters or leaves the iframe. Display mode transitions can move or resize the iframe unexpectedly. Host-injected CSS variables may change between renders. Your component needs to handle all of these without losing keyboard focus or breaking screen reader announcements. Test across multiple hosts because ChatGPT and Claude manage iframe focus differently.