
Adopting AI Without Adopting Tech Debt (June 2026)

Abe Wheeler
Austin CTO Club Talk Jan 2026


This is a refreshed blog adaptation of a talk I gave at the Austin CTO Club in January 2026.

TL;DR: AI coding agents shift the tech debt bottleneck from typing speed to alignment speed. The practical answer is better context, smaller ownership groups, protocol boundaries, and automated tests for the host states your users actually hit. That matters most for MCP Apps, ChatGPT Apps, and Claude Connectors, where the app runs across AI hosts you do not control.

Hi, I’m Abe. I previously sold Trigo, where I was CTO building AI in property tech. I’ve since been building sunpeak, an open-source framework for building and testing MCP Apps, ChatGPT Apps, and Claude Connectors.

My view has changed since the original talk. In early 2026, the hard question was, “How do we let agents write more code without creating a cleanup backlog?” In June 2026, the answer is clearer: treat context, protocols, and tests as first-class product infrastructure.

The New Tech Debt Shape

Classic tech debt came from schedule pressure, weak ownership, missing tests, and design shortcuts. AI adds a different failure mode: a team can now create a lot of plausible code before anyone notices the underlying assumptions have drifted.

The code is usually not bad in isolation. The debt comes from small mismatches:

  • One agent follows an old API pattern while another follows the new one.
  • A feature branch changes a schema but leaves examples, docs, and tests behind.
  • A ChatGPT App works in fullscreen but breaks in picture-in-picture.
  • A Claude Connector uses the right tool schema but forgets the auth, consent, or directory review path.
  • A repo has five copies of the same helper because each agent saw a different slice of the codebase.

Those are alignment bugs. They look like implementation bugs, but the root cause is missing shared context.

Start With Context Packages

Every repo now needs a context package. This is the material a new engineer, reviewer, or coding agent needs before changing the system.

Keep it boring and close to the code:

  • AGENTS.md or README.md for repo-level rules, commands, and writing style.
  • Architecture decision records for choices that should not be relitigated.
  • Short examples of correct tool handlers, resource metadata, auth flows, and tests.
  • A list of known traps, such as stale generated files, host-specific display modes, or schema fields that look optional but are required for submission.
  • Validation commands that agents can run without guessing.

The package should answer one question: “What would a senior engineer say before handing this repo to someone new?”
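One way to keep the package honest is a small check that fails when an expected piece is missing. The sketch below is illustrative, not a prescribed layout: only AGENTS.md and README.md come from the list above, while docs/adr/, examples/, and docs/traps.md are hypothetical conventions. It takes a list of repo paths as input, so it stays independent of any particular filesystem or CI tool.

```typescript
// Hypothetical layout for a repo context package. Only AGENTS.md and README.md
// are named in the post; the other paths are assumed conventions.
interface ContextRequirement {
  label: string;
  // Returns true when a repo path satisfies this requirement.
  matches: (repoPath: string) => boolean;
}

const contextRequirements: ContextRequirement[] = [
  { label: "repo-level rules", matches: (p) => p === "AGENTS.md" || p === "README.md" },
  { label: "architecture decision records", matches: (p) => p.startsWith("docs/adr/") && p.endsWith(".md") },
  { label: "worked examples", matches: (p) => p.startsWith("examples/") },
  { label: "known traps", matches: (p) => p === "docs/traps.md" },
];

// Returns the labels of requirements that no path in the repo satisfies.
function missingContext(repoPaths: string[]): string[] {
  return contextRequirements
    .filter((req) => !repoPaths.some((p) => req.matches(p)))
    .map((req) => req.label);
}

const missingLabels = missingContext([
  "AGENTS.md",
  "docs/adr/0001-use-mcp.md",
  "src/index.ts",
]);
console.log(missingLabels); // [ 'worked examples', 'known traps' ]
```

A check like this could run as one of the validation commands, so a missing context file fails the same way a missing test would.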

Model Context Protocol docs now include guidance for building with Agent Skills, which points in the same direction. Static instructions are part of the system. They shape the output, so they deserve review.

Keep Human Teams Smaller

AI gives each engineer more output, but it does not give the team more shared attention. More generated code means more decisions, more review, more edge cases, and more chances for two people to solve the same problem differently.

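Smaller teams help because they reduce coordination paths: a group of n people has n(n-1)/2 possible pairwise conversations, so ten people have 45 and four have 6. The better pattern is: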

  • Fewer people per system.
  • Larger ownership per engineer.
  • Clear service and package boundaries.
  • Stronger contracts between teams.
  • Fewer shared abstractions that everyone can casually edit.

This is where service boundaries become useful earlier. You do not need a large company before boundaries matter. If a small AI-accelerated team can produce the amount of code a much larger team used to write, it needs some of the same contract discipline.

Treat MCP As Infrastructure

Model Context Protocol is now the common layer for tools, resources, prompts, auth, and app UI. It lets agents and hosts connect to real systems without a custom integration for every host.

That reduces debt because the team can build one protocol surface. It also creates a new debt risk: protocol details move quickly. The MCP roadmap and current docs cover active work around server cards, registry behavior, auth, transport scalability, tasks, and extension support. The official docs also list MCP Apps as an extension for interactive UI applications rendered inside MCP hosts.

The practical rule: isolate MCP details behind a small layer you can test. Do not spread tool metadata, resource URIs, CSP rules, output schemas, auth behavior, and host feature detection across random product components.

For MCP Apps, that layer should own:

  • Tool definitions and JSON Schema input contracts.
  • structuredContent, outputSchema, and tool result _meta.
  • UI resource registration and ui:// resource links.
  • Content security policy, iframe permissions, and stable resource domains.
  • Host capability checks before using host-specific features.
  • App-side tools, model context updates, and display mode requests.

When those pieces are centralized, protocol changes are one upgrade task instead of a scavenger hunt.
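As a rough illustration of what "one small layer" can look like, here is a minimal sketch of a single module that owns a tool definition, its result shape, and a host capability check. The field names name, description, inputSchema, outputSchema, content, structuredContent, and _meta follow the MCP tool shape; everything else is an assumption for the example, including the get_order_summary tool, the ui://orders/summary URI, the uiResourceUri key inside _meta, and the displayModes capability field. Check the current MCP Apps docs for the real metadata keys before copying any of it.

```typescript
// Sketch only: the types are hand-written for this example, not imported from an SDK.
type JsonSchemaObject = { [key: string]: unknown };

interface McpToolDefinition {
  name: string;
  description: string;
  inputSchema: JsonSchemaObject;
  outputSchema: JsonSchemaObject;
}

interface McpToolResult {
  content: { type: "text"; text: string }[];
  structuredContent: { [key: string]: unknown };
  _meta: { [key: string]: unknown };
}

// Hypothetical snapshot of what the host reports it supports.
interface HostCapabilitySnapshot {
  displayModes: string[];
}

// Hypothetical UI resource link, defined once so product code never hardcodes it.
const ORDER_SUMMARY_UI_URI = "ui://orders/summary";

const orderSummaryTool: McpToolDefinition = {
  name: "get_order_summary",
  description: "Return the total for one order.",
  inputSchema: {
    type: "object",
    properties: { orderId: { type: "string" } },
    required: ["orderId"],
  },
  outputSchema: {
    type: "object",
    properties: { orderId: { type: "string" }, total: { type: "number" } },
    required: ["orderId", "total"],
  },
};

// Builds the tool result in one place so structuredContent always matches outputSchema.
function buildOrderSummaryResult(orderId: string, total: number): McpToolResult {
  return {
    content: [{ type: "text", text: "Order " + orderId + " total: " + total }],
    structuredContent: { orderId: orderId, total: total },
    _meta: { uiResourceUri: ORDER_SUMMARY_UI_URI }, // assumed key name
  };
}

// Capability check: use the preferred display mode only if the host reports it.
// Falling back to "inline" assumes every host supports inline rendering.
function chooseDisplayMode(host: HostCapabilitySnapshot, preferred: string): string {
  return host.displayModes.indexOf(preferred) !== -1 ? preferred : "inline";
}

const sampleResult = buildOrderSummaryResult("A-100", 42.5);
console.log(orderSummaryTool.name, sampleResult.structuredContent);
console.log(chooseDisplayMode({ displayModes: ["inline", "fullscreen"] }, "picture-in-picture")); // inline
```

Product components would import from this module rather than writing schemas or URIs themselves, which is what makes a protocol change a single upgrade task.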

MCP Apps Need Cross-Host Tests

MCP Apps put interactive UI inside AI conversations. OpenAI’s Apps SDK reference documents ChatGPT-specific tool, resource, component, and _meta fields, while the official MCP Apps docs describe the broader app lifecycle, host bridge, resources, and UI protocol.

That split is exactly why tech debt appears. Portable MCP App behavior and host-specific behavior sit next to each other:

  • ChatGPT Apps may need OpenAI-specific metadata, display behavior, submission checks, and app directory requirements.
  • Claude Connectors may need Claude-specific connector setup, auth behavior, enterprise data access, and directory review.
  • Other MCP hosts can support different parts of the extension matrix.

Do not treat “works on my local web page” as a release signal. The app needs to work inside host constraints: iframe sandboxing, theme tokens, font loading, safe areas, width limits, display modes, locale, time zone, tool result timing, and model-visible state.

sunpeak’s inspector exists for this loop. It replicates ChatGPT and Claude runtimes locally, works with any MCP server via npx sunpeak inspect --server URL, and lets you switch host, theme, display mode, device width, locale, platform, safe area, and tool state. Simulation files pin tool input and output so tests are deterministic.

That means a normal pull request can run those host-state checks in CI.

The point is not to add testing theater. The point is to stop paying for preventable host bugs during manual review, submission, or production support.

Build a Tech Debt Budget For Agents

Teams should budget for AI debt directly. I would track these signals:

| Signal | Why it matters |
| --- | --- |
| Review latency | Agents can produce code faster than humans can review it. Queue time is the first warning. |
| Duplicate implementations | Repeated helpers mean agents are missing context or search paths. |
| Spec drift | Product, docs, tests, and schemas disagree after a change. |
| Host-state gaps | A feature was tested in one host, theme, display mode, or auth state only. |
| Stale context files | Agents keep following rules that the team no longer believes. |
| Skipped validations | The repo has commands, but agents or humans avoid them because they are slow or flaky. |

The fix is usually plain engineering work: delete duplicate code, move repeated rules into shared helpers, make tests faster, update context docs, and narrow ownership.
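Review latency is the easiest of these signals to measure. Here is a minimal sketch, assuming you can export the time each pull request was opened and the time it got its first review as ISO 8601 timestamps. The record shape, the sample data, and the 8-hour budget are all hypothetical; pick a budget that matches your own team.

```typescript
// Hypothetical record shape: one entry per reviewed pull request.
interface ReviewedPullRequest {
  openedAt: string; // ISO 8601 timestamp
  firstReviewAt: string; // ISO 8601 timestamp
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Hours between opening a pull request and its first review.
function reviewLatencyHours(pr: ReviewedPullRequest): number {
  return (Date.parse(pr.firstReviewAt) - Date.parse(pr.openedAt)) / MS_PER_HOUR;
}

// Median is less sensitive than the mean to one pull request that sat over a weekend.
function medianOf(values: number[]): number {
  if (values.length === 0) {
    throw new Error("medianOf needs at least one value");
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Made-up sample data for illustration.
const samplePullRequests: ReviewedPullRequest[] = [
  { openedAt: "2026-06-01T09:00:00Z", firstReviewAt: "2026-06-01T11:00:00Z" }, // 2 hours
  { openedAt: "2026-06-01T10:00:00Z", firstReviewAt: "2026-06-01T15:00:00Z" }, // 5 hours
  { openedAt: "2026-06-02T08:00:00Z", firstReviewAt: "2026-06-03T10:00:00Z" }, // 26 hours
];

const REVIEW_BUDGET_HOURS = 8; // assumed budget, not a recommendation

const medianLatencyHours = medianOf(samplePullRequests.map(reviewLatencyHours));
console.log(medianLatencyHours); // 5
console.log(medianLatencyHours > REVIEW_BUDGET_HOURS); // false
```

Tracking the trend matters more than the absolute number: a median that climbs as agent usage grows is the queue-time warning from the table.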

A Practical Adoption Checklist

Before expanding agent use across a team, I would put these in place:

  1. Write the repo context package.
  2. Add one owner for keeping context files current.
  3. Define which files agents can edit freely and which require human review.
  4. Add deterministic tests for the most common user states.
  5. Add protocol conformance tests for MCP servers and MCP Apps.
  6. Keep host-specific app behavior behind adapters.
  7. Require a short changelog note when an agent changes architecture, auth, schemas, or public APIs.
  8. Review failed agent attempts, not just successful diffs.

For MCP Apps, add a host matrix to every meaningful feature:

| State | Examples |
| --- | --- |
| Host | ChatGPT, Claude, and any other supported MCP host |
| Display | inline, fullscreen, picture-in-picture |
| Theme | light and dark |
| Width | mobile, tablet, desktop |
| Data | loading, empty, happy path, error, large result |
| Auth | signed out, expired token, limited permission, fully authorized |
| Tool behavior | success, validation error, server error, cancellation |

If the matrix feels too large, that is a signal to improve fixtures and tests, not a reason to test only the happy path.
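Generating the matrix is cheaper than maintaining it by hand. The following sketch expands a set of axes into every combination, which a test runner could then loop over. It is a generic cartesian product, not sunpeak's API; the axis names and values are taken from the table above, and the lowercase host identifiers are my own shorthand.

```typescript
type MatrixAxes = { [axis: string]: string[] };
type MatrixCase = { [axis: string]: string };

// Expands axes into every combination (cartesian product), one axis at a time.
function expandMatrix(axes: MatrixAxes): MatrixCase[] {
  let cases: MatrixCase[] = [{}];
  for (const axis of Object.keys(axes)) {
    const next: MatrixCase[] = [];
    for (const partial of cases) {
      for (const value of axes[axis]) {
        // Copy the partial case and add this axis value to it.
        next.push({ ...partial, [axis]: value });
      }
    }
    cases = next;
  }
  return cases;
}

// Three axes from the table; add width, data, auth, and tool behavior the same way.
const hostMatrixAxes: MatrixAxes = {
  host: ["chatgpt", "claude"],
  display: ["inline", "fullscreen", "picture-in-picture"],
  theme: ["light", "dark"],
};

const hostMatrixCases = expandMatrix(hostMatrixAxes);
console.log(hostMatrixCases.length); // 12
console.log(hostMatrixCases[0]); // { host: 'chatgpt', display: 'inline', theme: 'light' }
```

The full product grows quickly once all seven rows are included, so in practice you might run the complete matrix for a few critical features and a sampled subset for the rest.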

Where sunpeak Fits

sunpeak is the layer I wanted while building MCP Apps with agents: a full-stack MCP App framework plus a testing framework that understands MCP host runtimes.

Use it when you want to:

  • Build one MCP App codebase for ChatGPT, Claude, and other MCP hosts.
  • Inspect any MCP server locally without deploying to a host.
  • Test tools, resources, UI states, display modes, themes, safe areas, and host context in CI.
  • Add E2E, visual regression, live host, and eval tests without hand-building a host simulator.
  • Keep protocol details close to the framework instead of spreading them across app code.

The current docs start at sunpeak.ai/docs. For existing MCP servers, the fastest path is:

npx sunpeak inspect --server http://localhost:8000/mcp

For a new app:

npx sunpeak new

AI can make a small team much faster, but speed only helps if the system stays understandable. The strongest teams will pair agent speed with clear context, clean boundaries, and feedback loops that catch host, protocol, and product drift before users do.


Frequently Asked Questions

What causes tech debt when teams use AI coding agents?

The main cause is misalignment. Agents can write code quickly, but they only follow the context they can see. If two agents get different architecture rules, outdated product requirements, or incomplete testing instructions, they create drift faster than a human team can review it. The fix is to package context, keep ownership boundaries clear, and make validation automatic.

How should teams manage context for AI coding agents?

Create a small context package for each repo, service, and feature area. Include architecture decisions, product rules, testing commands, style rules, examples, and known failure modes. Keep the package close to the code in markdown files or Agent Skills, then review it like production code because agents will treat it as source material.

Do AI coding agents change how engineering teams should be structured?

Yes. Higher code output makes coordination more expensive. Smaller teams with larger ownership areas work better because fewer people and agents need to agree on each decision. Clear service boundaries, API contracts, and review rules let each group move faster without spreading local choices across the whole system.

How does MCP reduce AI adoption tech debt?

MCP standardizes how agents connect to tools, resources, prompts, auth, and app UIs. That matters because teams can build one protocol layer instead of custom integrations for each host. The debt risk moves to protocol churn, so teams should isolate MCP code behind adapters, track spec changes, and test protocol behavior in CI.

What are MCP Apps, ChatGPT Apps, and Claude Connectors?

MCP Apps are interactive UIs delivered through MCP resources and rendered inside AI hosts. ChatGPT Apps are MCP Apps that run in ChatGPT. Claude Connectors connect Claude to external systems, and interactive connector experiences use the same MCP App ideas: tools, resources, structured results, host context, and sandboxed UI.

How do you avoid tech debt when MCP Apps support multiple hosts?

Separate portable MCP App behavior from host-specific behavior. Put shared tool schemas, resource contracts, structuredContent, output schemas, and UI state in the portable layer. Use small host adapters for ChatGPT-specific metadata, Claude-specific behavior, display modes, CSP, auth, and review requirements. Then test every supported host state before release.

How does sunpeak help teams test MCP Apps without paid host accounts?

sunpeak includes a local inspector that replicates ChatGPT and Claude runtimes. It can inspect any MCP server, load simulation fixtures, switch hosts, themes, display modes, device widths, locale, platform, safe areas, and tool states, then run Playwright E2E tests, visual regression tests, live host tests, and multi-model evals in CI.

What should CTOs track before scaling AI development?

Track review latency, escaped defects, duplicate implementations, spec churn, unsupported host states, stale context files, and tests skipped by agents. These metrics show whether AI is helping the system ship faster or simply moving hidden cleanup work into review, QA, security, and support.