Forem: Abe Wheeler

How do you know budget models are smart enough for your MCP server?

Abe Wheeler — Tue, 14 Apr 2026 18:23:12 +0000

We just shipped evals for sunpeak.ai

The #1 thing I hear from MCP server teams: “Our tools worked great with the latest models, but we had to start from scratch when we realized the free models couldn't use them at all.”

Budget models call tools differently: they misread ambiguous schemas, they pass wrong arguments, they can't chain tool calls, and you don’t find out until users complain.

sunpeak evals test your MCP server across every model that matters, in one command. Say you score 100% on GPT-4o but 40% on Gemini Flash. That 40% is a schema problem you’d never catch testing manually on ChatGPT. Fix the tool architecture and descriptions, run it again, and watch it climb to 95%.

Works with any MCP server. sunpeak connects over MCP, discovers your tools, and runs each eval case dozens of times per model so you get a real pass rate, not a single lucky result.
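
To make the "real pass rate" idea concrete, here is a minimal sketch of how repeated runs could be aggregated per model. The `TrialResult` shape and the `passRateByModel` function are hypothetical names for illustration and are not part of sunpeak's API.

```typescript
// Hypothetical shape for a single eval run; not sunpeak's actual result type.
interface TrialResult {
  model: string;
  passed: boolean;
}

// Aggregate repeated trials into a pass rate between 0 and 1 for each model.
function passRateByModel(trials: TrialResult[]): Record<string, number> {
  const passed: Record<string, number> = {};
  const total: Record<string, number> = {};
  for (const trial of trials) {
    total[trial.model] = (total[trial.model] ?? 0) + 1;
    passed[trial.model] = (passed[trial.model] ?? 0) + (trial.passed ? 1 : 0);
  }
  const rates: Record<string, number> = {};
  for (const model of Object.keys(total)) {
    rates[model] = passed[model] / total[model];
  }
  return rates;
}
```

A single run can pass or fail by luck. Averaging over many runs of the same case is what turns "it worked once" into a number you can compare across models.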

Put it in CI. Track reliability over time. Your MCP server isn’t production-ready until the cheapest model your users might connect it to can use it consistently.
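
One way to act on that rule in CI is to gate on the weakest model rather than the average. The sketch below assumes you already have a pass rate per model. The function names, the model keys, and the 0.9 threshold are all illustrative choices, not sunpeak settings.

```typescript
// Find the model with the lowest pass rate (rates are in the range 0..1).
function weakestModel(
  passRates: Record<string, number>
): { model: string; rate: number } | undefined {
  let worst: { model: string; rate: number } | undefined;
  for (const model of Object.keys(passRates)) {
    const rate = passRates[model];
    if (worst === undefined || rate < worst.rate) {
      worst = { model, rate };
    }
  }
  return worst;
}

// Hypothetical CI gate: pass only when every model meets the threshold.
function meetsThreshold(passRates: Record<string, number>, threshold: number): boolean {
  const worst = weakestModel(passRates);
  return worst !== undefined && worst.rate >= threshold;
}

// Illustrative numbers taken from the example in this post.
const readyToShip = meetsThreshold({ 'gpt-4o': 1.0, 'gemini-flash': 0.4 }, 0.9);
// readyToShip is false because the weakest model sits at 0.4.
```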

🚀 MCP App Testing Framework!

Abe Wheeler — Wed, 08 Apr 2026 17:15:13 +0000

We just shipped sunpeak.ai as a standalone testing framework for MCP Apps!

If you're building MCP Apps for ChatGPT or Claude, you know the pain: deploy, open the host, start a conversation, trigger the tool, check the result. Repeat for both hosts, both themes, and three display modes. That's 12 combinations (2 × 2 × 3) per code change.

sunpeak replicates the ChatGPT and Claude runtimes locally. You write Playwright tests that call tools, render resources, and assert against the output. One test file runs against both hosts automatically.

What's included:

Unit tests (Vitest + happy-dom)
E2E tests against replicated ChatGPT and Claude runtimes
Visual regression testing with screenshot baselines
Live tests against real ChatGPT
Works with any MCP server in any language (Python, Go, TypeScript)

Add it to an existing project with one command:

pnpm add -g sunpeak && sunpeak test init

No paid host accounts. No AI credits. Runs in CI/CD.

MIT licensed and open source! https://sunpeak.ai/testing-framework/

MCP Apps are hard to test

Abe Wheeler — Wed, 25 Mar 2026 16:39:12 +0000

MCP Apps are hard to test.

They run inside ChatGPT and Claude, so every code change means deploying to a real host, burning AI credits, and waiting through non-deterministic LLM responses. If you're building for both hosts, double everything.

We built the sunpeak Inspector to fix this.

It replicates the ChatGPT and Claude MCP App runtimes on localhost. Your app renders exactly as it would inside the real hosts, with accurate display modes, themes, safe areas, and conversation chrome. One command to start:

sunpeak inspect --server URL

Works with any MCP server. Python, TypeScript, Go, whatever. No sunpeak project required.

For development: Switch between ChatGPT and Claude from the sidebar. Toggle light/dark themes, mobile/tablet/desktop widths, and display modes. Edit tool input and output live. Changes appear instantly with HMR.

For testing: The inspector doubles as the test runtime for Playwright E2E tests. Define tool states with simulation files (JSON fixtures), load them via URL, and assert against the rendered output. Test every host, theme, and display mode combination in CI/CD. No paid accounts, no API keys, no credits on your CI runners.
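
As a rough sketch of what "every host, theme, and display mode combination" means in practice, the helper below enumerates the full matrix so each entry can drive one parameterized test. The `Combo` type and `allCombos` function are hypothetical names. The host, theme, and display mode values are the ones used elsewhere in sunpeak's posts.

```typescript
type Host = 'chatgpt' | 'claude';
type Theme = 'light' | 'dark';
type DisplayMode = 'inline' | 'pip' | 'fullscreen';

interface Combo {
  host: Host;
  theme: Theme;
  displayMode: DisplayMode;
}

// Build every combination of the given options (a Cartesian product).
function allCombos(hosts: Host[], themes: Theme[], modes: DisplayMode[]): Combo[] {
  const combos: Combo[] = [];
  for (const host of hosts) {
    for (const theme of themes) {
      for (const displayMode of modes) {
        combos.push({ host, theme, displayMode });
      }
    }
  }
  return combos;
}

const matrix = allCombos(
  ['chatgpt', 'claude'],
  ['light', 'dark'],
  ['inline', 'pip', 'fullscreen']
);
```

Looping over `matrix` inside a test file and registering one test per entry keeps the suite exhaustive without hand-writing each case.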

For coding agents: Claude Code, Codex, and Cursor can run the inspector and execute Playwright tests programmatically, so they can iterate on MCP Apps without needing a human to manually test in a real host.

sunpeak is MIT licensed and open source.

Live Testing for Claude Connectors and ChatGPT Apps

Abe Wheeler — Wed, 18 Mar 2026 18:43:24 +0000

The sunpeak simulator tests cover a lot. They replicate the ChatGPT and Claude runtimes, run display mode transitions, test themes, and validate tool invocations without any paid accounts or AI credits. For most development work, they're enough.

But simulators don't catch everything. Real ChatGPT wraps your app in a nested iframe sandbox. The MCP protocol goes through ChatGPT's actual connection layer. Resource loading happens over a real network with production builds. There's a gap between "works in the simulator" and "works in ChatGPT," and the only way to close it is to test against the real thing.

sunpeak 0.16.23 adds live testing: automated Playwright tests that run against real ChatGPT. You write the same kind of assertions you write for simulator tests, and sunpeak handles authentication, MCP server refresh, host-specific message formatting, and iframe traversal.

TL;DR: Run pnpm test:live with a tunnel active. sunpeak imports your browser session, starts the dev server, refreshes the MCP connection, and runs your tests/live/*.spec.ts files in parallel against real ChatGPT. You write assertions against the app iframe. Everything else is automated.

What Live Tests Actually Do

A live test opens a real ChatGPT session in a browser, types a message that triggers your MCP tool, waits for ChatGPT to call it, and then asserts against the rendered app inside the host's iframe.

Here's a complete live test for an albums resource:

import { test, expect } from 'sunpeak/test';

test('albums tool renders photo grid', async ({ live }) => {
  const app = await live.invoke('show-albums');

  await expect(app.getByText('Summer Slice')).toBeVisible({ timeout: 15_000 });
  await expect(app.locator('img').first()).toBeVisible();

  // Switch to dark mode without re-invoking the tool
  await live.setColorScheme('dark', app);
  await expect(app.getByText('Summer Slice')).toBeVisible();
});

live.invoke('show-albums') starts a new chat, sends /{appName} show-albums to ChatGPT, waits for the LLM response to finish streaming, waits for the app iframe to render, and returns a Playwright FrameLocator pointed at your app's content. From there, it's standard Playwright assertions.

The { timeout: 15_000 } accounts for the LLM response time. ChatGPT needs to process your message, decide to call the tool, receive the result, and render the iframe. In practice this takes 5 to 10 seconds.

Prerequisites

You need three things:

A ChatGPT account with MCP/Apps support (Plus or higher)
A tunnel tool like ngrok or Cloudflare Tunnel
Your MCP server connected in ChatGPT (Settings > Apps > Create, enter your tunnel URL with /mcp path)

You do not need to install anything extra in your sunpeak project. Live test infrastructure ships with sunpeak starting at v0.16.23. New projects scaffolded with sunpeak new include example live test specs and the Playwright config.

Running Live Tests

Open two terminals:

# Terminal 1: Start a tunnel
ngrok http 8000

# Terminal 2: Run live tests
pnpm test:live

On first run, sunpeak imports your ChatGPT session from your browser. It checks Chrome, Arc, Brave, and Edge automatically. If no valid session is found, it opens a browser window and waits for you to log in. The session is saved to tests/live/.auth/chatgpt.json and reused for 24 hours.
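
The 24-hour reuse window boils down to a freshness check on the saved session. The sketch below is an assumption about how such a check could look, not sunpeak's actual logic. `isSessionFresh` and `SESSION_TTL_MS` are hypothetical names.

```typescript
// 24 hours in milliseconds, matching the reuse window described above.
const SESSION_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical freshness check: a session is reusable while its age is
// non-negative and below the time-to-live.
function isSessionFresh(
  savedAtMs: number,
  nowMs: number,
  ttlMs: number = SESSION_TTL_MS
): boolean {
  const ageMs = nowMs - savedAtMs;
  return ageMs >= 0 && ageMs < ttlMs;
}
```

If a live run unexpectedly opens a login window, an expired session file is the first thing to suspect.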

After authentication, sunpeak:

Starts sunpeak dev --prod-resources (production resource builds)
Navigates to ChatGPT Settings > Apps, finds your MCP server, and clicks Refresh
Runs all tests/live/*.spec.ts files fully in parallel, each in its own chat window

The MCP refresh happens once in globalSetup, before any test workers start. This means your test workers don't each individually refresh the connection, which would be slow and flaky.

The Fixture API

All live tests import from sunpeak/test:

import { test, expect } from 'sunpeak/test';

The test function provides a live fixture with:

Method	What it does
`invoke(prompt)`	Starts a new chat, sends the prompt with host-specific formatting, waits for the app iframe, returns a `FrameLocator`
`sendMessage(text)`	Sends a message in the current chat with `/{appName}` prefix
`sendRawMessage(text)`	Sends a message without any prefix
`startNewChat()`	Opens a fresh conversation
`waitForAppIframe()`	Waits for the MCP app iframe and returns a `FrameLocator`
`setColorScheme(scheme, appFrame?)`	Switches to `'light'` or `'dark'` via `page.emulateMedia()`
`page`	Raw Playwright `Page` object

Most tests only need invoke and setColorScheme. The invoke method handles the full flow: new chat, message formatting (ChatGPT requires /{appName} before your prompt), waiting for streaming to finish, waiting for the nested iframe to render, and returning a locator into your app's content.
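
The message formatting step is simple enough to sketch. The helper below mirrors the `/{appName}` prefix described above. `formatChatGptMessage` is a hypothetical name, and the trimming and empty-name check are assumptions added for illustration.

```typescript
// Hypothetical helper: prefix a prompt with "/{appName}" as ChatGPT expects.
function formatChatGptMessage(appName: string, prompt: string): string {
  const name = appName.trim();
  if (name === '') {
    throw new Error('appName must not be empty');
  }
  return `/${name} ${prompt.trim()}`;
}
```

This is the difference between `sendMessage`, which applies the prefix for you, and `sendRawMessage`, which sends the text untouched.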

Theme Testing Without Re-Invocation

Sending a second message to trigger a new tool call is slow and burns credits. setColorScheme avoids that by switching the browser's prefers-color-scheme via Playwright's page.emulateMedia(). ChatGPT propagates the change into the iframe, and your app re-renders with the new theme.

test('ticket card text stays readable in dark mode', async ({ live }) => {
  const app = await live.invoke('show-ticket');

  const title = app.getByText('Search results not loading on mobile');
  await expect(title).toBeVisible({ timeout: 15_000 });

  // Verify status badge and assignee are visible in light mode
  await expect(app.getByText('in progress')).toBeVisible();
  await expect(app.getByText('Sarah Chen')).toBeVisible();

  // Switch to dark mode — common bugs: text blends into background,
  // borders disappear, badge colors lose contrast
  await live.setColorScheme('dark', app);

  // Same elements should still be visible with the new theme applied
  await expect(title).toBeVisible();
  await expect(app.getByText('in progress')).toBeVisible();
  await expect(app.getByText('Sarah Chen')).toBeVisible();

  // Badge background should still be distinguishable from the card
  const badge = app.locator('span:has-text("high")');
  const badgeBg = await badge.evaluate(
    (el) => window.getComputedStyle(el).backgroundColor
  );
  expect(badgeBg).not.toBe('rgba(0, 0, 0, 0)');
});

The second argument to setColorScheme tells it to wait for the app's <html data-theme="dark"> attribute to confirm the theme propagated through the iframe boundary before your assertions run.

A Full Example

Here's a live test for a review card resource. It invokes the tool, checks the rendered content, verifies a button interaction triggers a state transition, and confirms the card re-themes correctly in dark mode:

import { test, expect } from 'sunpeak/test';

test('review card renders and handles approval flow', async ({ live }) => {
  const app = await live.invoke('review-diff');

  // Verify the card rendered with the right content
  const title = app.locator('h1').first();
  await expect(title).toBeVisible({ timeout: 15_000 });
  await expect(title).toHaveText('Refactor Authentication Module');

  // Action buttons present
  const applyButton = app.getByRole('button', { name: 'Apply Changes' });
  await expect(applyButton).toBeVisible();

  // Theme switch: card should stay readable in dark mode
  await live.setColorScheme('dark', app);
  await expect(title).toBeVisible();
  await expect(applyButton).toBeVisible();

  // Click Apply Changes — UI transitions to accepted state
  await applyButton.click();
  await expect(applyButton).not.toBeVisible({ timeout: 5_000 });
  await expect(
    app.locator('text=Applying changes...').first()
  ).toBeVisible({ timeout: 5_000 });
});

This catches real issues that simulator tests can miss: the iframe sandbox blocking a script load, a theme change not propagating through the nested iframe boundary, or a button click failing because of host-specific event handling.

The Playwright Config

The live test config is a one-liner:

// tests/live/playwright.config.ts
import { defineLiveConfig } from 'sunpeak/test/config';

export default defineLiveConfig();

This generates a full Playwright config with:

globalSetup pointing to sunpeak's auth and MCP refresh flow
headless: false because chatgpt.com blocks headless browsers
Anti-bot browser arguments and a real Chrome user agent
2-minute timeout per test (LLM responses can be slow)
1 retry per test (LLM responses are non-deterministic)
Fully parallel execution (each test gets its own chat)
Automatic dev server with --prod-resources on a dynamically allocated port

You can pass options to customize the environment:

export default defineLiveConfig({
  colorScheme: 'dark',
  viewport: { width: 1440, height: 900 },
  locale: 'fr-FR',
  timezoneId: 'Europe/Paris',
  geolocation: { latitude: 48.8566, longitude: 2.3522 },
  permissions: ['geolocation'],
});

How It Relates to Simulator Tests

Live tests don't replace simulator tests. They complement them.

	Simulator (`pnpm test:e2e`)	Live (`pnpm test:live`)
Runs against	Local simulator	Real ChatGPT
Speed	Seconds	10-30 seconds per test
Cost	Free	Requires ChatGPT Plus
CI/CD	Yes	Not recommended (needs auth)
Catches	Component logic, display modes, themes, cross-host layout	Real MCP connection, LLM tool invocation, iframe sandbox, production resource loading

Use simulator tests for development and CI/CD. Use live tests before shipping, after major changes, or when debugging issues that only reproduce in the real host.

The Testing Pyramid for Claude Connectors

A Claude Connector built with sunpeak now has three test tiers:

Unit tests (pnpm test): Vitest, jsdom, fast, test component logic in isolation
Simulator e2e tests (pnpm test:e2e): Playwright against the local ChatGPT and Claude simulator, test display modes and themes, runs in CI/CD
Live tests (pnpm test:live): Playwright against real ChatGPT (with Claude coming soon), test real MCP protocol behavior and iframe rendering

Each tier catches different classes of bugs. Unit tests catch logic errors. Simulator tests catch rendering and layout issues across hosts and display modes. Live tests catch protocol and sandbox issues that only show up in the real host environment.

All three are pre-configured when you run sunpeak new. You don't need to set up Vitest, Playwright, or any test infrastructure yourself.

Host-Agnostic Architecture

The live test infrastructure is designed to support multiple hosts. The live fixture resolves the correct host page object based on the Playwright project name. All host-specific DOM interaction (selectors, login flow, settings navigation, iframe nesting) lives in per-host page objects that sunpeak maintains.
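
The lookup described above can be pictured as a small table keyed by project name. This is a sketch under stated assumptions: the `HostPageObject` shape is hypothetical, sunpeak's real page objects are internal, and it assumes the Playwright project name equals the host id.

```typescript
// Hypothetical per-host descriptor; the real page objects hold selectors,
// login flow, and iframe handling that are not shown here.
interface HostPageObject {
  hostId: string;
  baseUrl: string;
}

// Only ChatGPT is registered, since that is the only live host today.
const hostPageObjects: Record<string, HostPageObject> = {
  chatgpt: { hostId: 'chatgpt', baseUrl: 'https://chatgpt.com' },
};

// Resolve the page object for a Playwright project, failing loudly if unknown.
function resolveHostPageObject(projectName: string): HostPageObject {
  if (!Object.prototype.hasOwnProperty.call(hostPageObjects, projectName)) {
    throw new Error(`No page object registered for Playwright project "${projectName}"`);
  }
  return hostPageObjects[projectName];
}
```

Keeping the host-specific details behind one lookup is what lets the test body stay identical across hosts.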

Your test code is host-agnostic:

import { test, expect } from 'sunpeak/test';

test('my resource renders', async ({ live }) => {
  const app = await live.invoke('show me something');
  await expect(app.locator('h1')).toBeVisible();
});

This same test will run against any host that sunpeak supports. Today that's ChatGPT. When Claude live testing ships, add it with one line:

// tests/live/playwright.config.ts
export default defineLiveConfig({ hosts: ['chatgpt', 'claude'] });

No changes to your test files.

Getting Started

If you have an existing sunpeak project, update to v0.16.23 or later:

pnpm add sunpeak@latest && sunpeak upgrade

Create tests/live/playwright.config.ts:

import { defineLiveConfig } from 'sunpeak/test/config';
export default defineLiveConfig();

Add the test script to package.json:

{
  "scripts": {
    "test:live": "playwright test --config tests/live/playwright.config.ts"
  }
}

Write your first live test in tests/live/your-resource.spec.ts:

import { test, expect } from 'sunpeak/test';

test('my tool renders correctly in ChatGPT', async ({ live }) => {
  const app = await live.invoke('your prompt here');
  await expect(app.locator('your-selector')).toBeVisible({ timeout: 15_000 });
});

Start a tunnel, run pnpm test:live, and watch Playwright drive a real ChatGPT session.

New projects created with sunpeak new include all of this out of the box, with example live tests for every starter resource.

Claude Simulator for MCP Apps: Test Claude Apps Locally with sunpeak

Abe Wheeler — Mon, 02 Mar 2026 15:15:24 +0000

TL;DR: sunpeak v0.15 adds a local Claude simulator. Run sunpeak dev, pick Claude from the Host dropdown (or add the ?host=claude URL parameter), and test your MCP App in both Claude and ChatGPT from one dev server.

Until now, sunpeak's local simulator only replicated the ChatGPT runtime. If you wanted to test how your MCP App looked in Claude, you had to deploy it and connect it manually. That's fixed.

sunpeak v0.15 ships with first-class Claude support. The old ChatGPTSimulator is now just Simulator, and both Claude and ChatGPT are registered as host shells out of the box. Switch between them with a dropdown, a URL param, or a prop.

What Changed

The simulator is now multi-host. Instead of a single ChatGPT-specific component, sunpeak uses a pluggable host shell system. Each host registers its own conversation chrome, color palette, and theme behavior. The Simulator component renders whichever host you select.

Two hosts ship by default:

ChatGPT uses the familiar gray/white palette with the ChatGPT conversation layout.
Claude uses a warm beige/cream palette matching claude.ai, with Claude's conversation chrome.

Both hosts implement the core MCP App protocol, and a host can layer its own extras on top. ChatGPT supports host-specific features like file uploads and downloads beyond the standard. Claude doesn't have additional host APIs today, though sunpeak's Claude host shell does handle Claude's rendering quirks. If Claude adds host-specific capabilities in the future, they'll be built into this shell. Your resource component renders in both, wrapped in each host's chat UI, so you see exactly what your users will see.

How to Use It

If you already have a sunpeak project, update to v0.15 and migrate your CSS classes:

sunpeak upgrade

Then run the dev server:

sunpeak dev

Open localhost:3000. You will see a Host dropdown in the simulator sidebar. Select Claude to test your app in the Claude runtime. Select ChatGPT to switch back.

If you are starting fresh:

pnpm add -g sunpeak
sunpeak new
cd my-app && sunpeak dev

The scaffolded project uses the new Simulator component by default. Both hosts are available from the first run.

Host Selection

Three ways to pick a host:

Sidebar dropdown. The Host control appears in the sidebar when multiple hosts are registered. Click it to switch at runtime.

URL parameter. Add ?host=claude or ?host=chatgpt to the simulator URL. This is useful for bookmarking a specific host, linking teammates to a particular test configuration, or testing certain rendering states automatically.

defaultHost prop. Set the initial host in code:

import { Simulator } from 'sunpeak/simulator';

<Simulator
  simulations={simulations}
  defaultHost="claude"
/>

The default is chatgpt if you don't specify one.
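
To illustrate how a URL parameter and a default can combine, here is a minimal sketch. It assumes the URL parameter takes precedence over the `defaultHost` prop, which the post does not state. `resolveInitialHost` is a hypothetical helper, not a sunpeak export.

```typescript
type HostId = 'chatgpt' | 'claude';
const KNOWN_HOSTS: HostId[] = ['chatgpt', 'claude'];

// Type guard: accept only host ids the simulator knows about.
function isHostId(value: string | null): value is HostId {
  return value !== null && KNOWN_HOSTS.some((host) => host === value);
}

// Assumed precedence: a valid ?host= parameter wins, then the default.
function resolveInitialHost(url: string, defaultHost: HostId = 'chatgpt'): HostId {
  const param = new URL(url).searchParams.get('host');
  return isHostId(param) ? param : defaultHost;
}
```

Unknown values such as `?host=foo` fall through to the default instead of throwing, which keeps bookmarked links from breaking.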

Migrating from ChatGPTSimulator

If your project uses the old ChatGPTSimulator from sunpeak/chatgpt, it still works as an alias for the new Simulator. You don't have to migrate immediately, but the alias will be removed in the near future.

The change is small. In your dev entry point, replace:

// Before
import { ChatGPTSimulator } from 'sunpeak/chatgpt';

<ChatGPTSimulator simulations={simulations} />

with:

// After
import { Simulator } from 'sunpeak/simulator';

<Simulator simulations={simulations} />

Same simulations, same resource components, same test suite. The Simulator just adds the host dropdown and Claude's rendering behavior.

What Your App Looks Like in Claude

The Claude host shell wraps your resource component in Claude's conversation UI. The background uses Claude's warm beige and grey instead of ChatGPT's white and dark grey. User messages appear in Claude's bubble style. The toolbar and display mode controls (inline, fullscreen, picture-in-picture) work the same way.

The core data flow is shared across hosts. useToolData receives the tool output. useAppState syncs state back to the host. SafeArea handles safe rendering boundaries. These work the same in both Claude and ChatGPT.

Where hosts differ is in extras. ChatGPT supports host-specific features like file uploads and downloads that go beyond the MCP App standard. Claude has its own rendering quirks that sunpeak's host shell accounts for. If Claude adds host-specific APIs later, sunpeak will surface them through the same shell system.

The simulator lets you catch these differences locally instead of deploying to find out.

Extensible Host System

The host shell registry is open. If a new major MCP App host appears, sunpeak can add support without changing the Simulator component or your resource code. Each host registers itself with an id, a label, a conversation component, a theme function, and style variables. The simulator picks up all registered hosts automatically.
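
A registry like this can be sketched in a few lines. The version below is an illustration only: `HostShell`, `registerHost`, and `listHosts` are hypothetical names, the conversation component is left out to keep the example free of React, and the color values are placeholders rather than the real palettes.

```typescript
// Hypothetical host shell descriptor based on the fields listed above.
interface HostShell {
  id: string;
  label: string;
  // Returns CSS custom properties for the requested color scheme.
  styleVariables: (scheme: 'light' | 'dark') => Record<string, string>;
}

const hostRegistry: HostShell[] = [];

// Register a host shell, rejecting duplicate ids.
function registerHost(shell: HostShell): void {
  if (hostRegistry.some((host) => host.id === shell.id)) {
    throw new Error(`Host "${shell.id}" is already registered`);
  }
  hostRegistry.push(shell);
}

// What a host dropdown would need: ids and labels in registration order.
function listHosts(): { id: string; label: string }[] {
  return hostRegistry.map(({ id, label }) => ({ id, label }));
}

// Placeholder colors, not the real ChatGPT or Claude palettes.
registerHost({
  id: 'chatgpt',
  label: 'ChatGPT',
  styleVariables: (scheme) => ({ '--bg': scheme === 'dark' ? '#202020' : '#ffffff' }),
});
registerHost({
  id: 'claude',
  label: 'Claude',
  styleVariables: (scheme) => ({ '--bg': scheme === 'dark' ? '#2a2826' : '#f5f0e6' }),
});
```

Because the dropdown reads from `listHosts()`, adding a third host is one more `registerHost` call with no changes to the component that renders the list.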

For now, Claude and ChatGPT cover the two largest MCP App hosts.

Getting Started

pnpm add -g sunpeak
sunpeak new
cd my-app && sunpeak dev

Open localhost:3000, select Claude from the Host dropdown, and start building.

How to Build a Claude App covers architecture and code patterns for Claude.
ChatGPT App Tutorial walks through building a resource from scratch (same steps work for Claude).
Building One MCP App for ChatGPT and Claude covers the cross-platform story.
Testing guide covers Vitest and Playwright setup.
sunpeak documentation has the quickstart and full API reference.

Build once with sunpeak, test locally in both Claude and ChatGPT, and ship to every MCP App host.

sunpeak is all-in on MCP Apps

Abe Wheeler — Wed, 11 Feb 2026 17:18:22 +0000

MCP Apps now run in ChatGPT, Claude, Goose, and VS Code. That happened fast. Claude announced MCP App support on January 26, and ChatGPT followed on February 4. Two weeks, two major hosts, one standard.

TL;DR: sunpeak's APIs are built around the MCP App standard. ChatGPT and Claude-specific features are layered on top as optional imports. Write your app once, run it everywhere—even localhost.

Why MCP-App-First

When ChatGPT Apps launched in October 2025, they had their own proprietary SDK. Building for ChatGPT meant building only for ChatGPT.

That changed when OpenAI contributed to and aligned on MCP Apps as the open standard. The rendering model, the iframe sandbox, the UI functionality — all of it became portable. And as of February 2026, the major hosts actually implemented it.

sunpeak followed the same trajectory. We started as a ChatGPT App framework because ChatGPT was the only major host supporting embedded UIs. Now we're an MCP App framework, because the standard is real and the host list is growing.

What that means in practice: sunpeak's core APIs target the MCP App interface, not any single host. At the same time, sunpeak layers in the major host-specific functionality developers need to support differentiated platforms, much as React Native does for iOS and Android.

How It Works

sunpeak separates standard MCP App APIs from host-specific ones at the import level.

Core APIs come from the top-level sunpeak import. These work everywhere:

import { useToolData, useHostContext, useDisplayMode, AppProvider } from 'sunpeak';
import type { ResourceConfig } from 'sunpeak';

export const resource: ResourceConfig = {
  name: 'dashboard',
  description: 'Show analytics dashboard',
};

export function DashboardResource() {
  const { output } = useToolData();
  const context = useHostContext();
  const displayMode = useDisplayMode();

  return <div>{/* Your UI — runs in ChatGPT, Claude, Goose, VS Code */}</div>;
}

Host-specific APIs come from subpath imports. Right now that's sunpeak/chatgpt for ChatGPT-specific tooling:

import { ChatGPTSimulator, buildDevSimulations } from 'sunpeak/chatgpt';

The ChatGPT simulator, dev simulation builder, and any ChatGPT-only runtime features live here. They're first-class — not afterthoughts or community plugins — but they don't pollute your app code. Your resource components stay portable.

What Changed, What Didn't

If you're already using sunpeak, you'll notice v0.13 changes many APIs to be based on MCP App abstractions and nomenclature. Fortunately, a lot of Apps SDK knowledge is easily portable to the MCP App interface. Refer to the release notes for more specific migration instructions, and to the sunpeak docs for a complete overview of sunpeak and MCP Apps.

With these changes, your app renders in Claude (and others) today. The sunpeak dev simulator at localhost:6767 replicates the MCP App runtime that all hosts implement, so what works locally works in production across hosts.

The Host Landscape

Here's where MCP Apps run today:

ChatGPT — OpenAI contributed elements of the original ChatGPT Apps protocol to MCP and now supports the open standard alongside their existing SDK.
Claude — Anthropic's web and desktop clients render MCP Apps natively.
Goose — Block's open-source AI agent supports MCP Apps.
VS Code Insiders — Microsoft's editor renders MCP Apps in the chat sidebar.

More hosts will follow. The MCP App standard is under the Linux Foundation now, and the spec is actively developed at modelcontextprotocol/ext-apps.

Platform-Specific Features Are First-Class

MCP-App-first doesn't mean lowest-common-denominator. ChatGPT has features that Claude doesn't, and vice versa. sunpeak treats these as first-class extensions, not hacks.

For ChatGPT, that means full access to OpenAI's apps-sdk-ui component library, the ChatGPT simulator for local development, and any ChatGPT-specific runtime APIs. These are maintained alongside the core framework, tested in CI, and documented.

As Claude and other hosts ship their own platform-specific features, sunpeak will add corresponding subpath imports. The pattern scales: core stays portable, extensions stay organized.

Get Started

sunpeak is open source and free.

pnpm add -g sunpeak && sunpeak new

Your app works across ChatGPT, Claude, and every other MCP App host from the first line of code.

Documentation: guides, API reference, and tutorials
GitHub: source code and issue tracker
MCP App Framework: overview of sunpeak's capabilities

The Complete Guide to Testing ChatGPT Apps

Abe Wheeler — Tue, 03 Feb 2026 19:46:39 +0000

Testing ChatGPT Apps presents unique challenges. Your UI runs inside ChatGPT's runtime, responds to tool invocations, and adapts to multiple display modes and themes. Without proper testing infrastructure, you're deploying blind.

TL;DR: Use sunpeak's built-in testing with Vitest for unit tests (pnpm test) and Playwright for e2e tests (pnpm test:e2e). Define states in simulation files, test across display modes with createSimulatorUrl, and run everything in CI.

This guide covers everything you need to test ChatGPT Apps and MCP Apps with confidence.

Why Testing ChatGPT Apps is Different

ChatGPT Apps run in a specialized runtime environment. Your React components don't just render in a browser—they render inside ChatGPT's Apps SDK runtime with:

ChatGPT frontend state - Inline, picture-in-picture, and fullscreen display modes, light or dark theme, etc.
Tool invocations - ChatGPT calls your app's tools with specific inputs
Backend state - Various possible states for users and sessions in your database
Widget state - Persistent state that survives across invocations

Testing each combination manually isn't feasible; the combinatorics are brutal. You need automated testing that covers all these scenarios.

Setting Up Your Testing Environment

If you're using the sunpeak framework, testing is pre-configured. Start with:

pnpm add -g sunpeak && sunpeak new
cd my-app

Your project includes:

Vitest configured with jsdom, React Testing Library, and jest-dom matchers
Playwright configured to test against the ChatGPT simulator
Simulation files in tests/simulations/ for deterministic states

Unit Testing with Vitest

Unit tests validate individual components in isolation. Run them with:

pnpm test

Create tests alongside your components in src/resources with the .test.tsx extension:

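import { describe, it, expect } from 'vitest';
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
// Assumes this test file sits next to counter-resource in src/resources
import { Counter } from './counter-resource';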

describe('Counter', () => {
  it('renders the initial count', () => {
    render(<Counter />);
    expect(screen.getByText('0')).toBeInTheDocument();
  });

  it('increments when button is clicked', async () => {
    render(<Counter />);
    await userEvent.click(screen.getByRole('button', { name: /increment/i }));
    expect(screen.getByText('1')).toBeInTheDocument();
  });
});

Unit tests run fast and catch component-level bugs early. They're ideal for testing:

Component rendering logic
User interactions within a component
Props and state handling

End-to-End Testing with Playwright

E2E tests validate your app running in the ChatGPT simulator. Run them with:

pnpm test:e2e

Create tests in tests/e2e/ with the .spec.ts extension:

import { test, expect } from '@playwright/test';
import { createSimulatorUrl } from 'sunpeak/chatgpt';

test('counter increments in fullscreen mode', async ({ page }) => {
  await page.goto(createSimulatorUrl({
    simulation: 'counter-show',
    displayMode: 'fullscreen',
    theme: 'dark',
  }));

  await page.getByRole('button', { name: /increment/i }).click();
  await expect(page.getByText('1')).toBeVisible();
});

The createSimulatorUrl utility generates URLs with your test configuration:

simulation - Your simulation file name (sets initial state)
displayMode - inline, pip, or fullscreen (tests display adaptation)
theme - light or dark (tests theme handling)
deviceType - mobile, tablet, desktop, or unknown (tests responsive behavior)
touch / hover - Enable or disable touch/hover capabilities
safeAreaTop, safeAreaBottom, etc. - Simulate device notches and insets
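
For intuition, a URL builder of this kind can be as small as the sketch below. It is not how createSimulatorUrl is implemented: the function name `buildSimulatorUrl`, the query parameter names, and the base URL are all assumptions made for illustration.

```typescript
// Hypothetical subset of the options listed above.
interface SimulatorUrlOptions {
  simulation: string;
  displayMode?: 'inline' | 'pip' | 'fullscreen';
  theme?: 'light' | 'dark';
  deviceType?: 'mobile' | 'tablet' | 'desktop' | 'unknown';
}

// Encode the options as query parameters on an assumed dev server URL.
function buildSimulatorUrl(
  options: SimulatorUrlOptions,
  base: string = 'http://localhost:3000/'
): string {
  const url = new URL(base);
  url.searchParams.set('simulation', options.simulation);
  if (options.displayMode) url.searchParams.set('displayMode', options.displayMode);
  if (options.theme) url.searchParams.set('theme', options.theme);
  if (options.deviceType) url.searchParams.set('deviceType', options.deviceType);
  return url.toString();
}

const exampleUrl = buildSimulatorUrl({
  simulation: 'counter-show',
  displayMode: 'fullscreen',
  theme: 'dark',
});
// exampleUrl: http://localhost:3000/?simulation=counter-show&displayMode=fullscreen&theme=dark
```

Encoding the whole test state in the URL is what makes each scenario bookmarkable and reproducible from a single `page.goto` call.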

Creating Simulation Files

Simulation files define deterministic states for testing. Create them in tests/simulations/{resource-name}/:

{
  "userMessage": "Show me a counter starting at 5",
  "tool": {
    "name": "show_counter",
    "description": "Displays an interactive counter",
    "inputSchema": {
      "type": "object",
      "properties": {
        "initialCount": { "type": "number" }
      }
    }
  },
  "callToolRequestParams": {
    "arguments": { "initialCount": 5 }
  },
  "callToolResult": {
    "content": [{ "type": "text", "text": "Counter displayed" }],
    "structuredContent": {
      "count": 5
    }
  }
}

This simulation:

Shows userMessage in the simulator chat interface
Defines the tool with its name and input schema
Sets callToolRequestParams with mock input accessible via useToolInput()
Provides callToolResult with mock data passed to your component via useWidgetProps()
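
If you want type checking for these fixtures, the JSON above maps onto a TypeScript shape like the one sketched here. The `SimulationFile` interface is inferred from the example and is not an official sunpeak type. Treating all four top-level keys as required is an assumption.

```typescript
// Shape inferred from the example simulation file; not an official type.
interface SimulationFile {
  userMessage: string;
  tool: {
    name: string;
    description?: string;
    inputSchema: Record<string, unknown>;
  };
  callToolRequestParams: { arguments: Record<string, unknown> };
  callToolResult: {
    content: { type: string; text?: string }[];
    structuredContent?: Record<string, unknown>;
  };
}

const REQUIRED_KEYS: (keyof SimulationFile)[] = [
  'userMessage',
  'tool',
  'callToolRequestParams',
  'callToolResult',
];

// Return the top-level keys missing from a parsed fixture (empty if none).
function missingSimulationKeys(value: unknown): string[] {
  if (typeof value !== 'object' || value === null) {
    return [...REQUIRED_KEYS];
  }
  const record = value as Record<string, unknown>;
  return REQUIRED_KEYS.filter((key) => !(key in record));
}
```

Running a check like this over every fixture in a unit test catches a malformed simulation before an E2E run fails for an unrelated-looking reason.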

Use simulations to test specific states without manual setup:

// Test the counter with structuredContent.count = 5
await page.goto(createSimulatorUrl({ simulation: 'counter-show' }));
await expect(page.getByText('5')).toBeVisible();

// Test a different initial state
await page.goto(createSimulatorUrl({ simulation: 'counter-initial' }));
await expect(page.getByText('0')).toBeVisible();

Testing Across Display Modes

ChatGPT Apps appear in three display modes. Test all of them:

const displayModes = ['inline', 'pip', 'fullscreen'] as const;

for (const displayMode of displayModes) {
  test(`renders correctly in ${displayMode} mode`, async ({ page }) => {
    await page.goto(createSimulatorUrl({
      simulation: 'counter-show',
      displayMode,
    }));

    await expect(page.getByRole('button')).toBeVisible();
  });
}

Each mode has different constraints:

Inline - Limited height, embedded in chat
Picture-in-picture - Floating window, can be repositioned
Fullscreen - Maximum space, modal overlay

Your app should adapt gracefully to each.

Testing Theme Adaptation

Test both light and dark themes:

test('adapts to dark theme', async ({ page }) => {
  await page.goto(createSimulatorUrl({
    simulation: 'counter-show',
    theme: 'dark',
  }));

  // Verify dark theme styles are applied
  const button = page.getByRole('button');
  await expect(button).toHaveCSS('background-color', 'rgb(255, 184, 0)');
});

Running Tests in CI/CD

Add testing to your GitHub Actions workflow:

name: Test
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v2
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'pnpm'

      - run: pnpm install
      - run: pnpm test
      - run: pnpm exec playwright install --with-deps
      - run: pnpm test:e2e

Playwright tests automatically:

Start the sunpeak dev server
Wait for it to be ready
Run tests against the ChatGPT simulator
Shut down when complete
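This start, wait, and shut down behavior is what Playwright's `webServer` option provides. A minimal sketch of such a config follows. The `pnpm dev` command, the `./tests/e2e` directory, and port 6767 are assumptions based on other sunpeak examples, and a scaffolded project's actual config may differ.

```typescript
// playwright.config.ts (sketch; command, port, and testDir are assumptions)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./tests/e2e", // hypothetical location of the e2e tests
  webServer: {
    command: "pnpm dev",                  // starts the dev server before the run
    url: "http://localhost:6767",         // Playwright polls this URL until it responds
    reuseExistingServer: !process.env.CI, // reuse a running server locally, start fresh in CI
    timeout: 60_000,                      // give up if the server is not ready in 60s
  },
  use: {
    baseURL: "http://localhost:6767",
  },
});
```

Playwright stops the server it started once the test run finishes.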

Debugging Failing Tests

When tests fail, use these debugging techniques:

Playwright Debug Mode

pnpm test:e2e --ui

Opens a visual debugger where you can:

Step through tests
Inspect the DOM at each step
See screenshots and traces

Vitest Verbose Output

pnpm test --reporter=verbose

Shows detailed output including:

Individual assertion results
Component render output
Error stack traces

Screenshot on Failure

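Playwright can capture a screenshot whenever a test fails. This requires screenshot: 'only-on-failure' in the use section of your Playwright config (Playwright's own default is 'off'). Find the captures in test-results/.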

Testing Best Practices

One assertion per test. Keep tests focused and easy to debug:

// Good: focused test
test('increment button is visible', async ({ page }) => {
  await page.goto(createSimulatorUrl({ simulation: 'counter-show' }));
  await expect(page.getByRole('button', { name: /increment/i })).toBeVisible();
});

// Avoid: multiple unrelated assertions
test('counter works', async ({ page }) => {
  // Too many things being tested at once
});

Test behavior, not implementation. Focus on what users see:

// Good: tests user-visible behavior
await expect(page.getByText('5')).toBeVisible();

// Avoid: tests implementation details
await expect(component.state.count).toBe(5);

Use descriptive test names. Make failures self-explanatory:

// Good: clear failure message
test('displays error message when API call fails', ...)

// Avoid: vague description
test('handles error', ...)

Clean up between tests. Reset state to avoid test pollution:

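import { afterEach, vi } from 'vitest';

afterEach(() => {
  vi.restoreAllMocks();  // undo spies and mocks created during the test
  vi.unstubAllGlobals(); // undo any vi.stubGlobal() overrides
});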

Next Steps

Testing is essential for shipping reliable ChatGPT Apps and MCP Apps. With sunpeak's testing infrastructure, you can:

Run unit tests with Vitest for fast feedback
Run e2e tests with Playwright for full integration coverage
Test across display modes, themes, and device types
Integrate testing into your CI/CD pipeline

Get started with sunpeak:

pnpm add -g sunpeak && sunpeak new

Why You Need a ChatGPT App Framework

Abe Wheeler — Thu, 29 Jan 2026 00:41:27 +0000

ChatGPT Apps are a new UI paradigm: your code renders directly inside the ChatGPT conversation. But building them from scratch means solving the same infrastructure problems every time you start a project. A ChatGPT App framework changes that.

TL;DR: Sunpeak is the first ChatGPT App framework. It gives ChatGPT App developers the same developer experience that Next.js gives web developers: a simulator, components, CLI scaffolding, testing, and deployment.

What Is a ChatGPT App Framework?

A ChatGPT App framework provides the development infrastructure for building applications that use OpenAI's Apps SDK runtime. The Apps SDK defines how ChatGPT Apps work: the protocol, the rendering model, the communication between your MCP server and ChatGPT. A framework builds on top of that to give you the tooling you actually need to develop, test, and ship apps.

Think of it like the relationship between React and Next.js. React is the rendering library. Next.js gives you routing, server-side rendering, a dev server, and deployment. You can build a React app without Next.js, but most teams don't, because the framework handles the infrastructure so you can focus on your product.

A ChatGPT App framework does the same thing for the Apps SDK.

The Pain of Building Without a Framework

If you've tried building a ChatGPT App from the official resources alone, you've hit these problems:

No local testing. The only way to see your app render is to connect it to the real ChatGPT, which requires a paid ChatGPT Plus or Team subscription with developer mode. Every change means tunneling your local server, refreshing ChatGPT, and waiting for the round trip.

No component library. OpenAI provides apps-sdk-ui, a low-level React component library. But it gives you primitives, not production-ready components. You're rebuilding common patterns from scratch every time.

No project structure. Every project starts from zero. There's no standard way to organize your resources, tools, and configuration. You're making structural decisions before you've written a line of product code.

No testing story. You can't run automated tests against the ChatGPT interface. There's no way to verify your app renders correctly in CI. Manual testing through the ChatGPT UI is the only option.

No deployment pipeline. Getting your app from local development to production means manually configuring an MCP server, setting up hosting, and wiring everything together.

What a ChatGPT App Framework Gives You

Sunpeak maps each of those pain points to a concrete solution:

Local runtime simulator. Run sunpeak dev and open localhost:6767. You get a full ChatGPT simulator that renders your app exactly as ChatGPT would, with no paid account, no tunneling, and no round trips.

sunpeak dev
# Simulator running at http://localhost:6767
# MCP server running at http://localhost:6766

Apps SDK UI components. Sunpeak includes production-ready components built on top of OpenAI's apps-sdk-ui. Cards, carousels, forms, and layouts, so you're not starting from scratch.

CLI scaffolding. Run sunpeak new and get a working project with dependencies installed, configuration set up, and a starter app ready to modify.

pnpm add -g sunpeak && sunpeak new

Testing support. Write tests with Vitest and Playwright that run against the simulator. Verify your UI renders correctly in CI without connecting to ChatGPT.

Deployment via Resource Repository. Sunpeak's Resource Repository gives you a deployment target for your app's resources, so you can ship without manually wiring MCP servers.

Building With vs. Without a Framework

Here's what the developer workflow looks like side by side:

Without a framework:

# Set up a new project
mkdir my-app && cd my-app
npm init -y
npm install @modelcontextprotocol/sdk express
# Manually configure MCP server...
# Manually set up React rendering...
# Manually build UI components...
# Set up ngrok tunnel to test in ChatGPT...
# Hope it renders correctly...

With sunpeak:

# Set up a new project
pnpm add -g sunpeak && sunpeak new
cd my-app && pnpm install

# Develop with instant feedback
sunpeak dev
# Open localhost:6767, done.

The difference isn't just fewer commands. It's fewer decisions, fewer things to debug, and fewer things that can go wrong before you've started building your actual product.

When You Don't Need a Framework

A framework isn't always the right choice.

If you're building a simple server-side-only MCP tool that doesn't render any UI in ChatGPT, you don't need sunpeak. A plain MCP server with a few tool handlers is straightforward to set up with just the @modelcontextprotocol/sdk package.

If you're writing a one-off script or experimenting with the protocol, going framework-free is fine.

But the moment you're building a UI that renders inside ChatGPT, especially one you plan to maintain and ship to users, a framework pays for itself immediately.

Get Started

Sunpeak is open source and free to use.

Documentation: guides, API reference, and tutorials
GitHub: source code and issue tracker
ChatGPT App Framework: overview of sunpeak's capabilities

pnpm add -g sunpeak && sunpeak new

Storybook for ChatGPT Apps

Abe Wheeler — Wed, 14 Jan 2026 18:31:42 +0000

If you've built React applications, you probably know Storybook—the tool that lets you develop UI components in isolation, share them with your team, and iterate without spinning up your entire app. Today we're bringing that same workflow to ChatGPT Apps.

The Problem with ChatGPT App Development

Building ChatGPT Apps has a painful feedback loop. To see your changes, you need to:

Build your resources
Deploy / run your MCP server
Refresh your ChatGPT connector
Start a new ChatGPT conversation
Create the right conversation state
Configure the perfect state in your database to illustrate a single scenario

That's a lot of friction for checking if a button is the right shade of blue.

Worse, sharing your work-in-progress with teammates or stakeholders means they need access to your MCP server,
mastery of your technical data model, and the patience to navigate through the same steps.

Enter the sunpeak simulator

Local Development

The flagship sunpeak simulator was originally for local development only.
In the simulator, each resource in your app gets its own preview.
Switch between inline, fullscreen, and pip display modes instantly.
Test light and dark themes. No ChatGPT account required.

sunpeak dev

This starts a local development server with hot reloading. Every save updates the preview immediately.

Hosted Storybook

The Sunpeak Resource Repository now hosts the sunpeak simulator to run your ChatGPT App resources in an isolated environment.
Think of it as a higher-level Storybook for ChatGPT Apps: you can preview every resource, test different display modes, and share a link with your teammates.

Once you push your resources to the repository, your teammates can try them out at the provided link:

sunpeak push -t design-review

Pushing 4 resource(s) to repository "Sunpeak-AI/sunpeak"...
Tags: design-review

✓ Pushed albums, 1 simulation(s), tags: design-review
  https://app.sunpeak.ai/resources/5e57bbe6-b4a5-4895-9f10-81b667740b78
✓ Pushed carousel, 1 simulation(s), tags: design-review
  https://app.sunpeak.ai/resources/f5304085-46d2-4b96-9173-ad865523862b
✓ Pushed map, 1 simulation(s), tags: design-review
  https://app.sunpeak.ai/resources/95087582-be0a-45b2-80ec-16d439b380eb
✓ Pushed review, 3 simulation(s), tags: design-review
  https://app.sunpeak.ai/resources/c329195b-23ea-4577-8116-32b52de37f13

Share your resource URLs with your team. Designers can review the UI without touching code. Product managers can validate the flow without configuring MCP servers. Engineers can debug tool responses in isolation.

Collaborate on Behavior

The simulator isn't just for visuals or static states. You can mock tool inputs and outputs to test how your app responds to different states:

What does the app look like when a tool returns an error?
How does the UI handle a slow response?
Does the loading state feel right?

Configure these scenarios once and share them with your team for feedback.
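For the error case, one option is a mocked tool result that carries MCP's `isError` flag. This is only a sketch: `ToolResult` and `toErrorResult` are hypothetical names, the error message is made up, and whether the simulator renders `isError` results in any special way is an assumption.

```typescript
// Hypothetical shape for a mocked tool result; isError follows MCP's tool-result convention.
interface ToolResult {
  content: { type: "text"; text: string }[];
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// Hypothetical helper: build an error-state result to pair with an existing simulation.
function toErrorResult(message: string): ToolResult {
  return {
    content: [{ type: "text", text: message }],
    isError: true, // no structuredContent: the component must handle missing data
  };
}

const failed = toErrorResult("Upstream API timed out");
console.log(JSON.stringify(failed, null, 2));
```

Saving a result like this as its own simulation would give reviewers a stable link to the error state.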

Why This Matters

Storybook transformed frontend development by making components shareable and testable in isolation. ChatGPT Apps deserve the same treatment.

With the sunpeak simulator, you can:

Iterate faster: See changes instantly without the deploy-refresh-navigate dance
Collaborate earlier: Get feedback on designs before they hit production
Test edge cases: Mock different tool responses without backend changes
Document behavior: Create shareable previews that serve as living documentation

Get Started

The simulator is available now in the Sunpeak Resource Repository. If you're already using sunpeak, run sunpeak push to upload your resources to the repository.

New to sunpeak? Check out the quickstart guide to get your first ChatGPT App running in minutes.

MCP Needs a Browser

Abe Wheeler — Mon, 05 Jan 2026 20:29:13 +0000

MCP isn’t the perfect protocol, but I’ll leave it to other people to complain about it. It has adoption and that is all that matters—our systems can be connected. Sometimes they are connected. But MCP tool use has not remotely broken into the mainstream. Why?

The consumer experience around MCP is horrendous.

Discovery: Imagine your parents proactively and willingly taking on the task of “connecting to the Facebook MCP server”, even through relatively simple UIs. The act of searching and the subject of the search are essentially dealbreakers for non-technical users.

Even if users exceed the necessary technical bar, and even if users know exactly what they want done, they don’t know how to do it. They’re welcome to search the many lists of lists of lists of MCP servers, but it’s a lot of work and unlikely to surface trustworthy, stable results.

For real, production MCP use today, we essentially rely on developers to proactively integrate MCP servers in the background so we can unwittingly use these servers via the web servers of products we’re already using. Imagine being able to use any given website only after a Google engineer found time & motivation to integrate it into google.com.

MCP needs a search engine & proactive connection embedded in the model.

Connection: Imagine if, every time you went to a website, you had to read a security notice, read a privacy notice, approve a terms & conditions popup, and review the structure of the JSON payloads the website will be sending. This has become more true for consumers over time (thanks, EU), but the actual browser-server connection itself remains virtually permissionless. MCP servers are servers, not client-side applications. Connecting to a server should be as easy as entering a URL in the browser.

Obviously, seamless MCP server connection has major security implications. The models & their MCP clients need to be architected to be more sandboxed and trust-less. Ultimately, the protection of the user & user data falls almost entirely within the purview of the model provider. They’ve got the users, the data, and the access to protect, and the new paradigms & architectures will have to flow from them.

MCP needs to make connection more like a browser than an app store. This requires substantial protections built into the model.

Use: Imagine if, on a webpage, you had to manually trigger the correct sequence of API calls to deliver the proper user experience. With MCP, models are left with that impossible task. Invisible dependencies, edge cases, the permutations & combinatorics of all possible tool calls. Such a task is nontrivial even for relatively simple, newer products, let alone massive, complex, legacy systems and all of the unintuitive tech debt they’ve accrued.

Further, imagine if, in using a webpage, every input to and output from that page had to pass through a model. Would you use such a webpage to wire rent money? Models are non-deterministic. They can be wrong (less and less over time, but the possibility never goes away). In most systems, there’s at least one action that you want to be direct-to-server and 100% deterministic.

MCP needs to let server providers own parts of the client within the model.

All of the fundamental blockers to MCP have one thing in common: they’re totally dependent on the model provider to implement. Fortunately, OpenAI is on the right track.

ChatGPT Apps bring MCP one step closer to having a “browser”, but they don’t go all the way. I suspect that this is the direction that we’re heading. As with all macro trends, it will take us a while to get there.

MCP is very young, ChatGPT Apps are younger, and the Apps of today are only weeks old. Everything will get a LOT better. We’re building sunpeak to help. https://sunpeak.ai is the ChatGPT App framework that helps developers quickstart, build, test, and ship ChatGPT Apps. Please star us on GitHub!

Introducing the Sunpeak Resource Repository

Abe Wheeler — Tue, 23 Dec 2025 19:35:25 +0000

Today we're launching the Sunpeak Resource Repository—ECR for ChatGPT Apps.

Why Decouple Your App from Your MCP Server?

ChatGPT Apps are built on MCP servers, but your UI resources don't need to live alongside your server code. Decoupling them provides:

Generic MCP servers: Keep your production MCP server generic & largely client agnostic
Independent lifecycles: Clearly indicate which code changes and version tags require ChatGPT App submission reviews and which are entirely MCP server-side
Team collaboration: Designers and frontend devs can push UI changes without touching server infrastructure and vice versa
Independent deployments: Update your UI without redeploying your server

Getting Started

Authenticate

sunpeak login

This opens your browser for secure OAuth authentication. Sunpeak stores local configuration in ~/.sunpeak/.

Push Resources

After building your resources with sunpeak build, push them to the repository:

sunpeak push

Tag your resources for versioning:

sunpeak push -t v1.0.0 -t staging

Pull Resources

Retrieve resources by tag from any directory:

sunpeak pull -r myorg/my-app -t prod

This downloads the JavaScript bundles and metadata files, ready for deployment.

Common Workflows

Rollback to a previous version:

sunpeak pull -r myorg/my-app -t v1.0.0
sunpeak deploy # Shorthand for sunpeak push -t prod

Promote staging to production:

sunpeak pull -r myorg/my-app -t staging
sunpeak deploy

Get Started

Ready to try it? Head to sunpeak.ai to learn more, or jump straight into the web application to create your account.

Ship a ChatGPT App in 2 commands

Abe Wheeler — Wed, 17 Dec 2025 13:37:53 +0000

With sunpeak, you can start and ship a ChatGPT App with two commands:

Initialize your project:
```
pnpm dlx sunpeak new
```
Inside your project, start your MCP server:
```
pnpm mcp
```

Your ChatGPT App UI and mock data server are now up and running.

If you’re running the server on your local machine, you’ll need to expose that MCP server so ChatGPT can access it. Do so with a free account from ngrok:

ngrok http 6766

Lastly, you need to point ChatGPT to your new app. From your ChatGPT account, proceed to: User > Settings > Apps & Connectors > Create

You need to be in developer mode to add your App, which requires a paid account. If you don’t have a paid account, you can just develop your App locally with pnpm dev instead of pnpm mcp.

You can now connect ChatGPT to the ngrok Forwarding URL at the /mcp path (e.g. https://your-random-subdomain.ngrok-free.dev/mcp). Your App is now connected to ChatGPT! Send /sunpeak show carousel to ChatGPT to see your UI in action!