<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Michael Egberts</title>
    <description>The latest articles on Forem by Michael Egberts (@megberts).</description>
    <link>https://forem.com/megberts</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3842727%2F72c75668-5f03-4d45-8287-c5aaa231ad58.jpg</url>
      <title>Forem: Michael Egberts</title>
      <link>https://forem.com/megberts</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/megberts"/>
    <language>en</language>
    <item>
      <title>Which Gemma 4 Variant Should Power Your MCP Agent?</title>
      <dc:creator>Michael Egberts</dc:creator>
      <pubDate>Sat, 16 May 2026 14:35:48 +0000</pubDate>
      <link>https://forem.com/megberts/which-gemma-4-variant-should-power-your-mcp-agent-4gkn</link>
      <guid>https://forem.com/megberts/which-gemma-4-variant-should-power-your-mcp-agent-4gkn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This is a submission for the Gemma 4 Challenge: Write About Gemma 4&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;I’m writing this from a phone, on vacation. That’s not a flex — it’s the point.&lt;/p&gt;

&lt;p&gt;I run an MCP server in production: &lt;a href="https://websitepublisher.ai" rel="noopener noreferrer"&gt;WebsitePublisher.ai&lt;/a&gt;, 55+ tools, 9 AI platforms connected. This afternoon I opened Google AI Studio on my phone, selected Gemma 4 26B, gave it our tool schemas, and asked it to build a bakery website. It returned six structured tool calls. I executed them. The site went live.&lt;/p&gt;

&lt;p&gt;No laptop. No terminal. No IDE. Just a phone, a model, and a protocol.&lt;/p&gt;

&lt;p&gt;That experience crystallized something I’ve been thinking about: Gemma 4 doesn’t ship one model — it ships four, each sized for a different deployment reality. The question isn’t whether open-weight models can power MCP agents. It’s &lt;strong&gt;which Gemma 4 variant fits which part of your agent stack.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What is MCP?
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) is an open, JSON-RPC-based standard that lets AI models call external tools through a universal interface. Instead of each AI platform building its own proprietary integrations, you build one MCP server and every compatible AI client can use it.&lt;/p&gt;

&lt;p&gt;Our MCP server exposes tools like &lt;code&gt;create_page&lt;/code&gt;, &lt;code&gt;upload_asset&lt;/code&gt;, &lt;code&gt;create_record&lt;/code&gt;, &lt;code&gt;configure_form&lt;/code&gt;, and &lt;code&gt;execute_integration&lt;/code&gt;. When a model connects, it can create web pages, manage structured data, handle form submissions, and trigger third-party services — all through standardized tool calls.&lt;/p&gt;

&lt;p&gt;Any model that can produce structured output can be an MCP client. The question is: &lt;em&gt;how well&lt;/em&gt; does it handle the work once connected?&lt;/p&gt;
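&lt;p&gt;Concretely, an MCP tool call travels as a JSON-RPC 2.0 request. Here is a minimal Python sketch of the envelope; the &lt;code&gt;create_page&lt;/code&gt; arguments are illustrative, not our exact production schema:&lt;/p&gt;

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build the JSON-RPC 2.0 envelope MCP uses for tools/call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Illustrative arguments -- the field names are not our exact schema.
payload = make_tool_call(1, "create_page", {
    "slug": "index",
    "title": "Golden Crust Bakery",
})
print(payload)
```

Any client that can emit and parse this envelope can talk to any MCP server, which is the whole point of the protocol.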




&lt;h2&gt;
  
  
  The Gemma 4 Lineup
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;E2B&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;E4B&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;26B A4B&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;31B Dense&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active params&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~2B&lt;/td&gt;
&lt;td&gt;~4B&lt;/td&gt;
&lt;td&gt;3.8B (26B total, MoE)&lt;/td&gt;
&lt;td&gt;31B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audio input&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Min RAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~4 GB&lt;/td&gt;
&lt;td&gt;~8 GB&lt;/td&gt;
&lt;td&gt;~16 GB&lt;/td&gt;
&lt;td&gt;~24 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Runs on&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Phone, RPi&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;td&gt;Dev workstation&lt;/td&gt;
&lt;td&gt;GPU server&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All Apache 2.0. No usage caps, no MAU thresholds.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Test: Gemma 4 26B Builds a Website from a Phone
&lt;/h2&gt;

&lt;p&gt;Here’s exactly what happened.&lt;/p&gt;

&lt;p&gt;I opened Google AI Studio on my iPhone, selected &lt;strong&gt;Gemma 4 26B A4B IT&lt;/strong&gt;, and pasted a prompt containing five of our MCP tool schemas (&lt;code&gt;create_page&lt;/code&gt;, &lt;code&gt;create_entity&lt;/code&gt;, &lt;code&gt;create_record&lt;/code&gt;, &lt;code&gt;configure_form&lt;/code&gt;, &lt;code&gt;list_pages&lt;/code&gt;) along with this instruction:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“A user says: Build me a simple landing page for my bakery called ‘Golden Crust’. Include a short intro, three signature breads, and a contact form. Respond with the exact sequence of MCP tool calls.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Gemma 4 returned six tool calls in valid JSON:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;create_entity&lt;/strong&gt; — defined a “bread” data model with name and description fields&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;create_record&lt;/strong&gt; ×3 — added Sourdough, French Baguette, and Honey Whole Wheat&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;create_page&lt;/strong&gt; — generated a full HTML landing page with inline CSS, product listings, and a contact form&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;configure_form&lt;/strong&gt; — set up the contact form with name, email, and message fields&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every tool call used the correct parameter structure. The ordering was logical: data model first, then records, then the page that references them, then the form configuration. The HTML included sensible styling, warm bakery colors, and properly structured sections.&lt;/p&gt;
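&lt;p&gt;That ordering can be checked mechanically. Here is the returned plan reduced to data, with a validator that rejects any plan where a record is created before its entity exists. The tool names match the schemas above; the validator itself is my own illustration:&lt;/p&gt;

```python
# The six calls Gemma 4 returned, reduced to (tool, entity) pairs.
plan = [
    ("create_entity", "bread"),
    ("create_record", "bread"),
    ("create_record", "bread"),
    ("create_record", "bread"),
    ("create_page", None),
    ("configure_form", None),
]

def dependencies_ok(calls):
    """Every create_record must follow the create_entity it targets."""
    defined = set()
    for tool, entity in calls:
        if tool == "create_entity":
            defined.add(entity)
        elif tool == "create_record" and entity not in defined:
            return False
    return True

print(dependencies_ok(plan))
```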

&lt;p&gt;I copied the tool calls into Claude (the AI assistant I use for MCP execution), ran them against our server, and the site went live at &lt;a href="https://gemma-test.websitepublisher.ai" rel="noopener noreferrer"&gt;gemma-test.websitepublisher.ai&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total time from prompt to live website: under 10 minutes. From a phone. Over 5G.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Test Actually Proves
&lt;/h2&gt;

&lt;p&gt;Let me be precise about what this demonstrates and what it doesn’t.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It proves:&lt;/strong&gt; Gemma 4 26B can parse MCP tool schemas, reason about task decomposition, produce correctly structured tool calls, and sequence them in a logical order — all without any fine-tuning on our specific tools. This is zero-shot tool use on a real production API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It doesn’t prove:&lt;/strong&gt; that Gemma 4 can handle a live MCP connection autonomously. In this test, I manually copied the tool calls and executed them. The model generated the plan; I was the middleware.&lt;/p&gt;

&lt;p&gt;That distinction matters, and it’s where the variant comparison gets interesting.&lt;/p&gt;




&lt;h2&gt;
  
  
  Mapping Variants to MCP Agent Roles
&lt;/h2&gt;

&lt;p&gt;Based on running MCP across 9 AI platforms and watching models of every size class interact with our tools, here’s how I’d think about placing each Gemma 4 variant:&lt;/p&gt;

&lt;h3&gt;
  
  
  E2B: The Front Door
&lt;/h3&gt;

&lt;p&gt;With ~2B active parameters, E2B fits as the &lt;em&gt;trigger&lt;/em&gt;: the component that understands intent and dispatches a single tool call. A voice command on a phone — “publish my latest blog post” — parsed and routed to the right MCP tool. One intent, one call, one response.&lt;/p&gt;

&lt;p&gt;The native audio input is the differentiator. For voice-triggered MCP agents on battery-constrained devices, this is the size class that makes sense.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely sweet spot:&lt;/strong&gt; Single-tool dispatcher. Voice-triggered agent entry point.&lt;br&gt;
&lt;strong&gt;Likely limitation:&lt;/strong&gt; Multi-step chains where context from earlier calls matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  E4B: The Local Workhorse
&lt;/h3&gt;

&lt;p&gt;This is where local MCP agents become genuinely useful. Running on any modern laptop, handling single-step tool calls with good reliability.&lt;/p&gt;

&lt;p&gt;Based on what I’ve seen at this parameter range: straightforward create-and-deploy loops work well. Where models this size show limits is context-dependent sequences — “build a five-page site with consistent navigation” requires maintaining consistency across multiple creation steps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely sweet spot:&lt;/strong&gt; Local development agent. Content creation. Moderate single-step tool calls.&lt;br&gt;
&lt;strong&gt;Likely limitation:&lt;/strong&gt; Multi-page orchestration requiring consistency across 4+ sequential calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  26B A4B: The Efficiency Sweet Spot
&lt;/h3&gt;

&lt;p&gt;This is the variant I tested. And it delivered.&lt;/p&gt;

&lt;p&gt;Six sequential tool calls, all correctly structured, logically ordered, with a coherent HTML output that referenced the data model it had just created. That’s not trivial — it requires the model to hold its own plan in context and execute against it consistently.&lt;/p&gt;

&lt;p&gt;The MoE architecture (activating only 3.8B parameters per token while drawing on 26B total) and the 256K context window make this variant particularly suited for MCP work. Tool schemas are large — our 55+ tools consume significant context before the model even starts reasoning. The 256K window gives comfortable headroom.&lt;/p&gt;

&lt;p&gt;But the bakery test was deliberately simple. Our MCP server also exposes 13 e-commerce integrations — product catalogs, shopping carts, checkout flows, payment processing via Stripe or Mollie, invoice generation, inventory tracking, and more. Building a full webshop means orchestrating these proven software building blocks in sequence: the AI picks the right pieces and combines them into a working application. We call this &lt;strong&gt;wave coding&lt;/strong&gt; — not prompting and praying like vibe coding, but riding deliberate waves of AI-assembled, production-tested components. Each wave builds on the last. That’s where a model like the 26B earns its place: enough reasoning depth to orchestrate 6-8 integration calls reliably, enough context to hold the full picture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proven sweet spot:&lt;/strong&gt; Multi-step tool orchestration. Production agent server. The “right answer” for most self-hosted MCP deployments.&lt;br&gt;
&lt;strong&gt;Likely limitation:&lt;/strong&gt; Highly creative or ambiguous tasks where raw reasoning power matters more than efficiency.&lt;/p&gt;

&lt;h3&gt;
  
  
  31B Dense: The Precision Architect
&lt;/h3&gt;

&lt;p&gt;Every token touches all 31B parameters — no routing, no sparsity. Slower, heavier, but the strongest reasoner in the family.&lt;/p&gt;

&lt;p&gt;For MCP agent work, this class earns its compute in two scenarios: architecture-level planning where the &lt;em&gt;sequence&lt;/em&gt; of tool calls matters as much as individual calls, and fine-tuning for domain-specific tool patterns. The dense architecture makes fine-tuning more predictable than MoE.&lt;/p&gt;

&lt;p&gt;Where 31B pulls ahead of 26B is &lt;strong&gt;full wave coding sessions&lt;/strong&gt; — building an entire webshop from brief to live, orchestrating 15+ sequential integration calls while maintaining consistency across product data, payment configuration, email templates, and frontend pages. That’s the kind of sustained, multi-layer orchestration where every additional parameter matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Likely sweet spot:&lt;/strong&gt; Complex project planning. Full wave coding orchestration. Fine-tuned domain agents.&lt;br&gt;
&lt;strong&gt;Likely limitation:&lt;/strong&gt; Cost and latency. For tasks where 26B delivers equivalent results, you’re burning compute you don’t need.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned About Model Size and Tool Calling
&lt;/h2&gt;

&lt;p&gt;Running MCP across 9 platforms, one pattern stands out: &lt;strong&gt;for simple tool calls, model size barely matters.&lt;/strong&gt; A “create this page” request succeeds with roughly the same reliability across model classes.&lt;/p&gt;

&lt;p&gt;Where model size becomes decisive is &lt;strong&gt;orchestration depth&lt;/strong&gt; — the number of sequential, context-dependent tool calls a model can chain before losing coherence. At two to three calls, almost anything works. Past six calls, only the stronger reasoners maintain consistency.&lt;/p&gt;

&lt;p&gt;Open-weight models give you something closed APIs never will: &lt;strong&gt;the ability to match model weight to task weight.&lt;/strong&gt; Route simple status checks to E4B and complex builds to 31B. Your agent gets smarter &lt;em&gt;and&lt;/em&gt; cheaper at the same time.&lt;/p&gt;

&lt;p&gt;That’s the real unlock of open-weight + MCP: you own both the brain and the hands.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Decision Framework
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Need voice or audio input?&lt;/strong&gt;&lt;br&gt;
Then E2B (phone/IoT) or E4B (laptop)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How many sequential tool calls per task?&lt;/strong&gt;&lt;br&gt;
1-3: &lt;strong&gt;E4B&lt;/strong&gt; — fast, light, capable&lt;br&gt;
4-8: &lt;strong&gt;26B A4B&lt;/strong&gt; — tested and proven&lt;br&gt;
9+: &lt;strong&gt;31B Dense&lt;/strong&gt; — when orchestration quality justifies compute&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-tuning for a specific domain?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;31B Dense&lt;/strong&gt; — dense fine-tunes more predictably than MoE&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget-constrained?&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;26B A4B.&lt;/strong&gt; Almost always the answer.&lt;/p&gt;
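&lt;p&gt;The framework above is simple enough to encode as a router. A sketch in Python: the thresholds are the ones from this post, and the variant names are just strings your serving layer would map to real endpoints:&lt;/p&gt;

```python
def pick_variant(expected_calls, needs_audio=False, fine_tuning=False):
    """Route a task to a Gemma 4 variant using the framework above."""
    if fine_tuning:
        return "31B-dense"   # dense fine-tunes more predictably than MoE
    if needs_audio:
        # Only the E-series has native audio input.
        return "E4B" if expected_calls > 1 else "E2B"
    if expected_calls > 8:
        return "31B-dense"   # deep orchestration justifies the compute
    if expected_calls > 3:
        return "26B-A4B"     # the efficiency sweet spot
    return "E4B"             # fast, light, capable

print(pick_variant(6))
```

Routing this way is the "match model weight to task weight" idea made literal: the dispatcher is a dozen lines, and the savings compound on every request.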




&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;p&gt;While testing, I discovered that &lt;a href="https://mcpplaygroundonline.com" rel="noopener noreferrer"&gt;MCP Playground&lt;/a&gt; — an online tool for testing MCP servers — lists both Gemma 4 26B and 31B as available models. Our server connects and authenticates successfully. Once we resolve a token compatibility issue on our end, this will enable fully automated testing: type a prompt, Gemma 4 calls our MCP tools directly, website appears. No copy-paste middleware needed.&lt;/p&gt;

&lt;p&gt;That’s the trajectory: from “model generates a plan I execute manually” to “model executes the plan autonomously through MCP.” Gemma 4’s native function calling support, combined with MCP’s standardized tool protocol, makes this path viable on fully open-source infrastructure.&lt;/p&gt;

&lt;p&gt;If you want to start experimenting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 4 models&lt;/strong&gt; — &lt;a href="https://aistudio.google.com" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;, &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;, &lt;a href="https://huggingface.co/collections/google/gemma-4-release-684a6475a35684188f54823e" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP specification&lt;/strong&gt; — &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;modelcontextprotocol.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An MCP server to test against&lt;/strong&gt; — &lt;a href="https://websitepublisher.ai" rel="noopener noreferrer"&gt;WebsitePublisher.ai&lt;/a&gt; has a free tier with 55+ tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pick the variant that fits your hardware. Connect it to a real MCP server. The benchmarks start mattering a lot less once you’re watching a model build something real.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built and tested entirely from a phone. On vacation. Because that’s what open protocols and open models make possible.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>devchallenge</category>
      <category>llm</category>
      <category>mcp</category>
    </item>
    <item>
      <title>WAVE Coding: Why we built 78 integrations for AI instead of letting AI build them</title>
      <dc:creator>Michael Egberts</dc:creator>
      <pubDate>Thu, 14 May 2026 19:38:45 +0000</pubDate>
      <link>https://forem.com/megberts/wave-coding-why-we-built-78-integrations-for-ai-instead-of-letting-ai-build-them-if2</link>
      <guid>https://forem.com/megberts/wave-coding-why-we-built-78-integrations-for-ai-instead-of-letting-ai-build-them-if2</guid>
      <description>&lt;p&gt;Every week I see another "I built a SaaS in 4 hours with AI" post. And every week, the comments are the same: "Cool, but does the Stripe integration actually work?"&lt;/p&gt;

&lt;p&gt;Usually it doesn't.&lt;/p&gt;

&lt;p&gt;That's vibe coding. You prompt, you hope, and you pray that the AI correctly implements a payment flow it's never actually tested. It hallucinates webhook handlers. It guesses at email configs. It builds checkout flows that break on the first real transaction.&lt;/p&gt;

&lt;p&gt;We took the opposite approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The puzzle piece pattern&lt;/strong&gt;&lt;br&gt;
We're building WebsitePublisher.ai — a platform where AI assistants build and publish websites and web applications through conversation. Available on 9 AI platforms (Claude, ChatGPT, Gemini, Cursor, Windsurf, GitHub Copilot, Grok, Mistral, n8n) via the MCP protocol.&lt;br&gt;
In the last 16 days, we shipped 78 integrations. Each one is a self-contained puzzle piece — proven software running on our web servers. AI doesn't generate the integration code. AI calls the integration.&lt;br&gt;
Here's what that looks like in practice:&lt;/p&gt;

&lt;p&gt;User: "I need a webshop with Stripe payments and order confirmation emails"&lt;/p&gt;

&lt;p&gt;AI selects:&lt;br&gt;
  → product-catalog (MAPI entity + helpers)&lt;br&gt;
  → shopping-cart (session-based)&lt;br&gt;
  → checkout-flow (orchestration engine)&lt;br&gt;
  → stripe (payment processing)&lt;br&gt;
  → invoice-generator (PDF + accounting)&lt;br&gt;
  → email-templates (Resend rendering)&lt;/p&gt;

&lt;p&gt;Result: 6 tested puzzle pieces combined into a working application.&lt;/p&gt;

&lt;p&gt;Zero hallucinated Stripe webhooks. Zero guessed SMTP configs. The heavy lifting happens in proven software on the server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters&lt;/strong&gt;&lt;br&gt;
The fundamental problem with vibe coding is that AI is asked to do two things at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Understand what you want&lt;/strong&gt; (AI is great at this)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement reliable infrastructure&lt;/strong&gt; (AI is terrible at this)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;WAVE coding separates the two. AI handles #1 — understanding your intent and selecting the right puzzle pieces. The proven software handles #2 — the actual Stripe calls, the email delivery, the database queries.&lt;/p&gt;
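&lt;p&gt;In code, that separation is a registry: the AI only ever names pieces, and an unknown name is an error rather than an invitation to improvise. A minimal sketch (the handlers here are stubs; the real pieces run server-side):&lt;/p&gt;

```python
# Registry of proven puzzle pieces. The AI selects names; the server executes.
REGISTRY = {
    "product-catalog": lambda spec: {"piece": "product-catalog", "ok": True},
    "shopping-cart":   lambda spec: {"piece": "shopping-cart", "ok": True},
    "stripe":          lambda spec: {"piece": "stripe", "ok": True},
}

def execute_wave(selected, spec):
    """Run the AI-selected pieces in order; reject hallucinated names."""
    unknown = [name for name in selected if name not in REGISTRY]
    if unknown:
        raise ValueError(f"unknown pieces: {unknown}")
    return [REGISTRY[name](spec) for name in selected]

results = execute_wave(["product-catalog", "shopping-cart", "stripe"], spec={})
print(len(results))
```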

&lt;p&gt;&lt;strong&gt;What's in the 78 puzzle pieces&lt;/strong&gt;&lt;br&gt;
Some highlights from what shipped in 16 days:&lt;br&gt;
&lt;strong&gt;E-commerce stack&lt;/strong&gt;:&lt;br&gt;
Product catalog, shopping cart, checkout flow, order management, inventory tracking, shipping (MyParcel), invoice generation, discount codes&lt;br&gt;
&lt;strong&gt;Communication&lt;/strong&gt;:&lt;br&gt;
SMTP email, email templates, contact forms, multi-layer spam protection&lt;br&gt;
&lt;strong&gt;Data layer&lt;/strong&gt;:&lt;br&gt;
Server-side rendering for SEO, batch update/delete endpoints, data grids with validation&lt;br&gt;
&lt;strong&gt;AI layer&lt;/strong&gt;:&lt;br&gt;
Coach (guided website creation), concept generation, streaming chat&lt;br&gt;
&lt;strong&gt;Platform hardening&lt;/strong&gt;:&lt;br&gt;
DNSSEC, request tracing, error envelope standardization, security hardening&lt;/p&gt;

&lt;p&gt;Each piece follows the same pattern: a handler receives the endpoint, input, and project ID. Dependencies are explicit. No magic.&lt;/p&gt;
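&lt;p&gt;The contract is small enough to show. A Python sketch of the pattern (our production handlers are PHP, and these names are illustrative):&lt;/p&gt;

```python
class IntegrationHandler:
    """Every puzzle piece implements the same three-argument contract."""
    def handle(self, endpoint: str, payload: dict, project_id: str) -> dict:
        raise NotImplementedError

class EchoHandler(IntegrationHandler):
    """Trivial piece: returns exactly what it was given -- no hidden state."""
    def handle(self, endpoint, payload, project_id):
        return {"endpoint": endpoint, "project": project_id, "data": payload}

result = EchoHandler().handle("echo/ping", {"msg": "hi"}, "proj-1")
print(result)
```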

&lt;p&gt;&lt;strong&gt;The results&lt;/strong&gt;&lt;br&gt;
In the same 16 days:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;7 new customers onboarded&lt;/li&gt;
&lt;li&gt;World Cup prediction game deployed for PSV Supporters (30,000 members)&lt;/li&gt;
&lt;li&gt;Visual editor upgrade shipped&lt;/li&gt;
&lt;li&gt;Coach AI guidance system improved&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;We're calling it WAVE coding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not because it's a clever acronym. Because each application you build is a wave — one deliberate push that combines existing puzzle pieces into something new. Each wave builds on what came before.&lt;br&gt;
Vibe coding is random energy hoping to land somewhere useful.&lt;br&gt;
WAVE coding is deliberate momentum. 🌊&lt;/p&gt;

&lt;p&gt;Curious what you think. Are you building infrastructure for AI to use, or letting AI build infrastructure from scratch?&lt;br&gt;
websitepublisher.ai&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>buildinpublic</category>
      <category>mcp</category>
    </item>
    <item>
      <title>We're now in Mistral's connector directory — here's what that means for AI-powered web publishing</title>
      <dc:creator>Michael Egberts</dc:creator>
      <pubDate>Tue, 05 May 2026 11:54:06 +0000</pubDate>
      <link>https://forem.com/megberts/were-now-in-mistrals-connector-directory-heres-what-that-means-for-ai-powered-web-publishing-206</link>
      <guid>https://forem.com/megberts/were-now-in-mistrals-connector-directory-heres-what-that-means-for-ai-powered-web-publishing-206</guid>
      <description>&lt;p&gt;&lt;strong&gt;Hook&lt;/strong&gt;: WebsitePublisher.ai is a pre-configured Directory Connector in Mistral's curated MCP connector directory. That's a mouthful, so let me break it down: Le Chat users can now find us in their connector settings, click Add, complete OAuth, and immediately start building websites through conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this covers&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What a Directory Connector is vs a Custom Connector&lt;/li&gt;
&lt;li&gt;OAuth 2.1 + DCR auto-discovery&lt;/li&gt;
&lt;li&gt;55+ MCP tools available&lt;/li&gt;
&lt;li&gt;How it compares to our ChatGPT integration (Custom GPT = zero setup)&lt;/li&gt;
&lt;li&gt;The 10-platform story&lt;/li&gt;
&lt;li&gt;Setup guide: websitepublisher.ai/docs/mcp#mistral-setup&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>buildinpublic</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Building an AI-native web platform: 69 features in 11 days (solo dev log)</title>
      <dc:creator>Michael Egberts</dc:creator>
      <pubDate>Tue, 28 Apr 2026 10:35:51 +0000</pubDate>
      <link>https://forem.com/megberts/building-an-ai-native-web-platform-69-features-in-11-days-solo-dev-log-l55</link>
      <guid>https://forem.com/megberts/building-an-ai-native-web-platform-69-features-in-11-days-solo-dev-log-l55</guid>
      <description>&lt;p&gt;I'm building WebsitePublisher.ai — a platform where AI assistants build and publish complete websites through MCP (Model Context Protocol) tools. Here's what the last 11 days looked like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Visual Editor Problem&lt;/strong&gt;&lt;br&gt;
Users could build entire websites through conversation, but changing a single typo required asking the AI to patch the page. Terrible UX for small edits.&lt;br&gt;
&lt;strong&gt;Solution: WPE v2&lt;/strong&gt; — a visual editor that loads the published page in an iframe, detects editable elements, and lets users click-to-edit text, drag-and-drop images, right-click for context menus, and save back through the same PAPI that AI assistants use.&lt;/p&gt;

&lt;p&gt;Key decisions: httpOnly cookies instead of URL tokens, change count badge in dashboard, overlay detection for CSS-covered images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Integration Cookbook&lt;/strong&gt;&lt;br&gt;
Our integration engine (IAPI) supports http (external proxy) and internal (PHP handler) drivers. After shipping LinkedIn posting and SMTP email in the same week, I wrote an Internal Integration Cookbook:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create manifest JSON with endpoint definitions&lt;/li&gt;
&lt;li&gt;Implement handler extending InternalIntegrationHandler&lt;/li&gt;
&lt;li&gt;Define body_transform and response_transform hooks&lt;/li&gt;
&lt;li&gt;Register in engine&lt;/li&gt;
&lt;/ol&gt;
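&lt;p&gt;The four steps above, sketched in Python. The real engine is PHP; the manifest fields and hook names mirror the cookbook, but the exact shapes are illustrative:&lt;/p&gt;

```python
# Step 1: manifest with endpoint definitions.
MANIFEST = {
    "name": "api-proxy",
    "driver": "internal",
    "endpoints": {"forward": {"method": "POST"}},
}

# Steps 2-3: handler with body/response transform hooks.
class ApiProxyHandler:
    def body_transform(self, body):
        return {**body, "proxied": True}

    def response_transform(self, response):
        return {"ok": True, "data": response}

# Step 4: register in the engine.
ENGINE = {}

def register(manifest, handler):
    ENGINE[manifest["name"]] = (manifest, handler)

register(MANIFEST, ApiProxyHandler())
print(sorted(ENGINE))
```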

&lt;p&gt;Result: the API Proxy integration took 2 hours from start to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP Session Stability&lt;/strong&gt;&lt;br&gt;
OAuth 2.1 + PKCE authentication. Claude's connector occasionally lost session state → 401 errors mid-conversation. Fixed with a 6-layer approach: token refresh, graceful 401 handling, session persistence, stale token detection, project selector refresh, frontend error boundaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Custom Domains with DNS Pre-validation&lt;/strong&gt;&lt;br&gt;
SSL certificate requests would fail when DNS hadn't propagated. Now we validate CNAME resolution before calling certbot. Clear error messages instead of cryptic failures.&lt;br&gt;
&lt;strong&gt;Numbers&lt;/strong&gt;&lt;br&gt;
Tasks completed (11 days): 69&lt;br&gt;
Total MCP tools: 55&lt;br&gt;
AI platforms supported: 9&lt;br&gt;
API layers: 10&lt;br&gt;
Directory listings: 15+&lt;/p&gt;

&lt;p&gt;→ websitepublisher.ai&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>webdev</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>Week 16 Dev Log: From First Customer to 56 MCP Tools — Building an AI-Native Website Publisher</title>
      <dc:creator>Michael Egberts</dc:creator>
      <pubDate>Mon, 13 Apr 2026 14:05:54 +0000</pubDate>
      <link>https://forem.com/megberts/week-16-dev-log-from-first-customer-to-56-mcp-tools-building-an-ai-native-website-publisher-511k</link>
      <guid>https://forem.com/megberts/week-16-dev-log-from-first-customer-to-56-mcp-tools-building-an-ai-native-website-publisher-511k</guid>
      <description>&lt;p&gt;I'm building WebsitePublisher.ai — a platform where AI assistants build and publish websites through MCP (Model Context Protocol) tools. This week: first paying customer, team collaboration, and some interesting architecture decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Stack&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Quick context: Laravel/PHP backend, dual-server DigitalOcean cluster, Redis Sentinel, MySQL, S3 for assets. The AI layer is a multi-API stack:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PAPI&lt;/strong&gt; — pages and assets&lt;br&gt;
&lt;strong&gt;MAPI&lt;/strong&gt; — entities and structured data&lt;br&gt;
&lt;strong&gt;VAPI&lt;/strong&gt; — vault/secrets management&lt;br&gt;
&lt;strong&gt;IAPI&lt;/strong&gt; — third-party integrations (Stripe, Resend, Twilio, etc.)&lt;br&gt;
&lt;strong&gt;SAPI&lt;/strong&gt; — sessions, forms, visitor auth&lt;br&gt;
&lt;strong&gt;TAPI&lt;/strong&gt; — task tracking across AI sessions&lt;br&gt;
&lt;strong&gt;AAPI&lt;/strong&gt; — scheduled automations&lt;/p&gt;

&lt;p&gt;All exposed as MCP tools. Currently 56 tools, accessible from Claude, ChatGPT, Cursor, Windsurf, GitHub Copilot, Gemini, Grok, Mistral, and n8n.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What shipped this week&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team Collaboration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our first paying customer — an agency — asked for multi-user support on day one. We shipped it within 48 hours:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Owner invites team members via email&lt;/li&gt;
&lt;li&gt;Team members get full access to all projects&lt;/li&gt;
&lt;li&gt;Magic link authentication (no passwords)&lt;/li&gt;
&lt;li&gt;Max 5 members per Agency account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Architecture decision: no per-project granularity in v1. Simpler model, ship fast, iterate based on real usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dashboard Vault — Secrets Without AI Exposure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A security feature I'm particularly proud of: the Vault tab lets users manage API keys (Stripe, Resend, etc.) through the browser UI. The key insight: &lt;strong&gt;if you share an API key in an AI chat, it ends up in transcripts, logs, and context windows.&lt;/strong&gt; The Vault bypasses AI entirely — write-once, never displayed, rotate or delete only.&lt;/p&gt;

&lt;p&gt;Backend uses AES-256-GCM encryption keyed per project, so even if someone gains database access, secrets from other projects are unreadable.&lt;/p&gt;
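&lt;p&gt;One way to key per project is to derive each project key from a single master key, so a leaked key from project A never decrypts project B. A sketch of that derivation idea using HMAC-SHA-256 from the standard library; the derivation method here is my assumption, not necessarily how our backend does it:&lt;/p&gt;

```python
import hmac
import hashlib

MASTER_KEY = b"\x00" * 32  # placeholder; a real master key lives outside the DB

def project_key(project_id: str) -> bytes:
    """Derive a distinct 256-bit key per project from one master key."""
    return hmac.new(MASTER_KEY, project_id.encode(), hashlib.sha256).digest()

k1 = project_key("proj-a")
k2 = project_key("proj-b")
print(len(k1), k1 != k2)
```

Each derived key then feeds AES-256-GCM for the actual encryption; compromising one project's key reveals nothing about any other project's.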

&lt;p&gt;&lt;strong&gt;Language Refactor — From 4 Hardcoded Lists to Zero&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our AI Coach (a conversational website builder) needed proper i18n. The old code had 4 separate hardcoded language→string mappings scattered across the codebase. We refactored to a single CapiLanguage value object with a Redis → DB → Haiku → fallback chain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check Redis cache&lt;/li&gt;
&lt;li&gt;Check papi_language_meta table (27 seeded languages)&lt;/li&gt;
&lt;li&gt;Ask Claude Haiku for language detection (costs ~$0.001)&lt;/li&gt;
&lt;li&gt;Fall back to English&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Result: any of the 250+ ISO 639-1 codes work automatically. Adding a new language = 1 database row.&lt;/p&gt;
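&lt;p&gt;The chain reads naturally as a sequence of lookups, each cheaper than the next. A sketch with the model step stubbed out (in production, step 3 calls Claude Haiku):&lt;/p&gt;

```python
CACHE = {}                             # step 1: Redis stand-in
DB = {"nl": "Dutch", "fr": "French"}   # step 2: seeded language table stand-in

def detect_with_model(code):
    """Step 3 stand-in: the real chain asks Claude Haiku here (~$0.001)."""
    return None  # pretend the model could not identify this code

def resolve_language(code):
    if code in CACHE:
        return CACHE[code]
    name = DB.get(code) or detect_with_model(code) or "English"  # step 4 fallback
    CACHE[code] = name  # future lookups hit the cache
    return name

print(resolve_language("nl"), resolve_language("xx"))
```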

&lt;p&gt;&lt;strong&gt;get_asset MCP Tool — Closing the Read Gap&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We had tools to list, upload, patch, and delete assets — but no tool to read them. AI agents were doing blind find/replace via patch_asset without knowing the current file state. get_asset closes that gap: text content returned directly, binary assets as base64.&lt;/p&gt;
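&lt;p&gt;The return-shape logic is worth spelling out: try to decode as UTF-8, fall back to base64. A sketch (the field names are illustrative):&lt;/p&gt;

```python
import base64

def asset_response(raw: bytes) -> dict:
    """Text assets come back directly; binary assets as base64."""
    try:
        return {"encoding": "utf-8", "content": raw.decode("utf-8")}
    except UnicodeDecodeError:
        return {"encoding": "base64",
                "content": base64.b64encode(raw).decode("ascii")}

print(asset_response(b"body { color: red }")["encoding"])
print(asset_response(b"\x89PNG\r\n")["encoding"])
```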

&lt;p&gt;&lt;strong&gt;The Activation Challenge&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;67 signups, one paying customer. The gap is real. Our research this week showed that the friction is in the "how do I start?" moment — users sign up, see a dashboard, but don't know which AI platform to connect or how to begin.&lt;/p&gt;

&lt;p&gt;Our answer: an embedded AI coach right inside the dashboard. Uses Sonnet (not Opus — cost control), generates one concept, writes directly to the user's first project. From "I just signed up" to "I have a website" in under 2 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's next&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Friday release agent (AAPI-powered automated changelog + test plans)&lt;/li&gt;
&lt;li&gt;patch_asset optimistic concurrency (base version hash to prevent conflicts)&lt;/li&gt;
&lt;li&gt;CAPI language refactor retest (waiting on external tester confirmation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building with MCP or interested in AI-native development tools, I'd love to connect. The MCP ecosystem is moving fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: megberts/websitepublisher-mcp&lt;br&gt;
&lt;strong&gt;MCP Server&lt;/strong&gt;: mcp.websitepublisher.ai&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>webdev</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>How we built an MCP server that lets AI assistants publish complete websites</title>
      <dc:creator>Michael Egberts</dc:creator>
      <pubDate>Wed, 25 Mar 2026 07:53:24 +0000</pubDate>
      <link>https://forem.com/megberts/how-we-built-an-mcp-server-that-lets-ai-assistants-publish-complete-websites-3mjh</link>
      <guid>https://forem.com/megberts/how-we-built-an-mcp-server-that-lets-ai-assistants-publish-complete-websites-3mjh</guid>
      <description>&lt;p&gt;Building a website with an AI assistant usually ends the same way: you get a wall of HTML in a code block, you paste it somewhere, and then you're on your own for hosting, deployment, and every update after that.&lt;br&gt;
We wanted to fix that. So we built WebsitePublisher.ai — a platform where AI assistants don't just describe websites, they actually build and publish them.&lt;br&gt;
Here's how it works under the hood.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The core idea: AI as a first-class developer&lt;/strong&gt;&lt;br&gt;
The premise is simple. Instead of AI being a code generator that hands off to a human, we wanted AI to be the developer — with access to a real API it can call directly.&lt;br&gt;
That API needed to cover the full stack of what a website actually needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pages and assets (HTML, CSS, images)&lt;/li&gt;
&lt;li&gt;Structured data (entities, records)&lt;/li&gt;
&lt;li&gt;Forms and visitor sessions&lt;/li&gt;
&lt;li&gt;Integrations (email, SMS, payments)&lt;/li&gt;
&lt;li&gt;Scheduled tasks&lt;/li&gt;
&lt;li&gt;Vault (credentials management)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So we built it. Eight API layers, all exposed through a Model Context Protocol (MCP) server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What MCP gives us&lt;/strong&gt;&lt;br&gt;
MCP is an open protocol that lets AI assistants call tools — similar to function calling, but standardized across clients. Claude, ChatGPT, Cursor, Windsurf, GitHub Copilot, and others all support it.&lt;br&gt;
Our MCP server exposes ~55 tools. An AI can call &lt;em&gt;create_page&lt;/em&gt; with HTML content and it's live. It can call &lt;em&gt;configure_form&lt;/em&gt; and a contact form appears. It can call &lt;em&gt;create_scheduled_task&lt;/em&gt; and a nightly content refresh starts running.&lt;br&gt;
The AI doesn't need to know about hosting, DNS, or deployment. It just calls the tools.&lt;/p&gt;
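
&lt;p&gt;On the wire, such a tool call is a plain JSON-RPC message: the MCP &lt;em&gt;tools/call&lt;/em&gt; method plus a tool name and arguments (the &lt;em&gt;create_page&lt;/em&gt; arguments below are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

# What an MCP client sends when the AI invokes create_page.
call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "create_page",
        "arguments": {
            "path": "/index.html",
            "content": "...full HTML document here...",
        },
    },
}
print(json.dumps(call, indent=2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;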

&lt;p&gt;&lt;strong&gt;The API layers&lt;/strong&gt;&lt;br&gt;
We ended up with eight layers, each with a clear responsibility:&lt;br&gt;
&lt;em&gt;PAPI (Pages &amp;amp; Assets)&lt;/em&gt; — Create, update, and version HTML pages and static assets. Includes diff-patch for surgical updates, URL fetching, and a content quality warning system.&lt;br&gt;
&lt;em&gt;MAPI (Entities &amp;amp; Data)&lt;/em&gt; — A schema-less data layer. The AI defines entities (think: database tables) and creates records. Powers everything from contact lists to leaderboards to inventory.&lt;br&gt;
&lt;em&gt;SAPI (Sessions &amp;amp; Forms)&lt;/em&gt; — Anonymous visitor sessions, form submissions, visitor authentication, and analytics. No cookies to configure — it just works.&lt;br&gt;
&lt;em&gt;VAPI (Vault)&lt;/em&gt; — Encrypted credential storage. The AI stores API keys that are then used by integrations — never exposed back to the client.&lt;br&gt;
&lt;em&gt;IAPI (Integrations)&lt;/em&gt; — A proxy engine that routes calls through stored credentials to external services. Resend, Mailgun, Stripe, Mollie, Twilio — the AI picks the integration, the vault provides the credentials.&lt;br&gt;
&lt;em&gt;AAPI (Agent API)&lt;/em&gt; — Scheduled tasks. The AI creates cron jobs that run PHP handlers on a schedule. Daily content refresh, nightly cleanup, automated data sync.&lt;br&gt;
&lt;em&gt;CAPI (Coach API)&lt;/em&gt; — A conversational intake system. Ask four questions, generate a complete website. The AI handles the conversation; the platform handles the generation.&lt;br&gt;
&lt;em&gt;TAPI (Task Tracking)&lt;/em&gt; — An append-only task log that coordinates multiple AI sessions working on the same project (more on this below).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A real example&lt;/strong&gt;&lt;br&gt;
Here's what a Claude session looks like when building a site from scratch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: Build me a landing page for my consulting business. Focus on lead generation.

Claude: [calls get_skill to load WebsitePublisher context]
[calls create_page with full HTML/CSS]
[calls configure_form with name, email, message fields]
[calls setup_integration with Resend credentials]
[calls execute_integration to test email delivery]

Done — your page is live at yourproject.websitepublisher.ai.
The contact form sends leads to your inbox via Resend.
Want me to add a thank-you page or set up SMS notifications too?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No copy-paste. No deployment step. The AI did it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The interesting engineering problems&lt;/strong&gt;&lt;br&gt;
A few things that weren't obvious until we built them:&lt;br&gt;
&lt;em&gt;Multi-session coordination&lt;/em&gt;. When multiple AI sessions work on the same project in parallel, they can overwrite each other's progress. We built TAPI — an append-only task tracking system — specifically to solve this. Each session logs progress via INSERTs only. MAX(completion_pct) from history records prevents any session from accidentally rolling back another's progress.&lt;br&gt;
&lt;em&gt;Tool count limits&lt;/em&gt;. Our MCP server returns 55 tools in tools/list. Some clients have limits on how many they load. Our workaround: the get_skill tool loads a SKILL.md document that gives the AI a map of the full API — so even with five tools loaded, it can use the REST API directly for everything else.&lt;br&gt;
&lt;em&gt;Content quality detection&lt;/em&gt;. AIs occasionally send a file path instead of HTML content to create_page. We added a WarningCollector that catches this pattern and returns a structured warning before anything gets saved.&lt;br&gt;
&lt;em&gt;Authentication across API layers&lt;/em&gt;. Each layer needed a different auth model. Project tokens (wpa_) for AI access. Dashboard sessions (wps_) for humans. Admin tokens (wsa_) for visitor-facing login flows. Getting these to coexist cleanly took a few iterations.&lt;/p&gt;
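
&lt;p&gt;The TAPI invariant fits in a few lines (the schema is illustrative; what matters is the INSERT-only log plus MAX aggregation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE task_progress (task_id TEXT, session_id TEXT, completion_pct INTEGER)")

def log_progress(task_id, session_id, pct):
    # Append-only: sessions never UPDATE, so nobody can roll progress back.
    db.execute("INSERT INTO task_progress VALUES (?, ?, ?)", (task_id, session_id, pct))

def current_progress(task_id):
    row = db.execute("SELECT MAX(completion_pct) FROM task_progress WHERE task_id = ?",
                     (task_id,)).fetchone()
    return row[0] or 0

log_progress("t1", "session-a", 60)
log_progress("t1", "session-b", 40)  # a laggy session reports stale progress
print(current_progress("t1"))  # prints 60
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;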

&lt;p&gt;&lt;strong&gt;What's next&lt;/strong&gt;&lt;br&gt;
We're currently in the Mistral sprint — working on SSE streaming for the conversational intake (so responses feel instant instead of batched), parallel concept generation, and getting listed in Mistral's connector directory.&lt;br&gt;
If you're building with MCP, or thinking about what "AI-native" infrastructure actually means in practice — we'd love to hear what you think.&lt;/p&gt;

&lt;p&gt;The MCP server is at mcp.websitepublisher.ai&lt;br&gt;
Full docs at websitepublisher.ai/docs&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>webdev</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
