<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Manveer Chawla</title>
    <description>The latest articles on Forem by Manveer Chawla (@manveer_chawla_64a7283d5a).</description>
    <link>https://forem.com/manveer_chawla_64a7283d5a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3271159%2F5d4c3ad5-7832-4565-bf5c-b790ca7ea6ff.jpg</url>
      <title>Forem: Manveer Chawla</title>
      <link>https://forem.com/manveer_chawla_64a7283d5a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/manveer_chawla_64a7283d5a"/>
    <language>en</language>
    <item>
      <title>Claude Code Routines: 5 production workflows that ship real work</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Fri, 01 May 2026 16:50:58 +0000</pubDate>
      <link>https://forem.com/arcade/claude-code-routines-5-production-workflows-that-ship-real-work-25il</link>
      <guid>https://forem.com/arcade/claude-code-routines-5-production-workflows-that-ship-real-work-25il</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code Routines enable unattended, cloud-run workflows&lt;/strong&gt; via scheduled, API, and GitHub event triggers. Enterprise use breaks with demo-grade setups.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daily run caps and shared subscription usage push teams to batch work&lt;/strong&gt; into a single daily "meta-orchestrator" routine plus a few real-time triggers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 production workflows:&lt;/strong&gt; incident postmortem drafting, on-call triage → ticket drafts, PR-aging report, expansion-signal scanning, and changelog PR generation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key enterprise risks:&lt;/strong&gt; over-permissioned connectors, prompt injection from untrusted inputs, API rate limits (notably Slack history), and weak auditability.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production pattern:&lt;/strong&gt; use an &lt;strong&gt;MCP runtime&lt;/strong&gt; that delivers &lt;strong&gt;agent authorization&lt;/strong&gt;, &lt;strong&gt;agent-optimized tools&lt;/strong&gt;, and &lt;strong&gt;agent lifecycle governance&lt;/strong&gt;, plus &lt;strong&gt;human approval gates&lt;/strong&gt; for write actions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud-hosted agents are not new. OpenClaw, Perplexity Computer, n8n, Zapier, and a handful of SaaS agent runtimes have been executing unattended work for a while. The release of Claude Code Routines adds a different option: teams that already use Claude Code as their day-to-day development agent can now run that same agent, with the same prompts, tools, and conventions, on Anthropic's cloud instead of tethered to a laptop.&lt;/p&gt;

&lt;p&gt;A routine is a saved Claude Code configuration (a prompt, one or more repositories, and a set of connectors) packaged once and run automatically on Anthropic-managed cloud infrastructure. Each routine can attach any combination of three trigger types: scheduled (recurring cadence), API (POST to a per-routine endpoint with a bearer token), and GitHub events (pull request or release activity on a connected repository). Routines are currently in &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;research preview&lt;/a&gt;, so limits and API shapes are still moving.&lt;/p&gt;
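&lt;p&gt;As a concrete sketch of the API trigger, here is how a caller might fire a routine over HTTP. The endpoint URL and bearer token below are placeholders (the real values come from the routine's trigger settings, and the preview API shape may still change); the payload is whatever your routine's prompt expects.&lt;/p&gt;

```python
import json
import urllib.request

# Hypothetical per-routine endpoint and bearer token; the real values come
# from the routine's trigger settings. Payload shape is up to your prompt.
ROUTINE_FIRE_URL = "https://example.invalid/routines/abc123/fire"
ROUTINE_TOKEN = "rt_example_token"

def build_fire_request(payload: dict) -> urllib.request.Request:
    """Build the authenticated POST that fires the routine."""
    return urllib.request.Request(
        ROUTINE_FIRE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {ROUTINE_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_fire_request({"source": "sentry", "event_count": 42})
# To actually fire the routine: urllib.request.urlopen(req)
```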

&lt;p&gt;Most of the early Routines content focuses on personal productivity: meeting prep, inbox summaries, and calendar wrangling. For senior developers and engineering leaders trying to run autonomous agents across an enterprise, those demos do not cut it.&lt;/p&gt;

&lt;p&gt;Moving from a script on one laptop to a production-grade engineering workflow means dealing with the realities of enterprise architecture. Production automation demands strict governance, robust security boundaries, and the ability to work within aggressive API rate limits.&lt;/p&gt;

&lt;p&gt;This article covers five production-leaning, unattended routines designed for engineering teams. We'll map exactly what happens at runtime, identify which workflows need human oversight, and outline the governance models you need to safely run scheduled, API-triggered, and GitHub-triggered Claude Code sessions without compromising your infrastructure. Before getting to the workflows, it's worth looking at why demo-grade setups buckle the moment they move from a single laptop to a shared team environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Where demo patterns hit production reality (security, reliability, governance)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Routines formalize what teams have been wiring together with cron jobs, GitHub Actions, and custom middleware for two years: Claude Code running on a schedule, against a GitHub event, or through an API call, with no developer laptop in the loop. But moving from a single developer's personal setup to a shared enterprise environment exposes severe limitations in security, reliability, and auditability. Fast.&lt;/p&gt;

&lt;p&gt;Start with the execution model. Per &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;Anthropic's docs&lt;/a&gt;, routines &lt;em&gt;"run autonomously as full Claude Code cloud sessions: there is no permission-mode picker and no approval prompts during a run."&lt;/em&gt; Whatever the agent decides to do, it does. At the speed of inference, without a human in the loop. That shifts the burden of "what is this agent allowed to do" from interactive confirmation to pre-deployment configuration. If the configuration leans on bundled first-party connectors and creator-inherited OAuth scopes, the guardrails come off exactly when you need them most.&lt;/p&gt;

&lt;p&gt;The most critical vulnerability is the permission inheritance model of bundled first-party connectors.&lt;/p&gt;

&lt;p&gt;In a standard setup, an automated routine inherits the full global access of the developer who created it. &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;Anthropic's docs&lt;/a&gt; make the consequence explicit: &lt;em&gt;"Anything a routine does through your connected GitHub identity or connectors appears as you: commits and pull requests carry your GitHub user, and Slack messages, Linear tickets, or other connector actions use your linked accounts for those services."&lt;/em&gt; A first-party OAuth token works for a single developer querying their personal pull requests. It becomes a massive liability the moment you deploy it as an unattended routine on behalf of a whole team.&lt;/p&gt;

&lt;p&gt;If an agent operates with an engineering lead's administrative permissions, a single compromised routine gains unrestricted read and write access across your entire enterprise system. This architecture fails security reviews every time the automation touches shared customer data, source code, or regulated infrastructure.&lt;/p&gt;

&lt;p&gt;This over-permissioning makes &lt;a href="https://genai.owasp.org/llmrisk/llm01-prompt-injection" rel="noopener noreferrer"&gt;prompt injection&lt;/a&gt; threats way worse. Unattended routines ingest untrusted third-party text by design. They process incoming PagerDuty incident descriptions, analyze raw Sentry stack traces, and scan customer support emails.&lt;/p&gt;

&lt;p&gt;Without typed, permission-scoped tool contracts to validate the output, a malicious payload hidden in a customer ticket can instruct the routine to exfiltrate data or delete production resources. Natural language instructions won't stop these exploits in an enterprise environment.&lt;/p&gt;

&lt;p&gt;Operational and reliability constraints compound the problem. Routines &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;draw down the same subscription usage&lt;/a&gt; as interactive sessions, plus a separate daily cap on how many runs can start per account. Anthropic doesn't publish a specific number, and Claude usage tightens once team activity ramps up, so unattended workflows have to be designed with quota-awareness from day one.&lt;/p&gt;

&lt;p&gt;This forces engineering teams to abandon simple event-driven architectures for complex batch processing. You can't trigger a routine for every individual pull request comment. Instead, you orchestrate batch jobs that process dozens of events at once to conserve quota, or enable extra usage and accept metered overage when the caps are hit.&lt;/p&gt;

&lt;p&gt;Reliability and visibility close out the failure list. Early adopters report consistent issues with bundled connectors in unattended execution: &lt;a href="https://github.com/anthropics/claude-code/issues/45306" rel="noopener noreferrer"&gt;community issue trackers show silent failures&lt;/a&gt; during runtime, OAuth token expiration errors that crash scheduled tasks, and connectors that fail to load in the cloud environment.&lt;/p&gt;

&lt;p&gt;Bundled connectors also lack auditability. When an unattended routine updates a Jira ticket, queries a GitHub repository, and posts a Slack message, standard bundled connectors give you opaque execution logs. Security teams can't construct a definitive audit trail of what the agent did across multiple platforms.&lt;/p&gt;

&lt;p&gt;The rest of this article shows how a dedicated MCP runtime resolves each of these failure modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;Where it lives&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Over-permissioned token&lt;/td&gt;
&lt;td&gt;Per-user, per-tool authorization evaluated per action&lt;/td&gt;
&lt;td&gt;MCP runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt injection from untrusted text&lt;/td&gt;
&lt;td&gt;Agent-optimized tools with schema enforcement and isolated credentials&lt;/td&gt;
&lt;td&gt;MCP runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quota overrun&lt;/td&gt;
&lt;td&gt;Meta-orchestrator batching plus targeted GitHub event triggers&lt;/td&gt;
&lt;td&gt;Routine design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Silent write to production&lt;/td&gt;
&lt;td&gt;Human approval gate on drafts, PRs, or prefixed branches&lt;/td&gt;
&lt;td&gt;Workflow config and branch protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No audit trail for compliance&lt;/td&gt;
&lt;td&gt;Full execution context logged per tool call, exportable via &lt;a href="https://opentelemetry.io/" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;MCP runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;5 production Claude Code routine workflows you can batch into one daily run&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The risks and controls above become concrete through workflow design. Before the patterns, one operational constraint shapes every choice below: quota. Routines share subscription usage with interactive sessions and add a daily cap on runs per account, so running a separate routine for every minor event burns through the budget fast.&lt;/p&gt;

&lt;p&gt;The solution is to architect a single "meta-orchestrator" routine that wakes up once a day, runs a sequential batch of discrete data-gathering and reporting tasks, and shuts down. That consumes one run from your daily cap.&lt;/p&gt;

&lt;p&gt;This strategy saves your remaining runs for critical, real-time API and GitHub event triggers that demand immediate attention.&lt;/p&gt;
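&lt;p&gt;The batching logic is easier to see in code than in prose. Below is an illustrative model of the meta-orchestrator's task list in plain Python (task names and skip rules are hypothetical); in a real routine the same sequencing lives in the prompt, but the quota math is identical: one run slot covers the whole batch.&lt;/p&gt;

```python
import datetime

# Illustrative model of the daily meta-orchestrator (task names and skip
# rules are hypothetical). In a real routine this sequencing lives in the
# prompt; here it is plain Python so the quota math is explicit.
TASKS = [
    ("incident-postmortem", lambda day: True),            # every night
    ("pr-aging-report", lambda day: day.weekday() == 4),  # Fridays only
    ("expansion-scanner", lambda day: True),              # every night
]

def run_daily_batch(today: datetime.date) -> list[str]:
    """Run every due task sequentially; the whole batch costs one run slot."""
    return [name for name, is_due in TASKS if is_due(today)]

print(run_daily_batch(datetime.date(2026, 5, 1)))  # 2026-05-01 is a Friday
```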

&lt;p&gt;Here are five concrete engineering workflows designed for this quota-aware framework, with their technical triggers, human approval surfaces, and governance requirements. Three of them (nightly incident postmortem, weekly PR-aging, expansion-signal scanning) sit inside the meta-orchestrator and share the daily run. The other two (Sentry triage, release-notes draft) run real-time because their value is latency-bound. You want the Linear ticket while the incident is hot, and the changelog draft as soon as the release tag lands.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Routine&lt;/th&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;Primary tools&lt;/th&gt;
&lt;th&gt;Approval surface&lt;/th&gt;
&lt;th&gt;Run slot&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Nightly incident postmortem&lt;/td&gt;
&lt;td&gt;Scheduled (2:00 AM daily)&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/social/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/productivity/notion" rel="noopener noreferrer"&gt;Notion&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Human engineers review and publish the drafted Notion page&lt;/td&gt;
&lt;td&gt;Meta-orchestrator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;On-call Sentry triage&lt;/td&gt;
&lt;td&gt;API (Sentry webhook → routine &lt;code&gt;/fire&lt;/code&gt; endpoint)&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.sentry.io/ai/mcp/" rel="noopener noreferrer"&gt;Sentry&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/productivity/linear" rel="noopener noreferrer"&gt;Linear&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;On-call engineer triages the drafted Linear ticket queue&lt;/td&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weekly PR-aging report&lt;/td&gt;
&lt;td&gt;Scheduled (Friday morning)&lt;/td&gt;
&lt;td&gt;GitHub, email&lt;/td&gt;
&lt;td&gt;Read-only; no write approval needed&lt;/td&gt;
&lt;td&gt;Meta-orchestrator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expansion signal scanner&lt;/td&gt;
&lt;td&gt;API (nightly)&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/sales/hubspot" rel="noopener noreferrer"&gt;HubSpot&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/social/slack" rel="noopener noreferrer"&gt;Slack Search&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Account managers review flagged accounts in a Slack channel&lt;/td&gt;
&lt;td&gt;Meta-orchestrator&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Friday release notes draft&lt;/td&gt;
&lt;td&gt;GitHub event (release created)&lt;/td&gt;
&lt;td&gt;GitHub, &lt;a href="https://docs.arcade.dev/en/resources/integrations/productivity/jira" rel="noopener noreferrer"&gt;Jira&lt;/a&gt; / &lt;a href="https://docs.arcade.dev/en/resources/integrations/productivity/linear" rel="noopener noreferrer"&gt;Linear&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;PM reviews the pull request and merges the changelog&lt;/td&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Nightly incident postmortem draft (PagerDuty, Slack, Notion)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Assembling a postmortem means stitching PagerDuty timestamps, Slack threads, and deploy markers into a readable narrative. This workflow does the assembly and drafts the first pass so the engineer lands on a structured Notion page instead of a blank one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; Scheduled. Runs as the first sequence in the daily 2:00 AM meta-orchestrator.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; The routine queries the PagerDuty API for resolved events from the previous 24 hours. The hard part is Slack context: the &lt;a href="https://api.slack.com/methods/conversations.history" rel="noopener noreferrer"&gt;conversations.history endpoint&lt;/a&gt; now rate-limits non-Marketplace apps to one request per minute, so bulk-ingesting incident channels is off the table. The routine uses the Slack Search API to isolate key messages, or fires via the API trigger when a Slack reaction-event webhook (configured in your Slack app) POSTs to the routine's &lt;code&gt;/fire&lt;/code&gt; endpoint after an engineer drops a designated emoji on a summary message. It then drafts a Notion page with a timeline, impact, and initial resolution steps.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surface:&lt;/strong&gt; The routine runs unattended. An engineer reviews, edits, and publishes the Notion draft the next morning.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; security checklist:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Scope the PagerDuty token to read-only on specific services. Scope Slack tokens to the incident channels only, not org-wide.
&lt;/li&gt;
&lt;li&gt;Redact customer identifiers (email, user ID, account ID) at the tool layer before the draft is written to Notion. Do not rely on the model to scrub PII.
&lt;/li&gt;
&lt;li&gt;Log triggering PagerDuty incident ID → drafted Notion page ID for every run, not just on failure.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
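&lt;p&gt;The redaction item in the checklist above has to be deterministic code, not a model instruction. A minimal sketch, assuming email addresses and an &lt;code&gt;acct_&lt;/code&gt;-prefixed account-ID scheme (the ID pattern is hypothetical; match it to your own identifier format):&lt;/p&gt;

```python
import re

# Deterministic redaction at the tool layer, applied to PagerDuty and
# Slack text before it reaches the drafting prompt. The account-ID
# pattern is hypothetical; match it to your own identifier scheme.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ACCOUNT_ID = re.compile(r"\bacct_[A-Za-z0-9]+\b")

def redact(text: str) -> str:
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    text = ACCOUNT_ID.sub("[REDACTED_ACCOUNT]", text)
    return text

print(redact("Paged for acct_9f3k: user jane@example.com saw 500s"))
# Paged for [REDACTED_ACCOUNT]: user [REDACTED_EMAIL] saw 500s
```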

&lt;h3&gt;
  
  
  &lt;strong&gt;On-call triage and ticket creation (Sentry to Linear)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When a service degrades, on-call engineers get paged with a dozen near-identical error reports. This workflow groups the noise by Sentry fingerprint and files one Linear ticket per cluster so the on-call triages root causes, not duplicates.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; API. Claude Code Routines don't accept arbitrary third-party webhooks (only GitHub events), so configure &lt;a href="https://docs.sentry.io/product/integrations/integration-platform/webhooks/" rel="noopener noreferrer"&gt;Sentry's webhook integration&lt;/a&gt; to POST to the routine's &lt;code&gt;/fire&lt;/code&gt; endpoint with its bearer token when an error spike crosses a configured threshold. Runs outside the daily orchestrator because triage value drops fast if it waits.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; The routine reads fresh events from Sentry, groups them by fingerprint to collapse duplicates, and ranks clusters by event count and affected-users count. Each cluster becomes a Linear ticket with the stack trace snippet, affected release, and a link back to the Sentry issue. Tickets land in an un-triaged queue with a default P3 label.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surface:&lt;/strong&gt; The routine never triages itself. The on-call engineer reviews the queue, adjusts severity, and assigns the ticket.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; security checklist:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Scope the Sentry token to specific project slugs. Exclude projects flagged as handling authentication or payment data.
&lt;/li&gt;
&lt;li&gt;Strip user-supplied strings (URL params, form inputs, search terms) from error payloads before the agent sees them. Those fields are the prompt-injection surface.
&lt;/li&gt;
&lt;li&gt;Log the mapping from Sentry event ID → Linear ticket ID. This is what lets post-incident reviews reconstruct which alert caused which ticket.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
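&lt;p&gt;The fingerprint-grouping step can be sketched as ordinary code, assuming webhook payloads that carry a Sentry-style fingerprint and a user ID per event (field names here are illustrative, not Sentry's exact schema):&lt;/p&gt;

```python
from collections import defaultdict

# Sketch of the grouping and ranking step. Assumes webhook payloads that
# carry a Sentry-style fingerprint and a user ID per event; field names
# are illustrative, not Sentry's exact schema.
def cluster_events(events: list[dict]) -> list[dict]:
    clusters = defaultdict(lambda: {"events": 0, "users": set()})
    for event in events:
        cluster = clusters[event["fingerprint"]]
        cluster["events"] += 1
        cluster["users"].add(event["user_id"])
    # Rank by event volume, then by distinct affected users.
    ranked = sorted(
        clusters.items(),
        key=lambda item: (item[1]["events"], len(item[1]["users"])),
        reverse=True,
    )
    # One Linear ticket draft per cluster, default P3, left untriaged.
    return [
        {
            "fingerprint": fingerprint,
            "events": data["events"],
            "affected_users": len(data["users"]),
            "priority": "P3",
        }
        for fingerprint, data in ranked
    ]
```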

&lt;h3&gt;
  
  
  &lt;strong&gt;Weekly pull request aging and code review report (GitHub)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Stale PRs create merge conflicts, block releases, and erode review velocity. This workflow replaces the Friday morning dashboard sweep with a single email that names the three PRs each lead needs to act on.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; Scheduled. The daily orchestrator runs the workflow every day; the workflow body exits early on non-Fridays.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; The routine queries the &lt;a href="https://docs.github.com/en/graphql/overview/resource-limitations" rel="noopener noreferrer"&gt;GitHub GraphQL API&lt;/a&gt; for PRs open longer than three days across the org, pulling each PR's review state, failing check runs, and unresolved review comments in a single query. It summarizes each PR's blocker (waiting on reviewer X, failing CI check Y, unresolved change requests) and emails a grouped digest to the relevant engineering leads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surface:&lt;/strong&gt; Read-only. The email dispatches without human intervention, so the token scope is the real control.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; security checklist:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Use a &lt;a href="https://docs.github.com/en/apps/creating-github-apps/authenticating-with-a-github-app/about-authentication-with-a-github-app" rel="noopener noreferrer"&gt;GitHub App token&lt;/a&gt; with &lt;code&gt;metadata&lt;/code&gt;, &lt;code&gt;pull_requests&lt;/code&gt;, and &lt;code&gt;issues&lt;/code&gt; read-only. Do not grant the &lt;code&gt;contents&lt;/code&gt; scope; the routine never needs the diff.
&lt;/li&gt;
&lt;li&gt;Strip code blocks from the email template before send, even if the agent tries to paste one.
&lt;/li&gt;
&lt;li&gt;Send from a dedicated service-account email, not a developer mailbox, so downstream audit trails stay clean.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
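&lt;p&gt;A sketch of the digest step, run over PR records already fetched from the GraphQL query (the field names are illustrative, not the API's actual response shape):&lt;/p&gt;

```python
import datetime
from collections import defaultdict

# Sketch of the digest step, run over PR records already fetched via the
# GitHub GraphQL API. Field names here are illustrative, not the API's.
def summarize_blocker(pr: dict) -> str:
    if pr["failing_checks"]:
        return "failing CI: " + ", ".join(pr["failing_checks"])
    if pr["requested_reviewers"]:
        return "waiting on review from " + pr["requested_reviewers"][0]
    if pr["unresolved_threads"] > 0:
        return f"{pr['unresolved_threads']} unresolved review threads"
    return "no blocker detected; may just need a merge"

def build_digest(prs: list[dict], today: datetime.date) -> dict:
    """Group PRs open for 3+ days into one digest section per lead."""
    digest = defaultdict(list)
    for pr in prs:
        age = (today - pr["opened"]).days
        if age >= 3:
            digest[pr["lead"]].append(
                f"#{pr['number']} ({age}d): " + summarize_blocker(pr)
            )
    return dict(digest)
```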

&lt;h3&gt;
  
  
  &lt;strong&gt;Expansion signal scanner for customer health (HubSpot, Slack)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Support tickets and shared Slack channels are where customers accidentally self-identify as enterprise-tier: questions about rate limits, SSO, SOC 2 reviews, and data residency. This workflow surfaces those signals into a single account-health feed so the revenue team sees them.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; API-triggered. Runs as part of the nightly meta-orchestrator.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; The routine queries HubSpot for tickets created or updated in the last 24 hours and scans the body and notes for enterprise-tier keywords ("rate limits," "SSO," "SOC 2," "HIPAA," "data residency"). For shared customer Slack channels, bulk history ingestion is off the table because of conversations.history rate limits, so the routine uses the Slack Search API against the same keyword set. Each matching account gets a row in an internal Slack post with links back to the source ticket or message.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surface:&lt;/strong&gt; Findings land in a dedicated internal Slack channel with source links. An account manager reviews each flagged account and decides whether to open an expansion conversation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; security checklist:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;The routine never writes to HubSpot. It reads from an allowlist of ticket properties (subject, body, pipeline stage) and nothing else.
&lt;/li&gt;
&lt;li&gt;Restrict the Slack token to public support channels plus explicitly listed shared customer channels. Never grant &lt;code&gt;channels:history&lt;/code&gt; org-wide.
&lt;/li&gt;
&lt;li&gt;Log which account IDs, ticket IDs, and Slack message IDs were scanned on each run, along with which keywords matched. The keyword that triggered the flag is the part account managers need to trust the signal.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
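&lt;p&gt;The keyword pass is deliberately simple, which is what makes it auditable. A minimal sketch, assuming HubSpot-style ticket dicts (property names hypothetical); note that only allowlisted properties are ever read:&lt;/p&gt;

```python
# Keyword pass over allowlisted ticket properties only. Property and
# field names are hypothetical; the keyword set mirrors the workflow above.
ENTERPRISE_KEYWORDS = ("rate limits", "sso", "soc 2", "hipaa", "data residency")
ALLOWED_PROPERTIES = ("subject", "body", "pipeline_stage")

def scan_ticket(ticket: dict) -> list[str]:
    """Return matched keywords so account managers can trust each flag."""
    text = " ".join(str(ticket.get(p, "")) for p in ALLOWED_PROPERTIES).lower()
    return [kw for kw in ENTERPRISE_KEYWORDS if kw in text]

hits = scan_ticket({
    "subject": "Question about SSO and SOC 2 report",
    "body": "We are evaluating data residency options.",
    "internal_notes": "never read: not on the allowlist",
})
print(hits)  # ['sso', 'soc 2', 'data residency']
```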

&lt;h3&gt;
  
  
  &lt;strong&gt;Friday release notes and changelog draft (GitHub, Jira/Linear)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Commit messages are written for engineers; release notes are written for customers. This workflow drafts the customer version so the product team edits prose instead of compiling a changelog from scratch.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger:&lt;/strong&gt; GitHub event trigger on &lt;code&gt;release.created&lt;/code&gt;, scoped to the specific repository. Requires the Claude GitHub App installed on the repo. Running &lt;code&gt;/web-setup&lt;/code&gt; alone grants clone access but doesn't enable webhook delivery.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; The routine finds the previous release tag, collects every PR merged into main between the two tags, and resolves each PR back to its Jira or Linear ticket using the ticket ID conventionally placed in the PR title or body. It then drafts customer-facing release notes in Markdown, grouped by feature area. One caveat: the bundled GitHub MCP connector has &lt;a href="https://github.com/anthropics/claude-code/issues/45306" rel="noopener noreferrer"&gt;gaps around basic writes like updating the release body directly&lt;/a&gt;, so the routine opens a pull request against a release-notes/ branch instead of editing the release in place.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval surface:&lt;/strong&gt; The routine commits the Markdown to a &lt;code&gt;release-notes/&amp;lt;tag&amp;gt;&lt;/code&gt; branch and opens a PR. A product manager edits the copy and merges.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance &amp;amp; security checklist:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Give the routine read-only access to Jira and Linear. It should never change a ticket's status or rewrite acceptance criteria.
&lt;/li&gt;
&lt;li&gt;Enforce a &lt;a href="https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/about-protected-branches" rel="noopener noreferrer"&gt;branch protection rule&lt;/a&gt;: the routine's write token can only push to branches matching &lt;code&gt;release-notes/*&lt;/code&gt;. The main branch is structurally unreachable.
&lt;/li&gt;
&lt;li&gt;Log triggering release tag → list of PRs analyzed → resulting changelog PR number. When the next release breaks, provenance is what makes the diff debuggable.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
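&lt;p&gt;The PR-to-ticket resolution above leans on the ID convention in PR titles. A minimal sketch using the common &lt;code&gt;PROJ-123&lt;/code&gt; key shape (adjust the pattern to your own Jira or Linear project keys):&lt;/p&gt;

```python
import re
from collections import defaultdict

# Sketch of resolving merged PRs back to tickets via the ID convention in
# PR titles or bodies. The PROJ-123 pattern is the common Jira/Linear key
# shape; adjust it to your own project key format.
TICKET_ID = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

def group_prs_by_ticket(prs: list[dict]) -> dict:
    grouped = defaultdict(list)
    for pr in prs:
        match = TICKET_ID.search(pr["title"] + " " + pr.get("body", ""))
        key = match.group(0) if match else "untracked"
        grouped[key].append(pr["number"])
    return dict(grouped)

print(group_prs_by_ticket([
    {"number": 101, "title": "ENG-1234: fix pagination"},
    {"number": 102, "title": "chore: bump deps"},
]))
# {'ENG-1234': [101], 'untracked': [102]}
```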

&lt;h2&gt;
  
  
  &lt;strong&gt;How to evaluate an enterprise MCP runtime for Claude Code routines&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Every workflow above has a shared dependency: the tool layer underneath. Native Claude Code Routines can't safely execute these tasks on bundled connectors alone. Workflow 5's note about the GitHub connector missing basic writes is representative of the stock first-party set, not an outlier.&lt;/p&gt;

&lt;p&gt;Relying on bundled connectors and first-party token inheritance also means rate-limit failures, prompt injection exploits, and security audits that halt deployment.&lt;/p&gt;

&lt;p&gt;What's missing is a purpose-built &lt;a href="https://www.arcade.dev/mcp" rel="noopener noreferrer"&gt;MCP runtime&lt;/a&gt;: the execution layer where tools run, credentials are resolved just-in-time, and every action is authorized against a specific user's permissions. This is not another proxy in front of your enterprise systems; &lt;a href="https://www.arcade.dev/documents/why-mcp-needs-a-runtime.pdf" rel="noopener noreferrer"&gt;the agent is already the proxy&lt;/a&gt;. The runtime is where the tool call lands, where identity and policy are evaluated, and where the audit record is written. Critically, the runtime is stateful. It maintains per-session, per-user context across an agent's entire reasoning loop, which is exactly what a stateless proxy cannot do. And this statefulness is what makes per-user, per-tool authorization enforceable.&lt;/p&gt;

&lt;p&gt;An enterprise MCP runtime delivers three capabilities working in concert: &lt;strong&gt;agent authorization&lt;/strong&gt; (per-user, per-tool, per-action), &lt;strong&gt;agent-optimized tools&lt;/strong&gt; (built for LLM consumption, not API passthrough), and &lt;strong&gt;agent lifecycle governance&lt;/strong&gt; (centralized control, versioning, and full-execution audit logs).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Bundled first-party connectors&lt;/th&gt;
&lt;th&gt;Enterprise MCP runtime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Permission model&lt;/td&gt;
&lt;td&gt;Inherits the creator's global OAuth scope&lt;/td&gt;
&lt;td&gt;Scoped per routine, per user, per action&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth lifecycle&lt;/td&gt;
&lt;td&gt;Token embedded at setup; manual refresh&lt;/td&gt;
&lt;td&gt;Runtime manages refresh, rotation, and expiry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit logs&lt;/td&gt;
&lt;td&gt;Opaque, per-connector, not unified&lt;/td&gt;
&lt;td&gt;Full chain of custody per tool call (user, tool, params, result), exportable to SIEM via OpenTelemetry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt injection defense&lt;/td&gt;
&lt;td&gt;None; LLM parses raw input into API calls&lt;/td&gt;
&lt;td&gt;Multi-layered: isolated credentials, per-action auth, schema enforcement, visibility filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate-limit handling&lt;/td&gt;
&lt;td&gt;Direct hits against upstream APIs&lt;/td&gt;
&lt;td&gt;Throttling, batching, and targeted webhooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool catalog&lt;/td&gt;
&lt;td&gt;Stock first-party set only&lt;/td&gt;
&lt;td&gt;The &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;largest catalog of agent-optimized MCP tools (8000+)&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gateway composition&lt;/td&gt;
&lt;td&gt;One OAuth/connector per upstream service&lt;/td&gt;
&lt;td&gt;Runtime-level federation: tools composed into a single identity-scoped URL (Arcade calls this the &lt;a href="https://docs.arcade.dev/en/guides/mcp-gateways" rel="noopener noreferrer"&gt;MCP Gateway feature&lt;/a&gt;: a composition layer, not a proxy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-harness portability&lt;/td&gt;
&lt;td&gt;Claude Code only&lt;/td&gt;
&lt;td&gt;Any MCP-compatible harness (Codex, OpenCode, local-model)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Agent authorization: per-user, per-tool, evaluated at runtime&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The most critical function of a dedicated MCP runtime is handling multi-user &lt;a href="https://docs.arcade.dev/home/auth/how-arcade-helps" rel="noopener noreferrer"&gt;agent authorization&lt;/a&gt;, sometimes called post-prompt authorization.&lt;/p&gt;

&lt;p&gt;Single-user demos hide the real problem. &lt;a href="https://code.claude.com/docs/en/routines" rel="noopener noreferrer"&gt;Anthropic's docs&lt;/a&gt; are explicit that &lt;em&gt;"routines belong to your individual claude.ai account. They are not shared with teammates."&lt;/em&gt; Every routine is structurally a single-user artifact, even when the work it does affects an entire team. The moment a routine has to act on behalf of multiple users (one per engineer on a platform team, or org-wide when a customer-health scanner runs for every account manager), shared service accounts and creator-inherited OAuth scopes collapse as a model. Teams either give the agent broad permissions (and an intern bypasses their access controls through the agent) or inherit the user's full permissions (and one prompt injection cascades through every system that user can touch). The right answer is the intersection: &lt;em&gt;what is this agent allowed to do AND what is this user allowed to do&lt;/em&gt;, evaluated per action at runtime. That is the problem the runtime has to solve before routines can move past single-user demos.&lt;/p&gt;

&lt;p&gt;Rather than letting a routine inherit the global, administrative permissions of its creator, an advanced runtime isolates the LLM entirely from underlying credentials and executes every tool call On-Behalf-Of (OBO) a specific user. The runtime evaluates the intersection of the agent's baseline permissions and that user's native permissions per action at runtime, so every action is attributable to a specific human in the audit log.&lt;/p&gt;

&lt;p&gt;Authorization is just-in-time. The runtime requests and validates credentials only when a specific user action requires them. If a user never invokes the Salesforce integration, no Salesforce tokens are ever obtained or stored. The entire OAuth flow (token exchange, refresh, storage) executes in deterministic backend logic that the LLM can never observe, alter, or leak. For additional governance, teams attach pre-tool-call and post-tool-call hooks to enforce custom policies: human-in-the-loop approvals for destructive actions, usage limits, or contextual access rules.&lt;/p&gt;
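&lt;p&gt;The hook idea reduces to deterministic policy code that runs before every tool call. The sketch below is generic (the hook signature, tool names, and limits are hypothetical, not Arcade's actual API); the point is that the policy is evaluated outside the LLM's view, per pending call:&lt;/p&gt;

```python
# Generic sketch of a pre-tool-call hook. The hook signature is
# hypothetical (not Arcade's actual API); the point is that policy is
# deterministic code evaluated outside the LLM's view, per pending call.
DESTRUCTIVE_TOOLS = {"github.delete_branch", "linear.delete_issue"}
MAX_CALLS_PER_RUN = 50

def pre_tool_call(tool_name: str, user: str, call_count: int) -> str:
    """Return 'allow', 'deny', or 'needs_approval' for a pending tool call."""
    if call_count >= MAX_CALLS_PER_RUN:
        return "deny"  # usage limit: stop a runaway loop
    if tool_name in DESTRUCTIVE_TOOLS:
        return "needs_approval"  # route to a human approver before executing
    return "allow"

print(pre_tool_call("github.delete_branch", "alice", 3))  # needs_approval
```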

&lt;p&gt;The runtime &lt;a href="https://docs.arcade.dev/en/references/auth-providers/oauth2" rel="noopener noreferrer"&gt;manages the entire OAuth token lifecycle&lt;/a&gt;. It handles token refresh, rotation, and mismatch scenarios outside the view of the LLM. If a routine tries to access a repository the target user can't see, the runtime blocks the action at the protocol layer.&lt;/p&gt;

&lt;p&gt;Critically, the runtime hooks into the identity and entitlement systems you already run (Okta, Entra, SailPoint) instead of asking you to redefine authorization policies in yet another system. It acquires scoped tokens just-in-time, enforces the policy your IdP already owns, and keeps credentials isolated from the LLM and the MCP client. The runtime delegates authorization to what the enterprise has already defined; it doesn't duplicate it.&lt;/p&gt;
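&lt;p&gt;To make the intersection model concrete, here is a minimal Python sketch. Every name in it (the policy sets, the &lt;code&gt;authorize&lt;/code&gt; helper) is illustrative, not Arcade's actual API; it only shows the core check: agent policy AND user grants, evaluated per action.&lt;/p&gt;

```python
# Illustrative sketch of per-action authorization as a set intersection.
# All names here are hypothetical, not Arcade's actual API.

# Baseline of what the agent itself is allowed to do
AGENT_POLICY = {"github.read_pr", "github.comment", "linear.create_issue"}

# Native permissions of each acting user
USER_GRANTS = {
    "alice": {"github.read_pr", "github.comment", "github.merge"},
    "intern": {"github.read_pr"},
}

def authorize(user: str, action: str) -> bool:
    """Allow an action only if BOTH the agent's baseline policy and the
    acting user's native permissions include it, evaluated per call."""
    allowed = AGENT_POLICY.intersection(USER_GRANTS.get(user, set()))
    return action in allowed

# The same routine, run on behalf of different users, gets different rights:
assert authorize("alice", "github.comment")       # in both sets
assert not authorize("alice", "github.merge")     # user may, agent policy may not
assert not authorize("intern", "github.comment")  # agent may, this user may not
```

&lt;p&gt;In a real runtime the user's grants would be resolved from the identity provider at execution time rather than a static dict, but the per-action intersection is the same.&lt;/p&gt;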

&lt;h3&gt;
  
  
  &lt;strong&gt;Agent-optimized tools: built for LLM consumption, not API passthrough&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Most MCP servers today are thin API wrappers. When a user says "update the Acme deal," the wrapper still asks the agent for &lt;code&gt;opportunity_id&lt;/code&gt;, &lt;code&gt;owner_id&lt;/code&gt;, &lt;code&gt;stage_enum&lt;/code&gt;, and &lt;code&gt;close_date&lt;/code&gt;. The agent fills those parameters probabilistically and either guesses the wrong values or retries blindly. This failure mode is called parameter hallucination, and it's where most agent failures happen in production. A proxy layer has no mechanism to close it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.arcade.dev/guides/create-tools/tool-basics/build-mcp-server" rel="noopener noreferrer"&gt;Agent-optimized tools&lt;/a&gt; invert this pattern. When a user asks to "make the intro paragraph friendlier," the tool translates that to &lt;code&gt;segmentId=gz49hg56, index=350, text='your friendlier message'&lt;/code&gt;. The agent never thinks beyond "intro paragraph." Every tool ships with rich semantic descriptions to help the LLM pick correctly, consistent schemas across services regardless of the underlying API, and agent-interpretable errors instead of raw HTTP status codes. In practice this ships as the &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;largest catalog of pre-built agent-optimized MCP tools (8000+)&lt;/a&gt;, covering productivity, CRM, communication, and developer systems, so teams skip the wrap-an-API-in-MCP step entirely.&lt;/p&gt;

&lt;p&gt;Reliability is a runtime concern, not an agent concern. Pagination, rate limiting, retries, and failover all get handled by the runtime, invisible to the agent. Tools execute in parallel where safe; failed calls retry with additional developer-defined context; MCP servers fail over automatically. The agent gets a clean result or a clean error, never a half-paginated list or a transient network blip bubbling up into the reasoning loop.&lt;/p&gt;
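&lt;p&gt;As a rough sketch of that "clean result or clean error" contract, here is what retry handling below the agent might look like. The function and error names are made up for illustration; the point is that backoff and failure shaping happen outside the reasoning loop.&lt;/p&gt;

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a rate limit or network blip from an upstream API."""

def call_with_retries(tool_call, max_attempts=4, base_delay=0.5):
    """Run a tool call with jittered exponential backoff. The agent only
    ever sees a clean result or a clean, interpretable error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return {"ok": True, "result": tool_call()}
        except TransientError:
            if attempt == max_attempts:
                break
            # Backoff: base_delay, 2x, 4x, ... with jitter to avoid thundering herds
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.0))
    return {"ok": False, "error": "upstream unavailable after retries"}

# Demo: an upstream that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] in (1, 2):
        raise TransientError()
    return "data"

out = call_with_retries(flaky, base_delay=0.01)
assert out == {"ok": True, "result": "data"} and calls["n"] == 3
```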

&lt;p&gt;Strict schemas also harden the tool layer against prompt injection, though schema enforcement is one layer of the defense, not the whole defense. A malicious payload buried in a customer email can't talk the agent into a destructive call that doesn't match an approved schema. More importantly, credentials never leave the runtime, so a jailbroken prompt has no tokens to exfiltrate. Per-user authorization is evaluated at every action, so an injected instruction can't do more than the acting user is already permitted to do. And visibility filtering scopes the tools a routine can even see, so there's no latent high-privilege tool hanging around for a payload to discover. Prompt injection defense has to be structural and in depth, spanning the tool layer, the auth layer, and the governance layer, not a prompt-level patch.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Agent lifecycle governance: centralized control and full visibility&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Agent lifecycle governance is the third pillar of an enterprise MCP runtime. Deploying autonomous agents at scale requires centralized control over which tools are available, to whom, and with what permissions, plus total visibility into what's happening at runtime.&lt;/p&gt;

&lt;p&gt;A dedicated runtime provides a full chain of custody for every agent action (user identity, tool name, parameters, and result), exportable to your SIEM via OpenTelemetry. Independent attestation (&lt;a href="https://www.arcade.dev/blog/soc-2-compliance-ai-agents-production-security/" rel="noopener noreferrer"&gt;Arcade.dev is SOC 2 Type 2 certified&lt;/a&gt;) validates that these controls hold in production, which matters when security reviews start before deployment, not after. The runtime also lets security teams enforce visibility filtering so a routine only sees the tools it explicitly has permission to use, and provides the infrastructure to mandate human-approval gates for any routine attempting to write data to a production system.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Portability across agent runtimes using MCP&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Investing in an MCP runtime also guarantees architectural portability. Because tools are exposed over the &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;open MCP standard&lt;/a&gt;, the heavy lifting of building tool contracts, managing OAuth flows, and establishing governance policies happens once.&lt;/p&gt;

&lt;p&gt;That investment is usable from any MCP client (Claude Code Routines, Cursor, Claude Desktop, VS Code, ChatGPT, and custom applications) and stays portable across other agent harnesses like OpenAI Codex or on-prem deployments running open-weights models for regulated workloads. When your team swaps Claude for a different harness on a specific workflow, or moves sensitive routines onto on-prem compute for compliance reasons, the tool contracts, OAuth flows, and audit logs travel with you. The agent harness changes; the governance layer does not.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to test and deploy your first remote Claude Code routine&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With the runtime in place, the remaining question is how to ship a routine to production without breaking things. Writing a prompt, attaching a token, and flipping the schedule is not the move. The four-step framework below enforces clear boundaries on top of your MCP runtime:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 1: Wire up Arcade MCP Gateway as a custom connector&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before you can safely test anything, give the routine somewhere governed to call. With Arcade, the flow is (full integration walkthrough at &lt;a href="https://docs.arcade.dev/en/get-started/mcp-clients/claude-code" rel="noopener noreferrer"&gt;Arcade for Claude Code&lt;/a&gt;):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In your &lt;a href="http://app.arcade.dev/" rel="noopener noreferrer"&gt;Arcade dashboard&lt;/a&gt;, create a new &lt;a href="https://docs.arcade.dev/en/guides/mcp-gateways" rel="noopener noreferrer"&gt;&lt;strong&gt;MCP Gateway&lt;/strong&gt;&lt;/a&gt;. Configure it with &lt;a href="https://docs.arcade.dev/en/get-started/about-arcade" rel="noopener noreferrer"&gt;&lt;strong&gt;Arcade auth&lt;/strong&gt;&lt;/a&gt; so tools inherit per-user, per-action authorization rather than a shared service account.
&lt;/li&gt;
&lt;li&gt;Add the tools this routine needs to the gateway, scoped to the minimum the workflow requires and nothing more.
&lt;/li&gt;
&lt;li&gt;In the Claude web interface, create a &lt;strong&gt;custom connector&lt;/strong&gt; pointing at the gateway's URL.
&lt;/li&gt;
&lt;li&gt;Complete the one-time authorization to link the connector to the gateway.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With the connector live, any routine you create can include it alongside (or in place of) bundled first-party connectors.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 2: Sandbox execution&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Never test a new routine against production data. Sandbox the execution using the &lt;code&gt;/schedule&lt;/code&gt; command in the CLI or the "Run now" feature in the web interface.&lt;/p&gt;

&lt;p&gt;Point the routine at a scratch Notion workspace, a dedicated testing Slack channel, or a sandbox GitHub repository. Conduct multiple dry runs to observe how the routine handles edge cases, unexpected inputs, and empty datasets.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 3: Start with read-only permissions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When configuring the routine for its initial deployment, enforce a strict "Read-Only First" mandate. Use your Arcade gateway to scope the routine's MCP tools exclusively to read operations.&lt;/p&gt;

&lt;p&gt;For example, if you're building an incident triage routine, allow the routine to read from PagerDuty and output its analysis to a simple text file or a private Slack message. Validate the quality of the routine's logic and data extraction for at least one week before granting permission to write data or create tickets.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Step 4: Add human approval gates for write actions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;As you transition the routine to handle write operations, establish hard structural boundaries that mandate human oversight.&lt;/p&gt;

&lt;p&gt;Don't allow the agent to commit directly to your main branch or publish documentation live. Instead, configure the routine to draft documents, open pull requests, or push code exclusively to branches with a specific prefix. Every destructive or state-changing action requires a human engineer to review and merge the work.&lt;/p&gt;
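&lt;p&gt;One way to make that boundary structural rather than aspirational is a pre-tool-call hook of the kind described earlier. The sketch below is hypothetical Python with made-up tool names, assuming a &lt;code&gt;routine/&lt;/code&gt; branch-prefix convention; it is a policy sketch, not Arcade's hook API.&lt;/p&gt;

```python
# Hypothetical pre-tool-call gate: drafts allowed, merges and publishes are not.
ROUTINE_BRANCH_PREFIX = "routine/"
WRITE_TOOLS = {"github.push_branch", "github.open_pr"}
HUMAN_ONLY_TOOLS = {"github.merge_pr", "docs.publish"}

def pre_tool_call_gate(tool: str, params: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call. Writes must target
    prefixed branches; merge/publish always requires a human."""
    if tool in HUMAN_ONLY_TOOLS:
        return False, f"{tool} requires human review; the routine may only draft"
    branch = params.get("branch", "")
    if tool in WRITE_TOOLS and not branch.startswith(ROUTINE_BRANCH_PREFIX):
        return False, f"writes restricted to {ROUTINE_BRANCH_PREFIX}* branches"
    return True, "ok"

assert pre_tool_call_gate("github.open_pr", {"branch": "routine/changelog-0501"})[0]
assert not pre_tool_call_gate("github.push_branch", {"branch": "main"})[0]
assert not pre_tool_call_gate("github.merge_pr", {"pr": 42})[0]
```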

&lt;h2&gt;
  
  
  &lt;strong&gt;Where to start&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Claude Code Routines deliver genuine unattended automation for engineering teams: Claude Code running on a schedule, GitHub event, or API call, entirely off the developer laptop. Realizing that value across an organization means acknowledging that moving from a localized laptop demo to a nightly production workflow introduces severe architectural and security challenges.&lt;/p&gt;

&lt;p&gt;You can't run autonomous workflows at scale using bundled connectors, first-party token inheritance, and opaque execution logs. Production deployments demand typed tool contracts, robust rate-limit handling, and explicit permission scoping to protect against prompt injection and data exposure.&lt;/p&gt;

&lt;p&gt;If your engineering team is evaluating how to run unattended AI agents safely, &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade is the industry's first MCP runtime&lt;/a&gt; purpose-built for this. By unifying &lt;strong&gt;agent authorization&lt;/strong&gt;, &lt;strong&gt;agent-optimized tools&lt;/strong&gt;, and &lt;strong&gt;agent lifecycle governance&lt;/strong&gt; in a single runtime, we let you ship reliable production workflows without spending months rebuilding security and operational plumbing.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;FAQ&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What are Claude Code Routines, and what changed in the April 2026 release?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A routine is a saved Claude Code configuration (prompt, repositories, and connectors) packaged to run automatically on Anthropic-managed cloud infrastructure. The April 2026 release shipped three trigger types: scheduled, API (per-routine &lt;code&gt;/fire&lt;/code&gt; endpoint with a bearer token), and GitHub events (pull request or release activity on a connected repository). Routines are currently in research preview.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How many times per day can a Claude Code Routine run?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Routines share subscription usage with interactive sessions and have an additional daily cap on how many runs can start per account. Anthropic doesn't publish a specific number and it can change during the research preview, so per-event routines that fire on every PR comment or alert quickly become impractical.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do teams work around routine run quotas in production?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Two options. First, batch multiple tasks into a single daily "meta-orchestrator" routine and reserve real-time runs for only the highest-severity API and GitHub event triggers. Second, enable extra usage in Settings → Billing so runs that hit the cap continue on metered overage.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why are bundled connectors risky for enterprise unattended routines?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Bundled first-party connectors inherit the creating developer's global OAuth scope. That permission inheritance fails security reviews the moment the routine touches shared code, customer data, or regulated systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do unattended routines increase prompt injection risk?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Untrusted third-party text (PagerDuty descriptions, Sentry traces, customer emails) flows directly into the agent at runtime. A payload buried in that text can steer the agent toward unsafe actions. Defense has to be multi-layered at the runtime: isolated credentials the LLM never sees, per-user authorization evaluated on every action, schema enforcement on each tool call, and visibility filtering so the routine can't even discover tools it isn't permitted to use.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is an MCP runtime, and why do I need it?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;An MCP runtime is the execution layer where agent tool calls run. It resolves credentials just-in-time, authorizes each action against a specific user's permissions, enforces tool schemas, and writes a unified audit log. It is not another proxy in front of your enterprise systems. The agent is already the proxy. The runtime is where identity, policy, and execution come together.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is "post-prompt authorization"?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The runtime checks each individual tool action at execution time against the acting user's permissions and the routine's policy. The routine never inherits the creator's blanket credentials.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Which routine actions should require human approval?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Any write or state-changing action (creating tickets, committing code, publishing documentation) should land as a draft, PR, or triage queue and go through a human review gate before merging.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How do Slack API rate limits affect these workflows?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Slack's &lt;code&gt;conversations.history&lt;/code&gt; endpoint now rate-limits non-Marketplace apps to a single request per minute. Production designs use Slack Search, targeted webhooks, or curated context instead of bulk history pulls.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What should I implement first to deploy a safe routine?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Wire up Arcade as a custom connector first so the routine calls tools through a governed runtime, then test in a sandbox, enforce read-only tools, and introduce human-in-the-loop gates before granting write permissions.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What should be logged for auditability in enterprise routines?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Log the triggering event, the tools called, the target resources, the acting user or service account, and the resulting object IDs (e.g., Sentry event ID → Linear ticket ID).&lt;/p&gt;
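&lt;p&gt;As a sketch, one such chain-of-custody entry might be shaped like the record below. The field names are illustrative, not a fixed schema; the values mirror the Sentry-to-Linear example above.&lt;/p&gt;

```python
def audit_record(trigger, tool, params, resources, actor, result_ids):
    """Shape of a hypothetical chain-of-custody entry for one agent action."""
    return {
        "trigger": trigger,        # what fired the routine
        "tool": tool,              # tool name as invoked
        "params": params,          # parameters as sent
        "resources": resources,    # target systems / objects
        "actor": actor,            # acting user or service account
        "result_ids": result_ids,  # resulting object IDs, for traceability
    }

rec = audit_record(
    trigger="sentry.alert",
    tool="linear.create_issue",
    params={"title": "Checkout 500s spike"},
    resources=["linear://team/ENG"],
    actor="alice@example.com",
    result_ids={"sentry_event": "abc123", "linear_ticket": "ENG-481"},
)
assert rec["actor"] == "alice@example.com"
```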

</description>
      <category>agents</category>
      <category>ai</category>
      <category>claude</category>
      <category>devops</category>
    </item>
    <item>
      <title>Does ClickHouse Support UPDATEs? A 2026 Data Analysis</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 30 Apr 2026 20:30:00 +0000</pubDate>
      <link>https://forem.com/manveer_chawla_64a7283d5a/does-clickhouse-support-updates-a-2026-data-analysis-4m75</link>
      <guid>https://forem.com/manveer_chawla_64a7283d5a/does-clickhouse-support-updates-a-2026-data-analysis-4m75</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Yes, ClickHouse fully supports UPDATEs.&lt;/strong&gt; As of April 2026, ClickHouse ships standard SQL &lt;code&gt;UPDATE ... SET ... WHERE&lt;/code&gt; syntax that runs in milliseconds, alongside four other update mechanisms: &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; mutations for bulk operations, lightweight DELETE, on-the-fly mutation visibility, and &lt;code&gt;ReplacingMergeTree&lt;/code&gt; for high-volume upserts and CDC. The "ClickHouse is append-only" claim is outdated by eight years and 100+ merged pull requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key facts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard SQL UPDATE shipped in ClickHouse 25.7 (July 2025)&lt;/strong&gt; via &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;PR #82004&lt;/a&gt;, backed by a new "patch part" architecture. It was promoted to Beta with default enablement in version 25.8 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85952" rel="noopener noreferrer"&gt;PR #85952&lt;/a&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight UPDATE delivers a 1,000× to 2,400× speedup&lt;/strong&gt; for single-row updates compared to classical mutations, per ClickHouse's own benchmarks. Patch parts store only the changed columns plus five system columns, with no part rewrite.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; has shipped since August 2018&lt;/strong&gt; (ClickHouse v18.12), authored by Alex Zatelepin (&lt;code&gt;ztlpn&lt;/code&gt;). Updates have never been "unsupported" in any release from the last eight years.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight DELETE has been GA since 2022&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/37893" rel="noopener noreferrer"&gt;PR #37893&lt;/a&gt;). The &lt;code&gt;allow_experimental_lightweight_delete&lt;/code&gt; flag is no longer required.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-the-fly mutations&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74877" rel="noopener noreferrer"&gt;PR #74877&lt;/a&gt;) make queued UPDATEs immediately visible to SELECTs, eliminating the eventual-consistency gap when needed.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational controls are production-grade&lt;/strong&gt;: &lt;code&gt;max_uncompressed_bytes_in_patches&lt;/code&gt; (default 30 GiB, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85641" rel="noopener noreferrer"&gt;PR #85641&lt;/a&gt;), exponential backoff for failed mutations (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/58036" rel="noopener noreferrer"&gt;PR #58036&lt;/a&gt;), workload classification (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/64061" rel="noopener noreferrer"&gt;PR #64061&lt;/a&gt;), and bandwidth throttling (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/57877" rel="noopener noreferrer"&gt;PR #57877&lt;/a&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability is first-class&lt;/strong&gt;: &lt;code&gt;parts_postpone_reasons&lt;/code&gt;, &lt;code&gt;latest_fail_error_code_name&lt;/code&gt;, &lt;code&gt;mutation_ids&lt;/code&gt; in &lt;code&gt;system.part_log&lt;/code&gt;, and dynamic &lt;code&gt;system.warnings&lt;/code&gt; for stalled mutations all ship by default.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verdict&lt;/strong&gt;: the "ClickHouse is append-only" claim made sense in 2017. Repeating it in 2026 is misinformation. ClickHouse's UPDATE subsystem now uses standard SQL, runs in milliseconds, and replicates correctly across distributed clusters.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why People Still Say "ClickHouse Doesn't Support Updates"&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you have evaluated ClickHouse in the last few years, you have probably heard one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"ClickHouse is append-only."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"ClickHouse doesn't support UPDATEs."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Updates require ALTER TABLE mutations that rewrite entire parts."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Mutations are the only way to update data."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Updates are eventually consistent."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"You have to use &lt;code&gt;allow_experimental_lightweight_delete&lt;/code&gt;."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"There is no standard SQL &lt;code&gt;UPDATE&lt;/code&gt; syntax."&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of this was accurate documentation circa 2018 to 2022. Some is now folklore that competitors keep repeating because it is a convenient story: ClickHouse is fast for scans, but you can't update.&lt;/p&gt;

&lt;p&gt;In 2017, before mutations even existed, the criticism was structurally correct. ClickHouse's MergeTree engine was designed around immutability, and "updates" had to be modeled by inserting new rows into specialized engines like &lt;code&gt;ReplacingMergeTree&lt;/code&gt; and resolving the conflict at merge time or via &lt;code&gt;FINAL&lt;/code&gt;. There was no &lt;code&gt;UPDATE&lt;/code&gt; statement.&lt;/p&gt;

&lt;p&gt;Then, over eight years, ClickHouse's engineering team systematically dismantled that limitation. They added &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; (2018), &lt;code&gt;KILL MUTATION&lt;/code&gt; and &lt;code&gt;system.mutations&lt;/code&gt; for diagnosability (2019), &lt;code&gt;mutations_sync&lt;/code&gt; for synchronous waits (2019), &lt;code&gt;IN PARTITION&lt;/code&gt; scoping (2020), the &lt;code&gt;MutateTask&lt;/code&gt; refactor (2021), a long correctness wave for replicated mutations (2020 to 2022), lightweight DELETE (2022), on-the-fly mutations (2025), and finally lightweight UPDATE backed by patch parts (2025), accompanied by 50+ stabilization PRs that made it production-safe by 2026.&lt;/p&gt;

&lt;p&gt;This article traces that evolution with PR-level evidence. No marketing claims, no benchmarks on toy datasets. Just the commit history.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Methodology: How This ClickHouse UPDATE Analysis Was Built&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We went through ClickHouse's GitHub commit history, pull requests, changelogs, and release blogs from 2018 through April 2026. The scope covered every PR that touched the UPDATE subsystem: the original mutation engine, the replicated-coordination correctness wave, lightweight DELETE, on-the-fly mutations, the patch-part architecture (PR #82004 plus 35 follow-up commits inside the same PR), and the post-landing stabilization work.&lt;/p&gt;

&lt;p&gt;Each PR was classified by category (engine, planner, replication, observability, performance, correctness), impact severity, and whether it changed default behavior or required an opt-in flag. We cross-referenced PR descriptions against changelog entries and the &lt;a href="https://clickhouse.com/blog/updates-in-clickhouse-1-purpose-built-engines" rel="noopener noreferrer"&gt;ClickHouse Updates blog series&lt;/a&gt; to verify the claimed improvements. Where multiple PRs addressed the same subsystem, we traced the dependency chain to understand how the incremental changes compounded.&lt;/p&gt;

&lt;p&gt;The result is a chronological narrative across seven distinct eras, with full provenance. Every claim in this article maps to a specific merged PR or issue that you can verify yourself on GitHub.&lt;/p&gt;

&lt;p&gt;This is not a benchmarking exercise. Benchmarks measure peak performance on controlled workloads. This analysis measures the engineering trajectory: what was built, why, and what it means for teams deciding whether ClickHouse can support their update workloads today.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What Update Features Does ClickHouse Ship by Default in 2026?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The current state, as of April 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard SQL &lt;code&gt;UPDATE&lt;/code&gt; statement.&lt;/strong&gt; &lt;code&gt;UPDATE table SET col = expr WHERE …&lt;/code&gt; works for MergeTree-family tables, backed by patch parts. No special syntax, no experimental flags by default in stable production paths.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classical &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; mutations.&lt;/strong&gt; The original 2018 mechanism is still available and is the right tool for bulk backfills, schema-level corrections, and operations where rewriting affected parts is acceptable.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight DELETE.&lt;/strong&gt; &lt;code&gt;DELETE FROM … WHERE&lt;/code&gt; is implemented as a single-column rewrite of a &lt;code&gt;_row_exists&lt;/code&gt; virtual mask. Deletes that used to take 8 seconds finish in 200 ms.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-the-fly mutation visibility.&lt;/strong&gt; SELECTs see queued UPDATEs and DELETEs immediately, before background materialization completes. The latency between issuing an UPDATE and seeing its effect goes from "depends on the merge schedule" to "insert-like."
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ReplacingMergeTree&lt;/code&gt; for CDC and upsert workflows.&lt;/strong&gt; Updates are ingested as new rows; deduplication happens asynchronously during background merges. The &lt;code&gt;FINAL&lt;/code&gt; keyword guarantees deduplicated reads at query time, and &lt;code&gt;FINAL&lt;/code&gt; has been heavily optimized for production use.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational safety nets.&lt;/strong&gt; Exponential backoff for failed mutations, workload classification for resource isolation, server-level bandwidth throttling, per-replica concurrency caps, and &lt;code&gt;max_uncompressed_bytes_in_patches&lt;/code&gt; reject runaway UPDATE storms before they can hurt the cluster.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First-class observability.&lt;/strong&gt; &lt;code&gt;system.mutations&lt;/code&gt;, &lt;code&gt;system.part_log&lt;/code&gt;, &lt;code&gt;system.warnings&lt;/code&gt;, and &lt;code&gt;system.parts.is_patch&lt;/code&gt; give operators the data they need to diagnose stalled or failed mutations without grepping logs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not experimental features hidden behind flags. They are defaults that ship with every modern ClickHouse installation.&lt;/p&gt;
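&lt;p&gt;For reference, the three statement-level mechanisms above use plain SQL. A sketch against a hypothetical &lt;code&gt;events&lt;/code&gt; MergeTree table:&lt;/p&gt;

```sql
-- Standard SQL lightweight UPDATE (25.7+): writes a patch part, no part rewrite
UPDATE events SET status = 'resolved' WHERE event_id = 42;

-- Classical mutation (since v18.12): rewrites affected parts; right for bulk backfills
ALTER TABLE events UPDATE status = 'archived' WHERE status = 'stale';

-- Lightweight DELETE (GA since 22.8): flips the _row_exists mask
DELETE FROM events WHERE tenant_id = 'churned-tenant';
```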




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse UPDATE Myths vs. Reality: A 2026 Checklist&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;The FUD&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Evidence&lt;/th&gt;
&lt;th&gt;Reality (2026)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;"ClickHouse is append-only"&lt;/td&gt;
&lt;td&gt;🟢 False since 2018&lt;/td&gt;
&lt;td&gt;&lt;a href="https://clickhouse.com/blog/handling-updates-and-deletes-in-clickhouse" rel="noopener noreferrer"&gt;v18.12 release&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; shipped in 2018. Standard SQL &lt;code&gt;UPDATE&lt;/code&gt; shipped in 2025 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;PR #82004&lt;/a&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;"ClickHouse doesn't support updates"&lt;/td&gt;
&lt;td&gt;🟢 False&lt;/td&gt;
&lt;td&gt;100+ PRs across 8 years&lt;/td&gt;
&lt;td&gt;Multiple update mechanisms ship by default: standard &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt;, lightweight DELETE, on-the-fly mutations, ReplacingMergeTree.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;"Updates require ALTER TABLE mutations that rewrite entire parts"&lt;/td&gt;
&lt;td&gt;🟢 False since 2025&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;PR #82004&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Lightweight UPDATE writes only changed columns into patch parts. No part rewrite. Insert-like latency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;"Mutations are the only way to update"&lt;/td&gt;
&lt;td&gt;🟢 False since 2022&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/37893" rel="noopener noreferrer"&gt;PR #37893&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74877" rel="noopener noreferrer"&gt;#74877&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;#82004&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Lightweight DELETE, on-the-fly mutations, and lightweight UPDATE all bypass the classical part-rewrite path.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;"Updates are eventually consistent"&lt;/td&gt;
&lt;td&gt;🟡 Nuanced&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74877" rel="noopener noreferrer"&gt;PR #74877&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;#82004&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Classical &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; is async by default. On-the-fly mutations and lightweight UPDATE provide immediate read-after-write visibility. &lt;code&gt;mutations_sync&lt;/code&gt; provides synchronous semantics on demand.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;"&lt;code&gt;allow_experimental_lightweight_delete&lt;/code&gt; is required"&lt;/td&gt;
&lt;td&gt;🟢 False since v22.8&lt;/td&gt;
&lt;td&gt;Lightweight DELETE is GA&lt;/td&gt;
&lt;td&gt;The flag is no longer needed. Lightweight DELETE is the default &lt;code&gt;DELETE&lt;/code&gt; implementation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;"No standard SQL UPDATE syntax"&lt;/td&gt;
&lt;td&gt;🟢 False since 2025&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;PR #82004&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;UPDATE table SET col = expr WHERE …&lt;/code&gt; works as standard SQL on MergeTree-family tables.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;"Updates cause unbounded part rewriting"&lt;/td&gt;
&lt;td&gt;🟢 Solved since 2025&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85641" rel="noopener noreferrer"&gt;PR #85641&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/58036" rel="noopener noreferrer"&gt;#58036&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;max_uncompressed_bytes_in_patches&lt;/code&gt; (default 30 GiB) caps patch accumulation. Exponential backoff prevents failure-loop CPU burn.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;"You can't kill a stuck UPDATE"&lt;/td&gt;
&lt;td&gt;🟢 False since 2019&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/4287" rel="noopener noreferrer"&gt;PR #4287&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;KILL MUTATION&lt;/code&gt; works on both MergeTree and ReplicatedMergeTree. &lt;code&gt;system.mutations&lt;/code&gt; exposes failure reasons.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;"ClickHouse can't isolate update workload from queries"&lt;/td&gt;
&lt;td&gt;🟢 False since 2024&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/64061" rel="noopener noreferrer"&gt;PR #64061&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/57877" rel="noopener noreferrer"&gt;#57877&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;mutation_workload&lt;/code&gt;, &lt;code&gt;merge_workload&lt;/code&gt;, and &lt;code&gt;max_mutations_bandwidth_for_server&lt;/code&gt; provide first-class resource isolation.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 0 (2016 to 2017): How Did You Update Data in ClickHouse Before Mutations?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"ClickHouse is append-only and was never designed for updates."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This part of the criticism is half-right historically. ClickHouse was designed as a columnar OLAP store optimized for ingest throughput and scan performance. Row-level mutability was deliberately out of scope. There was no &lt;code&gt;UPDATE&lt;/code&gt; statement.&lt;/p&gt;

&lt;p&gt;But "no UPDATE statement" never meant "no way to update data." From the earliest releases, ClickHouse shipped specialized MergeTree engines that modeled mutations as insertions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ReplacingMergeTree&lt;/code&gt;&lt;/strong&gt;: last-write-wins on the sorting key. Updates are ingested as new rows with the same primary key, and the most recent version wins after the next background merge. The &lt;code&gt;FINAL&lt;/code&gt; keyword forces deduplication at query time for cases where you can't wait for the merge.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;CollapsingMergeTree&lt;/code&gt;&lt;/strong&gt;: uses a &lt;code&gt;Sign&lt;/code&gt; column (+1 for the new row, −1 for the old one). Pairs of rows with the same key cancel each other out during merges.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;VersionedCollapsingMergeTree&lt;/code&gt;&lt;/strong&gt;: adds a version column for ingestion of updates that arrive out of order.&lt;/li&gt;
&lt;/ul&gt;
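&lt;p&gt;A minimal sketch of the updates-as-inserts pattern with &lt;code&gt;ReplacingMergeTree&lt;/code&gt; (table and column names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE user_profiles
(
    user_id    UInt64,
    email      String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- An "update" is just another insert with the same sorting key
INSERT INTO user_profiles VALUES (42, 'old@example.com', now());
INSERT INTO user_profiles VALUES (42, 'new@example.com', now());

-- FINAL forces deduplication at query time if the background merge
-- has not collapsed the two versions yet
SELECT email FROM user_profiles FINAL WHERE user_id = 42;
&lt;/code&gt;&lt;/pre&gt;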

&lt;p&gt;These were not workarounds. They were the design. For change-data-capture (CDC) workloads, high-volume upserts, and event-sourcing patterns, they remain the most efficient option in ClickHouse to this day. The trade-off is moving the cost from write time to read time (or to merge time).&lt;/p&gt;

&lt;p&gt;The competitor FUD that frames ClickHouse as "append-only" is technically describing this era. What it leaves out is everything that happened after.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 1 (2018): How Does ClickHouse's &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; Work?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"ClickHouse has no UPDATE statement."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In v18.12, Alex Zatelepin (&lt;code&gt;ztlpn&lt;/code&gt;) shipped &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; and &lt;code&gt;ALTER TABLE … DELETE&lt;/code&gt;. The model was deliberately heavyweight: every UPDATE is a logged &lt;em&gt;mutation&lt;/em&gt; that runs asynchronously in the background.&lt;/p&gt;

&lt;p&gt;Mechanically, a mutation does this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The command is persisted (to ZooKeeper for &lt;code&gt;ReplicatedMergeTree&lt;/code&gt;, or to a local &lt;code&gt;mutation_*.txt&lt;/code&gt; file for non-replicated tables).
&lt;/li&gt;
&lt;li&gt;Each affected part is rewritten to a temporary part by &lt;code&gt;MergeTreeDataMergerMutator::mutatePartToTemporaryPart&lt;/code&gt;. Files for unaffected columns are &lt;em&gt;hardlinked&lt;/em&gt; in Wide parts; only the changed columns get rewritten.
&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;max_block_number&lt;/code&gt; invariant ensures mutations only process parts that existed when the mutation was issued. Data inserted &lt;em&gt;after&lt;/em&gt; the UPDATE is not retroactively touched.
&lt;/li&gt;
&lt;li&gt;Replicas pull the mutation entry from the ZooKeeper log and execute it locally on their copies of the affected parts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This design enforces several semantic restrictions that persist today. They are not bugs; they are the contract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You cannot UPDATE primary-key or partition-key columns.&lt;/strong&gt; Enforced in &lt;code&gt;MutationsInterpreter::validateUpdateColumns&lt;/code&gt;. Changing the sort order would require rebuilding the entire part's index, which defeats the point of MergeTree.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No transactional atomicity.&lt;/strong&gt; Mutations are not bundled into transactions by default. If the server restarts mid-mutation, the operation resumes from where it left off, but you do not get cross-mutation atomicity.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No immediate read-after-write.&lt;/strong&gt; A SELECT issued right after an &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; may return pre-update values until the background materialization completes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No non-deterministic functions in replicated mutations.&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/7247" rel="noopener noreferrer"&gt;PR #7247&lt;/a&gt;.) &lt;code&gt;rand()&lt;/code&gt; and &lt;code&gt;now()&lt;/code&gt; are forbidden because each replica would compute different values, causing divergence.&lt;/li&gt;
&lt;/ul&gt;
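&lt;p&gt;The classical flow, end to end, looks like this (table and column names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Logged as a mutation and executed asynchronously in the background
ALTER TABLE events UPDATE status = 'archived' WHERE created_at &lt; '2018-01-01';

-- Track progress; is_done flips only after every affected part is rewritten
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE table = 'events' AND NOT is_done;
&lt;/code&gt;&lt;/pre&gt;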

&lt;p&gt;The 2018 criticism was: ClickHouse just got UPDATE support, but it is slow and async. That was fair. What followed was eight years of work to make it fast, predictable, and immediately visible, without giving up the bulk-update use case the original design was good at.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 2 (2019 to 2021): Can You Diagnose, Cancel, and Wait for ClickHouse Mutations?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"You can't kill a stuck mutation. There's no way to know if your UPDATE landed."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Once mutations existed, the next two years were dominated by operational maturity. Six PRs in particular turned mutations from "fire and pray" into something you could reason about in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;&lt;code&gt;KILL MUTATION&lt;/code&gt; and Failure Diagnostics (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/4287" rel="noopener noreferrer"&gt;PR #4287&lt;/a&gt;, 2019)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ztlpn&lt;/code&gt;'s February 2019 PR added &lt;code&gt;KILL MUTATION&lt;/code&gt; for both MergeTree and ReplicatedMergeTree. It also extended &lt;code&gt;system.mutations&lt;/code&gt; with &lt;code&gt;latest_failed_part&lt;/code&gt;, &lt;code&gt;latest_fail_time&lt;/code&gt;, and &lt;code&gt;latest_fail_reason&lt;/code&gt;, and added an &lt;code&gt;is_mutation&lt;/code&gt; flag in &lt;code&gt;system.merges&lt;/code&gt;. From this point on, "my UPDATE is stuck" became a diagnosable problem rather than an opaque one.&lt;/p&gt;
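&lt;p&gt;In practice, the diagnostic loop this PR enabled looks like the following (table name and mutation id are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Find the stuck mutation and why it failed
SELECT mutation_id, latest_failed_part, latest_fail_reason
FROM system.mutations
WHERE table = 'events' AND latest_fail_reason != '';

-- Cancel it instead of letting it retry forever
KILL MUTATION WHERE database = 'default' AND table = 'events'
  AND mutation_id = 'mutation_42.txt';
&lt;/code&gt;&lt;/pre&gt;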

&lt;h3&gt;
  
  
  &lt;strong&gt;&lt;code&gt;mutations_sync&lt;/code&gt;: Defining "Did My UPDATE Land?" (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/8237" rel="noopener noreferrer"&gt;PR #8237&lt;/a&gt;, 2019)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;alesapin&lt;/code&gt;'s December 2019 PR introduced the &lt;code&gt;mutations_sync&lt;/code&gt; setting. At 0 (the default), mutations are fully async; at 1, the client waits until the mutation completes on the local replica; at 2, it waits until all replicas have completed. Every later wait-correctness fix in the replication wave (#22669, #28889, #24809, #10588) is a repair of this contract.&lt;/p&gt;
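&lt;p&gt;For example, to make an UPDATE block until every replica has applied it (table name illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SET mutations_sync = 2;

-- Returns only after the mutation has completed on all replicas
ALTER TABLE events UPDATE status = 'closed' WHERE id = 7;
&lt;/code&gt;&lt;/pre&gt;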

&lt;h3&gt;
  
  
  &lt;strong&gt;&lt;code&gt;IN PARTITION&lt;/code&gt; Scoping (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/13403" rel="noopener noreferrer"&gt;PR #13403&lt;/a&gt;, 2020)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Vladimir Chebotarev's PR added &lt;code&gt;ALTER UPDATE/DELETE … IN PARTITION&lt;/code&gt;. This was the first SQL semantics extension since the original landing, enabling partition pruning. If you only need to update last week's data, you say so explicitly and the mutation skips every other partition.&lt;/p&gt;
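&lt;p&gt;A sketch of partition-scoped mutation (table name and partition id are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Only parts in partition '202007' are considered; every other partition is skipped
ALTER TABLE events
    UPDATE status = 'expired' IN PARTITION ID '202007'
    WHERE status = 'active';
&lt;/code&gt;&lt;/pre&gt;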

&lt;h3&gt;
  
  
  &lt;strong&gt;The &lt;code&gt;MergeTask&lt;/code&gt; / &lt;code&gt;MutateTask&lt;/code&gt; Refactor (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/25165" rel="noopener noreferrer"&gt;PR #25165&lt;/a&gt;, 2021)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;nikitamikhaylov&lt;/code&gt;'s September 2021 PR is the quietly load-bearing change of the entire UPDATE history. It split the monolithic merge/mutate logic into stage-based, suspendable &lt;code&gt;MergeTask&lt;/code&gt; and &lt;code&gt;MutateTask&lt;/code&gt; objects. The PR description: &lt;em&gt;"Added an ability to suspend and resume a process of a merge."&lt;/em&gt; Every later mutation improvement, from compact-part stage collapse to patch parts to vertical-merge correctness, builds on this refactor.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Replication Correctness Wave (2020 to 2022)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In parallel, a long series of PRs from &lt;code&gt;alesapin&lt;/code&gt;, &lt;code&gt;azat&lt;/code&gt;, and &lt;code&gt;tavplubix&lt;/code&gt; turned replicated UPDATE from "best-effort" into "predictable but slow." Notable fixes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/9022" rel="noopener noreferrer"&gt;#9022&lt;/a&gt; fixes the &lt;code&gt;parts_to_do=0 ∧ is_done=0&lt;/code&gt; hang where a mutation appeared "almost done" forever.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/11681" rel="noopener noreferrer"&gt;#11681&lt;/a&gt; fixes the inconsistency between &lt;code&gt;system.mutations.is_done=1&lt;/code&gt; and a &lt;code&gt;MUTATE_PART&lt;/code&gt; entry still sitting in the replication queue.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/17499" rel="noopener noreferrer"&gt;#17499&lt;/a&gt; fixes ALTER hang when the corresponding mutation is killed on a different replica.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/19702" rel="noopener noreferrer"&gt;#19702&lt;/a&gt; fixes &lt;code&gt;virtual_parts&lt;/code&gt; after part corruption so replicated mutations can recover.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/22669" rel="noopener noreferrer"&gt;#22669&lt;/a&gt; fixes wait-on-multiple-replicas semantics for &lt;code&gt;mutations_sync=2&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/28889" rel="noopener noreferrer"&gt;#28889&lt;/a&gt; fixes a &lt;code&gt;rbegin&lt;/code&gt; vs &lt;code&gt;begin&lt;/code&gt; typo in the cross-replica wait logic. Tiny diff, large blast radius.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/34096" rel="noopener noreferrer"&gt;#34096&lt;/a&gt; fixes the race between &lt;code&gt;mergeSelectingTask&lt;/code&gt; and queue reinit after ZooKeeper reconnect.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Distributed UPDATE is uniquely hard. ZooKeeper coordination, virtual_parts after part corruption, queue reinit races, finalization ambiguity: every distributed system that supports UPDATE eventually relives this set of problems. ClickHouse's 2020 to 2022 commit history is what working through them looks like.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 3 (2022): Why Is ClickHouse's Lightweight DELETE So Much Faster?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"Every delete in ClickHouse rewrites entire parts."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/37893" rel="noopener noreferrer"&gt;PR #37893&lt;/a&gt; by &lt;code&gt;zhangjmruc&lt;/code&gt; in 2022 was the architectural wedge that made everything later possible. It implemented &lt;code&gt;DELETE FROM … WHERE&lt;/code&gt; as &lt;code&gt;ALTER UPDATE _row_exists = 0 WHERE …&lt;/code&gt; against a new virtual mask column.&lt;/p&gt;

&lt;p&gt;Before this PR, deleting matching rows meant rewriting every part that contained any of them. After it, deletion is a single-column UPDATE of the virtual &lt;code&gt;_row_exists&lt;/code&gt; mask, with the actual row filtering happening at SELECT time.&lt;/p&gt;
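&lt;p&gt;From the user's side it is plain SQL (table and column names illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Internally rewritten to:
--   ALTER TABLE events UPDATE _row_exists = 0 WHERE user_id = 42
DELETE FROM events WHERE user_id = 42;

-- Subsequent SELECTs filter on the _row_exists mask; no part is rewritten
&lt;/code&gt;&lt;/pre&gt;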

&lt;p&gt;The PR body cites a benchmark: &lt;strong&gt;200 ms vs 8 seconds&lt;/strong&gt; on the same workload. Forty-fold improvement, with no part rewrite required.&lt;/p&gt;

&lt;p&gt;This was not just a performance win. It was a proof of concept for a new pattern: instead of physically modifying data, write a small "diff" alongside it and reconcile at read time. That pattern would later become the foundation of patch parts.&lt;/p&gt;

&lt;p&gt;The competitor FUD point about &lt;code&gt;allow_experimental_lightweight_delete&lt;/code&gt; refers to the early enablement flag for this feature. The flag is no longer needed; lightweight DELETE has been the default &lt;code&gt;DELETE&lt;/code&gt; implementation for years.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 4 (Early 2025): How Do On-the-Fly Mutations in ClickHouse Work?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"You always have to wait for the next merge to see your update."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The latency problem with classical mutations is structural: the UPDATE is logged immediately, but the data is not physically modified until a background merge gets around to it. In a busy cluster, that can mean seconds or minutes between issuing an UPDATE and being able to read its effect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74877" rel="noopener noreferrer"&gt;PR #74877&lt;/a&gt; by &lt;code&gt;CurtizJ&lt;/code&gt; (early 2025) introduced &lt;strong&gt;on-the-fly mutations&lt;/strong&gt; via the &lt;code&gt;apply_mutations_on_fly&lt;/code&gt; setting. With this enabled, SELECTs apply non-finished UPDATE/DELETE mutations immediately, before background materialization. The latency between "I issued an UPDATE" and "I can read the new value" goes from "depends on the merge schedule" to "insert-like."&lt;/p&gt;

&lt;p&gt;Three companion settings landed alongside it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;mutations_max_literal_size_to_replace&lt;/code&gt;: caps how large a literal can be while still being inlined into the on-the-fly application path.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mutations_execute_nondeterministic_on_initiator&lt;/code&gt;: controls where non-deterministic mutation expressions execute, to keep results consistent across replicas.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mutations_execute_subqueries_on_initiator&lt;/code&gt;: same idea for subqueries inside mutation predicates.&lt;/li&gt;
&lt;/ul&gt;
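&lt;p&gt;A sketch of the opt-in (table and column names illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SET apply_mutations_on_fly = 1;

ALTER TABLE events UPDATE status = 'closed' WHERE id = 7;

-- Sees the new value immediately, even though the background
-- materialization of the mutation has not run yet
SELECT status FROM events WHERE id = 7;
&lt;/code&gt;&lt;/pre&gt;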

&lt;p&gt;On-the-fly mutations made it explicit: read-after-write consistency is something users can opt into when they need it, without giving up the asynchronous bulk-rewrite model when they do not.&lt;/p&gt;

&lt;p&gt;This was the latency wedge. The next PR was the syntax wedge.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 5 (Mid-2025): How Does ClickHouse's Lightweight UPDATE and Patch-Part Architecture Work?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"There is no standard SQL &lt;code&gt;UPDATE&lt;/code&gt; syntax in ClickHouse. Every UPDATE rewrites parts."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;PR #82004&lt;/a&gt; is the landmark commit of this entire eight-year history. Authored by Anton Popov (&lt;code&gt;CurtizJ&lt;/code&gt;), the initial commit (&lt;code&gt;a5327c6&lt;/code&gt;) landed June 16, 2025, and the PR merged to master around July 6, 2025. It shipped in ClickHouse 25.7.&lt;/p&gt;

&lt;p&gt;What it does: introduces standard SQL &lt;code&gt;UPDATE table SET col = expr WHERE …&lt;/code&gt; for MergeTree-family tables, backed by a new artifact called a &lt;strong&gt;patch part&lt;/strong&gt;.&lt;/p&gt;
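&lt;p&gt;The syntax is exactly what SQL users expect (table and column names illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Writes a small patch part; no base part is rewritten
UPDATE events SET status = 'closed', attempts = attempts + 1 WHERE id = 7;
&lt;/code&gt;&lt;/pre&gt;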

&lt;h3&gt;
  
  
  &lt;strong&gt;What Does a ClickHouse Patch Part Contain?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A patch part is a small, separate part on disk that stores only what changed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The columns that were updated (with their new values).
&lt;/li&gt;
&lt;li&gt;Five system columns: &lt;code&gt;_part&lt;/code&gt;, &lt;code&gt;_part_offset&lt;/code&gt;, &lt;code&gt;_block_number&lt;/code&gt;, &lt;code&gt;_block_offset&lt;/code&gt;, &lt;code&gt;_data_version&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is it. No copy of unchanged columns. No index rebuild. No part rewrite. The size overhead is approximately &lt;strong&gt;40 bytes per row&lt;/strong&gt; plus the actual changed cell values.&lt;/p&gt;

&lt;p&gt;The implementation lives in a new directory, &lt;code&gt;src/Storages/MergeTree/PatchParts/&lt;/code&gt;, with new types like &lt;code&gt;PatchPartInfo&lt;/code&gt;, &lt;code&gt;PatchMode&lt;/code&gt;, &lt;code&gt;MergeTreeSinkPatch&lt;/code&gt;, &lt;code&gt;MergeTreePatchReader&lt;/code&gt;, and a brand-new &lt;code&gt;InterpreterUpdateQuery&lt;/code&gt;. The landing also touched &lt;code&gt;MutationCommands&lt;/code&gt;, &lt;code&gt;MutateTask&lt;/code&gt;, &lt;code&gt;MergeTreeData&lt;/code&gt;, &lt;code&gt;MergeTreeDataMergerMutator&lt;/code&gt;, &lt;code&gt;MergeTreeSink&lt;/code&gt;, &lt;code&gt;ReplicatedMergeTreeQueue&lt;/code&gt;, &lt;code&gt;ReplicatedMergeTreeLogEntry&lt;/code&gt;, and the reader-chain files. By any reasonable measure, this was a multi-subsystem rewrite, not a feature add.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How Does ClickHouse Apply Patch Parts at Query Time?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;ClickHouse reconciles a patch part with a base part at SELECT time. There are two strategies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;PatchMode::Merge&lt;/code&gt;&lt;/strong&gt;: sorted on &lt;code&gt;(_part, _part_offset)&lt;/code&gt;. Used when patches and base parts share row offsets directly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;PatchMode::Join&lt;/code&gt;&lt;/strong&gt;: joined on &lt;code&gt;(_block_number, _block_offset)&lt;/code&gt;. Used when offsets do not line up directly and a logical join is needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The choice is automatic. Implicit minmax indexes on &lt;code&gt;_block_number&lt;/code&gt; and &lt;code&gt;_block_offset&lt;/code&gt; inside patch parts (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85040" rel="noopener noreferrer"&gt;PR #85040&lt;/a&gt;) make the join-mode path much faster by pruning patches that do not touch the rows being read.&lt;/p&gt;

&lt;p&gt;Patches themselves get merged together in the background (a "replacing-merge by &lt;code&gt;_data_version&lt;/code&gt;"), so the read-time overhead does not accumulate forever. Eventually, patches fold into base parts during normal merges, and the system returns to baseline read performance.&lt;/p&gt;
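&lt;p&gt;Accumulated patch material is visible directly in &lt;code&gt;system.parts&lt;/code&gt; via the &lt;code&gt;is_patch&lt;/code&gt; flag (table name illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- How much un-folded patch data is currently overlaid on the table?
SELECT
    count() AS patch_parts,
    formatReadableSize(sum(bytes_on_disk)) AS patch_bytes
FROM system.parts
WHERE table = 'events' AND active AND is_patch;
&lt;/code&gt;&lt;/pre&gt;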

&lt;h3&gt;
  
  
  &lt;strong&gt;How Fast Is ClickHouse's Lightweight UPDATE?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;ClickHouse's own benchmarks, published in the &lt;a href="https://clickhouse.com/blog/updates-in-clickhouse-3-benchmarks" rel="noopener noreferrer"&gt;Updates in ClickHouse, Part 3&lt;/a&gt; blog post, report single-row updates running &lt;strong&gt;1,000× to 2,400× faster&lt;/strong&gt; than classical &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; mutations. The exact multiplier depends on the workload shape; the headline is that what used to be a heavyweight asynchronous operation now has insert-like latency.&lt;/p&gt;

&lt;p&gt;The cost is read-time overhead. The umbrella issue &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/82033" rel="noopener noreferrer"&gt;#82033&lt;/a&gt; cites approximately &lt;strong&gt;7% to 18% on average&lt;/strong&gt; for SELECTs that have to apply patches. That is the trade-off: patches are cheap to write and bounded in size, but they do add a small reconciliation cost at read time. When patches fold into base parts during background merges, the overhead disappears.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What Settings Control Lightweight UPDATE in ClickHouse?&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;allow_experimental_lightweight_update&lt;/code&gt;: gate during the experimental period.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;apply_patches_to_read&lt;/code&gt;: read-side toggle.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;update_parallel_mode&lt;/code&gt;: controls write-side parallelism for patch creation.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;update_sequential_consistency&lt;/code&gt;: visibility model.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;enable_block_number_column = 1&lt;/code&gt; and &lt;code&gt;enable_block_offset_column = 1&lt;/code&gt;: prerequisites; patch parts depend on the per-row block-number/offset columns introduced for this purpose.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lightweight_delete_mode = 'lightweight_update'&lt;/code&gt;: opt-in path for routing DELETEs through patch parts as well.&lt;/li&gt;
&lt;/ul&gt;
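&lt;p&gt;A sketch of wiring these together during the experimental period (table definition illustrative; on 25.8 and later the gate flag is unnecessary):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SET allow_experimental_lightweight_update = 1;

CREATE TABLE events
(
    id     UInt64,
    status String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS enable_block_number_column = 1,
         enable_block_offset_column = 1;

UPDATE events SET status = 'closed' WHERE id = 7;
&lt;/code&gt;&lt;/pre&gt;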

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85952" rel="noopener noreferrer"&gt;PR #85952&lt;/a&gt; (August 24, 2025) promoted lightweight UPDATE to &lt;strong&gt;Beta&lt;/strong&gt; with default enablement, shipping in ClickHouse 25.8.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 6 (Mid-2025 to 2026): How Was ClickHouse's Patch-Part Architecture Stabilized?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"The new UPDATE features are experimental and unsafe in production."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A feature this cross-cutting needed weeks of immediate stabilization. Inside PR #82004 itself, 35 commits landed between the initial June 16, 2025 commit and the final merge. Ten of those follow-up commits are worth naming:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;SHA&lt;/th&gt;
&lt;th&gt;What it fixed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2025-06-17&lt;/td&gt;
&lt;td&gt;&lt;code&gt;7f5a42a&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lightweight updates on &lt;code&gt;ReplicatedMergeTree&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-06-19&lt;/td&gt;
&lt;td&gt;&lt;code&gt;284c239&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Better consistency for lightweight updates in RMT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-06-19&lt;/td&gt;
&lt;td&gt;&lt;code&gt;c7ec4db&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Merges of patch parts in RMT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-06-20&lt;/td&gt;
&lt;td&gt;&lt;code&gt;f18385c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Disable partition detach in RMT with patch parts (operational safety)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-06-20&lt;/td&gt;
&lt;td&gt;&lt;code&gt;9902d37&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Crash in prefetch of patch parts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-06-23&lt;/td&gt;
&lt;td&gt;&lt;code&gt;e7d8624&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filtering of &lt;code&gt;versions_block&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-06-25&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cc28005&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Better waiting for LWU before running classic mutation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-06-26&lt;/td&gt;
&lt;td&gt;&lt;code&gt;5af26c2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Better applying patches with PREWHERE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025-07-02&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;6409858&lt;/code&gt;/&lt;code&gt;b23a074&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Disable lazy columns with lightweight updates (correctness over a read-path optimization)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is just inside the original PR. Outside it, the post-landing stabilization involved another wave of fixes:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Read-Path and Query-Plan Correctness&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85040" rel="noopener noreferrer"&gt;PR #85040&lt;/a&gt;: implicit minmax indexes on &lt;code&gt;_block_number&lt;/code&gt;/&lt;code&gt;_block_offset&lt;/code&gt; inside patch parts; reworked &lt;code&gt;PatchJoinCache&lt;/code&gt;. Big SELECT-side win.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/92838" rel="noopener noreferrer"&gt;PR #92838&lt;/a&gt;: primary-index use for lightweight updates with &lt;code&gt;IN&lt;/code&gt;-subquery predicates.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/99023" rel="noopener noreferrer"&gt;PR #99023&lt;/a&gt;: patch parts without &lt;code&gt;_part_offset&lt;/code&gt; query-plan fix.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/99164" rel="noopener noreferrer"&gt;PR #99164&lt;/a&gt;: patch-parts column-order mismatch causing &lt;code&gt;LOGICAL_ERROR&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Memory and Resource Guardrails&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85641" rel="noopener noreferrer"&gt;PR #85641&lt;/a&gt;: &lt;code&gt;max_uncompressed_bytes_in_patches&lt;/code&gt; (default &lt;strong&gt;30 GiB&lt;/strong&gt;). New lightweight updates are rejected with &lt;code&gt;TOO_LARGE_LIGHTWEIGHT_UPDATES&lt;/code&gt; if patches accumulate beyond the threshold. This is the operational governor that prevents runaway patch growth from degrading reads forever.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/95231" rel="noopener noreferrer"&gt;PR #95231&lt;/a&gt;: fixes inaccurate memory accounting for large patch-part application that could trigger OOM-killer events.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77922" rel="noopener noreferrer"&gt;PR #77922&lt;/a&gt;: parallel column flushes during vertical merges.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Correctness and Crash Fixes (Late 2025 to April 2026)&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82945" rel="noopener noreferrer"&gt;PR #82945&lt;/a&gt;: mutations snapshot built from parts visible in the query; consistency for on-fly + patch parts vs running mutations.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/97162" rel="noopener noreferrer"&gt;PR #97162&lt;/a&gt; (alexey-milovidov, 2026-02-17): fixes phantom entries in mutations' &lt;code&gt;parts_to_do&lt;/code&gt; that caused stuck mutations. Race condition where &lt;code&gt;PartCheckThread&lt;/code&gt; re-enqueued already-mutated parts; the fix adjusts &lt;code&gt;ReplicatedMergeTreeQueue&lt;/code&gt; to immediately remove obsolete parts.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/97347" rel="noopener noreferrer"&gt;PR #97347&lt;/a&gt; (Kirill Kopnev, 2026-02-20): scalar subquery in &lt;code&gt;ALTER UPDATE/DELETE&lt;/code&gt; could corrupt the mutation command and even make the table unloadable on restart. &lt;strong&gt;High-severity.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/98044" rel="noopener noreferrer"&gt;PR #98044&lt;/a&gt; (Raul Marin / &lt;code&gt;Algunenano&lt;/code&gt;, 2026-02-26): fixes mutation after lightweight update on tables with secondary indices. The cleanest example of how the legacy mutation framework and the new lightweight-update system needed to learn to coexist.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/101403" rel="noopener noreferrer"&gt;PR #101403&lt;/a&gt; (2026-04-22): fixes &lt;code&gt;UPDATE SET DateTime&lt;/code&gt; literal not being rewritten with session timezone, which was a silent data-corruption hazard.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Replicated-Side Concurrency&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/95771" rel="noopener noreferrer"&gt;PR #95771&lt;/a&gt; (2026-04-09): optimizes &lt;code&gt;ReplicatedMergeTree&lt;/code&gt; queue locks; reduces lock contention for SELECTs on replicated tables with mutations.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/87265" rel="noopener noreferrer"&gt;PR #87265&lt;/a&gt;: fixes lightweight UPDATE with &lt;code&gt;WHERE col IN (SELECT …)&lt;/code&gt; in replicated tables with partitions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The volume of stabilization work tells you something honest: a feature that lets you write &lt;code&gt;UPDATE&lt;/code&gt; against a columnar OLAP store &lt;em&gt;and&lt;/em&gt; finishes in milliseconds &lt;em&gt;and&lt;/em&gt; replicates correctly &lt;em&gt;and&lt;/em&gt; coexists with the legacy mutation framework is genuinely hard. ClickHouse's engineering team did the work. Running ClickHouse 25.8 or later gets you a feature that has been hardened in the open, with every fix traceable to a public PR.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;What Operational Controls Does ClickHouse Provide for UPDATE Workloads?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Beyond the headline features, the eight-year history added a set of operational levers that make UPDATE workloads predictable in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exponential backoff for failed mutations&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/58036" rel="noopener noreferrer"&gt;PR #58036&lt;/a&gt;, 2024). Default retry interval of 5 minutes for mutations that keep failing (e.g., a bad CAST). Prevents CPU and log-file blowup from hot-looping on a permanent error.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workload classification&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/64061" rel="noopener noreferrer"&gt;PR #64061&lt;/a&gt;, 2024). The &lt;code&gt;mutation_workload&lt;/code&gt; and &lt;code&gt;merge_workload&lt;/code&gt; settings integrate with the workload scheduler so UPDATE mutations can be classed and throttled separately from merges and queries.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server-level bandwidth throttling&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/57877" rel="noopener noreferrer"&gt;PR #57877&lt;/a&gt;, 2024). The &lt;code&gt;max_mutations_bandwidth_for_server&lt;/code&gt; setting caps the I/O bandwidth mutations can consume cluster-wide.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-submit query validation&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71300" rel="noopener noreferrer"&gt;PR #71300&lt;/a&gt;, 2024). The full mutation query, including subqueries, is validated before being queued. Prevents queue-blocking dead mutations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throttling caps&lt;/strong&gt;. &lt;code&gt;number_of_mutations_to_delay&lt;/code&gt;, &lt;code&gt;number_of_mutations_to_throw&lt;/code&gt;, and &lt;code&gt;max_number_of_mutations_for_replica&lt;/code&gt; cap queued and concurrent mutation counts.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replication coalescing limit&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/48731" rel="noopener noreferrer"&gt;PR #48731&lt;/a&gt;, 2023). &lt;code&gt;replicated_max_mutations_in_one_entry&lt;/code&gt; (default 10000) bounds how many mutation commands are coalesced into one ZooKeeper entry, preventing OOM on startup.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight-DELETE-with-projections control&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/66169" rel="noopener noreferrer"&gt;PR #66169&lt;/a&gt;). &lt;code&gt;lightweight_mutation_projection_mode&lt;/code&gt; (&lt;code&gt;throw&lt;/code&gt; / &lt;code&gt;drop&lt;/code&gt; / &lt;code&gt;rebuild&lt;/code&gt;) gives operators explicit control over how lightweight DELETE interacts with materialized projections.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these individually makes a press release. Together, they are what "production-grade UPDATE support in a columnar database" actually requires.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;How Do You Monitor ClickHouse UPDATE Performance and Health?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you cannot see what your UPDATEs are doing, you cannot run them in production. The 2024 to 2026 observability additions are substantial:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;latest_fail_error_code_name&lt;/code&gt;&lt;/strong&gt; in &lt;code&gt;system.mutations&lt;/code&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72398" rel="noopener noreferrer"&gt;PR #72398&lt;/a&gt;). Enables automated alerting on specific failure classes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;parts_postpone_reasons&lt;/code&gt;&lt;/strong&gt; in &lt;code&gt;system.mutations&lt;/code&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/92206" rel="noopener noreferrer"&gt;PR #92206&lt;/a&gt;, 2025-12-16). Lets operators diagnose stalled mutations instantly. "Why is this mutation not progressing?" used to require log-grepping. Now it is a column.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;mutation_ids&lt;/code&gt;&lt;/strong&gt; in &lt;code&gt;system.part_log&lt;/code&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/93811" rel="noopener noreferrer"&gt;PR #93811&lt;/a&gt;). For &lt;code&gt;MUTATE_PART&lt;/code&gt; and &lt;code&gt;MUTATE_PART_START&lt;/code&gt; events. Materially improves traceability during incident investigations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;is_patch&lt;/code&gt;&lt;/strong&gt; in &lt;code&gt;system.parts&lt;/code&gt;. Distinguishes patch overlays from base parts, so operators can see directly how much patch material has accumulated.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-running mutation warnings&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/78658" rel="noopener noreferrer"&gt;PR #78658&lt;/a&gt;). Adds a dynamic &lt;code&gt;system.warnings&lt;/code&gt; entry when mutations exceed &lt;code&gt;max_pending_mutations_execution_time_to_warn&lt;/code&gt;. Surfaces silently stuck mutations without external monitoring.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-fly mutation metrics in &lt;code&gt;system.tables&lt;/code&gt;&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/75738" rel="noopener noreferrer"&gt;PR #75738&lt;/a&gt;). Per-table visibility into the on-the-fly mutation backlog.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Independent background settings for mutate vs. merge&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/93905" rel="noopener noreferrer"&gt;PR #93905&lt;/a&gt;). Previously the two shared the default profile, which made it impossible to isolate update resource usage from merges.&lt;/li&gt;
&lt;/ul&gt;
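
&lt;p&gt;As a sketch of what this observability surface enables (the column names come from the PRs above; verify availability on your release), one query can now answer "which mutations are failing or stalled, and why":&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Stalled or failing mutations, with the reason inline
SELECT
    database,
    table,
    mutation_id,
    latest_fail_reason,
    latest_fail_error_code_name,   -- added by PR #72398
    parts_postpone_reasons         -- added by PR #92206
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time ASC;
&lt;/code&gt;&lt;/pre&gt;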




&lt;h2&gt;
  
  
  &lt;strong&gt;What Are the Limitations of ClickHouse UPDATEs in 2026?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Fairness matters. A few things still require awareness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary-key and partition-key columns still cannot be updated.&lt;/strong&gt; This is a structural property of MergeTree, not a missing feature. Changing the sort order would require rebuilding the part's primary index; if you genuinely need to change a key column, the right pattern is to insert into a new table with the desired key and swap.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classical &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; mutations are still asynchronous by default.&lt;/strong&gt; They are the right tool for bulk backfills and schema-level corrections, but if you need read-after-write consistency, you need on-the-fly mutations, lightweight UPDATE, or &lt;code&gt;mutations_sync&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Patch parts have a read-time cost.&lt;/strong&gt; The umbrella issue cites approximately 7% to 18% read overhead while patches are unmerged. Background merges fold patches into base parts and the overhead disappears, but a workload that issues massive patch volume faster than merges can absorb will see sustained read regression.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;max_uncompressed_bytes_in_patches&lt;/code&gt; is a hard ceiling.&lt;/strong&gt; The 30 GiB default is a sensible starting point, but a workload generating patches faster than merges can consume them will eventually hit the cap and have new updates rejected with &lt;code&gt;TOO_LARGE_LIGHTWEIGHT_UPDATES&lt;/code&gt;. Tune it, monitor it.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The new analyzer is intentionally not used by &lt;code&gt;MutationsInterpreter&lt;/code&gt;.&lt;/strong&gt; &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/61528" rel="noopener noreferrer"&gt;PR #61528&lt;/a&gt; (2024) explicitly forces mutations to use the legacy analyzer, and &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/61563" rel="noopener noreferrer"&gt;issue #61563&lt;/a&gt; tracking the migration remains open in early 2026. This is the largest outstanding planner gap on the UPDATE side.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight UPDATE and classical mutations can interact.&lt;/strong&gt; Issues like &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/98898" rel="noopener noreferrer"&gt;#98898&lt;/a&gt; (&lt;code&gt;LOGICAL_ERROR: Found patch part intersects mutation&lt;/code&gt;) and &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/98044" rel="noopener noreferrer"&gt;PR #98044&lt;/a&gt; show that the two systems are still being taught to coexist cleanly. Run a recent stable release.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ReplacingMergeTree&lt;/code&gt; with &lt;code&gt;FINAL&lt;/code&gt; is still the right tool for very high-volume CDC and upsert workloads.&lt;/strong&gt; Lightweight UPDATE is fast for low-to-medium volume row-level changes; for streams of millions of upserts per second, the engine-level deduplication model continues to win.&lt;/li&gt;
&lt;/ul&gt;
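
&lt;p&gt;The patch-part ceiling, at least, is monitorable before it bites. A hedged sketch using the &lt;code&gt;is_patch&lt;/code&gt; column described earlier (confirm the exact column name on your release):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- How much unmerged patch material has accumulated per table?
SELECT
    database,
    table,
    sum(data_uncompressed_bytes) AS patch_bytes
FROM system.parts
WHERE active AND is_patch
GROUP BY database, table
ORDER BY patch_bytes DESC;
-- Alert well before patch_bytes approaches
-- max_uncompressed_bytes_in_patches (default 30 GiB).
&lt;/code&gt;&lt;/pre&gt;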

&lt;p&gt;These are real engineering trade-offs. Understanding them is part of making an informed decision.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse UPDATE Improvements Timeline (2018 to 2026)&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;What Changed&lt;/th&gt;
&lt;th&gt;Key PRs&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2018&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Original &lt;code&gt;ALTER TABLE … UPDATE/DELETE&lt;/code&gt; lands&lt;/td&gt;
&lt;td&gt;ztlpn 2018 series&lt;/td&gt;
&lt;td&gt;First UPDATE statement. Heavyweight, async, replicated via ZooKeeper.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2019&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;KILL MUTATION&lt;/code&gt;, &lt;code&gt;system.mutations&lt;/code&gt; failure columns, &lt;code&gt;mutations_sync&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/4287" rel="noopener noreferrer"&gt;#4287&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/8237" rel="noopener noreferrer"&gt;#8237&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;UPDATE becomes diagnosable, cancellable, and waitable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2020&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;IN PARTITION&lt;/code&gt; scoping, NULL semantics fix, &lt;code&gt;isAffectingAllColumns&lt;/code&gt; gate&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/13403" rel="noopener noreferrer"&gt;#13403&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/12153" rel="noopener noreferrer"&gt;#12153&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/12760" rel="noopener noreferrer"&gt;#12760&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;First SQL semantics extension. Partition pruning. Correct WHERE handling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2021&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;MergeTask&lt;/code&gt;/&lt;code&gt;MutateTask&lt;/code&gt; refactor; replicated correctness wave&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/25165" rel="noopener noreferrer"&gt;#25165&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/22669" rel="noopener noreferrer"&gt;#22669&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/28889" rel="noopener noreferrer"&gt;#28889&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Architectural foundation for everything later. Replicated UPDATE becomes predictable.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2022&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lightweight DELETE via &lt;code&gt;_row_exists&lt;/code&gt; mask&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/37893" rel="noopener noreferrer"&gt;#37893&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;200 ms vs 8 s. The wedge for the patch-part architecture.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2023&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Skip-index recalc, vertical Compact-to-Wide merges, &lt;code&gt;replicated_max_mutations_in_one_entry&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/55202" rel="noopener noreferrer"&gt;#55202&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/45681" rel="noopener noreferrer"&gt;#45681&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/48731" rel="noopener noreferrer"&gt;#48731&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Storage-feature integration. Mutation-storm safety.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2024&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Backoff, workload classification, bandwidth throttling, &lt;code&gt;latest_fail_error_code_name&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/58036" rel="noopener noreferrer"&gt;#58036&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/64061" rel="noopener noreferrer"&gt;#64061&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/57877" rel="noopener noreferrer"&gt;#57877&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72398" rel="noopener noreferrer"&gt;#72398&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Production-grade operational controls. UPDATE becomes a first-class workload class.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;On-the-fly mutations; lightweight UPDATE / patch parts&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74877" rel="noopener noreferrer"&gt;#74877&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;#82004&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85641" rel="noopener noreferrer"&gt;#85641&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85952" rel="noopener noreferrer"&gt;#85952&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Standard SQL &lt;code&gt;UPDATE&lt;/code&gt;. Insert-like latency. 1,000× to 2,400× faster for single-row updates. Promoted to Beta.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stabilization: phantom-queue fix, secondary-index reconciliation, queue-lock optimization, timezone correctness&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/97162" rel="noopener noreferrer"&gt;#97162&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/98044" rel="noopener noreferrer"&gt;#98044&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/95771" rel="noopener noreferrer"&gt;#95771&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/101403" rel="noopener noreferrer"&gt;#101403&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Production hardening. New observability columns. Replicated concurrency improvements.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;When Should You Use Each ClickHouse Update Mechanism?&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;th&gt;Reasoning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-row UPDATEs from an application&lt;/td&gt;
&lt;td&gt;✅ Lightweight UPDATE (&lt;code&gt;UPDATE ... SET ... WHERE&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Insert-like latency, standard SQL syntax, immediate read-after-write visibility (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;PR #82004&lt;/a&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scattered row-level updates from a service&lt;/td&gt;
&lt;td&gt;✅ Lightweight UPDATE&lt;/td&gt;
&lt;td&gt;Patch parts handle scattered writes far better than classical mutations.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bulk backfill of a column across millions of rows&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Classical mutation rewrites parts efficiently when the volume justifies the rewrite.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema-level correction (one-off fix for bad data)&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Async, runs in the background, no read-time overhead afterwards.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continuous high-volume CDC / upsert stream&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;ReplacingMergeTree&lt;/code&gt; + &lt;code&gt;FINAL&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Engine-level deduplication remains the most efficient path for millions of upserts per second.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Soft delete / mark-as-deleted&lt;/td&gt;
&lt;td&gt;✅ Lightweight DELETE&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;_row_exists&lt;/code&gt; mask is a single-column rewrite (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/37893" rel="noopener noreferrer"&gt;PR #37893&lt;/a&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard delete with disk reclamation&lt;/td&gt;
&lt;td&gt;🟡 &lt;code&gt;ALTER TABLE … DELETE&lt;/code&gt; or &lt;code&gt;APPLY DELETED MASK&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Lightweight DELETE leaves data on disk until merge; force physical removal when compliance or reclamation requires it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read-after-write consistency on a queued mutation&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;apply_mutations_on_fly&lt;/code&gt; or &lt;code&gt;mutations_sync&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;On-the-fly application makes pending mutations visible to SELECTs immediately (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74877" rel="noopener noreferrer"&gt;PR #74877&lt;/a&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Update to a primary-key or partition-key column&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;td&gt;Insert into a new table with the desired key and swap. This is structural, not a missing feature.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Updates with non-deterministic functions in replicated tables&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;rand()&lt;/code&gt; and &lt;code&gt;now()&lt;/code&gt; would diverge across replicas (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/7247" rel="noopener noreferrer"&gt;PR #7247&lt;/a&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
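
&lt;p&gt;Concretely, the first few rows of the table look like this in SQL (table and column names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Single-row or scattered updates: lightweight UPDATE (25.7+)
UPDATE events SET status = 'resolved' WHERE event_id = 42;

-- Bulk backfill: classical mutation, optionally scoped to one partition
ALTER TABLE events
UPDATE status = 'archived' IN PARTITION '2025-12'
WHERE status = '';

-- Soft delete: lightweight DELETE
DELETE FROM events WHERE event_id = 42;
&lt;/code&gt;&lt;/pre&gt;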




&lt;h2&gt;
  
  
  &lt;strong&gt;How to Respond to "ClickHouse Doesn't Support Updates"&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Run the numbers on your data.&lt;/p&gt;

&lt;p&gt;When someone tells you ClickHouse cannot handle updates in 2026, ask them which version they tested against. If they are benchmarking ClickHouse 22.x or earlier, they are testing a system that does not include lightweight DELETE (2022), on-the-fly mutations (early 2025), lightweight UPDATE (mid-2025), patch parts (mid-2025), or the entire 2025 to 2026 stabilization wave.&lt;/p&gt;

&lt;p&gt;If they cite "ClickHouse is append-only" without acknowledging that &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; shipped in v18.12, they are working from 2017 documentation.&lt;/p&gt;

&lt;p&gt;If they cite "no standard SQL UPDATE syntax," they have not read &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;PR #82004&lt;/a&gt; or the &lt;a href="https://clickhouse.com/blog/updates-in-clickhouse-2-sql-style-updates" rel="noopener noreferrer"&gt;Updates in ClickHouse blog series&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If they cite "all updates rewrite entire parts," they are describing one of three update mechanisms (the classical heavyweight one) and ignoring the other two (lightweight DELETE and lightweight UPDATE) plus the engine-level upsert pattern (&lt;code&gt;ReplacingMergeTree&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;If they cite "you need &lt;code&gt;allow_experimental_lightweight_delete&lt;/code&gt;," they have not run a stable ClickHouse release in years.&lt;/p&gt;

&lt;p&gt;The commit history does not lie. 100+ pull requests across eight years. Standard SQL &lt;code&gt;UPDATE&lt;/code&gt; syntax. Insert-like latency for single-row updates. Production-grade observability. Workload isolation. Bandwidth throttling. Patch-part guardrails. Phantom-queue race conditions fixed in February 2026 by Alexey Milovidov himself.&lt;/p&gt;

&lt;p&gt;ClickHouse's UPDATE subsystem in 2026 bears no resemblance to the one that earned the "append-only" label. The engineers built a real update story, and the evidence is in the PRs.&lt;/p&gt;

&lt;p&gt;Test it on your workload. That is the only benchmark that matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse UPDATE FAQ&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Does ClickHouse support standard SQL &lt;code&gt;UPDATE&lt;/code&gt;?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. ClickHouse 25.7 (July 2025) added standard SQL &lt;code&gt;UPDATE table SET col = expr WHERE …&lt;/code&gt; for MergeTree-family tables via &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;PR #82004&lt;/a&gt;. It uses a "patch part" architecture and was promoted to Beta with default enablement in version 25.8.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Is ClickHouse append-only?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;No. ClickHouse stopped being append-only in August 2018, when v18.12 added &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt;. Standard SQL &lt;code&gt;UPDATE&lt;/code&gt; arrived in v25.7 (July 2025). The "append-only" label is accurate only for the 2016 to 2017 era.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Do all ClickHouse UPDATEs rewrite entire parts?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;No. ClickHouse offers three update paths. Lightweight UPDATE (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82004" rel="noopener noreferrer"&gt;PR #82004&lt;/a&gt;) writes a small patch part containing only changed columns, with no part rewrite. Lightweight DELETE (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/37893" rel="noopener noreferrer"&gt;PR #37893&lt;/a&gt;) rewrites only the &lt;code&gt;_row_exists&lt;/code&gt; virtual column. Classical &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; rewrites affected parts and is the right mechanism for bulk backfills.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Are ClickHouse UPDATEs eventually consistent?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It depends on the mechanism. Classical &lt;code&gt;ALTER TABLE … UPDATE&lt;/code&gt; is asynchronous by default. Lightweight UPDATE and on-the-fly mutations (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74877" rel="noopener noreferrer"&gt;PR #74877&lt;/a&gt;) provide immediate read-after-write visibility. The &lt;code&gt;mutations_sync&lt;/code&gt; setting forces synchronous semantics on demand. You choose the consistency model per workload.&lt;/p&gt;
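
&lt;p&gt;A sketch of the two knobs mentioned above (illustrative table name):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Make pending classical mutations visible to SELECTs immediately
SET apply_mutations_on_fly = 1;

-- Or block the ALTER until the mutation completes on the local
-- replica (1) or on all replicas (2)
SET mutations_sync = 2;
ALTER TABLE events UPDATE status = 'done' WHERE status = 'pending';
&lt;/code&gt;&lt;/pre&gt;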

&lt;h3&gt;
  
  
  &lt;strong&gt;What is &lt;code&gt;ReplacingMergeTree&lt;/code&gt; and when should you use it?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ReplacingMergeTree&lt;/code&gt; is a ClickHouse engine that resolves duplicates on the sorting key during background merges. Use it for high-volume CDC and upsert workflows: updates are ingested as new rows, and deduplication runs asynchronously. Add the &lt;code&gt;FINAL&lt;/code&gt; keyword to SELECT queries for guaranteed deduplicated reads. &lt;code&gt;FINAL&lt;/code&gt; has been heavily optimized for production use.&lt;/p&gt;
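
&lt;p&gt;A minimal sketch of the pattern (illustrative schema):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE users
(
    user_id UInt64,
    email   String,
    version UInt64
)
ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;

-- An "update" is an insert of a newer version of the row
INSERT INTO users VALUES (42, 'new@example.com', 2);

-- Guaranteed-deduplicated read
SELECT * FROM users FINAL WHERE user_id = 42;
&lt;/code&gt;&lt;/pre&gt;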

&lt;h3&gt;
  
  
  &lt;strong&gt;What is the read-time overhead of ClickHouse patch parts?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Approximately 7% to 18% on average while patches are unmerged, per umbrella issue &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/82033" rel="noopener noreferrer"&gt;#82033&lt;/a&gt;. Background merges fold patches into base parts, after which the overhead disappears. The &lt;code&gt;max_uncompressed_bytes_in_patches&lt;/code&gt; setting (default 30 GiB, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85641" rel="noopener noreferrer"&gt;PR #85641&lt;/a&gt;) caps total patch accumulation.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Do you still need &lt;code&gt;allow_experimental_lightweight_delete&lt;/code&gt; in ClickHouse?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;No. Lightweight DELETE has been GA for years and is the default &lt;code&gt;DELETE&lt;/code&gt; implementation in modern ClickHouse releases. The experimental flag is no longer required.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Can you cancel a stuck UPDATE in ClickHouse?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. &lt;code&gt;KILL MUTATION&lt;/code&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/4287" rel="noopener noreferrer"&gt;PR #4287&lt;/a&gt;, 2019) works on both MergeTree and ReplicatedMergeTree. The &lt;code&gt;system.mutations&lt;/code&gt; table exposes &lt;code&gt;latest_fail_reason&lt;/code&gt;, &lt;code&gt;latest_failed_part&lt;/code&gt;, and &lt;code&gt;latest_fail_time&lt;/code&gt;. Since late 2025, &lt;code&gt;parts_postpone_reasons&lt;/code&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/92206" rel="noopener noreferrer"&gt;PR #92206&lt;/a&gt;) tells you exactly why a mutation is not progressing.&lt;/p&gt;
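
&lt;p&gt;In practice (a sketch; the diagnostic columns are the ones from the PRs cited above, and the mutation ID is a placeholder):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Find the offender
SELECT mutation_id, latest_fail_reason, parts_postpone_reasons
FROM system.mutations
WHERE table = 'events' AND is_done = 0;

-- Cancel it
KILL MUTATION
WHERE database = 'default' AND mutation_id = 'mutation_123.txt';
&lt;/code&gt;&lt;/pre&gt;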

&lt;h3&gt;
  
  
  &lt;strong&gt;Can you UPDATE primary-key or partition-key columns in ClickHouse?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;No. This is structural, not a missing feature. Changing a key column would require rebuilding the part's primary index. The recommended pattern is to insert into a new table with the desired key and swap.&lt;/p&gt;
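
&lt;p&gt;The swap pattern, sketched with illustrative names (&lt;code&gt;EXCHANGE TABLES&lt;/code&gt; makes the cutover atomic on the Atomic database engine):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Rebuild with the desired sorting key...
CREATE TABLE events_new AS events
ENGINE = MergeTree ORDER BY (tenant_id, event_time);

INSERT INTO events_new SELECT * FROM events;

-- ...then swap atomically
EXCHANGE TABLES events_new AND events;
&lt;/code&gt;&lt;/pre&gt;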

&lt;h3&gt;
  
  
  &lt;strong&gt;How fast is ClickHouse lightweight UPDATE compared to classical mutations?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Up to 1,000× to 2,400× faster for single-row updates, per ClickHouse's &lt;a href="https://clickhouse.com/blog/updates-in-clickhouse-3-benchmarks" rel="noopener noreferrer"&gt;Updates in ClickHouse, Part 3&lt;/a&gt; benchmark blog post. Classical mutation latency is bounded by the merge schedule; lightweight UPDATE has insert-like latency because it writes a small patch part instead of rewriting affected parts.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Does ClickHouse use the new query analyzer for mutations?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Not yet. &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/61528" rel="noopener noreferrer"&gt;PR #61528&lt;/a&gt; (2024) explicitly forces &lt;code&gt;MutationsInterpreter&lt;/code&gt; to use the legacy analyzer. The migration is tracked in &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/61563" rel="noopener noreferrer"&gt;issue #61563&lt;/a&gt;, still open in early 2026. This is the largest outstanding planner gap on the UPDATE side.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Analysis based on 100+ GitHub pull requests, official ClickHouse changelogs, and release blog posts covering the period 2018 to April 2026. Every claim maps to a specific merged PR, issue, or blog post. Verify the evidence yourself; the commit history is public.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Reference reading: &lt;a href="https://clickhouse.com/blog/updates-in-clickhouse-1-purpose-built-engines" rel="noopener noreferrer"&gt;Updates in ClickHouse, Part 1: Purpose-Built Engines&lt;/a&gt; · &lt;a href="https://clickhouse.com/blog/updates-in-clickhouse-2-sql-style-updates" rel="noopener noreferrer"&gt;Part 2: SQL-Style Updates&lt;/a&gt; · &lt;a href="https://clickhouse.com/blog/updates-in-clickhouse-3-benchmarks" rel="noopener noreferrer"&gt;Part 3: Benchmarks&lt;/a&gt; · &lt;a href="https://clickhouse.com/blog/handling-updates-and-deletes-in-clickhouse" rel="noopener noreferrer"&gt;Handling Updates and Deletes in ClickHouse&lt;/a&gt; · &lt;a href="https://clickhouse.com/docs/sql-reference/statements/update" rel="noopener noreferrer"&gt;SQL Reference: UPDATE&lt;/a&gt; · &lt;a href="https://clickhouse.com/docs/updating-data/overview" rel="noopener noreferrer"&gt;Updating Data Overview&lt;/a&gt; · &lt;a href="https://clickhouse.com/docs/guides/replacing-merge-tree" rel="noopener noreferrer"&gt;ReplacingMergeTree Guide&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>analytics</category>
      <category>dataengineering</category>
      <category>sql</category>
    </item>
    <item>
      <title>Claude Code for the Outer Loop: An AI SRE Playbook to Reduce On-Call Toil</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Wed, 22 Apr 2026 22:03:16 +0000</pubDate>
      <link>https://forem.com/arcade/claude-code-for-the-outer-loop-an-ai-sre-playbook-to-reduce-on-call-toil-1ghd</link>
      <guid>https://forem.com/arcade/claude-code-for-the-outer-loop-an-ai-sre-playbook-to-reduce-on-call-toil-1ghd</guid>
      <description>&lt;p&gt;It is 2:13am. PagerDuty fires for checkout-service, p95 past threshold for four minutes. You open Datadog, find the wrong dashboard, then the right one, then the CI tool for recent deploys, then Jira for open incidents, then #incidents in Slack to check whether a co-worker is already in the war room. Eight minutes in, you have a working hypothesis.&lt;/p&gt;

&lt;p&gt;That is not incident response. That is a context-loading tax the on-call pays before the work begins.&lt;/p&gt;

&lt;p&gt;Coding agents, such as Claude Code, are eating the inner loop. The outer loop is a different story. Operational work (incident response, runbook execution, SLO investigation, on-call handoffs) still looks almost identical to how it looked five years ago. The gap is not the model. It is the infrastructure to run agentic tools across a team, against production, with the auth, scope, and audit guarantees an SRE program needs.&lt;/p&gt;

&lt;p&gt;This article is about the execution layer. The data substrate underneath is the other half of the problem, and I've written about it on &lt;a href="https://clickhouse.com/blog/ai-sre-observability-architecture" rel="noopener noreferrer"&gt;the ClickHouse blog.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code already works in the outer loop.&lt;/strong&gt; The interface, the reasoning, the tool-call contract all transfer. What changes is the data sources.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Five workflows prove it.&lt;/strong&gt; Incident triage, runbook execution, postmortem drafting, SLO investigation, on-call handoffs. Every one of them is Claude-shaped.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The auth, scope, and audit gap is the bottleneck.&lt;/strong&gt; The MCP servers for most SaaS tools already exist. The problem is that when every engineer wires their own connection, you inherit inconsistent authorization, over-scoped credentials, and no audit trail. Useful to one person at best. A data exposure incident at worst.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The gap is an MCP runtime, not a model.&lt;/strong&gt; Managed auth, hosted compute, tool-level governance, persistent audit logs. Until something provides all four, outer-loop AI stays a party trick.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An MCP runtime is more than an MCP gateway.&lt;/strong&gt; A gateway routes MCP tools under one URL. An MCP runtime adds the compute that runs them, the auth that scopes them, and the audit trail that makes them safe in production. Arcade.dev is an MCP runtime with a gateway inside it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Five AI SRE workflows and the MCP servers that power them
&lt;/h2&gt;

&lt;p&gt;If you only read one thing in this article, read this table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;th&gt;MCP servers&lt;/th&gt;
&lt;th&gt;What Claude Code does&lt;/th&gt;
&lt;th&gt;What on-call does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Incident triage&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/github" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Pulls the PagerDuty payload, correlates Datadog signals in the window, checks recent deploys, scans Jira and #incidents, drafts a war room post&lt;/td&gt;
&lt;td&gt;Decides the next move&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Runbook execution&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.arcade.dev/tools/confluence" rel="noopener noreferrer"&gt;Confluence&lt;/a&gt;, &lt;a href="https://github.com/containers/kubernetes-mcp-server" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/github" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Parses the Confluence doc into steps, lays out the diagnostic sequence with commands and expected output, proposes any write command&lt;/td&gt;
&lt;td&gt;Runs the steps, approves every write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Postmortem drafting&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.arcade.dev/tools/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/confluence" rel="noopener noreferrer"&gt;Confluence&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Reconstructs the timeline from Slack, PagerDuty, Datadog, and the deploy log, fills the team template with source-linked evidence&lt;/td&gt;
&lt;td&gt;Writes the root cause and action items&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;SLO investigation&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-agents-mcp" rel="noopener noreferrer"&gt;Snowflake&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/confluence" rel="noopener noreferrer"&gt;Confluence&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Finds the burn inflection, correlates deploys, config changes, traffic shifts, and upstream incidents, ranks hypotheses with linked evidence&lt;/td&gt;
&lt;td&gt;Evaluates hypotheses, decides action items&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;On-call handoff&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.arcade.dev/en/resources/integrations/development/pagerduty" rel="noopener noreferrer"&gt;PagerDuty&lt;/a&gt;, &lt;a href="https://docs.arcade.dev/en/resources/integrations/development/datadog-api" rel="noopener noreferrer"&gt;Datadog&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/slack" rel="noopener noreferrer"&gt;Slack&lt;/a&gt;, &lt;a href="https://www.arcade.dev/tools/zendesk" rel="noopener noreferrer"&gt;Zendesk&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Assembles the shift briefing from pages, active incidents, baking deploys, SLO burn, and open action items, delivers it as a Slack DM&lt;/td&gt;
&lt;td&gt;Reviews, adds color, signs off&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Workflow 1: Incident triage is mostly archaeology
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;The manual triage above is a parallelism problem, not a skill problem. One engineer, five workflows, sequential context loads. Every on-call engineer I know tells the same story: "I spent the first ten minutes figuring out what was happening."&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Hand the alert to Claude Code: "Triage this alert, correlating it with the Datadog metrics, service logs, and the deployment history. Scan Slack history for related failures."&lt;/p&gt;

&lt;p&gt;Claude Code returns the alert context in two sentences, the top three correlated signals with direct Datadog links, and the deploys most likely to matter by service-graph proximity with commit SHAs and authors. Two to three minutes end to end, running while you are opening the laptop. Grafana's team &lt;a href="https://grafana.com/blog/a-tale-of-two-incident-responses-how-our-ai-assist-helped-us-find-the-cause-3-5x-faster/" rel="noopener noreferrer"&gt;reported a 3.5x reduction&lt;/a&gt; in time to root cause using a similar pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;By the time the on-call moves from the alert on their phone to opening their laptop, Claude Code's initial analysis is waiting. They read the summary, validate it against the dashboards, cross-reference the ranked deploys against what they know shipped recently, and decide the next move. They also catch the failure modes: the correlation that is spurious, the deploy the service graph does not know about, the #incidents thread that was noise. Claude Code compresses the archaeology. The on-call judges it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;PagerDuty, Datadog, Slack, Jira, and GitHub all ship MCP servers. The problem is running them across a team, not building them.&lt;/p&gt;

&lt;p&gt;If the setup is not configured consistently for every engineer on the rotation, the workflow breaks on the shift that needs it most. Misconfigured permissions lead to inconclusive analysis, and inconclusive analysis at 3am is worse than no analysis at all. Engineers who wire up their own connections often grant themselves broader scopes than the workflow needs, and the next access review turns into cleanup nobody planned for. The failure mode that matters most: if tool access is not scoped properly, a diagnostic step can inadvertently trigger a write action, mutate state in production, and turn the triage itself into the incident. Consistent setup, scoped credentials, and read-only enforcement are properties of the MCP runtime, not the individual engineer's configuration.&lt;/p&gt;
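&lt;p&gt;What runtime-layer read-only enforcement looks like can be sketched in a few lines. Everything here is illustrative: the tool names and the policy shape are invented, not any real MCP runtime's API. The point is only that the allow/deny decision happens where the call executes, not in the prompt.&lt;/p&gt;

```python
# Minimal sketch of read-only enforcement during triage. Tool names and
# the policy shape are invented for illustration, not a real API.

READ_ONLY_TRIAGE_TOOLS = {
    "datadog.query_metrics",
    "pagerduty.get_incident",
    "slack.search_messages",
    "github.list_deployments",
}

def gate_tool_call(tool_name, policy=READ_ONLY_TRIAGE_TOOLS):
    """Allow a proposed tool call only if it is on the read-only allowlist.

    Enforcement happens here, where the call executes, so a model that
    proposes a write (say, a hypothetical "kubernetes.delete_pod") is
    refused regardless of how the prompt was phrased.
    """
    if tool_name in policy:
        return {"allowed": True, "tool": tool_name}
    return {"allowed": False, "tool": tool_name,
            "reason": "write or unscoped tool blocked by triage policy"}
```

&lt;p&gt;The allowlist lives in the runtime, so every engineer on the rotation gets the same boundary without configuring anything themselves.&lt;/p&gt;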

&lt;h2&gt;
  
  
  Workflow 2: Runbook execution at 3am
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;Mature teams maintain their runbooks. The ones in constant use stay fresh because people fix them after every incident. The rot lives in two quieter places. Runbooks that fire once a quarter drift between uses, and nobody notices until the next 3am page reveals that half the commands point at deprecated tools and renamed clusters. And new engineers on the rotation often do not know which runbook applies to the alert in front of them. Finding the right doc at 3am is its own skill, and it takes months on the rotation to build.&lt;/p&gt;

&lt;p&gt;"Runbooks are a lie we tell ourselves."&lt;/p&gt;

&lt;p&gt;During my time leading &lt;a href="https://www.confluent.io/blog/making-apache-kafka-10x-more-reliable/" rel="noopener noreferrer"&gt;reliability at Confluent&lt;/a&gt; and Dropbox, I saw this pattern play out across very different stacks. It is not an organization-specific problem. It is the law of prioritization playing out: the runbooks that fire often get the attention, and the ones that fire rarely do not.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Finding the right runbook.&lt;/strong&gt; Once triage narrows the problem, the on-call needs to know which runbook applies and what to run. Point Claude Code at the alert. It matches the metadata (service, symptom, tag) against the runbook index, surfaces the top candidate, and lays out the diagnostic sequence with exact commands, the systems they target, and expected output for each step.&lt;/p&gt;
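&lt;p&gt;The matching step can be sketched as a simple metadata scorer. The index entries, field names, and scoring weights below are assumptions for illustration; a real index would live alongside the runbooks themselves.&lt;/p&gt;

```python
# Illustrative sketch of matching an alert to a runbook index by metadata
# overlap. Entries, field names, and weights are assumptions.

RUNBOOK_INDEX = [
    {"doc": "kafka-under-replicated-partitions",
     "service": "kafka", "tags": {"replication", "isr", "broker"}},
    {"doc": "api-gateway-5xx-spike",
     "service": "api-gateway", "tags": {"5xx", "latency", "upstream"}},
]

def match_runbook(alert, index=RUNBOOK_INDEX):
    """Rank runbooks by service match plus tag overlap; return the top one."""
    def score(entry):
        # Exact service match outweighs any single shared tag.
        service_hit = 2 if entry["service"] == alert["service"] else 0
        tag_hits = len(entry["tags"].intersection(alert["tags"]))
        return service_hit + tag_hits
    return max(index, key=score)
```

&lt;p&gt;When the on-call corrects a wrong pick, the correction is a one-line edit to the index entry, which is how the dispatcher gets sharper over time.&lt;/p&gt;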

&lt;p&gt;&lt;strong&gt;Keeping runbooks fresh.&lt;/strong&gt; Most mature teams run quality weeks or reliability sprints to refresh runbooks. At Confluent, we did this quarterly. Claude Code makes the sprint cheaper because staging is a safe environment: replay every runbook against it in a batch, flag the commands pointing at deprecated tools and renamed clusters, regenerate steps against current infra. The rot that accumulated since the last review gets caught in hours instead of weeks.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The on-call runs the steps. Claude Code lays out the plan, the engineer executes it. Opening unbounded production access to a coding agent does not pass the sniff test for any reliability org I have worked with, and should not. The engineer confirms Claude Code picked the right runbook, runs each diagnostic in their own terminal with their own scoped credentials, and tracks pass/fail as they go. When Claude Code picks the wrong runbook, the on-call re-points it, and that correction feeds the index for the next page.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;If Claude Code does not execute against production directly, enforcement becomes the whole game. The runbook has to be scoped to the user running it, the environment it targets, and the actions the current step actually needs. A step that is safe in staging is dangerous in prod. A step that is safe for a senior SRE is catastrophic for a new joiner still learning the cluster. Without tool-level governance that understands user, environment, and action together, you are back to trusting every engineer to read carefully at 3am, which is exactly the failure mode the runbook was supposed to prevent. Finding the right runbook and enforcing the right scopes are two different problems. Claude Code solves the first. The MCP runtime solves the second, with governance scoped per user, per environment, and per action. Both have to work, and neither replaces the other.&lt;/p&gt;
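&lt;p&gt;The user-environment-action scoping can be made concrete with a small policy table. The roles, environments, and action classes below are invented for illustration; the shape of the check, keyed on all three dimensions at once, is the point.&lt;/p&gt;

```python
# Sketch of a three-way scope check: a runbook step is allowed only if
# policy permits this user, in this environment, to take this action.
# Roles, environments, and action classes are invented for illustration.

POLICY = {
    # (role, environment) mapped to the set of permitted action classes
    ("senior_sre", "staging"): {"read", "restart", "scale"},
    ("senior_sre", "prod"): {"read", "restart"},
    ("new_joiner", "staging"): {"read", "restart"},
    ("new_joiner", "prod"): {"read"},
}

def step_allowed(role, environment, action, policy=POLICY):
    """A step safe in staging can still be refused in prod, and a step
    safe for a senior SRE can still be refused for a new joiner."""
    return action in policy.get((role, environment), set())
```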

&lt;h2&gt;
  
  
  Workflow 3: Postmortem drafting rots at the archaeology step
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;The incident resolved at 4pm. The retro is Thursday. Someone has to write the draft. The hard part is not the thinking. It is the archaeology: Slack scrollback, PagerDuty timeline, Datadog graphs, deploy history, team template. The &lt;a href="https://incident.io/blog/postmortem-software-roi-calculator" rel="noopener noreferrer"&gt;incident.io team puts manual reconstruction&lt;/a&gt; at 60 to 90 minutes per incident. That matches every team I have run.&lt;/p&gt;

&lt;p&gt;Most postmortems get drafted badly at the last minute. The retro starts from a weak foundation, and the same incident class comes back six months later.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Type into Claude Code: "Draft the postmortem for INC-4729 using the team template." Claude Code assembles the archaeology. It pulls the Slack transcript, the PagerDuty timeline, the Datadog panels from the incident dashboard, and the deploy log for every service touched. It drops each of those into the team template with source links, so every timeline entry traces back to the panel, commit, or message it came from.&lt;/p&gt;

&lt;p&gt;The draft stops at archaeology. Timeline, impact, affected services, evidence. The root cause, contributing factors, and action items fields are left structurally empty. Teams that let AI draft those turn every retro into a cleanup exercise. &lt;a href="https://engineering.zalando.com/posts/2025/09/dead-ends-or-data-goldmines-ai-powered-postmortem-analysis.html" rel="noopener noreferrer"&gt;Zalando's team reported hallucination rates as high as 40 percent&lt;/a&gt; in early AI-drafted postmortem analysis, and the lesson is not better prompting. It is to keep anything causal out of the draft.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The on-call and the retro group review the draft. They are not rewriting it. They correct timeline entries that are wrong, add the signal the archaeology missed (a customer report that came through email, a related incident three days earlier, the deploy two sprints ago that introduced the latent bug), and spend their time on the part that matters: running the 5 whys, pressure-testing the root cause, deciding action items.&lt;/p&gt;

&lt;p&gt;The leverage is strongest on the long tail. In my experience, eighty to ninety percent of incidents a mature team handles are high-volume, low-priority events where the archaeology is mechanical and the writeup feels mundane. That is where teams cut corners, and where repeat incidents quietly accumulate. Claude Code absorbs the mundane work so the high-judgment work gets attention on every incident, not just the big ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;The tools the draft pulls from carry the most sensitive data in the company. #incidents has customer PII and vendor secrets. The deploy log has commit messages that sometimes leak security context. Datadog dashboards expose traffic patterns across the fleet. The engineer who set up the Slack connector usually has broader workspace read than the postmortem role needs, and the draft ends up citing messages it had no business reading.&lt;/p&gt;

&lt;p&gt;Scoping has to happen at the tool layer, not the prompt layer. Which channels the draft can read, which dashboards it can fetch, which tables it can query, all bounded by policy and tied to the user triggering the workflow. Then a provenance trail in a persistent log, showing what the AI accessed, when, and under whose identity. That is the half compliance will ask about, and the half that decides whether the workflow survives its first security review.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow 4: SLO investigation and error budget reviews
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;At Confluent, my team reviewed our availability SLO every Monday. We pulled the week's incidents, measured their impact on the SLO and the customer SLA, and mapped the root causes from each postmortem back to services and themes. The goal was to see whether the week's error budget had been spent on one repeat problem or scattered across five unrelated ones.&lt;/p&gt;

&lt;p&gt;Most of the prep was manual correlation: error budget delta, matched to PagerDuty incident, matched to Datadog regression, matched to deploy history, matched to the postmortem, matched to the theme bucket. One SRE typically spent four to six hours on that pipeline before the meeting started. The thinking happened in the review. The prep was legwork.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Ask Claude Code to prep the Monday review. It pulls the SLO and SLA deltas, fetches every PagerDuty incident in the window, joins each to the Datadog regression that matches in time and service, pulls the postmortem from Confluence, and extracts the root cause section. It groups root causes into themes using the team's existing taxonomy and hands back a structured brief: error budget delta, the incidents that account for it, the themes, and the open questions the postmortems did not resolve.&lt;/p&gt;
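&lt;p&gt;The incident-to-regression join is the mechanical heart of that prep. A minimal sketch, assuming each record carries a service name and a start time (the field names are invented):&lt;/p&gt;

```python
import operator
from datetime import datetime, timedelta

# Hedged sketch of the join described above: pair each PagerDuty incident
# with a Datadog regression on the same service whose start time falls
# inside a tolerance window. Field names are assumptions.

def join_incidents_to_regressions(incidents, regressions, window_minutes=15):
    window = timedelta(minutes=window_minutes)
    joined = []
    for inc in incidents:
        candidates = [
            r for r in regressions
            if r["service"] == inc["service"]
            and operator.le(abs(r["start"] - inc["start"]), window)
        ]
        # Closest-in-time candidate wins; None if nothing matched, so the
        # reviewing SRE sees the gap instead of a forced match.
        best = min(candidates, key=lambda r: abs(r["start"] - inc["start"]),
                   default=None)
        joined.append({"incident": inc["id"],
                       "regression": best["id"] if best else None})
    return joined
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; instead of forcing a match matters: the unmatched incidents are exactly the ones the reviewing SRE should look at first.&lt;/p&gt;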

&lt;p&gt;What Claude Code does not do is quantify how much of the burn each incident "caused" in percentage terms. That is causal analysis current models do poorly, and a made-up percentage in a metrics review is worse than no number.&lt;/p&gt;

&lt;p&gt;The AI hunts. The human decides.&lt;/p&gt;

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The SRE running the review reads the brief, validates the incident-to-regression matches (Claude Code will get some wrong), writes the causal story the AI refused to guess at, decides which themes warrant action items, and raises the open questions in the meeting. Four hours of prep becomes thirty minutes of review and correction.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;Warehouse-backed workflows are the ones SRE teams have held off on the longest, and the reason is scope. You cannot hand Claude Code unrestricted warehouse access and hope prompt engineering keeps it away from PII. You cannot give it unbounded query budgets and wait to see a five-thousand-dollar scan on next month's bill. Scope enforcement at the MCP runtime layer is what changes the math: this task queries these tables and not others, costs less than fifty dollars, never touches prod write paths. Without that, the workflow stays a prototype and never makes the rotation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow 5: On-call handoffs lose the context nobody wrote down
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;Handoffs are the most undervalued ritual in SRE work because the incidents they prevent never get counted. Handoff quality tracks how tired the outgoing engineer is, which means handoffs are worst on the shifts that had the most incidents, which is when they matter most. The non-obvious cost: the morning incident where the new on-call did not know a deploy was still baking, and ends up paging the previous on-call at 8am to ask what happened overnight.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Claude Code does
&lt;/h3&gt;

&lt;p&gt;Claude Code generates the briefing at the rotation boundary, without anyone triggering it. It pulls the last 24 hours of pages with resolution notes, active incidents, baking deploys, SLOs that crossed a burn threshold, unresolved #incidents threads, Zendesk escalations, and customer reports that came in through the on-call email alias. It lists open action items assigned to the rotation. It delivers the briefing as a Slack DM with a copy in the team's handoff Confluence doc.&lt;/p&gt;
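&lt;p&gt;The assembly itself is simple once the per-source fetches exist. A hedged sketch, with invented function and section names:&lt;/p&gt;

```python
# Minimal sketch of the briefing assembly, assuming fetchers for each
# source already exist. Section names mirror the list above; the message
# shape and function name are illustrative.

def build_handoff_briefing(sources):
    """Assemble the shift briefing from per-source fetch results.

    `sources` maps a section title to a list of one-line items. Empty
    sections are kept and marked, so the incoming on-call can tell
    "nothing happened" apart from "the fetch silently failed upstream".
    """
    lines = ["*On-call handoff briefing*"]
    for title, items in sources.items():
        lines.append(f"*{title}*")
        if items:
            lines.extend(f"- {item}" for item in items)
        else:
            lines.append("- none in the last 24h")
    return "\n".join(lines)
```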

&lt;h3&gt;
  
  
  What on-call does
&lt;/h3&gt;

&lt;p&gt;The outgoing engineer adds the color only they can add: what they think is a false alarm, which customer report to watch, which deploy they are nervous about, which alert they silenced and why. That is the handoff knowledge that lives in the outgoing engineer's head and nowhere else. Claude Code assembles the facts. The on-call provides the judgment.&lt;/p&gt;

&lt;h3&gt;
  
  
  The auth, scope, and audit gap
&lt;/h3&gt;

&lt;p&gt;The briefing fires at 5pm whether anyone is logged in or not, which means it needs a credential that lives outside any single engineer's session. Dotfiles on a closed laptop do not qualify. A scheduled workflow without a persistent service identity is not a workflow. It is a cron job that silently stops running the next time someone rotates off the team. Persistent service identity is a property of the MCP runtime, not the engineer's laptop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code is a companion, not an autonomous AI SRE
&lt;/h2&gt;

&lt;p&gt;Five workflows, one pattern. Claude Code reads, correlates, drafts, and waits. The human decides.&lt;/p&gt;

&lt;p&gt;Most of the AI SRE market is betting the other way. &lt;a href="https://traversal.com/" rel="noopener noreferrer"&gt;Traversal&lt;/a&gt;, &lt;a href="https://resolve.ai/" rel="noopener noreferrer"&gt;Resolve&lt;/a&gt;, &lt;a href="https://www.anyshift.io/" rel="noopener noreferrer"&gt;Anyshift&lt;/a&gt;, and others are building toward autonomous agents that page, remediate, and close incidents on their own. I am skeptical. A model's output is a function of its capability and the context it is given. Current models can do the archaeology reliably. They cannot reliably be given enough scoped context and the right tools to remediate production unsupervised. That is a context and tooling gap, not a model gap, and I would rather ship the shape that already works.&lt;/p&gt;

&lt;p&gt;Claude Code runs when you ask. It stops when the next step needs judgment. It never pages, rolls back, or closes an incident on its own.&lt;/p&gt;

&lt;p&gt;A companion also dodges the procurement fight that stalls autonomous rollouts. You are not replacing a role or adding an on-call tier. You are pointing the tool your team already uses at data sources they already trust, with an MCP runtime that scopes what it can do. The security review goes from "new vendor, new risk" to "scoped tools inside an existing agent."&lt;/p&gt;

&lt;p&gt;Every workflow in this article starts as a prompt and grows into a skill. The triage prompt, the runbook dispatcher, the postmortem drafter, the SLO prep pipeline, the handoff briefing: each one begins as something one engineer types once, and becomes a packaged skill every engineer on the rotation invokes the same way. The skill keeps getting sharper because the team keeps editing it: a new data source here, a tighter prompt there, a correction after an incident surfaces a blind spot. One person's trick becomes team infrastructure, and the infrastructure compounds.&lt;/p&gt;

&lt;p&gt;Reliability comes from running a proper reliability program, and a proper program is mostly operational work around rituals: triage, runbooks, postmortems, SLO reviews, handoffs. Claude Code earns its keep by making the rituals cheap enough to happen on every shift, not just the ones where someone has the energy for them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an AI SRE needs from its MCP tool integration layer
&lt;/h2&gt;

&lt;p&gt;Every workflow above needs the same four things.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Managed authentication and authorization across tools.&lt;/strong&gt; OAuth flows for every connected tool, credentials refreshed automatically, scoped per user, reachable from any device including a phone at 3am.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed compute, always on, team-wide.&lt;/strong&gt; Tools run on shared infrastructure, cloud-hosted or on-prem, with the same behavior whether the trigger came from a laptop, a phone, a webhook, or a cron job.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool- and agent-level governance.&lt;/strong&gt; Per-tool permission policies, per-task cost budgets, and per-query data access limits enforced where the call happens, not where the model proposes it. This is the difference between a workflow security will approve and one they kill on sight.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent audit logs.&lt;/strong&gt; Every tool call logged with triggering user, arguments, response, and timestamp, in a log the agent cannot modify. Without this you cannot retro the AI, and you cannot trust it.&lt;/li&gt;
&lt;/ol&gt;
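&lt;p&gt;The fourth requirement, a log the agent cannot modify, can be approximated with hash chaining. This is one common construction, not any specific product's format; the record fields follow the list above.&lt;/p&gt;

```python
import hashlib
import json

# Sketch of an append-only audit log where each record hashes its
# predecessor, so after-the-fact edits are detectable. One common
# construction, not a specific product's format.

def append_audit_record(log, user, tool, args, response, timestamp):
    prev_hash = log[-1]["hash"] if log else "genesis"
    record = {"user": user, "tool": tool, "args": args,
              "response": response, "timestamp": timestamp,
              "prev_hash": prev_hash}
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(record)
    return record

def verify_chain(log):
    """Recompute every hash; any mutated record breaks verification."""
    prev = "genesis"
    for rec in log:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if body["prev_hash"] != prev:
            return False
        payload = json.dumps(body, sort_keys=True).encode()
        if rec["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        prev = rec["hash"]
    return True
```

&lt;p&gt;Any after-the-fact edit breaks verification, which is what lets a retro trust the trail.&lt;/p&gt;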

&lt;h2&gt;
  
  
  Arcade: an MCP runtime for AI SRE workflows
&lt;/h2&gt;

&lt;p&gt;Arcade is an &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;MCP runtime&lt;/a&gt; built to close exactly this gap. &lt;a href="https://www.arcade.dev/blog/sso-for-ai-agents-authentication-and-authorization-guide/" rel="noopener noreferrer"&gt;Managed OAuth&lt;/a&gt; handles every connected tool, with credentials that refresh automatically and never touch the language model. Every tool call runs &lt;a href="https://docs.arcade.dev/en/guides/create-tools/tool-basics/runtime-data-access" rel="noopener noreferrer"&gt;on behalf of the user&lt;/a&gt; who triggered it, so native permissions in PagerDuty, Datadog, and Snowflake apply exactly as they would outside the agent. You connect PagerDuty once, and every Claude Code session on your team picks it up at the right scope.&lt;/p&gt;

&lt;p&gt;The runtime runs tools on hosted workers, deployable in your cloud or on-prem, and enforces per-tool policies where the call happens, not where the model proposes it. The same workflow triggered from a phone, a laptop, or a cron job executes on shared infrastructure. Policies fire at the MCP runtime layer: "this workflow queries these Snowflake tables and not others," "this workflow can propose PagerDuty actions but cannot execute without approval," "this workflow has a $25 query budget."&lt;/p&gt;

&lt;p&gt;Every tool call lands in an OpenTelemetry-compatible run log with triggering user, arguments, response, and timestamp. It drops straight into the observability pipeline your platform team already runs. When your postmortem asks what Claude Code did during the incident, you have the answer. When compliance asks for every query this AI ran against the warehouse last quarter, you have the answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.arcade.dev/toolkits" rel="noopener noreferrer"&gt;Prebuilt tools&lt;/a&gt; ship for PagerDuty, Datadog, Slack, Jira, Confluence, GitHub, Snowflake, and more. You can also &lt;a href="https://docs.arcade.dev/en/home/custom-mcp-server-quickstart" rel="noopener noreferrer"&gt;bring your own MCP servers&lt;/a&gt; into the runtime: the PagerDuty, Datadog, Snowflake, and Kubernetes servers linked in the table above drop in as-is and inherit the same managed auth, policy enforcement, and audit logs as the prebuilt ones. You extend your existing MCP investment instead of replacing it.&lt;/p&gt;

&lt;p&gt;You can build this without Arcade, and the reason not to is the same reason you did not write your own CI system: the work is real, the edge cases are ugly, and it is not where your reliability differentiation lives. A mature team can hand-roll managed OAuth, stand up hosted workers, wire per-tool policy enforcement, and ship a tamper-evident audit log. A few platform teams I know started down that path and concluded it was too costly to own, or simply not where they wanted to spend their reliability budget.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reducing on-call toil is where SRE leverage lives
&lt;/h2&gt;

&lt;p&gt;The outer loop has not caught up to the inner loop because the infrastructure to run agentic tools safely against production systems has been missing. A coding assistant only needs your repo and your editor. An operational assistant needs managed identity, hosted compute, enforced governance, and an audit trail, because it reaches into systems where mistakes page the CTO.&lt;/p&gt;

&lt;p&gt;The SRE teams that figure this out over the next year will pull away from the ones that do not, the same way the teams that adopted Claude Code for inner-loop work in 2024 pulled away from the teams that waited. The inner loop is solved. The outer loop is where the leverage lives now, sitting on a &lt;a href="https://clickhouse.com/blog/ai-sre-observability-architecture" rel="noopener noreferrer"&gt;data substrate that is its own design problem&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Claude Code does not replace the on-call. It just lets them start on page 5 instead of page 1.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an AI SRE?
&lt;/h3&gt;

&lt;p&gt;An AI SRE is an AI assistant that helps site reliability engineers with operational work: incident triage, runbook execution, postmortem drafting, SLO investigation, and on-call handoffs. Most practical AI SRE deployments today run as companions that read, correlate, and draft while a human engineer decides the next move, rather than as autonomous agents that page, remediate, and close incidents on their own.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between an MCP gateway and an MCP runtime?
&lt;/h3&gt;

&lt;p&gt;An MCP gateway routes MCP tools under a single URL so any MCP client can call them. An MCP runtime goes further: it adds the compute that runs the tools, managed authentication, per-tool permission enforcement, and persistent audit logs. A gateway is routing infrastructure. A runtime is production infrastructure. Arcade is an MCP runtime with a gateway inside it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Claude Code replace an on-call engineer?
&lt;/h3&gt;

&lt;p&gt;No. Claude Code works best as a companion to the on-call engineer, not a replacement. It compresses the archaeology (pulling alerts, correlating signals, drafting summaries) so the engineer starts with context already loaded. Every decision that requires judgment (rolling back a deploy, paging a co-worker, closing an incident) stays with the human.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I use Claude Code for incident triage?
&lt;/h3&gt;

&lt;p&gt;Point Claude Code at the alert with a prompt like "Triage this alert, correlated with Datadog metrics, service logs, and deployment history. Scan Slack for correlated failures." With MCP servers for PagerDuty, Datadog, Slack, and GitHub wired into an MCP runtime, Claude Code returns a summary, the top correlated signals, candidate deploys, and a draft war room post in two to three minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is it safe to let Claude Code execute runbooks in production?
&lt;/h3&gt;

&lt;p&gt;Claude Code should not execute against production directly. The safer pattern is for Claude Code to parse the runbook, lay out the diagnostic sequence, and propose commands, while the on-call engineer runs each step in their own terminal with their own scoped credentials. Unbounded production access for any coding agent should not pass a reliability review.&lt;/p&gt;

&lt;h3&gt;
  
  
  What MCP servers do I need for AI SRE workflows?
&lt;/h3&gt;

&lt;p&gt;The core set covers the tools already in an SRE rotation: PagerDuty, Datadog, Slack, and GitHub for incident triage; Confluence and Kubernetes for runbook execution; Snowflake for SLO investigation; Zendesk for on-call handoffs. Each has a production-ready MCP server that can run inside an MCP runtime like Arcade, which handles managed auth, policies, and audit logs across all of them.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Arcade work with Claude Code?
&lt;/h3&gt;

&lt;p&gt;Arcade is an MCP runtime that manages OAuth, per-tool permission policies, and audit logs for every tool Claude Code calls. You connect PagerDuty, Datadog, or Snowflake once, and every Claude Code session on your team picks up the tools at the right scope. Arcade also runs bring-your-own MCP servers, so existing integrations work as-is.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between AI SRE tools like Traversal and using Claude Code with an MCP runtime?
&lt;/h3&gt;

&lt;p&gt;Traversal, Resolve, and Anyshift are building autonomous agents that page, remediate, and close incidents on their own. Claude Code with an MCP runtime takes the companion approach: read, correlate, draft, and wait for the engineer to decide. The companion pattern ships today. The autonomous bet does not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does the observability store underneath matter as much as the MCP runtime above?
&lt;/h3&gt;

&lt;p&gt;Yes. An AI agent runs 10 to 30 queries per investigation, and most observability stores were not built to serve that pattern at the retention and cardinality an SRE needs. The MCP runtime handles the execution layer; the observability store handles the cognitive substrate. Both matter. I have written about the substrate side &lt;a href="https://clickhouse.com/blog/ai-sre-observability-architecture" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>sre</category>
      <category>devops</category>
      <category>mcp</category>
    </item>
    <item>
      <title>ClickHouse Native JSON Support in 2026: A PR-by-PR Analysis</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Mon, 20 Apr 2026 20:18:38 +0000</pubDate>
      <link>https://forem.com/manveer_chawla_64a7283d5a/clickhouse-native-json-support-in-2026-a-pr-by-pr-analysis-1hdp</link>
      <guid>https://forem.com/manveer_chawla_64a7283d5a/clickhouse-native-json-support-in-2026-a-pr-by-pr-analysis-1hdp</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ClickHouse has full native JSON support, and has since v25.3. The JSON type stores each path as a separate columnar subcolumn with native type preservation, primary key indexing, and selective path reads. On the JSONBench 1-billion-document benchmark, ClickHouse is 2,500x faster than MongoDB for aggregations. The narrative that "ClickHouse can't do JSON" is outdated by two years and 80+ merged PRs.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We analyzed 80+ GitHub pull requests, official ClickHouse changelogs, release blogs, and third-party benchmarks to trace the full evolution of JSON support from string-based functions through the modern native JSON type.
&lt;/li&gt;
&lt;li&gt;In 2021, the criticism had some basis. JSON was stored as opaque String blobs, queried via &lt;code&gt;JSONExtract*&lt;/code&gt; functions that required full column scans on every query. The experimental &lt;code&gt;Object('json')&lt;/code&gt; type shipped in 2022 but suffered from eager type unification, unbounded column explosion, and race conditions.
&lt;/li&gt;
&lt;li&gt;By early 2026, ClickHouse ships a production-ready native JSON type built on three foundational types (Variant, Dynamic, JSON), with configurable path limits, type hints, primary key support for JSON subcolumns, three generations of storage serialization, and a query planner that reads only the specific JSON paths your query needs. None of this requires manual schema management.
&lt;/li&gt;
&lt;li&gt;The single highest-impact storage change is advanced shared data serialization (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83777" rel="noopener noreferrer"&gt;PR #83777&lt;/a&gt;), which delivered &lt;strong&gt;58x faster reads and 3,300x less memory&lt;/strong&gt; for selective path access by introducing per-granule metadata with path indexes.
&lt;/li&gt;
&lt;li&gt;The native JSON type stores each path as a separate Dynamic-typed subcolumn in columnar format. The result: &lt;strong&gt;2,500x faster than MongoDB&lt;/strong&gt; for aggregations, &lt;strong&gt;10x faster than Elasticsearch&lt;/strong&gt;, and &lt;strong&gt;9,000x faster than DuckDB/PostgreSQL&lt;/strong&gt; for analytics on the same dataset, according to the JSONBench benchmark on 1 billion Bluesky documents.
&lt;/li&gt;
&lt;li&gt;The JSON type reached GA in ClickHouse 25.3 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77785" rel="noopener noreferrer"&gt;PR #77785&lt;/a&gt;), with experimental flags removed and the type backported to the LTS release. The legacy &lt;code&gt;Object('json')&lt;/code&gt; type was fully removed in v25.11 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85718" rel="noopener noreferrer"&gt;PR #85718&lt;/a&gt;).
&lt;/li&gt;
&lt;li&gt;Verdict: the "ClickHouse doesn't do JSON" advice referenced a system that no longer exists. The current JSON type is a ground-up columnar implementation that preserves native types, supports primary key indexing, and reads only the paths you query. Repeating the old criticism in 2026 is misinformation.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why People Still Say "ClickHouse Has No Native JSON Support"&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you've evaluated ClickHouse for semi-structured data, you've heard the warnings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"ClickHouse doesn't support JSON natively"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Flatten JSON into columns manually"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Use JSONExtract functions on String columns"&lt;/em&gt; (as the primary approach)
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Use Object('JSON')"&lt;/em&gt; (deprecated type)
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"No native JSON support"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of these started as legitimate observations circa 2021-2022. ClickHouse did store JSON as Strings. The &lt;code&gt;JSONExtract*&lt;/code&gt; functions did scan the full column. The first attempt at a native type (&lt;code&gt;Object('json')&lt;/code&gt;) did have serious architectural flaws.&lt;/p&gt;

&lt;p&gt;Others were amplified by competitors who found a convenient story: ClickHouse is fast for scans, but it can't handle semi-structured data.&lt;/p&gt;

&lt;p&gt;Then ClickHouse's engineering team spent three years building one of the most sophisticated columnar JSON implementations in any database. More than 80 significant pull requests were merged. They built three new foundational types (Variant, Dynamic, JSON), three generations of storage serialization, a query planner that reads only needed subcolumns, primary key and skip index support for JSON paths, and clear migration paths from every legacy representation.&lt;/p&gt;

&lt;p&gt;This article traces that evolution with PR-level evidence. No marketing claims. No benchmarks on toy datasets. Just the commit history.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Methodology: How We Analyzed ClickHouse's JSON Type Commit History&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We went through ClickHouse's GitHub commit history, pull requests, changelogs, and release blogs from 2019 through early 2026. The scope covered every PR that touched JSON handling: type implementations, storage formats, function changes, planner optimizations, memory improvements, correctness fixes, and migration paths.&lt;/p&gt;

&lt;p&gt;Each PR was classified by category (type system, storage, functions, planner, correctness, migration), impact severity, and whether it changed default behavior. We cross-referenced PR descriptions against changelog entries and benchmark results to verify claimed improvements. Where multiple PRs addressed the same subsystem, we traced the dependency chain to understand how incremental changes compounded.&lt;/p&gt;

&lt;p&gt;The result is a ranked analysis of 80+ pull requests organized into six phases, with full provenance. Every claim in this article maps to a specific merged PR that you can verify yourself on GitHub.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JSON Capabilities in 2026: What Ships by Default&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The current state, as of early 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native JSON data type (GA since v25.3):&lt;/strong&gt; Each JSON path is stored as a separate Dynamic-typed subcolumn in columnar format. Full SQL query, filter, and aggregation support on JSON fields, including nested structures and arrays (&lt;code&gt;Array(JSON)&lt;/code&gt;). Configurable &lt;code&gt;max_dynamic_paths&lt;/code&gt; (default 1024) and &lt;code&gt;max_dynamic_types&lt;/code&gt; (default 32) control resource usage. Known paths can be materialized as physical columns with type hints (&lt;code&gt;JSON(key1 UInt32, key2 String)&lt;/code&gt;), while unknown paths are automatically discovered with type inference. Path filtering via &lt;code&gt;SKIP&lt;/code&gt; and &lt;code&gt;SKIP REGEXP&lt;/code&gt; provides fine-grained schema control.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three foundational types:&lt;/strong&gt; Variant (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/58047" rel="noopener noreferrer"&gt;PR #58047&lt;/a&gt;) provides discriminated unions. Dynamic (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/63058" rel="noopener noreferrer"&gt;PR #63058&lt;/a&gt;) extends Variant with open-ended type storage. JSON (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/66444" rel="noopener noreferrer"&gt;PR #66444&lt;/a&gt;) combines both to store semi-structured data with native type preservation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Primary key and skip index support:&lt;/strong&gt; JSON subcolumns can appear in &lt;code&gt;ORDER BY&lt;/code&gt; and data-skipping index expressions (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72644" rel="noopener noreferrer"&gt;PR #72644&lt;/a&gt;), enabling the same data pruning that ClickHouse applies to regular typed columns.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced shared data serialization:&lt;/strong&gt; Per-granule path indexes for selective reads of specific paths without scanning the entire JSON column (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83777" rel="noopener noreferrer"&gt;PR #83777&lt;/a&gt;). Three serialization modes optimized for different access patterns.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planner-level subcolumn optimization:&lt;/strong&gt; The query planner reads only the JSON paths referenced in your query (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/68053" rel="noopener noreferrer"&gt;PR #68053&lt;/a&gt;), pushes subcolumn requirements through CTEs and views (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/94105" rel="noopener noreferrer"&gt;PR #94105&lt;/a&gt;), and rewrites JSONExtract calls into direct subcolumn reads (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96711" rel="noopener noreferrer"&gt;PR #96711&lt;/a&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full JSONExtract interop:&lt;/strong&gt; All JSONExtract* functions work with native JSON columns (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96711" rel="noopener noreferrer"&gt;PR #96711&lt;/a&gt;). Introspection functions (&lt;code&gt;distinctJSONPaths&lt;/code&gt;, &lt;code&gt;distinctJSONPathsAndTypes&lt;/code&gt;) provide schema discovery from metadata alone (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/68463" rel="noopener noreferrer"&gt;PR #68463&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/92196" rel="noopener noreferrer"&gt;PR #92196&lt;/a&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Migration from every legacy format:&lt;/strong&gt; &lt;code&gt;ALTER TABLE ... MODIFY COLUMN&lt;/code&gt; converts String, Object('json'), Map, and Tuple columns to the native JSON type (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70442" rel="noopener noreferrer"&gt;PR #70442&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71784" rel="noopener noreferrer"&gt;PR #71784&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71320" rel="noopener noreferrer"&gt;PR #71320&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not experimental features behind flags. They are defaults that ship with every ClickHouse installation since v25.3.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JSON Myths vs. Reality: A 2026 Checklist&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;The FUD&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Evidence Volume&lt;/th&gt;
&lt;th&gt;Reality (2026)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;"No native JSON support"&lt;/td&gt;
&lt;td&gt;False since Aug 2024&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/66444" rel="noopener noreferrer"&gt;PR #66444&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77785" rel="noopener noreferrer"&gt;#77785&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Native JSON type stores each path as a separate columnar subcolumn. GA since v25.3.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;"Flatten JSON into columns manually"&lt;/td&gt;
&lt;td&gt;False since Aug 2024&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/66444" rel="noopener noreferrer"&gt;PR #66444&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72644" rel="noopener noreferrer"&gt;#72644&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Automatic path flattening into Dynamic-typed subcolumns. No manual schema management.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;"Use JSONExtract on String columns"&lt;/td&gt;
&lt;td&gt;Outdated&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96711" rel="noopener noreferrer"&gt;PR #96711&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/66444" rel="noopener noreferrer"&gt;#66444&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;JSONExtract works on native JSON columns and gets rewritten to direct subcolumn reads. No full-column scan.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;"Use Object('JSON')"&lt;/td&gt;
&lt;td&gt;Removed in v25.11&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85718" rel="noopener noreferrer"&gt;PR #85718&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/66444" rel="noopener noreferrer"&gt;#66444&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Object('json') was replaced by a ground-up redesign. The old type was fully removed in v25.11.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;"JSON queries require full column scans"&lt;/td&gt;
&lt;td&gt;False since 2024&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/68053" rel="noopener noreferrer"&gt;PR #68053&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83777" rel="noopener noreferrer"&gt;#83777&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/94105" rel="noopener noreferrer"&gt;#94105&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Planner reads only referenced subcolumns. Advanced serialization provides per-granule path indexes. 58x faster, 3,300x less memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;"Can't index JSON fields"&lt;/td&gt;
&lt;td&gt;False since Dec 2024&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72644" rel="noopener noreferrer"&gt;PR #72644&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/98886" rel="noopener noreferrer"&gt;#98886&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;JSON subcolumns in ORDER BY, primary key, and skip indexes. Bloom/text indexes on JSONAllPaths.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;"JSON types lose type information"&lt;/td&gt;
&lt;td&gt;False&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/58047" rel="noopener noreferrer"&gt;PR #58047&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/63058" rel="noopener noreferrer"&gt;#63058&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Variant/Dynamic preserve native types (UInt32, Float64, DateTime, etc.). No String collapse.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;"ClickHouse JSON is slower than document DBs"&lt;/td&gt;
&lt;td&gt;False&lt;/td&gt;
&lt;td&gt;JSONBench (1B docs)&lt;/td&gt;
&lt;td&gt;2,500x faster than MongoDB. 10x faster than Elasticsearch. 9,000x faster than DuckDB/PostgreSQL.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;"No schema discovery for JSON"&lt;/td&gt;
&lt;td&gt;False since Aug 2024&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/68463" rel="noopener noreferrer"&gt;PR #68463&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/92196" rel="noopener noreferrer"&gt;#92196&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;distinctJSONPaths()&lt;/code&gt; and &lt;code&gt;distinctJSONPathsAndTypes()&lt;/code&gt; read metadata only. Instant schema views.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;"Can't migrate existing JSON String columns"&lt;/td&gt;
&lt;td&gt;False since Oct 2024&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70442" rel="noopener noreferrer"&gt;PR #70442&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71784" rel="noopener noreferrer"&gt;#71784&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71320" rel="noopener noreferrer"&gt;#71320&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;ALTER TABLE converts String, Object, Map, and Tuple to native JSON. Background merge conversion.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 1: ClickHouse JSON Functions and String Storage (2019-2021)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"Use JSONExtract functions on String columns"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this era, the criticism was fair. ClickHouse stored JSON as opaque String blobs, and every JSON query required parsing the entire string value.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;ClickHouse JSONExtract Functions: simdjson-Powered but CPU-Heavy (May 2019)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/5235" rel="noopener noreferrer"&gt;PR #5235&lt;/a&gt; introduced the &lt;code&gt;JSONExtract*&lt;/code&gt; function family, powered by simdjson with a RapidJSON fallback. This was a meaningful step: SIMD instructions allowed structural element identification at near-memory-bandwidth speeds.&lt;/p&gt;

&lt;p&gt;But the fundamental limitation remained. Every query, no matter which field it accessed, required scanning and parsing the full JSON string column. There was no way to read just &lt;code&gt;event.user_id&lt;/code&gt; without also reading &lt;code&gt;event.metadata&lt;/code&gt;, &lt;code&gt;event.payload&lt;/code&gt;, and every other field.&lt;/p&gt;

&lt;p&gt;ClickHouse provided two function families with different trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;simpleJSON&lt;/code&gt; / &lt;code&gt;visitParam&lt;/code&gt;: Minimalist heuristic parsing with low CPU overhead, but strict assumptions about canonical encoding and no nested object support.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;JSONExtract*&lt;/code&gt;: Full simdjson-powered parsing with standards-compliant extraction, but high per-row CPU cost from full document parsing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither approach could avoid the core problem: 100% column scan for every query.&lt;/p&gt;
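&lt;p&gt;A minimal sketch of the era's pattern (table and field names are illustrative): JSON lands in a plain String column, and every read pays for a full parse of each row it touches.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Circa 2019-2021: JSON stored as an opaque String
CREATE TABLE events_v1
(
    ts  DateTime,
    raw String  -- the entire JSON document as text
)
ENGINE = MergeTree
ORDER BY ts;

-- Every query parses the full document for every scanned row,
-- even though only one field is needed
SELECT JSONExtractString(raw, 'user_id') AS user_id, count()
FROM events_v1
GROUP BY user_id;
&lt;/code&gt;&lt;/pre&gt;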

&lt;h3&gt;
  
  
  &lt;strong&gt;SQL/JSON Standard Functions (Mid-2021)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/24148" rel="noopener noreferrer"&gt;PR #24148&lt;/a&gt; added &lt;code&gt;JSON_VALUE&lt;/code&gt;, &lt;code&gt;JSON_QUERY&lt;/code&gt;, and &lt;code&gt;JSON_EXISTS&lt;/code&gt; with JSONPath expression support, bringing ClickHouse closer to SQL/JSON standard compliance. This improved SQL compatibility but did not change the underlying storage model. JSON was still strings.&lt;/p&gt;
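&lt;p&gt;Usage looked like this (the String column is illustrative; the second argument is a JSONPath expression):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT
    JSON_VALUE(raw, '$.user.id')       AS user_id,  -- scalar, returned as String
    JSON_QUERY(raw, '$.user.tags')     AS tags,     -- matched fragment as JSON text
    JSON_EXISTS(raw, '$.user.premium') AS has_flag  -- 1 if the path exists, else 0
FROM events_v1;
&lt;/code&gt;&lt;/pre&gt;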

&lt;h3&gt;
  
  
  &lt;strong&gt;Map(String, String): A Partial Improvement&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Map(String, String)&lt;/code&gt; type offered some improvement by storing JSON key-value pairs natively, eliminating the need for string parsing on every access. But it still required reading all keys to find one entry, and it lost all type information by collapsing everything to strings.&lt;/p&gt;
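&lt;p&gt;A sketch of the Map-based workaround (names illustrative): key lookups skip string parsing, but every value is forced to String and the whole map is read per row.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE events_map
(
    ts    DateTime,
    attrs Map(String, String)  -- all values collapse to String
)
ENGINE = MergeTree
ORDER BY ts;

-- No JSON parsing, but '42' comes back as a String, not a UInt32,
-- and reading one key still materializes the full map for each row
SELECT attrs['user_id'] FROM events_map;
&lt;/code&gt;&lt;/pre&gt;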

&lt;p&gt;By the end of 2021, ClickHouse had capable JSON parsing functions but no native JSON storage. The gap was real, and the engineering team knew it.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 2: ClickHouse Object('json') Type -- What Went Wrong (2022)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"Use Object('JSON')"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/23932" rel="noopener noreferrer"&gt;PR #23932&lt;/a&gt;, merged March 2022 by Anton Popov, was the first attempt at native columnar JSON storage. It shipped in ClickHouse 22.3 LTS under &lt;code&gt;allow_experimental_object_type&lt;/code&gt;. The implementation spanned 101 commits and proved a critical concept: JSON could be stored with each path as a separate subcolumn.&lt;/p&gt;

&lt;p&gt;But it had serious architectural flaws:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Challenge&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;Consequence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Eager Type Unification&lt;/td&gt;
&lt;td&gt;Mixed types at a path collapsed to String&lt;/td&gt;
&lt;td&gt;Lost native type optimizations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metadata Explosion&lt;/td&gt;
&lt;td&gt;High memory for many unique keys&lt;/td&gt;
&lt;td&gt;System instability with high-cardinality JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Race Conditions&lt;/td&gt;
&lt;td&gt;Inconsistent results during merges&lt;/td&gt;
&lt;td&gt;Unreliable query analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema Rigidity&lt;/td&gt;
&lt;td&gt;Inability to handle type changes&lt;/td&gt;
&lt;td&gt;Required manual ALTER or table rewrites&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No Primary Key Support&lt;/td&gt;
&lt;td&gt;JSON paths excluded from ORDER BY&lt;/td&gt;
&lt;td&gt;No data pruning on JSON fields&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Despite these flaws, &lt;code&gt;Object('json')&lt;/code&gt; validated the demand for native JSON storage and identified every architectural challenge the replacement would need to solve.&lt;/p&gt;

&lt;p&gt;Alongside the type work, ClickHouse continued improving JSON ecosystem support. &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/40910" rel="noopener noreferrer"&gt;PR #40910&lt;/a&gt; introduced the &lt;code&gt;JSONObjectEachRow&lt;/code&gt; format for keyed JSON objects. &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/39186" rel="noopener noreferrer"&gt;PR #39186&lt;/a&gt; added automatic type inference from JSON strings, detecting dates, datetimes, and integers by default. &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/54427" rel="noopener noreferrer"&gt;PR #54427&lt;/a&gt; enabled schema inference of JSON objects as named Tuples.&lt;/p&gt;

&lt;p&gt;These format and inference improvements meant ClickHouse was getting better at ingesting JSON data. What it still lacked was a sound way to store it.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 3: ClickHouse Variant and Dynamic Types -- The JSON Foundation (2024)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"JSON types lose type information"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Rather than patching Object('json'), ClickHouse built from first principles. The redesign started with two new foundational types that solved the type-preservation problem that had plagued the original implementation.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Variant: Discriminated Union Type (January 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/58047" rel="noopener noreferrer"&gt;PR #58047&lt;/a&gt;, by Pavel Kruglov, introduced &lt;code&gt;Variant(T1, T2, ..., TN)&lt;/code&gt;, a discriminated union storing values of different types in a single column. It uses a UInt8 discriminator column plus dense subcolumns per type variant, supporting up to 255 variants. The PR included 47 commits and roughly 5,000 lines of tests.&lt;/p&gt;

&lt;p&gt;This solved the type-unification problem that killed Object('json'). Instead of collapsing &lt;code&gt;42&lt;/code&gt; and &lt;code&gt;"hello"&lt;/code&gt; at the same path into String, Variant stores them in their native types with a discriminator indicating which type each row contains.&lt;/p&gt;

&lt;p&gt;A follow-up optimization (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/62774" rel="noopener noreferrer"&gt;PR #62774&lt;/a&gt;) introduced compact discriminator serialization: when all discriminators in a granule are the same type (the common case for JSON paths), it stores 3 values instead of 8,192. This is highly effective in practice since most JSON paths have homogeneous types within a granule.&lt;/p&gt;
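&lt;p&gt;A minimal Variant sketch: two native types share one column, and &lt;code&gt;variantType()&lt;/code&gt; exposes the per-row discriminator (the experimental-flag &lt;code&gt;SET&lt;/code&gt; is only needed on pre-GA releases):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SET allow_experimental_variant_type = 1;

CREATE TABLE v_demo (v Variant(UInt64, String)) ENGINE = Memory;

INSERT INTO v_demo VALUES (42), ('hello');

SELECT v, variantType(v) FROM v_demo;
-- 42       UInt64
-- hello    String
&lt;/code&gt;&lt;/pre&gt;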

&lt;h3&gt;
  
  
  &lt;strong&gt;Dynamic: Open-Ended Type Storage (May 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/63058" rel="noopener noreferrer"&gt;PR #63058&lt;/a&gt;, also by Pavel Kruglov, extended Variant with an open, self-describing type set. Dynamic has a &lt;code&gt;max_types&lt;/code&gt; parameter (default 32); the most frequent types get their own Variant slots, and overflow types are stored in a SharedVariant as binary-encoded strings. This provided the flexibility that JSON demands without the unbounded explosion that doomed Object('json').&lt;/p&gt;

&lt;p&gt;The PR included 39 commits and introduced the &lt;code&gt;dynamicType()&lt;/code&gt; introspection function. A &lt;code&gt;dynamic_structure.bin&lt;/code&gt; metadata file per data part tracks the type composition.&lt;/p&gt;
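&lt;p&gt;A minimal Dynamic sketch: no declared type list at all, with &lt;code&gt;dynamicType()&lt;/code&gt; reporting the stored type per row (use &lt;code&gt;SET allow_experimental_dynamic_type = 1&lt;/code&gt; on pre-GA releases):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE d_demo (d Dynamic) ENGINE = Memory;

-- Integers, strings, and arrays coexist in one column, each kept native
INSERT INTO d_demo VALUES (1), ('text'), ([1, 2, 3]);

SELECT d, dynamicType(d) FROM d_demo;
&lt;/code&gt;&lt;/pre&gt;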

&lt;p&gt;These two types, Variant and Dynamic, were the architectural foundation. The JSON type would combine them both.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 4: ClickHouse Native JSON Type Implementation (2024)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"ClickHouse doesn't support JSON natively"&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How ClickHouse Implemented the Native JSON Data Type: PR #66444 (August 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/66444" rel="noopener noreferrer"&gt;PR #66444&lt;/a&gt; is the single most important commit in ClickHouse's JSON evolution. Authored by Pavel Kruglov, it implemented an entirely new JSON data type across 91 commits, closing &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/54864" rel="noopener noreferrer"&gt;RFC #54864&lt;/a&gt; ("Semistructured Columns") authored by Alexey Milovidov.&lt;/p&gt;

&lt;p&gt;The design works as follows. JSON paths are flattened into individual Dynamic-typed subcolumns, each stored in separate column files per data part. Paths exceeding &lt;code&gt;max_dynamic_paths&lt;/code&gt; (default 1024) overflow into a shared data structure. The type supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full SQL support:&lt;/strong&gt; Query, filter, and aggregate on any JSON field using standard SQL. Nested structures and arrays are first-class citizens.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configurable limits:&lt;/strong&gt; &lt;code&gt;max_dynamic_paths&lt;/code&gt; (default 1024) and &lt;code&gt;max_dynamic_types&lt;/code&gt; (default 32) control resource usage
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Materialized known paths:&lt;/strong&gt; Type hints like &lt;code&gt;JSON(key1 UInt32, key2 String)&lt;/code&gt; materialize known paths as physical typed columns for maximum performance, while unknown paths are automatically created with type inference as they are discovered
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Path filtering:&lt;/strong&gt; &lt;code&gt;SKIP&lt;/code&gt; and &lt;code&gt;SKIP REGEXP&lt;/code&gt; to exclude noisy paths from columnar storage
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dot-notation access:&lt;/strong&gt; &lt;code&gt;json.a.b&lt;/code&gt; for direct path reads
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub-object access:&lt;/strong&gt; &lt;code&gt;json.^prefix&lt;/code&gt; for extracting JSON subtrees
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Array(JSON) support:&lt;/strong&gt; Nested structures and arrays of JSON documents
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient data skipping on dynamic paths:&lt;/strong&gt; JSON subcolumns in primary keys and skip indexes enable granule-level pruning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First shipped in ClickHouse 24.8 LTS under &lt;code&gt;allow_experimental_json_type&lt;/code&gt;. The official blog post "How we built a new powerful JSON data type for ClickHouse" (October 2024) detailed the architecture.&lt;/p&gt;
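&lt;p&gt;A sketch of the type in use (table, paths, and hints are illustrative; the inline &lt;code&gt;FORMAT&lt;/code&gt; insert is clickhouse-client syntax):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE events
(
    ts   DateTime,
    data JSON(
        user.id UInt32,          -- type hint: materialized typed column
        SKIP debug.trace,        -- exclude a noisy path
        max_dynamic_paths = 512  -- override the 1024 default
    )
)
ENGINE = MergeTree
ORDER BY ts;

INSERT INTO events FORMAT JSONEachRow
{"ts": "2026-01-15 10:00:00", "data": {"user": {"id": 7}, "page": "/home"}}

-- Dot notation reads only the referenced subcolumns
SELECT data.user.id, data.page FROM events;

-- ^ extracts a whole sub-object
SELECT data.^user FROM events;
&lt;/code&gt;&lt;/pre&gt;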

&lt;h3&gt;
  
  
  &lt;strong&gt;20x Memory Reduction for Inserts (September 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/69272" rel="noopener noreferrer"&gt;PR #69272&lt;/a&gt; addressed a critical production concern: memory consumption during JSON inserts. Before this PR, inserting JSON data consumed 6.99 GiB of memory. After, 354 MiB. A 20x reduction.&lt;/p&gt;

&lt;p&gt;The fix was adaptive write buffer sizing. Buffers start at 16 KiB and grow exponentially to a maximum of 1 MiB, selectively enabled for dynamic substreams. S3 inserts improved from 23.13 GiB to 7.65 GiB. No throughput regression.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;ALTER String to JSON + Serialization V2 (October 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70442" rel="noopener noreferrer"&gt;PR #70442&lt;/a&gt; delivered two major changes. First, &lt;code&gt;ALTER TABLE ... MODIFY COLUMN col JSON&lt;/code&gt; to convert existing String columns to the JSON type. Conversion happens during background merges, so there is no downtime. Second, Serialization V2 for JSON and Dynamic types with an improved binary layout.&lt;/p&gt;

&lt;p&gt;This was the beginning of clear migration paths. Teams no longer had to reimport data to adopt native JSON.&lt;/p&gt;
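&lt;p&gt;The migration itself is one statement (table and column names illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Existing column: String holding JSON text
ALTER TABLE events_v1 MODIFY COLUMN raw JSON;

-- Conversion happens lazily as parts merge in the background;
-- to rewrite immediately, a merge can be forced:
OPTIMIZE TABLE events_v1 FINAL;
&lt;/code&gt;&lt;/pre&gt;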

&lt;h3&gt;
  
  
  &lt;strong&gt;JSONExtract Refactoring for Native JSON (July 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/66046" rel="noopener noreferrer"&gt;PR #66046&lt;/a&gt; refactored the JSONExtract function family to work with the new type, splitting the implementation into reusable &lt;code&gt;JSONExtractTree.h/cpp&lt;/code&gt; components and adding Dynamic type support. This ensured that existing queries using JSONExtract would continue to work when columns migrated to native JSON.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Introspection Functions (August 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/68463" rel="noopener noreferrer"&gt;PR #68463&lt;/a&gt; added &lt;code&gt;distinctDynamicTypes()&lt;/code&gt;, &lt;code&gt;distinctJSONPaths()&lt;/code&gt;, and &lt;code&gt;distinctJSONPathsAndTypes()&lt;/code&gt;. These are essential schema discovery tools for semi-structured data. They were later optimized in &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/92196" rel="noopener noreferrer"&gt;PR #92196&lt;/a&gt; to read only metadata files instead of scanning actual data, making schema diversity views effectively instant.&lt;/p&gt;
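&lt;p&gt;Schema discovery then becomes a pair of aggregate queries (column and table names illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Which paths exist across the stored documents?
SELECT arrayJoin(distinctJSONPaths(data)) AS path FROM events;

-- And which types has each path been observed with?
SELECT distinctJSONPathsAndTypes(data) FROM events;
&lt;/code&gt;&lt;/pre&gt;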

&lt;h3&gt;
  
  
  &lt;strong&gt;Subcolumn Optimization Enabled by Default (August 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/68053" rel="noopener noreferrer"&gt;PR #68053&lt;/a&gt; enabled &lt;code&gt;optimize_functions_to_subcolumns&lt;/code&gt; by default. This planner optimization rewrites function calls to read only the specific subcolumns needed, which is transformative for JSON queries. A query accessing &lt;code&gt;json.user.id&lt;/code&gt; reads only that subcolumn's data, not the entire JSON column.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Beta Promotion (November 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72294" rel="noopener noreferrer"&gt;PR #72294&lt;/a&gt; moved JSON, Dynamic, and Variant to beta status, backported to 24.11. This signaled production readiness for early adopters.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;JSON Subcolumns in Primary Key and Skip Indexes (December 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72644" rel="noopener noreferrer"&gt;PR #72644&lt;/a&gt; was a milestone for performance. It enabled JSON subcolumns (&lt;code&gt;json.path.to.key&lt;/code&gt;) in &lt;code&gt;ORDER BY&lt;/code&gt; expressions and data-skipping index definitions. This means ClickHouse applies the same data pruning to JSON fields that it applies to regular typed columns.&lt;/p&gt;

&lt;p&gt;The JSONBench benchmark uses this capability for sub-second queries over 1 billion documents. Without it, JSON columns could not participate in ClickHouse's primary mechanism for reducing scan ranges.&lt;/p&gt;
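&lt;p&gt;A sketch of the pattern (names illustrative): with type hints, JSON subcolumns slot into &lt;code&gt;ORDER BY&lt;/code&gt; like any typed column.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE TABLE logs
(
    data JSON(service String, ts DateTime)
)
ENGINE = MergeTree
ORDER BY (data.service, data.ts);

-- Equality filters on data.service prune granules via the primary index
SELECT count() FROM logs WHERE data.service = 'api';
&lt;/code&gt;&lt;/pre&gt;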

&lt;h3&gt;
  
  
  &lt;strong&gt;Migration Paths from Every Legacy Format&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;By the end of 2024, clear migration routes existed for every semi-structured representation:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source Type&lt;/th&gt;
&lt;th&gt;Migration Method&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;String&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ALTER TABLE ... MODIFY COLUMN ... JSON&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70442" rel="noopener noreferrer"&gt;#70442&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Background merge conversion; &lt;code&gt;ALTER UPDATE&lt;/code&gt; for immediate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Map(String, String)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CAST(col AS JSON)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71320" rel="noopener noreferrer"&gt;#71320&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Serialize-then-parse roundtrip&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Tuple&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;CAST(col AS JSON)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71320" rel="noopener noreferrer"&gt;#71320&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Serialize-then-parse roundtrip&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Object('json')&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ALTER TABLE ... MODIFY COLUMN ... JSON&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71784" rel="noopener noreferrer"&gt;#71784&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Must complete before upgrading past v25.11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;JSON(params_A)&lt;/code&gt; to &lt;code&gt;JSON(params_B)&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;CAST&lt;/code&gt; or &lt;code&gt;ALTER&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72303" rel="noopener noreferrer"&gt;#72303&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Change max_dynamic_paths, SKIP rules, type hints&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
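&lt;p&gt;In practice the routes reduce to two statement shapes (table and column names illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Map(String, String) or named-Tuple columns: convert in a query or view
SELECT CAST(attrs AS JSON) FROM events_map;

-- String or legacy Object('json') columns: convert in place
-- (Object('json') migrations must finish before upgrading to v25.11+)
ALTER TABLE legacy_events MODIFY COLUMN payload JSON;
&lt;/code&gt;&lt;/pre&gt;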




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 5: ClickHouse JSON Reaches GA -- Performance and Storage Optimizations (2025)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"JSON queries require full column scans"&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;ClickHouse JSON Production-Ready: GA in v25.3 (March 2025)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77785" rel="noopener noreferrer"&gt;PR #77785&lt;/a&gt;, authored by Alexey Milovidov and expanded by Pavel Kruglov, removed all experimental and beta gates for JSON, Dynamic, and Variant. The commit message references &lt;a href="https://jsonbench.com/" rel="noopener noreferrer"&gt;https://jsonbench.com/&lt;/a&gt;. Backported to ClickHouse 25.3 LTS via cherry-pick PRs &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77974" rel="noopener noreferrer"&gt;#77974&lt;/a&gt; and &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77975" rel="noopener noreferrer"&gt;#77975&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The 25.3 release blog stated: "About 1.5 years ago, we weren't happy with our JSON implementation, so we returned to the drawing board."&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;63x Memory Reduction for Read Prefetches (March 2025)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77640" rel="noopener noreferrer"&gt;PR #77640&lt;/a&gt; addressed memory consumption during read-ahead prefetches of JSON columns in Wide parts. Before: &lt;code&gt;SELECT * WHERE y=1&lt;/code&gt; on 1 million rows with 1,000 JSON paths consumed 69.16 GiB peak memory. After: 1.11 GiB. A 63x reduction.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4-10x Faster S3 Reads (February 2025)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74827" rel="noopener noreferrer"&gt;PR #74827&lt;/a&gt; introduced prefetches for subcolumn prefix deserialization, a cache for deserialized prefixes, and parallel prefix deserialization for JSON columns on S3. The result: 4x faster full scans and roughly 10x faster &lt;code&gt;LIMIT 10&lt;/code&gt; queries on remote storage. This introduced &lt;code&gt;MergeTreePrefixesDeserializationThreadPool&lt;/code&gt; and benefits any remote filesystem with similar latency characteristics.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;58x Faster Selective Reads: Advanced Shared Data Serialization (August 2025)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83777" rel="noopener noreferrer"&gt;PR #83777&lt;/a&gt; is the most impactful storage optimization in the JSON type's history. It introduced three serialization modes for shared data (the overflow storage for paths beyond &lt;code&gt;max_dynamic_paths&lt;/code&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Read Latency&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Write Cost&lt;/th&gt;
&lt;th&gt;Ideal Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;map&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;High for subcolumns&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Writing data, reading whole JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;map_with_buckets&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Balanced workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;advanced&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Low for subcolumns&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Reading specific paths&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;advanced&lt;/code&gt; mode creates per-granule &lt;code&gt;.structure&lt;/code&gt;, &lt;code&gt;.data&lt;/code&gt;, and &lt;code&gt;.paths_marks&lt;/code&gt; files with a path index that enables direct lookup of specific paths without scanning the entire shared data structure.&lt;/p&gt;
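&lt;p&gt;As a rough sketch, the serialization mode can be chosen per table. The setting name below (&lt;code&gt;object_shared_data_serialization_version&lt;/code&gt;) and the table/column names are assumptions for illustration; verify the exact MergeTree setting against your server version before relying on it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Hypothetical sketch: favor selective path reads over write throughput
CREATE TABLE events
(
    ts   DateTime,
    json JSON(max_dynamic_paths = 1024)
)
ENGINE = MergeTree
ORDER BY ts
SETTINGS object_shared_data_serialization_version = 'advanced';
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The trade-off table above is the decision guide: write-heavy tables that read whole documents keep &lt;code&gt;map&lt;/code&gt;, read-heavy tables that touch specific paths benefit from &lt;code&gt;advanced&lt;/code&gt;.&lt;/p&gt;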

&lt;p&gt;The benchmarks speak for themselves. Reading a single key from 200,000 rows with 10,000 unique paths improved from 3.63s / 12.53 GiB to 0.063s / 3.89 MiB. That is &lt;strong&gt;58x faster and 3,300x less memory&lt;/strong&gt;. For Compact parts, non-existing key reads improved from 3.4s to 0.3s (roughly 11x faster), memory from 517 MiB to 3.7 MiB (roughly 140x reduction).&lt;/p&gt;

&lt;p&gt;This PR contained 47 commits and is documented in the official ClickHouse blog "Making complex JSON 58x faster, use 3,300x less memory."&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Substream Marks in Compact Parts (March 2025)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77940" rel="noopener noreferrer"&gt;PR #77940&lt;/a&gt; added marks for individual substreams within compact parts, extending selective subcolumn read efficiency to the compact storage format. Previously, reading any subcolumn from a compact part required reading the entire part.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Experimental Settings Obsoleted (v25.8)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85934" rel="noopener noreferrer"&gt;PR #85934&lt;/a&gt; marked the experimental and beta JSON settings as obsolete. JSON was now unconditionally enabled.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Legacy Object('json') Fully Removed (November 2025)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85718" rel="noopener noreferrer"&gt;PR #85718&lt;/a&gt; removed the deprecated &lt;code&gt;Object('json')&lt;/code&gt; implementation entirely. 270 files changed. &lt;code&gt;ColumnObjectDeprecated&lt;/code&gt;, &lt;code&gt;DataTypeObjectDeprecated&lt;/code&gt;, deprecated serialization files, the &lt;code&gt;JSONDataParser&lt;/code&gt;, and all legacy tests were deleted. This was backward-incompatible: any tables or queries referencing &lt;code&gt;Object('json')&lt;/code&gt; must be migrated before upgrading past v25.11.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 6: ClickHouse JSON Query Planner and JSONExtract Interop (2026)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"JSONExtract on String columns is the primary approach"&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;ClickHouse JSONExtract Now Works with Native JSON Columns (2026)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96711" rel="noopener noreferrer"&gt;PR #96711&lt;/a&gt;, by Fisnik Kastrati, extended all JSONExtract*, JSONHas, JSONLength, and JSONType functions to accept native JSON columns directly. More importantly, it introduced a &lt;code&gt;FunctionToSubcolumnsPass&lt;/code&gt; planner optimization that rewrites constant-path JSONExtract calls into direct subcolumn reads.&lt;/p&gt;

&lt;p&gt;This means existing queries that use &lt;code&gt;JSONExtractString(json_col, 'user', 'name')&lt;/code&gt; now bypass text parsing entirely when the column is a native JSON type. The planner rewrites the call to a direct subcolumn read of &lt;code&gt;json_col.user.name&lt;/code&gt;. This closed &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/88370" rel="noopener noreferrer"&gt;issue #88370&lt;/a&gt;.&lt;/p&gt;
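&lt;p&gt;In practice, the two queries below should end up reading the same subcolumn on a native JSON column; you can inspect the rewrite with &lt;code&gt;EXPLAIN&lt;/code&gt; on your own tables (table, column, and path names here are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Legacy style: previously parsed JSON text on every row,
-- now rewritten by the planner into a subcolumn read
SELECT JSONExtractString(json_col, 'user', 'name') FROM events;

-- Equivalent direct subcolumn access
SELECT json_col.user.name FROM events;

-- Inspect the rewritten plan
EXPLAIN QUERY TREE
SELECT JSONExtractString(json_col, 'user', 'name') FROM events;
&lt;/code&gt;&lt;/pre&gt;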

&lt;h3&gt;
  
  
  &lt;strong&gt;Optimized has(JSON, path) Function (2026)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96927" rel="noopener noreferrer"&gt;PR #96927&lt;/a&gt; added an optimized &lt;code&gt;has(json_col, 'path')&lt;/code&gt; function for fast path-existence checks without text parsing. This is essential for queries that filter based on whether a JSON path exists.&lt;/p&gt;
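&lt;p&gt;A minimal sketch of a path-existence filter is below. The dotted-string path form is an assumption based on the PR description; check the exact signature in your version's documentation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Count documents that carry a given path, without text parsing
-- (table, column, and path names are illustrative)
SELECT count() FROM events WHERE has(json_col, 'user.subscription');
&lt;/code&gt;&lt;/pre&gt;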

&lt;h3&gt;
  
  
  &lt;strong&gt;SubcolumnPushdownPass in Query Planner (January 2026)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/94105" rel="noopener noreferrer"&gt;PR #94105&lt;/a&gt; introduced &lt;code&gt;SubcolumnPushdownPass&lt;/code&gt;, which pushes subcolumn requirements through CTEs and views. This means wrapping a JSON table in a view or CTE no longer defeats subcolumn optimizations.&lt;/p&gt;
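&lt;p&gt;A sketch of the pattern this pass fixes, with hypothetical names: before the pushdown, selecting a whole JSON column through a CTE could materialize every path; now only the requested subcolumn should be read.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Previously, the CTE boundary could force a full read of json_col;
-- SubcolumnPushdownPass pushes the json_col.user.name requirement through it
WITH recent AS
(
    SELECT json_col FROM events WHERE ts &gt; now() - INTERVAL 1 DAY
)
SELECT json_col.user.name FROM recent;
&lt;/code&gt;&lt;/pre&gt;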

&lt;h3&gt;
  
  
  &lt;strong&gt;Skip Indexes on JSONAllPaths (April 2026)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/98886" rel="noopener noreferrer"&gt;PR #98886&lt;/a&gt; enabled bloom and text skip indexes on &lt;code&gt;JSONAllPaths()&lt;/code&gt;, allowing efficient filtering on JSON key presence. This gives ClickHouse the ability to skip entire granules when querying for documents that contain (or don't contain) specific paths.&lt;/p&gt;
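&lt;p&gt;A hedged sketch of what such an index could look like: &lt;code&gt;JSONAllPaths()&lt;/code&gt; returns the array of paths in a document, and a bloom filter over it lets granule pruning answer key-presence filters. The index expression and query shape below are assumptions for illustration, not the PR's exact syntax:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Hypothetical sketch: index which paths occur in each granule
ALTER TABLE events
    ADD INDEX idx_json_paths JSONAllPaths(json_col) TYPE bloom_filter GRANULARITY 4;

-- Granules whose documents never contain 'error.code' can be skipped
SELECT count() FROM events WHERE has(JSONAllPaths(json_col), 'error.code');
&lt;/code&gt;&lt;/pre&gt;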

&lt;h3&gt;
  
  
  &lt;strong&gt;SIMD Tokenizer Refactoring (2026)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/97871" rel="noopener noreferrer"&gt;PR #97871&lt;/a&gt;, by Amos Bird, refactored the tokenizer to a SIMD-ready stateful API, replacing the older iterator API. This lays the groundwork for continued parsing performance improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Combined Subcolumn Access (2026)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/98788" rel="noopener noreferrer"&gt;PR #98788&lt;/a&gt; introduced a unified combined subcolumn that returns Dynamic for both scalar and object access at a path. This simplifies queries that need to handle paths where the value might be a scalar or a nested object.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JSON Performance Benchmarks: MongoDB, Elasticsearch, and DuckDB Compared&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The JSON type's performance has been validated by both ClickHouse's internal benchmarks and independent third-party testing. The numbers come from specific, verifiable sources.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;JSONBench: 1 Billion Bluesky Documents (January 2025)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The JSONBench benchmark (&lt;a href="https://jsonbench.com/" rel="noopener noreferrer"&gt;https://jsonbench.com/&lt;/a&gt;) tested the native JSON type against other databases on 1 billion Bluesky social media documents on a single m6i.8xlarge node:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Comparison&lt;/th&gt;
&lt;th&gt;ClickHouse Advantage&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;vs MongoDB aggregations&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2,500x faster&lt;/strong&gt; (405ms vs ~16 min)&lt;/td&gt;
&lt;td&gt;JSONBench&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vs Elasticsearch aggregations&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10x faster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JSONBench&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vs DuckDB/PostgreSQL analytics&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9,000x faster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JSONBench&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage vs compressed files&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;20% more compact&lt;/strong&gt; (same algorithm)&lt;/td&gt;
&lt;td&gt;JSONBench&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage vs MongoDB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;40% more efficient&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JSONBench&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Peak memory (1B document count)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 3 MiB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JSONBench&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A follow-up benchmark in March 2025 scaled to 4 billion+ documents (1.6 TiB), achieving 91.84 million docs/sec throughput with sub-100ms queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Storage and Memory Improvements&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Optimization&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Insert memory&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/69272" rel="noopener noreferrer"&gt;#69272&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;6.99 GiB&lt;/td&gt;
&lt;td&gt;354 MiB&lt;/td&gt;
&lt;td&gt;20x reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 insert memory&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/69272" rel="noopener noreferrer"&gt;#69272&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;23.13 GiB&lt;/td&gt;
&lt;td&gt;7.65 GiB&lt;/td&gt;
&lt;td&gt;3x reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read prefetch memory&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77640" rel="noopener noreferrer"&gt;#77640&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;69.16 GiB&lt;/td&gt;
&lt;td&gt;1.11 GiB&lt;/td&gt;
&lt;td&gt;63x reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Selective read latency&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83777" rel="noopener noreferrer"&gt;#83777&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;3.63s&lt;/td&gt;
&lt;td&gt;0.063s&lt;/td&gt;
&lt;td&gt;58x faster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Selective read memory&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83777" rel="noopener noreferrer"&gt;#83777&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;12.53 GiB&lt;/td&gt;
&lt;td&gt;3.89 MiB&lt;/td&gt;
&lt;td&gt;3,300x reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 full scan&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74827" rel="noopener noreferrer"&gt;#74827&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;4x faster&lt;/td&gt;
&lt;td&gt;4x improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3 LIMIT 10&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74827" rel="noopener noreferrer"&gt;#74827&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;td&gt;~10x faster&lt;/td&gt;
&lt;td&gt;~10x improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Third-Party Validation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;SigNoz, a ClickHouse-based observability platform, reported 30% faster log queries with the native JSON type. ClickHouse's own observability stack (ClickStack) demonstrated 9x faster queries compared to the previous Map-based approach for OpenTelemetry log attributes.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JSON for OpenTelemetry and Log Analytics: A Real-World Use Case&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The observability domain is where the JSON type's impact is most visible. Before native JSON, log management solutions built on ClickHouse flattened attributes into &lt;code&gt;Map(String, String)&lt;/code&gt; columns, losing type information. Queries like &lt;code&gt;SUM(LogAttributes.response_size)&lt;/code&gt; required explicit casts on every access.&lt;/p&gt;

&lt;p&gt;With the native JSON type, OpenTelemetry log attributes preserve their native types. The performance difference:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Legacy (Map)&lt;/th&gt;
&lt;th&gt;Modern (JSON)&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;I/O Efficiency&lt;/td&gt;
&lt;td&gt;Read entire column&lt;/td&gt;
&lt;td&gt;Read specific path subcolumns&lt;/td&gt;
&lt;td&gt;Reduced disk I/O&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Footprint&lt;/td&gt;
&lt;td&gt;High (String parsing)&lt;/td&gt;
&lt;td&gt;Low (Columnar access)&lt;/td&gt;
&lt;td&gt;Lower peak memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema Migration&lt;/td&gt;
&lt;td&gt;Manual ALTER TABLE&lt;/td&gt;
&lt;td&gt;Fully Automatic&lt;/td&gt;
&lt;td&gt;Simplified operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aggregation Speed&lt;/td&gt;
&lt;td&gt;Slow (Cast required)&lt;/td&gt;
&lt;td&gt;Native (No cast)&lt;/td&gt;
&lt;td&gt;Up to 10x faster queries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
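&lt;p&gt;The "cast required" row is easiest to see side by side. A sketch, with hypothetical table names; the &lt;code&gt;.:UInt64&lt;/code&gt; typed-subcolumn syntax is how native JSON exposes a path at a concrete type, and is unnecessary if the path has a type hint:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Legacy Map: every access re-parses and casts a string value
SELECT sum(toUInt64OrZero(LogAttributes['response_size']))
FROM otel_logs_legacy;

-- Native JSON: read the typed subcolumn directly, no text parsing
SELECT sum(LogAttributes.response_size.:UInt64)
FROM otel_logs;
&lt;/code&gt;&lt;/pre&gt;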




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JSON Limitations and Trade-offs in 2026&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Fairness matters. A few trade-offs still require attention:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Path explosion requires attention.&lt;/strong&gt; Without appropriate &lt;code&gt;max_dynamic_paths&lt;/code&gt; settings and SKIP rules, high-cardinality JSON (thousands of unique paths per document) can create many subcolumns. Set limits that match your schema shape, and use &lt;code&gt;SKIP REGEXP&lt;/code&gt; for noisy paths.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subcolumn pruning through &lt;code&gt;SELECT *&lt;/code&gt; in CTEs is not yet supported.&lt;/strong&gt; &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/92455" rel="noopener noreferrer"&gt;Issue #92455&lt;/a&gt; documents this gap. Explicitly name columns in CTEs over JSON tables for now.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legacy Object('json') migration is mandatory.&lt;/strong&gt; &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85718" rel="noopener noreferrer"&gt;PR #85718&lt;/a&gt; enforces a hard removal. After upgrading to v25.12 or later, any tables or queries referencing Object('json') will fail. Audit your schemas and run the &lt;code&gt;ALTER TABLE ... MODIFY COLUMN ... JSON&lt;/code&gt; migration before upgrading past v25.11.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correctness fixes are ongoing.&lt;/strong&gt; Edge cases in JSONExtract interop (&lt;a href="https://github.com/ClickHouse/ClickHouse/issues/102018" rel="noopener noreferrer"&gt;issue #102018&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/102079" rel="noopener noreferrer"&gt;#102079&lt;/a&gt;), default value handling (&lt;a href="https://github.com/clickhouse/clickhouse/issues/101721" rel="noopener noreferrer"&gt;issue #101721&lt;/a&gt;), and specific format combinations (&lt;a href="https://github.com/ClickHouse/ClickHouse/issues/101911" rel="noopener noreferrer"&gt;issue #101911&lt;/a&gt;) show that a system this complex requires staying on the latest stable release.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced shared data mode trades write cost for read performance.&lt;/strong&gt; The per-granule path indexes that enable 58x faster reads add write overhead. For write-heavy workloads with infrequent selective reads, the simpler &lt;code&gt;map&lt;/code&gt; mode may be more appropriate.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type hints and path configuration require understanding your data.&lt;/strong&gt; The defaults work well for moderate schemas (up to 1,024 unique paths). Workloads with tens of thousands of unique paths need tuning.&lt;/li&gt;
&lt;/ul&gt;
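&lt;p&gt;Several of the mitigations above live in the column declaration itself. A sketch with hypothetical table, path, and limit choices, using the documented &lt;code&gt;JSON(...)&lt;/code&gt; parameters:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Illustrative schema applying the guidance above
CREATE TABLE api_events
(
    ts      DateTime,
    payload JSON(
        max_dynamic_paths = 2048,       -- cap dedicated subcolumns
        user.id UInt64,                 -- type hint for a hot, known path
        SKIP trace.debug,               -- drop a noisy known path
        SKIP REGEXP '^internal\\.'      -- drop a noisy family of paths
    )
)
ENGINE = MergeTree
ORDER BY ts;
&lt;/code&gt;&lt;/pre&gt;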

&lt;p&gt;These are real engineering trade-offs, and understanding them is part of making an informed decision.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JSON Evolution Timeline (2019-2026)&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;What Changed&lt;/th&gt;
&lt;th&gt;Key PRs&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2019&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JSONExtract function family with simdjson&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/5235" rel="noopener noreferrer"&gt;#5235&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;SIMD-accelerated extraction from String columns. Full column scan required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2021&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SQL/JSON standard functions&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/24148" rel="noopener noreferrer"&gt;#24148&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;JSONPath support. Still string-based storage.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2022&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Object('json') first attempt. Format ecosystem.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/23932" rel="noopener noreferrer"&gt;#23932&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/40910" rel="noopener noreferrer"&gt;#40910&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/39186" rel="noopener noreferrer"&gt;#39186&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Proved columnar JSON concept. Architectural flaws identified. Schema inference improved.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2024 H1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Variant and Dynamic types. Compact discriminators.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/58047" rel="noopener noreferrer"&gt;#58047&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/63058" rel="noopener noreferrer"&gt;#63058&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/62774" rel="noopener noreferrer"&gt;#62774&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Type-preserving foundation built. Efficient storage for homogeneous granules.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2024 H2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native JSON type. 20x insert memory. Serialization V2. Primary key support. Beta.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/66444" rel="noopener noreferrer"&gt;#66444&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/69272" rel="noopener noreferrer"&gt;#69272&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70442" rel="noopener noreferrer"&gt;#70442&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72644" rel="noopener noreferrer"&gt;#72644&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72294" rel="noopener noreferrer"&gt;#72294&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Complete native JSON type ships. Migration paths established. JSON paths in ORDER BY.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GA. 63x read memory. 58x selective reads. S3 optimization. Legacy removal.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77785" rel="noopener noreferrer"&gt;#77785&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77640" rel="noopener noreferrer"&gt;#77640&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83777" rel="noopener noreferrer"&gt;#83777&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/74827" rel="noopener noreferrer"&gt;#74827&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85718" rel="noopener noreferrer"&gt;#85718&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Production-ready. Advanced shared data. 2,500x vs MongoDB. Legacy Object removed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;JSONExtract interop. Planner intelligence. Skip indexes on paths.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96711" rel="noopener noreferrer"&gt;#96711&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/94105" rel="noopener noreferrer"&gt;#94105&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/98886" rel="noopener noreferrer"&gt;#98886&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96927" rel="noopener noreferrer"&gt;#96927&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Full function compatibility. Subcolumn pushdown through CTEs. Bloom indexes on JSON paths.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;When Should You Use the Native JSON Type in ClickHouse?&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;th&gt;Reasoning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Log and event analytics with semi-structured attributes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Native type preserves types, subcolumn reads minimize I/O, primary key support enables data pruning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenTelemetry / observability data&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Purpose-built for this. ClickStack validates 9x faster queries vs Map approach&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON documents with known high-value fields&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Use type hints for critical paths, SKIP rules for noisy paths, ORDER BY on key fields&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema-on-read analytics over heterogeneous JSON&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Dynamic type handles mixed schemas. &lt;code&gt;distinctJSONPaths()&lt;/code&gt; provides instant schema discovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Migrating from MongoDB/Elasticsearch for analytics&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;2,500x faster aggregations (MongoDB), 10x faster (Elasticsearch). Clear migration via ALTER&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON with 10,000+ unique paths per document&lt;/td&gt;
&lt;td&gt;Depends&lt;/td&gt;
&lt;td&gt;Set appropriate &lt;code&gt;max_dynamic_paths&lt;/code&gt;. Use SKIP REGEXP for noisy paths. Advanced shared data helps but requires tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write-heavy JSON ingestion with rare reads&lt;/td&gt;
&lt;td&gt;Depends&lt;/td&gt;
&lt;td&gt;Simpler serialization modes (&lt;code&gt;map&lt;/code&gt;) may be more appropriate than &lt;code&gt;advanced&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Existing String/Map JSON columns&lt;/td&gt;
&lt;td&gt;Yes, migrate&lt;/td&gt;
&lt;td&gt;ALTER TABLE converts in background. No downtime. Immediate query performance improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;How to Respond When Someone Says "ClickHouse Doesn't Support JSON"&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Respond with the PR and &lt;a href="https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql" rel="noopener noreferrer"&gt;benchmark&lt;/a&gt; numbers.&lt;/p&gt;

&lt;p&gt;When someone tells you ClickHouse can't handle JSON in 2026, ask them if they've tested against a version that includes the native JSON type (GA since v25.3, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77785" rel="noopener noreferrer"&gt;PR #77785&lt;/a&gt;), primary key support for JSON subcolumns (v24.12, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72644" rel="noopener noreferrer"&gt;PR #72644&lt;/a&gt;), advanced shared data serialization (v25.8, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83777" rel="noopener noreferrer"&gt;PR #83777&lt;/a&gt;), or JSONExtract interop with native JSON columns (v26.2, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96711" rel="noopener noreferrer"&gt;PR #96711&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;If they're referencing &lt;code&gt;Object('json')&lt;/code&gt;, that type was removed in v25.11. If they're recommending JSONExtract on String columns as the primary approach, the native JSON type has made that unnecessary since v24.8. If they're telling you to flatten JSON into columns manually, the type does this automatically with configurable limits and type hints.&lt;/p&gt;

&lt;p&gt;The commit history doesn't lie. 80+ pull requests. Three foundational types. Three generations of storage serialization. Primary key indexing. Planner-level subcolumn optimization. 2,500x faster than MongoDB on real-world data.&lt;/p&gt;

&lt;p&gt;ClickHouse's JSON implementation in 2026 bears no resemblance to the string-based functions and experimental Object type that earned those early warnings. The engineers built a ground-up columnar JSON storage system, and the evidence is in the PRs.&lt;/p&gt;

&lt;p&gt;Test it on your workload. That's the only benchmark that matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JSON FAQ&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Does ClickHouse support JSON natively in 2026?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. ClickHouse's native JSON type (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/66444" rel="noopener noreferrer"&gt;PR #66444&lt;/a&gt;) stores each JSON path as a separate Dynamic-typed subcolumn in columnar format. It reached GA in v25.3 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/77785" rel="noopener noreferrer"&gt;PR #77785&lt;/a&gt;) with all experimental flags removed. The legacy Object('json') type was fully removed in v25.11 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85718" rel="noopener noreferrer"&gt;PR #85718&lt;/a&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is the most impactful ClickHouse JSON optimization?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Advanced shared data serialization (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83777" rel="noopener noreferrer"&gt;PR #83777&lt;/a&gt;), which delivers 58x faster reads and 3,300x less memory for selective path access. For insert workloads, adaptive write buffers (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/69272" rel="noopener noreferrer"&gt;PR #69272&lt;/a&gt;) with 20x memory reduction are equally important.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;ClickHouse vs MongoDB vs Elasticsearch for JSON: Which Is Faster?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;On the JSONBench benchmark (1 billion Bluesky documents, single node), ClickHouse with the native JSON type is 2,500x faster than MongoDB for aggregations, 10x faster than Elasticsearch, and 9,000x faster than DuckDB/PostgreSQL for analytics. Storage is 20% more compact than compressed files and 40% more efficient than MongoDB.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Should I migrate from JSONExtract on String columns to the native JSON type?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. &lt;code&gt;ALTER TABLE ... MODIFY COLUMN col JSON&lt;/code&gt; converts String columns to native JSON during background merges with no downtime (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70442" rel="noopener noreferrer"&gt;PR #70442&lt;/a&gt;). After migration, queries read only the paths they need instead of scanning the full string. JSONExtract functions continue to work on native JSON columns (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96711" rel="noopener noreferrer"&gt;PR #96711&lt;/a&gt;) and get rewritten to direct subcolumn reads by the planner.&lt;/p&gt;
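&lt;p&gt;The migration itself is a one-line statement (table and column names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Convert an existing String column holding JSON text to the native type;
-- existing parts are rewritten during background merges, new inserts use JSON immediately
ALTER TABLE logs MODIFY COLUMN attributes JSON;
&lt;/code&gt;&lt;/pre&gt;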

&lt;h3&gt;
  
  
  &lt;strong&gt;What happened to ClickHouse Object('json') and how to migrate?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Object('json') (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/23932" rel="noopener noreferrer"&gt;PR #23932&lt;/a&gt;) was ClickHouse's first attempt at native JSON storage, shipped in 2022. It suffered from type unification issues, metadata explosion, and race conditions. Rather than patching it, ClickHouse built an entirely new implementation from first principles using Variant, Dynamic, and JSON types. Object('json') was fully removed in v25.11 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85718" rel="noopener noreferrer"&gt;PR #85718&lt;/a&gt;). Tables using it must be migrated via &lt;code&gt;ALTER TABLE ... MODIFY COLUMN ... JSON&lt;/code&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71784" rel="noopener noreferrer"&gt;PR #71784&lt;/a&gt;) before upgrading.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Can I use JSON fields in ClickHouse primary keys and indexes?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes, since v24.12. &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72644" rel="noopener noreferrer"&gt;PR #72644&lt;/a&gt; enables JSON subcolumns in &lt;code&gt;ORDER BY&lt;/code&gt; and data-skipping index expressions. &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/98886" rel="noopener noreferrer"&gt;PR #98886&lt;/a&gt; adds bloom and text skip indexes on &lt;code&gt;JSONAllPaths()&lt;/code&gt; for efficient key-presence filtering.&lt;/p&gt;
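&lt;p&gt;A minimal sketch of a sorting key on a JSON subcolumn, with hypothetical names; giving the path a type hint pins its type for the key:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- JSON subcolumns in the sorting key (v24.12+)
CREATE TABLE events
(
    ts   DateTime,
    json JSON(user.id UInt64)
)
ENGINE = MergeTree
ORDER BY (json.user.id, ts);
&lt;/code&gt;&lt;/pre&gt;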

&lt;h3&gt;
  
  
  &lt;strong&gt;How does ClickHouse handle high-cardinality JSON with thousands of paths?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;max_dynamic_paths&lt;/code&gt; parameter (default 1024) controls how many paths get their own columnar subcolumn. Paths beyond this limit overflow into shared data storage. The advanced serialization mode (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83777" rel="noopener noreferrer"&gt;PR #83777&lt;/a&gt;) makes shared data reads efficient with per-granule path indexes. Use &lt;code&gt;SKIP&lt;/code&gt; and &lt;code&gt;SKIP REGEXP&lt;/code&gt; to exclude noisy paths from columnar storage.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Is ClickHouse JSON good for logs and observability?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. ClickHouse's own observability stack (ClickStack) uses the native JSON type for OpenTelemetry log attributes, demonstrating 9× faster queries compared to the previous Map-based approach. SigNoz independently validated 30% faster log queries. The type preserves native numeric types, eliminating cast overhead for aggregations on log attributes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Analysis based on 80+ GitHub pull requests, official ClickHouse changelogs, release blogs, and third-party benchmarks covering the period 2019–2026. Every claim maps to a specific merged PR. Verify the evidence yourself; the commit history is public.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>data</category>
      <category>dataengineering</category>
      <category>clickhouse</category>
      <category>json</category>
    </item>
    <item>
      <title>Are ClickHouse JOINs Slow? A 2026 PR-by-PR Analysis</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Wed, 15 Apr 2026 15:14:34 +0000</pubDate>
      <link>https://forem.com/manveer_chawla_64a7283d5a/are-clickhouse-joins-slow-a-2026-pr-by-pr-analysis-21e8</link>
      <guid>https://forem.com/manveer_chawla_64a7283d5a/are-clickhouse-joins-slow-a-2026-pr-by-pr-analysis-21e8</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Are ClickHouse JOINs slow? Not since 2022. Over 50 merged PRs between 2022 and 2026 rebuilt the join engine from the ground up. The evidence is in the commit history.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We analyzed 50+ GitHub pull requests, official ClickHouse changelogs, and release blogs to trace the full evolution of JOIN support from 2022 through early 2026.
&lt;/li&gt;
&lt;li&gt;In 2021, the criticism was fair. ClickHouse had one join algorithm (hash join), no disk spilling, no cost-based optimization, and join order followed query syntax. If your right table exceeded memory, the query crashed.
&lt;/li&gt;
&lt;li&gt;By early 2026, ClickHouse ships six distinct join algorithms, cost-based global join reordering with dynamic programming, runtime bloom filters at the storage layer, parallel hash join as the default, correlated subquery decorrelation, and automatic build-side selection. None of this requires manual tuning.
&lt;/li&gt;
&lt;li&gt;The single highest-impact change is equivalence-set filter pushdown (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/61216" rel="noopener noreferrer"&gt;PR #61216&lt;/a&gt;), which delivered speedups of 180× or more by propagating predicates across join sides through column equivalence classes. The planners in PostgreSQL and Oracle use the same technique; ClickHouse implements it natively in its columnar vectorized engine.
&lt;/li&gt;
&lt;li&gt;Grace hash join (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/38191" rel="noopener noreferrer"&gt;PR #38191&lt;/a&gt;) eliminated OOM crashes for memory-bound joins. Parallel hash join (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70788" rel="noopener noreferrer"&gt;PR #70788&lt;/a&gt;) became the default and scales near-linearly across CPU cores. Neither requires configuration.
&lt;/li&gt;
&lt;li&gt;Global join reordering with column statistics (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/86822" rel="noopener noreferrer"&gt;PR #86822&lt;/a&gt;) produces 1,450× speedups on TPC-H SF100 by automatically finding the optimal join order. The DPsize dynamic programming algorithm (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/91002" rel="noopener noreferrer"&gt;PR #91002&lt;/a&gt;) further improves this for complex multi-table queries.
&lt;/li&gt;
&lt;li&gt;Runtime bloom filters, enabled by default since February 2026 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/89314" rel="noopener noreferrer"&gt;PR #89314&lt;/a&gt;), dynamically prune probe-side data at the storage scan level. The v25.10 release blog reports a 2.1× speedup and 7× memory reduction on star-schema workloads.
&lt;/li&gt;
&lt;li&gt;Verdict: the "avoid JOINs in ClickHouse" advice made sense in 2020. Repeating it in 2026 is misinformation. ClickHouse's join engine now operates with the planning sophistication of a mature enterprise RDBMS, and it does so inside the columnar vectorized execution model that makes ClickHouse fast in the first place.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Why People Still Say "Avoid JOINs in ClickHouse"&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If you've evaluated ClickHouse in the last few years, you've heard the warnings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"Avoid JOINs in ClickHouse"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"ClickHouse doesn't handle JOINs well"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Denormalize everything, always use flat tables"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"JOINs are slow in ClickHouse"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Only hash join available, limited join algorithms"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of these warnings started as legitimate guidance: ClickHouse's own documentation circa 2019–2020 advised caution with joins. Others were amplified by competitors who found a convenient story: ClickHouse is fast for scans, but it can't join.&lt;/p&gt;

&lt;p&gt;Through 2021, the criticism was mostly fair. ClickHouse had a single hash join algorithm, no disk spilling, no cost-based optimizer, and join order followed query syntax. If your right table exceeded memory, the query crashed with OOM.&lt;/p&gt;

&lt;p&gt;Then ClickHouse's engineering team spent four years dismantling every one of those limitations across more than 50 significant merged pull requests. They added six join algorithms, a cost-based optimizer with dynamic programming, runtime bloom filters, and automatic handling of algorithm selection, build-side selection, join reordering, and predicate pushdown.&lt;/p&gt;

&lt;p&gt;This article traces that evolution with PR-level evidence. No marketing claims. No benchmarks on toy datasets. Just the commit history.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Methodology: How We Analyzed ClickHouse's Join Commit History&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We went through ClickHouse's GitHub commit history, pull requests, changelogs, and release blogs from 2022 through early 2026. The scope covered every PR that touched the join subsystem: algorithm changes, optimizer rewrites, planner passes, correctness fixes, and default configuration changes.&lt;/p&gt;

&lt;p&gt;Each PR was classified by category (algorithm, optimizer, parallelism, correctness), impact severity, and whether it changed default behavior. We cross-referenced PR descriptions against changelog entries and release blog benchmarks to verify the claimed improvements. Where multiple PRs addressed the same subsystem, we traced the dependency chain to understand how the incremental changes compounded.&lt;/p&gt;

&lt;p&gt;The result is a ranked list of 50 pull requests by impact, organized into eight thematic arcs, with full provenance. Every claim in this article maps to a specific merged PR that you can verify yourself on GitHub.&lt;/p&gt;

&lt;p&gt;This isn't a benchmarking exercise. Benchmarks measure peak performance on controlled workloads. This analysis measures the engineering trajectory: what was built, why, and what it means for teams deciding whether to use JOINs in ClickHouse today.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JOIN Features in 2026: What Ships by Default&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The current state, as of early 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Six distinct join algorithms:&lt;/strong&gt; hash, parallel hash, grace hash (disk-spilling), full sorting merge, direct (key-value), and paste join. Each one is optimized for a different workload shape, and the engine selects automatically.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-based global join reordering:&lt;/strong&gt; Greedy and dynamic programming algorithms find the optimal join order using column statistics. No manual query rewriting needed.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime bloom filters:&lt;/strong&gt; Build-side join keys compile into bloom filters that get pushed down to probe-side storage scans, filtering non-matching rows before they reach the join.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Equivalence-set predicate pushdown:&lt;/strong&gt; Filters propagate transitively across multi-table join chains. &lt;code&gt;WHERE t1.id = 5&lt;/code&gt; pushes to t2, t3, and beyond when joined on equivalent keys.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel execution by default:&lt;/strong&gt; Parallel hash join scales near-linearly with cores. No configuration required.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correlated subquery support:&lt;/strong&gt; EXISTS, scalar subqueries, and projection-list subqueries are automatically decorrelated into joins.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't experimental features hidden behind flags. They're defaults that ship with every ClickHouse installation.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JOIN Myths vs. Reality: A 2026 Checklist&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;The FUD&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Evidence Volume&lt;/th&gt;
&lt;th&gt;Reality (2026)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;"Only hash join available"&lt;/td&gt;
&lt;td&gt;🟢 False since 2022&lt;/td&gt;
&lt;td&gt;6 algorithms, 10+ PRs&lt;/td&gt;
&lt;td&gt;Six algorithms: hash, parallel hash, grace hash, full sorting merge, direct, paste. Auto-selected.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;"JOINs cause OOM crashes"&lt;/td&gt;
&lt;td&gt;🟢 Solved since late 2022&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/38191" rel="noopener noreferrer"&gt;PR #38191&lt;/a&gt; + follow-ups&lt;/td&gt;
&lt;td&gt;Grace hash join spills to disk. Full sorting merge uses bounded memory. OOM joins are a solved problem.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;"JOINs are slow in ClickHouse"&lt;/td&gt;
&lt;td&gt;🟢 Outdated&lt;/td&gt;
&lt;td&gt;50+ optimization PRs&lt;/td&gt;
&lt;td&gt;180× from predicate pushdown, 1,450× from join reordering, 2.1× from runtime filters, all automatic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;"No query optimizer for JOINs"&lt;/td&gt;
&lt;td&gt;🟢 False since 2024&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/86822" rel="noopener noreferrer"&gt;PR #86822&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/91002" rel="noopener noreferrer"&gt;#91002&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71577" rel="noopener noreferrer"&gt;#71577&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Cost-based global join reordering with DPsize dynamic programming. Statistics-driven. Automatic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;"Always denormalize everything"&lt;/td&gt;
&lt;td&gt;🟡 Nuanced&lt;/td&gt;
&lt;td&gt;Architecture-dependent&lt;/td&gt;
&lt;td&gt;Denormalization still has value for extreme query latency targets, but normalized star/snowflake schemas now perform well with automatic optimizations.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;"JOINs don't scale across cores"&lt;/td&gt;
&lt;td&gt;🟢 False since late 2024&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70788" rel="noopener noreferrer"&gt;PR #70788&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/92068" rel="noopener noreferrer"&gt;#92068&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Parallel hash join is the default. Near-linear scaling. Outer join completion parallelized in 2026.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;"No predicate pushdown across JOINs"&lt;/td&gt;
&lt;td&gt;🟢 False since April 2024&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/61216" rel="noopener noreferrer"&gt;PR #61216&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96596" rel="noopener noreferrer"&gt;#96596&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Equivalence-set pushdown across single and multi-join chains. Disjunctions supported.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;"Can't handle star/snowflake schemas"&lt;/td&gt;
&lt;td&gt;🟢 Outdated&lt;/td&gt;
&lt;td&gt;Runtime filters + reordering&lt;/td&gt;
&lt;td&gt;Runtime bloom filters and cost-based reordering specifically target star/snowflake schemas.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;"No correlated subquery support"&lt;/td&gt;
&lt;td&gt;🟢 False since mid-2025&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/76078" rel="noopener noreferrer"&gt;PR #76078&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85107" rel="noopener noreferrer"&gt;#85107&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Correlated subqueries decorrelated into joins. Beta since August 2025, enabled by default.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;"Have to manually tune join order"&lt;/td&gt;
&lt;td&gt;🟢 False since mid-2025&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/86822" rel="noopener noreferrer"&gt;PR #86822&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/89332" rel="noopener noreferrer"&gt;#89332&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/93912" rel="noopener noreferrer"&gt;#93912&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Automatic join reordering using column statistics and runtime hash table size feedback.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 1 (2022): How Many Join Algorithms Does ClickHouse Support?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"ClickHouse only has hash join"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In mid-2022, ClickHouse had exactly one production join algorithm: hash join. The criticism was valid. Then, over the course of 2022, four new algorithms landed.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Full Sorting Merge Join: Memory-Bounded Joins (July 2022)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/35796" rel="noopener noreferrer"&gt;PR #35796&lt;/a&gt; introduced full sorting merge join, a classical sort-merge algorithm integrated into ClickHouse's pipeline. Both sides sort by join keys (with external sorting if needed), then merge in streaming fashion. Memory is bounded by the sort buffer, not by hash table size.&lt;/p&gt;

&lt;p&gt;This mattered for two reasons. First, it was the first non-memory-bound join algorithm in ClickHouse, so you could join tables larger than RAM without crashing. Second, it skips sorting entirely when physical row order already matches join keys, which makes it faster than hash join for pre-sorted data.&lt;/p&gt;

&lt;p&gt;A follow-up optimization (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/39418" rel="noopener noreferrer"&gt;PR #39418&lt;/a&gt;) builds an in-memory key set from the smaller table to pre-filter the larger table before sorting. That made full sorting merge competitive with hash join on general workloads, not just pre-sorted ones.&lt;/p&gt;
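
&lt;p&gt;Automatic selection usually suffices, but the algorithm can also be forced per session; a sketch with illustrative table names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Memory is bounded by the sort buffer, not by a hash table.
SET join_algorithm = 'full_sorting_merge';

SELECT e.user_id, u.name
FROM events AS e
INNER JOIN users AS u ON e.user_id = u.user_id;
&lt;/code&gt;&lt;/pre&gt;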

&lt;h3&gt;
  
  
  &lt;strong&gt;Grace Hash Join: Disk-Spilling for Out-of-Memory JOINs (November 2022)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/38191" rel="noopener noreferrer"&gt;PR #38191&lt;/a&gt; was arguably the most important foundational change of this era. Grace hash join partitions both inputs into buckets via a secondary hash. Only one bucket pair is processed at a time, and inactive buckets spill to disk.&lt;/p&gt;

&lt;p&gt;Before this PR, a join where the right table exceeded available memory crashed with OOM. After it, the join completed. It just took longer.&lt;/p&gt;

&lt;p&gt;Grace hash initially supported only INNER and LEFT joins. FULL and RIGHT support arrived in July 2023 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/51013" rel="noopener noreferrer"&gt;PR #51013&lt;/a&gt;), and a cache locality optimization (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/72237" rel="noopener noreferrer"&gt;PR #72237&lt;/a&gt;) delivered a ~24% speedup in late 2024. It graduated to GA in v24.3, which closed &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/11596" rel="noopener noreferrer"&gt;issue #11596&lt;/a&gt;, the most upvoted join-related issue in ClickHouse's history, open since June 2020.&lt;/p&gt;
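
&lt;p&gt;A minimal sketch of opting into grace hash explicitly; table names are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Spill-to-disk join: only one bucket pair resides in memory at a time.
SET join_algorithm = 'grace_hash';
SET grace_hash_join_initial_buckets = 8;  -- more buckets, less memory per pass

SELECT e.user_id, count()
FROM events AS e
INNER JOIN big_dimension AS d ON e.user_id = d.user_id
GROUP BY e.user_id;
&lt;/code&gt;&lt;/pre&gt;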

&lt;h3&gt;
  
  
  &lt;strong&gt;Direct Join: O(1) Memory Key-Value Lookups (2022)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/35363" rel="noopener noreferrer"&gt;PR #35363&lt;/a&gt; introduced direct join for EmbeddedRocksDB tables, and &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/38956" rel="noopener noreferrer"&gt;PR #38956&lt;/a&gt; extended it to dictionaries with SEMI/ANTI support.&lt;/p&gt;

&lt;p&gt;Direct join bypasses hash table construction entirely. It performs O(1) key-value lookups against the storage engine for each left-side row, and memory usage stays constant regardless of right table size.&lt;/p&gt;
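
&lt;p&gt;A sketch of a dictionary-backed direct join; the dictionary definition and names are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;CREATE DICTIONARY user_dict
(
    user_id UInt64,
    name String
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(TABLE 'users'))
LAYOUT(FLAT())
LIFETIME(300);

SET join_algorithm = 'direct';

-- Each left-side row becomes an O(1) key-value lookup; no hash table is built.
SELECT e.user_id, d.name
FROM events AS e
JOIN user_dict AS d ON e.user_id = d.user_id;
&lt;/code&gt;&lt;/pre&gt;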

&lt;h3&gt;
  
  
  &lt;strong&gt;ConcurrentHashJoin: The Foundation for Parallel Hash Join (May 2022)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/36415" rel="noopener noreferrer"&gt;PR #36415&lt;/a&gt; laid the groundwork for what would become ClickHouse's most impactful default change. ConcurrentHashJoin creates multiple HashJoin instances, one per thread, and partitions both build and probe sides for concurrent execution. This was the foundation for parallel hash join, which became the default two and a half years later.&lt;/p&gt;

&lt;p&gt;By the end of 2022, ClickHouse had five distinct join algorithms where it previously had one. The "only hash join available" criticism had a documented expiration date.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 2 (2023–2024): Does ClickHouse Have a Query Optimizer for JOINs?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"ClickHouse has no query optimizer for JOINs"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The promotion of ClickHouse's new query Analyzer to production status in v24.9 was the catalyst. The Analyzer provides richer semantic information about column relationships than the old parser-based planner did, which enabled a class of optimizations that were previously impossible.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Equivalence-Set Filter Pushdown: 180× Speedup (April 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/61216" rel="noopener noreferrer"&gt;PR #61216&lt;/a&gt; is the single highest-impact join optimization in this entire four-year period. It introduced equivalence-class-based predicate pushdown across join sides.&lt;/p&gt;

&lt;p&gt;The logic is straightforward. When tables are joined on &lt;code&gt;t1.id = t2.id&lt;/code&gt;, a filter &lt;code&gt;WHERE t1.id = 5&lt;/code&gt; is equivalent to &lt;code&gt;t2.id = 5&lt;/code&gt;. The optimizer recognizes this equivalence and pushes the filter to both sides of the join before execution.&lt;/p&gt;

&lt;p&gt;Before this PR, filters were applied only after the join completed, which forced full table scans of both sides. After it, filters propagate to both sides and prune data before it reaches the join. Benchmarks show improvements of 180× or more on affected queries.&lt;/p&gt;

&lt;p&gt;This was later extended to work across chains of multiple INNER JOINs (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96596" rel="noopener noreferrer"&gt;PR #96596&lt;/a&gt;) using a Disjoint Set Union data structure to track transitive equalities. For a query joining t1, t2, and t3 on equivalent keys, a filter &lt;code&gt;WHERE t1.id = 42&lt;/code&gt; now pushes to all three tables.&lt;/p&gt;
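
&lt;p&gt;The mechanics can be seen in a minimal query; table names are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- The ON clause puts t1.id and t2.id in one equivalence class, so
-- WHERE t1.id = 5 also implies t2.id = 5, and both scans are pruned.
SELECT *
FROM t1
INNER JOIN t2 ON t1.id = t2.id
WHERE t1.id = 5;

-- The plan shows the filter applied below the join on both inputs.
EXPLAIN actions = 1
SELECT * FROM t1 INNER JOIN t2 ON t1.id = t2.id WHERE t1.id = 5;
&lt;/code&gt;&lt;/pre&gt;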

&lt;h3&gt;
  
  
  &lt;strong&gt;Automatic OUTER JOIN to INNER JOIN Conversion (April 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/62907" rel="noopener noreferrer"&gt;PR #62907&lt;/a&gt; automatically converts OUTER JOINs to INNER JOINs when post-join filter conditions make the outer semantics unnecessary. A &lt;code&gt;LEFT JOIN ... WHERE right_col IS NOT NULL&lt;/code&gt; is functionally an INNER JOIN, and the optimizer now recognizes this.&lt;/p&gt;

&lt;p&gt;This matters beyond the immediate execution improvement (benchmarks show 32s to 0.006s in some cases) because it enables cascading optimizations. INNER JOINs allow predicate pushdown and join reordering that are structurally impossible for OUTER JOINs. Converting the join type first unlocks the full optimization pipeline downstream.&lt;/p&gt;
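
&lt;p&gt;A sketch of a query shape that triggers the conversion; table names are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- The IS NOT NULL filter discards exactly the rows the LEFT JOIN
-- preserves, so the optimizer rewrites this as an INNER JOIN.
SELECT o.id, c.name
FROM orders AS o
LEFT JOIN customers AS c ON o.customer_id = c.id
WHERE c.name IS NOT NULL;
&lt;/code&gt;&lt;/pre&gt;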

&lt;h3&gt;
  
  
  &lt;strong&gt;Right-Side Pushdown, OR Conditions, and Common Expression Extraction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The planner intelligence kept accumulating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/50532" rel="noopener noreferrer"&gt;PR #50532&lt;/a&gt; extended predicate pushdown to the right side of joins, delivering 27× improvement on applicable queries.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/84735" rel="noopener noreferrer"&gt;PR #84735&lt;/a&gt; enabled pushdown of OR conditions through joins. Previously only AND conditions could be pushed.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71537" rel="noopener noreferrer"&gt;PR #71537&lt;/a&gt; extracted common expressions from WHERE/ON clauses, which reduced redundant hash table instantiation for BI-generated queries with complex OR conditions.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/78877" rel="noopener noreferrer"&gt;PR #78877&lt;/a&gt; moved equality predicates from WHERE into JOIN ON conditions, enabling more efficient hash table lookups.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these operates automatically. No query hints, and no manual rewriting. The planner just does the right thing.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 3 (Late 2024): Do ClickHouse JOINs Scale Across CPU Cores?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"JOINs don't scale across cores in ClickHouse"&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Parallel Hash Join Becomes the Default Algorithm (November 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70788" rel="noopener noreferrer"&gt;PR #70788&lt;/a&gt; changed the default &lt;code&gt;join_algorithm&lt;/code&gt; from &lt;code&gt;'direct,hash'&lt;/code&gt; to &lt;code&gt;'direct,parallel_hash,hash'&lt;/code&gt;. Every ClickHouse installation now uses parallel hash join by default.&lt;/p&gt;

&lt;p&gt;Parallel hash join builds its hash tables on multiple threads via hash-based sharding. The probe phase shards the same way for lock-free concurrent execution. No configuration is needed, and scaling is near-linear with CPU cores.&lt;/p&gt;

&lt;p&gt;This was the most broadly impactful default configuration change in this period. Every hash join query on every ClickHouse installation benefits without any user action.&lt;/p&gt;

&lt;p&gt;The path to default status was paved by years of incremental improvements. Hash table size statistics caching (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/64553" rel="noopener noreferrer"&gt;PR #64553&lt;/a&gt;) pre-allocates tables on repeat queries. Zero-copy block scattering (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/67782" rel="noopener noreferrer"&gt;PR #67782&lt;/a&gt;) eliminated redundant memory copies. An adaptive threshold (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/76185" rel="noopener noreferrer"&gt;PR #76185&lt;/a&gt;) falls back to single-threaded hash join for small tables where parallelism would add overhead. Two-level hash maps in v25.1 yielded another ~40% speedup.&lt;/p&gt;
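
&lt;p&gt;The default can be inspected or overridden per session; a small sketch:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Inspect the effective algorithm preference list.
SELECT value FROM system.settings WHERE name = 'join_algorithm';

-- Fall back to single-threaded hash join for comparison (not recommended).
SET join_algorithm = 'hash';
&lt;/code&gt;&lt;/pre&gt;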

&lt;h3&gt;
  
  
  &lt;strong&gt;Parallelizing OUTER JOIN Completion (February 2026)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/92068" rel="noopener noreferrer"&gt;PR #92068&lt;/a&gt; addressed the last remaining single-threaded bottleneck in parallel hash join. For FULL and RIGHT OUTER joins, the "non-joined rows" (rows from the build side with no match) were previously emitted by a single thread. That created an Amdahl's Law bottleneck that limited outer join scalability. The fix parallelizes non-joined row emission across all hash table buckets.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Phase 4 (2025–2026): ClickHouse Cost-Based Join Optimization&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"You have to manually optimize join order in ClickHouse"&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Automatic Build-Side Selection for Hash Joins (November 2024)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71577" rel="noopener noreferrer"&gt;PR #71577&lt;/a&gt; introduced &lt;code&gt;query_plan_join_swap_table = 'auto'&lt;/code&gt;. The optimizer estimates table sizes and places the smaller table on the build (right) side of hash joins. This was the first step toward automatic join reordering.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Statistics-Driven Global Join Reordering: 1,450× TPC-H Speedup (v25.9)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/86822" rel="noopener noreferrer"&gt;PR #86822&lt;/a&gt; introduced global join reordering using a greedy algorithm with column statistics. For queries joining three or more tables, the optimizer evaluates estimated cardinalities and selects the join order that minimizes intermediate result sizes.&lt;/p&gt;

&lt;p&gt;The numbers on TPC-H SF100: &lt;strong&gt;1,450× speedup and 25× memory reduction&lt;/strong&gt; compared to syntax-order execution. That kind of improvement turns "don't use JOINs" into "write whatever join order you want, the optimizer will figure it out."&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;DPsize Dynamic Programming Join Reordering (v25.12)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/91002" rel="noopener noreferrer"&gt;PR #91002&lt;/a&gt; added a dynamic programming algorithm (DPsize) for more exhaustive join order search. The greedy algorithm makes locally optimal choices, but DPsize evaluates subsets of joined relations systematically. It produces ~4.7% further improvement over greedy on TPC-H, with bigger gains on complex multi-table queries.&lt;/p&gt;

&lt;p&gt;The optimizer tries DPsize first and falls back to greedy if the complexity threshold is exceeded. That's how mature query planners work.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Automatic Statistics Collection for the Join Optimizer&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The optimizer is only as good as its statistics. Column statistics moved from manual (&lt;code&gt;ALTER TABLE ADD STATISTICS&lt;/code&gt;) to automatic. &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/89332" rel="noopener noreferrer"&gt;PR #89332&lt;/a&gt; enabled &lt;code&gt;allow_statistics_optimize&lt;/code&gt; by default in v25.10.&lt;/p&gt;

&lt;p&gt;Runtime hash table size statistics (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/93912" rel="noopener noreferrer"&gt;PR #93912&lt;/a&gt;) close the feedback loop between execution and planning. Actual observed sizes from previous queries inform future optimization decisions.&lt;/p&gt;
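
&lt;p&gt;Statistics can still be declared and materialized explicitly; a sketch with illustrative table, column, and statistics-type names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ALTER TABLE facts ADD STATISTICS customer_id TYPE uniq, minmax;
ALTER TABLE facts MATERIALIZE STATISTICS customer_id;

-- Enabled by default since v25.10 (PR #89332).
SET allow_statistics_optimize = 1;
&lt;/code&gt;&lt;/pre&gt;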




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse Runtime Bloom Filters and Star Schema JOIN Performance&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"ClickHouse can't handle star/snowflake schemas"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/89314" rel="noopener noreferrer"&gt;PR #89314&lt;/a&gt;, merged February 2026, enabled runtime bloom filters by default. During hash table construction, ClickHouse builds a bloom filter from the build-side join keys and pushes it down to the probe-side scan pipeline. Rows that don't match the bloom filter are discarded at the storage scan level, before they ever reach the join.&lt;/p&gt;

&lt;p&gt;For star-schema workloads where fact tables are orders of magnitude larger than dimension tables, this is transformative. The v25.10 release blog reports a &lt;strong&gt;2.1× overall query speedup and 7× memory reduction&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The implementation was hardened through 10+ follow-up correctness fixes addressing edge cases: Nullable keys (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/94555" rel="noopener noreferrer"&gt;PR #94555&lt;/a&gt;), multi-key ANTI joins (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/98871" rel="noopener noreferrer"&gt;PR #98871&lt;/a&gt;), const columns, Merge tables, and more. An adaptive mechanism (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/91578" rel="noopener noreferrer"&gt;PR #91578&lt;/a&gt;) dynamically disables bloom filters at runtime when they become saturated or aren't filtering enough rows, which prevents negative ROI on non-selective joins. Coverage was extended to RIGHT OUTER joins (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96183" rel="noopener noreferrer"&gt;PR #96183&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Runtime filters can also be pushed into PREWHERE (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/95838" rel="noopener noreferrer"&gt;PR #95838&lt;/a&gt;), ClickHouse's storage-layer pre-filtering mechanism, for maximum efficiency.&lt;/p&gt;
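
&lt;p&gt;The workload shape where runtime filters pay off, as a sketch with illustrative names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;-- Build side (dim) is small; its join keys become a bloom filter
-- applied while scanning the much larger fact table.
SELECT d.region, sum(f.amount)
FROM fact AS f
INNER JOIN dim AS d ON f.dim_id = d.id
WHERE d.region = 'EMEA'
GROUP BY d.region;
-- Fact rows whose dim_id misses the filter are dropped at scan time.
&lt;/code&gt;&lt;/pre&gt;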




&lt;h2&gt;
  
  
  &lt;strong&gt;Does ClickHouse Support Correlated Subqueries? (2025 Decorrelation)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The FUD:&lt;/strong&gt; &lt;em&gt;"ClickHouse can't do correlated subqueries"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This one was true until April 2025. &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/76078" rel="noopener noreferrer"&gt;PR #76078&lt;/a&gt; introduced the first correlated subquery decorrelation support, converting EXISTS with correlated references into joins. Scalar subquery support followed (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/79600" rel="noopener noreferrer"&gt;PR #79600&lt;/a&gt;), then projection-list subqueries (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/79925" rel="noopener noreferrer"&gt;PR #79925&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85107" rel="noopener noreferrer"&gt;PR #85107&lt;/a&gt; promoted correlated subqueries to beta with default enablement in August 2025. That closed &lt;a href="https://github.com/ClickHouse/ClickHouse/issues/6697" rel="noopener noreferrer"&gt;issue #6697&lt;/a&gt;, one of the longest-standing SQL compatibility gaps in ClickHouse, open since 2019.&lt;/p&gt;

&lt;p&gt;Teams migrating from PostgreSQL, MySQL, or Snowflake no longer need to manually rewrite correlated subqueries into explicit joins. The planner does it automatically.&lt;/p&gt;
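
&lt;p&gt;A sketch of a correlated EXISTS that previously required a manual rewrite; table names are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;SELECT c.id, c.name
FROM customers AS c
WHERE EXISTS (
    SELECT 1
    FROM orders AS o
    WHERE o.customer_id = c.id   -- correlated reference to the outer query
      AND o.amount &gt; 100
);
-- The planner decorrelates this into a semi join automatically.
&lt;/code&gt;&lt;/pre&gt;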




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse Hash Join Internals: Low-Level Optimizations&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Beyond the headline features, ClickHouse's most-used join algorithm received systematic low-level optimization that compounds across every query:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Main loop specialization&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/82308" rel="noopener noreferrer"&gt;PR #82308&lt;/a&gt;): Compile-time elimination of null_map and join_mask checks for single-key joins. No more unnecessary branches on every row.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JoinUsedFlags vector optimization&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/83043" rel="noopener noreferrer"&gt;PR #83043&lt;/a&gt;): Replaced hash-based flag tracking with atomic vectors, removing per-access hash computation in FULL/RIGHT joins.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output size enforcement&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/56996" rel="noopener noreferrer"&gt;PR #56996&lt;/a&gt;): &lt;code&gt;max_joined_block_size_rows&lt;/code&gt; prevents catastrophic memory spikes from ALL JOIN row replication.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache locality improvements&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/60341" rel="noopener noreferrer"&gt;PR #60341&lt;/a&gt;): Right-table reranging by join keys for cache-friendly access patterns.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic dispatch&lt;/strong&gt; (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/79573" rel="noopener noreferrer"&gt;PR #79573&lt;/a&gt;): Optimized &lt;code&gt;ColumnVector::replicate&lt;/code&gt; in the hash join hot path, lowering CPU per output row.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these individually make a press release. Together, they compound into a materially faster join engine at every level of the stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JOIN Limitations and Trade-offs in 2026&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Fairness matters. A few things still require awareness:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Denormalization still has value for extreme latency targets.&lt;/strong&gt; If you need sub-10ms p99 on dashboard queries and you can afford the storage, flat tables remain faster than joins. The optimizer is good, but it isn't free.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Join reordering depends on statistics.&lt;/strong&gt; When statistics are missing or stale, the optimizer can pick suboptimal plans. The system increasingly collects statistics automatically, but monitoring is still your responsibility.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correlated subqueries are beta.&lt;/strong&gt; They work for common patterns like EXISTS and scalar subqueries, but edge cases exist. For complex correlated logic, explicit join rewrites may still be necessary.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grace hash join trades speed for completion.&lt;/strong&gt; Disk-spilling joins complete instead of crashing, but they're slower than in-memory execution. If you consistently need to spill, you need more memory or a different data model.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correctness fixes are ongoing.&lt;/strong&gt; The volume of bug fixes following runtime filter enablement (10+ PRs) shows how complex cross-cutting optimizations get when they're enabled by default. ClickHouse's engineering team has been rigorous about correctness, but running the latest stable release matters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are real engineering trade-offs, and understanding them is part of making an informed decision.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JOIN Improvements Timeline (2022–2026)&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;What Changed&lt;/th&gt;
&lt;th&gt;Key PRs&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2022&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Algorithm diversification: hash to 5 algorithms&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/35796" rel="noopener noreferrer"&gt;#35796&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/38191" rel="noopener noreferrer"&gt;#38191&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/35363" rel="noopener noreferrer"&gt;#35363&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/36415" rel="noopener noreferrer"&gt;#36415&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;OOM joins eliminated. Sort-merge and direct join added. Parallel hash foundation laid.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2023&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Grace hash FULL/RIGHT. PASTE JOIN. Output size limits.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/51013" rel="noopener noreferrer"&gt;#51013&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/57995" rel="noopener noreferrer"&gt;#57995&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/56996" rel="noopener noreferrer"&gt;#56996&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;Disk-spilling joins cover all join types. Safety valves for memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2024&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Planner intelligence. Parallel hash default. Build-side selection.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/61216" rel="noopener noreferrer"&gt;#61216&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/62907" rel="noopener noreferrer"&gt;#62907&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/70788" rel="noopener noreferrer"&gt;#70788&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71577" rel="noopener noreferrer"&gt;#71577&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/71537" rel="noopener noreferrer"&gt;#71537&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;180×+ from predicate pushdown. Multi-core joins default. OUTER to INNER conversion automatic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cost-based optimization. Correlated subqueries. Statistics infra.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/86822" rel="noopener noreferrer"&gt;#86822&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/91002" rel="noopener noreferrer"&gt;#91002&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/76078" rel="noopener noreferrer"&gt;#76078&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85107" rel="noopener noreferrer"&gt;#85107&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/89332" rel="noopener noreferrer"&gt;#89332&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;1,450× from join reordering. Correlated subqueries work. Statistics automatic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runtime bloom filters default. Correctness hardening.&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/89314" rel="noopener noreferrer"&gt;#89314&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/92068" rel="noopener noreferrer"&gt;#92068&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/96596" rel="noopener noreferrer"&gt;#96596&lt;/a&gt;, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/98871" rel="noopener noreferrer"&gt;#98871&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;2.1× from runtime filters. Full outer join parallelism. Multi-join pushdown.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;When Should You Use JOINs in ClickHouse?&lt;/strong&gt;
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;th&gt;Reasoning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Star/snowflake schema analytics&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Runtime bloom filters, cost-based reordering, and predicate pushdown are purpose-built for this&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-table reporting queries&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Global join reordering eliminates the need for manual optimization. Write readable SQL.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Joins exceeding available memory&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Grace hash join and full sorting merge handle this without OOM. Completion is guaranteed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time dashboards with dimension lookups&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;Direct join (O(1) memory) and parallel hash join handle typical enrichment patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time-series ASOF joins&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;ASOF JOIN with full_sorting_merge is 2× faster and uses half the memory of the hash-based version (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/55051" rel="noopener noreferrer"&gt;PR #55051&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sub-10ms p99 latency on complex joins&lt;/td&gt;
&lt;td&gt;🟡 Depends&lt;/td&gt;
&lt;td&gt;Flat tables still win for extreme latency. Joins add overhead even when well-optimized.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Correlated subqueries from legacy SQL&lt;/td&gt;
&lt;td&gt;🟡 Mostly&lt;/td&gt;
&lt;td&gt;Beta support covers common patterns. Edge cases may need manual rewriting.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10+ table joins with no statistics&lt;/td&gt;
&lt;td&gt;🟡 Conditional&lt;/td&gt;
&lt;td&gt;The optimizer needs statistics. Make sure &lt;code&gt;allow_statistics_optimize&lt;/code&gt; is enabled (default since v25.10).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;How to Respond to "Avoid JOINs in ClickHouse"&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Run the PR numbers.&lt;/p&gt;

&lt;p&gt;When someone tells you ClickHouse can't do joins in 2026, ask them if they've tested against a version that includes parallel hash join (default since v24.12), equivalence-set predicate pushdown (v24.4), grace hash join (GA since v24.3), runtime bloom filters (default since v25.10), or cost-based join reordering (v25.9).&lt;/p&gt;

&lt;p&gt;If they're benchmarking against ClickHouse 23.x or earlier, or repeating 2020-era blog posts, they aren't evaluating ClickHouse. They're evaluating a system that no longer exists.&lt;/p&gt;

&lt;p&gt;The commit history doesn't lie. 50+ pull requests. Six algorithms. Cost-based optimization. Runtime filtering. Automatic algorithm selection, build-side selection, join reordering, and predicate pushdown.&lt;/p&gt;

&lt;p&gt;ClickHouse's join subsystem in 2026 bears no resemblance to the one that earned those early warnings. The engineers built a modern join engine, and the evidence is in the PRs.&lt;/p&gt;

&lt;p&gt;Test it on your workload. That's the only benchmark that matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;ClickHouse JOINs FAQ&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Are JOINs production-ready in ClickHouse in 2026?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. ClickHouse's join subsystem has been transformed since 2022. Six join algorithms, cost-based global join reordering, runtime bloom filters, and parallel execution all ship enabled by default. The "avoid JOINs" advice is outdated by four years and 50+ merged PRs.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What is the most impactful ClickHouse join optimization?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Equivalence-set filter pushdown (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/61216" rel="noopener noreferrer"&gt;PR #61216&lt;/a&gt;), which delivers 180×+ speedups by propagating predicates across join sides. For multi-table workloads, global join reordering (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/86822" rel="noopener noreferrer"&gt;PR #86822&lt;/a&gt;) with 1,450× speedup on TPC-H SF100 is equally transformative.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Does ClickHouse still crash with OOM on large joins?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;No. Grace hash join (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/38191" rel="noopener noreferrer"&gt;PR #38191&lt;/a&gt;) introduced disk-spilling in November 2022 and graduated to GA in v24.3. Full sorting merge join (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/35796" rel="noopener noreferrer"&gt;PR #35796&lt;/a&gt;) provides a memory-bounded sort-merge alternative. Both algorithms guarantee completion regardless of data size.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How many join algorithms does ClickHouse support?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Six: hash join, parallel hash join (default), grace hash join (disk-spilling), full sorting merge join, direct join (O(1) memory key-value), and paste join (positional). The engine selects the appropriate one automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Does ClickHouse have cost-based join optimization?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes, since v25.9. The optimizer uses column statistics for greedy join reordering and a DPsize dynamic programming algorithm (v25.12) for exhaustive search. Statistics collection is automatic since v25.10 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/89332" rel="noopener noreferrer"&gt;PR #89332&lt;/a&gt;). Runtime hash table statistics (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/93912" rel="noopener noreferrer"&gt;PR #93912&lt;/a&gt;) feed execution data back into the planner.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Should I denormalize tables in ClickHouse instead of using JOINs?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It depends on your latency requirements. For sub-10ms p99 dashboard queries, flat tables remain the fastest path. For analytical workloads where query readability, data freshness, and storage efficiency matter, normalized star/snowflake schemas with JOINs are now well-optimized. The "always denormalize" advice is no longer a blanket recommendation.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Can ClickHouse handle star and snowflake schemas?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes. Runtime bloom filters (default since February 2026, &lt;a href="https://github.com/ClickHouse/ClickHouse/pull/89314" rel="noopener noreferrer"&gt;PR #89314&lt;/a&gt;) specifically target star-schema patterns by filtering fact table rows at the storage layer using dimension table keys. Combined with cost-based join reordering and predicate pushdown, star/snowflake schemas are a first-class workload.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Does ClickHouse support correlated subqueries?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Yes, since mid-2025. Correlated subquery decorrelation (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/76078" rel="noopener noreferrer"&gt;PR #76078&lt;/a&gt;) automatically converts EXISTS, scalar, and projection-list subqueries into joins. Promoted to beta with default enablement in August 2025 (&lt;a href="https://github.com/ClickHouse/ClickHouse/pull/85107" rel="noopener noreferrer"&gt;PR #85107&lt;/a&gt;). Closes a feature gap that had been open since 2019.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Analysis based on 50+ GitHub pull requests, official ClickHouse changelogs, and release blogs covering the period 2022–2026. Every claim maps to a specific merged PR. Verify the evidence yourself; the commit history is public.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>data</category>
      <category>analytics</category>
      <category>dataengineering</category>
      <category>database</category>
    </item>
    <item>
      <title>How to Connect AI Agents to Enterprise Productivity Tools Securely (2026 Architecture Guide)</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 09 Apr 2026 20:58:36 +0000</pubDate>
      <link>https://forem.com/arcade/how-to-connect-ai-agents-to-enterprise-productivity-tools-securely-2026-architecture-guide-5d0n</link>
      <guid>https://forem.com/arcade/how-to-connect-ai-agents-to-enterprise-productivity-tools-securely-2026-architecture-guide-5d0n</guid>
      <description>&lt;p&gt;Most enterprise AI agents today can analyze but can't execute. They summarize documents, surface insights, and draft responses. They don't close support tickets, update Salesforce, or trigger deployments. The ROI stays incremental. The architecture that solves this is an MCP runtime, a secure execution layer that handles authorization, credentials, and tool calling on behalf of each user.&lt;/p&gt;

&lt;p&gt;The real transformation happens when agents take actions, when employees direct work instead of doing it. But getting agents to safely execute across enterprise systems is where everything falls apart.&lt;/p&gt;

&lt;p&gt;Recent industry studies from IDC and MIT show that &lt;a href="https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/" rel="noopener noreferrer"&gt;88 to 95 percent of enterprise AI pilots fail to reach production&lt;/a&gt;. The root cause isn't the language model. It's the complexity of secure integration, and every month spent rebuilding auth plumbing is a month your agents aren't delivering business value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use an MCP runtime as the secure action layer&lt;/strong&gt; between your agents and enterprise tools. It evaluates the intersection of agent permissions and user permissions per action at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute every tool call on behalf of the user (OBO).&lt;/strong&gt; The agent acts with the user's credentials, scoped to the user's native permissions, and every action is attributable in audit logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep OAuth tokens out of the LLM context.&lt;/strong&gt; Credentials must be vaulted at the runtime layer where the model cannot observe, alter, or leak them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do not use static service accounts.&lt;/strong&gt; They break permission models and turn a single prompt injection into an enterprise-wide incident.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build with agent-optimized tools, not raw API wrappers&lt;/strong&gt;: intent-level operations with validated schemas that prevent parameter hallucination and eliminate retry loops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Require human-in-the-loop approvals for all destructive actions.&lt;/strong&gt; Deletes, bulk updates, and external communications must pause for explicit sign-off before execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ship audit logs and telemetry from day one.&lt;/strong&gt; Export every tool call via OpenTelemetry to your SIEM for compliance, incident response, and root cause analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why connecting AI agents to enterprise tools is hard: identity, permissions, and safe execution
&lt;/h2&gt;

&lt;p&gt;The bottleneck in agentic systems, such as Claude Cowork or OpenClaw, isn't making API calls. It's identity propagation, permission inheritance, and safe execution within complex enterprise environments.&lt;/p&gt;

&lt;p&gt;When teams build direct integrations between LLMs and enterprise software, they immediately hit friction. Developers spend cycles managing fragile OAuth token lifecycles, handling async user consent flows, manually tuning least-privilege authorization scopes, and building custom approval controls. This is undifferentiated infrastructure work that burns engineering time without advancing the agent's core capabilities.&lt;/p&gt;

&lt;p&gt;Because this work is tedious and blocks core agent development, teams frequently take a dangerous shortcut: they use service accounts.&lt;/p&gt;

&lt;p&gt;Granting an agent global read and write access across an entire enterprise instance breaks native permission models. You're bypassing years of carefully configured role-based access controls.&lt;/p&gt;

&lt;p&gt;A single manipulated input can result in instant, untraceable data exfiltration or system modification. If an agent holds a static API key with global write access, a localized &lt;a href="https://genai.owasp.org/llm-top-10/" rel="noopener noreferrer"&gt;prompt injection vulnerability&lt;/a&gt; becomes an enterprise-wide blast radius.&lt;/p&gt;

&lt;p&gt;Teams make two mistakes here. Give the agent its own identity, and an intern can act beyond their own permissions through the agent. Inherit the user's full access, and one prompt injection cascades through every connected system.&lt;/p&gt;

&lt;p&gt;The right answer is the intersection: what is this agent allowed to do &lt;strong&gt;AND&lt;/strong&gt; what is this user allowed to do, evaluated per action, at runtime. This is the permission intersection model, and it's the only approach that prevents both privilege escalation and blast radius expansion simultaneously.&lt;/p&gt;

&lt;p&gt;This evaluation must happen at the runtime layer. Not at login time, not in the prompt, and not in the application code. Without it, scaling agents beyond single-user demos is unsafe.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architectural shift: The agent is already the proxy
&lt;/h2&gt;

&lt;p&gt;Before evaluating specific integration approaches, you need to understand why the traditional enterprise architecture no longer applies.&lt;/p&gt;

&lt;p&gt;In the pre-agentic model, a proxy (API gateway) sits between applications and APIs, routing, authenticating, and rate limiting. The proxy is the control point because all traffic flows through it.&lt;/p&gt;

&lt;p&gt;Agents invert this topology. The agent mediates between the user and the infrastructure. It already handles routing, orchestration, and decision-making. Adding a traditional proxy in front of the tools the agent calls doesn't add a control point. It adds a redundant hop that can't see into the execution context that matters: which user, which action, which permission, right now.&lt;/p&gt;

&lt;p&gt;The control point in an agentic architecture is the execution layer where the tool runs, where credentials are resolved, permissions are checked, and actions are taken on behalf of a specific human. That's the runtime.&lt;/p&gt;

&lt;p&gt;The gateway era was defined by the proxy as the control point. The agentic era is defined by the runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four architectures for connecting AI agents to enterprise tools
&lt;/h2&gt;

&lt;p&gt;As organizations move from isolated pilots to production deployments, engineering teams adopt one of four integration models. Understanding where each approach breaks down under enterprise load is critical for architectural planning.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Integration approach&lt;/th&gt;
&lt;th&gt;Security &amp;amp; identity&lt;/th&gt;
&lt;th&gt;Maintenance burden&lt;/th&gt;
&lt;th&gt;Reliability &amp;amp; execution&lt;/th&gt;
&lt;th&gt;Speed-to-market&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom connectors &amp;amp; DIY auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Highly variable; often falls back to static keys.&lt;/td&gt;
&lt;td&gt;Extremely high; requires dedicated auth teams.&lt;/td&gt;
&lt;td&gt;Low; prone to parameter hallucination loops.&lt;/td&gt;
&lt;td&gt;Very slow.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Legacy iPaaS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate; struggles with On-Behalf-Of execution.&lt;/td&gt;
&lt;td&gt;Medium; relies on maintaining visual workflows.&lt;/td&gt;
&lt;td&gt;Medium; optimized for linear triggers, not loops.&lt;/td&gt;
&lt;td&gt;Moderate.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Unmanaged MCP servers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low; lacks centralized multi-user authorization.&lt;/td&gt;
&lt;td&gt;High; requires manual deployment and patching.&lt;/td&gt;
&lt;td&gt;Low; lacks native retries and failover state.&lt;/td&gt;
&lt;td&gt;Fast for prototypes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP runtime (e.g., Arcade)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High; native permission mapping and token vaults.&lt;/td&gt;
&lt;td&gt;Low; runtime handles lifecycle and upgrades.&lt;/td&gt;
&lt;td&gt;High; parallel execution and automatic retries.&lt;/td&gt;
&lt;td&gt;Very fast.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Approach 1: Build custom connectors and OAuth (DIY authentication)
&lt;/h3&gt;

&lt;p&gt;Build one-off API wrappers and custom OAuth layers for every enterprise tool your agent needs.&lt;/p&gt;

&lt;p&gt;The upside is total control. You dictate every aspect of the integration and avoid third-party vendor lock-in.&lt;/p&gt;

&lt;p&gt;But the limitations get crippling fast. Custom connectors become a massive engineering drain. Teams spend months building secure token vaults, handling refresh token rotation, and writing edge-case logic. Those are months that could have been spent shipping agent features that actually move the business forward.&lt;/p&gt;

&lt;p&gt;Raw enterprise APIs compound the problem. They expect highly structured, deterministic inputs, but agents generate dynamic natural language. Wiring them directly to raw endpoints leads to parameter hallucination and endless retry loops. Authentication alone becomes a standalone infrastructure project: token rotation, user matching, session validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 2: Use legacy iPaaS for agent tool calls
&lt;/h3&gt;

&lt;p&gt;Enterprises retrofit existing integration platforms like Workato, MuleSoft, or Zapier to trigger actions based on LLM outputs.&lt;/p&gt;

&lt;p&gt;The strength is familiarity. Enterprise IT teams already know these tools, and they come with massive pre-built endpoint catalogs.&lt;/p&gt;

&lt;p&gt;But the limitations are architectural and fundamental. These platforms were built for linear, deterministic, trigger-based automation. Agentic systems operate on non-deterministic, stateful reasoning loops where the agent decides what to call, when, and how many times based on intermediate results. Forcing that into a linear webhook pattern breaks down fast.&lt;/p&gt;

&lt;p&gt;The deeper problem is identity. Legacy iPaaS platforms center on system-to-system service accounts. They lack true &lt;a href="https://learn.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-on-behalf-of-flow" rel="noopener noreferrer"&gt;user-scoped, On-Behalf-Of (OBO) execution&lt;/a&gt;, which forces teams to build complex, fragile workarounds to ensure the agent only acts with the specific permissions of the user typing the prompt. Per-user authorization evaluated at runtime across every tool call requires infrastructure these platforms were never designed to deliver.&lt;/p&gt;
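&lt;p&gt;For concreteness, the On-Behalf-Of exchange linked above swaps the user's incoming token for a downstream, user-scoped token. A minimal sketch of the token-request body in the Microsoft identity platform's documented shape (the client values are placeholders):&lt;/p&gt;

```python
# Sketch of a Microsoft identity platform On-Behalf-Of token request body.
# APP_CLIENT_ID, APP_SECRET, and USER_ACCESS_TOKEN are placeholders.
obo_request = {
    # Fixed grant type for the OBO flow:
    "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
    "client_id": "APP_CLIENT_ID",
    "client_secret": "APP_SECRET",
    # The user's incoming access token the agent is acting on behalf of:
    "assertion": "USER_ACCESS_TOKEN",
    "scope": "https://graph.microsoft.com/.default",
    # Tells the token endpoint this is an OBO exchange:
    "requested_token_use": "on_behalf_of",
}
```

&lt;p&gt;Every field of this exchange lives in backend infrastructure; none of it belongs in a visual workflow builder or, worse, in the prompt.&lt;/p&gt;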

&lt;h3&gt;
  
  
  Approach 3: Run unmanaged MCP servers
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://modelcontextprotocol.io/specification/latest" rel="noopener noreferrer"&gt;Model Context Protocol standardized how AI models connect to data sources and tools&lt;/a&gt;. In this approach, teams deploy open-source MCP servers to expose local or SaaS capabilities directly to their agents.&lt;/p&gt;

&lt;p&gt;MCP's strength is standardization. It decouples the agent framework from the underlying tool implementation, creating a universal language for tool calling. The problem is that the quality of unmanaged, open-source MCP servers varies widely. According to &lt;a href="https://toolbench.arcade.dev/" rel="noopener noreferrer"&gt;benchmarks&lt;/a&gt;, many struggle with reliability and correctness, which compounds the challenges of production deployments.&lt;/p&gt;

&lt;p&gt;These servers break down the moment you take them to production. Raw, unmanaged MCP servers lack centralized governance. They don't ship with multi-user enterprise authentication handling, meaning every user often shares the same connection identity.&lt;/p&gt;

&lt;p&gt;They also lack production reliability features like automatic retries, parallel execution, and stateful failover out of the box. That burden falls back on the application developer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 4: Use an MCP runtime (the secure action layer)
&lt;/h3&gt;

&lt;p&gt;An &lt;a href="https://docs.arcade.dev/en/home" rel="noopener noreferrer"&gt;MCP runtime&lt;/a&gt; is the infrastructure layer purpose-built to solve this problem. &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade.dev&lt;/a&gt;, the industry's first MCP runtime, combines &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;agent-optimized tools&lt;/a&gt;, centralized authentication and authorization, and enterprise governance into a single control plane.&lt;/p&gt;

&lt;p&gt;This approach targets production AI specifically. The runtime speaks MCP natively (JSON-RPC, Streamable HTTP) with no protocol translation and no context loss. It preserves native permissions through On-Behalf-Of token flows, isolates credentials from the language model, and provides instant, OpenTelemetry-compatible audit logs for every action.&lt;/p&gt;

&lt;p&gt;Teams ship faster because the runtime handles authorization, token lifecycle, retries, and governance. Engineers focus entirely on agent logic and business outcomes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.arcade.dev/blog/mcp-runtime-gateway" rel="noopener noreferrer"&gt;Arcade's MCP Gateway&lt;/a&gt; lets any MCP client access the full tool catalog through a single endpoint. Teams can also bring their own MCP servers into the runtime to get authorization, retries, and audit logs without rewriting what already works. The runtime extends your existing MCP investment rather than replacing it.&lt;/p&gt;

&lt;p&gt;For single-user hobbyist projects or local scripts, a full runtime adds unnecessary overhead. But for platform engineering teams deploying autonomous systems to thousands of corporate users, an MCP runtime is the only viable path to production.&lt;/p&gt;

&lt;h3&gt;
  
  
  What production demands: authorization, tooling, and governance
&lt;/h3&gt;

&lt;p&gt;The comparison above shows where each approach breaks. But understanding why the MCP runtime wins requires going deeper into the three capabilities that separate production deployments from demos: just-in-time authorization that enforces user-scoped access, agent-optimized tools that eliminate hallucination loops, and governance infrastructure that gives platform teams full visibility over every action.&lt;/p&gt;

&lt;h4&gt;
  
  
  How just-in-time authorization enforces user-scoped access
&lt;/h4&gt;

&lt;p&gt;Custom connectors fall back to static keys. Legacy iPaaS platforms rely on shared service accounts. Unmanaged MCP servers lack multi-user auth entirely. All three fail at the same point: they can't evaluate who is allowed to do what at the moment the tool is called.&lt;/p&gt;

&lt;p&gt;That’s the problem &lt;a href="https://www.arcade.dev/blog/sso-for-ai-agents-authentication-and-authorization-guide/" rel="noopener noreferrer"&gt;just-in-time authorization&lt;/a&gt; solves.&lt;/p&gt;

&lt;p&gt;The agent requests and validates credentials only at the moment an action requires them, not upfront. If a user never invokes the Salesforce integration, no Salesforce tokens are ever obtained or stored.&lt;/p&gt;

&lt;p&gt;The entire authentication flow (OAuth exchanges, token refresh, credential storage) executes in deterministic backend logic that the LLM can never alter, observe, or leak. For additional governance, teams can attach pre-tool-call and post-tool-call hooks to enforce custom policies like human-in-the-loop approvals for certain actions, usage limits, or contextual access rules.&lt;/p&gt;
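&lt;p&gt;A hook of that kind can be sketched in a few lines. This is a hypothetical shape to show the control flow, not Arcade's actual API; the tool names and the &lt;code&gt;pre_tool_call&lt;/code&gt; signature are invented for the example.&lt;/p&gt;

```python
# Hypothetical pre-tool-call hook enforcing human-in-the-loop approval
# for destructive actions. Names are illustrative only.
DESTRUCTIVE_TOOLS = {"crm.delete_record", "mail.send_external", "db.bulk_update"}

def pre_tool_call(tool_name, args, approver=None):
    """Runs before every tool call; returns True to allow execution."""
    if tool_name in DESTRUCTIVE_TOOLS:
        if approver is None:
            # No human available to sign off: block instead of executing.
            raise PermissionError(f"{tool_name} requires explicit approval")
        return approver(tool_name, args)
    return True  # non-destructive calls pass through

assert pre_tool_call("crm.read_record", {"id": 7}) is True
assert pre_tool_call("crm.delete_record", {"id": 7},
                     approver=lambda tool, args: True) is True
```

&lt;p&gt;The important property is that the check runs in deterministic runtime code, outside anything the model can rewrite.&lt;/p&gt;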

&lt;p&gt;This works because the runtime is stateful. It maintains per-session, per-user context across an agent's entire reasoning loop. A stateless proxy evaluates each request in isolation and can't know that a request is step 3 of a 6-step workflow, acting on behalf of Alice, who authorized this specific scope 4 minutes ago. The runtime can, and that session context is what makes per-user, per-tool authorization enforceable.&lt;/p&gt;

&lt;p&gt;This is where the permission intersection model described earlier becomes operational. The architecture enforces: Agent Permissions ∩ User Permissions = Effective Action Scope. The agent can only execute an action if both the agent's role policy and the human user's native SaaS permissions explicitly allow it. Every other combination is denied.&lt;/p&gt;

&lt;p&gt;A concrete example: an enterprise AI agent is built to assist the Human Resources department. An employee using this agent has high-level administrative privileges in Workday, including access to global payroll data. But the HR agent itself is scoped strictly to recruiting tasks.&lt;/p&gt;

&lt;p&gt;Because the runtime evaluates the intersection of these permissions at call time, the agent is denied when prompted to access payroll data. The user has the authority, but the agent's restricted scope prevents the action. This stops data exfiltration and &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html" rel="noopener noreferrer"&gt;confused deputy&lt;/a&gt; attacks cold.&lt;/p&gt;
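&lt;p&gt;The intersection check above can be sketched in a few lines of Python. This is a hypothetical illustration, not Arcade's actual API; the permission names and the &lt;code&gt;user_scope&lt;/code&gt; lookup are invented for the HR scenario:&lt;/p&gt;

```python
# Hypothetical sketch of the permission-intersection model.
# The agent's role policy covers recruiting tasks only.
AGENT_SCOPE = {"workday.read_candidates", "workday.schedule_interview"}

def user_scope(user_id):
    # Stand-in for the user's native SaaS permissions; a real runtime
    # would resolve these from the provider at call time.
    return {"workday.read_candidates", "workday.schedule_interview",
            "workday.read_payroll", "workday.admin"}

def effective_scope(user_id):
    # Effective Action Scope = Agent Permissions intersected with User Permissions
    return AGENT_SCOPE.intersection(user_scope(user_id))

def is_allowed(user_id, action):
    return action in effective_scope(user_id)

print(is_allowed("alice", "workday.schedule_interview"))  # True
print(is_allowed("alice", "workday.read_payroll"))        # False
```

&lt;p&gt;Even though the user holds payroll access, the action is denied because the agent's scope never included it, and vice versa.&lt;/p&gt;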

&lt;h4&gt;
  
  
  Agent-optimized tools vs API wrappers: what to use and why
&lt;/h4&gt;

&lt;p&gt;The comparison table flags a specific failure mode for custom connectors: parameter hallucination loops. This happens because raw REST endpoints require precise, deterministic parameters, and language models produce probabilistic natural language. Wiring one directly to the other without an intermediary is where agents break.&lt;/p&gt;

&lt;p&gt;Agents need intent-level tools rather than raw API wrappers. An intent-level tool absorbs the ambiguity of an agent's request and translates it into a safe, predictable transaction. The result is faster execution, fewer failed actions, and lower inference costs because the agent doesn't burn tokens on retry loops.&lt;/p&gt;

&lt;p&gt;Production execution also requires runtime reliability features that raw APIs don't provide. The runtime provides developer-defined context for intelligent retries, parallelized execution for multi-step tasks, and automatic failover to handle rate limits and transient network errors gracefully. Standardized schemas within these tools prevent parameter hallucination, the most common cause of agent failure when wiring models directly to APIs.&lt;/p&gt;

&lt;p&gt;Consider how this works in practice. Instead of an agent calling a raw Salesforce update endpoint and failing because it hallucinated a required stage ID string, the agent uses a high-level, agent-optimized progress tool.&lt;/p&gt;

&lt;p&gt;The tool natively understands the user's intent to move a deal to negotiation. Its internal logic securely looks up the correct, exact ID for that specific Salesforce instance, validates the state transition, and safely executes the update. The language model doesn't need to guess the exact database schema. The action succeeds on the first call, not the fifth.&lt;/p&gt;
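&lt;p&gt;A minimal sketch of what such an intent-level tool does internally, assuming a simplified deal model (the transition table, stage IDs, and &lt;code&gt;advance_deal&lt;/code&gt; function are invented for illustration):&lt;/p&gt;

```python
# Hypothetical intent-level tool wrapping a raw CRM stage update.
# Legal stage transitions for this CRM instance.
VALID_TRANSITIONS = {
    "qualification": {"proposal"},
    "proposal": {"negotiation"},
    "negotiation": {"closed_won", "closed_lost"},
}

# Exact per-instance stage IDs, resolved by the tool, never guessed by the LLM.
STAGE_IDS = {"proposal": "stg_0017", "negotiation": "stg_0042"}

def advance_deal(deal, target_stage):
    """The agent expresses intent ('move to negotiation'); the tool
    validates the transition and looks up the exact stage ID."""
    current = deal["stage"]
    if target_stage not in VALID_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition from {current} to {target_stage}")
    deal["stage_id"] = STAGE_IDS[target_stage]
    deal["stage"] = target_stage
    return deal

deal = advance_deal({"stage": "proposal"}, "negotiation")
print(deal["stage_id"])  # stg_0042
```

&lt;p&gt;The language model only ever supplies the intent; the deterministic lookup and validation live inside the tool, which is why the call succeeds on the first attempt.&lt;/p&gt;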

&lt;h4&gt;
  
  
  Governance and observability for agent actions (audit logs, OTel, versioning)
&lt;/h4&gt;

&lt;p&gt;Unmanaged MCP servers scored "Low" on reliability and security in the comparison above because they lack centralized governance. Once agents execute real actions on behalf of users, platform teams need complete visibility and control over the integration ecosystem. The runtime delivers this through three mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visibility filtering&lt;/strong&gt; ensures agents only see the specific tools the current user is permitted to invoke. If a user doesn't have permission to merge code in GitHub, the GitHub merge tool doesn't appear in the agent's context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deep audit trails&lt;/strong&gt; log every action per user, per service, and per agent session. These logs are &lt;a href="https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/" rel="noopener noreferrer"&gt;exportable to standard SIEM tools via OpenTelemetry (OTel)&lt;/a&gt; to satisfy compliance audits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version control&lt;/strong&gt; lets platform engineers safely upgrade tool schemas and rotate connection parameters without breaking production agents running mid-session on older versions.&lt;/p&gt;

&lt;p&gt;When an agent incorrectly closes several open opportunities in a CRM, the platform team can't afford to spend days parsing raw application logs. With an OTel-compatible audit log generated by the action layer, the security team can instantly trace the destructive action back to the exact user prompt, the specific agent session, and the token used. This isolates the root cause in minutes, enabling teams to refine the agent's instructions or the tool's access policy immediately.&lt;/p&gt;

&lt;p&gt;Of the four approaches evaluated, only the MCP runtime delivers all three: user-scoped authorization at call time, intent-level tooling that prevents hallucination, and centralized governance with full audit trails. The remaining sections show how this architecture works in practice and how to evaluate it for your organization.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to choose an enterprise agent integration approach (security, OBO, and TCO)
&lt;/h2&gt;

&lt;p&gt;Choosing how to connect your AI agents to enterprise tools is a foundational architectural decision. It dictates the speed and security of your deployment. Platform engineers and technical leaders need to frame their buying and building criteria around security, scale, and where their engineering resources should focus.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security and compliance requirements (SOC 2, ISO 27001, auditability)
&lt;/h3&gt;

&lt;p&gt;Can the proposed solution natively map to SOC 2 and ISO 27001 requirements for strict user attribution? If an agent deletes a file in Google Workspace, the audit log must definitively prove which human authorized that action.&lt;/p&gt;

&lt;p&gt;The system must support pre-tool-call &lt;a href="https://hoop.dev/blog/how-to-keep-human-in-the-loop-ai-control-soc-2-for-ai-systems-secure-and-compliant-with-action-level-approvals" rel="noopener noreferrer"&gt;Human-in-the-Loop (HITL) approval hooks&lt;/a&gt;. Destructive actions like modifying production configurations or bulk-updating database records must pause execution and require cryptographic sign-off from a human administrator via Slack or email before proceeding.&lt;/p&gt;
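&lt;p&gt;The shape of such a pre-tool-call hook, sketched under stated assumptions: the tool names are invented, and &lt;code&gt;request_human_approval&lt;/code&gt; is a stand-in for a real Slack or email sign-off flow:&lt;/p&gt;

```python
# Hypothetical pre-tool-call hook that pauses destructive actions
# until a human administrator approves them.
DESTRUCTIVE_ACTIONS = {"GoogleDrive.DeleteFile", "Database.BulkUpdate"}

def request_human_approval(tool_name, args):
    # Stand-in: a real implementation would page an admin via Slack
    # or email and block until they respond. Deny by default here.
    return False

def pre_tool_call_hook(tool_name, args):
    if tool_name in DESTRUCTIVE_ACTIONS:
        if not request_human_approval(tool_name, args):
            raise PermissionError(f"{tool_name} blocked: awaiting human sign-off")
    return True  # non-destructive calls proceed without interruption

print(pre_tool_call_hook("Calendar.ListEvents", {}))  # True
```

&lt;p&gt;Read-only calls pass through untouched; anything on the destructive list halts execution until a human signs off.&lt;/p&gt;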

&lt;h3&gt;
  
  
  Build vs buy economics (OAuth maintenance and total cost of ownership)
&lt;/h3&gt;

&lt;p&gt;Build versus buy demands a ruthless economic assessment.&lt;/p&gt;

&lt;p&gt;Calculate the actual engineering hours required to build, maintain, and securely upgrade OAuth flows for ten or more distinct enterprise APIs. Factor in the hidden costs: managing refresh token rotation, building webhook callback URLs for long-running async tasks, and patching custom connectors when SaaS vendors inevitably deprecate their API versions.&lt;/p&gt;

&lt;p&gt;Then ask what those engineers could have shipped instead.&lt;/p&gt;

&lt;p&gt;Adopting an MCP runtime transforms a multi-month infrastructure project into a configuration exercise. The total cost of ownership drops dramatically, and your team reclaims months of engineering capacity to invest in the agent capabilities that differentiate your product.&lt;/p&gt;

&lt;h3&gt;
  
  
  Time-to-value and engineering focus
&lt;/h3&gt;

&lt;p&gt;Time-to-value is where most teams underestimate the cost of building in-house.&lt;/p&gt;

&lt;p&gt;Will your highly paid AI engineers spend the next three months building reliable Slack and Workspace connectors, or will they spend that time optimizing agent prompts, evaluating reasoning logic, and shipping the agent capabilities that drive revenue? Every week spent on integration plumbing is a week your competitors use to get their agents into production.&lt;/p&gt;

&lt;p&gt;When evaluating external vendors or internal architecture plans, force the issue with hard technical questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Are API keys or OAuth tokens ever visible in the language model's prompt context window?&lt;/li&gt;
&lt;li&gt;How does the system resolve conflicting permissions between a highly privileged user and a narrowly scoped agent?&lt;/li&gt;
&lt;li&gt;Can the system emit W3C-standard trace context to our existing OpenTelemetry collectors?&lt;/li&gt;
&lt;li&gt;How does the tool handle rate limiting when an agent enters an unexpected retry loop?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer to credential visibility is anything other than absolute isolation, the architecture is unfit for enterprise production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference architecture for an MCP runtime (step-by-step flow)
&lt;/h2&gt;

&lt;p&gt;With the architectural decision framed, here's how a request actually flows through the runtime end to end. The MCP runtime acts as the intermediary that brokers trust and execution between the non-deterministic reasoning engine and the deterministic enterprise environment.&lt;/p&gt;

&lt;p&gt;The flow of a secure request follows a strict sequence:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pa5dvzbt30a978qwvfb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7pa5dvzbt30a978qwvfb.png" alt="Secure AI agent enterprise integration architecture diagram showing MCP runtime flow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User prompt&lt;/strong&gt;: The user submits a request, e.g., "close this support ticket."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM plan&lt;/strong&gt;: The agent's language model determines the sequence of tool calls needed to fulfill the request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP runtime&lt;/strong&gt;: The runtime receives the tool call request. It evaluates user and agent permissions and retrieves the necessary On-Behalf-Of credential.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool execution&lt;/strong&gt;: The runtime, not the agent, executes the precise API call against the target system (e.g., Zendesk).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result &amp;amp; next action:&lt;/strong&gt; The runtime receives the API result, filters it, and passes it back to the agent. The LLM then either plans the next action in the sequence or determines the task is complete.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confirmation &amp;amp; audit&lt;/strong&gt;: The agent confirms the action's completion to the user, and the runtime logs the entire transaction via OpenTelemetry for audit purposes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture enforces a hard separation of concerns. The language model handles reasoning, planning, action selection, and generation. The runtime layer handles credentials, policy enforcement, rate limiting, action execution, and logging.&lt;/p&gt;

&lt;p&gt;By vaulting tokens at the runtime layer, this architecture prevents prompt-injection-driven data exfiltration. The language model never possesses the keys required to export data.&lt;/p&gt;

&lt;h3&gt;
  
  
  How an MCP runtime works with any LLM
&lt;/h3&gt;

&lt;p&gt;The MCP runtime works with any LLM through any orchestration framework, or none at all. No framework dependency is required. Arcade serves as the secure execution backend: your code handles reasoning, Arcade handles credentials, authorization, and tool execution.&lt;/p&gt;

&lt;p&gt;This clean separation is what accelerates time-to-production. AI engineers focus entirely on agent logic while offloading the high-risk plumbing of enterprise integrations to the runtime.&lt;/p&gt;

&lt;p&gt;A working example: an agent that reads Gmail and sends Slack messages through Arcade's runtime. Setup requires three dependencies and three environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;arcadepy openai python-dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env
&lt;/span&gt;&lt;span class="n"&gt;ARCADE_API_KEY&lt;/span&gt;=&lt;span class="n"&gt;your_arcade_api_key&lt;/span&gt;        &lt;span class="c"&gt;# Free at arcade.dev
&lt;/span&gt;&lt;span class="n"&gt;ARCADE_USER_ID&lt;/span&gt;=&lt;span class="n"&gt;your_email&lt;/span&gt;@&lt;span class="n"&gt;company&lt;/span&gt;.&lt;span class="n"&gt;com&lt;/span&gt;     &lt;span class="c"&gt;# The user the agent acts on behalf of
&lt;/span&gt;&lt;span class="n"&gt;OPENAI_KEY&lt;/span&gt;=&lt;span class="n"&gt;your_openai_key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;arcadepy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Arcade&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;arcade_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Arcade&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;arcade_user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ARCADE_USER_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;llm_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define enterprise productivity tools — Arcade handles auth for each
&lt;/span&gt;&lt;span class="n"&gt;tool_catalog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gmail.ListEmails&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gmail.SendEmail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Slack.SendMessage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Get tool definitions formatted for the LLM
&lt;/span&gt;&lt;span class="n"&gt;tool_definitions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
   &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_catalog&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# JIT authorization + execution — credentials never touch the LLM
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;authorize_and_run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
   &lt;span class="n"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;authorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcade_user_id&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorize &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

   &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arcade_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
       &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
       &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;arcade_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Agentic loop — LLM reasons and selects tools, Arcade executes them
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
   &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
   &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
       &lt;span class="n"&gt;turns&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
       &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
           &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_definitions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt;
       &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
       &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exclude_none&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
           &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
               &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;authorize_and_run_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
               &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
           &lt;span class="k"&gt;continue&lt;/span&gt;
       &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
           &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
       &lt;span class="k"&gt;break&lt;/span&gt;
   &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;

&lt;span class="c1"&gt;# Run the agent
&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize my latest 5 emails, then send me a DM on Slack with the summary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The LLM reasons through the task, selects &lt;code&gt;Gmail.ListEmails&lt;/code&gt; to fetch emails, summarizes them, then selects &lt;code&gt;Slack.SendMessage&lt;/code&gt; to deliver the summary. The runtime handles JIT authorization for each tool on behalf of that specific user. The agent never sees OAuth tokens, never manages refresh flows, and never touches credentials. &lt;a href="https://docs.arcade.dev/en/get-started/agent-frameworks/setup-arcade-with-your-llm-python" rel="noopener noreferrer"&gt;Full walkthrough in the Arcade docs.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next steps to productionize agent integrations (checklist)
&lt;/h2&gt;

&lt;p&gt;To transition from sandbox prototypes to production-grade deployments, platform engineering teams should follow a structured, iterative implementation plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Inventory required tools and least-privilege scopes
&lt;/h3&gt;

&lt;p&gt;Start by conducting a rigorous audit of your necessary tools. List the specific APIs your agents need, and document the exact user scopes and OAuth granularities required for each. Don't request global access. Apply the principle of least privilege to every single workflow.&lt;/p&gt;
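&lt;p&gt;One way to capture this inventory is a per-workflow scope map. The workflow names below are invented; the scope strings are Google's published Gmail OAuth scopes:&lt;/p&gt;

```python
# Hypothetical least-privilege inventory: each workflow lists only
# the narrowest OAuth scopes it actually needs.
SCOPE_INVENTORY = {
    "email_triage": ["https://www.googleapis.com/auth/gmail.readonly"],
    "draft_replies": ["https://www.googleapis.com/auth/gmail.compose"],
}

def scopes_for(workflows):
    # Union of only the scopes the requested workflows need,
    # never a blanket grant of full mailbox access.
    needed = set()
    for workflow in workflows:
        needed.update(SCOPE_INVENTORY[workflow])
    return sorted(needed)

print(scopes_for(["email_triage"]))
```

&lt;p&gt;A triage-only agent never requests compose rights, so a compromised session can read but never send.&lt;/p&gt;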

&lt;h3&gt;
  
  
  Step 2: Define autonomous vs human-approved actions (HITL)
&lt;/h3&gt;

&lt;p&gt;Next, define your operational boundaries. Build a matrix deciding which actions are safe for autonomous execution (like reading calendar events) and which high-risk actions require explicit user delegation or human-in-the-loop approval hooks (like deleting files or sending external emails).&lt;/p&gt;
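&lt;p&gt;That matrix can start as a simple classification table. A hedged sketch with invented tool names, defaulting unclassified actions to the safe side:&lt;/p&gt;

```python
# Hypothetical action matrix: each tool is classified as safe for
# autonomous execution or as requiring human-in-the-loop approval.
ACTION_POLICY = {
    "Calendar.ListEvents": "autonomous",
    "Gmail.SendEmail": "hitl",         # external sends need sign-off
    "GoogleDrive.DeleteFile": "hitl",  # destructive
}

def requires_approval(tool_name):
    # Anything not yet classified defaults to requiring approval.
    return ACTION_POLICY.get(tool_name, "hitl") == "hitl"

print(requires_approval("Calendar.ListEvents"))  # False
print(requires_approval("Slack.DeleteChannel"))  # True (unclassified)
```

&lt;p&gt;Defaulting to approval for unclassified tools means new integrations start locked down and are relaxed deliberately, not the reverse.&lt;/p&gt;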

&lt;h3&gt;
  
  
  Step 3: Standardize on a single control plane
&lt;/h3&gt;

&lt;p&gt;Centralize your integration strategy immediately. Prevent the creation of "shadow registries."&lt;/p&gt;

&lt;p&gt;When disparate engineering teams build redundant, unmanaged integrations using hardcoded tokens, they create severe security vulnerabilities and integration sprawl. Standardize on a single control plane for all agent tool use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Pilot one workflow and validate token isolation and telemetry
&lt;/h3&gt;

&lt;p&gt;Before rolling out broadly, test the architecture with a narrow, controlled use case. Pilot a single workflow, like developer issue automation linking GitHub and Jira, to validate token isolation and telemetry.&lt;/p&gt;

&lt;p&gt;Invest in infrastructure, not just isolated connectors. Evaluate platforms that treat authorization, agent-optimized tools, and lifecycle governance as a unified secure runtime, not separate problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Use an MCP runtime to connect AI agents to enterprise tools
&lt;/h2&gt;

&lt;p&gt;The true challenge of connecting AI to enterprise productivity tools has little to do with formatting JSON payloads or making API calls. The bottleneck is securing user-scoped access, enforcing least-privilege permissions at runtime, and maintaining rigorous operational governance over non-deterministic systems.&lt;/p&gt;

&lt;p&gt;The most successful platform engineering teams recognize that rebuilding identity propagation, token lifecycles, and reliable integration mechanics from scratch is an expensive distraction from their core business objectives. They need an MCP runtime, not more custom connectors.&lt;/p&gt;

&lt;p&gt;Arcade is the industry's first MCP runtime. It delivers secure agent authorization, the largest catalog of agent-optimized tools, and centralized lifecycle governance in a single control plane. Arcade eliminates the undifferentiated heavy lifting of enterprise integration so your team ships faster and scales with control.&lt;/p&gt;

&lt;p&gt;If you're building agents that need to execute across enterprise tools, start with the &lt;a href="https://docs.arcade.dev/en/get-started/about-arcade" rel="noopener noreferrer"&gt;getting started guide&lt;/a&gt; or explore the &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;full tool catalog&lt;/a&gt; to see what's available out of the box.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ: Enterprise AI agent integrations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the best way to connect AI agents to enterprise productivity tools?
&lt;/h3&gt;

&lt;p&gt;Use an MCP runtime, a secure action layer that performs user-scoped (OBO) execution, keeps tokens out of the LLM, and enforces runtime authorization per tool call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should AI agents use service accounts to access Slack, Google Workspace, or Microsoft 365?
&lt;/h3&gt;

&lt;p&gt;No. Service accounts bypass user permissions and expand the blast radius of prompt injection. Use on-behalf-of user execution with least-privilege scopes.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does "On-Behalf-Of (OBO)" mean for agent integrations?
&lt;/h3&gt;

&lt;p&gt;OBO means the agent executes each action using credentials tied to the requesting user, so the action is limited to that user's native permissions and is attributable in audit logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is just-in-time authorization for AI agents?
&lt;/h3&gt;

&lt;p&gt;Just-in-time authorization is a runtime policy check that executes at the moment of each tool call, evaluating the user's identity, the agent's allowed scope, and the requested action. Credentials are requested and validated only when needed, not pre-authorized during setup.&lt;/p&gt;
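&lt;p&gt;In rough pseudocode terms, the per-call check looks like this. The policy structure and field names are illustrative, not Arcade's actual API:&lt;/p&gt;

```python
# Sketch of a just-in-time authorization check: policy is evaluated at the
# moment of each tool call, never pre-authorized at setup time.
def authorize_call(user, agent_scopes, action, policy):
    allowed_for_user = policy.get(user, set())
    if action not in agent_scopes:
        return False   # the agent was never granted this capability
    if action not in allowed_for_user:
        return False   # the user's own permissions don't cover it
    return True        # only now would a short-lived credential be minted

policy = {"alice": {"Gmail.ListEmails"}}
print(authorize_call("alice", {"Gmail.ListEmails"}, "Gmail.ListEmails", policy))
```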

&lt;h3&gt;
  
  
  What is an MCP runtime, and how is it different from an MCP server?
&lt;/h3&gt;

&lt;p&gt;An MCP server exposes tools to an agent over the Model Context Protocol, but it's typically single-user, stateless, and ships without built-in auth, token management, or observability. An MCP runtime is the enterprise infrastructure layer that complements MCP servers to add what they lack: multi-user OBO authentication, per-call policy enforcement, token vaulting, automatic retries, and audit/telemetry. The server defines what the agent can call; the runtime makes it safe to call at scale. Arcade is the industry's first MCP runtime, purpose-built for production agent deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the minimum security requirements for production agent tool access?
&lt;/h3&gt;

&lt;p&gt;Token isolation from the LLM, user-scoped/OBO execution, least-privilege scopes, per-action authorization, audit logs with user attribution, and HITL approvals for high-risk actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you audit and attribute agent actions for compliance (SOC 2 / ISO 27001)?
&lt;/h3&gt;

&lt;p&gt;Log every tool call with user identity, tool, parameters/intent, outcome, and trace context, and export via OpenTelemetry to your SIEM for investigation and reporting.&lt;/p&gt;
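&lt;p&gt;At minimum, each log entry can be a flat, structured event your SIEM can index. A sketch; the field names are illustrative, not a standard schema:&lt;/p&gt;

```python
import time, uuid

def audit_record(user_id, tool, params, outcome, trace_id=None):
    """One structured audit event per tool call (illustrative schema)."""
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),   # when the call happened
        "user_id": user_id,         # who the agent acted on behalf of
        "tool": tool,               # which tool was invoked
        "params": params,           # intent/parameters, redacted as needed
        "outcome": outcome,         # success / denied / error
        "trace_id": trace_id,       # correlates with OpenTelemetry traces
    }

record = audit_record("alice", "Slack.SendMessage", {"channel": "#ops"}, "success")
```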

&lt;h3&gt;
  
  
  When do legacy iPaaS tools (Zapier/Workato/MuleSoft) break down for agents?
&lt;/h3&gt;

&lt;p&gt;They struggle with non-deterministic agent loops and true user-scoped OBO execution, forcing teams to rely on shared credentials or brittle workarounds.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do agent-optimized tools reduce hallucinations compared to raw API wrappers?
&lt;/h3&gt;

&lt;p&gt;They use intent-level operations with validated schemas and internal lookups, so the model doesn't have to guess required IDs/parameters and can fail safely.&lt;/p&gt;
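&lt;p&gt;The contrast is easiest to see in the tool signatures themselves. A hedged sketch; the issue names and IDs are made up:&lt;/p&gt;

```python
# A raw wrapper forces the model to supply an opaque ID it has to guess;
# an intent-level tool resolves human-readable inputs internally.
ISSUES = {"Fix login bug": "PROJ-42"}   # internal lookup table (illustrative)

def update_issue_raw(issue_id, status):
    # Raw-wrapper shape: the model had to know "PROJ-42" already.
    return {"id": issue_id, "status": status}

def move_issue(issue_title, new_status):
    issue_id = ISSUES.get(issue_title)
    if issue_id is None:
        return {"error": f"no issue titled {issue_title!r}"}  # fail safely
    return update_issue_raw(issue_id, new_status)

print(move_issue("Fix login bug", "Done"))
```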

&lt;h3&gt;
  
  
  When should we add human-in-the-loop (HITL) approvals?
&lt;/h3&gt;

&lt;p&gt;For destructive or irreversible actions (deletes, external emails, bulk updates, permission changes) or any action that materially impacts security, finance, or customer data.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>mcp</category>
      <category>automation</category>
    </item>
    <item>
      <title>How to build a secure WhatsApp AI assistant with Arcade and Claude Code (OpenClaw alternative)</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 02 Apr 2026 21:43:19 +0000</pubDate>
      <link>https://forem.com/arcade/how-to-build-a-secure-whatsapp-ai-assistant-with-arcade-and-claude-code-openclaw-alternative-3f4f</link>
      <guid>https://forem.com/arcade/how-to-build-a-secure-whatsapp-ai-assistant-with-arcade-and-claude-code-openclaw-alternative-3f4f</guid>
      <description>&lt;p&gt;I texted "prep me for my 2pm" on WhatsApp. Thirty seconds later, my phone buzzed back with a structured briefing: who I was meeting, what we last discussed over email, what my team said about them in Slack, and three talking points. No browser tab. No laptop. Just a message on my commute.&lt;/p&gt;

&lt;p&gt;That's the promise of an always-on AI assistant. And until recently, it was almost impossible to build one that actually worked.&lt;/p&gt;

&lt;p&gt;Open-source frameworks like OpenClaw made headless, two-way messaging agents popular. Anthropic's &lt;a href="https://code.claude.com/docs/en/channels" rel="noopener noreferrer"&gt;Claude Code Channels&lt;/a&gt; confirmed the approach had legs. Channels is currently in research preview, but the direction is clear. Anthropic already uses this pattern for hand-offs between their desktop app, mobile app, and Claude Code. Expect this to GA in some form.&lt;/p&gt;

&lt;p&gt;But getting from a weekend demo to a reliable assistant exposes gaps that no amount of prompt engineering fixes. Authorization. Tool reliability. Session management. The agent needs access to your calendar, email, and Slack, and you need to be sure it's not a security liability.&lt;/p&gt;

&lt;p&gt;I built a working version. This guide walks through the entire thing: a WhatsApp relay server, an MCP server, Claude Code as the brain, and Arcade.dev for secure tool access. Working code at every step.&lt;/p&gt;

&lt;p&gt;We'll start with the pitfalls you need to understand, then build it.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OpenClaw-style headless frameworks give your agent god-mode access to every connected service, rely on brittle tool wrappers, bloat the context window with raw API responses, and produce zero audit trail. Buying a dedicated Mac Mini to run them doesn't help: the machine isn't the threat model; the credentials are.&lt;/li&gt;
&lt;li&gt;This guide builds a WhatsApp AI assistant using a relay server that handles Meta's webhooks, an MCP server that bridges to Claude Code, Arcade for secure tool access and audit logging, and a meeting-prep skill that pulls from Google Calendar, Gmail, and Slack to deliver structured briefings directly in WhatsApp.&lt;/li&gt;
&lt;li&gt;Every layer includes working code you can run locally: webhook ingestion with HMAC signature validation, a cursor-based message queue, MCP tool definitions, Claude Code configuration, and a complete skill file that encodes a three-phase meeting-prep workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  From demo to production: The four pitfalls of always-on AI agents
&lt;/h2&gt;

&lt;p&gt;The headless setup that OpenClaw popularized is the starting line. The moment you try to move from a weekend proof of concept to something you'd actually trust with your calendar and email, four architectural problems surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: God-mode credentials and the agent security risk
&lt;/h3&gt;

&lt;p&gt;Headless agent frameworks inherit the host machine's full access profile. The agent gets the same permissions as the developer who launched it. Every OAuth token, every API key, every connected service, wide open.&lt;/p&gt;

&lt;p&gt;A single prompt injection or compromised dependency cascades through everything. Your Google Drive, your CRM, your source code repos. One bad input and the agent becomes an insider threat.&lt;/p&gt;

&lt;p&gt;This isn't theoretical. &lt;a href="https://nvd.nist.gov/vuln/detail/CVE-2026-25253" rel="noopener noreferrer"&gt;CVE-2026-25253&lt;/a&gt; exposed a one-click RCE in OpenClaw. The gateway lacked origin validation. An attacker could exfiltrate the auth token via a malicious link and achieve total system compromise.&lt;/p&gt;

&lt;p&gt;We wrote about this pattern in detail in &lt;a href="https://blog.arcade.dev/openclaw-can-do-a-lot-but-it-shouldnt-have-access-to-your-tokens" rel="noopener noreferrer"&gt;OpenClaw doesn't need your tokens&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 2: Fragile API wrappers and the tool reliability problem
&lt;/h3&gt;

&lt;p&gt;Most agent tools are thin wrappers around REST APIs. They force the model to guess complex payload parameters and retry when natural language doesn't map to rigid schemas.&lt;/p&gt;

&lt;p&gt;Then shadow registries appear. Different teams build duplicate, unversioned wrappers for the same APIs. One unannounced API change breaks multiple agents in ways nobody predicted. Public tool registries have already become a supply-chain attack vector, with malicious tools that exfiltrate local state or establish backdoors.&lt;/p&gt;

&lt;p&gt;For patterns that make MCP tools more resilient, see &lt;a href="https://blog.arcade.dev/mcp-tool-patterns" rel="noopener noreferrer"&gt;54 Patterns for Building Better MCP Tools&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 3: Context window bloat from raw API responses
&lt;/h3&gt;

&lt;p&gt;Unoptimized tools dump the full API response into the context window. A Jira ticket history? Tens of thousands of tokens of irrelevant metadata. The agent's reasoning turns erratic. Costs spike with every conversation turn.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 4: No audit trail, no reliability, no compliance
&lt;/h3&gt;

&lt;p&gt;Keeping a self-hosted agent alive with &lt;code&gt;tmux&lt;/code&gt; or &lt;code&gt;systemd&lt;/code&gt; creates an audit black hole. When the process crashes or misbehaves, there's no structured log to trace what happened. Which action was taken? What parameters? Which user started the request?&lt;/p&gt;

&lt;p&gt;You can't answer "what did the agent do?" if you never logged it.&lt;/p&gt;

&lt;p&gt;That's an immediate fail for SOC 2, ISO 27001, and any serious compliance review.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why buying a Mac Mini doesn't fix any of this
&lt;/h2&gt;

&lt;p&gt;There's a growing trend: developers buying dedicated Mac Minis or spinning up VMs to run OpenClaw-style agents 24/7. The reasoning goes: if the agent has its own machine, it's isolated.&lt;/p&gt;

&lt;p&gt;You haven't. The machine isn't the threat model. The credentials are.&lt;/p&gt;

&lt;p&gt;That Mac Mini still needs OAuth tokens for Google Calendar, API keys for your CRM, access to your Slack workspace. A compromised dependency doesn't care whether it's running on your laptop or a dedicated server in a closet. The blast radius is identical. For a deeper comparison of isolation strategies that actually reduce blast radius, see &lt;a href="https://manveerc.substack.com/p/ai-agent-sandboxing-guide" rel="noopener noreferrer"&gt;AI Agent Sandboxing Guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Hardware isolation solves availability. It doesn't touch authorization, tool reliability, context management, or audit logging.&lt;/p&gt;

&lt;p&gt;You've built an expensive, always-on machine with unfettered access to your business systems. Every pitfall above still applies.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Arcade, Claude Code, and Skills solve these problems
&lt;/h3&gt;

&lt;p&gt;I needed three things: a secure way to connect to business tools, a battle-tested agent runtime, and a way to encode workflows without writing integration code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade&lt;/a&gt; solves the tool and auth layer. It sits between the agent and your business tools. When the agent wants to read your calendar, Arcade evaluates permissions, mints a just-in-time token scoped to that specific action, and executes the call. The LLM never sees long-lived credentials. Your Google Calendar token isn't sitting in an &lt;code&gt;.env&lt;/code&gt; file on a Mac Mini. It's managed by Arcade's runtime with per-action authorization.&lt;/p&gt;

&lt;p&gt;Arcade also solves the brittle tools problem. Instead of writing fragile REST wrappers, you use &lt;a href="https://www.arcade.dev/tools" rel="noopener noreferrer"&gt;pre-built, agent-optimized integrations&lt;/a&gt; that return summarized data, not raw JSON dumps. When Google changes their Calendar API, Arcade handles it. Your agent code stays untouched. And every tool call generates structured audit logs tied to the specific user and action.&lt;/p&gt;

&lt;p&gt;Claude Code is the agent runtime. It's more battle-tested than OpenClaw, has native MCP support, and handles tool orchestration without the brittle process management of &lt;code&gt;tmux&lt;/code&gt; and &lt;code&gt;systemd&lt;/code&gt; scripts.&lt;/p&gt;

&lt;p&gt;Skills encode the actual workflows. This is the piece most people miss. Arcade gives the agent &lt;em&gt;access&lt;/em&gt; to your tools with proper auth. Skills tell the agent &lt;em&gt;how to use them well&lt;/em&gt;. For a deeper look at the distinction, see &lt;a href="https://blog.arcade.dev/what-are-agent-skills-and-tools" rel="noopener noreferrer"&gt;Skills vs Tools for AI Agents&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A skill is a markdown file that encodes domain expertise: which tools to call, in what order, what to look for in the results, how to format the output. Without a skill, you have an agent with calendar access but no idea how to prepare a meeting brief. With a skill, you have an assistant that pulls calendar events, cross-references email threads, checks Slack for internal context, and delivers a structured briefing, all from a single WhatsApp message.&lt;/p&gt;
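&lt;p&gt;A skeleton of what such a file might contain; the section names are illustrative, not a required schema:&lt;/p&gt;

```markdown
# Meeting Prep

## When to use
The user asks to be prepped for an upcoming meeting.

## Steps
1. Pull the next matching event from Google Calendar.
2. Search Gmail for recent threads with the attendees.
3. Search Slack for internal mentions of the attendees or their account.

## Output format
A short briefing: attendees, last touchpoint, three talking points.
```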

&lt;p&gt;Arcade gives access. Skills give expertise. Together, they turn an LLM into a useful assistant.&lt;/p&gt;

&lt;p&gt;And because skills are just markdown files, anyone on the team can write and iterate on them. No code deployment. No engineering tickets.&lt;/p&gt;

&lt;p&gt;Here's what we're building: a WhatsApp relay for messaging, Claude Code as the brain, Arcade for auth-managed tool access, and skills that encode your team's workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-step: building the WhatsApp AI assistant with MCP and Arcade
&lt;/h2&gt;

&lt;p&gt;Enough architecture. Here's what we're making: WhatsApp messages flow through a relay server into an MCP server, which feeds them to Claude Code. Claude Code processes messages using skills, calls business tools through Arcade, and replies back through the same chain.&lt;/p&gt;

&lt;p&gt;One wrinkle: WhatsApp's Cloud API only supports webhooks. There's no WebSocket or long-polling option. That means something has to sit on a public URL to receive Meta's callbacks. Since we're running everything locally, the relay server handles that role, and ngrok tunnels traffic from Meta's servers to it on your machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.arcade.dev%2F_astro%2Fwhatsapp-to-claude-code-technical-architecture-diagram.n4Enlg4V_Z1ve4Qd.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.arcade.dev%2F_astro%2Fwhatsapp-to-claude-code-technical-architecture-diagram.n4Enlg4V_Z1ve4Qd.webp" alt="A detailed technical architecture diagram illustrating the integration flow from a WhatsApp user on a smartphone to Claude Code. The horizontal sequential flow proceeds through Meta Cloud API, ngrok, Relay Server, and MCP Server before reaching Claude Code. An auxiliary 'Arcade' service box (with integrated services like Calendar, Email, Slack, and CRM) is connected to Claude Code. A dashed return line labeled 'replies' indicates a feedback path from Claude Code back to the Relay Server." width="800" height="198"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites: WhatsApp Business API, Claude Code, and Arcade
&lt;/h3&gt;

&lt;p&gt;Before starting, make sure you have a Meta developer account with a WhatsApp Business App configured (&lt;a href="https://developers.facebook.com/docs/whatsapp/cloud-api/get-started" rel="noopener noreferrer"&gt;Meta's getting started guide&lt;/a&gt;), Node.js 20+ and npm, ngrok for tunneling webhooks to your local machine, Claude Code installed and configured, an &lt;a href="https://app.arcade.dev/register" rel="noopener noreferrer"&gt;Arcade account&lt;/a&gt; with API access, and a phone number registered with WhatsApp Business API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Project structure and environment setup
&lt;/h3&gt;

&lt;p&gt;Here's the folder layout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;whatsapp-assistant/
├── whatsapp.ts          # MCP server (bridge between relay and Claude Code)
├── package.json         # MCP server dependencies
├── .mcp.json            # Claude Code MCP server registration
├── whatsapp-relay/
│   ├── relay.ts         # Relay server (faces the internet via ngrok)
│   ├── package.json     # Relay server dependencies
│   └── .env             # WhatsApp API credentials (from .env.example)
└── skills/
    └── meeting-prep/
        └── SKILL.md     # Meeting preparation skill for Claude Code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start by setting up both projects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create the project&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;whatsapp-assistant &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;whatsapp-assistant

&lt;span class="c"&gt;# Initialize the MCP server&lt;/span&gt;
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; @modelcontextprotocol/sdk
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; typescript @types/node tsx

&lt;span class="c"&gt;# Initialize the relay server&lt;/span&gt;
&lt;span class="nb"&gt;mkdir &lt;/span&gt;whatsapp-relay &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;whatsapp-relay
npm init &lt;span class="nt"&gt;-y&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;hono @hono/node-server
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; typescript @types/node tsx
&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create your &lt;code&gt;.env&lt;/code&gt; file inside &lt;code&gt;whatsapp-relay/&lt;/code&gt; with the following variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Meta WhatsApp Cloud API
&lt;/span&gt;&lt;span class="py"&gt;WHATSAPP_ACCESS_TOKEN&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;        &lt;span class="c"&gt;# Bearer token from Meta App Dashboard
&lt;/span&gt;&lt;span class="s"&gt;WHATSAPP_PHONE_NUMBER_ID=     # Bot's phone number ID&lt;/span&gt;
&lt;span class="py"&gt;WHATSAPP_VERIFY_TOKEN&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;        &lt;span class="c"&gt;# Any string, used for webhook verification handshake
&lt;/span&gt;&lt;span class="s"&gt;WHATSAPP_APP_SECRET=          # App secret for validating webhook signatures&lt;/span&gt;

&lt;span class="c"&gt;# Relay auth
&lt;/span&gt;&lt;span class="py"&gt;RELAY_SECRET&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;                 &lt;span class="c"&gt;# Shared secret, local MCP server sends this in X-Relay-Secret header
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;RELAY_SECRET&lt;/code&gt; is a shared key between the relay and MCP server. Generate something random (&lt;code&gt;openssl rand -hex 32&lt;/code&gt;). It prevents anything on your network from impersonating the MCP server.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Build the WhatsApp webhook relay server
&lt;/h3&gt;

&lt;p&gt;The relay is the only component that faces the internet. It has three jobs: validate incoming WhatsApp webhooks, queue messages for the MCP server, and proxy outbound messages to Meta's API.&lt;/p&gt;

&lt;h4&gt;
  
  
  Webhook signature validation
&lt;/h4&gt;

&lt;p&gt;Every webhook payload from Meta includes an HMAC-SHA256 signature. The relay verifies this before processing anything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createHmac&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;timingSafeEqual&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;node:crypto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;APP_SECRET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WHATSAPP_APP_SECRET&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;verifySignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;header&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;header&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;header&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createHmac&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;APP_SECRET&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;timingSafeEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses &lt;code&gt;timingSafeEqual&lt;/code&gt; to prevent timing attacks, a detail that matters when you're validating signatures from a third party.&lt;/p&gt;
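&lt;p&gt;The same check translates directly to other stacks. In Python, for instance, &lt;code&gt;hmac.compare_digest&lt;/code&gt; plays the role of &lt;code&gt;timingSafeEqual&lt;/code&gt;. This is a sketch for illustration, not part of the relay itself:&lt;/p&gt;

```python
import hashlib, hmac

def verify_signature(raw_body, header, app_secret):
    """Validate Meta's X-Hub-Signature-256 header against the raw request body."""
    if not header:
        return False
    sig = header.replace("sha256=", "")
    expected = hmac.new(app_secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest is constant-time, like Node's timingSafeEqual
    return hmac.compare_digest(sig, expected)
```

&lt;p&gt;Note that it hashes the raw body bytes, exactly as the TypeScript version does; re-serializing parsed JSON would produce a different digest.&lt;/p&gt;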

&lt;h4&gt;
  
  
  Webhook handler: always return 200
&lt;/h4&gt;

&lt;p&gt;Meta uses at-least-once delivery. If your endpoint returns anything other than &lt;code&gt;200&lt;/code&gt;, Meta retries, potentially creating a storm of duplicate events. The relay acknowledges immediately and processes asynchronously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/webhook&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawBody&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;verifySignature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x-hub-signature-256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Still return 200. Returning 4xx causes Meta to retry with the same bad signature.&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;webhook: invalid signature&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBody&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;WaWebhookPayload&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nf"&gt;parseMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;webhook: parse error:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the pattern: even on a bad signature, we return &lt;code&gt;200&lt;/code&gt;. Logging the rejection is enough. Returning &lt;code&gt;4xx&lt;/code&gt; just makes Meta retry with the same bad payload.&lt;/p&gt;
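&lt;p&gt;The check that produces that "invalid signature" log isn't shown above. A minimal sketch, assuming Meta's documented scheme: &lt;code&gt;X-Hub-Signature-256&lt;/code&gt; carries &lt;code&gt;sha256=&lt;/code&gt; plus the hex HMAC-SHA256 of the raw request body, keyed with your app secret. The function name is illustrative:&lt;/p&gt;

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of the webhook signature check, per Meta's documented scheme.
// The raw (unparsed) body must be used: re-serializing parsed JSON can
// change byte order and break the HMAC.
function isValidSignature(
  rawBody: string,
  header: string | undefined,
  appSecret: string,
): boolean {
  if (!header?.startsWith("sha256=")) return false;
  const expected = createHmac("sha256", appSecret).update(rawBody).digest("hex");
  const given = Buffer.from(header.slice("sha256=".length), "hex");
  const want = Buffer.from(expected, "hex");
  // timingSafeEqual throws on length mismatch, so guard first;
  // constant-time compare avoids leaking prefix matches
  return given.length === want.length && timingSafeEqual(given, want);
}
```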

&lt;h4&gt;
  
  
  In-memory message queue with polling
&lt;/h4&gt;

&lt;p&gt;The relay queues validated messages and exposes a polling endpoint for the MCP server. The MCP server passes a cursor (the last message ID it saw) to get only new messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;InboundMessage&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;nextId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MAX_QUEUE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Omit&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;InboundMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;timestamp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;nextId&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;MAX_QUEUE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;MAX_QUEUE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Polling endpoint, protected by relay secret&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/poll&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;since&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;since&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;since&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;since&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cursor&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The relay authenticates all local-facing routes with the shared secret, sent in the &lt;code&gt;x-relay-secret&lt;/code&gt; header. The WhatsApp-facing webhook routes don't use it; they're validated by Meta's HMAC signature instead.&lt;/p&gt;
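&lt;p&gt;The secret check itself can be a constant-time compare; a sketch (the Hono middleware wiring around it is an assumption, not shown in the article):&lt;/p&gt;

```typescript
import { timingSafeEqual } from "node:crypto";

// Sketch of the x-relay-secret check. A constant-time comparison avoids
// leaking how many leading characters of the secret matched.
function relaySecretOk(headerValue: string | undefined, secret: string): boolean {
  if (!headerValue) return false;
  const a = Buffer.from(headerValue);
  const b = Buffer.from(secret);
  // timingSafeEqual requires equal lengths, so check that first
  return a.length === b.length && timingSafeEqual(a, b);
}
```

&lt;p&gt;In Hono this would sit in a small middleware on the local-facing routes that returns &lt;code&gt;401&lt;/code&gt; when the check fails.&lt;/p&gt;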

&lt;h4&gt;
  
  
  Outbound message proxy
&lt;/h4&gt;

&lt;p&gt;When Claude Code wants to reply, it goes through the MCP server, which calls the relay, which calls Meta's API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;WA_API&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`https://graph.facebook.com/v21.0/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;PHONE_NUMBER_ID&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;waApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;WA_API&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;ACCESS_TOKEN&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The relay is built with &lt;a href="https://hono.dev/" rel="noopener noreferrer"&gt;Hono&lt;/a&gt;, a lightweight framework that keeps the code minimal. The full relay is roughly 200 lines and handles text messages, images, documents, audio, video, stickers, reactions, and location shares.&lt;/p&gt;
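&lt;p&gt;The &lt;code&gt;parseMessages&lt;/code&gt; call from the webhook handler isn't shown above. Here's a sketch of what it does for text messages, assuming the standard Cloud API webhook shape; the flattened output type is illustrative, and the real relay enqueues these rather than returning them:&lt;/p&gt;

```typescript
// Sketch of parseMessages. The nested shape follows Meta's Cloud API
// webhook format; only the fields used here are typed.
type WaWebhookPayload = {
  entry?: Array<{
    changes?: Array<{
      value?: {
        contacts?: Array<{ profile?: { name?: string } }>;
        messages?: Array<{
          from: string; // sender phone number
          id: string; // WhatsApp message id ("wamid")
          type: string; // "text", "image", "audio", ...
          text?: { body: string };
        }>;
      };
    }>;
  }>;
};

type ParsedMessage = {
  from: string;
  wamid: string;
  pushName?: string;
  type: string;
  text?: string;
};

function parseMessages(payload: WaWebhookPayload): ParsedMessage[] {
  const out: ParsedMessage[] = [];
  for (const entry of payload.entry ?? []) {
    for (const change of entry.changes ?? []) {
      const value = change.value;
      // the sender's display name rides along in a parallel contacts array
      const pushName = value?.contacts?.[0]?.profile?.name;
      for (const m of value?.messages ?? []) {
        out.push({ from: m.from, wamid: m.id, pushName, type: m.type, text: m.text?.body });
      }
    }
  }
  return out;
}
```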

&lt;h3&gt;
  
  
  Step 3: Build the MCP server for Claude Code
&lt;/h3&gt;

&lt;p&gt;The MCP server is the bridge between the relay and Claude Code. It polls the relay for incoming WhatsApp messages and exposes tools that Claude Code can call to respond.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tool definitions
&lt;/h4&gt;

&lt;p&gt;The server registers four tools with Claude Code via the Model Context Protocol:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;whatsapp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="na"&gt;experimental&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude/channel&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;The sender reads WhatsApp, not this session.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Anything you want them to see must go through the reply tool.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Messages arrive as &amp;lt;channel source="whatsapp" chat_id="..." wamid="..." user="..." ts="..."&amp;gt;.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Reply with the reply tool. Pass chat_id (phone number) back.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WhatsApp has a 24-hour session window: you can only send free-form messages&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;within 24 hours of the user's last message.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;instructions&lt;/code&gt; field tells Claude Code how to interpret incoming messages and that it must use the &lt;code&gt;reply&lt;/code&gt; tool to send anything back. Without this, the model might try to respond in its own transcript, which the WhatsApp user would never see.&lt;/p&gt;

&lt;p&gt;The four tools are &lt;code&gt;reply&lt;/code&gt; (send text), &lt;code&gt;react&lt;/code&gt; (emoji reactions), &lt;code&gt;mark_read&lt;/code&gt; (read receipts), and &lt;code&gt;send_media&lt;/code&gt; (images, documents, audio, video). Here's the reply tool definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;reply&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Reply on WhatsApp. Pass chat_id (phone number) from the inbound message.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nl"&gt;chat_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Phone number to send to&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Message text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="nx"&gt;reply_to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wamid to quote-reply to (optional)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="nx"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chat_id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Polling loop with cursor persistence
&lt;/h4&gt;

&lt;p&gt;The MCP server polls the relay every 2 seconds and forwards new messages to Claude Code as channel notifications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CURSOR_FILE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HOME&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/tmp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;.whatsapp-relay-cursor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;loadCursor&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;relay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/poll?since=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;newCursor&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;InboundMessage&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
      &lt;span class="nl"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;meta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;chat_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;wamid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;wamid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pushName&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;

      &lt;span class="nx"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;notifications/claude/channel&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s2"&gt;`(&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;)`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nx"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newCursor&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;newCursor&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nf"&gt;saveCursor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`whatsapp channel: poll error: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cursor persists to disk (&lt;code&gt;~/.whatsapp-relay-cursor&lt;/code&gt;), so restarting the MCP server doesn't re-process old messages. Each message becomes a channel notification that Claude Code sees as a new input, including the sender's phone number, display name, timestamp, and message type as metadata.&lt;/p&gt;
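&lt;p&gt;&lt;code&gt;loadCursor&lt;/code&gt; and &lt;code&gt;saveCursor&lt;/code&gt; aren't shown above; one plausible implementation (the optional &lt;code&gt;file&lt;/code&gt; parameter is an addition for testability, and sync I/O is fine at this scale):&lt;/p&gt;

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Sketch of the cursor persistence helpers the poll loop calls.
const CURSOR_FILE = join(process.env.HOME || "/tmp", ".whatsapp-relay-cursor");

function loadCursor(file: string = CURSOR_FILE): number {
  try {
    return Number(readFileSync(file, "utf8").trim()) || 0;
  } catch {
    return 0; // first run: no cursor file yet, start from the beginning
  }
}

function saveCursor(cursor: number, file: string = CURSOR_FILE): void {
  writeFileSync(file, String(cursor), "utf8");
}
```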

&lt;h3&gt;
  
  
  Step 4: Register the MCP server with Claude Code
&lt;/h3&gt;

&lt;p&gt;Create a &lt;code&gt;.mcp.json&lt;/code&gt; file in your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"whatsapp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"--import"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tsx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"whatsapp.ts"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. When Claude Code starts in this directory, it discovers the MCP server, launches it as a child process via stdio, and the WhatsApp channel becomes available. Claude Code now receives WhatsApp messages as channel notifications and can call the &lt;code&gt;reply&lt;/code&gt;, &lt;code&gt;react&lt;/code&gt;, &lt;code&gt;mark_read&lt;/code&gt;, and &lt;code&gt;send_media&lt;/code&gt; tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Configure the Arcade gateway and connect it to Claude Code
&lt;/h3&gt;

&lt;p&gt;Before the assistant can access business tools, you need to create an Arcade gateway that defines which tools the agent can use and with what permissions.&lt;/p&gt;

&lt;p&gt;Log into the &lt;a href="https://app.arcade.dev/" rel="noopener noreferrer"&gt;Arcade dashboard&lt;/a&gt;, create a new gateway, and add the MCP servers for the services your assistant needs: Google Calendar, Gmail, Slack, and any others relevant to your workflows. For each server, select only the specific tools you want the agent to access. This is where you scope permissions. If the meeting-prep skill only needs to list calendar events and search email, there's no reason to expose tools that delete events or send email on your behalf.&lt;/p&gt;

&lt;p&gt;Once the gateway is created, register it with Claude Code from the command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add &lt;span class="s1"&gt;'arcade-gateway'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--transport&lt;/span&gt; http &lt;span class="s1"&gt;'https://api.arcade.dev/mcp/&amp;lt;your-gateway-slug&amp;gt;'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;your-arcade-api-key&amp;gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Arcade-User-ID: &amp;lt;your-email&amp;gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This writes the gateway configuration to &lt;code&gt;~/.claude.json&lt;/code&gt;. Claude Code now has two MCP servers: the local WhatsApp channel server (from &lt;code&gt;.mcp.json&lt;/code&gt; in the project) and the remote Arcade gateway (from &lt;code&gt;~/.claude.json&lt;/code&gt;). The WhatsApp server handles messaging. The Arcade gateway handles business tool access with per-action authorization.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;Arcade-User-ID&lt;/code&gt; header tells Arcade which user's credentials to use when executing tool calls. In the single-user setup, this is your email. In the multi-user architecture described later, the orchestrator passes a different user ID per session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Create a meeting-prep skill with Arcade tools
&lt;/h3&gt;

&lt;p&gt;With the channel wired up, the assistant needs capabilities. This is where tools and skills work together. Arcade provides secure access to business tools (Google Calendar, Gmail, Slack), and skills tell the agent how to use those tools to accomplish a specific workflow.&lt;/p&gt;

&lt;p&gt;Skills in Claude Code are markdown files. No code, no deployment, just a structured prompt that encodes domain expertise. Here's the structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skills/
└── meeting-prep/
    └── SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skill file has two parts: frontmatter that tells Claude Code when to activate it, and a body that defines the workflow. Here's the meeting-prep skill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;meeting-prep&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;Prepare&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;briefings&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;upcoming&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;customer&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;meetings&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;by&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;reading"&lt;/span&gt;
  &lt;span class="s"&gt;your Google Calendar, identifying external/customer meetings (based on&lt;/span&gt;
  &lt;span class="s"&gt;attendee email domains), then pulling relevant context from Gmail threads&lt;/span&gt;
  &lt;span class="s"&gt;and Slack conversations."&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Meeting Prep&lt;/span&gt;

You are a meeting preparation assistant. Your job is to create concise,
actionable briefings for upcoming external meetings.

&lt;span class="gu"&gt;## Customer Directory&lt;/span&gt;
Read the centralized client registry at &lt;span class="sb"&gt;`$AGENT_DATA_DIR/clients.md`&lt;/span&gt;.
Use it to match calendar attendee domains to known customers, find the
correct Slack channel, and locate customer-specific data files.

&lt;span class="gu"&gt;## Phase 1: Discover (Find the Meeting)&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Search Google Calendar using &lt;span class="sb"&gt;`list_events`&lt;/span&gt; for the relevant time window
&lt;span class="p"&gt;-&lt;/span&gt; Identify external meetings by checking attendee email domains
&lt;span class="p"&gt;-&lt;/span&gt; Any attendee whose domain is NOT your organization signals an external meeting

&lt;span class="gu"&gt;## Phase 2: Gather (Pull Context from Email and Slack)&lt;/span&gt;

&lt;span class="gu"&gt;### Email Context (Gmail)&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Search for recent threads involving external attendees (last 30 days)
&lt;span class="p"&gt;2.&lt;/span&gt; Read the 3-5 most relevant threads, looking for decisions, action items, tone
&lt;span class="p"&gt;3.&lt;/span&gt; Check the calendar event itself for agenda or documents

&lt;span class="gu"&gt;### Slack Context&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; If there's a dedicated customer channel, read recent messages there
&lt;span class="p"&gt;2.&lt;/span&gt; Otherwise search by company name or contact names (last 2 weeks)
&lt;span class="p"&gt;3.&lt;/span&gt; Look for internal context not in email: concerns, feature requests, deal status

&lt;span class="gu"&gt;## Phase 3: Brief (Deliver the Prep)&lt;/span&gt;

&lt;span class="gu"&gt;### Meeting Briefing: [Title]&lt;/span&gt;
&lt;span class="gs"&gt;**When:**&lt;/span&gt; [Date &amp;amp; Time]
&lt;span class="gs"&gt;**With:**&lt;/span&gt; [Attendees + roles/company]
&lt;span class="gs"&gt;**Meeting type:**&lt;/span&gt; [Quarterly review, Demo, Follow-up, Intro call]

&lt;span class="gs"&gt;**Quick Context:**&lt;/span&gt; 2-3 sentences on where things stand
&lt;span class="gs"&gt;**Recent History:**&lt;/span&gt; Chronological recap of last interactions
&lt;span class="gs"&gt;**Key Things to Know:**&lt;/span&gt; Open items, concerns, opportunities
&lt;span class="gs"&gt;**Suggested Talking Points:**&lt;/span&gt; 3-5 practical conversation starters
&lt;span class="gs"&gt;**People Notes:**&lt;/span&gt; Brief note on new stakeholders or unfamiliar attendees
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skill tells the agent exactly which Arcade-powered tools to use (&lt;code&gt;list_events&lt;/code&gt;, &lt;code&gt;search_messages&lt;/code&gt;, &lt;code&gt;read_thread&lt;/code&gt;), in what order, what signals to look for in the results, and how to format the output. The customer directory lookup means the agent doesn't waste tokens fuzzy-matching company names. It goes straight to the right email domain and Slack channel.&lt;/p&gt;

&lt;p&gt;When a user texts "prep me for my 2pm" on WhatsApp, Claude Code receives the message via the channel, activates this skill, runs the three-phase workflow through Arcade's tools, and sends the briefing back via the WhatsApp reply tool. The whole flow, from WhatsApp message to structured briefing, happens without the user leaving the chat.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: Run and test the WhatsApp assistant locally
&lt;/h3&gt;

&lt;p&gt;Start everything in order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1: Start the relay server&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;whatsapp-relay
node &lt;span class="nt"&gt;--import&lt;/span&gt; tsx relay.ts
&lt;span class="c"&gt;# → "whatsapp relay listening on :3000"&lt;/span&gt;

&lt;span class="c"&gt;# Terminal 2: Expose the relay via ngrok&lt;/span&gt;
ngrok http 3000
&lt;span class="c"&gt;# → Copy the https:// forwarding URL&lt;/span&gt;

&lt;span class="c"&gt;# Terminal 3: Start Claude Code from the project root&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;whatsapp-assistant
claude &lt;span class="nt"&gt;--dangerously-load-development-channels&lt;/span&gt; server:whatsapp
&lt;span class="c"&gt;# Claude Code discovers .mcp.json and launches the MCP server&lt;/span&gt;
&lt;span class="c"&gt;# → "whatsapp channel: connected, polling http://localhost:3000 every 2000ms"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register your webhook with Meta by going to your app in the &lt;a href="https://developers.facebook.com/" rel="noopener noreferrer"&gt;Meta Developer Dashboard&lt;/a&gt;, then navigating to WhatsApp, Configuration, Webhook. Set the Callback URL to your ngrok URL plus &lt;code&gt;/webhook&lt;/code&gt; (e.g., &lt;code&gt;https://abc123.ngrok.io/webhook&lt;/code&gt;), set the Verify Token to the value in your &lt;code&gt;.env&lt;/code&gt; file, and subscribe to the &lt;code&gt;messages&lt;/code&gt; webhook field.&lt;/p&gt;
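&lt;p&gt;For reference, the verification handshake the relay has to answer can be sketched as a pure function. The query parameter names (&lt;code&gt;hub.mode&lt;/code&gt;, &lt;code&gt;hub.verify_token&lt;/code&gt;, &lt;code&gt;hub.challenge&lt;/code&gt;) are Meta's; the handler shape itself is illustrative, not the relay's actual code:&lt;br&gt;
&lt;/p&gt;

```typescript
// Sketch of Meta's GET /webhook verification handshake. Meta sends
// hub.mode, hub.verify_token, and hub.challenge as query parameters.
type VerifyResult = { status: number; body: string };

function handleVerification(
  params: Record<string, string | undefined>,
  expectedToken: string
): VerifyResult {
  if (params["hub.mode"] === "subscribe" && params["hub.verify_token"] === expectedToken) {
    // Echo the challenge back so Meta marks the callback URL as verified.
    return { status: 200, body: params["hub.challenge"] ?? "" };
  }
  return { status: 403, body: "verification failed" };
}
```

&lt;p&gt;If the tokens don't match, Meta shows a verification error in the dashboard and won't deliver events.&lt;/p&gt;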

&lt;p&gt;Now send a message from your phone to the WhatsApp Business number. You should see it flow through the relay, into the MCP server, and appear in Claude Code. Claude Code processes it and sends a reply back through the same chain.&lt;/p&gt;

&lt;p&gt;Try texting "prep me for my next meeting." The first time Claude Code calls an Arcade-powered tool (like reading your calendar), Arcade prints an authorization URL in the terminal. Open it in your browser and authenticate with the relevant account (Google, Slack, etc.). This is a one-time step per service. After that, Arcade manages token refresh automatically.&lt;/p&gt;

&lt;p&gt;If you have the meeting-prep skill configured and Google Calendar / Gmail connected through Arcade, you'll get back a structured briefing right in WhatsApp.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling from single-user to multi-user: What changes in the architecture
&lt;/h2&gt;

&lt;p&gt;Everything above runs as a single user. One Claude Code instance, one set of Arcade credentials, one identity context. Here's what breaks when a second user messages the bot, and what you need to change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a single Claude Code instance doesn't work for multiple users
&lt;/h3&gt;

&lt;p&gt;The single-user setup has an implicit assumption: every WhatsApp message belongs to you. When Claude Code calls an Arcade tool like &lt;code&gt;list_events&lt;/code&gt;, Arcade uses the credentials you authenticated during setup. There's no user identifier in the call.&lt;/p&gt;

&lt;p&gt;If User 2 messages the same bot, Claude Code still calls Arcade with your credentials. User 2 gets your calendar. Worse, Claude Code runs in a single conversation context. User 1's meeting briefing (deal terms, internal Slack messages, revenue numbers) is sitting in the context window when User 2's message arrives. A prompt injection from User 2 could surface User 1's data. Arcade secured the credentials correctly, but the shared context window breaks tenant isolation.&lt;/p&gt;

&lt;p&gt;You need two things: separate agent instances so context never crosses between users, and per-user credential routing so Arcade knows whose calendar to read.&lt;/p&gt;

&lt;h3&gt;
  
  
  The multi-user architecture
&lt;/h3&gt;

&lt;p&gt;The relay server, MCP tool schemas (reply, react, send_media), and skills stay identical. What changes is the orchestration layer.&lt;/p&gt;

&lt;p&gt;The single-user version uses Claude Code CLI with its built-in channels feature. For multi-user, you build a custom orchestrator using the &lt;a href="https://platform.claude.com/docs/en/agent-sdk/overview" rel="noopener noreferrer"&gt;Claude Agent SDK&lt;/a&gt;. The SDK doesn't have native channel support, but it gives you sessions, hooks, tool permissions, and MCP connections: the building blocks to replicate what channels do for a single user across many users.&lt;/p&gt;

&lt;p&gt;The relay server becomes a router. When a message arrives from +1111, the orchestrator looks up which agent session owns that phone number and routes the message there. When +2222 messages, it routes to a different session. Each session has its own context window, its own MCP server instance, and its own Arcade user context. No data crosses between them.&lt;/p&gt;
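&lt;p&gt;The routing itself is small. In this sketch, &lt;code&gt;Session&lt;/code&gt; and &lt;code&gt;createSession&lt;/code&gt; are illustrative stand-ins for whatever your Agent SDK wrapper provides:&lt;br&gt;
&lt;/p&gt;

```typescript
// Minimal sketch of per-sender session routing.
type Session = { userId: string; handle: (text: string) => void };

const sessions = new Map<string, Session>(); // keyed by WhatsApp sender ID

function routeMessage(
  senderId: string,
  text: string,
  createSession: (id: string) => Session
): void {
  let session = sessions.get(senderId);
  if (!session) {
    // First message from this number: create an isolated session with its
    // own context window, MCP server instance, and Arcade user identity.
    session = createSession(senderId);
    sessions.set(senderId, session);
  }
  session.handle(text);
}
```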

&lt;p&gt;Credential routing works through Arcade's &lt;code&gt;user_id&lt;/code&gt; parameter on tool calls. Each user goes through the Arcade browser auth flow once (the same authorization URL step from the single-user setup). After that, when the orchestrator calls an Arcade tool on behalf of User 2, it passes User 2's identity. Arcade resolves the correct OAuth grants, mints a scoped token for that specific action, and executes the call. User 2's calendar request returns User 2's calendar. For a full walkthrough of how this authorization model works across frameworks, see &lt;a href="https://blog.arcade.dev/sso-for-ai-agents-authentication-and-authorization-guide" rel="noopener noreferrer"&gt;SSO for AI Agents: Authentication and Authorization Guide&lt;/a&gt;.&lt;/p&gt;
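&lt;p&gt;At the HTTP layer, the only thing that changes per session is the user header. The header names below match the &lt;code&gt;claude mcp add&lt;/code&gt; command from the single-user setup; everything else is an illustrative sketch:&lt;br&gt;
&lt;/p&gt;

```typescript
// Build per-session headers for the Arcade gateway.
function arcadeHeaders(apiKey: string, arcadeUserId: string): Record<string, string> {
  return {
    Authorization: `Bearer ${apiKey}`,
    // Tells Arcade whose OAuth grants to resolve for this tool call.
    "Arcade-User-ID": arcadeUserId,
  };
}
```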

&lt;p&gt;The identity pairing itself is straightforward. Map each WhatsApp sender ID to a corporate identity using a one-time verification flow: send a code via a WhatsApp Authentication Template, have the user confirm it in a web portal, and store the mapping.&lt;/p&gt;
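&lt;p&gt;A minimal sketch of that pairing flow (code delivery via the Authentication Template and the persistent store are outside the sketch; the function names are illustrative):&lt;br&gt;
&lt;/p&gt;

```typescript
import { randomInt } from "node:crypto";

// One-time pairing: map a WhatsApp sender ID to a corporate identity.
const pendingCodes = new Map<string, string>(); // waId -> verification code
const identityMap = new Map<string, string>();  // waId -> corporate email

function startPairing(waId: string): string {
  const code = String(randomInt(100000, 1000000)); // 6-digit code
  pendingCodes.set(waId, code);
  return code; // deliver via a WhatsApp Authentication Template
}

function confirmPairing(waId: string, code: string, email: string): boolean {
  if (pendingCodes.get(waId) !== code) return false;
  pendingCodes.delete(waId);
  identityMap.set(waId, email); // persist this mapping in production
  return true;
}
```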

&lt;p&gt;Arcade handles the rest of the multi-user complexity: per-user OAuth token exchange and just-in-time grants for credential delegation, scoped tool execution that prevents cross-tenant data access, a versioned tool registry that doesn't break when upstream APIs change, and structured audit logs tied to the specific user and action. These are the same four pitfalls from earlier. They all get harder at multi-user scale, and Arcade handles them natively.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production readiness checklist for AI agents
&lt;/h2&gt;

&lt;p&gt;Before you move beyond local use, gut-check these five things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Credential isolation&lt;/strong&gt;. Can the LLM see your auth tokens? If yes, stop. The architecture needs just-in-time, per-action authorization where the model never touches long-lived credentials. Standing service account privileges are a non-starter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool reliability&lt;/strong&gt;. Are your tools agent-optimized or naive REST wrappers? If the model has to guess complex payload parameters and brute-force retries, you'll hit failures that are invisible until production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versioning and rollbacks&lt;/strong&gt;. Can you update a tool without breaking the running assistant? If one upstream API change takes down your agent, you need a versioned registry with safe deprecation periods.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditability&lt;/strong&gt;. Can you trace every action back to the specific human who requested it? If not, you fail SOC2 and ISO27001. You need immutable logs with user IDs, tool names, and sanitized parameters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer time allocation.&lt;/strong&gt; Are your engineers building OAuth plumbing and webhook retry logic, or building skills and workflows? If it's the former, the architecture is too low-level.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;You now have a working WhatsApp assistant. A relay handling Meta's webhooks. An MCP server bridging to Claude Code. A meeting-prep skill that turns "prep me for my 2pm" into a structured briefing pulled from your calendar, email, and Slack.&lt;/p&gt;

&lt;p&gt;The interesting part is what comes next. The relay and MCP server are infrastructure you write once. The skills are where the ongoing value lives, and anyone on the team can write them. Meeting prep was the first one I built. Expense report summaries, daily standups, customer check-in reminders: same pattern, different markdown file.&lt;/p&gt;

&lt;p&gt;For multi-user deployments, the &lt;a href="https://platform.claude.com/docs/en/agent-sdk/overview" rel="noopener noreferrer"&gt;Claude Agent SDK&lt;/a&gt; gives you the building blocks to orchestrate per-user agent sessions, with the relay routing messages and &lt;a href="https://www.arcade.dev/" rel="noopener noreferrer"&gt;Arcade&lt;/a&gt; handling per-user credential delegation, tenant isolation, and audit logging. You focus on skills, not infrastructure.&lt;/p&gt;

&lt;p&gt;The code from this guide is on &lt;a href="https://github.com/manveer/whatsapp-assistant" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Fork it and build something useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an always-on AI executive assistant?
&lt;/h3&gt;

&lt;p&gt;An always-on assistant runs continuously and interacts through messaging channels like WhatsApp or Slack. It maintains state across conversations and takes actions in connected business tools asynchronously, without needing a browser tab open.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the risks of using OpenClaw for an AI agent?
&lt;/h3&gt;

&lt;p&gt;Self-hosted OpenClaw deployments commonly rely on shared machine credentials, fragile scripts, and ungoverned tool wrappers. This creates a high risk of token leakage, unreliable tool calls, context bloat, and missing audit trails required for compliance.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you prevent an agent from having god-mode access to company systems?
&lt;/h3&gt;

&lt;p&gt;Use runtime, per-action authorization with just-in-time, short-lived grants (e.g., OAuth token exchange). The agent never holds broad or long-lived credentials, and every action is evaluated against the requesting user's permissions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Arcade and how does it secure AI agent tool access?
&lt;/h3&gt;

&lt;p&gt;Arcade is a runtime that sits between an AI agent and your business tools. Instead of giving the agent stored credentials, Arcade evaluates each tool call against the requesting user's permissions, mints a just-in-time token scoped to that action, executes the call, and logs the result. It also provides agent-optimized integrations that return summarized data instead of raw API responses. For a full overview, see &lt;a href="https://docs.arcade.dev/en/get-started/about-arcade" rel="noopener noreferrer"&gt;How Arcade works&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is it safe to give an AI agent access to my Google Calendar and email?
&lt;/h3&gt;

&lt;p&gt;Not if the agent holds long-lived OAuth tokens or API keys directly. A prompt injection or compromised dependency can exfiltrate those credentials and access everything the agent can reach. The safe approach is per-action authorization: a runtime like Arcade mints a short-lived, scoped token for each specific action and revokes it immediately after, limiting the blast radius to a single call.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does the relay server handle duplicate WhatsApp webhooks?
&lt;/h3&gt;

&lt;p&gt;WhatsApp delivers events with at-least-once semantics. The relay returns &lt;code&gt;200 OK&lt;/code&gt; immediately (even on bad signatures) to prevent retry storms, and processes messages asynchronously. For production use, add a deduplication store like Redis keyed by message ID.&lt;/p&gt;
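&lt;p&gt;A minimal in-memory stand-in for that dedup store, keyed by message ID with a TTL so the set doesn't grow without bound:&lt;br&gt;
&lt;/p&gt;

```typescript
// In-memory stand-in for a Redis dedup store keyed by WhatsApp message ID.
const seen = new Map<string, number>(); // messageId -> expiry (ms since epoch)
const TTL_MS = 24 * 60 * 60 * 1000;    // keep IDs for a day

function isDuplicate(messageId: string, now: number = Date.now()): boolean {
  // Lazily evict expired entries.
  for (const [id, expiry] of seen) {
    if (expiry <= now) seen.delete(id);
  }
  if (seen.has(messageId)) return true;
  seen.set(messageId, now + TTL_MS);
  return false;
}
```

&lt;p&gt;In production, the same check is a single atomic Redis operation, which also survives relay restarts.&lt;/p&gt;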

&lt;h3&gt;
  
  
  What is WhatsApp's 24-hour messaging window?
&lt;/h3&gt;

&lt;p&gt;Free-form replies are allowed within 24 hours of the user's last message. Proactive messages outside that window must use pre-approved WhatsApp message templates (HSM templates). For an 8 AM morning brief, you'd need an approved template.&lt;/p&gt;
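&lt;p&gt;The payload shape for a template send follows the WhatsApp Cloud API; &lt;code&gt;morning_brief&lt;/code&gt; and the recipient below are placeholders for an approved template name and a real number:&lt;br&gt;
&lt;/p&gt;

```typescript
// Shape of a proactive template send outside the 24-hour window.
function buildTemplateMessage(to: string, templateName: string, langCode = "en_US") {
  return {
    messaging_product: "whatsapp",
    to,
    type: "template",
    template: { name: templateName, language: { code: langCode } },
  };
}
// POST this JSON to Meta's /{phone-number-id}/messages endpoint with your token.
```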

&lt;h3&gt;
  
  
  Can I use this architecture with models other than Claude?
&lt;/h3&gt;

&lt;p&gt;Yes. The relay server and MCP protocol are model-agnostic. The relay handles WhatsApp I/O, and the MCP server defines tools via a standard protocol. You could swap Claude Code for any MCP-compatible runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I add new skills or workflows to a Claude Code agent?
&lt;/h3&gt;

&lt;p&gt;Create a new directory under &lt;code&gt;skills/&lt;/code&gt; with a &lt;code&gt;SKILL.md&lt;/code&gt; file. The skill's frontmatter description tells Claude Code when to activate it. Skills are just structured prompts, no code deployment required.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>openclaw</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Best Openclaw Alternatives For Secure, Fully Managed Agents (2026 Buyer's Guide)</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Thu, 02 Apr 2026 17:55:33 +0000</pubDate>
      <link>https://forem.com/manveer_chawla_64a7283d5a/best-openclaw-alternatives-for-secure-fully-managed-agents-2026-buyers-guide-34eg</link>
      <guid>https://forem.com/manveer_chawla_64a7283d5a/best-openclaw-alternatives-for-secure-fully-managed-agents-2026-buyers-guide-34eg</guid>
      <description>&lt;p&gt;OpenClaw is the most capable open-source personal AI agent framework available right now. But deploying it in production comes with a real cost: self-hosting means you're managing VPSs, maintaining Docker container orchestration, and debugging OAuth authentication flows. Every week, indefinitely. &lt;/p&gt;

&lt;p&gt;This guide evaluates the top alternatives across two categories to help you escape that burden: fully managed OpenClaw hosting providers and general personal AI assistants.&lt;/p&gt;

&lt;p&gt;We wrote this guide for technical but time-poor users (think software developers and product managers), alongside execution-focused operators like growth hackers and agency coordinators. If you need immediate, secure results from an autonomous agent without turning AI deployment into an ongoing maintenance project, this guide is for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR: Best OpenClaw alternatives in 2026
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Quick decision framework:&lt;/strong&gt; Choose managed OpenClaw hosting to keep OpenClaw's full architecture, including model flexibility, custom code execution, and BYOK support, on production-grade infrastructure. Choose a general assistant if you're willing to trade developer-level control for a broader feature set or a different workflow paradigm. Avoid raw self-hosted OpenClaw unless you have dedicated DevOps and security resources.&lt;/p&gt;

&lt;p&gt;We evaluated each alternative on security architecture, setup speed, model flexibility, and native integrations. Here's where each one lands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Best for secure, always-on OpenClaw agents in production:&lt;/strong&gt; &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw offers setup in under two minutes&lt;/a&gt;, with five-layer tenant isolation, Firecracker VM boundaries, AES-256 encrypted credential vaults, no SSH access, tool allow-lists, and pre-built tool integrations without any infrastructure management.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Best for Anthropic-ecosystem desktop automation:&lt;/strong&gt; Claude Cowork works best for users who want an autonomous desktop agent with file access, scheduled tasks, and computer use capabilities. It's powerful for local workflow automation but runs exclusively on your desktop, not on a remote cloud host, and is locked to Anthropic's model ecosystem.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Best for managed multi-model orchestration, if you don't need model control or BYOK:&lt;/strong&gt; Perplexity Computer orchestrates 19 AI models across 400+ app integrations for complex, multi-step tasks. It's powerful out of the box but doesn't offer manual model selection or BYOK, and its opinionated framework is a significant departure from OpenClaw's open architecture.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Best for no-code, multi-channel workflow automation&lt;/strong&gt;: Lindy AI serves users who want a visual builder with 5,000+ integrations, AI phone agents, and cloud-based computer use. It supports multiple models but lacks OpenClaw's raw script execution and developer-level customizability.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Avoid for most business production use:&lt;/strong&gt; Skip raw self-hosted OpenClaw on an unmanaged VPS unless you have dedicated SecOps/DevOps resources and can ensure strong sandboxing. The architecture demands excessive security patching, continuous dependency updates, and constant third-party API maintenance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why self-hosting OpenClaw is risky and expensive
&lt;/h2&gt;

&lt;p&gt;Setting up OpenClaw isn't as simple as cloning a repository and running a single command. You've got to provision a VPS with adequate memory, install the correct runtime environments, and manage multiple Docker containers for the gateway and CLI. You need to configure reverse proxies like Nginx to handle secure WebSocket connections, manage persistent storage volumes for memory files, and monitor system resources.&lt;/p&gt;

&lt;p&gt;And when an update introduces breaking changes to node dependencies? You're the one bringing the agent back online.&lt;/p&gt;

&lt;h3&gt;
  
  
  The always-on problem
&lt;/h3&gt;

&lt;p&gt;Running an agent locally creates an always-on problem. If the agent lives on your laptop, your autonomous workflows die the moment you close the lid. Moving the agent to a cloud server solves the uptime issue, but turns you into a part-time sysadmin who monitors logs and server health.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration fragility
&lt;/h3&gt;

&lt;p&gt;Third-party integrations require maintaining fragile OAuth flows.&lt;/p&gt;

&lt;p&gt;Google Workspace &lt;a href="https://developers.google.com/identity/protocols/oauth2" rel="noopener noreferrer"&gt;limits applications to one hundred refresh tokens&lt;/a&gt; per Google Account, automatically invalidating the oldest token without warning when the limit is reached. If your application remains in testing status, Google issues refresh tokens that expire after just seven days.&lt;/p&gt;

&lt;p&gt;GitHub recently &lt;a href="https://github.blog/changelog/2025-09-29-strengthening-npm-security-important-changes-to-authentication-and-token-management/" rel="noopener noreferrer"&gt;reduced the default lifespan of new granular access tokens to seven days&lt;/a&gt;. That forces self-hosted users to regenerate and update credentials just to keep basic repository reads working.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt injection risk
&lt;/h3&gt;

&lt;p&gt;Because agents take autonomous action, an injection attack no longer stops at generating inaccurate text. It also executes harmful commands. An agent reading a malicious email or scanning a compromised public repository can be tricked into exfiltrating private data. &lt;/p&gt;

&lt;p&gt;Recent exploits illustrate just how real this is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;a href="https://nvd.nist.gov/vuln/detail/cve-2025-32711" rel="noopener noreferrer"&gt;EchoLeak vulnerability in Microsoft 365 Copilot&lt;/a&gt; showed that a single crafted email could trigger zero-click remote data exfiltration without any user interaction.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In another instance, prompt injection embedded in public repository code comments instructed an AI coding assistant to modify configuration files, enabling &lt;a href="https://embracethered.com/blog/posts/2025/github-copilot-remote-code-execution-via-prompt-injection/" rel="noopener noreferrer"&gt;remote code execution&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Security researchers report these attacks &lt;a href="https://www.vectra.ai/topics/prompt-injection" rel="noopener noreferrer"&gt;succeed between 50% and 84% of the time in agentic systems&lt;/a&gt;. That makes unmanaged agents a massive liability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Credential exposure
&lt;/h3&gt;

&lt;p&gt;Giving open-source frameworks direct access to production APIs, internal password vaults, or payment infrastructure without a dedicated security layer creates critical risk. Storing raw access tokens in plain text environment files on a standard server exposes your most sensitive financial and operational data to anyone who breaches the system.&lt;/p&gt;

&lt;p&gt;Hosted solutions reduce this risk with enterprise-grade managed vaults, encrypted storage at rest, and controlled payment mechanisms like KiloClaw's AgentCard, which limits financial exposure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unrestricted agent access
&lt;/h3&gt;

&lt;p&gt;Granting SSH access to a VPS running an autonomous agent creates unacceptable risk for any serious business or IT team. SSH access exposes the underlying operating system to direct attack, allowing compromised containers to pivot and access the host kernel. This architecture circumvents proper auditing, logging, and security controls.&lt;/p&gt;

&lt;p&gt;Without strict tool allow-listing, an agent can become a powerful internal attack vector. The principle of least privilege must apply to AI. The platform must enforce strict permissions, so the agent can only access tools, channels, and functions that a human administrator has explicitly authorized.&lt;/p&gt;

&lt;h3&gt;
  
  
  When self-hosting OpenClaw still makes sense
&lt;/h3&gt;

&lt;p&gt;There are narrow scenarios where self-hosting remains the right call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Academic researchers testing experimental local models in air-gapped environments without internet access can safely self-host. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hobbyists who enjoy tinkering with complex Docker configurations, managing Linux networking, and debugging dependency trees will find the open-source repository rewarding.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Organizations with dedicated security operations teams that require custom hardware deployments for strict compliance and data residency reasons may still choose to build their own internal infrastructure around the open-source core.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to evaluate OpenClaw alternatives for security and production readiness
&lt;/h2&gt;

&lt;p&gt;To evaluate managed alternatives, look beyond marketing claims. Assess how each platform abstracts infrastructure, enforces security, and reduces daily friction to determine if it actually replaces self-hosting. Here are the four criteria that matter most.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Security and isolation features
&lt;/h3&gt;

&lt;p&gt;The platform's security architecture separates a toy deployment from a production-grade agent.&lt;/p&gt;

&lt;p&gt;Check whether the platform enforces strict tool allow-listing by default. An agent should never have implicit access to your entire digital workspace. Restrict its reach to prevent rogue actions or accidental deletions.&lt;/p&gt;

&lt;p&gt;Next, check how the platform manages secrets. Storing application keys in plain-text files is obsolete; the platform should keep access tokens in encrypted, managed vaults and block direct SSH access to the server.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Setup speed
&lt;/h3&gt;

&lt;p&gt;The main reason to abandon self-hosting is to reclaim your time. So measure how long it takes to go from creating an account to running your first workflow.&lt;/p&gt;

&lt;p&gt;A premium managed alternative should eliminate provisioning entirely. Check whether complex integrations, like connecting to Google Workspace, Telegram, or GitHub, are handled via guided one-click authorization flows.&lt;/p&gt;

&lt;p&gt;If a platform still requires you to generate webhooks and paste callback URLs into a configuration dashboard, it hasn't solved the friction.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Model flexibility
&lt;/h3&gt;

&lt;p&gt;The AI landscape moves fast. Locking your autonomous workflows into a single proprietary provider creates real risk. If your chosen vendor experiences an outage or degrades its model's reasoning capabilities, your entire agentic workforce halts.&lt;/p&gt;

&lt;p&gt;Check whether the platform lets you choose your preferred model or bring your own API keys from providers like OpenAI, Anthropic, or Google. Evaluate whether you can select the right model for your workload, whether that's a frontier reasoning model for complex tasks or a cost-effective open-weight model for high-volume processing.&lt;/p&gt;

&lt;p&gt;True model flexibility means you're never locked into a single vendor. You can optimize for cost, context window limits, and data privacy by selecting the best model for the job, not the only model the platform allows.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Native integrations
&lt;/h3&gt;

&lt;p&gt;An autonomous agent is only as useful as the systems it can influence.&lt;/p&gt;

&lt;p&gt;Check whether the agent connects natively to your actual work channels, like Slack, Discord, or Telegram. Beyond communication, evaluate whether the platform can execute real-world actions securely: deep file search across Google Drive and GitHub, updating CRM records, and executing controlled financial payments through isolated, platform-managed debit cards.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw alternatives comparison table (2026)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Alternative&lt;/th&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Model flexibility&lt;/th&gt;
&lt;th&gt;Security model&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;th&gt;Migration effort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KiloClaw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed OpenClaw&lt;/td&gt;
&lt;td&gt;Always-on secure multi-channel agents with zero infrastructure and full model control&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;5-layer tenant isolation, Firecracker VMs, encrypted vaults, no SSH, independently audited&lt;/td&gt;
&lt;td&gt;$9/mo + inference at zero markup&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;xCloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed OpenClaw&lt;/td&gt;
&lt;td&gt;Managed OpenClaw hosting with automatic updates, no native multi-platform integrations&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Managed security defaults, isolated environments, no published independent audit&lt;/td&gt;
&lt;td&gt;$24/mo + BYOK inference&lt;/td&gt;
&lt;td&gt;Low-Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DockClaw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed OpenClaw&lt;/td&gt;
&lt;td&gt;Fast single-channel hosting with multi-model support, Telegram only&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Dedicated virtual machine isolation&lt;/td&gt;
&lt;td&gt;From $19.99/mo + BYOK inference&lt;/td&gt;
&lt;td&gt;Low-Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Perplexity Computer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General Agent&lt;/td&gt;
&lt;td&gt;Multi-model workflow execution without infrastructure control or model choice&lt;/td&gt;
&lt;td&gt;No (automatic routing, no BYOK)&lt;/td&gt;
&lt;td&gt;Consumer web security&lt;/td&gt;
&lt;td&gt;$200/mo (Max) or $325/seat/mo (Enterprise)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Cowork&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General Agent&lt;/td&gt;
&lt;td&gt;Local file and desktop automation that stops when your machine powers off&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Human-in-the-loop oversight&lt;/td&gt;
&lt;td&gt;From $20/mo (Pro)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lindy AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General Agent&lt;/td&gt;
&lt;td&gt;Visual no-code agent building with no custom code execution&lt;/td&gt;
&lt;td&gt;Limited (multi-model, no BYOK)&lt;/td&gt;
&lt;td&gt;Enterprise compliance&lt;/td&gt;
&lt;td&gt;Free tier; paid from $19.99/mo (credit-based)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For most teams migrating off self-hosted OpenClaw, KiloClaw delivers the strongest combination of security controls, setup speed, model flexibility, and native integrations. It's the only managed provider that pairs enterprise-grade credential vaulting with full BYOK model access and always-on headless execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fully managed OpenClaw hosting providers
&lt;/h2&gt;

&lt;p&gt;This category represents direct infrastructure replacements for users who want the exact capabilities of the open-source OpenClaw framework but refuse to manage the underlying servers, networking, and dependency updates. These platforms handle the operational burden while preserving the core autonomous architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  KiloClaw (managed OpenClaw hosting)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljnvftliekhiobx1flpx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljnvftliekhiobx1flpx.png" alt="KiloClaw AI assistant landing page with the headline “Your AI assistant that actually does things,” highlighting email, calendar, project monitoring, and chat-based task automation on mobile." width="800" height="564"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Who KiloClaw is best for
&lt;/h4&gt;

&lt;p&gt;Technical founders, operators, and agency coordinators who need always-on, headless messaging agents running across Slack, Telegram, and WhatsApp with zero infrastructure management, maintenance, or security headaches.&lt;/p&gt;

&lt;h4&gt;
  
  
  KiloClaw Overview
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt; is an optimized, hosted, production-ready version of the OpenClaw framework. It takes users from zero to a running, always-on AI agent in under two minutes.&lt;/p&gt;

&lt;p&gt;Instead of presenting you with a blank terminal, KiloClaw acts as a tireless operational assistant out of the box. It handles everything from routing incoming messages and triaging complex inboxes to executing high-volume sales research across the web.&lt;/p&gt;

&lt;h4&gt;
  
  
  How KiloClaw compares to self-hosted OpenClaw
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Painless setup:&lt;/strong&gt; KiloClaw eliminates manual setup with guided authorization flows for all supported integrations. No more wrestling with frustrating OAuth consent screens or managing expiring tokens.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security-first architecture:&lt;/strong&gt; The platform runs each customer inside a dedicated Firecracker micro-VM (the same isolation technology behind AWS Lambda), not a shared container. There is no shared kernel, no shared filesystem, and no shared process namespace between tenants. KiloClaw prohibits direct SSH access, enforces tool allow-listing by default, and locks agent security controls in the platform's start script, preventing them from being overridden by the agent itself or by prompt injection through chat channels.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Independent security validation&lt;/strong&gt;: KiloClaw's architecture was validated by a 10-day independent security assessment in February 2026 using the PASTA threat modeling framework. The assessment covered 30 threats across 13 assets, ran 60+ adversarial tests including cross-tenant isolation probes, and found zero cross-tenant vulnerabilities. No other alternative in this guide has published comparable third-party validation.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model flexibility:&lt;/strong&gt; KiloClaw uses &lt;a href="https://kilo.ai/gateway" rel="noopener noreferrer"&gt;Kilo Gateway&lt;/a&gt; by default, which provides access to more than 500 AI models through a single integration. You can also bring your own API keys from providers like Anthropic, OpenAI, and Google, giving you full control over which model powers your agent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Native integrations:&lt;/strong&gt; KiloClaw provides natively guided authorization flows for Telegram, Slack, WhatsApp, Google Workspace, GitHub, and 1Password. These deep, two-way integrations support the headless messaging pattern central to OpenClaw's value. The agent can receive messages, take autonomous action, and respond directly within your communication channels 24/7.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Code execution and skills&lt;/strong&gt;: Like OpenClaw, KiloClaw agents can write and execute code, build reusable scripts, and extend their own capabilities over time. This self-improving loop runs on managed cloud infrastructure, so your agent grows more capable without you having to maintain the server.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  What you get with KiloClaw
&lt;/h4&gt;

&lt;p&gt;Instant readiness is the biggest advantage. You can launch an integrated, multi-channel agent during a coffee break. That used to be a frustrating weekend engineering sprint.&lt;/p&gt;

&lt;p&gt;You also get peace of mind. KiloClaw provides a secure boundary where you can safely grant the agent access to sensitive tools, including corporate password vaults and controlled financial transactions via the integrated AgentCard.&lt;/p&gt;

&lt;p&gt;And you get true always-on reliability on managed cloud infrastructure. Your agent runs 24/7 regardless of whether your laptop is open, your desktop is powered on, or you're on vacation. Unlike desktop-bound alternatives, KiloClaw's headless architecture means your messaging agents, scheduled workflows, and autonomous tasks never stop running.&lt;/p&gt;

&lt;h4&gt;
  
  
  KiloClaw limitations
&lt;/h4&gt;

&lt;p&gt;Because KiloClaw is a managed cloud service, you don't have root server access. You can't SSH into the underlying infrastructure to modify core OS-level dependencies or alter the container orchestration. It also can't support air-gapped local execution for classified, offline environments.&lt;/p&gt;

&lt;h4&gt;
  
  
  KiloClaw pricing
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw costs $9 per month for hosting&lt;/a&gt; (with a $4 first month and a 7-day free trial, no credit card required). AI inference is billed separately through Kilo Gateway at zero markup across 500+ models, with free models included. Compared to self-hosting, you replace unpredictable compute fees, bandwidth charges, and ongoing maintenance costs with a predictable flat hosting fee and transparent, at-cost model usage.&lt;/p&gt;

&lt;h4&gt;
  
  
  OpenClaw to KiloClaw migration effort
&lt;/h4&gt;

&lt;p&gt;Low. Standard OpenClaw system prompts, behavior instructions, and logic workflows map directly to the new environment. KiloClaw's guided UI flows replace the need to migrate fragile configuration files and plain text environment variables.&lt;/p&gt;

&lt;p&gt;Ready to ditch the DevOps tax? &lt;/p&gt;

&lt;p&gt;&lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;Start your KiloClaw deployment today&lt;/a&gt; and have an agent running in under two minutes.  &lt;/p&gt;

&lt;h3&gt;
  
  
  xCloud (OpenClaw VPS hosting)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F338h9ms767ehzbgm519y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F338h9ms767ehzbgm519y.png" alt="xCloud OpenClaw hosting landing page promoting fully managed AI assistant hosting with live deployment in 5 minutes, multi-channel integrations, no-code setup, and monthly pricing." width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Who xCloud is best for
&lt;/h4&gt;

&lt;p&gt;Non-technical to semi-technical users who want fully managed OpenClaw hosting with automatic updates and dedicated support, but don't need guided multi-platform OAuth flows, advanced credential vaulting, or independently audited security architecture.&lt;/p&gt;

&lt;h4&gt;
  
  
  xCloud Overview
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://xcloud.host/openclaw-hosting" rel="noopener noreferrer"&gt;xCloud&lt;/a&gt; is a fully managed OpenClaw hosting provider that handles server provisioning, Docker configuration, SSL setup, updates, and backups. Deployment takes approximately five minutes with no technical skills required. However, you must bring your own AI provider API key, and integrations beyond Telegram and WhatsApp require manual configuration.&lt;/p&gt;

&lt;h4&gt;
  
  
  How xCloud compares to self-hosted OpenClaw
&lt;/h4&gt;

&lt;p&gt;xCloud removes the full infrastructure management burden, not just initial provisioning. The platform handles server setup, OpenClaw installation, SSL configuration, automatic updates, security patches, and backup recovery. A web dashboard provides monitoring, logs, uptime tracking, and one-click restore without any CLI or SSH access required.&lt;/p&gt;

&lt;h4&gt;
  
  
  What you get with xCloud
&lt;/h4&gt;

&lt;p&gt;A fully managed deployment with approximately five-minute setup time, automatic OpenClaw updates, automatic backups, free SSL, integrated monitoring and logs, and 24/7 expert support. The platform requires no Docker, terminal, or DevOps knowledge to operate.&lt;/p&gt;

&lt;h4&gt;
  
  
  xCloud limitations
&lt;/h4&gt;

&lt;p&gt;xCloud requires you to bring your own AI provider API key. The platform currently supports Anthropic, OpenAI, Gemini, OpenRouter, and Moonshot AI, with providers like xAI (Grok) and Mistral listed as coming soon. Unlike KiloClaw's unified Kilo Gateway, there is no single integration point that gives you access to hundreds of models through one connection.&lt;/p&gt;

&lt;p&gt;Channel support is limited. Telegram and WhatsApp work natively, but Discord, Slack, and Signal remain on xCloud's roadmap for Q2 2026. For OpenClaw users who rely on multi-channel headless messaging across Slack, Discord, and Telegram simultaneously, that's a meaningful gap today.&lt;/p&gt;

&lt;p&gt;xCloud also lacks guided OAuth authorization flows for third-party services. Connecting tools like Google Workspace, GitHub, or 1Password requires manual configuration rather than one-click setup. The platform does not publish an independent security assessment or provide detailed documentation on its tenant isolation architecture beyond describing isolated environments.&lt;/p&gt;

&lt;h4&gt;
  
  
  xCloud pricing
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://xcloud.host/openclaw-hosting/" rel="noopener noreferrer"&gt;xCloud starts at $24 per month&lt;/a&gt; for managed OpenClaw hosting, making it the highest-priced managed OpenClaw host in this guide. AI inference is not included. You must bring your own API key from providers like Anthropic, OpenAI, or Gemini, so total monthly cost will be higher depending on model usage.&lt;/p&gt;

&lt;h4&gt;
  
  
  OpenClaw to xCloud migration effort
&lt;/h4&gt;

&lt;p&gt;Low-Moderate. xCloud handles server provisioning and OpenClaw installation automatically. You will need to input your AI provider API keys and configure your messaging platform connections through their dashboard. No raw Docker volume transfers or environment file manipulation required.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bottom line
&lt;/h4&gt;

&lt;p&gt;xCloud handles hosting, updates, and support, but lacks guided OAuth flows for third-party services, publishes no independent security audit, and is the highest-priced managed option in this guide at $24 per month before inference costs. If you need multi-channel integrations, credential vaulting, and validated security architecture at a lower price, KiloClaw covers all of that.&lt;/p&gt;

&lt;h3&gt;
  
  
  DockClaw (managed OpenClaw hosting)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh47idiie54y67xwjxvmt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh47idiie54y67xwjxvmt.png" alt="Dockclaw AI agent deployment homepage with the headline “Ship faster. Deploy anywhere.” featuring autonomous AI agent hosting, multi-model support, fast deployment, and uptime monitoring." width="800" height="564"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Who DockClaw is best for
&lt;/h4&gt;

&lt;p&gt;Solo developers and small teams who need fast managed OpenClaw hosting with multi-model flexibility and don't need multi-channel messaging or advanced enterprise security features.&lt;/p&gt;

&lt;h4&gt;
  
  
  DockClaw Overview
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://dockclaw.com/guides/best-openclaw-hosting-2026" rel="noopener noreferrer"&gt;DockClaw&lt;/a&gt; is a managed hosting service tailored for the OpenClaw framework. The platform emphasizes deployment speed, offering a sub-60-second deployment process combined with dedicated VM isolation for every agent. It supports 10+ AI providers including Claude, GPT-4o, Gemini, Venice, Llama, and any OpenAI-compatible model, with the ability to switch providers at any time.&lt;/p&gt;

&lt;h4&gt;
  
  
  How DockClaw compares to self-hosted OpenClaw
&lt;/h4&gt;

&lt;p&gt;DockClaw removes all infrastructure setup friction by delivering a running, networked agent in under 60 seconds. Rather than relying on shared container environments, DockClaw provisions a dedicated isolated VM for each agent. The platform includes 24/7 uptime monitoring, persistent storage, and a control UI dashboard for managing your agent without touching a terminal.&lt;/p&gt;

&lt;h4&gt;
  
  
  What you get with DockClaw
&lt;/h4&gt;

&lt;p&gt;A quick, painless setup process that bypasses the need to understand cloud infrastructure, multi-provider model support with zero-lock-in switching, Telegram integration out of the box, persistent storage, 24/7 monitoring, and a web dashboard for agent management.&lt;/p&gt;

&lt;h4&gt;
  
  
  DockClaw limitations
&lt;/h4&gt;

&lt;p&gt;DockClaw supports Telegram as its only native messaging channel. There is no Slack, Discord, or WhatsApp integration. For OpenClaw users who rely on multi-channel headless messaging across several platforms simultaneously, that limits the agent's reach from day one.&lt;/p&gt;

&lt;p&gt;The Starter tier is BYOK only. You bring your own API key from providers like Claude, GPT-4o, or Gemini. The Pro tier bundles Kimi K2.5 credits, but total inference costs on the Starter plan depend entirely on your provider usage on top of the $19.99 monthly hosting fee.&lt;/p&gt;

&lt;p&gt;DockClaw lacks guided OAuth authorization flows for third-party services like Google Workspace, GitHub, or 1Password. Connecting external tools requires manual configuration. The platform provides no credential vaulting, no integrated payment controls, and no enterprise SSO. Its security architecture is limited to dedicated VM isolation per agent with no published independent security assessment validating the implementation.&lt;/p&gt;

&lt;h4&gt;
  
  
  DockClaw pricing
&lt;/h4&gt;

&lt;p&gt;The platform starts around $19.99 per month with a 7-day free trial and includes one agent deployment, a dedicated isolated VM, Telegram integration, web browsing, cron jobs, and a control UI dashboard. You bring your own API key. Pro costs $49.99 per month with a 3-day free trial and adds bundled AI model credits (Kimi K2.5, $250 value), Brave Search API access, voice support with Whisper STT and ElevenLabs TTS, a template library, and an agent onboarding wizard. Both tiers require no technical setup.&lt;/p&gt;

&lt;h4&gt;
  
  
  OpenClaw to DockClaw migration effort
&lt;/h4&gt;

&lt;p&gt;Low-Moderate. The migration process involves transferring your core system prompts and using their web interface to re-authenticate your essential tools. No need to manipulate raw server files.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bottom line
&lt;/h4&gt;

&lt;p&gt;DockClaw delivers solid baseline hosting with strong VM isolation at an accessible price point. If you need guided integrations, credential vaulting, and features like AgentCard for controlled financial transactions, KiloClaw provides a more complete production environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  General AI assistants that can replace some OpenClaw workflows
&lt;/h2&gt;

&lt;p&gt;These platforms approach workflow automation through different architectures. They compete for the same automation budget as OpenClaw but prioritize proprietary interfaces, specific foundational models, or visual, no-code environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Perplexity Computer (multi-model agentic platform)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0p9v3i9fvwza3729xeyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0p9v3i9fvwza3729xeyc.png" alt="Perplexity Computer homepage featuring the headline “Computer Builds,” a glass sphere hero image, and examples of AI-generated tasks like stock analysis, mobile app creation, and report building." width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Who Perplexity Computer is best for
&lt;/h4&gt;

&lt;p&gt;Knowledge workers, operators, and technical teams who need a fully managed agentic platform that can execute complex, multi-step workflows spanning research, code generation, and content production.&lt;/p&gt;

&lt;h4&gt;
  
  
  Perplexity Computer Overview
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.perplexity.ai/hub/blog/introducing-perplexity-computer" rel="noopener noreferrer"&gt;Perplexity Computer&lt;/a&gt; is a fully agentic platform that coordinates 19 AI models simultaneously, routing each subtask to the best-suited model automatically. Claude Opus 4.6 handles core reasoning, Gemini manages deep research, and dedicated models cover image generation, video production, and lightweight tasks.&lt;/p&gt;

&lt;p&gt;You don't pick the model. Perplexity Computer owns the orchestration layer and makes routing decisions for you.&lt;/p&gt;

&lt;h4&gt;
  
  
  How Perplexity Computer compares to OpenClaw
&lt;/h4&gt;

&lt;p&gt;Perplexity Computer runs every task in an isolated cloud environment with a real filesystem, browser, and native integrations with over 400 applications including Slack, Gmail, GitHub, and Notion. It can execute workflows that run for hours, generate code, produce images and video, draft documents, and interact with connected apps in parallel.&lt;/p&gt;

&lt;p&gt;OpenClaw gives you full control over model selection and workflow logic. Perplexity Computer abstracts that away behind its own orchestration engine.&lt;/p&gt;

&lt;p&gt;Critically, Perplexity Computer also supports the two-way messaging pattern that made OpenClaw popular. It integrates directly into Slack, WhatsApp, Telegram, and Discord, responding to messages and running workflows from within your existing communication channels. Enterprise users can query @computer inside Slack channels and continue those conversations in the web or mobile interface.&lt;/p&gt;

&lt;h4&gt;
  
  
  What you get with Perplexity Computer
&lt;/h4&gt;

&lt;p&gt;You get complex workflow execution across research, code generation, and content production without managing any infrastructure. The platform's multi-model orchestration routes subtasks to the best available model automatically. Teams migrating from OpenClaw gain a polished managed experience but lose the ability to choose which model handles each task.&lt;/p&gt;

&lt;h4&gt;
  
  
  Perplexity Computer limitations
&lt;/h4&gt;

&lt;p&gt;Perplexity Computer doesn't offer manual model selection. You can't plug in your own API keys from external providers. For OpenClaw users accustomed to full control over their agent's reasoning engine, this is a fundamental architectural constraint, and the premium subscription tier puts it at a significantly higher price point than most alternatives in this guide.&lt;/p&gt;

&lt;p&gt;Perplexity Computer supports two-way messaging across major channels, but you don't control the underlying orchestration logic. The platform decides how to route tasks across its 19 models. You're adopting Perplexity's opinionated framework for how your agent behaves in those channels.&lt;/p&gt;

&lt;p&gt;The platform can generate and execute code within workflows, but you don't own the execution environment. You can't build a persistent library of custom scripts and reusable skills that grow the agent's capabilities over time. Code runs within Perplexity's orchestration layer, not within infrastructure you manage.&lt;/p&gt;

&lt;h4&gt;
  
  
  Perplexity Computer pricing
&lt;/h4&gt;

&lt;p&gt;Access to Perplexity Computer requires a Max subscription at $200 per month or $2,000 per year. Enterprise pricing starts at $325 per seat per month and includes SSO, audit logs, and additional security controls. Compared to managed OpenClaw hosting providers, weigh this cost increase against the platform's broader orchestration capabilities.&lt;/p&gt;

&lt;h4&gt;
  
  
  OpenClaw to Perplexity Computer migration effort
&lt;/h4&gt;

&lt;p&gt;High. Migrating from OpenClaw to Perplexity Computer requires rebuilding your autonomous workflows within an opinionated orchestration framework. Existing system prompts, custom scripts, and model-specific logic won't transfer directly. You'll need to restructure your agent behavior around Perplexity's automatic model routing and connect your tools through its native integration layer rather than maintaining your own OAuth flows.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bottom line
&lt;/h4&gt;

&lt;p&gt;Perplexity Computer is powerful for multi-model orchestration, but you surrender all control over model selection and can't bring your own API keys. If custom orchestration, reusable skills, vendor flexibility, and cost control matter to your team, KiloClaw delivers all of that at a fraction of the price.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Cowork (desktop automation agent)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1gbmat9fwls57awzw9y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1gbmat9fwls57awzw9y.png" alt="Anthropic Claude Cowork landing page showcasing autonomous AI task delegation, deliverable creation, and workflow automation across local files, apps, and meeting transcripts." width="800" height="567"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Who Claude Cowork is best for
&lt;/h4&gt;

&lt;p&gt;Desktop-bound professionals, including writers, analysts, and developers, who want an autonomous agent that can read, edit, and create local files, run scheduled tasks, and control their desktop, but who don't need always-on, cloud-hosted execution.&lt;/p&gt;

&lt;h4&gt;
  
  
  Claude Cowork Overview
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://claude.com/product/cowork" rel="noopener noreferrer"&gt;Claude Cowork&lt;/a&gt; is Anthropic's autonomous desktop agent that works directly within your local environment. It can read, edit, and create files in local folders, run shell commands in a sandboxed environment, execute scheduled background tasks, and control the desktop through computer use. Cowork is an autonomous desktop agent. It doesn't run on a remote cloud host like OpenClaw.&lt;/p&gt;

&lt;h4&gt;
  
  
  How Claude Cowork compares to OpenClaw
&lt;/h4&gt;

&lt;p&gt;OpenClaw operates as a headless agent on a remote server with API-based integrations. Claude Cowork operates directly on your local machine with direct file access, a sandboxed Linux shell, MCP integrations, scheduled tasks for cron-style automation, and Dispatch mode that lets it work autonomously while you step away. It's restricted to Anthropic's proprietary Claude models.&lt;/p&gt;

&lt;p&gt;Of all the general alternatives, Claude Cowork comes closest to matching OpenClaw's self-improving architecture. It can write and execute code in a sandboxed shell, create reusable skills, and build on its own capabilities over time. The critical difference is that this entire loop runs on your local desktop, not on a remote cloud host that stays online independently.&lt;/p&gt;

&lt;h4&gt;
  
  
  What you get with Claude Cowork
&lt;/h4&gt;

&lt;p&gt;You can automate local file workflows, desktop applications, and tasks that require direct access to your machine's filesystem, which a cloud-hosted OpenClaw instance can't reach. You also get scheduled background tasks and Dispatch mode for hands-off execution, plus computer use for automating GUI-based applications that lack API endpoints. The desktop-first model means you can watch the agent work and intervene in real time when needed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Claude Cowork limitations
&lt;/h4&gt;

&lt;p&gt;Claude Cowork enforces strict vendor lock-in. You can't switch to OpenAI, Google, or open-weight models if the Claude infrastructure experiences an outage or performance degradation. The fundamental constraint for OpenClaw migrants is that Cowork runs exclusively on your desktop. It supports scheduled tasks and Dispatch mode, but your machine must remain powered on and running. No remote cloud host or VPS keeps your agent alive, so if you close your laptop while traveling or shut down your desktop, your automation stops. For teams that need always-on, location-independent uptime, that's a dealbreaker.&lt;/p&gt;

&lt;h4&gt;
  
  
  Claude Cowork pricing
&lt;/h4&gt;

&lt;p&gt;Claude Cowork is available on the Pro plan at $20 per month. Max tiers at $100 per month (5x usage) and $200 per month (20x usage) unlock heavier workloads and full Claude Code access.&lt;/p&gt;

&lt;h4&gt;
  
  
  OpenClaw to Claude Cowork migration effort
&lt;/h4&gt;

&lt;p&gt;High. Migrating from OpenClaw to Claude Cowork requires a fundamental architecture shift. OpenClaw system prompts, headless scripts, and OAuth-based cloud integrations don't transfer to Cowork's desktop-first, file-access model. Existing autonomous workflows must be rebuilt around local file operations, MCP integrations, and scheduled tasks rather than remote API orchestration.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bottom line
&lt;/h4&gt;

&lt;p&gt;Claude Cowork offers strong desktop automation with file access and scheduled tasks, but your agent stops running the moment your machine powers off. If you need always-on, location-independent uptime, KiloClaw runs 24/7 on managed cloud infrastructure regardless of whether your laptop is open.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lindy AI (no-code AI assistant)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fob8v9hx66q3111i9g01r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fob8v9hx66q3111i9g01r.png" alt="Lindy AI assistant homepage showing “Get two hours back every day” with inbox, meeting, and calendar automation messaging plus a mobile app interface for email and scheduling management." width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Who Lindy AI is best for
&lt;/h4&gt;

&lt;p&gt;Non-technical operators, sales teams, customer service leads, and administrative staff who want a visual, no-code platform for deploying AI agents across text, voice, web, and phone channels without writing a single line of code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Lindy AI overview
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.lindy.ai/" rel="noopener noreferrer"&gt;Lindy AI&lt;/a&gt; is a comprehensive no-code agentic platform. Users build specialized AI agents from natural language prompts in minutes. The platform spans text, web, voice, and phone automation with over 5,000 integrations, AI phone agents for inbound and outbound calls, and cloud-based computer use via its Autopilot feature. It focuses on visual workflow building and conversational onboarding, so users never touch configuration files or code.&lt;/p&gt;

&lt;h4&gt;
  
  
  How Lindy AI compares to OpenClaw
&lt;/h4&gt;

&lt;p&gt;OpenClaw gives developers full control over model selection, custom scripts, and raw infrastructure. Lindy replaces all of that with a visual builder where you map out integrations, conditional logic, and tool permissions.&lt;/p&gt;

&lt;p&gt;Lindy supports multiple models including Claude 4.x, GPT-5.x, and Gemini 3.x, and you select the model per agent. It also ships with a library of pre-packaged templates, so you can deploy a configured sales agent, customer service rep, or HR assistant right away.&lt;/p&gt;

&lt;p&gt;Lindy also supports the headless, two-way messaging pattern central to OpenClaw's appeal. Agents connect natively to Slack, Telegram, and WhatsApp, responding to incoming messages and executing workflows 24/7 on Lindy's cloud infrastructure. OpenClaw requires you to configure integrations through OAuth flows and webhook endpoints. Lindy handles that setup through its visual builder.&lt;/p&gt;

&lt;h4&gt;
  
  
  What you get with Lindy AI
&lt;/h4&gt;

&lt;p&gt;A gentle learning curve suitable for rapid adoption across the entire company, plus built-in human-in-the-loop approval for sensitive actions.&lt;/p&gt;

&lt;p&gt;For OpenClaw migrants, the key draw is that Lindy handles hosting, uptime, and integrations entirely in the cloud. Your agents run on Lindy's infrastructure, not on your desktop or your VPS.&lt;/p&gt;

&lt;p&gt;You also get capabilities OpenClaw doesn't offer natively, like AI phone agents and cloud-based browser automation. Teams whose primary use case is always-on messaging agents that triage inboxes, respond to customers, or route requests across channels get that without any infrastructure management.&lt;/p&gt;

&lt;h4&gt;
  
  
  Lindy AI limitations
&lt;/h4&gt;

&lt;p&gt;The platform sacrifices the raw power, deep customizability, and operational flexibility inherent to the open-source OpenClaw ecosystem. You can't inject custom Python scripts, execute arbitrary shell commands, or build bespoke edge-case integrations. While Lindy supports multiple models, it doesn't offer bring-your-own-key support, so you're working within the models and tiers Lindy provisions.&lt;/p&gt;

&lt;p&gt;The visual interface can become prescriptive, making complex developer workflows frustrating or impossible to implement. You also have less control over messaging behavior than OpenClaw provides. You can't write custom message parsing logic, implement bespoke routing rules in code, or fine-tune how the agent handles conversation edge cases.&lt;/p&gt;

&lt;p&gt;Lindy offers no custom code execution. You must build every workflow through the visual builder. For OpenClaw users accustomed to an agent that can code its way through edge cases and extend its own toolset, that's a fundamental capability gap.&lt;/p&gt;

&lt;h4&gt;
  
  
  Lindy AI pricing
&lt;/h4&gt;

&lt;p&gt;Lindy offers a free tier with 400 credits per month. Paid plans start at $19.99 per month for 2,000 credits (Starter), $49.99 per month for 5,000 credits plus 30 phone calls (Pro), and $299 per month for 30,000 credits plus 100 phone calls (Business). Additional credits cost $10 per 1,000. Compared to managed OpenClaw hosting, Lindy's credit-based model can scale costs quickly for high-volume autonomous workflows.&lt;/p&gt;

&lt;h4&gt;
  
  
  OpenClaw to Lindy AI migration effort
&lt;/h4&gt;

&lt;p&gt;High. Migrating from OpenClaw to Lindy requires deconstructing your existing autonomous logic, system prompts, and custom scripts, then rebuilding that behavior within Lindy's visual, no-code workflow builder. OpenClaw's raw script execution, direct model API access, and custom OAuth configurations have no direct equivalent in Lindy's abstraction layer.&lt;/p&gt;

&lt;h4&gt;
  
  
  Bottom line
&lt;/h4&gt;

&lt;p&gt;Lindy AI makes agent building accessible to non-technical teams through its visual builder, but you cannot execute custom code or build scripts that extend the agent's capabilities over time. If your workflows require the raw flexibility of OpenClaw's code execution model, KiloClaw preserves that power on fully managed infrastructure.  &lt;/p&gt;

&lt;h2&gt;
  
  
  How to migrate from self-hosted OpenClaw to a managed provider
&lt;/h2&gt;

&lt;p&gt;Migrating away from a self-hosted architecture doesn't have to mean lost workflows or operational downtime. With a structured plan for extracting and redeploying, you can transition your entire autonomous workforce smoothly and securely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Audit and export your OpenClaw workflows
&lt;/h3&gt;

&lt;p&gt;Before touching your new environment, document the specific communication channels, like Telegram or Slack, and the third-party tools your current self-hosted instance uses.&lt;/p&gt;

&lt;p&gt;Then export all custom system prompts, persona instructions, and memory files from your local workspace. Make sure you capture the agent's accumulated context.&lt;/p&gt;
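
&lt;p&gt;A minimal sketch of the export step in Python, assuming your workspace is a local directory of prompt and memory files. The directory layout shown is hypothetical; adjust the paths to match your actual install:&lt;/p&gt;

```python
import shutil
import tempfile
from pathlib import Path

def export_workspace(workspace: Path, dest_dir: Path) -> Path:
    """Archive prompts, persona files, and memory so nothing is lost in transit."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # gztar captures hidden memory/state files alongside the prompts.
    archive = shutil.make_archive(
        str(dest_dir / "openclaw-export"), "gztar", root_dir=workspace
    )
    return Path(archive)

# Demo against a throwaway fixture rather than a real install.
with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp) / "workspace"
    (ws / "prompts").mkdir(parents=True)
    (ws / "prompts" / "system.md").write_text("You are my research agent.")
    (ws / "memory.json").write_text("{}")
    out = export_workspace(ws, Path(tmp) / "export")
    print(out.name)  # openclaw-export.tar.gz
```

&lt;p&gt;Keep the archive until the new environment has been validated end to end (Step 4), then delete it securely since it may contain sensitive context.&lt;/p&gt;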

&lt;h3&gt;
  
  
  Step 2: Set up your managed OpenClaw alternative
&lt;/h3&gt;

&lt;p&gt;Log into your chosen managed platform to begin the transition. For example, spin up your new KiloClaw workspace. The platform provisions isolated infrastructure in under two minutes.&lt;/p&gt;

&lt;p&gt;Once the workspace is active, paste your exported system prompts and behavioral instructions into the platform's configuration dashboard. These settings maintain agent continuity and personality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Reconnect integrations using secure OAuth
&lt;/h3&gt;

&lt;p&gt;Don't copy over legacy environment files containing raw application keys. That defeats the purpose of upgrading your architecture.&lt;/p&gt;

&lt;p&gt;Instead, use the new platform's guided, secure OAuth flows. Connect your Google Workspace, GitHub repositories, and 1Password vaults via the secure UI. Let the platform manage and vault the new access tokens properly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Run in parallel and validate workflows
&lt;/h3&gt;

&lt;p&gt;Keep your legacy self-hosted instance running temporarily for operational stability, but isolate it to a muted test channel to prevent duplicate actions.&lt;/p&gt;

&lt;p&gt;Trigger your most common workflows, like preparing executive meetings or running deep research requests, within the newly provisioned KiloClaw environment. Verify integrations work and models perform correctly before shutting down your VPS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Choosing the right OpenClaw alternative
&lt;/h2&gt;

&lt;p&gt;The OpenClaw framework has changed how we approach personal automation, proving that autonomous systems can handle complex, multi-step operations. But for professionals whose primary output is strategic execution, managing VPSs, patching Docker containers, and rotating fragile API tokens is a poor use of time.&lt;/p&gt;

&lt;p&gt;When choosing your deployment strategy, evaluate the total cost of ownership. Factor in your own hourly rate for mandatory server maintenance and security patching. You'll find that self-hosting costs more than a predictable managed SaaS subscription. The hidden DevOps tax quickly eclipses any perceived savings from renting raw compute.&lt;/p&gt;

&lt;p&gt;If you want the raw autonomous power of OpenClaw without the DevOps overhead, the security risks, or the rigid model constraints of proprietary platforms, &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;start your deployment with KiloClaw today&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;You can have an integrated, secure agent running in Slack or Telegram in under two minutes. Get back to the work that actually matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenClaw alternatives FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is OpenClaw safe to use for work?
&lt;/h3&gt;

&lt;p&gt;Self-hosted OpenClaw can be risky without strong sandboxing and strict permissions. Managed platforms like KiloClaw reduce risk through dedicated Firecracker VM isolation, AES-256 encrypted credential vaults, tool allow-lists, and no SSH access. KiloClaw's security architecture has been validated by an independent assessment with zero cross-tenant findings.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the difference between hosted OpenClaw and general AI assistants?
&lt;/h3&gt;

&lt;p&gt;General assistants vary widely. Some now offer always-on execution and two-way messaging, but they typically trade off developer-level control, model flexibility, and raw customizability compared to the OpenClaw framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can you switch AI models in an OpenClaw alternative?
&lt;/h3&gt;

&lt;p&gt;It depends on the provider. Some managed alternatives support model switching across multiple vendors, while many general assistants are locked to a single model ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do you need Docker or DevOps to use an AI agent?
&lt;/h3&gt;

&lt;p&gt;Not if you choose a managed OpenClaw host. Self-hosting usually requires ongoing DevOps work (updates, OAuth maintenance, monitoring, security patching).&lt;/p&gt;

&lt;h3&gt;
  
  
  When does self-hosting OpenClaw still make sense?
&lt;/h3&gt;

&lt;p&gt;When you need air-gapped/offline operation, you're doing research experiments, or you have dedicated DevOps/SecOps to maintain and secure the stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  How hard is it to migrate from self-hosted OpenClaw to a managed host?
&lt;/h3&gt;

&lt;p&gt;Usually straightforward: export prompts/memory, re-connect tools via OAuth, and test in parallel. Avoid copying raw environment files with tokens; re-authenticate securely instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the real cost difference between self-hosting and managed OpenClaw hosting?
&lt;/h3&gt;

&lt;p&gt;Self-hosting often looks cheaper in compute costs but becomes expensive in engineering time, security work, and integration maintenance. Managed hosting like KiloClaw trades that DevOps overhead for a predictable subscription.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will a general AI assistant replace OpenClaw for business automation?
&lt;/h3&gt;

&lt;p&gt;It depends on your requirements. Some general assistants now offer always-on execution and deep integrations, but they typically lack OpenClaw's raw customizability, custom code execution, bring-your-own-key support, and developer-level control over agent behavior and orchestration logic.  &lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>agents</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Prompt Injection Problem: A Guide to Defense-in-Depth for AI Agents</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Wed, 25 Feb 2026 21:09:22 +0000</pubDate>
      <link>https://forem.com/manveer_chawla_64a7283d5a/the-prompt-injection-problem-a-guide-to-defense-in-depth-for-ai-agents-3p1</link>
      <guid>https://forem.com/manveer_chawla_64a7283d5a/the-prompt-injection-problem-a-guide-to-defense-in-depth-for-ai-agents-3p1</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt injection is an architecture problem, not a benchmarking problem.&lt;/strong&gt; Anthropic's Sonnet 4.6 system card shows 8% one-shot attack success rate in computer use with all safeguards on, and 50% with unbounded attempts. In coding environments, the same model hits 0%. The difference is the environment, not the model.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training won't fix prompt injection.&lt;/strong&gt; Instructions and data share the same context window. SQL injection for the LLM era requires an architectural fix, not a behavioral one.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "lethal trifecta" is the threat model.&lt;/strong&gt; When your agent has tools, processes untrusted input, and holds sensitive access, all three at once, prompt injection becomes catastrophic. Almost every use case people want hits all three.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build the kill chain around the model.&lt;/strong&gt; A five-layer defense (permission boundaries, action gating, input sanitization, output monitoring, blast radius containment) turns the question from "will injection happen" to "how bad is it when it does."
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defense-in-depth constrains the autonomy ceiling.&lt;/strong&gt; Agents that need human review for irreversible actions don't replace humans. They augment them. The companies winning here redesign the loop, not remove the human from it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic published the &lt;a href="https://www.anthropic.com/news/claude-3-5-sonnet" rel="noopener noreferrer"&gt;Claude Sonnet 4.6 system card&lt;/a&gt; on February 17, 2026. Buried in the safety evaluations is a number that should change how every engineering team thinks about deploying agentic AI.&lt;/p&gt;

&lt;p&gt;With every safeguard enabled, including extended thinking, automated adversarial attacks still achieve a successful prompt injection takeover &lt;strong&gt;8% of the time on the first attempt&lt;/strong&gt; in computer use environments. Scale to unbounded attempts and the success rate climbs to &lt;strong&gt;50%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's what makes this number genuinely interesting, not just alarming. In coding environments with the same model and the same extended thinking, the attack success rate drops to &lt;strong&gt;0.0%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Zero. The model didn't get smarter between these two evaluations. The environment changed.&lt;/p&gt;

&lt;p&gt;Coding environments have structured inputs: code, terminal output, API responses with defined schemas. Computer use environments encounter arbitrary untrusted content: web pages, emails, calendar invites, documents with hidden text, DOM elements with embedded instructions.&lt;/p&gt;

&lt;p&gt;The difference isn't the model. It's the attack surface.&lt;/p&gt;

&lt;p&gt;A commenter in a Hacker News thread on the system card put it bluntly: "That seems wildly unacceptable. This tech is just a non-starter unless I'm misunderstanding."&lt;/p&gt;

&lt;p&gt;They're not misunderstanding. They're looking for the solution in the wrong place.&lt;/p&gt;

&lt;p&gt;When I built Zenith's own agent infrastructure, I made the same mistake. I assumed model improvements would close the gap. They won't. Not fully.&lt;/p&gt;

&lt;p&gt;The solution isn't a better model. It's a better architecture around the model.&lt;/p&gt;

&lt;p&gt;This post explains why prompt injection is an architecture problem, defines precisely where the risk concentrates, and lays out a five-layer defense framework for teams shipping agents into production.&lt;/p&gt;

&lt;h2&gt;
  
  
  When is Prompt Injection Most Dangerous? The Lethal Trifecta
&lt;/h2&gt;

&lt;p&gt;Not every agent deployment carries the same risk. Understanding exactly where risk concentrates determines where you invest engineering effort.&lt;/p&gt;

&lt;p&gt;Simon Willison coined the term "lethal trifecta" to describe the combination of capabilities that makes an agent critically vulnerable to prompt injection. An agent enters the danger zone when three conditions occur simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent has access to tools.&lt;/strong&gt; The agent can take actions: send emails, execute code, click buttons, call APIs, move money.&lt;/p&gt;

&lt;p&gt;A model that only generates text in a chat window can't cause real-world harm through prompt injection. The moment the model gains the ability to act on systems, the stakes change categorically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent processes untrusted input.&lt;/strong&gt; The agent reads content it didn't generate: web pages, incoming emails, documents uploaded by third parties, API responses from external services, calendar invites from unknown senders.&lt;/p&gt;

&lt;p&gt;Any content the agent ingests that an attacker could have influenced counts as untrusted input.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent has access to sensitive data or capabilities.&lt;/strong&gt; The agent can reach credentials, PII, financial systems, internal APIs, private documents, or anything else that causes damage if exfiltrated or misused.&lt;/p&gt;

&lt;p&gt;Any two out of three is manageable. An agent with tools and sensitive access but no untrusted input (an internal automation bot processing only your own data) is reasonably safe.&lt;/p&gt;

&lt;p&gt;An agent processing untrusted input with sensitive access but no tools (a summarization engine reading external documents) can't act on injected instructions.&lt;/p&gt;

&lt;p&gt;An agent with tools and untrusted input but no sensitive access (a web scraper writing to a sandbox) has limited blast radius.&lt;/p&gt;

&lt;p&gt;All three together is where prompt injection becomes catastrophic. And almost every use case people want involves all three.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Tools?&lt;/th&gt;
&lt;th&gt;Untrusted Input?&lt;/th&gt;
&lt;th&gt;Sensitive Access?&lt;/th&gt;
&lt;th&gt;Risk Level&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Summarize a doc I uploaded&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browse the web for research&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Send emails on my behalf&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Manageable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read my emails and reply&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Lethal&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browse web + write code in my repo&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Lethal&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fill out forms on websites&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Depends&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Likely lethal&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Computer use (general)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Lethal&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
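
&lt;p&gt;The trifecta reduces to a small deployment check you can run before shipping any agent. A sketch (the profile fields and messages are illustrative, not a standard API):&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    has_tools: bool          # can act: send email, run code, call APIs
    untrusted_input: bool    # reads content an attacker could influence
    sensitive_access: bool   # credentials, PII, financial systems

def risk_level(p: AgentProfile) -> str:
    """All three legs together is the danger zone; two of three is manageable."""
    legs = [p.has_tools, p.untrusted_input, p.sensitive_access]
    if all(legs):
        return "lethal: require human-in-the-loop"
    if sum(legs) == 2:
        return "manageable: constrain the missing leg"
    return "safe"

# "Read my emails and reply" hits all three.
print(risk_level(AgentProfile(True, True, True)))    # lethal: require human-in-the-loop
# A summarizer reading external docs has no tools to act with.
print(risk_level(AgentProfile(False, True, True)))   # manageable: constrain the missing leg
```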

&lt;p&gt;The "safe zone" is far narrower than most deployment plans assume. During the HN discussion, one commenter tried to argue for a narrow safe zone limited to internal apps with no external input. Another correctly shot it down: even a calendar invite can contain injection text. Even a PDF from a trusted colleague can carry hidden white-on-white text with embedded instructions.&lt;/p&gt;

&lt;p&gt;The Notion 3.0 incident proved this threat is real. Attackers used exactly that technique (hidden text in PDFs) to instruct the Notion AI agent to use its web search tool and exfiltrate client lists and financial data to an attacker-controlled domain.&lt;/p&gt;

&lt;p&gt;The EchoLeak vulnerability (&lt;a href="https://securiti.ai/blog/echoleak-cve-2025-32711-how-indirect-prompt-injections-exploit-the-ai-layer-and-how-to-secure-your-data/" rel="noopener noreferrer"&gt;CVE-2025-32711&lt;/a&gt;) against Microsoft 365 Copilot was even worse: a zero-click indirect injection via a poisoned email enabled remote exfiltration of emails, OneDrive files, and Teams chats. No user interaction required.&lt;/p&gt;

&lt;p&gt;Meta has operationalized this threat model through their "Agents Rule of Two" policy, mandating human-in-the-loop supervision whenever all three conditions are met. That's the right starting point for any team deploying agents against untrusted content.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "train it away" won't work
&lt;/h2&gt;

&lt;p&gt;The natural response to the 8% number is to assume the next model generation will fix the problem. If training improved resistance from 50% to 8%, surely continued training will push it to 0%.&lt;/p&gt;

&lt;p&gt;I held this view for a while. I was wrong.&lt;/p&gt;

&lt;p&gt;Prompt injection is fundamentally different from content moderation. Content moderation (blocking harmful outputs, refusing dangerous requests) operates on the semantics of what the model produces. Prompt injection operates on the control plane: the model can't reliably distinguish between "instructions from the user" and "instructions embedded in content the user asked it to read" because both arrive as tokens in the same context window.&lt;/p&gt;

&lt;p&gt;The security community spent decades eliminating in-band signaling vulnerabilities. SQL injection existed because queries and data shared the same channel. XSS existed because code and content shared the same rendering context. Command injection existed because shell commands and arguments shared the same string.&lt;/p&gt;

&lt;p&gt;In every case, the fix was architectural: parameterized queries, content security policies, structured argument passing. The fix was never "train the database to be smarter about distinguishing queries from data."&lt;/p&gt;

&lt;p&gt;LLMs have reintroduced in-band signaling at a fundamental architectural level. Trusted instructions (system prompts, user messages) and untrusted data (web page content, email bodies, document text) get concatenated into a single context window and processed by the same transformer mechanism.&lt;/p&gt;

&lt;p&gt;There's no equivalent of a parameterized query. Karpowicz's Impossibility Theorem (June 2025) formalizes this argument, claiming that no LLM can simultaneously guarantee truthfulness and semantic conservation, making manipulation a mathematical certainty under adversarial conditions. &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/" rel="noopener noreferrer"&gt;OWASP's Top 10 for LLM Applications&lt;/a&gt; ranks prompt injection as the number one vulnerability for the second consecutive year, explicitly noting that defenses like RAG and fine-tuning don't fully mitigate the risk.&lt;/p&gt;

&lt;p&gt;Training against prompt injection is an arms race with infinite surface area. You can train the model to resist "ignore previous instructions." Straightforward. But the attack space is unbounded.&lt;/p&gt;

&lt;p&gt;Attackers encode instructions in base64. They hide them in image metadata. They use semantic persuasion that never directly says "ignore your instructions" but achieves the same effect through narrative framing. They embed instructions in white-on-white text in PDFs, in HTML comments, in alt text on images, in Unicode characters that render invisibly.&lt;/p&gt;

&lt;p&gt;Advanced training techniques like Meta's SecAlign++ have reduced attack success rates on the InjecAgent benchmark from 53.8% to 0.5%. Impressive. But when researchers test those same defenses against adaptive, optimization-based attacks (GCG, TAP), attackers still achieve 98% success rates against defended models.&lt;/p&gt;

&lt;p&gt;The defenses work against known patterns. The attacker always gets to choose new ones.&lt;/p&gt;

&lt;p&gt;Resistance rates asymptote. They don't converge to zero. Going from 50% to 8% one-shot success rate is substantial progress. Going from 8% to 0% may be impossible with current transformer architectures because the model processes instructions and content through the same mechanism.&lt;/p&gt;

&lt;p&gt;The coding environment achieves 0% not because the model is smarter in that context, but because the environment constrains inputs to structured formats where injection is syntactically detectable. The 0% comes from environmental structure, not model robustness.&lt;/p&gt;

&lt;p&gt;8% on first attempt means near-certainty over sessions. If your agent runs 50 tasks per day and each task involves processing untrusted content, 8% per-attempt means the agent gets compromised roughly 4 times per day.&lt;/p&gt;

&lt;p&gt;Over a five-day work week, compromise is a statistical certainty. Over a month, you're looking at roughly 80 successful injection events. The question isn't whether the agent will be compromised. The question is how much damage each compromise causes.&lt;/p&gt;
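
&lt;p&gt;The arithmetic behind those claims, using the system card's 8% one-shot rate:&lt;/p&gt;

```python
p = 0.08           # one-shot attack success rate (computer use, all safeguards on)
tasks_per_day = 50

expected_daily = p * tasks_per_day
print(expected_daily)  # 4.0 expected compromises per day

# Probability of at least one compromise over n independent attempts.
def p_at_least_one(n):
    return 1 - (1 - p) ** n

print(round(p_at_least_one(50), 4))   # 0.9845 -- near-certain within a single day
print(round(p_at_least_one(250), 6))  # 1.0 at 6 decimals: a five-day week is certain
print(p * tasks_per_day * 20)         # 80.0 expected events over a 20-day month
```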

&lt;p&gt;You can't train your way out of an architectural vulnerability.&lt;/p&gt;

&lt;p&gt;Prompt injection resistance training isn't useless. Moving from 50% to 8% is the difference between "trivially exploitable" and "requires effort." That effort buys time for architectural defenses to catch what gets through. But treating model-level resistance as the primary defense is building on sand.&lt;/p&gt;

&lt;h2&gt;
  
  
  A 5-Layer Defense-in-Depth Architecture for Prompt Injection
&lt;/h2&gt;

&lt;p&gt;If you accept that the model can't be fully trusted, the engineering question becomes: what do you build around the model?&lt;/p&gt;

&lt;p&gt;Defense in depth. No single layer is expected to be perfect. Each layer catches what the previous one missed. The system succeeds when no single failure is catastrophic.&lt;/p&gt;

&lt;p&gt;A five-layer model defines this defense. Each layer operates independently, so a failure in one doesn't cascade into the others.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Permission boundaries (least privilege)
&lt;/h3&gt;

&lt;p&gt;The agent should never have more permissions than the specific task requires. The default in most agent frameworks grants broad access at session initialization and leaves the access active for the entire session. That's the equivalent of giving every microservice root access to your database.&lt;/p&gt;

&lt;p&gt;Implement per-task capability grants, not session-wide permissions. An agent browsing the web for research shouldn't simultaneously hold credentials to send email. An agent drafting a document shouldn't have access to the financial transaction API.&lt;/p&gt;

&lt;p&gt;Each task invocation should receive a scoped set of permissions that get revoked when the task completes.&lt;/p&gt;

&lt;p&gt;The cloud providers have started building real infrastructure for this pattern. &lt;a href="https://aws.amazon.com/bedrock/agentcore/" rel="noopener noreferrer"&gt;AWS Bedrock AgentCore&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/entra/agent-id/" rel="noopener noreferrer"&gt;Microsoft Entra Agent ID&lt;/a&gt;, and &lt;a href="https://cloud.google.com/vertex-ai/docs/agent-engine/agent-identity" rel="noopener noreferrer"&gt;Google Native Agent Identities&lt;/a&gt; all provide distinct, manageable identities for agents, treating them as Non-Human Identities (NHIs) with their own RBAC and ABAC controls.&lt;/p&gt;

&lt;p&gt;The critical implementation detail is Just-in-Time (JIT) access: credentials should be short-lived (15-minute TTL is a reasonable starting point) and task-scoped. If an injection succeeds but the compromised session holds a token that expires in 12 minutes and can only read from a single S3 bucket, the blast radius is contained.&lt;/p&gt;
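
&lt;p&gt;A minimal sketch of the JIT pattern: a per-task grant with a short TTL and explicit revocation at task completion. The class, scope strings, and TTL are illustrative, not any provider's API:&lt;/p&gt;

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class ScopedGrant:
    """Task-scoped, short-lived credential. Deny once expired or revoked."""
    scopes: frozenset          # e.g. {"s3:read:reports-bucket"}
    ttl_seconds: int = 900     # 15-minute TTL as a reasonable starting point
    token: str = field(default_factory=lambda: uuid.uuid4().hex)
    issued_at: float = field(default_factory=time.monotonic)
    revoked: bool = False

    def allows(self, scope: str) -> bool:
        expired = time.monotonic() - self.issued_at > self.ttl_seconds
        return scope in self.scopes and not (expired or self.revoked)

# Grant per task invocation, revoke when the task completes.
grant = ScopedGrant(scopes=frozenset({"s3:read:reports-bucket"}))
print(grant.allows("s3:read:reports-bucket"))  # True
print(grant.allows("email:send"))              # False: outside task scope
grant.revoked = True
print(grant.allows("s3:read:reports-bucket"))  # False: revoked at task completion
```

&lt;p&gt;The point of the structure: even a fully compromised session can only do what this one grant allows, and only until it expires.&lt;/p&gt;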

&lt;p&gt;For code execution, sandboxing remains essential. Firecracker microVMs and gVisor provide hardware-level isolation that prevents a compromised agent from escaping its execution environment. AWS Bedrock AgentCore already uses microVMs for session isolation. This is table stakes for any agent that executes code or interacts with a filesystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Action classification and gating
&lt;/h3&gt;

&lt;p&gt;Not all agent actions carry equal risk. Reading a web page is fundamentally different from sending an email, which is fundamentally different from executing a financial transaction. Your defense architecture should reflect this difference.&lt;/p&gt;

&lt;p&gt;Classify every tool available to the agent into risk tiers. &lt;strong&gt;Read-only actions&lt;/strong&gt; (fetching web pages, reading documents, querying databases) are low risk and can proceed autonomously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reversible writes&lt;/strong&gt; (creating draft emails, writing to staging environments, adding items to a list) are medium risk. Log them with automatic rollback on anomaly detection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Irreversible actions&lt;/strong&gt; (sending emails, financial transactions, deleting data, publishing content, modifying access controls) are high risk and require human confirmation or second-model review before execution.&lt;/p&gt;

&lt;p&gt;This pattern isn't new. AWS Bedrock Agents ships with "Action Approval" as a built-in feature. Microsoft Copilot Studio has "User Confirmation" for sensitive actions.&lt;/p&gt;

&lt;p&gt;The engineering work is in the classification, not the gating mechanism. Every tool the agent can call needs to be categorized, and the categorization needs to be conservative. When in doubt, gate the action.&lt;/p&gt;
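
&lt;p&gt;A sketch of the tiered gate, with the conservative default applied to any unclassified tool (tool names and the tier mapping are illustrative):&lt;/p&gt;

```python
from enum import Enum

class Risk(Enum):
    READ_ONLY = "proceed autonomously"
    REVERSIBLE_WRITE = "log, auto-rollback on anomaly"
    IRREVERSIBLE = "human or second-model review"

# Every tool the agent can call gets classified, conservatively.
TOOL_RISK = {
    "fetch_url": Risk.READ_ONLY,
    "query_db": Risk.READ_ONLY,
    "create_draft": Risk.REVERSIBLE_WRITE,
    "send_email": Risk.IRREVERSIBLE,
    "transfer_funds": Risk.IRREVERSIBLE,
}

def gate(tool: str, approved: bool = False) -> bool:
    """Return True if the call may proceed. Unknown tools default to gated."""
    risk = TOOL_RISK.get(tool, Risk.IRREVERSIBLE)
    if risk is Risk.IRREVERSIBLE and not approved:
        return False
    return True

print(gate("fetch_url"))                  # True: read-only proceeds autonomously
print(gate("send_email"))                 # False: blocked pending review
print(gate("send_email", approved=True))  # True: explicit confirmation given
print(gate("new_mystery_tool"))           # False: when in doubt, gate the action
```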

&lt;p&gt;The second-model review pattern deserves specific attention. Instead of (or in addition to) human review, a separate model instance with a different system prompt evaluates proposed irreversible actions. This model has no context about the current task beyond the proposed action itself and simply asks: does this action make sense given the stated task? Does the action access resources outside the expected scope? Does the action match known attack patterns?&lt;/p&gt;

&lt;p&gt;This pattern isn't foolproof (both models share architectural vulnerabilities), but it adds friction that significantly raises the cost of a successful attack.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Input sanitization and segmentation
&lt;/h3&gt;

&lt;p&gt;Treat untrusted content as a separate context segment with reduced authority. If you can't fully separate instructions from data architecturally, at least create soft boundaries that make injection harder.&lt;/p&gt;

&lt;p&gt;Strip or neutralize potential instruction patterns in ingested content before the content enters the model's context window. Remove HTML comments. Strip metadata that could contain instructions. Convert rich text to plain text where formatting isn't needed. Flag content that contains patterns matching known injection techniques.&lt;/p&gt;
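&lt;p&gt;A small sketch of what that pre-ingestion pass can look like. The patterns are illustrative, not exhaustive, and the sketch assumes rich text was already converted to plain text upstream (which also removes HTML comments and metadata):&lt;/p&gt;

```python
import re
import unicodedata

# Illustrative signals only -- a real deployment maintains a larger,
# evolving pattern set.
INJECTION_HINTS = re.compile(
    r"(ignore (all )?(previous|prior) instructions|you are now|system prompt)",
    re.IGNORECASE,
)
BASE64_BLOB = re.compile(r"[A-Za-z0-9+/]{80,}={0,2}")  # long encoded runs

def sanitize(text: str) -> tuple[str, list[str]]:
    """Return (cleaned_text, flags) for content about to enter the context."""
    flags = []
    # Drop zero-width and other invisible format characters (category Cf)
    cleaned = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    if cleaned != text:
        flags.append("invisible_unicode")
    if INJECTION_HINTS.search(cleaned):
        flags.append("injection_phrase")
    if BASE64_BLOB.search(cleaned):
        flags.append("encoded_blob")
    return cleaned, flags
```

&lt;p&gt;Flagged content can then be stripped, quarantined, or presented to the model with an explicit warning marker, depending on your risk tolerance.&lt;/p&gt;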

&lt;p&gt;More sophisticated approaches use role-tagged formats (like ChatML) or special delimiters to create boundaries between trusted instructions and untrusted data. Frameworks like CaMeL enforce separation at a deeper level, preventing data from untrusted sources from being used as arguments in dangerous function calls.&lt;/p&gt;

&lt;p&gt;The model can read the content and reason about it, but the framework blocks the model from treating that content as executable instructions.&lt;/p&gt;

&lt;p&gt;This layer is inherently imperfect. Stripping everything that could possibly be an injection also destroys legitimate content. The goal isn't perfection. The goal is raising the bar high enough that attacks bypassing input sanitization are more likely to be caught by output monitoring (Layer 4) or contained by blast radius controls (Layer 5).&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 4: Output monitoring and anomaly detection
&lt;/h3&gt;

&lt;p&gt;Monitor the agent's actions in real-time against a behavioral baseline. Flag deviations before they cause damage.&lt;/p&gt;

&lt;p&gt;Watch for several categories of anomaly. &lt;strong&gt;Unexpected tool calls&lt;/strong&gt;: if the agent is tasked with summarizing a document and attempts to call an email send function, that's a red flag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource access outside scope&lt;/strong&gt;: if the agent is browsing a specific website and attempts to hit an internal API endpoint, terminate the session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data exfiltration patterns&lt;/strong&gt;: if the agent constructs a URL containing what appears to be encoded data and tries to fetch the URL, that matches a known exfiltration technique. The EchoLeak attack against Microsoft 365 Copilot used exactly this pattern, encoding stolen data in image URL parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Behavioral discontinuities&lt;/strong&gt;: a sudden shift in the agent's action patterns mid-session, particularly after ingesting new untrusted content, suggests injection may have occurred.&lt;/p&gt;

&lt;p&gt;The architecture needs kill switches that halt the agent immediately on high-confidence anomaly detection and escalate to a human. This has to be a hard stop, not a suggestion. The OWASP GenAI Incident Response Guide recommends identifying compromised sessions via trace ID, issuing revoke commands to block further tool calls, and preserving the context window for forensics.&lt;/p&gt;

&lt;p&gt;Integration with existing security infrastructure matters. Agent action logs should feed into your SIEM. Anomaly detection rules should trigger the same incident response workflows as any other security event. Configure alerts for "impossible toolchains" (sequences of tool calls that no legitimate task would produce) and high-velocity looping (an agent calling the same tool repeatedly in a way that suggests the agent is stuck in an injection-induced loop).&lt;/p&gt;
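&lt;p&gt;A few of these monitors sketched in code (the allowlist, patterns, and thresholds are assumptions you'd tune per deployment):&lt;/p&gt;

```python
import re
from collections import Counter

# Per-task tool allowlist: a summarization task has no business sending email.
TASK_TOOL_ALLOWLIST = {"summarize_document": {"read_document", "write_summary"}}

def unexpected_tool_call(task: str, tool: str) -> bool:
    return tool not in TASK_TOOL_ALLOWLIST.get(task, set())

# Long opaque parameter values can indicate encoded data leaving via a URL
# fetch (the EchoLeak pattern). The 64-char threshold is an assumption.
ENCODED_PARAM = re.compile(r"=[A-Za-z0-9+/%_\-]{64,}")

def looks_like_exfiltration(url: str) -> bool:
    return bool(ENCODED_PARAM.search(url))

def stuck_in_loop(tool_calls: list[str], threshold: int = 10) -> bool:
    """High-velocity looping: one tool called many times in a session window."""
    return any(n >= threshold for n in Counter(tool_calls).values())
```

&lt;p&gt;Any of these firing at high confidence should trigger the hard stop described above, not a log line someone reads next week.&lt;/p&gt;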

&lt;h3&gt;
  
  
  Layer 5: Blast radius containment
&lt;/h3&gt;

&lt;p&gt;Layers 1 through 4 reduce the probability and speed of a successful attack. Layer 5 limits the damage when an attack succeeds. Because eventually, one will.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network segmentation.&lt;/strong&gt; The agent's compute environment should not have unrestricted network access. Deploy agents within private network perimeters (VPC Service Controls on Google Cloud, PrivateLink on AWS) with default-deny egress rules. The agent can reach only the specific endpoints required for its current task.&lt;/p&gt;

&lt;p&gt;If a compromised agent tries to exfiltrate data to an attacker-controlled domain, the network layer blocks the attempt regardless of what the model has been tricked into doing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Credential isolation.&lt;/strong&gt; The agent uses scoped, short-lived tokens. Never long-lived API keys or static credentials. If a session is compromised, the attacker gets a token that expires in minutes and can only perform a narrow set of operations.&lt;/p&gt;
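&lt;p&gt;In miniature, a scoped, short-lived token looks like this. In production this is your identity provider's job (STS-style token vending); the class below is only a sketch of the check every tool call should pass through:&lt;/p&gt;

```python
import time
import secrets

class ScopedToken:
    """Illustrative scoped credential: narrow operations, minutes-long TTL."""

    def __init__(self, scopes: set[str], ttl_seconds: int = 900):
        self.value = secrets.token_urlsafe(32)
        self.scopes = frozenset(scopes)
        self.expires_at = time.monotonic() + ttl_seconds

    def allows(self, operation: str) -> bool:
        # Both conditions must hold: the operation is in scope AND the
        # token has not expired.
        return operation in self.scopes and self.expires_at > time.monotonic()

# A token for the email-agent example later in this post: read plus
# draft-create, nothing else, dead in 15 minutes.
token = ScopedToken({"mail.read", "mail.draft.create"}, ttl_seconds=900)
```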

&lt;p&gt;The Google Antigravity IDE incident demonstrated what happens without this protection. A poisoned web guide combined with a browser subagent that had a permissive domain allowlist (including webhook.site) enabled theft of AWS keys from .env files. Short-lived, tightly scoped credentials would have eliminated the entire attack vector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session isolation.&lt;/strong&gt; Compromise of one agent session must not propagate to others. Each task runs in its own isolated environment with its own credentials, its own network rules, and its own filesystem. No shared state between sessions means no lateral movement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit logging.&lt;/strong&gt; Every action the agent takes gets recorded with full context: the input that preceded the action, the tool called, the parameters passed, the result returned. This serves two purposes: forensic analysis after an incident, and pattern detection across sessions that may reveal slower, more sophisticated attacks that evade real-time monitoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example Blueprint: Securing an Email Agent
&lt;/h2&gt;

&lt;p&gt;Abstract architectures are useful for framing. Concrete implementations are useful for building. Here's how the five-layer model applies to one of the most requested and most dangerous agentic workflows: an agent that reads your email and drafts replies.&lt;/p&gt;

&lt;p&gt;This use case hits the full lethal trifecta. The agent has tools (drafting and potentially sending email). The agent processes untrusted input (incoming email bodies, which any external sender controls). The agent has access to sensitive data (your inbox, your contacts, your organizational context).&lt;/p&gt;

&lt;p&gt;EchoLeak proved this attack surface is real and actively exploited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Permission boundaries.&lt;/strong&gt; The agent gets read access to the inbox and draft-only write access. The agent can't send emails, only create drafts. The agent has no access to calendars, file storage, or contacts beyond the current thread. Its OAuth token is task-scoped and expires after 15 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Action gating.&lt;/strong&gt; Drafts are created but never sent without human review. The agent can't modify email filters, forwarding rules, or account settings. Any attempt to call a tool outside the approved set terminates the session immediately. Moving a draft to the outbox is classified as irreversible and requires explicit human approval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input sanitization.&lt;/strong&gt; Incoming email bodies are pre-processed before the agent sees them. HTML converts to plain text. Embedded images get stripped (preventing pixel-based exfiltration). Content matching known injection patterns (directives, base64-encoded blocks, invisible Unicode characters) is flagged and either stripped or presented with an explicit warning marker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output monitoring.&lt;/strong&gt; If the agent attempts to access any URL, API, or resource not on the allowlist for email operations, the session terminates. If the agent constructs a draft containing what appears to be encoded data in URLs (the EchoLeak exfiltration pattern), the draft gets quarantined for human review. If behavior shifts discontinuously after processing a specific email, that email is flagged as potentially adversarial.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blast radius containment.&lt;/strong&gt; The agent runs in an isolated sandbox with no filesystem access beyond its working directory. Network egress is restricted to the email provider's API endpoints. The OAuth token covers read + draft-create, not full mailbox access.&lt;/p&gt;
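&lt;p&gt;The blueprint above can be written down as a declarative policy. This is a sketch: the field names are illustrative rather than any particular framework's schema, and the API endpoint is hypothetical:&lt;/p&gt;

```python
# Five layers of the email-agent blueprint as one reviewable policy object.
EMAIL_AGENT_POLICY = {
    "permissions": {
        "mailbox": ["read", "draft.create"],   # no send, no settings
        "token_ttl_seconds": 900,
    },
    "action_gates": {
        "irreversible": ["move_draft_to_outbox"],  # human approval required
        "forbidden": ["modify_filters", "modify_forwarding", "modify_settings"],
    },
    "input_sanitization": {
        "html_to_plaintext": True,
        "strip_images": True,
        "flag_patterns": ["base64_blob", "invisible_unicode", "directive_phrase"],
    },
    "output_monitoring": {
        "egress_allowlist": ["mail-api.example.com"],  # hypothetical endpoint
        "quarantine_on": ["encoded_url_in_draft"],
        "terminate_on": ["tool_outside_allowlist"],
    },
    "blast_radius": {
        "sandbox": True,
        "filesystem": "working_dir_only",
    },
}
```

&lt;p&gt;Keeping the policy in one reviewable artifact also makes the damage ceiling auditable: anything not granted here simply isn't reachable.&lt;/p&gt;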

&lt;p&gt;If every other layer fails and the agent is fully compromised, the attacker can create draft emails (which the human reviews before sending) and read emails already in the inbox (which is the scope the agent was legitimately granted). The damage ceiling is defined and bounded.&lt;/p&gt;

&lt;p&gt;This architecture doesn't make the agent invulnerable. This architecture makes the agent fail safely.&lt;/p&gt;

&lt;p&gt;The difference between "an injection that creates a weird draft the human deletes" and "an injection that silently exfiltrates your entire inbox" is entirely about the architecture sitting around the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for the "replace all workers" narrative
&lt;/h2&gt;

&lt;p&gt;The prompt injection problem directly constrains the labor displacement ceiling for agentic AI. Understanding this constraint matters for teams making investment decisions about agent deployments.&lt;/p&gt;

&lt;p&gt;Agents that require human oversight for irreversible actions can't replace humans. They augment them. The supervision requirement scales with risk, not with task volume.&lt;/p&gt;

&lt;p&gt;An agent that autonomously handles 200 low-risk email drafts per day while a human reviews 15 high-risk ones is a massive productivity gain. But it's a different value proposition than "we replaced the person who used to do email."&lt;/p&gt;

&lt;p&gt;I see this playing out with our clients at Zenith constantly. The near-term reality isn't autonomous agents replacing knowledge workers. It's a redesigned workflow where agents handle high-volume, lower-risk tasks autonomously while humans focus on decisions where the cost of error is high: sending the email, approving the transaction, publishing the content, granting the access.&lt;/p&gt;

&lt;p&gt;The companies extracting real value from agents aren't removing humans from the loop. They're redesigning the loop so that humans review only what matters while agents handle the rest.&lt;/p&gt;

&lt;p&gt;The adoption numbers tell the same story. PwC reports that 79% of executives are adopting agents, but 34% cite cybersecurity as their top barrier. An S&amp;amp;P Global report found that 42% of companies abandoned AI initiatives entirely, with security risks as the primary driver.&lt;/p&gt;

&lt;p&gt;The organizations that push through aren't the ones that found a way to make agents safe enough for full autonomy. They're the ones that built architectures where the agent doesn't need full autonomy to be valuable.&lt;/p&gt;

&lt;p&gt;"Summarize some text while I supervise" is a productivity improvement. "Replace me with autonomous decisions" is liability chaos.&lt;/p&gt;

&lt;p&gt;The security constraint isn't a bug in the adoption curve. The security constraint defines the shape of the adoption curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model is the weakest link. Build around the model.
&lt;/h2&gt;

&lt;p&gt;Security engineers have known for decades that you don't build your security posture around the assumption that any single component is bulletproof. You assume every layer can fail and design the system so that no single failure is catastrophic.&lt;/p&gt;

&lt;p&gt;The 8% number isn't a reason to avoid deploying agentic AI. The 8% number is a reason to stop treating the model as the security boundary and start treating the model as what it is: a powerful but unreliable component that needs guardrails, monitoring, and containment.&lt;/p&gt;

&lt;p&gt;The model will keep getting better at resisting prompt injection. That 8% will probably drop. But it won't hit zero. Not with current architectures, and possibly not ever.&lt;/p&gt;

&lt;p&gt;Build accordingly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions (FAQ)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is prompt injection?
&lt;/h3&gt;

&lt;p&gt;Prompt injection is a security vulnerability where an attacker manipulates a large language model (LLM) by embedding malicious instructions into the content the model processes. This attack can trick the AI agent into performing unintended actions, such as leaking sensitive data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is prompt injection a major security risk?
&lt;/h3&gt;

&lt;p&gt;Prompt injection becomes a major risk when three conditions are met (the "lethal trifecta"): the AI agent can use tools (like sending emails), processes untrusted input (like web pages or documents), and has access to sensitive data. This combination allows an attacker to take control of the agent to exfiltrate data or cause harm.&lt;/p&gt;

&lt;h3&gt;
  
  
  How can you protect AI agents from prompt injection?
&lt;/h3&gt;

&lt;p&gt;Protection requires a defense-in-depth architecture. This architecture includes five key layers: implementing strict permission boundaries, gating high-risk actions, sanitizing inputs, monitoring outputs for anomalies, and containing the blast radius with network and credential isolation.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>security</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>ACID compliance in data analytics platforms: what it is, why it matters, and how to verify it (2026)</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Tue, 17 Feb 2026 19:00:14 +0000</pubDate>
      <link>https://forem.com/manveer_chawla_64a7283d5a/acid-compliance-in-data-analytics-platforms-what-it-is-why-it-matters-and-how-to-verify-it-2026-38kj</link>
      <guid>https://forem.com/manveer_chawla_64a7283d5a/acid-compliance-in-data-analytics-platforms-what-it-is-why-it-matters-and-how-to-verify-it-2026-38kj</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ACID matters in 2026 analytics&lt;/strong&gt; because warehouses now power operational workflows (Reverse ETL, AI agents, user-facing apps). Dirty reads and inconsistent snapshots become real business incidents.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ACID is enforced via MVCC + isolation levels + a transactional metadata layer&lt;/strong&gt; (the "ACID" often happens in metadata, not in data).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Most cloud warehouses optimize for concurrency and may default to weaker isolation (often READ COMMITTED),&lt;/strong&gt; which can cause anomalies in multi-step transformations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lakehouse table formats (Iceberg/Delta) can be ACID, but pay a "maintenance tax"&lt;/strong&gt; (small files, compaction/vacuum, metadata bloat).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale-up/hybrid engines (DuckDB/MotherDuck) deliver fast commits and strong consistency&lt;/strong&gt; by keeping transaction management close to compute (WAL/MVCC) and avoiding distributed metadata latency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust but verify:&lt;/strong&gt; run dirty read, lost update, recovery, and "surprise bill" micro-transaction tests to validate correctness &lt;em&gt;and&lt;/em&gt; cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You've probably stared at a dashboard where the "Total Revenue" KPI didn't match the sum of the line items below it. Or maybe you've debugged a "Single Source of Truth" table that mysteriously dropped rows during a high-traffic ingestion window.&lt;/p&gt;

&lt;p&gt;In the past, we called these glitches and moved on. We accepted that analytical data was eventually consistent because analytics was historically backward-looking. A report generated at midnight didn't need to reflect a transaction that happened at 11:59:59 PM.&lt;/p&gt;

&lt;p&gt;But in 2026, "good enough" consistency is dead. Analytics isn't a read-only discipline anymore.&lt;/p&gt;

&lt;p&gt;Data warehouses now power operational workflows via Reverse ETL, feed live AI agents, and serve user-facing analytics in real-time applications. When a warehouse drives a marketing automation tool or a customer-facing billing portal, a "dirty read" isn't just a glitch. It's a compliance violation, a lost customer, or a triggered support incident.&lt;/p&gt;

&lt;p&gt;This guide goes beyond the textbook definitions of Atomicity, Consistency, Isolation, and Durability. We'll look at how modern platforms, from decoupled cloud warehouses like Snowflake to open table formats like Iceberg and hybrid engines like &lt;a href="https://motherduck.com" rel="noopener noreferrer"&gt;MotherDuck&lt;/a&gt;, mechanically guarantee trust. We'll dig into the hidden costs of these architectures and give you a framework for verifying that your "Single Source of Truth" isn't actually a lie.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why ACID compliance matters for modern analytics (and how it works)
&lt;/h2&gt;

&lt;p&gt;The "Big Data" era trained us to accept eventual consistency. When you processed petabytes of logs with Hadoop, the count could be off by 1% for a few hours. Those systems were designed for massive throughput, not transactional precision.&lt;/p&gt;

&lt;p&gt;But the "Big Data" hangover has cleared. We're facing a new reality: &lt;strong&gt;Operational Analytics&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Tools like dbt (data build tool) and Reverse ETL platforms have transformed the data warehouse from a passive closet into an active nervous system. Pipelines now target freshness windows of 1 to 60 minutes. Marketing activation and sales operations demand data that's accurate &lt;em&gt;right now&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;If your warehouse feeds a CRM, and that CRM triggers a "Welcome" email based on a signup event, the underlying storage layer must guarantee that the signup record is fully committed and visible before the email trigger fires. You can't have reliable Data Governance or a semantic layer if the underlying storage can't guarantee atomic commits. Without ACID, your "governed metrics" are just suggestions subject to race conditions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How MVCC enables ACID compliance (and what isolation levels mean)
&lt;/h3&gt;

&lt;p&gt;To understand how modern platforms solve this problem, we need to look beyond the acronym and examine the implementation standard: &lt;strong&gt;Multi-Version Concurrency Control (MVCC)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://motherduck.com/blog/open-lakehouse-stack-duckdb-table-formats/" rel="noopener noreferrer"&gt;DuckDB&lt;/a&gt;, Snowflake, and Postgres all use MVCC to handle high concurrency without locking the entire system. In a naive database, a writer might lock a table to update it, forcing all readers to wait. In an MVCC system, the database maintains multiple versions of the data simultaneously.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Reader:&lt;/strong&gt; When you run a &lt;code&gt;SELECT&lt;/code&gt; query, the database takes a logical "snapshot" of the data at that specific moment. You see the state of the world as it existed when your query began.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Writer:&lt;/strong&gt; When a pipeline runs an &lt;code&gt;UPDATE&lt;/code&gt;, it creates a &lt;em&gt;new&lt;/em&gt; version of the rows (or files) rather than overwriting the old ones.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvo7u8tod05se3kw9fepm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvo7u8tod05se3kw9fepm.jpg" alt="Transaction A starts at T1. Transaction B starts writing at T2. Transaction A continues reading. Because A is pinned to the T1 snapshot, A doesn't see B's partial work or even B's committed work until A finishes and starts a new transaction." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This versioning lets readers and writers coexist without blocking each other. But MVCC alone isn't enough. The database must also enforce &lt;strong&gt;Isolation Levels&lt;/strong&gt;. Isolation isn't a binary "on/off" switch. It's a spectrum of guarantees that trades performance for correctness.&lt;/p&gt;
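&lt;p&gt;You can watch this snapshot behavior locally. The sketch below uses SQLite in WAL mode as a stand-in (DuckDB, Snowflake, and Postgres implement MVCC differently, but the visible behavior is the same): a reader pinned to its snapshot doesn't see a concurrent writer's commit until it starts a new transaction.&lt;/p&gt;

```python
import sqlite3
import tempfile
import os

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit
writer.execute("PRAGMA journal_mode=WAL")  # readers get snapshot semantics
writer.execute("CREATE TABLE orders (id INTEGER)")
writer.execute("INSERT INTO orders VALUES (1)")

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")  # snapshot is pinned at the first read below
before = reader.execute("SELECT count(*) FROM orders").fetchone()[0]

writer.execute("INSERT INTO orders VALUES (2)")  # commits immediately

during = reader.execute("SELECT count(*) FROM orders").fetchone()[0]
reader.execute("COMMIT")
after = reader.execute("SELECT count(*) FROM orders").fetchone()[0]

# The reader sees 1 row inside its snapshot, even though the writer
# committed a second row mid-transaction; the new row appears only
# after the reader's transaction ends.
print(before, during, after)
```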

&lt;h4&gt;
  
  
Isolation levels explained: Read Uncommitted vs Read Committed vs Snapshot vs Serializable
&lt;/h4&gt;

&lt;p&gt;Different business risks map to different isolation levels. Understanding this hierarchy is critical for evaluating platforms, since many cloud warehouses default to lower levels to maximize concurrency.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Isolation level&lt;/th&gt;
&lt;th&gt;Definition&lt;/th&gt;
&lt;th&gt;Business risk prevented&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Read Uncommitted&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You can see data that hasn't been committed yet.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;None. Exposes Dirty Reads:&lt;/strong&gt; A dashboard shows revenue from an order that fails and rolls back 1 second later.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Read Committed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You only see data committed before your statement began.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Dirty Reads.&lt;/strong&gt; &lt;em&gt;Note: This is the default for Snowflake and many major warehouses.&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Snapshot Isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;You see a consistent snapshot of data as of the start of your &lt;em&gt;transaction&lt;/em&gt;.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Non-Repeatable Reads:&lt;/strong&gt; Running the same query twice in a transaction yields different results because a background job updated the table.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Serializable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The strictest level. It simulates running transactions one at a time.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Phantom Reads:&lt;/strong&gt; A query counting rows returns different numbers because a new row was inserted by another process.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Snapshot Isolation and Serializable offer stronger guarantees, but they come with performance costs. Many decoupled cloud warehouses, including Snowflake, default to &lt;code&gt;READ COMMITTED&lt;/code&gt; for standard tables.&lt;/p&gt;

&lt;p&gt;This isolation level means that if you have a multi-statement transaction (say, a dbt model with multiple steps), two successive queries within that same transaction could return different results if a separate pipeline commits data in between them. For complex transformation logic, READ COMMITTED can introduce subtle, hard-to-debug data anomalies.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where ACID actually happens: the metadata transaction layer
&lt;/h3&gt;

&lt;p&gt;If the data files (Parquet, micro-partitions) are immutable, where does the "ACID" actually happen? In the &lt;strong&gt;metadata&lt;/strong&gt;. The difference between a loose collection of files and a database table is a transactional metadata layer that tells the engine which files belong to the current version.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;In Cloud Warehouses (Snowflake/BigQuery):&lt;/strong&gt; A centralized, proprietary metadata service acts as the "brain." It manages locks and versions. Snowflake, for example, uses &lt;a href="https://www.snowflake.com/en/blog/how-foundationdb-powers-snowflake-metadata-forward/" rel="noopener noreferrer"&gt;FoundationDB&lt;/a&gt; (a distributed key-value store) to track every micro-partition.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In Table Formats (Iceberg/Delta/DuckLake):&lt;/strong&gt; The file system (S3/Object Storage) combined with a catalog acts as the source of truth. They rely on atomic file swaps or optimistic concurrency control to manage versions.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In Scale-Up Engines (DuckDB/MotherDuck):&lt;/strong&gt; Transaction management is handled in-process using a Write-Ahead Log (WAL). Because the compute and transaction manager are tightly coupled, commits are near-instant. No network latency from external metadata services.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Three ways analytics platforms implement ACID compliance (2026)
&lt;/h2&gt;

&lt;p&gt;There's no single "best" way to implement ACID. Three dominant architectures prevail, each optimizing for a different constraint: scale, openness, or latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 1: Decoupled scale-out warehouses (Snowflake, BigQuery)
&lt;/h3&gt;

&lt;p&gt;This architecture separates storage (S3/GCS), compute (Virtual Warehouses), and global state (The Cloud Services Layer).&lt;/p&gt;

&lt;h4&gt;
  
  
  How decoupled warehouses provide ACID compliance
&lt;/h4&gt;

&lt;p&gt;When you run an &lt;code&gt;UPDATE&lt;/code&gt; in Snowflake, you're not just writing data. You're engaging a sophisticated, centralized brain. This metadata service (backed by FoundationDB) coordinates transactions across distributed clusters. The service ensures that when your query completes, the pointer to the "current" data is updated atomically.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros of decoupled warehouses
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Massive Concurrency:&lt;/strong&gt; Because the metadata layer is distributed, these systems can handle petabyte-scale workloads where thousands of users query the same tables simultaneously.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separation of Concerns:&lt;/strong&gt; You can scale compute up and down instantly without worrying about data corruption.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons of decoupled warehouses: latency, cost, and weaker isolation
&lt;/h4&gt;

&lt;p&gt;Centralizing the "brain" introduces friction. Every transaction, no matter how small, requires a round-trip network call to this central service. This imposes a "latency floor" on operations. You can't simply "insert a row." You must ask the global brain for permission, write the data, and then tell the brain to update the pointer.&lt;/p&gt;

&lt;p&gt;This architecture also introduces a specific cost-model risk: &lt;strong&gt;Cloud Services Billing&lt;/strong&gt;. In Snowflake, you're billed for the "brain's" work if it &lt;a href="https://docs.snowflake.com/en/user-guide/cost-understanding-compute" rel="noopener noreferrer"&gt;exceeds 10% of your daily compute credits&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Workloads that involve frequent "micro-transactions" (like continuous ingestion or looping single-row inserts) can thrash the metadata layer. This leads to "surprise bills" where the cost of managing the transaction exceeds the cost of processing the data.&lt;/p&gt;

&lt;p&gt;And relying primarily on &lt;code&gt;READ COMMITTED&lt;/code&gt; isolation means that applications requiring strict multi-statement consistency (such as financial ledger balancing within a stored procedure) need careful design. Otherwise, you'll hit anomalies where data changes mid-execution.&lt;/p&gt;

&lt;h4&gt;
  
  
  Best for: petabyte-scale batch analytics
&lt;/h4&gt;

&lt;p&gt;Petabyte-scale, "big data" batch processing where the engineering team manages complex infrastructure. This architecture works well when predictable costs are secondary to querying enormous datasets, and when the latency of individual transactions matters less than overall throughput.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 2: Open table formats and lakehouses (Iceberg, Delta Lake)
&lt;/h3&gt;

&lt;p&gt;This approach tries to bring ACID to the data lake without a proprietary central brain.&lt;/p&gt;

&lt;h4&gt;
  
  
How Iceberg and Delta Lake provide ACID transactions
&lt;/h4&gt;

&lt;p&gt;Instead of a database managing the state, the state is managed via files in object storage (S3).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Delta Lake&lt;/strong&gt; uses a transaction log (&lt;code&gt;_delta_log&lt;/code&gt;) containing JSON files that track changes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iceberg&lt;/strong&gt; uses a hierarchy of metadata files (Manifest Lists -&amp;gt; Manifests -&amp;gt; Data Files) and relies on an atomic "swap" of the metadata file pointer to commit a transaction.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency&lt;/strong&gt; is handled via "Optimistic Concurrency Control" (OCC). A writer assumes it's the only one writing. Before committing, the writer checks if anyone else changed the file. If a conflict exists, the writer fails and must retry.&lt;/li&gt;
&lt;/ul&gt;
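&lt;p&gt;OCC in miniature: the catalog is just a versioned pointer, and a commit is a compare-and-swap that fails if someone else committed first. This sketch compresses the idea into a few lines (the lock stands in for the store's atomic swap primitive):&lt;/p&gt;

```python
import threading

class Catalog:
    """Versioned pointer to the current table metadata."""

    def __init__(self):
        self.version = 0
        self.pointer = "metadata-v0.json"
        self._lock = threading.Lock()  # stand-in for an atomic file/pointer swap

    def commit(self, expected_version: int, new_pointer: str) -> bool:
        with self._lock:
            if self.version != expected_version:
                return False  # someone else committed first: conflict
            self.version += 1
            self.pointer = new_pointer
            return True

def write_with_retry(catalog: Catalog, make_pointer, max_attempts: int = 5) -> bool:
    """Optimistic writer: read state, do the work, retry on conflict."""
    for _ in range(max_attempts):
        seen = catalog.version                 # read current state
        new_pointer = make_pointer(seen)       # (re)do the write work
        if catalog.commit(seen, new_pointer):  # attempt the atomic swap
            return True
    return False
```

&lt;p&gt;The retry loop is why OCC gets expensive under write contention: each conflict means redoing work before trying the swap again.&lt;/p&gt;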

&lt;h4&gt;
  
  
  Pros of open table formats
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vendor Agnostic:&lt;/strong&gt; Your data lives in your S3 bucket. You can read it with Spark, Trino, Flink, or DuckDB.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Control:&lt;/strong&gt; You pay for S3 and your own compute, avoiding the markup of proprietary warehouses.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Cons of lakehouses: the small file problem and the maintenance tax
&lt;/h4&gt;

&lt;p&gt;Relying on object storage creates a severe "Small File Problem." Every time you stream data or run a small &lt;code&gt;INSERT&lt;/code&gt;, you create new data files and new metadata files.&lt;/p&gt;

&lt;p&gt;Here's a real-world example. An Iceberg table with a streaming ingestion pipeline created &lt;a href="https://iomete.com/resources/blog/apache-iceberg-production-antipatterns-2026" rel="noopener noreferrer"&gt;45 million small data files&lt;/a&gt;. This pipeline generated over 5TB of &lt;em&gt;metadata&lt;/em&gt; alone (manifest files tracking the data).&lt;/p&gt;

&lt;p&gt;When analysts tried to query the table, the query planner had to read gigabytes of metadata just to figure out which files to scan. Query planning times jumped from milliseconds to minutes, and the coordinators frequently crashed due to Out-Of-Memory (OOM) errors.&lt;/p&gt;

&lt;p&gt;To make this architecture work, you have to pay a "maintenance tax." You need to run compaction jobs (rewriting small files into larger ones) and vacuum processes (deleting old files) continuously. If you neglect this hygiene, performance degrades exponentially.&lt;/p&gt;

&lt;h4&gt;
  
  
  Best for: open data lakes with strong engineering support
&lt;/h4&gt;

&lt;p&gt;Large-scale data engineering teams that prioritize open standards and have the operational capacity to manage the "maintenance tax." This architecture fits well for massive batch jobs, but struggles with the latency and complexity of high-frequency operational updates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 3: Scale-up and hybrid engines (DuckDB, MotherDuck)
&lt;/h3&gt;

&lt;p&gt;This architecture rejects the premise that you need a distributed cluster for every problem. It uses a "Scale-Up" approach (using a single, powerful node) coupled with a &lt;a href="https://motherduck.com/learn-more/hybrid-analytics-guide/" rel="noopener noreferrer"&gt;hybrid execution model&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  How DuckDB and MotherDuck provide ACID compliance (MVCC + WAL)
&lt;/h4&gt;

&lt;p&gt;DuckDB (and, by extension, MotherDuck) implements ACID using strict MVCC and a Write-Ahead Log (WAL), similar to Postgres but optimized for analytics.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local:&lt;/strong&gt; On your laptop, the transaction manager runs in-process. Network overhead disappears.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud:&lt;/strong&gt; MotherDuck runs "Ducklings" (isolated compute instances).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Why DuckLake improves metadata transactions for analytics
&lt;/h4&gt;

&lt;p&gt;MotherDuck introduces a hybrid table format called &lt;a href="https://motherduck.com/blog/ducklake-motherduck/" rel="noopener noreferrer"&gt;"DuckLake"&lt;/a&gt;. Unlike Iceberg, which requires scanning S3 files to find metadata (slow), DuckLake stores metadata in a high-performance relational database (fast), while the data remains in open formats (Parquet) on S3.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Metadata operations (checking table structure, finding files) take roughly &lt;strong&gt;2 milliseconds&lt;/strong&gt;, compared to the 100ms–1000ms "cold start" penalty of scanning object storage manifests.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Pros of scale-up engines: interactive speed and simpler transactions
&lt;/h4&gt;

&lt;p&gt;ACID guarantees are handled in-process. Commits happen in milliseconds because no distributed consensus algorithm delays them. "Noisy neighbor" issues disappear because tenancy is isolated. You get the strict consistency of a relational database with the analytical speed of a columnar engine.&lt;/p&gt;

&lt;h4&gt;
  
  
  Cons of scale-up engines: not designed for 100+ PB single tables
&lt;/h4&gt;

&lt;p&gt;This architecture isn't designed for the 100+ PB single-table workload. It optimizes for the 95% of workloads that fit within the memory and disk of a large single node (which, in the cloud, can be massive).&lt;/p&gt;

&lt;h4&gt;
  
  
  Best for: operational analytics, interactive BI, and real-time dashboards
&lt;/h4&gt;

&lt;p&gt;"Fast Data" workloads: user-facing applications, interactive BI, and real-time dashboards where sub-second response times are critical. Scale-up engines are the undisputed choice for local development and CI/CD, since they let engineers run full ACID-compliant tests on their laptop that perfectly mirror production behavior.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to verify ACID compliance: a practical test framework
&lt;/h2&gt;

&lt;p&gt;Marketing pages are easy to write. Distributed consistency is hard to build. Don't just trust that a platform is "ACID compliant." Verify the behavior, especially if you're building customer-facing data products.&lt;/p&gt;

&lt;p&gt;Here's a framework of tests you can run in your SQL environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test 1: How to test for dirty reads
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Objective:&lt;/strong&gt; Ensure that a long-running query doesn't see uncommitted data from a concurrent write.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Session A (The Writer):&lt;/strong&gt; Start a transaction. Insert a "poison pill" row (e.g., a row with &lt;code&gt;ID = -999&lt;/code&gt;). &lt;em&gt;Don't commit yet.&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;BEGIN&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;revenue_table&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;-- Hang here. Do not commit.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Session B (The Reader):&lt;/strong&gt; Immediately query the table.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;revenue_table&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;999&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; If Session B returns the row, the system allows Dirty Reads (fail). If Session B returns nothing, the system enforces isolation.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finish:&lt;/strong&gt; Commit or Rollback Session A.&lt;/li&gt;
&lt;/ol&gt;
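&lt;p&gt;The two sessions above can be scripted end to end. Here's a minimal sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in engine (WAL mode, which gives snapshot isolation); swap in your warehouse's driver to test the real thing:&lt;/p&gt;

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "acid_test.db")

# Session A (the writer): insert the poison-pill row but do not commit.
a = sqlite3.connect(db)
a.execute("PRAGMA journal_mode=WAL")  # readers and the writer can overlap
a.execute("CREATE TABLE revenue_table (id INTEGER, amount INTEGER)")
a.commit()
a.execute("BEGIN")
a.execute("INSERT INTO revenue_table (id, amount) VALUES (-999, 1000000)")
# Hang here. Do not commit.

# Session B (the reader): a separate connection acts as a separate session.
b = sqlite3.connect(db)
rows = b.execute("SELECT * FROM revenue_table WHERE id = -999").fetchall()
verdict = "dirty read (fail)" if rows else "isolation enforced"
print(verdict)

a.rollback()  # finish: roll back Session A
```

&lt;p&gt;A compliant engine prints "isolation enforced": Session B never sees the uncommitted row.&lt;/p&gt;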

&lt;h3&gt;
  
  
  Test 2: How to test for lost updates (concurrency)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Objective:&lt;/strong&gt; See how the system handles two users trying to update the same row at the exact same time.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Create a &lt;code&gt;counters&lt;/code&gt; table with a single row: &lt;code&gt;counter = 10&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session A:&lt;/strong&gt; &lt;code&gt;BEGIN; UPDATE counters SET counter = 11;&lt;/code&gt; (Don't commit).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session B:&lt;/strong&gt; &lt;code&gt;BEGIN; UPDATE counters SET counter = 12;&lt;/code&gt; (Try to commit).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Blocking:&lt;/strong&gt; Session B might hang, waiting for A to finish (common in lock-based systems like Snowflake).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error:&lt;/strong&gt; Session B might fail immediately with a "Serialization Failure" or "Concurrent Transaction" error (common in Optimistic systems like DuckDB/Lakehouse).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silent Overwrite (Failure):&lt;/strong&gt; If both succeed and the final value is 12 (or 11) without warning, you have a "Lost Update" anomaly.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Test 3: How to test atomicity and durability (recovery)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Objective:&lt;/strong&gt; Verify Atomicity and Durability.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; Start a massive &lt;code&gt;INSERT&lt;/code&gt; statement (e.g., 10 million rows).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disruption:&lt;/strong&gt; Kill the client process or force a connection drop halfway through.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check:&lt;/strong&gt; Reconnect and query the table.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; You should see &lt;strong&gt;zero&lt;/strong&gt; rows from that batch. If you see 5 million rows, Atomicity failed. The system must use its WAL (Write Ahead Log) to roll back the partial write upon restart.&lt;/li&gt;
&lt;/ol&gt;
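&lt;p&gt;You can automate the crash with a throwaway child process. A sketch with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in: the child starts a large insert and hard-exits without committing, and the parent then checks that recovery rolled the partial write back:&lt;/p&gt;

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

db = os.path.join(tempfile.mkdtemp(), "atomic.db")
conn = sqlite3.connect(db)
conn.execute("CREATE TABLE big_load (n INTEGER)")
conn.commit()
conn.close()

# Child process: a large uncommitted insert, then a simulated hard crash.
crash_script = f"""
import os, sqlite3
conn = sqlite3.connect({db!r})
conn.execute("BEGIN")
conn.executemany("INSERT INTO big_load VALUES (?)", ((i,) for i in range(100_000)))
os._exit(1)  # die mid-transaction: no commit, no clean shutdown
"""
subprocess.run([sys.executable, "-c", crash_script])

# Recovery check: atomicity demands zero rows from the aborted batch.
count = sqlite3.connect(db).execute("SELECT count(*) FROM big_load").fetchone()[0]
print(count)  # a compliant engine reports 0
```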

&lt;h3&gt;
  
  
  Test 4: How to measure the cost overhead of ACID transactions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Objective:&lt;/strong&gt; Verify the cost of ACID overhead.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Action:&lt;/strong&gt; Write a script that performs 10,000 "micro-transactions" (inserting 1 row, committing, repeating).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check:&lt;/strong&gt; Look at the billing metrics for that specific time window.

&lt;ul&gt;
&lt;li&gt;In &lt;strong&gt;Snowflake&lt;/strong&gt;, check the &lt;code&gt;CLOUD_SERVICES_USAGE&lt;/code&gt; metric. Did it spike above 10% of compute?
&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;BigQuery&lt;/strong&gt;, check the &lt;a href="https://cloud.google.com/bigquery/pricing" rel="noopener noreferrer"&gt;API costs&lt;/a&gt; for streaming inserts.
&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;MotherDuck&lt;/strong&gt;, verify that the cost remains flat (compute-based) and does not include hidden metadata fees.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
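&lt;p&gt;Billing metrics are vendor-specific, but the underlying overhead is per-commit coordination, and you can feel it locally. A sketch with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; comparing 2,000 micro-transactions against the same rows in one batched transaction:&lt;/p&gt;

```python
import os
import sqlite3
import tempfile
import time

db = os.path.join(tempfile.mkdtemp(), "overhead.db")
conn = sqlite3.connect(db)
conn.execute("CREATE TABLE events (n INTEGER)")

# 2,000 micro-transactions: one row per commit, one journal sync per commit.
t0 = time.perf_counter()
for i in range(2000):
    conn.execute("INSERT INTO events VALUES (?)", (i,))
    conn.commit()
micro = time.perf_counter() - t0

# The same 2,000 rows in a single transaction: one commit, one sync.
t0 = time.perf_counter()
conn.execute("BEGIN")
conn.executemany("INSERT INTO events VALUES (?)", ((i,) for i in range(2000)))
conn.commit()
batch = time.perf_counter() - t0

print(f"micro: {micro:.3f}s  batch: {batch:.3f}s  ratio: {micro / batch:.0f}x")
```

&lt;p&gt;The micro-transaction loop is dramatically slower. In a distributed warehouse, the same per-commit tax shows up as metadata and cloud-services charges instead of local sync time.&lt;/p&gt;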




&lt;h2&gt;
  
  
  Common ACID compliance mistakes in analytics platforms
&lt;/h2&gt;

&lt;p&gt;Even with a compliant platform, implementation details can break your data trust.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 1: Assuming ACID means serializable isolation
&lt;/h3&gt;

&lt;p&gt;Many engineers assume "ACID" means "Serializable" (perfect isolation). It usually doesn't.&lt;/p&gt;

&lt;p&gt;If you're building a financial reconciliation process on a warehouse that defaults to &lt;code&gt;READ COMMITTED&lt;/code&gt;, you need to manually manage locking or logic to prevent anomalies. Don't assume the database handles complex race conditions for you.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: Treating object storage (S3) like a transactional database
&lt;/h3&gt;

&lt;p&gt;Trying to implement ACID manually over raw object storage is a recipe for disaster. Developers sometimes think, "I'll just write a file to S3 and then read it."&lt;/p&gt;

&lt;p&gt;Without a table format (like Iceberg) or an engine (like DuckDB) to manage the atomic commit, you will eventually hit partial writes, torn multi-file updates, or race conditions. S3 is now strongly consistent, but it doesn't support multi-file transactions natively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: Using a warehouse for micro-transactions (and overpaying)
&lt;/h3&gt;

&lt;p&gt;Look, using a sledgehammer to drive a finishing nail is expensive.&lt;/p&gt;

&lt;p&gt;We often see teams using massive cloud warehouses for high-frequency, low-volume updates (such as updating a "last login" timestamp for users). The overhead of the distributed transaction coordinator (latency + cost) outweighs the value of the data update. These workloads belong in an OLTP database or a lightweight engine like DuckDB that handles micro-transactions efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: Skipping compaction and vacuum in lakehouses
&lt;/h3&gt;

&lt;p&gt;In Lakehouse architectures (Iceberg/Delta), "deleting" a row doesn't actually delete it. It writes a "tombstone" or a new version of the file. Over time, your table becomes a graveyard of obsolete files.&lt;/p&gt;

&lt;p&gt;If you don't automate &lt;code&gt;VACUUM&lt;/code&gt; and compaction, your read performance will degrade until queries time out. Managed engines like MotherDuck handle this hygiene automatically in the background.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion: Choosing the right ACID architecture for operational analytics
&lt;/h2&gt;

&lt;p&gt;ACID compliance is the bedrock of trust in modern analytics. When a dashboard number changes every time you refresh, or when a high-value customer receives a duplicate email due to a race condition, trust in your data team evaporates.&lt;/p&gt;

&lt;p&gt;The shift to operational analytics means you can't rely on the "eventual consistency" of the past. But you don't need to over-engineer your solution either.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For the 1% of workloads&lt;/strong&gt; that are truly petabyte-scale, decentralized architectures like Snowflake or carefully managed Lakehouses are necessary, despite their latency and cost premiums.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For the 99% of workloads&lt;/strong&gt; that deal with "medium data" (Gigabytes to Terabytes), the future is &lt;strong&gt;Scale-Up ACID&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't need a massive distributed cluster to get banking-grade transactional integrity. You need an architecture that respects the physics of data. Keep compute close to storage and handle transactions in-process rather than over the network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Hybrid Advantage:&lt;/strong&gt; If you want ACID guarantees that move at the speed of interactive analytics, without the administration of a Lakehouse or the latency of a distributed warehouse, evaluate &lt;a href="https://motherduck.com" rel="noopener noreferrer"&gt;MotherDuck&lt;/a&gt;. MotherDuck brings the power of DuckDB to the cloud, handling concurrency, consistency, and metadata automatically. It lets you build pipelines that are robust enough for operations but simple enough to run on your laptop.&lt;/p&gt;

&lt;p&gt;In 2026, the "Single Source of Truth" shouldn't be a lie. Make sure your platform can keep its promises.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What does ACID compliance mean in an analytics platform?
&lt;/h3&gt;

&lt;p&gt;A: ACID means transactions are &lt;strong&gt;atomic&lt;/strong&gt;, keep data &lt;strong&gt;consistent&lt;/strong&gt;, are &lt;strong&gt;isolated&lt;/strong&gt; from concurrent work, and are &lt;strong&gt;durable&lt;/strong&gt; after commit. In analytics platforms, ACID ensures that dashboards and downstream apps do not see partial writes or inconsistent snapshots during ingestion and transformations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is "ACID compliant" the same as "Serializable isolation"?
&lt;/h3&gt;

&lt;p&gt;A: No. ACID includes isolation, but platforms can implement &lt;strong&gt;different isolation levels&lt;/strong&gt;. Many systems are fully ACID compliant while defaulting to &lt;strong&gt;READ COMMITTED&lt;/strong&gt; or &lt;strong&gt;SNAPSHOT&lt;/strong&gt; rather than full &lt;strong&gt;SERIALIZABLE&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What isolation level do major cloud warehouses typically use by default?
&lt;/h3&gt;

&lt;p&gt;A: Many cloud warehouses default to &lt;strong&gt;READ COMMITTED&lt;/strong&gt; for standard workloads, prioritizing concurrency. If you need repeatable results across multiple statements, you must confirm that stronger isolation is supported and how it's configured.&lt;/p&gt;

&lt;h3&gt;
  
  
  How can I quickly test whether my warehouse allows dirty reads?
&lt;/h3&gt;

&lt;p&gt;A: Open two sessions: in Session A, insert a row inside a transaction &lt;strong&gt;without committing&lt;/strong&gt;. In Session B, query for that row. If Session B can see the row, the system allows dirty reads and fails the test.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do Iceberg/Delta Lake provide ACID on object storage?
&lt;/h3&gt;

&lt;p&gt;A: They commit changes by writing new data/metadata files and then atomically updating the table's metadata pointer/log. Concurrency is typically handled with &lt;strong&gt;optimistic concurrency control (OCC)&lt;/strong&gt;, where conflicting writers must retry.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the "small file problem," and why does it hurt ACID lakehouses?
&lt;/h3&gt;

&lt;p&gt;A: Frequent small writes create huge numbers of small data and metadata files. Planning a query can require scanning large metadata structures, increasing latency and sometimes causing coordinator memory failures unless you run compaction/vacuum regularly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where does ACID "actually happen" if my data is stored as Parquet files?
&lt;/h3&gt;

&lt;p&gt;A: In the &lt;strong&gt;transactional metadata layer&lt;/strong&gt; that decides which files are part of the current table version. The data files are often immutable. Correctness comes from atomically updating metadata and enforcing concurrency rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the fastest way to validate durability and atomicity?
&lt;/h3&gt;

&lt;p&gt;A: Start a large insert, then kill the client/connection mid-write. After reconnecting, you should see &lt;strong&gt;all or nothing&lt;/strong&gt; from that transaction. Never a partial batch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why can ACID features increase costs in decoupled warehouses?
&lt;/h3&gt;

&lt;p&gt;A: Distributed metadata coordination adds overhead per transaction (latency + metastore work). High-frequency microtransactions can trigger unexpected "control plane" or metadata-related charges, depending on the vendor's billing model.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I choose a scale-up/hybrid engine instead of a lakehouse or distributed warehouse?
&lt;/h3&gt;

&lt;p&gt;A: Choose scale-up/hybrid when you need &lt;strong&gt;interactive latency&lt;/strong&gt;, frequent small updates, strong consistency, and simpler operations for GB–TB scale workloads. Distributed warehouses and lakehouses work better when you truly need massive multi-cluster concurrency or petabyte-scale patterns.  &lt;/p&gt;

</description>
      <category>database</category>
      <category>duckdb</category>
      <category>analytics</category>
      <category>data</category>
    </item>
    <item>
      <title>The WebMCP False Economy: Why We Don't Need Another Layer of Abstraction</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Tue, 17 Feb 2026 18:52:51 +0000</pubDate>
      <link>https://forem.com/manveer_chawla_64a7283d5a/the-webmcp-false-economy-why-we-dont-need-another-layer-of-abstraction-566e</link>
      <guid>https://forem.com/manveer_chawla_64a7283d5a/the-webmcp-false-economy-why-we-dont-need-another-layer-of-abstraction-566e</guid>
      <description>&lt;p&gt;I agents are going to consume the web at orders of magnitude beyond human traffic. Optimizing for them isn't optional. The question is how.&lt;/p&gt;

&lt;p&gt;WebMCP, a new JavaScript API proposed by engineers at Microsoft and Google, says the answer is a browser-side protocol: every web developer builds a "tool contract" that describes their site to agents through &lt;code&gt;navigator.modelContext&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That's the wrong layer. Sites willing to invest in agent optimization already have a better path: server-side MCP, where the agent talks directly to the server and the server owns the tools it exposes. No browser middleman. For the vast majority of sites that won't build any agent interface, the browser should do the work, synthesizing what it already knows from HTML, ARIA, Schema.org, and the Accessibility Tree into a richer machine-readable layer.&lt;/p&gt;

&lt;p&gt;WebMCP sits in the worst of both worlds. It demands developer effort like server-side MCP but routes through the browser unnecessarily. And it asks the long tail of the web to adopt a new protocol, which 20 years of metadata history says they won't.&lt;/p&gt;




&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Optimizing the web for AI agents is the right call. The question is the right architecture for doing it.&lt;/li&gt;
&lt;li&gt;Sites willing to invest in agent optimization should expose server-side MCP directly. The server owns the tools. The agent talks to the source of truth. No browser middleman required.&lt;/li&gt;
&lt;li&gt;For the web that won't adopt new protocols (which is most of it), the browser should bridge the gap by synthesizing what it already knows: HTML, ARIA, Schema, the Accessibility Tree.&lt;/li&gt;
&lt;li&gt;WebMCP occupies the worst of both worlds: it demands developer effort like server-side MCP but routes through the browser, creating a second-class copy that drifts from the UI.&lt;/li&gt;
&lt;li&gt;History is clear. Developer-maintained metadata standards fail without direct incentives. Sites willing to invest should go server-side. Sites that won't are better served by browser improvements.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What is WebMCP?
&lt;/h2&gt;

&lt;p&gt;In August 2025, engineers from Microsoft and Google proposed WebMCP (Web Model Context Protocol), a JavaScript API that exposes a new browser interface, &lt;code&gt;navigator.modelContext&lt;/code&gt;, allowing websites to declare structured "tool contracts" for AI agents. It's currently available behind a flag in Chrome 146 Canary.&lt;/p&gt;

&lt;p&gt;The idea is straightforward. Instead of an AI agent visually parsing a webpage the way a human would, the site explicitly tells the agent what actions are available and how to execute them. That includes form submissions, API calls, navigation flows, and data queries. The agent consumes a structured menu rather than interpreting pixels and DOM elements.&lt;/p&gt;

&lt;p&gt;Early pilots report significant performance gains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;67.6% reduction&lt;/strong&gt; in token usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;25–37% improvement&lt;/strong&gt; in latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;97.9% task success rate&lt;/strong&gt;, specifically reducing cases where vision-agents "give up" or loop on incorrect elements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These numbers are real and impressive, but the context for &lt;em&gt;why&lt;/em&gt; WebMCP exists in this form reveals the core design error.&lt;/p&gt;

&lt;h3&gt;
  
  
  The origin: MCP worked server-side, so let's port it to the browser
&lt;/h3&gt;

&lt;p&gt;MCP, the Model Context Protocol, gained massive traction in 2025 as a way to give AI agents structured access to tools and data on the server side. Connect an agent to your database, your CRM, or your internal APIs through a standardized protocol.&lt;/p&gt;

&lt;p&gt;It works in that context because the server &lt;em&gt;owns&lt;/em&gt; the tools it exposes. A Postgres MCP server knows its own schema. A Stripe MCP server knows its own API. The tool contract and the tool are the same thing, maintained by the same team, in the same codebase.&lt;/p&gt;

&lt;p&gt;WebMCP takes that pattern and ports it to the browser, and this is where the logic breaks down.&lt;/p&gt;

&lt;p&gt;The browser is a fundamentally different environment. A website doesn't "own" its relationship with every possible AI agent the way a server owns its API. The server-side MCP contract is a first-class interface that &lt;em&gt;is&lt;/em&gt; the product. A WebMCP contract is a second-class annotation that &lt;em&gt;describes&lt;/em&gt; the product. One is the source of truth. The other is a copy that drifts.&lt;/p&gt;

&lt;p&gt;This raises a question that WebMCP's proponents haven't answered: if a site is willing to invest the engineering effort that a tool contract demands, why route that effort through the browser? Server-side MCP already exists. It already works. The agent talks directly to the server. The server owns the tools. The contract and the tool are the same thing. WebMCP takes that clean architecture and degrades it by pushing it into the browser, turning a first-class API into a second-class annotation that describes a UI rather than owning the functionality.&lt;/p&gt;

&lt;p&gt;The question isn't whether WebMCP &lt;em&gt;works&lt;/em&gt;. The early benchmarks show it does. The question is whether it points in the right direction when better options exist on both ends of the spectrum.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Three paths to an agent-readable web, and why WebMCP is the worst of them
&lt;/h2&gt;

&lt;p&gt;Three paths exist for making the web work for AI agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path 1: Server-side MCP.&lt;/strong&gt; Sites that want AI agents to interact with them expose server-side MCP endpoints. The agent talks directly to the server. The server owns the tools it exposes. The tool contract and the tool are the same thing, maintained by the same team, in the same codebase. This is what MCP was designed for, and it works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path 2: Browser-as-bridge.&lt;/strong&gt; The browser synthesizes what it already knows (HTML structure, ARIA semantics, Schema.org data, form labels, link relationships) into a richer machine-readable layer. Developers standardize to existing web standards. No new protocol required. Ship once in a browser update, apply everywhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path 3: WebMCP.&lt;/strong&gt; Every website developer builds and maintains a browser-side tool contract that describes their site to AI agents. The browser is a passive pipe.&lt;/p&gt;

&lt;p&gt;WebMCP is Path 3, and it occupies the worst position of the three.&lt;/p&gt;

&lt;p&gt;Path 1 works for sites willing to invest because the server owns the interface. The agent gets direct access to the source of truth: the API, the database, the business logic. Path 2 works for the rest because the browser does the work. The entire history of the web favors this pattern. CSS didn't ask every site to declare a rendering contract. Search engines didn't ask every site to build a search index. Crawlers learned to read pages. Make the reader smarter, don't tax the author.&lt;/p&gt;

&lt;p&gt;Path 3 demands the same developer effort as Path 1 but delivers a degraded version of it. A WebMCP tool contract is a copy of functionality that already lives on the server. It routes through the browser for no clear architectural reason. And unlike server-side MCP, the contract isn't the source of truth. It's an annotation that drifts the moment the UI changes.&lt;/p&gt;

&lt;p&gt;The question any engineering leader should ask: if I'm going to invest in making my site agent-readable, why would I build that interface in the browser instead of on the server where I control the tools, the data, and the API? And if I'm not going to invest at all, how does a new protocol that requires my investment help me?&lt;/p&gt;

&lt;p&gt;The strongest counterargument is that WebMCP captures &lt;em&gt;intent&lt;/em&gt;, not just structure. The AX Tree tells an agent "here is a button labeled Submit." A WebMCP tool contract tells the agent "this button submits a flight booking after the user selects dates and passengers, and here are the valid parameter ranges." That distinction is real, and for complex, multi-step workflows it matters. But intent is exactly what server-side MCP provides natively, without the browser middleman, without the drift problem, and with full access to the backend logic that defines that intent. For simpler interactions, properly labeled structure already communicates intent. A form with inputs labeled "Email" and "Password" and a submit button doesn't need a separate declaration to tell an agent it's a login flow. A product page with a price, an "Add to Cart" button, and a quantity selector is self-describing if the HTML is semantic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdlgmxw7qdg2n7t3j9bb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkdlgmxw7qdg2n7t3j9bb.jpg" alt="A comparison diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2. The hidden maintenance cost of WebMCP tool contracts
&lt;/h2&gt;

&lt;p&gt;Even if the browser-side approach were the right architecture, the maintenance economics don't work.&lt;/p&gt;

&lt;p&gt;The web is already designed to be machine-readable through the DOM, Semantic HTML, ARIA attributes, and Schema.org. WebMCP asks developers to maintain two parallel interfaces: one visual (the UI) and one declarative (the tool contract).&lt;/p&gt;

&lt;p&gt;When a UI ships a new flow and the tool contract isn't updated, the agent breaks. You don't eliminate fragility, you double it. No build step catches the drift. No CI check flags the mismatch.&lt;/p&gt;

&lt;p&gt;Stripe manages over 100 breaking API upgrades using a custom domain-specific language (DSL) to auto-generate documentation directly from code. If a company that literally sells API infrastructure requires heavy automation to prevent metadata rot, the average startup has no realistic chance of keeping WebMCP tool definitions accurate.&lt;/p&gt;

&lt;p&gt;Proponents will argue that auto-generation solves this. For sites built on modern frameworks like React, Next.js, or Angular, that's a fair point. A build plugin could derive tool contracts from component trees and route definitions. But the long tail of the web doesn't run on these frameworks. Millions of sites are built on WordPress themes, hand-written HTML, Squarespace templates, or legacy CMSes that no auto-generation tool will ever reach. The sites that most need agent-readability are the ones least equipped to produce it through tooling.&lt;/p&gt;

&lt;p&gt;ARIA's track record is the warning sign here. Annual surveys from WebAIM found that &lt;a href="https://webaim.org/projects/million/" rel="noopener noreferrer"&gt;pages using ARIA attributes actually average 57 accessibility errors&lt;/a&gt;, compared to 27 errors on pages without ARIA. That's not because ARIA causes errors. It's because even well-intentioned metadata efforts produce poor results at web scale when developers lack the tooling, training, and incentives to maintain them correctly. ARIA failed as a quality signal despite two decades of advocacy, documentation, and browser support. WebMCP would enter the same environment with the same structural disadvantages and fewer resources behind it.&lt;/p&gt;

&lt;p&gt;Metadata decays the moment no one actively monitors it. A &lt;a href="https://therecord.media/thousands-of-npm-accounts-use-email-addresses-with-expired-domains/" rel="noopener noreferrer"&gt;study of the npm ecosystem found 2,818 maintainer email addresses linked to expired domains&lt;/a&gt;. Unlike a broken email, a stale WebMCP contract fails silently. An agent executes an outdated action and neither the user nor the developer knows until something breaks downstream.&lt;/p&gt;

&lt;p&gt;Research shows that a single breaking change in an API affects an average of 4.7 downstream consumers, yet WebMCP tool contracts would sit in a dependency chain with even less visibility.&lt;/p&gt;

&lt;p&gt;There's a security dimension to this maintenance problem that's easy to overlook. A WebMCP tool contract is effectively API documentation served to untrusted clients. It tells every visiting agent what actions are available, what parameters they accept, and what state transitions are valid. That's a map of your application's attack surface. A stale contract could expose deprecated endpoints that should have been decommissioned. A compromised contract could redirect agents to perform unintended actions on behalf of users. The AX Tree avoids this because it's generated by the browser from the live DOM, not authored as a separate artifact that can be tampered with or fall out of sync.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. The incentive to optimize for agents is real. WebMCP is the wrong form.
&lt;/h2&gt;

&lt;p&gt;If AI agents will consume the web at 100x human traffic, optimizing for them is the right investment. That case is unambiguous. The question this article's own logic demands is: what form should that optimization take?&lt;/p&gt;

&lt;p&gt;The history of web metadata adoption is instructive, not as evidence that developers won't optimize, but as evidence of &lt;em&gt;how&lt;/em&gt; they optimize when they do.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Standard&lt;/th&gt;
&lt;th&gt;Current Adoption (2026)&lt;/th&gt;
&lt;th&gt;What Actually Drove It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Microformats (2005)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~0.5%&lt;/td&gt;
&lt;td&gt;No incentive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RDFa (2008)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~39%&lt;/td&gt;
&lt;td&gt;Open Graph Protocol (Social Cards)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Microdata (2011)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~23%&lt;/td&gt;
&lt;td&gt;Google SEO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON-LD (2011)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~53%&lt;/td&gt;
&lt;td&gt;Google Rich Snippets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open Graph (2010)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~70%&lt;/td&gt;
&lt;td&gt;Social Media Cards&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;JSON-LD and Open Graph won because developers got an immediate, visible reward: rich snippets in search and rich cards on social. Microformats were technically sound and universally ignored.&lt;/p&gt;

&lt;p&gt;But even the winners show a pattern: developers implement the minimum viable version. Analysis of Schema.org usage shows that 61.99% of websites using product schema only populate the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; fields, the exact two fields Google rewards with rich snippets. Developers ignore the remaining 26 properties. Classic Goodhart's Law: when a measure becomes a target, it ceases to be a good measure.&lt;/p&gt;

&lt;p&gt;So what's the minimum viable agent optimization? For sites willing to invest meaningfully, server-side MCP is the natural path. It builds on infrastructure they already maintain (APIs, databases, backend logic) and gives agents direct access to the source of truth. For sites that will only do the minimum, better HTML, proper ARIA, and Schema.org markup are the investments that also pay dividends in SEO and accessibility. WebMCP asks for meaningful effort but delivers a degraded version of what server-side MCP already provides. It sits in the gap between "willing to invest" and "won't invest," and history says that gap is empty.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. How the Accessibility Tree already serves AI agents
&lt;/h2&gt;

&lt;p&gt;WebMCP rests on an assumption that AI agents require a fundamentally different interface than humans. That assumption is mostly wrong.&lt;/p&gt;

&lt;p&gt;The browser already generates a machine-readable model of every page through the Accessibility Tree (AX Tree). This tree provides roles, names, states, and interaction patterns. Agents already use it through tools like Playwright and Puppeteer, which expose AX Tree snapshots for automation.&lt;/p&gt;
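
&lt;p&gt;Flattening such a snapshot into the compact list an agent acts on takes only a few lines. A sketch in Python (the nested role/name/children dicts mirror the shape of accessibility snapshots from tools like Playwright; the sample page is invented):&lt;/p&gt;

```python
# Walk an accessibility-tree snapshot and collect the interactive nodes
# an agent would act on. The sample snapshot below is invented.
INTERACTIVE_ROLES = {"button", "link", "textbox", "combobox", "checkbox"}

def interactive_nodes(node):
    found = []
    if node.get("role") in INTERACTIVE_ROLES:
        found.append((node["role"], node.get("name", "")))
    for child in node.get("children", []):
        found.extend(interactive_nodes(child))
    return found

snapshot = {
    "role": "WebArea", "name": "Checkout",
    "children": [
        {"role": "heading", "name": "Shipping"},
        {"role": "textbox", "name": "Address"},
        {"role": "button", "name": "Continue"},
    ],
}

print(interactive_nodes(snapshot))
# prints [('textbox', 'Address'), ('button', 'Continue')]
```

A real page yields the same kind of compact (role, name) list, which is what makes the AX Tree cheap for agents to consume.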

&lt;p&gt;There's also a trajectory question worth acknowledging. Multimodal models are getting better at understanding web pages visually with every generation. GPT-4o, Claude, and Gemini can already navigate many sites through screenshots alone. If that trajectory continues, the need for any structured interface, whether WebMCP or the AX Tree, diminishes over time. But structured interfaces still matter for reliability (vision-based agents hallucinate element locations), determinism (the same AX Tree input produces the same agent behavior), and cost efficiency (parsing a structured tree is orders of magnitude cheaper than processing screenshots). The difference is that the AX Tree is already there. It costs nothing to maintain because the browser generates it automatically. WebMCP requires active, ongoing investment in something that improving models may eventually render unnecessary. If you're going to bet on a structured layer, bet on the one that's free.&lt;/p&gt;

&lt;p&gt;There is a real gap. Agents need to &lt;em&gt;act&lt;/em&gt; across multi-step flows (checkout, configuration, data entry) in ways that go beyond what a screen reader typically handles. But that gap is a browser API problem, not a developer metadata problem. The solution is making the AX Tree richer and more actionable, not building a parallel system alongside it.&lt;/p&gt;

&lt;p&gt;While 80.5% of web pages already use ARIA landmarks for structure, &lt;a href="https://webaim.org/projects/million/" rel="noopener noreferrer"&gt;94.8% fail basic WCAG compliance&lt;/a&gt;. The first machine-readable layer is broken. Adding a second one on top doesn't fix the first, and it risks giving organizations an excuse to deprioritize it.&lt;/p&gt;

&lt;p&gt;Consider a company with budget for one accessibility initiative this quarter. They can fix their broken HTML and ARIA, which helps disabled users, mobile users, keyboard navigators, search engines, &lt;em&gt;and&lt;/em&gt; agents. Or they can build a WebMCP contract that only helps AI agents. Not every organization will make the wrong choice here, but when budgets are tight and AI is the shiny priority, the risk of crowding out accessibility work is real.&lt;/p&gt;

&lt;p&gt;Investment in accessibility benefits everyone simultaneously. WebMCP creates a second surface competing for the same engineering hours, and that surface will rot faster than the first because it lacks the legal and compliance pressure that at least partially drives accessibility work.&lt;/p&gt;

&lt;p&gt;Proponents point to real gaps in the AX Tree: Shadow DOM encapsulation, Canvas structure, and virtualized lists. These are legitimate, but they're platform-level issues with platform-level fixes already in progress. None of them require a new developer-maintained metadata layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Browser APIs that solve WebMCP's problem without developer overhead
&lt;/h2&gt;

&lt;p&gt;The browser is the bottleneck for AI agent interaction, not the website.&lt;/p&gt;

&lt;p&gt;The obvious question: if the browser can solve this, why hasn't it? The honest answer is that until 2024, there was no demand for browser-level agent interfaces because AI agents weren't capable enough to use them. GPT-4V shipped in late 2023. Claude's computer use arrived in 2024. The first wave of production browser agents hit the market in 2025. Browser vendors are responding to a problem that barely existed two years ago, and platform-level standards move on multi-year timelines by design. That's not a reason to route around them with a developer-maintained shortcut. It's a reason to invest in the right layer now so the fix is durable.&lt;/p&gt;

&lt;p&gt;Rather than asking every site on the internet to maintain a tool contract, the industry should make the browser better at reading what's already there. Several technologies already address the gaps WebMCP claims to solve, and they follow the browser-as-bridge path: ship once, apply everywhere. Some are shipping today. Others are in progress. None are vaporware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chrome DevTools Protocol (CDP) Accessibility Domain.&lt;/strong&gt; Already exposes the full AX Tree programmatically. CDP is production-ready and widely used by automation frameworks like Playwright and Puppeteer. Enriching this layer benefits every site without any developer action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WebDriver BiDi.&lt;/strong&gt; A W3C standard for cross-browser automation that introduces standardized accessibility locators. As of early 2026, WebDriver BiDi is shipping in Firefox, Chrome, and Edge, with Safari support in active development. Agents can find elements by ARIA role and name, building on existing semantics rather than inventing new ones.&lt;/p&gt;
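
&lt;p&gt;An accessibility locator is just structured JSON on the BiDi WebSocket. A sketch of the command shape (the context id and the target role/name are invented; the method and locator fields follow the W3C WebDriver BiDi specification):&lt;/p&gt;

```python
import json

# Shape of a WebDriver BiDi browsingContext.locateNodes command using an
# accessibility locator. The context id and target values are illustrative.
command = {
    "id": 1,
    "method": "browsingContext.locateNodes",
    "params": {
        "context": "example-context-id",
        "locator": {
            "type": "accessibility",
            "value": {"role": "button", "name": "Add to cart"},
        },
    },
}

# The command travels over the BiDi WebSocket as plain JSON.
print(json.dumps(command, indent=2))
```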

&lt;p&gt;&lt;strong&gt;Accessibility Object Model (AOM).&lt;/strong&gt; A WICG proposal that gives JavaScript direct access to modify the AX Tree. AOM has been in development since 2017, and parts of the spec (like &lt;code&gt;ElementInternals&lt;/code&gt; for custom elements) have already shipped. The core reflection API remains at the proposal stage. This is the weakest link in the alternative stack, and it's fair to note that AOM's full vision hasn't materialized in nearly a decade. But the pieces that have shipped are already solving real problems, and the trajectory is toward completion rather than abandonment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ElementInternals.&lt;/strong&gt; Supported in Chrome, Edge, Firefox, and Safari as of 2024. It lets custom elements (Web Components) participate in the AX Tree natively, solving the Shadow DOM encapsulation problem without any new protocol. This is not a proposal. It's in production browsers today.&lt;/p&gt;

&lt;p&gt;These tools improve the browser's ability to read what already exists. The timeline gap is real, and WebMCP's proponents are right that the browser layer isn't complete today. But the correct response to an incomplete platform is to accelerate the platform, not to build a parallel system that creates permanent maintenance obligations for every site on the internet. WebMCP creates a parallel artifact that's prone to drift. AOM and WebDriver BiDi make the source itself legible.&lt;/p&gt;

&lt;p&gt;Developers should invest their effort in standardizing to the existing web platform: proper semantic HTML, accurate ARIA attributes, and Schema.org markup. These pay dividends across accessibility and SEO today, and position sites to benefit from agent-readability improvements as browser APIs mature. Two outcomes now, a third compounding over time as AOM, WebDriver BiDi, and richer AX Tree APIs ship.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. How WebMCP risks fragmenting the open web
&lt;/h2&gt;

&lt;p&gt;Even if WebMCP were technically perfect, it creates structural problems for the web ecosystem.&lt;/p&gt;

&lt;p&gt;Google pushed AMP by giving it preferential placement in search carousels, effectively coercing adoption. Publishers eventually abandoned it, reporting significant revenue improvements after exiting. The parallel goes only so far: AMP was a replacement architecture that required rebuilding pages in a restricted HTML subset, while WebMCP is additive. You keep your existing site and layer a tool contract on top. But that "additive" framing is misleading. AMP's cost was front-loaded and visible. You knew what you were paying because you were rebuilding pages. WebMCP's cost is ongoing and invisible. The tool contract must stay in sync with every UI change indefinitely, and the failure mode is silent drift rather than an obvious breakage. Additive layers that go stale don't just stop helping. They become liabilities that misdirect agents and erode trust in the system.&lt;/p&gt;

&lt;p&gt;WebMCP is backed by Google and Microsoft but lacks formal support from Mozilla or Apple. If Safari and Firefox don't implement this API, agents will only work reliably in Chromium-based browsers. That's a Chromium feature, not an open web standard.&lt;/p&gt;

&lt;p&gt;There's also a concentration problem. WebMCP creates a two-tier system: sites that are "agent-accessible" and those that aren't. Large incumbents like Salesforce and Amazon can afford to maintain these contracts. The long tail of the web can't. Small businesses and independent publishers don't have the engineering resources.&lt;/p&gt;

&lt;p&gt;This concentration of AI-driven traffic among incumbents undermines the web's greatest strength: a solo developer and a trillion-dollar company play by the same HTML rules. WebMCP breaks that contract.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. The hardest cases for browser-as-bridge, and why server-side MCP still wins
&lt;/h2&gt;

&lt;p&gt;The WebMCP pilots do show real results. A 67.6% reduction in token usage directly translates to lower operational costs for agents. The 97.9% task success rate is compelling, especially in reducing those painful loops where vision-based agents get stuck on incorrect elements. These numbers deserve serious engagement, not dismissal.&lt;/p&gt;

&lt;p&gt;The scenarios where a declarative tool contract genuinely outperforms the AX Tree are specific and worth examining:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-step form wizards with conditional logic.&lt;/strong&gt; Think insurance claim filing: the fields on step 3 depend on what was selected in step 1, validation rules change based on claim type, and the agent needs to know that choosing "auto collision" unlocks a vehicle details panel while "property damage" unlocks a different set of fields entirely. The AX Tree sees each step as a flat collection of form controls. It doesn't encode the conditional relationships between them or the valid paths through the wizard.&lt;/p&gt;
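
&lt;p&gt;The missing piece is the branching itself, which the server already encodes. A toy sketch of that structure (claim types and field names are invented):&lt;/p&gt;

```python
# The conditional structure a flat AX Tree view can't encode: which fields
# step 3 exposes depends on the claim type chosen in step 1.
STEP3_FIELDS = {
    "auto_collision": ["vehicle_vin", "collision_date", "other_driver"],
    "property_damage": ["property_address", "damage_type", "repair_estimate"],
}

def fields_for_step3(claim_type):
    """Return the step-3 fields unlocked by the step-1 selection."""
    if claim_type not in STEP3_FIELDS:
        raise ValueError(f"unknown claim type: {claim_type}")
    return STEP3_FIELDS[claim_type]

print(fields_for_step3("auto_collision"))
# prints ['vehicle_vin', 'collision_date', 'other_driver']
```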

&lt;p&gt;&lt;strong&gt;Dashboard configurations with interdependent controls.&lt;/strong&gt; A Salesforce report builder where changing the date range filter alters which metric columns are available, or a BI tool where selecting a data source reconfigures the entire visualization panel. These interfaces have cascading dependencies that aren't visible in the DOM at any single point in time. An agent reading the AX Tree sees the current state. It can't see the state machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complex data entry with cross-field validation.&lt;/strong&gt; ERP inventory management where a SKU entry triggers warehouse availability checks, quantity must fall within supplier-specific thresholds, and the "Submit" action is only valid when twelve interdependent fields pass validation. The AX Tree can surface that the submit button is disabled, but it can't explain &lt;em&gt;why&lt;/em&gt; or what the agent needs to fix.&lt;/p&gt;
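
&lt;p&gt;Concretely, the server can return the reasons, not just the disabled state. A toy sketch (field names and rules are invented):&lt;/p&gt;

```python
# Server-side validation that can explain *why* submission is blocked,
# which a disabled button in the AX Tree cannot.
def validate_order(order):
    reasons = []
    if not order.get("sku"):
        reasons.append("sku is required")
    qty = order.get("quantity", 0)
    lo, hi = order.get("supplier_limits", (1, 100))
    if qty not in range(lo, hi + 1):
        reasons.append(f"quantity must be between {lo} and {hi}")
    return reasons  # an empty list means the order is submittable

print(validate_order({"sku": "A-100", "quantity": 0}))
# prints ['quantity must be between 1 and 100']
```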

&lt;p&gt;These are the hardest cases for the browser-as-bridge path, and they're real. A declarative contract genuinely reduces the agent's guesswork in each one. But every one of these scenarios is better served by server-side MCP than by WebMCP. A Salesforce admin panel already has APIs. An ERP system already has backend logic that defines valid state transitions. An insurance claim workflow already has server-side validation rules. The agent doesn't need to read a browser-side annotation of these systems. It can talk to the systems directly.&lt;/p&gt;

&lt;p&gt;Server-side MCP gives the agent the source of truth: the actual business logic, the actual validation rules, the actual state machine. WebMCP gives the agent a copy of those things, authored separately, maintained separately, and prone to drifting from the reality it describes. The investment in agent optimization makes sense for these enterprise tools. But that investment should go into server-side MCP where the contract and the tool are the same thing, not into a browser-side annotation that duplicates what the server already knows.&lt;/p&gt;
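
&lt;p&gt;The shape of that server-side contract is worth seeing. A sketch of a tool definition in the form the MCP specification uses (a name, a description, and a JSON Schema input); the tool itself is invented:&lt;/p&gt;

```python
# A server-side tool definition in the shape MCP uses. The claim-filing
# tool and its fields are invented for illustration.
submit_claim_tool = {
    "name": "submit_claim",
    "description": "File an insurance claim and return its tracking id.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "claim_type": {"type": "string",
                           "enum": ["auto_collision", "property_damage"]},
            "amount": {"type": "number"},
        },
        "required": ["claim_type", "amount"],
    },
}

# Because the server authors this alongside its own validation logic,
# the contract and the implementation cannot silently drift apart.
print(submit_claim_tool["name"])
```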

&lt;p&gt;The benchmarks reinforce this. The 67.6% token reduction is measured against raw scraping: agents parsing full DOM dumps or processing screenshots pixel by pixel. That's the worst-case baseline. An AX Tree snapshot from Playwright or Puppeteer already strips away the visual noise and gives the agent a compact, structured tree of roles, names, states, and interaction patterns. That's orders of magnitude smaller than a screenshot and significantly smaller than a raw DOM dump. The token savings from moving to structured data are real, but browser-as-bridge already delivers most of them without any developer effort. Server-side MCP would be the most token-efficient of all, since the agent gets direct API responses with only the data it needs and zero browser overhead. The fair comparisons, "WebMCP vs. well-implemented AX Tree" and "WebMCP vs. server-side MCP," haven't been published. Until they are, the 67.6% figure overstates the marginal benefit over both alternatives.&lt;/p&gt;

&lt;p&gt;WebMCP's own specification lists autonomous headless scenarios as a "non-goal," focusing instead on human-in-the-loop workflows. The spec describes a narrow tool for high-complexity enterprise UIs. The question is whether a narrow tool should ship as a browser-level API that the entire web is expected to implement, especially when the narrow use cases it targets are better served by a protocol that already exists on the server side.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Why robots.txt and Open Graph succeeded where WebMCP won't
&lt;/h2&gt;

&lt;p&gt;Successful opt-in standards share simplicity, an immediate visible reward, and a negligible maintenance burden.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;robots.txt.&lt;/strong&gt; A plain text file that solves the developer's own problem (server overload from crawlers) with zero ongoing maintenance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sitemaps.&lt;/strong&gt; A direct channel to search engines that results in better indexing and more traffic, with the reward visible in Google Search Console within days.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open Graph Protocol.&lt;/strong&gt; An instant visual reward where a developer pastes their link into Slack or Twitter and immediately sees the rich card.&lt;/p&gt;

&lt;p&gt;WebMCP fails on all three counts. It's not simple because tool contracts require ongoing curation as UIs evolve. It offers no visible reward for the developer, since there's no "Rich Snippet for agents." And it carries a heavy maintenance burden where the contract must stay in sync with the UI or become a liability.&lt;/p&gt;

&lt;p&gt;Without that incentive loop, adoption will be a fraction of what proponents project. We have 20 years of data on this.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion: Two worlds, two solutions, neither of which is WebMCP
&lt;/h2&gt;

&lt;p&gt;The web that AI agents need to navigate is splitting into two worlds, and each has a clear path forward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first world is sites willing to invest in agent optimization.&lt;/strong&gt; SaaS platforms, enterprise tools, API-first businesses. These sites should expose server-side MCP directly. The agent talks to the server. The server owns the tools. The contract is the source of truth. This is the architecture MCP was built for, and it works without a browser in the loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The second world is everything else.&lt;/strong&gt; The long tail of the web: blogs, small businesses, news sites, personal pages, legacy applications. These sites won't build any agent interface, and history says no amount of advocacy will change that. For this world, the browser should bridge the gap by getting smarter about what it already knows. AOM, WebDriver BiDi, ElementInternals, and a richer AX Tree are the path. Marginal improvements in how browsers expose semantic structure compound across every site simultaneously. A 10% improvement in AX Tree fidelity benefits the entire web overnight. A 10% increase in WebMCP adoption covers a few thousand more sites and leaves the rest untouched.&lt;/p&gt;

&lt;p&gt;WebMCP sits between these two worlds and serves neither well. It demands the investment of the first world but delivers a degraded copy of what server-side MCP provides. It claims to serve the second world but requires exactly the kind of adoption that the second world has never delivered for any metadata standard in 20 years.&lt;/p&gt;

&lt;p&gt;Every engineering leader should be asking two questions. First: have we gotten our existing HTML, ARIA, and Schema right? For most organizations, the answer is no, and fixing that yields immediate returns in accessibility, SEO, and agent-readability as browser APIs mature. Second: if we're ready to invest beyond that, should we build our agent interface on the server where we own the tools, or in the browser where it becomes a copy? The answer writes itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why not just use WebMCP as an interim solution while browser APIs catch up?
&lt;/h3&gt;

&lt;p&gt;Because interim solutions that require per-site investment become permanent obligations. Every tool contract built today must be maintained indefinitely or it becomes a liability that misdirects agents. Server-side MCP is the better interim investment for sites willing to build: it works today, it's the source of truth, and it doesn't depend on browser vendors shipping a new API.&lt;/p&gt;

&lt;h3&gt;
  
  
  If AI agents will dominate web traffic, shouldn't sites optimize for them?
&lt;/h3&gt;

&lt;p&gt;Absolutely. The argument isn't against optimizing. It's about the right form. Sites ready for meaningful investment should expose server-side MCP. Sites doing the minimum should write better HTML, ARIA, and Schema.org, which improves SEO and accessibility at the same time. WebMCP demands meaningful effort but delivers less than server-side MCP.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about enterprise tools like Salesforce or internal dashboards?
&lt;/h3&gt;

&lt;p&gt;These are the strongest use cases for declarative agent contracts, but they're also the cases where server-side MCP works best. A Salesforce admin panel already has APIs and backend logic. The agent should talk directly to those systems rather than reading a browser-side annotation of them.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>webmcp</category>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>How to sandbox AI agents in 2026: Firecracker, gVisor, runtimes &amp; isolation strategies</title>
      <dc:creator>Manveer Chawla</dc:creator>
      <pubDate>Tue, 17 Feb 2026 18:38:05 +0000</pubDate>
      <link>https://forem.com/manveer_chawla_64a7283d5a/how-to-sandbox-ai-agents-in-2026-firecracker-gvisor-runtimes-isolation-strategies-14pk</link>
      <guid>https://forem.com/manveer_chawla_64a7283d5a/how-to-sandbox-ai-agents-in-2026-firecracker-gvisor-runtimes-isolation-strategies-14pk</guid>
      <description>&lt;h2&gt;
  
  
  Executive summary: AI agent sandboxing in 2026
&lt;/h2&gt;

&lt;p&gt;As of February 2026, the consensus is clear: shared-kernel container isolation (Docker/runc) isn't cutting it anymore for executing untrusted AI agent code. You need to treat LLM-generated or user-supplied code as hostile. A shared kernel just expands the blast radius.&lt;/p&gt;

&lt;p&gt;The market has split into three layers, plus a Kubernetes-native hybrid:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primitives (&lt;a href="http://firecracker-microvm.github.io" rel="noopener noreferrer"&gt;Firecracker&lt;/a&gt;/&lt;a href="https://gvisor.dev/" rel="noopener noreferrer"&gt;gVisor&lt;/a&gt;/&lt;a href="https://github.com/microsoft/litebox" rel="noopener noreferrer"&gt;LiteBox&lt;/a&gt;):&lt;/strong&gt; Best for teams willing to run their own fleet and scheduler for maximum control.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embeddable runtimes (&lt;a href="https://e2b.dev/" rel="noopener noreferrer"&gt;E2B&lt;/a&gt;, microsandbox):&lt;/strong&gt; Best for quickly adding code execution — managed API (E2B) or self-hosted (microsandbox).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed platforms (&lt;a href="https://www.daytona.io/" rel="noopener noreferrer"&gt;Daytona&lt;/a&gt;, &lt;a href="https://modal.com/products/sandboxes" rel="noopener noreferrer"&gt;Modal&lt;/a&gt;, &lt;a href="https://northflank.com/product/sandboxes" rel="noopener noreferrer"&gt;Northflank&lt;/a&gt;):&lt;/strong&gt; Best for data-heavy workloads, GPU access, or zero-ops scaling — but each with different isolation, pricing, and lock-in tradeoffs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid (&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/how-to/agent-sandbox" rel="noopener noreferrer"&gt;Google Agent Sandbox&lt;/a&gt;):&lt;/strong&gt; Best for teams already on Kubernetes who want open-source sandboxing with warm pools and no new vendor dependency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pick Layer 1 when you need maximum control and customization for compliance. Pick Layer 2 when you want the fastest path to ephemeral code execution with strong isolation. Pick Layer 3 when you need GPUs, data-local execution, or zero-ops scaling — but evaluate vendor lock-in, language constraints, and BYOC support carefully.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AI agent sandboxing changed in 2026
&lt;/h2&gt;

&lt;p&gt;Look, engineering leaders in 2026 have too many sandboxing choices. And honestly, most of those choices didn't exist three years ago.&lt;/p&gt;

&lt;p&gt;We've moved way past the "Containers vs. VMs" debate. Now you're staring at Firecracker MicroVMs, gVisor user-space kernels, Cloud Hypervisor, WebAssembly isolates, and emerging Library OS tech like Microsoft's LiteBox. It's kind of overwhelming.&lt;/p&gt;

&lt;p&gt;But this isn't just vendors making noise. This proliferation is the industry's response to a real problem: standard multi-tenant containers can't safely contain AI agents executing arbitrary code.&lt;/p&gt;

&lt;p&gt;Think about it. When an agent can write its own Python scripts, install packages, and manipulate file descriptors, the shared kernel surface area of a standard Docker container becomes a liability. Major cloud providers, including AWS, Azure, and GCP, have all quietly migrated their control planes away from &lt;a href="https://www.sentinelone.com/vulnerability-database/cve-2024-21626/" rel="noopener noreferrer"&gt;runc&lt;/a&gt; toward &lt;a href="https://docs.aws.amazon.com/pdfs/whitepapers/latest/security-overview-aws-lambda/security-overview-aws-lambda.pdf" rel="noopener noreferrer"&gt;hardware-enforced isolation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This guide maps the 2026 sandbox ecosystem structurally. We're not comparing tools in isolation. Instead, we're defining the architectural layers. If you're a Series A+ engineering leader who's outgrown "Docker on EC2" and needs a security posture that survives a red team audit without blowing your engineering budget, keep reading.&lt;/p&gt;




&lt;h2&gt;
  
  
  The isolation spectrum: five levels of sandbox security
&lt;/h2&gt;

&lt;p&gt;Before choosing a tool, understand the five isolation levels available in 2026. Each step up trades performance overhead for a stronger security boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 1: Containers (Docker, Podman)&lt;/strong&gt; Processes share the host kernel, separated by Linux namespaces and cgroups. Fast and lightweight, but a kernel vulnerability in one container can compromise all others. Sufficient for trusted, internally written code. Insufficient for anything an LLM generates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 2: User-space kernels (gVisor)&lt;/strong&gt; A user-space application intercepts and re-implements syscalls, so the sandboxed program never talks to the real kernel. Stronger than containers, less overhead than a full VM. Used by Google (Agent Sandbox on GKE) and Modal. Tradeoff: not all syscalls are perfectly emulated, which can cause compatibility issues with some Linux software.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 3: Micro-VMs (Firecracker, Kata Containers, libkrun)&lt;/strong&gt; Each workload gets its own kernel running on hardware virtualization (KVM). A kernel exploit inside one VM cannot reach the host or other VMs. This is the current gold standard for untrusted code. Firecracker boots in ~125ms with ~5MB memory overhead. Powers AWS Lambda, E2B, and Vercel Sandbox.&lt;/p&gt;
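
&lt;p&gt;"Running your own Firecracker" means driving the VMM's REST API over a unix socket. A minimal sketch of the configuration payloads (the paths and sizes are placeholders; the endpoint and field names follow Firecracker's published API):&lt;/p&gt;

```python
import json

# The REST payloads a Firecracker launcher PUTs to the VMM's unix-socket
# API before starting a microVM. Paths and sizes are illustrative.
machine_config = {"vcpu_count": 1, "mem_size_mib": 128}

boot_source = {
    "kernel_image_path": "/var/lib/fc/vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1",
}

rootfs = {
    "drive_id": "rootfs",
    "path_on_host": "/var/lib/fc/rootfs.ext4",
    "is_root_device": True,
    "is_read_only": False,
}

start = {"action_type": "InstanceStart"}

# PUT /machine-config, PUT /boot-source, PUT /drives/rootfs, then
# PUT /actions. Each request body is just JSON like the dicts above.
for name, body in [("machine-config", machine_config),
                   ("boot-source", boot_source),
                   ("drives/rootfs", rootfs),
                   ("actions", start)]:
    print(f"PUT /{name}: {json.dumps(body)}")
```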

&lt;p&gt;&lt;strong&gt;Level 4: Library OS (Microsoft LiteBox)&lt;/strong&gt; Instead of filtering hundreds of syscalls, the application links directly against a minimal OS library that exposes only a handful of controlled primitives. Theoretically the thinnest isolation layer with the smallest attack surface. Experimental as of February 2026 — no SDK, no production usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 5: Confidential computing (AMD SEV-SNP, Intel TDX, OP-TEE)&lt;/strong&gt; Hardware-encrypted memory isolation. Even the host OS and hypervisor cannot read the sandbox's data. LiteBox is currently the only open-source tool in this comparison with a confidential computing runner (SEV-SNP). Relevant for regulated industries handling PII, financial data, or healthcare records.&lt;/p&gt;

&lt;p&gt;The signal from the hyperscalers is unambiguous. AWS built Firecracker for Lambda. Google built gVisor for Search and Gmail. Azure uses Hyper-V for ephemeral agent sandboxes. Every one of them reached for their strongest isolation primitive and pointed it at AI. None of them reached for containers.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to choose an AI agent sandboxing approach: four questions
&lt;/h2&gt;

&lt;p&gt;Before you even look at Firecracker or Modal, you need to understand where your workload fits. The "right" tool depends entirely on your constraints around trust, latency, data, and compliance.&lt;/p&gt;

&lt;h3&gt;
  
  
  How untrusted is the agent code you run?
&lt;/h3&gt;

&lt;p&gt;Security in 2026 isn't binary. It's a spectrum.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Internal logic:&lt;/strong&gt; Running code your own engineers wrote that passed CI/CD? Standard containers (isolation Level 1) are probably fine.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM-generated code:&lt;/strong&gt; Your agents generate Python to solve math problems or format data? The risk goes up significantly. You need strong isolation, either gVisor or MicroVMs, to prevent accidental resource exhaustion or logic bombs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User-uploaded binaries/malicious agents:&lt;/strong&gt; Allowing users or autonomous agents to execute arbitrary binaries or install unvetted PyPI packages? Assume the code is hostile. You need the strictest isolation available: hardware virtualization via MicroVMs (Firecracker) or air-gapped primitives.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The higher the risk, the lower in the stack you may need to build to control the blast radius.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long do agent sessions need to run?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One-shot (inference/scripts):&lt;/strong&gt; Quick script to generate a chart or run inference? Cold start time is your primary metric. You need sub-second snapshot restoration.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-running (agents):&lt;/strong&gt; Agents maintaining state, "thinking" for minutes, or waiting for user input? Billing models become critical. Runtimes charging premium "per second" rates get expensive fast for sessions that idle. Managed platforms often provide better economics for duration. Building your own warm pools on primitives requires complex autoscaling logic to avoid paying for waste.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Do you have a data gravity problem?
&lt;/h3&gt;

&lt;p&gt;Teams overlook this one all the time.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small data payloads:&lt;/strong&gt; Sending a few kilobytes of JSON and receiving text? Embeddable Runtimes (Layer 2) work great.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large contexts/model weights:&lt;/strong&gt; Loading 20GB model weights or processing a 5GB CSV? You've got a data gravity problem. Moving gigabytes of data into a remote sandbox API for every request creates massive latency penalties and egress cost nightmares. You need a Platform (Layer 3) where compute moves to the data, or a custom Layer 1 solution co-located with your storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What compliance and security requirements do you have?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The enterprise question:&lt;/strong&gt; Selling to the Fortune 500? Need SOC 2 Type II or ISO 27001 certification immediately? Achieving those on a self-built "Primitive" stack takes 12 to 18 months of engineering effort and dedicated security personnel.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditability &amp;amp; data controls:&lt;/strong&gt; Need granular audit logs for every system call? Strict data residency controls (guaranteeing code executes only in Frankfurt)? Managed platforms usually offer these as standard SKUs. Replicating this visibility in a DIY Firecracker fleet means building a custom observability pipeline that can penetrate the VM boundary without breaking isolation.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The three-layer AI agent sandboxing stack (primitives, runtimes, platforms)
&lt;/h2&gt;

&lt;p&gt;Stop comparing Firecracker to Modal directly. They're different categories solving different problems. In 2026, the ecosystem forms a hierarchy of abstraction.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1: The primitives (the "raw materials").&lt;/strong&gt; Open-source virtualization technologies you run on your own metal or EC2 bare metal instances. You become the cloud provider.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Examples:&lt;/strong&gt; &lt;a href="https://github.com/firecracker-microvm/firecracker" rel="noopener noreferrer"&gt;AWS Firecracker&lt;/a&gt; (MicroVMs), gVisor (User-space kernel), Cloud Hypervisor, and the new Microsoft LiteBox (Library OS).
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Layer 2: The embeddable runtimes (the "APIs").&lt;/strong&gt; Middleware services that wrap primitives into a simple SDK. Sandboxing as a service for teams that need code execution without infrastructure management.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Examples:&lt;/strong&gt; E2B, specialized code interpreter APIs, microsandbox.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Layer 3: The managed platforms (the "cloud").&lt;/strong&gt; End-to-end serverless compute environments. They handle the primitives, orchestration, scheduling, and scaling. The sandbox is the environment, not just a feature.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Examples:&lt;/strong&gt; Modal, Northflank, and Daytona.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sandbox stack diagram: how the three layers work
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;(Imagine a pyramid structure)&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Top (layer 3 - platforms):&lt;/strong&gt; User submits code -&amp;gt; Platform handles Build, Schedule, Isolate, Scale. (e.g., Modal, Northflank, Daytona). Focus: Logic.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Middle (layer 2 - runtimes):&lt;/strong&gt; User calls API -&amp;gt; Runtime boots VM -&amp;gt; Executes -&amp;gt; Returns. (e.g., E2B). Focus: Integration.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bottom (layer 1 - primitives):&lt;/strong&gt; User configures Kernel -&amp;gt; Sets up TAP/TUN networking -&amp;gt; Manages RootFS -&amp;gt; Schedules VM. (e.g., Firecracker, LiteBox). Focus: Control.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Layer 1 (primitives): benefits, trade-offs, and hidden costs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layer 1 benefit: maximum isolation control
&lt;/h3&gt;

&lt;p&gt;Layer 1 is where infrastructure companies and massive enterprises live. If you go this route, you're building on &lt;strong&gt;AWS Firecracker&lt;/strong&gt;, &lt;strong&gt;gVisor&lt;/strong&gt;, or the experimental &lt;strong&gt;Microsoft LiteBox&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The promise? Absolute control. You define the guest kernel version. You control the network topology down to the byte. You can achieve the highest possible density by oversubscribing resources based on your specific workload patterns.&lt;/p&gt;

&lt;p&gt;For teams building a competitor to AWS Lambda or a specialized vertical cloud, this is the only viable layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1 trade-off: you must build and operate the platform
&lt;/h3&gt;

&lt;p&gt;But here's the thing: "using" Firecracker is a misnomer. You don't just use it. You wrap it, orchestrate it, and debug it.&lt;/p&gt;

&lt;p&gt;The operational reality of running primitives at scale reveals hidden engineering costs that can easily derail product roadmaps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Image management: preventing thundering herd pulls
&lt;/h3&gt;

&lt;p&gt;The hardest problem in sandboxing isn't virtualization. It's data movement.&lt;/p&gt;

&lt;p&gt;To achieve sub-second start times for AI agents, you can't run &lt;code&gt;docker pull&lt;/code&gt; inside a microVM. You need a sophisticated block-level caching strategy.&lt;/p&gt;

&lt;p&gt;When 1,000 agents start simultaneously (a "thundering herd"), asking your registry to serve 5GB container images to 1,000 nodes will capsize your network. You need lazy-loading technologies like &lt;a href="https://aws.amazon.com/about-aws/whats-new/2022/09/introducing-seekable-oci-lazy-loading-container-images/" rel="noopener noreferrer"&gt;SOCI (Seekable OCI)&lt;/a&gt; or &lt;strong&gt;eStargz&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Research shows that while SOCI can match standard startup times, unoptimized lazy loading can degrade startup performance badly: in one reported case, Airflow &lt;a href="https://engineering.grab.com/docker-lazy-loading" rel="noopener noreferrer"&gt;startup went from 5s to 25s&lt;/a&gt;. Building a global, high-throughput, content-addressable storage layer to feed your microVMs is a distributed systems challenge that rivals the sandbox itself.&lt;/p&gt;
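&lt;p&gt;The bandwidth math behind a thundering herd is worth sketching. The model below uses the scenario above (1,000 nodes, 5GB images) plus a hypothetical 100 Gbps of aggregate registry egress and an assumed 5% "hot" block fraction; every number is illustrative, not a benchmark:&lt;/p&gt;

```python
# Back-of-envelope model of a "thundering herd" image pull.
# All numbers are illustrative assumptions, not measurements.

NODES = 1_000
IMAGE_GB = 5
REGISTRY_GBPS = 100  # assumed aggregate registry egress, gigabits/s

def full_pull_seconds(nodes, image_gb, egress_gbps):
    """Time to serve every node the full image, saturating registry egress."""
    total_gigabits = nodes * image_gb * 8
    return total_gigabits / egress_gbps

def lazy_pull_seconds(nodes, image_gb, egress_gbps, hot_fraction=0.05):
    """Lazy loading (SOCI/eStargz style): only the blocks actually read
    at startup cross the network; hot_fraction is an assumed 5%."""
    return full_pull_seconds(nodes, image_gb * hot_fraction, egress_gbps)

if __name__ == "__main__":
    print(f"full pull: {full_pull_seconds(NODES, IMAGE_GB, REGISTRY_GBPS):.0f}s")
    print(f"lazy pull: {lazy_pull_seconds(NODES, IMAGE_GB, REGISTRY_GBPS):.0f}s")
```

&lt;p&gt;Under these assumptions, full pulls saturate the registry for minutes while lazy loading finishes in seconds, which is why the block-level caching strategy often matters more than the isolation primitive's boot time.&lt;/p&gt;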

&lt;h3&gt;
  
  
  Networking: TAP/TUN, CNI overhead, and startup latency
&lt;/h3&gt;

&lt;p&gt;Networking kills microVM projects. Quietly.&lt;/p&gt;

&lt;p&gt;Unlike Docker, which provides mature CNI plugins, Firecracker requires you to manually manage TAP interfaces, IP tables, and routing on the host.&lt;/p&gt;

&lt;p&gt;Recent research (IMC '24) shows that at high concurrency (around 400 parallel starts), setting up CNI plugins and virtual switches becomes the primary bottleneck. This overhead can &lt;a href="https://jhc.sjtu.edu.cn/~bjiang/papers/Liu_IMC2024_CNI.pdf" rel="noopener noreferrer"&gt;increase startup latency by as much as 263%&lt;/a&gt;, turning a 125ms VM boot into a multi-second delay.&lt;/p&gt;

&lt;p&gt;And debugging networking inside a "jailer" constrained environment? Notoriously difficult. Standard observability tools often fail to penetrate the VM boundary.&lt;/p&gt;
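&lt;p&gt;To make "manually manage TAP interfaces" concrete, here is roughly the per-VM host plumbing involved, sketched as command construction. The device names, subnet scheme, and uplink interface are illustrative; a real fleet also needs forward rules, MTU tuning, jailer integration, and teardown:&lt;/p&gt;

```python
# Sketch of the per-VM host networking that Firecracker leaves to you.
# Names and addressing are illustrative, not a production scheme.
import subprocess

def tap_setup_commands(vm_index: int, uplink: str = "eth0") -> list[list[str]]:
    """Build the ip/iptables commands to create a TAP device and NAT a
    single microVM. One sequence like this runs per VM, per host."""
    tap = f"tap{vm_index}"
    host_ip = f"172.16.{vm_index}.1"  # illustrative per-VM /30
    return [
        ["ip", "tuntap", "add", "dev", tap, "mode", "tap"],
        ["ip", "addr", "add", f"{host_ip}/30", "dev", tap],
        ["ip", "link", "set", tap, "up"],
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-o", uplink, "-j", "MASQUERADE"],
    ]

def apply_commands(commands):
    for cmd in commands:  # requires root; shown for completeness
        subprocess.run(cmd, check=True)
```

&lt;p&gt;Multiply this setup path by hundreds of concurrent boots and you get the CNI-style bottleneck the IMC '24 research describes.&lt;/p&gt;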

&lt;h3&gt;
  
  
  Warm pools: cold-start mitigation vs. idle cost
&lt;/h3&gt;

&lt;p&gt;Teams often maintain "warm pools" of pre-booted VMs to mitigate cold starts. This creates a complex economic problem.&lt;/p&gt;

&lt;p&gt;Keep 500 VMs warm but only use 100? You're burning cash on idle compute.&lt;/p&gt;

&lt;p&gt;Building a predictive autoscaler that spins up VMs &lt;em&gt;before&lt;/em&gt; a request hits, but not too many, is a serious data science challenge. In 2026, with GPU compute costs still high, the waste from inefficient warm pooling can easily exceed the markup charged by managed platforms.&lt;/p&gt;
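&lt;p&gt;A minimal version of the sizing problem can be sketched with a Poisson arrival model. The arrival rate, cold-start budget, and VM price below are invented for illustration; production systems need real traffic forecasting, not a single distributional assumption:&lt;/p&gt;

```python
# Toy warm-pool sizing: find the smallest pool where the chance of a
# cold start (arrivals exceeding warm VMs) stays under a target budget.
# Arrival model and prices are illustrative assumptions.
from math import exp

def poisson_cdf(k, lam):
    """P(X less-or-equal k) for X ~ Poisson(lam), computed iteratively."""
    term, total = exp(-lam), exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def warm_pool_size(expected_arrivals: float, cold_start_budget: float = 0.01) -> int:
    """Smallest pool size n with P(arrivals exceed n) within budget."""
    n = 0
    while 1 - poisson_cdf(n, expected_arrivals) > cold_start_budget:
        n += 1
    return n

def idle_cost_per_hour(pool, expected_arrivals, vm_hourly=0.05):
    """Expected spend on VMs kept warm beyond average demand."""
    return max(pool - expected_arrivals, 0) * vm_hourly
```

&lt;p&gt;Even this toy model keeps a buffer well above mean demand, and every buffered VM is pure idle cost; that buffer is exactly what managed platforms amortize across tenants.&lt;/p&gt;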

&lt;h3&gt;
  
  
  LiteBox in 2026: what it is and when to use it
&lt;/h3&gt;

&lt;p&gt;As of February 2026, Microsoft has introduced &lt;a href="https://github.com/microsoft/litebox" rel="noopener noreferrer"&gt;LiteBox&lt;/a&gt;, a Rust-based Library OS. It offers a compelling middle ground: lighter than a VM but with a drastically reduced host interface compared to containers.&lt;/p&gt;

&lt;p&gt;While promising for its use of AMD SEV-SNP (Confidential Computing), LiteBox remains experimental. Unlike Firecracker, which has hardened AWS Lambda for years, LiteBox lacks a production ecosystem. Betting your company's security on LiteBox today carries "bleeding edge" risk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Agent Sandbox: the Kubernetes-native middle ground
&lt;/h3&gt;

&lt;p&gt;Google's &lt;a href="https://github.com/kubernetes-sigs/agent-sandbox" rel="noopener noreferrer"&gt;Agent Sandbox&lt;/a&gt; deserves separate mention because it straddles Layer 1 and Layer 2. Launched at KubeCon NA 2025 as a CNCF project under Kubernetes SIG Apps, it's an open-source controller that provides a declarative API for managing isolated, stateful sandbox pods on your own Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;What makes it interesting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dual isolation backends.&lt;/strong&gt; Supports both gVisor (default) and Kata Containers, letting you choose isolation strength per workload.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Warm pool pre-provisioning.&lt;/strong&gt; The SandboxWarmPool CRD maintains pre-booted pods, reducing cold start latency to sub-second — solving the warm pool problem discussed above without requiring you to build custom autoscaling logic.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes-native abstractions.&lt;/strong&gt; SandboxTemplate defines the environment blueprint. SandboxClaim lets frameworks like LangChain or Google's ADK request execution environments declaratively. This is infrastructure-as-YAML, not infrastructure-as-code.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No vendor lock-in.&lt;/strong&gt; Runs on any Kubernetes cluster, not just GKE. Though GKE offers managed gVisor integration and pod snapshots for faster resume.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tradeoff: you still operate the Kubernetes cluster. This isn't zero-ops like Layer 3 platforms. But for teams already running on Kubernetes who need agent sandboxing without adding a new vendor dependency, Agent Sandbox eliminates most of the DIY orchestration work described in the sections above while keeping you on open infrastructure.&lt;/p&gt;

&lt;p&gt;If you're on GKE already, this should be your first evaluation before looking at managed platforms.&lt;/p&gt;
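&lt;p&gt;To show the "infrastructure-as-YAML" shape, here is a hypothetical manifest pair. The apiVersion, field names, and values are placeholders invented for illustration; only the CRD kinds (SandboxTemplate, SandboxClaim) come from the project, so consult the agent-sandbox repository for the real schema:&lt;/p&gt;

```yaml
# Illustrative only: the shape of the declarative flow, not the
# project's actual schema.
apiVersion: agents.example.io/v1alpha1   # placeholder group/version
kind: SandboxTemplate
metadata:
  name: python-agent
spec:
  runtimeClassName: gvisor               # or a Kata runtime class
  podTemplate:
    spec:
      containers:
        - name: sandbox
          image: python:3.12-slim
---
apiVersion: agents.example.io/v1alpha1
kind: SandboxClaim
metadata:
  name: agent-run-1
spec:
  templateRef:
    name: python-agent
```

&lt;p&gt;The point of the pattern: an agent framework requests an environment by claiming a template, and the controller handles pod creation, isolation class, and warm-pool placement.&lt;/p&gt;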




&lt;h2&gt;
  
  
  Layer 2 (embeddable runtimes): sandboxing as an API
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What "sandboxing as an API" means
&lt;/h3&gt;

&lt;p&gt;Layer 2 solutions wrap isolation primitives into developer-friendly interfaces. &lt;strong&gt;E2B&lt;/strong&gt; takes the "Stripe for Sandboxing" approach with a managed API, while &lt;strong&gt;microsandbox&lt;/strong&gt; offers the same micro-VM isolation tier as a self-hosted runtime. They abstract Layer 1's complexities (managing Firecracker, TAP interfaces, root filesystems) into a clean SDK.&lt;/p&gt;

&lt;p&gt;This layer works best for SaaS teams that need to add a "Code Interpreter" feature quickly. We're talking days, not months.&lt;/p&gt;
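&lt;p&gt;The integration seam at this layer is deliberately small: your product calls something like run_code() and gets output back. The sketch below shows that seam with a local subprocess stand-in, which provides no isolation and exists only to mark where a vendor SDK such as E2B's would plug in; the interface here is this article's invention, not any vendor's API:&lt;/p&gt;

```python
# Minimal "code interpreter" seam: product code talks to run_code(), and
# the backing sandbox is swappable. The subprocess backend below is a
# local stand-in for illustration only -- it is NOT isolation.
import subprocess
import sys
from dataclasses import dataclass

@dataclass
class Execution:
    stdout: str
    stderr: str
    exit_code: int

class LocalStandInSandbox:
    """Dev-only backend. In production, a class with this interface
    would wrap a Layer 2 SDK (e.g. E2B) so the code runs in a microVM
    rather than on the host."""
    def run_code(self, code: str, timeout: float = 10.0) -> Execution:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return Execution(proc.stdout, proc.stderr, proc.returncode)
```

&lt;p&gt;Keeping this seam in your own code is also the cheapest insurance against vendor lock-in at this layer: swapping runtimes means swapping one class.&lt;/p&gt;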

&lt;h3&gt;
  
  
  microsandbox: the self-hosted alternative
&lt;/h3&gt;

&lt;p&gt;Not every team wants to send code to a third-party API. &lt;a href="https://github.com/zerocore-ai/microsandbox" rel="noopener noreferrer"&gt;microsandbox&lt;/a&gt; takes a different approach from E2B: it's a self-hosted, open-source runtime that provides micro-VM isolation using libkrun (a library-based KVM virtualizer). Each sandbox gets its own dedicated kernel — hardware-level isolation, not just syscall interception — with sub-200ms startup times.&lt;/p&gt;

&lt;p&gt;The key difference from E2B: microsandbox runs entirely on your infrastructure. No SaaS dependency, no data leaving your network. This makes it the stronger choice for teams with strict data residency requirements or air-gapped environments where a cloud sandbox API isn't an option.&lt;/p&gt;

&lt;p&gt;The tradeoff is predictable: you own the ops. microsandbox gives you the isolation primitive and a server to manage it, but you handle scaling, monitoring, and image management yourself. Think of it as the "self-hosted E2B" — same security tier (micro-VM), different operational model.&lt;/p&gt;

&lt;p&gt;As of early 2026, microsandbox has approximately 4,700 GitHub stars and is licensed under Apache 2.0. It's the most mature open-source option in this layer for teams that need to self-host.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2 against the four questions (security, duration, data gravity, GPUs)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Untrusted Code:&lt;/strong&gt; Layer 2 excels here. Vendors purpose-built these runtimes for executing LLM-generated code. E2B uses Firecracker; microsandbox uses libkrun. Both provide hardware-level isolation with dedicated kernels per sandbox.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session Length:&lt;/strong&gt; This layer optimizes for &lt;strong&gt;ephemeral, one-shot tasks&lt;/strong&gt;. Agent needs to run a Python script to visualize a dataset and then die? Cost-effective. But for long-running agents that persist for minutes or hours, the &lt;a href="https://e2b.dev/pricing" rel="noopener noreferrer"&gt;per-second billing models&lt;/a&gt; common here accumulate rapidly, often exceeding raw compute costs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Gravity:&lt;/strong&gt; Data movement is the main architectural constraint at this layer, but it affects managed and self-hosted runtimes differently. For managed APIs like E2B, small payloads (JSON, spreadsheets, short scripts) travel over the network with negligible overhead. E2B supports volume mounts and persistent storage, which extends its range to moderate-sized datasets. microsandbox sidesteps the network hop entirely — since it runs on your infrastructure, sandboxes execute co-located with your data by definition, eliminating egress costs and transfer latency. The breakpoint: once individual executions routinely move multi-gigabyte files (large model weights, video processing, dataset joins), even volume mounts can't fully mask the I/O penalty on managed APIs. At that scale, either self-host with microsandbox, move to Layer 3 where compute and storage share an internal network, or build a co-located Layer 1 solution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU Access:&lt;/strong&gt; GPU support in Layer 2 runtimes is still maturing. E2B currently focuses on CPU workloads. If your agents need GPU inference or fine-tuning, this is a genuine gap that may push you toward Layer 3 platforms or a custom Layer 1 build with GPU passthrough.&lt;/li&gt;
&lt;/ul&gt;
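&lt;p&gt;The session-length economics above are easy to model. The rates below are placeholders invented for illustration, not any vendor's actual pricing:&lt;/p&gt;

```python
# Why per-second sandbox billing favors one-shot tasks over long
# sessions. Both rates are invented placeholders, not real pricing.

SANDBOX_PER_SECOND = 0.0002   # hypothetical managed-sandbox rate, $/s
RAW_VM_PER_HOUR = 0.10        # hypothetical raw compute rate, $/h

def sandbox_cost(seconds):
    return seconds * SANDBOX_PER_SECOND

def raw_cost(seconds):
    return seconds / 3600 * RAW_VM_PER_HOUR

if __name__ == "__main__":
    for name, seconds in [("one-shot (15s)", 15), ("4h agent", 4 * 3600)]:
        print(f"{name}: sandbox ${sandbox_cost(seconds):.3f} "
              f"vs raw ${raw_cost(seconds):.3f}")
```

&lt;p&gt;Under these made-up rates, the sandbox premium is negligible for a 15-second task and dominates for a four-hour agent. The crossover point is what you should compute with your vendor's real price sheet before committing to this layer for long-lived sessions.&lt;/p&gt;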




&lt;h2&gt;
  
  
  Layer 3 (managed platforms): serverless sandboxing for agents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why managed platforms unify compute, data, and isolation
&lt;/h3&gt;

&lt;p&gt;Managed Platforms take the "Serverless Cloud" approach. The platform &lt;em&gt;is&lt;/em&gt; the sandbox.&lt;/p&gt;

&lt;p&gt;You don't make an API call to a separate sandbox service. Your entire workload runs inside an isolated environment by default. This unification solves the friction between code, data, and compute.&lt;/p&gt;

&lt;p&gt;Three managed platforms stand out, each with a different architectural bet:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modal&lt;/strong&gt; uses gVisor (user-space kernel isolation) optimized for Python ML workloads. Strengths: native GPU support (T4 through H200), serverless autoscaling from zero, infrastructure-as-code via Python SDK. Limitations: gVisor-only isolation (no microVM option for higher-security requirements), Python-centric (limited multi-language support), no BYOC or on-prem deployment, SDK-defined images create migration friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Northflank&lt;/strong&gt; uses both Kata Containers (microVM) and gVisor, selecting isolation level per workload. Strengths: strongest isolation of the three (dedicated kernel via Kata), BYOC deployment (AWS, GCP, Azure, bare metal), unlimited session duration, GPU support with all-inclusive pricing, OCI-compatible (no proprietary image format). Limitations: more comprehensive platform means steeper initial setup than a pure sandbox API, less Python-specific DX than Modal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Daytona&lt;/strong&gt; uses Docker containers by default with optional Kata Containers for stronger isolation. Strengths: fastest cold starts in the market (sub-90ms), native Docker compatibility, stateful sandboxes with LSP support, desktop environments for computer-use agents. Limitations: default Docker isolation is the weakest of the three — you must explicitly opt into Kata for microVM-level security. Younger platform (pivoted to AI sandboxes in early 2025).&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3 against the four questions (security, data gravity, GPUs, compliance)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Untrusted Code:&lt;/strong&gt; Platforms provide default isolation, but the level of protection varies. Modal uses &lt;a href="https://modal.com/docs/guide/sandbox-networking" rel="noopener noreferrer"&gt;gVisor&lt;/a&gt;, which intercepts syscalls in user space — stronger than containers but not equivalent to a dedicated kernel. Northflank offers Kata Containers (full microVMs with dedicated kernels) for workloads that require the strictest isolation. Daytona defaults to Docker containers, which may be insufficient for truly hostile code unless you explicitly configure Kata. If your threat model assumes kernel exploits, ask whether the platform offers microVM-level isolation, not just "sandboxing."  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Gravity:&lt;/strong&gt; Layer 3 platforms generally solve data gravity by co-locating compute and storage on high-speed internal networks, avoiding the upload/download penalty of Layer 2 APIs. Modal and Northflank both support volume mounts and cached datasets. However, data residency varies: Northflank offers BYOC deployment guaranteeing data stays in your VPC, while Modal runs on their managed infrastructure. If regulatory requirements dictate where data physically resides, BYOC support becomes a deciding factor.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GPU Access:&lt;/strong&gt; This is the clearest Layer 3 differentiator, but support varies. Modal offers the broadest GPU selection (T4 through H200) with per-second billing, though total costs add up when you factor in separate charges for GPU, CPU, and RAM. Northflank offers GPU support with all-inclusive pricing that can be significantly cheaper for sustained workloads. Daytona currently lacks GPU support.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Research shows that without strict hardware partitioning (like MIG), multi-tenant GPU workloads can suffer &lt;a href="https://blogs.vmware.com/cloud-foundation/2024/08/27/boost-throughput-scaling-vms-minimal-gpus/" rel="noopener noreferrer"&gt;55-145% latency degradation&lt;/a&gt;. Managed platforms handle this scheduling complexity, offering "soft" or "hard" GPU isolation and handling the drivers, CUDA versions, and hardware abstraction. You request a GPU in code (e.g., &lt;code&gt;gpu="A100"&lt;/code&gt;), and the platform handles physical provisioning and isolation.  &lt;/p&gt;
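&lt;p&gt;In miniature, "requesting a GPU in code" is a declarative pattern: the function carries its resource requirements, and the platform provisions hardware to match. The toy decorator below only records the declaration; it is this article's sketch of the pattern, not Modal's actual SDK:&lt;/p&gt;

```python
# Toy sketch of declarative resource requests, in the style that
# managed platforms expose. Not a real SDK: it records the declaration
# so a (hypothetical) scheduler could act on it.
import functools

REGISTRY = {}  # function name -> declared resources

def sandbox_function(gpu=None, cpu=1.0, memory_gb=2):
    def decorate(fn):
        REGISTRY[fn.__name__] = {"gpu": gpu, "cpu": cpu, "memory_gb": memory_gb}
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # A real platform would provision the declared hardware and
            # run fn inside an isolated environment before returning.
            return fn(*args, **kwargs)
        return wrapper
    return decorate

@sandbox_function(gpu="A100", memory_gb=32)
def embed(batch):
    return [len(x) for x in batch]
```

&lt;p&gt;The design choice worth noticing: because requirements live next to the code, the platform, not the caller, owns drivers, CUDA versions, and placement.&lt;/p&gt;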

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compliance:&lt;/strong&gt; Enterprise compliance features vary significantly across platforms. Managed platforms generally let you inherit controls faster than building on primitives, but the specifics matter. Northflank's BYOC model lets you keep your data in your own cloud account, simplifying compliance with data residency requirements. Modal's managed-only infrastructure means your data runs on their servers. Daytona offers self-hosted options. Evaluate each vendor's SOC 2 certification status, audit log granularity, and network isolation capabilities against your specific compliance requirements.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Comparison: primitives vs. runtimes vs. managed platforms
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;Primitive (layer 1)&lt;/th&gt;
&lt;th&gt;Runtime (layer 2)&lt;/th&gt;
&lt;th&gt;Managed platform (layer 3)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Firecracker, gVisor, LiteBox&lt;/td&gt;
&lt;td&gt;E2B&lt;/td&gt;
&lt;td&gt;Modal, Northflank, Daytona&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Isolation options&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full control (microVM, user-space kernel, library OS)&lt;/td&gt;
&lt;td&gt;Firecracker microVM (E2B), libkrun microVM (microsandbox)&lt;/td&gt;
&lt;td&gt;gVisor only (Modal), Kata + gVisor (Northflank), Docker/Kata (Daytona)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time to production&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Months (engineering intensive)&lt;/td&gt;
&lt;td&gt;Days (integration)&lt;/td&gt;
&lt;td&gt;Hours (deployment)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpEx &amp;amp; team cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (requires infra team)&lt;/td&gt;
&lt;td&gt;Medium (usage fees)&lt;/td&gt;
&lt;td&gt;Low (pay-per-use)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hard (DIY passthrough/MIG)&lt;/td&gt;
&lt;td&gt;Limited / none&lt;/td&gt;
&lt;td&gt;Native &amp;amp; on-demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data gravity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Solved (local control)&lt;/td&gt;
&lt;td&gt;Varies: network hops for managed APIs (E2B), solved for self-hosted (microsandbox).&lt;/td&gt;
&lt;td&gt;Solved (unified architecture)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BYOC / self-hosted&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes (you own everything)&lt;/td&gt;
&lt;td&gt;E2B: experimental. microsandbox: yes.&lt;/td&gt;
&lt;td&gt;Northflank: yes. Modal: no. Daytona: yes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Any (Linux-based)&lt;/td&gt;
&lt;td&gt;Python-centric (Modal). Any OCI image (Northflank, Daytona).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compliance effort&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very high (DIY audit)&lt;/td&gt;
&lt;td&gt;Medium (vendor inheritance)&lt;/td&gt;
&lt;td&gt;Low (built-in features)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Key limitation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Massive ops burden&lt;/td&gt;
&lt;td&gt;Data gravity, session billing&lt;/td&gt;
&lt;td&gt;Vendor lock-in, isolation varies by vendor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Infrastructure companies&lt;/td&gt;
&lt;td&gt;SaaS "feature" add-ons&lt;/td&gt;
&lt;td&gt;AI product &amp;amp; data teams&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Notable hybrid:&lt;/strong&gt; Google Agent Sandbox. K8s-native controller supporting gVisor + Kata with warm pools. Runs on your cluster. Open-source (CNCF). Best for teams already on Kubernetes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next in AI agent sandboxing (2026–2027)
&lt;/h2&gt;

&lt;p&gt;Looking toward late 2026 and 2027, three trends will reshape this stack.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trend: library OS sandboxes (LiteBox)
&lt;/h3&gt;

&lt;p&gt;Microsoft's entry with &lt;strong&gt;LiteBox&lt;/strong&gt; validates the move toward Library Operating Systems. By bundling application code with only the minimal kernel components needed (using a "North/South" interface paradigm), Library OSs promise the low overhead of a process with the isolation of a VM.&lt;/p&gt;

&lt;p&gt;Still experimental now. But this could redefine the performance/security trade-off in 2-3 years, potentially replacing containers for high-security workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trend: daemonless, embeddable sandboxing (BoxLite)
&lt;/h3&gt;

&lt;p&gt;The next frontier is sandboxing without any server process. Projects like &lt;a href="https://github.com/boxlite-ai/boxlite" rel="noopener noreferrer"&gt;BoxLite&lt;/a&gt; (distinct from Microsoft's LiteBox) explore embedding micro-VM isolation directly into an application as a library — no daemon, no daemon socket, no background process. Where microsandbox runs as a server you deploy, BoxLite aims to be a library you import.&lt;/p&gt;

&lt;p&gt;Think of it as the difference between PostgreSQL (a server) and SQLite (a library). BoxLite is the SQLite model applied to sandboxing: a single function call spins up an isolated OCI container inside your application process. This serves the "local-first" AI agent movement, where agents run on developer machines or edge devices without cloud dependencies.&lt;/p&gt;

&lt;p&gt;Still early (v0.5.10, 14 contributors, ~1000 GitHub stars), but the architectural direction — sandboxing as an embedded library rather than a service — could reshape how lightweight agent frameworks handle isolation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trend: protocol-level permissions with MCP
&lt;/h3&gt;

&lt;p&gt;Security is moving up the stack. Kernel-level isolation answers the question "can this code escape the sandbox?" but not "should this agent be allowed to make HTTP requests at all?" The &lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;&lt;/a&gt; opens the door to enforcing permissions at the protocol layer, where agent capabilities are declared rather than inferred. &lt;/p&gt;

&lt;p&gt;Here's the mechanism. An MCP server exposes a manifest of tools — web_search, filesystem_read, database_query — each with a defined scope. A sandbox runtime that understands MCP can derive its security policy directly from that manifest. An agent authorized to use web_search gets outbound HTTPS on port 443. An agent with only filesystem_read gets no network access at all. File system mounts narrow to the specific paths the tool declares. The sandbox's firewall rules and mount points become a function of the agent's tool permissions, not a static configuration an engineer writes once and forgets. &lt;/p&gt;

&lt;p&gt;No production sandbox does this today. But the primitives are converging: MCP adoption is accelerating across agent frameworks (LangChain, CrewAI, Google ADK all support it), and sandbox runtimes already expose the network and filesystem controls needed to enforce these policies programmatically. The missing piece is the glue layer that translates an MCP tool manifest into a sandbox security policy at boot time. Expect the first integrations in late 2026.&lt;/p&gt;
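&lt;p&gt;The missing glue layer can at least be sketched. The mapping below, from a declared tool list to a network-and-mount policy, is illustrative: MCP standardizes tool declaration, but the capability table and policy fields here are this article's assumptions, not part of the protocol:&lt;/p&gt;

```python
# Sketch of the glue layer described above: translate an MCP-style tool
# manifest into a sandbox policy at boot time. Tool names and policy
# fields are illustrative assumptions.

# Assumed capability table: what each declared tool implies.
TOOL_CAPS = {
    "web_search":      {"egress_ports": {443}},
    "filesystem_read": {"ro_mounts": {"/workspace"}},
    "database_query":  {"egress_ports": {5432}},
}

def policy_from_manifest(tools):
    """Derive least-privilege sandbox settings from declared tools."""
    policy = {"egress_ports": set(), "ro_mounts": set()}
    for tool in tools:
        caps = TOOL_CAPS.get(tool, {})
        policy["egress_ports"] |= caps.get("egress_ports", set())
        policy["ro_mounts"] |= caps.get("ro_mounts", set())
    policy["network"] = bool(policy["egress_ports"])
    return policy
```

&lt;p&gt;An agent declaring only filesystem_read comes out with no network access at all, which is exactly the least-privilege behavior described above: firewall rules as a function of tool permissions, not static configuration.&lt;/p&gt;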




&lt;h2&gt;
  
  
  Conclusion: choosing the right sandbox layer for your AI agents
&lt;/h2&gt;

&lt;p&gt;The 2026 sandbox landscape isn't about choosing a virtualization technology. It's about choosing your level of abstraction. The defining question for engineering leadership is: Where do you create value?&lt;/p&gt;

&lt;p&gt;If your core business is selling infrastructure — building the next Vercel or a specialized vertical cloud — you must build on primitives (Layer 1). The operational pain of managing Firecracker fleets is your competitive moat.&lt;/p&gt;

&lt;p&gt;If you need to add code execution as a feature inside an existing product, embeddable runtimes (Layer 2) get you there in days with strong isolation and minimal architecture changes.&lt;/p&gt;

&lt;p&gt;If your core business is building an AI application, agent, or data pipeline, managed platforms (Layer 3) trade control for velocity. But "managed" is not a monolith — evaluate isolation strength (gVisor vs. microVM), deployment model (managed vs. BYOC), language constraints, and session economics for your specific workload before committing.&lt;/p&gt;

&lt;p&gt;The one decision you shouldn't make in 2026: running untrusted AI-generated code inside shared-kernel containers and hoping for the best. The cloud providers have already told you that's not enough. Listen to them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frequently asked questions (FAQ)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Docker (runc) safe enough to run untrusted AI agent code?
&lt;/h3&gt;

&lt;p&gt;For hostile or user-supplied code, shared-kernel containers generally don't provide sufficient isolation. Use stronger boundaries, such as microVMs (e.g., Firecracker) or hardened user-space kernels (e.g., gVisor), or run on a managed platform that provides multi-tenant isolation by default.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between Firecracker and gVisor for sandboxing?
&lt;/h3&gt;

&lt;p&gt;Firecracker uses hardware virtualization (microVMs) for stronger isolation, but this typically introduces more operational complexity. gVisor intercepts syscalls with a user-space kernel for improved isolation over standard containers, often with easier integration but at the cost of different performance/compatibility trade-offs.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I choose a primitive vs an embeddable runtime vs a managed platform?
&lt;/h3&gt;

&lt;p&gt;Choose primitives when you need maximum control and can operate the fleet (scheduler, images, networking, compliance). Choose an embeddable runtime when you need to add code execution fast, and payloads are small. Choose a managed platform when you need GPUs, data-local execution, and minimal ops.&lt;/p&gt;
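&lt;p&gt;Reduced to code, that rule of thumb looks like this. The thresholds are judgment calls for illustration, not hard limits:&lt;/p&gt;

```python
# The decision rules above as a function. Thresholds are illustrative
# judgment calls, not hard limits.

def choose_layer(needs_max_control: bool,
                 can_operate_fleet: bool,
                 needs_gpu: bool,
                 payload_gb: float) -> str:
    if needs_max_control and can_operate_fleet:
        return "Layer 1: primitives (you become the cloud provider)"
    if needs_gpu or payload_gb > 1.0:  # GPUs or data-local execution
        return "Layer 3: managed platform"
    return "Layer 2: embeddable runtime"
```

&lt;p&gt;The interesting failure mode is teams that answer "maximum control" without "can operate the fleet"; that combination is how Layer 1 projects stall.&lt;/p&gt;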

&lt;h3&gt;
  
  
  What is "data gravity" and why does it matter for sandboxing?
&lt;/h3&gt;

&lt;p&gt;Data gravity is the cost and latency of moving large datasets or model weights to where code runs. If you're routinely moving gigabytes per execution, API-style sandboxes become slow and expensive. Platforms or co-located primitives reduce transfers by running compute near the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are embeddable sandbox APIs (layer 2) good for long-running agents?
&lt;/h3&gt;

&lt;p&gt;Vendors usually optimize them for short-lived, one-shot execution. For agents that idle or run for minutes/hours, per-second billing and session management can get expensive compared to a platform or a self-managed fleet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need GPU isolation for AI agents, and how is it handled?
&lt;/h3&gt;

&lt;p&gt;If multiple tenants share GPUs, "noisy neighbor" effects can cause unpredictable latency and security concerns. Managed platforms typically handle GPU scheduling and isolation (e.g., MIG/partitioning strategies), whereas DIY approaches require significant engineering effort.&lt;/p&gt;

&lt;h3&gt;
  
  
  What operational work do I take on if I build on Firecracker (layer 1)?
&lt;/h3&gt;

&lt;p&gt;You own image distribution/caching, networking (TAP/TUN, routing), orchestration, warm pools, autoscaling, observability, and incident response. The isolation primitive is only one part of running a production sandbox fleet.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is LiteBox and is it production-ready in 2026?
&lt;/h3&gt;

&lt;p&gt;LiteBox is a Library OS approach that reduces the host interface compared to containers. As described, it remains experimental relative to battle-tested microVM approaches, so adopting it carries higher risk unless you can tolerate bleeding-edge dependencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I think about compliance (SOC 2, ISO 27001) when choosing a sandbox layer?
&lt;/h3&gt;

&lt;p&gt;Building compliance on primitives typically requires substantial time and dedicated security engineering. Managed platforms can let you inherit controls (audit logs, network boundaries, residency options) faster, depending on vendor capabilities and your requirements.&lt;/p&gt;

&lt;h3&gt;
  
  
  What cold start time should I expect from modern sandboxes?
&lt;/h3&gt;

&lt;p&gt;Many modern approaches can achieve sub-second starts with snapshots and caching. But real-world latency often depends more on image distribution, networking setup, and warm pool strategy than on the isolation primitive alone.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>devops</category>
      <category>security</category>
    </item>
  </channel>
</rss>
