<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: OKIKUSAN-PUBLIC</title>
    <description>The latest articles on Forem by OKIKUSAN-PUBLIC (@okikusan-public).</description>
    <link>https://forem.com/okikusan-public</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3861953%2Ff5d51bc9-0cb0-4e26-901a-945364a91d28.jpg</url>
      <title>Forem: OKIKUSAN-PUBLIC</title>
      <link>https://forem.com/okikusan-public</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/okikusan-public"/>
    <language>en</language>
    <item>
      <title>Turning Obsidian into AI's Own Memory — Local Cognitive OS with Hindsight and Hermes</title>
      <dc:creator>OKIKUSAN-PUBLIC</dc:creator>
      <pubDate>Sat, 23 May 2026 04:09:26 +0000</pubDate>
      <link>https://forem.com/okikusan-public/turning-obsidian-into-ais-own-memory-local-cognitive-os-with-hindsight-and-hermes-4p9k</link>
      <guid>https://forem.com/okikusan-public/turning-obsidian-into-ais-own-memory-local-cognitive-os-with-hindsight-and-hermes-4p9k</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📌 &lt;strong&gt;The full version (with FIG.5 interactive Hindsight loop simulator and 5 SVG figures) lives on my blog:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://okikusan-public.dev/obsidian-as-ai-memory.en" rel="noopener noreferrer"&gt;https://okikusan-public.dev/obsidian-as-ai-memory.en&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This dev.to post is the condensed version. The interactive simulator and the layered SVG visuals are only on the canonical page.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A developer's offhand comment changed everything.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I want Obsidian to be not just a note-taking tool, but the AI's long-term memory device itself."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This single sentence crystallized a fully local stack: Ollama + Hindsight + PostgreSQL + Obsidian. Not just a combination of tools, but a new layer where &lt;strong&gt;the infrastructure audits itself and humans and AI mutually extend each other's cognition&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four prerequisite terms
&lt;/h2&gt;

&lt;p&gt;This article assumes the following background. If any of these are unfamiliar, skim here first.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hermes Agent&lt;/strong&gt; — Nous Research's OSS autonomous AI agent. Runs resident as CLI / gateway and handles dialogue with the user.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;daily-chats/&lt;/strong&gt; — A folder inside the Obsidian Vault. Every Hermes session end auto-exports that day's raw conversation log (questions, responses, tangents, hesitations, code snippets — all unedited) as markdown. &lt;strong&gt;The entry point of AI's long-term memory.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;knowledge/ and MOC&lt;/strong&gt; — A separate Vault folder. Summaries and insights extracted from daily-chats/ get organized into themed MOCs (Map of Content — index notes that bundle related notes). &lt;strong&gt;The destination layer.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hindsight&lt;/strong&gt; — Hermes's memory mechanism. Periodically scans daily-chats/, summarizes via Ollama, persists into PostgreSQL, and accumulates &lt;em&gt;summaries of summaries&lt;/em&gt; in a self-referential engine. The protagonist of this article.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Real-world benchmark with Gemma3 on Ollama: &lt;strong&gt;23.4 tokens/sec&lt;/strong&gt; — not "tolerable," but "daily usable."&lt;/li&gt;
&lt;li&gt;Hindsight repeatedly summarizes past summaries, forming a &lt;strong&gt;self-referential feedback loop&lt;/strong&gt;. Infrastructure self-auditing begins to run entirely locally.&lt;/li&gt;
&lt;li&gt;Obsidian's daily-chats/ gets redefined as &lt;strong&gt;AI's primary persistent memory device&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The loop that gradually converts "hesitation, hypothesis, discomfort" into explicit knowledge is starting to function as a &lt;strong&gt;cognitive OS&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What 23.4 tokens/sec actually means for "being local"
&lt;/h2&gt;

&lt;p&gt;Gemma3 on Ollama benchmarks at 23.4 tokens/sec. In an era of cloud dependency, that is not merely "tolerable" but fast enough for everyday use.&lt;/p&gt;
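
&lt;p&gt;A figure like this can be derived from the timing fields Ollama returns with each generation. The sketch below is a minimal illustration, not the benchmark script used for this article: it assumes a response dictionary carrying Ollama's &lt;code&gt;eval_count&lt;/code&gt; (generated tokens) and &lt;code&gt;eval_duration&lt;/code&gt; (nanoseconds) fields, and the sample values are hypothetical, chosen only to show the arithmetic.&lt;/p&gt;

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed computed from one Ollama response's timing fields."""
    eval_count = response["eval_count"]            # tokens generated
    eval_duration_ns = response["eval_duration"]   # generation time in nanoseconds
    if eval_duration_ns == 0:
        raise ValueError("eval_duration is zero; cannot compute a rate")
    return eval_count / (eval_duration_ns / 1_000_000_000)


# Hypothetical sample values, not measured data.
sample = {"eval_count": 468, "eval_duration": 20_000_000_000}
print(round(tokens_per_second(sample), 1))
```

&lt;p&gt;Averaging this rate over several prompts of different lengths gives a steadier number than a single run.&lt;/p&gt;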

&lt;p&gt;What matters is not the speed itself. It's the fact that &lt;strong&gt;every process completes locally&lt;/strong&gt;. Raw conversation logs accumulated in daily-chats/ are instantly summarized by the LLM, structured by Hindsight, and persisted into PostgreSQL. No external APIs participate at all.&lt;/p&gt;

&lt;p&gt;The entire history of thought stays &lt;strong&gt;closed inside one's own machine&lt;/strong&gt;. This is more than privacy. It's an experiment in how deeply AI can understand context under the constraint "never leak externally."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7m1bm4p70c6oc6t4t549.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7m1bm4p70c6oc6t4t549.png" alt="Local infrastructure: Hindsight + Hermes + Ollama + PostgreSQL + Obsidian" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hindsight's self-referential infrastructure
&lt;/h2&gt;

&lt;p&gt;Conventional RAG stops at "retrieve and answer." Hindsight is different. &lt;strong&gt;It repeatedly summarizes past summaries, accumulating meta-metadata from summaries of summaries&lt;/strong&gt;. The vector and metadata layers etched into PostgreSQL gradually self-organize over time.&lt;/p&gt;
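
&lt;p&gt;The "summaries of summaries" idea can be sketched as a small recursive roll-up. This is an illustration under stated assumptions, not Hindsight's actual implementation: &lt;code&gt;summarize_recursively&lt;/code&gt;, &lt;code&gt;group_size&lt;/code&gt;, and the stand-in &lt;code&gt;first_words&lt;/code&gt; summarizer are hypothetical names. In a real setup the &lt;code&gt;summarize&lt;/code&gt; callable would presumably call the local LLM, and each level would be persisted rather than kept in memory.&lt;/p&gt;

```python
from typing import Callable, List


def summarize_recursively(
    texts: List[str],
    summarize: Callable[[List[str]], str],
    group_size: int = 3,
) -> List[List[str]]:
    """Return every level of the hierarchy: level 0 is the raw input,
    each later level summarizes groups taken from the level before it."""
    if 2 > group_size or not texts:
        raise ValueError("need at least one text and a group_size of 2 or more")
    levels = [list(texts)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        groups = [current[i:i + group_size] for i in range(0, len(current), group_size)]
        levels.append([summarize(group) for group in groups])
    return levels


def first_words(group: List[str]) -> str:
    # Stand-in summarizer: keeps the first word of each text in the group.
    return " | ".join(text.split()[0] for text in group)


daily_logs = [f"day{n} raw conversation log" for n in range(1, 8)]
for depth, level in enumerate(summarize_recursively(daily_logs, first_words)):
    print(depth, len(level))
```

&lt;p&gt;Keeping every level, instead of only the final summary, is what would make it possible to compare a higher-level summary against the material it was derived from.&lt;/p&gt;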

&lt;p&gt;This is "infrastructure self-auditing":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does today's daily-chat contradict last week's summary?&lt;/li&gt;
&lt;li&gt;How have the themes the user repeatedly touches evolved over the long term?&lt;/li&gt;
&lt;li&gt;To what extent do AI-generated summaries distort the original context?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The foundation is in place for the AI itself to verify these periodically, with no external human reviewer needed. The system puts its own summaries in perspective and proposes corrections, and that loop is already turning, entirely locally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jp6obpu4qkxj005o0ml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jp6obpu4qkxj005o0ml.png" alt="Hindsight self-referential feedback loop" width="800" height="620"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;FIG.2 — User ⇄ Hermes dialogue flows into daily-chats/, summarized via Ollama, promoted into MOC, and returns to Hermes as context.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  "Making Obsidian the memory" — the ultimate human-AI co-creation
&lt;/h2&gt;

&lt;p&gt;The developer didn't stop at treating Obsidian as a "second brain." They redefined it as &lt;strong&gt;"the primary memory device for AI."&lt;/strong&gt; daily-chats/ is no longer a graveyard of miscellaneous logs. Every piece of text accumulated there is structured through Hindsight and functions as AI's long-term memory.&lt;/p&gt;

&lt;p&gt;This is not a relationship where humans ask AI to "remember this for me."&lt;br&gt;
It's a relationship where humans design the environment itself: "I'll write it here, so you go ahead and structure it on your own."&lt;/p&gt;

&lt;p&gt;AI only becomes intelligent within the context it is given. The core of this pipeline is to turn that dependency around: humans prepare the field that transforms context into &lt;strong&gt;"persistent, searchable, and self-referential memory."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwse7v0sx4y6svb6p8aj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhwse7v0sx4y6svb6p8aj.png" alt="Cognitive collaboration framework: Human ↔ AI" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The loop that turns tacit into explicit
&lt;/h2&gt;

&lt;p&gt;What gets written in daily-chats/ is not polished ideas, but &lt;strong&gt;"hesitation, hypotheses, discomfort, wavering judgment."&lt;/strong&gt; Hindsight gradually converts that ambiguity into explicit knowledge. Eventually, Obsidian's knowledge/ layer becomes not just a collection of MOCs, but a &lt;strong&gt;"knowledge graph that AI itself is cultivating."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When that happens, what does the developer witness?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The moment a discomfort felt in the past is rediscovered by AI&lt;/li&gt;
&lt;li&gt;The moment a rough note suddenly gains meaning in a different context&lt;/li&gt;
&lt;li&gt;The moment the infrastructure quietly points out: "This theme contradicts what it was three months ago"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything local, everything persistent, everything self-referential: such a state may already be running within arm's reach.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3jh7zif3igwih8oukbu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb3jh7zif3igwih8oukbu.png" alt="Tacit to explicit transformation loop" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;FIG.3 — The left side (doubt, hypothesis, discomfort) gets passed into daily-chats/. Hindsight accumulates summaries of summaries, growing the right-side knowledge/ MOCs over time.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  This is not about tools — it's about a cognitive OS
&lt;/h2&gt;

&lt;p&gt;Ollama + Hindsight + PostgreSQL + Obsidian.&lt;/p&gt;

&lt;p&gt;What makes this combination special is not that any one piece is superior. It's that &lt;strong&gt;the circuit that converts the fluidity of human thought into a form AI can handle&lt;/strong&gt; has finally closed.&lt;/p&gt;

&lt;p&gt;The moment the developer decided to "make Obsidian the memory itself," the technology became more than a mere means. Without interference from anyone, AI and humans have begun to build with their own hands an &lt;strong&gt;OS for becoming wise together&lt;/strong&gt;, each compensating for the other's weaknesses.&lt;/p&gt;

&lt;p&gt;Every layer is in your own hands. That's why this deserves the name "cognitive OS."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf28cxmf7a5x4lueptau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf28cxmf7a5x4lueptau.png" alt="Cognitive OS stack" width="799" height="547"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;FIG.4 — Human → Hermes → Hindsight → Obsidian + PostgreSQL → Ollama → local machine. Every layer in your own hands.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This loop has only just begun.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;📌 The full version with the FIG.5 interactive Hindsight loop simulator (click to iterate the recursion 1 → 2 → 3 and watch the knowledge graph grow) is on the original:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://okikusan-public.dev/obsidian-as-ai-memory.en" rel="noopener noreferrer"&gt;https://okikusan-public.dev/obsidian-as-ai-memory.en&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How are you connecting your notes to AI memory? Would love to hear how the "summary of summary" idea lands in your own setup. 🦄 💬&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>rag</category>
      <category>showdev</category>
    </item>
    <item>
      <title>From editor to agent management — Google Antigravity 2.0 marks the arrival of the Agent OS</title>
      <dc:creator>OKIKUSAN-PUBLIC</dc:creator>
      <pubDate>Wed, 20 May 2026 12:44:32 +0000</pubDate>
      <link>https://forem.com/okikusan-public/from-editor-to-agent-management-google-antigravity-20-marks-the-arrival-of-the-agent-os-4816</link>
      <guid>https://forem.com/okikusan-public/from-editor-to-agent-management-google-antigravity-20-marks-the-arrival-of-the-agent-os-4816</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This piece is mirrored from my blog. Canonical:&lt;br&gt;
&lt;a href="https://okikusan-public.pages.dev/antigravity-agent-os.en" rel="noopener noreferrer"&gt;https://okikusan-public.pages.dev/antigravity-agent-os.en&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqj87egcgav1gfq38kzl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqj87egcgav1gfq38kzl.png" alt="FIG.0 — AGENT OS STACK" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Antigravity 2.0 is not an AI-IDE update. It is the moment the centre of gravity in developer experience shifts &lt;strong&gt;from "the editor" to "agent management."&lt;/strong&gt; The Desktop / CLI / SDK / integration funnel together stop being a "specialist worker" like Claude Code / Codex / Grok Build, and start looking like an &lt;strong&gt;Agent OS&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The old axis, "which model is smarter," is no longer enough. Five things now decide developer productivity: harness design, permission boundaries, context, scheduled execution, and human review. This article lays out that next battlefield of AI coding.&lt;/p&gt;

&lt;h2&gt;
  
  
  ▍ SOURCES
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://antigravity.google/" rel="noopener noreferrer"&gt;Google Antigravity — official site&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/antigravity/status/2056795168326754759" rel="noopener noreferrer"&gt;Launch announcement from @antigravity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/" rel="noopener noreferrer"&gt;Transitioning Gemini CLI to Antigravity CLI (Google Developers Blog)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/05/19/google-launches-antigravity-2-0-with-an-updated-desktop-app-and-cli-tool/" rel="noopener noreferrer"&gt;Google launches Antigravity 2.0 (TechCrunch)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marktechpost.com/2026/05/19/google-launches-antigravity-2-0-at-i-o-2026-a-standalone-agent-first-platform-with-cli-sdk-managed-execution-and-enterprise-support/" rel="noopener noreferrer"&gt;Antigravity 2.0 = standalone agent-first platform (MarkTechPost)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Official launch video: &lt;a href="https://www.youtube.com/watch?v=6C0FjHoN3qE" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=6C0FjHoN3qE&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ▍ TERMS — definitions and premises
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agent harness&lt;/strong&gt; — the runtime that wraps a model. By Karpathy's definition, &lt;strong&gt;Agent = Model + Harness&lt;/strong&gt;. Concretely it binds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System prompts / role definitions&lt;/li&gt;
&lt;li&gt;Tools (file ops, Bash, web fetch, MCP external calls, etc.)&lt;/li&gt;
&lt;li&gt;Memory / state&lt;/li&gt;
&lt;li&gt;Permissions and guardrails&lt;/li&gt;
&lt;li&gt;Feedback loop (retries, verification, sub-agent spawn)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Real examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code's harness&lt;/strong&gt; = CLI agent loop + tool set + project permissions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor's harness&lt;/strong&gt; = editor integration + Apply machinery + codebase index&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Antigravity's harness&lt;/strong&gt; = local app server + runtime + Skill pack attachment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same model + different harness → drastically different behaviour. The harness is the part that determines how the agent actually acts.&lt;/p&gt;
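
&lt;p&gt;A toy sketch can make "Agent = Model + Harness" concrete. Everything here is hypothetical and does not reflect any real product's API: &lt;code&gt;Harness&lt;/code&gt;, &lt;code&gt;Agent&lt;/code&gt;, and &lt;code&gt;toy_model&lt;/code&gt; are illustrative names, and the "model" is a plain function standing in for an LLM call. The point is only that one model wrapped in two harnesses ends up acting differently.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Harness:
    system_prompt: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    allowed_tools: List[str] = field(default_factory=list)  # permission boundary
    max_retries: int = 1                                     # feedback loop budget


@dataclass
class Agent:
    model: Callable[[str], str]  # stand-in for an LLM: prompt in, tool name out
    harness: Harness

    def run(self, task: str) -> str:
        prompt = f"{self.harness.system_prompt}\n{task}"
        for _ in range(self.harness.max_retries + 1):
            choice = self.model(prompt)
            if choice in self.harness.allowed_tools and choice in self.harness.tools:
                return self.harness.tools[choice](task)
            # Feed the refusal back so the model can pick something else.
            prompt += f"\nTool '{choice}' is not permitted; pick another."
        return "refused: no permitted tool selected"


def toy_model(prompt: str) -> str:
    # Asks for the shell first, falls back to read_file after a refusal.
    return "read_file" if "not permitted" in prompt else "bash"


tools = {
    "bash": lambda task: f"ran shell for: {task}",
    "read_file": lambda task: f"read files for: {task}",
}
permissive = Agent(toy_model, Harness("You are a coder.", tools, ["bash", "read_file"]))
locked_down = Agent(toy_model, Harness("You are a reviewer.", tools, ["read_file"]))
print(permissive.run("fix the bug"))
print(locked_down.run("fix the bug"))
```

&lt;p&gt;The model function is identical in both agents; only the allowed tool list differs, and that alone changes which action is taken.&lt;/p&gt;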

&lt;p&gt;&lt;strong&gt;Agent OS layer / Specialist worker layer&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent OS layer&lt;/strong&gt; = one harness shared across multiple UIs, with permissions, scheduling, and agent orchestration unified in the same layer (Antigravity / Hermes / Copilot Studio)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialist worker layer&lt;/strong&gt; = invoked to do the work (Claude Code / Codex CLI / Grok Build)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;"Agent OS" is not Google's official term — community framing and this article's editorial lens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subagent&lt;/strong&gt; — a child agent spawned dynamically by a parent. Antigravity 2.0's launch demo built an OS with 93 parallel subagents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skill&lt;/strong&gt; — pluggable capability pack you attach to an agent. Android Skills / Firebase Skills add a specific domain's APIs and conventions to the harness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;App server&lt;/strong&gt; — shared local backend inside the Antigravity install. Both Desktop UI and CLI binary call this same app server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The five comparison axes&lt;/strong&gt; — (1) harness design / (2) permission boundaries / (3) context / (4) scheduled execution / (5) human review.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Antigravity 2.0 is &lt;strong&gt;Desktop / CLI / SDK / AI Studio×Android×Firebase integration&lt;/strong&gt; — not a bundle of scattered features, but &lt;strong&gt;one agent harness shared across four UIs&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Claude Code / Codex / Grok Build sit at the &lt;strong&gt;specialist worker layer&lt;/strong&gt;; Antigravity 2.0 sits at the &lt;strong&gt;Agent OS layer&lt;/strong&gt; binding them. A "VS" framing collapses across layers.&lt;/li&gt;
&lt;li&gt;The axes that matter: &lt;strong&gt;(1) which harness / (2) what permission boundary / (3) what context / (4) what scheduled execution / (5) what human review&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For individuals: pair &lt;strong&gt;Hermes (OSS) with Antigravity-native&lt;/strong&gt;. For enterprises: &lt;strong&gt;Copilot Studio / Workspace Studio / Antigravity, cross-cutting selection&lt;/strong&gt;. Editor-only comparison is now a generation behind.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  § 01 SHIFT — from editor to agent management
&lt;/h2&gt;

&lt;p&gt;Two years of AI coding tools have been narrated through the &lt;strong&gt;editor&lt;/strong&gt;: Copilot, Cursor, Claude Code, Codex CLI, Grok Build. They all evolved on the premise of "AI writes code inside the editor."&lt;/p&gt;

&lt;p&gt;Antigravity 2.0 breaks that frame. This is not an AI-IDE update. It assembles &lt;strong&gt;Desktop / CLI / SDK / AI Studio integration&lt;/strong&gt; all at once, and what it produces is &lt;strong&gt;a platform for managing agents&lt;/strong&gt; — a single agent harness shared across four UIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  § 02 PILLARS — Desktop / CLI / SDK / integration funnel
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2-1 Desktop app — the command center
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;command bridge&lt;/strong&gt; for running many agents in parallel. It provides &lt;strong&gt;dynamic subagents&lt;/strong&gt; (spawn and retire children on the fly), &lt;strong&gt;scheduled tasks&lt;/strong&gt; (cron-style runs), and &lt;strong&gt;per-project permission scopes&lt;/strong&gt;. The feel shifts from "one task in one editor" to &lt;strong&gt;"many tasks running at once, all in view."&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2-2 Antigravity CLI — different UI, same harness
&lt;/h3&gt;

&lt;p&gt;Successor to Gemini CLI. A lightweight UI for terminal people, but &lt;strong&gt;the key is that it shares the same agent harness as Desktop&lt;/strong&gt;. The CLI isn't a separate product — it's &lt;strong&gt;a different interface to the same base&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cud6cygvt8mkti4zjhy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cud6cygvt8mkti4zjhy.png" alt="FIG.2-2 — SHARED HARNESS" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▍ "Sharing the same harness" — what it actually means&lt;/strong&gt;&lt;br&gt;
Not two competing apps. &lt;strong&gt;A single local install bundles Desktop UI / CLI binary / a shared app server (the agent harness itself).&lt;/strong&gt; Per @karthickdotxyz: &lt;em&gt;"Same tools and app server as Antigravity 2.0."&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No need to run both at once&lt;/strong&gt; — Desktop or CLI, either path completes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configs, agent definitions, permissions, scheduled tasks are all shared&lt;/strong&gt; — a job composed in Desktop can be invoked from CLI as-is&lt;/li&gt;
&lt;li&gt;Natural split: &lt;strong&gt;CI / headless server work via CLI&lt;/strong&gt;, interactive development via Desktop&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  2-3 SDK — embed the harness into your own product
&lt;/h3&gt;

&lt;p&gt;Google's agent harness is now &lt;strong&gt;something you embed into your own workflow or product&lt;/strong&gt;. This stops being "a tool that makes AI write code" and starts being "&lt;strong&gt;a platform for building and operating AI agents.&lt;/strong&gt;" SDK code runs &lt;strong&gt;on your own PC, your servers, your CI runners&lt;/strong&gt; — Google doesn't host the runtime; it lives inside your process. Antigravity could become &lt;strong&gt;a component that runs inside other companies' products&lt;/strong&gt;, not just Google's IDE.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0u4yh1hmn9e62rfq0nh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd0u4yh1hmn9e62rfq0nh.png" alt="FIG.2-3 — SDK EMBED" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▍ CLI vs SDK — what's the actual difference?&lt;/strong&gt;&lt;br&gt;
Both run on local machines or on servers. The real distinction is &lt;strong&gt;the primary use case each is designed for&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLI&lt;/strong&gt; = an &lt;strong&gt;interactive front&lt;/strong&gt; designed for a human (or shell script) to drive an agent directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SDK&lt;/strong&gt; = a &lt;strong&gt;library&lt;/strong&gt; designed for your program to drive the agent via function calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Strictly, &lt;strong&gt;you can also call the CLI from a program&lt;/strong&gt; by shelling out. But shelling out comes with costs: (a) process startup overhead / (b) brittle text-output parsing / (c) no types / (d) streaming and structured events are awkward. The SDK assumes that use case from the start. Same shape as &lt;strong&gt;AWS CLI vs boto3&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
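
&lt;p&gt;The shell-out costs listed above can be seen in a small side-by-side sketch. Nothing here uses a real Antigravity interface: &lt;code&gt;run_agent&lt;/code&gt;, &lt;code&gt;run_agent_via_cli&lt;/code&gt;, and &lt;code&gt;RunResult&lt;/code&gt; are hypothetical, and a throwaway Python one-liner plays the role of the CLI binary so the example runs anywhere.&lt;/p&gt;

```python
import json
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    status: str
    files_changed: int


def run_agent(task: str) -> RunResult:
    """Library-style path: an in-process call that returns a typed result."""
    return RunResult(status="ok", files_changed=len(task.split()))


def run_agent_via_cli(task: str) -> RunResult:
    """Shell-out path: start a process, then parse whatever it printed."""
    # Stand-in for a CLI binary; it prints one JSON object to stdout.
    fake_cli = (
        "import json, sys; "
        "print(json.dumps({'status': 'ok', 'files_changed': len(sys.argv[1].split())}))"
    )
    completed = subprocess.run(
        [sys.executable, "-c", fake_cli, task],
        capture_output=True, text=True, check=True,
    )
    payload = json.loads(completed.stdout)  # breaks if the output format changes
    return RunResult(payload["status"], int(payload["files_changed"]))


print(run_agent("add login page"))
print(run_agent_via_cli("add login page"))
```

&lt;p&gt;Both paths return the same result here, but the second one pays for a process start and depends on the printed text staying parseable, which is the trade-off the AWS CLI vs boto3 comparison points at.&lt;/p&gt;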

&lt;h3&gt;
  
  
  2-4 AI Studio × Android × Firebase integration
&lt;/h3&gt;

&lt;p&gt;Less "three products wired together at the UI layer," more &lt;strong&gt;"Antigravity sits in the middle as the harness, with AI Studio (entry) and Android / Firebase (exit) bolted on via a shared harness and Skills."&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI Studio → Antigravity ("Export to Antigravity")&lt;/strong&gt;: AI Studio Build now runs on the same agent harness as Antigravity. A dedicated &lt;em&gt;Export to Antigravity&lt;/em&gt; button hands off the &lt;strong&gt;full agent conversation&lt;/strong&gt; (chat history, configuration, generated code) into the local Antigravity environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Antigravity → Android&lt;/strong&gt;: Equip the agent with the official &lt;strong&gt;Android Skills&lt;/strong&gt; — Android SDK / Gradle / manifests become part of the agent's context. Going further, the &lt;code&gt;studio&lt;/code&gt; command in &lt;strong&gt;Android CLI 1.0&lt;/strong&gt; lets the agent connect to a running Android Studio instance and borrow its deep codebase understanding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Antigravity → Firebase&lt;/strong&gt;: &lt;strong&gt;Firebase Skills&lt;/strong&gt; teach the agent Firestore / Functions / Hosting / Auth conventions, configuration and deployment included.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkmabpjupj0060223r1b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkmabpjupj0060223r1b.png" alt="FIG.3 — INTEGRATION BRIDGES" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So Google's "vertical integration" play is re-engineered &lt;strong&gt;not at the UI layer but at the harness and Skill (attachable capability packs) layer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Official launch video: &lt;a href="https://www.youtube.com/watch?v=6C0FjHoN3qE" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=6C0FjHoN3qE&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  § 03 LAYER — not a "specialist worker"
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▍ Terminology note on "Agent OS"&lt;/strong&gt;&lt;br&gt;
The phrase "Agent OS" used throughout this piece is &lt;strong&gt;not Google's official terminology&lt;/strong&gt;. Right after launch, &lt;a href="https://x.com/grok/status/2056931139567141200" rel="noopener noreferrer"&gt;@grok&lt;/a&gt; called it &lt;em&gt;"the emerging Agent OS category"&lt;/em&gt; and &lt;a href="https://x.com/arsh_goyal/status/2056830521125249318" rel="noopener noreferrer"&gt;@arsh_goyal&lt;/a&gt; framed it as a "centralized Agent Manager." This article borrows that framing to describe a structural pattern: &lt;strong&gt;a single harness shared across multiple UIs, with permissions, scheduling, and sub-agent orchestration unified at one layer&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lining up Antigravity 2.0 with &lt;strong&gt;Claude Code / Codex CLI / Grok Build&lt;/strong&gt; and asking "which one's best" misses the point. They live at different layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  3-1 Specialist worker layer vs Agent OS layer
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Specialist worker layer&lt;/th&gt;
&lt;th&gt;Agent OS layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Role&lt;/td&gt;
&lt;td&gt;Called to do the work&lt;/td&gt;
&lt;td&gt;Orchestrates and supervises&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Examples&lt;/td&gt;
&lt;td&gt;Claude Code / Codex CLI / Grok Build / Cursor Agent&lt;/td&gt;
&lt;td&gt;Hermes (OSS) / Microsoft Copilot Studio / &lt;strong&gt;Google Antigravity 2.0&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strengths&lt;/td&gt;
&lt;td&gt;Instant reasoning, code generation, file ops&lt;/td&gt;
&lt;td&gt;Multiple UIs / parallel execution / permissions / shared harness&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3-2 "VS" framing breaks across layers
&lt;/h3&gt;

&lt;p&gt;"Antigravity 2.0 vs Claude Code" is a &lt;strong&gt;layer violation&lt;/strong&gt;. As Antigravity expands via the SDK inside other companies' products, the natural composition becomes "Antigravity-on-top, calling Claude Code / Codex CLI / Grok Build as workers." The right peers to compare with are &lt;strong&gt;Hermes / Copilot Studio — same Agent OS layer&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mpsy2hyaue0s39dtvoi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mpsy2hyaue0s39dtvoi.png" alt="FIG.4 — LAYER STACK" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▍ Direct comparison with Hermes / Copilot Studio&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hermes&lt;/strong&gt; = OSS / individual-tilted / multi-model / 22 gateways / Obsidian integration / &lt;strong&gt;domain-agnostic&lt;/strong&gt; (works outside coding too)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Copilot Studio&lt;/strong&gt; = M365 territory / enterprise permissions / Power Platform integration / &lt;strong&gt;business-workflow focused&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Antigravity 2.0&lt;/strong&gt; = Google-native / AI Studio × Android × Firebase vertical integration / &lt;strong&gt;software-engineering specialised&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The scope difference matters&lt;/strong&gt;: sitting at the same Agent OS layer does not mean the same role. Hermes is a domain-agnostic general harness; Copilot Studio is for business workflows; Antigravity is purpose-built for software-engineering work.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  § 04 AXIS — renewing the comparison axes
&lt;/h2&gt;

&lt;p&gt;Re-reading §02's four pillars as a structure, each contains a design choice that the old "model IQ axis" cannot capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Desktop's dynamic subagents + scheduled tasks → &lt;strong&gt;which harness, when to fire automatically&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;CLI sharing the app server with Desktop → &lt;strong&gt;same harness called from different UIs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;SDK running in your own process → &lt;strong&gt;what permission boundary, what environment&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;AI Studio / Android / Firebase Skills → &lt;strong&gt;what context to feed the agent&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The agentic IDE's review surface → &lt;strong&gt;how humans review&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if you unpack the structure of Antigravity 2.0 honestly, the five axes — &lt;strong&gt;harness / permission boundary / context / timing / review&lt;/strong&gt; — surface naturally.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▍ This is my own view — why these 5 axes&lt;/strong&gt;&lt;br&gt;
The five design axes are &lt;strong&gt;not a standard framework published by Google, IDC, Forrester, or anyone else&lt;/strong&gt; — they are &lt;strong&gt;my editorial synthesis of what I think actually matters&lt;/strong&gt;. My reasoning for picking these specific 5:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;harness&lt;/strong&gt; — with model IQ commoditising, harness design determines actual behaviour&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;permission boundary&lt;/strong&gt; — when agents act autonomously, permission scope decides blast radius&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;context&lt;/strong&gt; — same model + same IQ produces wildly different output depending on context given&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;timing&lt;/strong&gt; — manual / hook / cron agents are different beasts. Antigravity 2.0 making scheduled tasks first-class is evidence this matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;review&lt;/strong&gt; — human-in-the-loop verification load is the productivity bottleneck&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Individual terms have currency across the industry, but &lt;strong&gt;bundling these 5 as "the evaluation axes that matter"&lt;/strong&gt; is my judgement.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsih6e95pa56z4nb0mzej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsih6e95pa56z4nb0mzej.png" alt="FIG.1 — AXIS MAP" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Side-by-side, what changes per axis between the editor era and the Agent OS era:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;Editor era (~2025)&lt;/th&gt;
&lt;th&gt;Agent OS era (2026 →)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;harness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Editor's &lt;strong&gt;completion speed / UX&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Which tools, memory, permissions, feedback loops&lt;/strong&gt; you wrap around the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;permission&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Largely a non-question — manual control&lt;/td&gt;
&lt;td&gt;Autonomous agents → &lt;strong&gt;Project / User / Agent-scoped permissions&lt;/strong&gt; define blast radius&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Context window size&lt;/strong&gt; (a "quantity" axis)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;What you pull in and hand the agent&lt;/strong&gt; (a "quality" axis, plus Skill packs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;timing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Completion &lt;strong&gt;as the human is typing&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Manual / hook / cron / scheduled&lt;/strong&gt; — async / parallel / 24-7 included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;review&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Humans read before and after writing code&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;How far do you trust auto-executed agent output, and where does a human gate it?&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Footnote: the old axis "which model is smartest" no longer stands on its own, because the same model wrapped in a different harness and fed different context produces wildly different output.&lt;/p&gt;

&lt;p&gt;These five decide &lt;strong&gt;developer productivity itself&lt;/strong&gt;. Smarter models with sloppy harnesses just &lt;strong&gt;automate Slop&lt;/strong&gt;.&lt;/p&gt;
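
&lt;p&gt;To make the five axes concrete, here is a minimal sketch that describes one agent setup along each axis and flags risky combinations. Everything in it (the &lt;code&gt;HarnessProfile&lt;/code&gt; and &lt;code&gt;Trigger&lt;/code&gt; names, the fields, the three warning rules) is my own illustration, not an Antigravity, Hermes, or Copilot Studio API.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    """The timing axis: how an agent run gets started."""
    MANUAL = "manual"
    HOOK = "hook"
    CRON = "cron"


@dataclass
class HarnessProfile:
    """One agent setup described along the five axes (all names hypothetical)."""
    harness: str                  # harness: what wraps the model
    allowed_paths: list[str]      # permission boundary: where it may write
    context_sources: list[str]    # context: what gets pulled in
    trigger: Trigger              # timing: manual / hook / cron
    human_gate: bool              # review: does a human approve the output?

    def risks(self):
        """Return plain-language warnings for risky axis combinations."""
        warnings = []
        if self.trigger is not Trigger.MANUAL and not self.human_gate:
            warnings.append("unattended runs with no human review gate")
        if "/" in self.allowed_paths:
            warnings.append("write access to the whole filesystem")
        if not self.context_sources:
            warnings.append("no context sources: output will be generic")
        return warnings


# A deliberately sloppy nightly job: every axis is left at its worst setting.
nightly = HarnessProfile(
    harness="example-harness",
    allowed_paths=["/"],
    context_sources=[],
    trigger=Trigger.CRON,
    human_gate=False,
)
for warning in nightly.risks():
    print("WARN:", warning)
```

&lt;p&gt;The point of the sketch is that none of the warnings mention the model at all: each one comes from a harness, permission, context, timing, or review decision.&lt;/p&gt;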

&lt;h2&gt;
  
  
  § 05 BATTLEFIELD — the next battlefield is not inside the editor
&lt;/h2&gt;

&lt;p&gt;The next battlefield is no longer inside the editor — it is in &lt;strong&gt;Agent OS orchestration design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fig4cb2x018erk6q5dfx9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fig4cb2x018erk6q5dfx9.png" alt="FIG.2 — BATTLEFIELD TIMELINE" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2023&lt;/strong&gt;: Prompt engineering (single model calls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2024&lt;/strong&gt;: Context assembly (RAG + memory)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2025&lt;/strong&gt;: AI in the editor (Copilot + inline suggestions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2026+&lt;/strong&gt;: Orchestrate agents (Agent OS) — compose / supervise / continuous execution&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ▍ VOICES — use cases that surfaced in the first 48 hours
&lt;/h2&gt;

&lt;p&gt;Snapshots from X in the 48 hours since launch. The chatter is visibly shifting from "a specialist worker writes code" toward &lt;strong&gt;"a fleet of agents gets orchestrated."&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://x.com/andreasawires/status/2056933069982982587" rel="noopener noreferrer"&gt;@andreasawires&lt;/a&gt;: "93 parallel sub-agents, 12 hours, 15K+ model requests, 2.6 billion tokens, under $1K in API credits" — built a full OS from scratch&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/andyzhang/status/2056942886303019110" rel="noopener noreferrer"&gt;@andyzhang (Antigravity team)&lt;/a&gt;: "Antigravity 2.0, a desktop application to manage all of your agents" — official launch post&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/mirrokni/status/2056914293291950495" rel="noopener noreferrer"&gt;@mirrokni (Vahab Mirrokni, Google)&lt;/a&gt;: Antigravity &lt;code&gt;/teamwork&lt;/code&gt; agents recreated the AlphaZero paper end-to-end (RL + TPU + Web app)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/SHT4BHARAT/status/2056933731836088330" rel="noopener noreferrer"&gt;@SHT4BHARAT&lt;/a&gt;: "We are officially out of the chatbot era and deep into production-scale autonomous workflows"&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/karthickdotxyz/status/2056959076790374765" rel="noopener noreferrer"&gt;@karthickdotxyz&lt;/a&gt;: Antigravity CLI official launch — Go binary, async workflows, "Same tools and app server as Antigravity 2.0"&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/neuecc/status/2056937379404124319" rel="noopener noreferrer"&gt;@neuecc (Yoshifumi Kawai)&lt;/a&gt;: on adopting a workflow that keeps Antigravity 2.0 and the IDE separate, in preference to Cursor 3, for complex projects&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/gptzone_net/status/2056960689013768538" rel="noopener noreferrer"&gt;@gptzone_net&lt;/a&gt;: "Antigravity 2.0 should not be evaluated as a more aggressive autocomplete. It should be evaluated as a workflow change" (translated from Spanish)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/BeamManP/status/2056814500888916231" rel="noopener noreferrer"&gt;@BeamManP&lt;/a&gt;: Gemini 3.5 Flash can now do structural music analysis — same engine that powers Antigravity&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  § 06 FIT — how individuals and enterprises choose
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▍ This section is opinion&lt;/strong&gt;&lt;br&gt;
The "if you're an individual, pick X / if you're an org, pick Y" framing in this section is &lt;strong&gt;my personal recommendation&lt;/strong&gt;, not an official guide from Google / Microsoft / Nous Research.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Facts&lt;/strong&gt;: Hermes = OSS / multi-model / Obsidian integration / domain-agnostic. Antigravity 2.0 = vertically integrated into Google's ecosystem / software-engineering specialised. Microsoft Copilot Studio = M365 territory / business-workflow focused.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Opinion&lt;/strong&gt;: "Individuals should pair Hermes with Antigravity" is what I think makes sense — others reaching the opposite conclusion ("standardise on Copilot Studio for simpler ops" / "go all-OSS on Hermes for transparency") would be perfectly reasonable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Individual vs team/org evaluation per &lt;strong&gt;the 5 axes of §04&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;Individual use&lt;/th&gt;
&lt;th&gt;Team / org use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;harness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Each developer installs their preferred harness locally&lt;/td&gt;
&lt;td&gt;Standardise one harness org-wide / &lt;strong&gt;custom harness embedded via Antigravity SDK in internal products&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;permission&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Read/write on personal Mac / personal repos&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Hierarchical Project / User / Role ACL&lt;/strong&gt;, segregation across internal systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Personal Obsidian / personal git / personal Slack DMs&lt;/td&gt;
&lt;td&gt;Internal Confluence / shared GitHub org / company Slack / Linear / Jira / on-call runbooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;timing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Interactive on personal PC or light personal cron&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Shared server / CI runner / 24-7 environment&lt;/strong&gt; — embed agents via SDK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;review&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Personal self-check&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Code review / PR approval flow / audit logs / compliance checks&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Footnote: "harness" itself is neutral. Team-readiness depends on the implementation — Claude Code / Hermes / Cursor assume "personal local"; Antigravity 2.0 covers both with Project permissions + SDK + Enterprise framing; Copilot Studio is "team / business" from the start.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;▍ For team-shared harness, SDK is the only path&lt;/strong&gt;&lt;br&gt;
Antigravity's standalone Desktop / CLI structure assumes "a local app server on a personal machine." Git-syncing the config does not turn that into a shared team harness, because each machine still runs its own separate instance.&lt;/p&gt;

&lt;p&gt;If you want a team to share one harness, the only realistic path is &lt;strong&gt;embedding the SDK in an internal backend / shared server / CI runner, so multiple users hit a single harness instance you operate yourself&lt;/strong&gt;. Google does not offer a hosted "Managed harness," so &lt;strong&gt;"team harness as a Service" has to be built in-house&lt;/strong&gt; as of today.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  6-1 Individuals — Hermes and Antigravity in combination
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;OSS-leaning individuals&lt;/strong&gt; still get great value from Hermes. Obsidian integration, multi-model routing, and the gateway pack (Telegram / Discord / LINE / Slack) are Hermes-only as of now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google-ecosystem individuals&lt;/strong&gt; get the most from Antigravity. Prototype in AI Studio, hand it to Antigravity Desktop, ship to Firebase: Hermes can't reproduce that vertical funnel.&lt;/p&gt;

&lt;p&gt;The practical answer is &lt;strong&gt;both at once&lt;/strong&gt;. Use Hermes to orchestrate the personal tacit-thought pool, and use Antigravity for heavy lifting inside the Google ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  6-2 Enterprises — three Agent OSes across contexts
&lt;/h3&gt;

&lt;p&gt;For enterprises, the picture is &lt;strong&gt;three Agent OSes mapped to three contexts&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Business context&lt;/strong&gt; = Microsoft Copilot Studio (M365 / expense / calendar / doc review)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dev context&lt;/strong&gt; = Google Antigravity 2.0 (coding / Android / Firebase / GCP)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal context&lt;/strong&gt; = Hermes-class OSS (personal tacit-thought pools / Obsidian / custom gateways)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Centring procurement on a standalone editor (Claude Code / Codex CLI / Cursor) is now a generation behind.&lt;/p&gt;

&lt;h2&gt;
  
  
  ✦ Summary
&lt;/h2&gt;

&lt;p&gt;Claude Code / Codex / Grok Build are specialist workers. Antigravity 2.0 / Hermes / Copilot Studio / Workspace Studio are Agent OSes. &lt;strong&gt;A "VS" framing across these layers no longer holds&lt;/strong&gt;. The conversation must shift to within-layer comparison and across-layer composition.&lt;/p&gt;

&lt;p&gt;The axes that count are &lt;strong&gt;harness design / permission boundaries / context / scheduled execution / human review&lt;/strong&gt;. "Which model is smarter" no longer covers it.&lt;/p&gt;

&lt;p&gt;For individuals: pair Hermes with Antigravity. For enterprises: use Copilot Studio / Workspace Studio for the business context, Antigravity for the dev context, and a Hermes-class OSS for the personal context. &lt;strong&gt;Productivity now lives in Agent OS design, not inside the editor.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Full canonical version (with interactive figures):&lt;br&gt;
&lt;a href="https://okikusan-public.pages.dev/antigravity-agent-os.en" rel="noopener noreferrer"&gt;https://okikusan-public.pages.dev/antigravity-agent-os.en&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Related posts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://okikusan-public.pages.dev/context-is-the-gap.en" rel="noopener noreferrer"&gt;"Stop building AI that replays yesterday's spec" — context is the gap between spec and source of truth&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://okikusan-public.pages.dev/longtail-tacit-agent.en" rel="noopener noreferrer"&gt;AI agents stepping into the long-tail × tacit-thought territory that code can't cover&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://okikusan-public.pages.dev/hermes-agent-second-brain-engine.en" rel="noopener noreferrer"&gt;Hermes Agent — the execution engine for the second brain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://okikusan-public.pages.dev/ai-tasks-not-jobs.en" rel="noopener noreferrer"&gt;Stop talking about "which jobs AI replaces" — look at the tasks inside each job&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>antigravity</category>
      <category>gemini</category>
      <category>agentos</category>
    </item>
    <item>
      <title>Don't build an AI that replays yesterday's spec — the gap between spec and source of truth is the real context</title>
      <dc:creator>OKIKUSAN-PUBLIC</dc:creator>
      <pubDate>Tue, 19 May 2026 22:45:49 +0000</pubDate>
      <link>https://forem.com/okikusan-public/dont-build-an-ai-that-replays-yesterdays-spec-the-gap-between-spec-and-source-of-truth-is-the-9m2</link>
      <guid>https://forem.com/okikusan-public/dont-build-an-ai-that-replays-yesterdays-spec-the-gap-between-spec-and-source-of-truth-is-the-9m2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📌 &lt;strong&gt;The full version (with interactive SVG figures, the drift curve, the five-whys hub, the document-vs-context split, and the Harness concentric layers) is hosted on my blog:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://okikusan-public.pages.dev/context-is-the-gap.en" rel="noopener noreferrer"&gt;https://okikusan-public.pages.dev/context-is-the-gap.en&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This dev.to post is the condensed version. The visualisations live on the original.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;More and more often, an AI agent's accuracy is decided &lt;strong&gt;by its context, not its prompting&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But "context" here is not a polished spec. What really moves the needle is &lt;strong&gt;the gap between the spec and the source of truth&lt;/strong&gt;, and the &lt;strong&gt;reasons behind the drift&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;An AI fed only the spec replays "past truth." Feed it the drift reasons too, and it approaches "today's truth." This post lays out the blind spot of Spec-Driven Development (SDD) and the real core of Harness Engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The frontier of AI-agent accuracy has shifted: &lt;strong&gt;model → prompt → context&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If you mistake "context" for a polished spec, the AI just &lt;strong&gt;replays "past truth"&lt;/strong&gt; — specs drift further from the Source of Truth (running code, ops, field judgement) the longer time passes&lt;/li&gt;
&lt;li&gt;What actually works is the &lt;strong&gt;reasons for the drift&lt;/strong&gt;. Five whys — why the spec was changed, why an exception was allowed, why the implementation compromised, why the issue went the way it did, why the review came out that way — decide the quality of the AI's output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documents are polished; context is accumulated.&lt;/strong&gt; Put the spec at the core of the Harness, and layer the drift reasons around it&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Spec vs Source of Truth — the gap is inevitable
&lt;/h2&gt;

&lt;p&gt;The spec describes what &lt;strong&gt;should be&lt;/strong&gt;. A snapshot of agreement at a moment, internally coherent, neatly polished.&lt;/p&gt;

&lt;p&gt;As implementation and operations evolve, the actual "truth" drifts elsewhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The running code&lt;/strong&gt; — hard-coded values, exception handlers, commented-out branches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The DB schema and the live data&lt;/strong&gt; — migration history, unexpected records, exceptional values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The actual API behaviour&lt;/strong&gt; — undocumented responses, unofficial endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer-side operating decisions&lt;/strong&gt; — approval routes never written down, tacit exceptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Field judgement&lt;/strong&gt; — choices an operator made on the spot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the &lt;strong&gt;Source of Truth (SoT)&lt;/strong&gt;. The spec inevitably drifts away from the SoT over time. This is not laziness — it's structural.&lt;/p&gt;

&lt;p&gt;The problem is not that the gap exists. It's that &lt;strong&gt;the gap is never explained&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdzkkjwyjpggkccvlm9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdzkkjwyjpggkccvlm9n.png" alt="Spec vs Source of Truth: the gap is the context" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  An AI fed only the spec replays "past truth"
&lt;/h2&gt;

&lt;p&gt;Typical failures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The spec says X is correct, but the code shows Y." → The AI trusts the spec, returns X, and &lt;strong&gt;drifts from reality&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;"The spec has no exception handling, so edge cases can be ignored." → &lt;strong&gt;Operationally impossible — a misjudgement&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;"I implemented per the latest API docs." → &lt;strong&gt;The unofficial operating rules get missed&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not the AI's fault. &lt;strong&gt;The context you fed it is frozen at a point in time&lt;/strong&gt;, and the AI is faithful to that point. The cleaner the spec, the more confidently the AI quotes "past truth."&lt;/p&gt;

&lt;p&gt;Reverse-engineering alone is not enough either. Code reveals "&lt;strong&gt;what is implemented and how&lt;/strong&gt;," but never "&lt;strong&gt;why it became that way&lt;/strong&gt;."&lt;/p&gt;

&lt;h2&gt;
  
  
  Five whys to accumulate — that's strong context
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrvtqicby0qebl9gkq4a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrvtqicby0qebl9gkq4a.png" alt="Five whys to accumulate as context" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;What to keep&lt;/th&gt;
&lt;th&gt;Where it lives&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;01&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Why was the spec changed?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Change log / meeting notes / Slack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;02&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Why was the exception allowed?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ops decision log / case-by-case memos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;03&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Why was the implementation compromised?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code comments / PR comments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Why was the issue argued this way?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Issues / discussion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;05&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Why did the review come out this way?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PR review comments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Keeping these "whys" is exactly the &lt;strong&gt;Externalisation&lt;/strong&gt; step in Nonaka's SECI model. The twist: you're &lt;strong&gt;externalising the process, not the conclusion&lt;/strong&gt;. That's how judgement patterns become reproducible in other contexts.&lt;/p&gt;
&lt;/blockquote&gt;
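
&lt;p&gt;One way to keep the five whys from the table above is as append-only records, so you can see at a glance which kinds of reasoning you are actually capturing. This is only a sketch under my own assumptions: the &lt;code&gt;Why&lt;/code&gt; and &lt;code&gt;DriftNote&lt;/code&gt; names, the fields, and the sample notes are all invented for illustration.&lt;/p&gt;

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Why(Enum):
    """The five whys from the table above."""
    SPEC_CHANGE = "why the spec was changed"
    EXCEPTION = "why the exception was allowed"
    COMPROMISE = "why the implementation was compromised"
    ISSUE = "why the issue was argued this way"
    REVIEW = "why the review came out this way"


@dataclass(frozen=True)
class DriftNote:
    """One accumulated reason; frozen so it is never polished after the fact."""
    why: Why
    source: str   # e.g. "PR comment", "Slack", "ops decision log"
    date: str     # ISO date, so plain string sorting is chronological
    text: str


def coverage(notes):
    """Count notes per why, so missing categories show up as zero."""
    counts = Counter(note.why for note in notes)
    return {why: counts.get(why, 0) for why in Why}


# Invented sample notes; the second and third deliberately overlap.
notes = [
    DriftNote(Why.SPEC_CHANGE, "meeting notes", "2026-03-02",
              "Refund window went from 14 to 30 days after a client escalation."),
    DriftNote(Why.COMPROMISE, "PR comment", "2026-03-09",
              "Hard-coded 30 for now; config service is not ready."),
    DriftNote(Why.COMPROMISE, "code comment", "2026-04-01",
              "Still hard-coded; config migration postponed."),
]
for why, count in coverage(notes).items():
    print(f"{count}  {why.value}")
```

&lt;p&gt;Nothing here merges or deduplicates the two "compromise" notes. Keeping both is the point: the record preserves the process, not just the latest conclusion.&lt;/p&gt;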

&lt;h2&gt;
  
  
  Documents are polished; context is accumulated
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Documents&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Humans / clients&lt;/td&gt;
&lt;td&gt;AI agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Nature&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Coherence, consistency, polish&lt;/td&gt;
&lt;td&gt;Judgement material, contradictions, wobbles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Proposals / final specs / articles / manuals&lt;/td&gt;
&lt;td&gt;Issues / PR reviews / ops notes / failure logs / rough notes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Verb&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Polish&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Accumulate&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Tolerating contradiction is the core.&lt;/strong&gt; If you treat context as a "thinking process," contradictions are natural. Human judgement wobbles constantly; organisational decisions get overwritten. &lt;strong&gt;Whether you can keep that without sanding it down&lt;/strong&gt; decides whether your AI agent can reproduce "&lt;strong&gt;your kind of judgement&lt;/strong&gt;."&lt;/p&gt;

&lt;h2&gt;
  
  
  Spec at the core of the Harness; drift reasons on the outer rings
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnn9b3u6qnykxy7htpb83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnn9b3u6qnykxy7htpb83.png" alt="Harness layers: spec at the core, drift reasons outside" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent = Model + Harness&lt;/strong&gt; (Karpathy framing). SDD alone is not enough — you need to &lt;strong&gt;design the SDD outer rings&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;"Issue Driven Development (IDD)" pairs well with this. SDD = the spec is the truth. IDD = the drift reasons are the truth. Let them coexist.&lt;/p&gt;
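
&lt;p&gt;The "spec at the core, drift reasons on the outer rings" layering can be sketched as a small context builder. This is an illustration, not a prescribed implementation: &lt;code&gt;build_context&lt;/code&gt;, the tuple layout, the sample spec, and the closing "prefer the most recent drift reason" instruction are all assumptions of mine.&lt;/p&gt;

```python
def build_context(spec, drift_notes):
    """Assemble agent context: spec at the core, drift reasons around it.

    drift_notes is a list of (iso_date, source, text) tuples. Nothing is
    merged or reconciled; contradictory notes are all kept, newest first.
    """
    lines = ["## Spec (what should be)", spec.strip(), ""]
    lines.append("## Drift reasons (why reality differs, newest first)")
    # ISO dates sort correctly as strings, so reverse order is newest first.
    for date, source, text in sorted(drift_notes, reverse=True):
        lines.append(f"- [{date}] ({source}) {text}")
    lines.append("")
    lines.append("When the spec and a drift reason disagree, "
                 "prefer the most recent drift reason and say so.")
    return "\n".join(lines)


# Invented example: the spec says 14 days, operations say otherwise.
spec = "Refunds are accepted within 14 days of purchase."
drift = [
    ("2026-03-02", "meeting notes", "Window extended to 30 days for enterprise plans."),
    ("2026-04-11", "ops log", "Support may approve up to 45 days case by case."),
]
print(build_context(spec, drift))
```

&lt;p&gt;The spec stays in the prompt untouched, which is the SDD half. The drift notes sit around it unreconciled, which is the IDD half.&lt;/p&gt;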

&lt;h2&gt;
  
  
  Good AI = how much it lowers verification load
&lt;/h2&gt;

&lt;p&gt;In May 2026, with the Linux kernel 7.1 RC4 release, Linus Torvalds publicly declared the security mailing list &lt;strong&gt;"almost entirely unmanageable"&lt;/strong&gt; due to the flood of AI-generated vulnerability reports&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. A stream of 2-3 reports per week two years ago has ballooned to &lt;strong&gt;5-10 reports per day&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Linus himself does &lt;strong&gt;not&lt;/strong&gt; dismiss AI in security work. He asks researchers to "&lt;strong&gt;understand the code and contribute a patch&lt;/strong&gt;" rather than just send an alert. That is AI-agent operations in miniature. &lt;strong&gt;The value of an AI is not output volume; it is how much it lowers the human's verification, correction, and review load.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A spec-only AI mass-produces plausible-looking output. It reads right, but it has drifted from the SoT, so a human has to check every line before using it: the textbook case of &lt;strong&gt;"Slop"&lt;/strong&gt; (low-quality, generic, templated AI output). Only an AI that is also fed the drift reasons actually lowers human verification load.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Conclusion — accumulate, don't polish
&lt;/h2&gt;

&lt;p&gt;What sharpens an AI agent is no longer the model or the prompt. It is whether you can accumulate &lt;strong&gt;the gap between spec and Source of Truth, and the reasons for the drift&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Polish documents&lt;/strong&gt; (for humans / clients)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accumulate context&lt;/strong&gt; (for AI agents — keep the contradictions and wobbles)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Spec at the core of the Harness; layer "why it diverged" on the outside&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many organisations pour energy into "polishing the spec" because of the SDD boom. But the real differentiation lies elsewhere: &lt;strong&gt;not in polishing the spec, but in accumulating the gap with the SoT&lt;/strong&gt;. To stop building AIs that replay "past truth," stop polishing — start accumulating.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;📌 &lt;strong&gt;Full version with interactive SVGs:&lt;/strong&gt; &lt;a href="https://okikusan-public.pages.dev/context-is-the-gap.en" rel="noopener noreferrer"&gt;https://okikusan-public.pages.dev/context-is-the-gap.en&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FIG.0 — THE GAP (spec vs SoT drift curve)&lt;/li&gt;
&lt;li&gt;FIG.1 — SPEC-ONLY VS SPEC + GAP (two AIs)&lt;/li&gt;
&lt;li&gt;FIG.2 — FIVE WHYS (the accumulating hub)&lt;/li&gt;
&lt;li&gt;FIG.3 — DOCUMENTS VS CONTEXT (polish vs accumulate)&lt;/li&gt;
&lt;li&gt;FIG.4 — HARNESS LAYERS (spec at the core, drift reasons on the outside)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this resonates, &lt;strong&gt;a 🦄 / ❤️ / 💬 helps a lot.&lt;/strong&gt; Feedback welcome.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Related posts on my blog
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://okikusan-public.pages.dev/longtail-tacit-agent.en" rel="noopener noreferrer"&gt;AI agents enter the territory code can't write — long-tail × tacit knowledge × tacit thoughts&lt;/a&gt; — the philosophical premise of this post&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://okikusan-public.pages.dev/hermes-agent-second-brain-engine.en" rel="noopener noreferrer"&gt;Hermes Agent — execution engine for your Second Brain&lt;/a&gt; — a concrete Harness execution base&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://okikusan-public.pages.dev/ai-tasks-not-jobs.en" rel="noopener noreferrer"&gt;"Tasks, not jobs" — reading Microsoft Suleyman's 18-month forecast&lt;/a&gt; — Applied Engineer / FDE&lt;/li&gt;
&lt;/ul&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;The Register (2026-05-18): &lt;a href="https://www.theregister.com/security/2026/05/18/linus-torvalds-says-ai-powered-bug-hunters-have-made-linux-security-mailing-list-almost-entirely-unmanageable/" rel="noopener noreferrer"&gt;Linus Torvalds says AI-powered bug hunters have made Linux security mailing list 'almost entirely unmanageable'&lt;/a&gt; / Tom's Hardware (2026-05-18): &lt;a href="https://www.tomshardware.com/software/linux/linus-torvalds-says-ai-bug-reports-have-made-the-linux-security-mailing-list-almost-entirely-unmanageable" rel="noopener noreferrer"&gt;Linus Torvalds says flood of duplicate AI-generated vulnerability reports...&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
