Forem: OKIKUSAN-PUBLIC

Turning Obsidian into AI's Own Memory — Local Cognitive OS with Hindsight and Hermes

OKIKUSAN-PUBLIC — Sat, 23 May 2026 04:09:26 +0000

📌 The full version (with FIG.5 interactive Hindsight loop simulator and 5 SVG figures) lives on my blog:

👉 https://okikusan-public.dev/obsidian-as-ai-memory.en

This dev.to post is the condensed version. The interactive simulator and the layered SVG visuals are only on the canonical page.

Introduction

A developer's offhand comment changed everything.

"I want Obsidian to be not just a note-taking tool, but the AI's long-term memory device itself."

That single sentence crystallized a fully local stack: Ollama + Hindsight + PostgreSQL + Obsidian. It is more than a combination of tools. It is a new layer in which the infrastructure audits itself, and humans and AI extend each other's cognition.

Four prerequisite terms

This article assumes the following background. If any of these are unfamiliar, skim here first.

Hermes Agent — Nous Research's OSS autonomous AI agent. Runs resident as CLI / gateway and handles dialogue with the user.
daily-chats/ — A folder inside the Obsidian Vault. At the end of every Hermes session, that day's raw conversation log (questions, responses, tangents, hesitations, code snippets, all unedited) is automatically exported here as markdown. This is the entry point of AI's long-term memory.
knowledge/ and MOC — A separate Vault folder. Summaries and insights extracted from daily-chats/ get organized into themed MOCs (Map of Content — index notes that bundle related notes). The destination layer.
Hindsight — Hermes's memory mechanism. Periodically scans daily-chats/, summarizes via Ollama, persists into PostgreSQL, and accumulates summaries of summaries in a self-referential engine. The protagonist of this article.
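Here is a minimal sketch of the session-end export into daily-chats/. It assumes one markdown file per day named `YYYY-MM-DD.md`, with one block appended per session. The function name `export_session`, the file naming, and the heading format are all hypothetical. Hermes's actual exporter may lay things out differently.

```python
from datetime import date
from pathlib import Path


def export_session(vault: Path, turns: list[tuple[str, str]], day: date) -> Path:
    """Append one session's raw turns, unedited, to daily-chats/<YYYY-MM-DD>.md.

    Hypothetical layout: one file per day, one "## Session N" block per session.
    """
    folder = vault / "daily-chats"
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{day.isoformat()}.md"

    # Number the new session after any sessions already exported today.
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    session_no = 1 + sum(line.startswith("## Session ") for line in existing.splitlines())

    lines = [f"## Session {session_no}", ""]
    for speaker, text in turns:
        # Verbatim on purpose: hesitations and tangents are the raw material.
        lines += [f"**{speaker}:** {text}", ""]
    with path.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path
```

The design relies on appending only. Nothing already written to daily-chats/ is edited, which keeps the "unedited" property the folder depends on.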

TL;DR

Real-world benchmark with Gemma3 on Ollama: 23.4 tokens/sec — not "tolerable," but "daily usable."
Hindsight repeatedly summarizes past summaries, forming a self-referential feedback loop. Infrastructure self-auditing begins to run entirely locally.
Obsidian's daily-chats/ gets redefined as AI's primary persistent memory device.
The loop that gradually converts "hesitation, hypothesis, discomfort" into explicit knowledge is starting to function as a cognitive OS.

What 23.4 tokens/sec actually means for "being local"

Gemma3 on Ollama benchmarks at 23.4 tokens/sec. In an era of cloud dependency, this is not "tolerable speed" but "daily usable speed."

What matters is not the speed itself but the fact that every process completes locally. Raw conversation logs accumulated in daily-chats/ are summarized by the local LLM, structured by Hindsight, and persisted into PostgreSQL. No external API is involved at any step.
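Hindsight's internals are not shown in this post. The sketch below illustrates only the "no external API" property. It sends a note to Ollama's local REST endpoint (`/api/generate` on the default port 11434). It then derives tokens/sec from the `eval_count` and `eval_duration` fields that Ollama returns (the duration is in nanoseconds). The prompt wording and the function names are illustrative assumptions, not Hindsight's code. Calling `summarize_locally` requires a running local Ollama with `gemma3` pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput from Ollama's response fields (duration in nanoseconds)."""
    return eval_count / eval_duration_ns * 1e9


def summarize_locally(text: str, model: str = "gemma3") -> tuple[str, float]:
    """Summarize one note on this machine; the request never leaves localhost."""
    payload = json.dumps({
        "model": model,
        "prompt": f"Summarize the following conversation log:\n\n{text}",
        "stream": False,  # ask for a single JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["response"], tokens_per_second(body["eval_count"], body["eval_duration"])
```

For example, 234 generated tokens over 10 seconds (10,000,000,000 ns) works out to the 23.4 tokens/sec quoted above.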

The entire history of thought stays closed inside one's own machine. This is more than privacy. It's an experiment in how deeply AI can understand context under the constraint "never leak externally."

Hindsight's self-referential infrastructure

Conventional RAG stops at "retrieve and answer." Hindsight goes further. It repeatedly summarizes its own past summaries, so each pass adds a higher-level layer of metadata on top of the previous one. Over time, the vector and metadata layers stored in PostgreSQL gradually self-organize.
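"Summaries of summaries" resembles hierarchical (recursive) summarization. Here is a minimal sketch, assuming a fixed fan-in and a pluggable summarizer. In a real setup, `summarize` would call the local LLM. `toy_summarize` is a deterministic stand-in so that the recursion itself can be checked. This is not Hindsight's actual algorithm, which the post does not specify, and persistence to PostgreSQL is omitted.

```python
from typing import Callable

Summarizer = Callable[[list[str]], str]


def summary_levels(notes: list[str], summarize: Summarizer, fan_in: int = 3) -> list[list[str]]:
    """Level 0 holds the raw notes; each next level summarizes groups of
    `fan_in` items from the level below, until one top summary remains."""
    if not notes:
        return []
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    levels = [list(notes)]
    while len(levels[-1]) > 1:
        below = levels[-1]
        groups = [below[i:i + fan_in] for i in range(0, len(below), fan_in)]
        levels.append([summarize(group) for group in groups])
    return levels


def toy_summarize(items: list[str]) -> str:
    # Stand-in for an LLM call: keep the first word of each item.
    return " ".join(item.split()[0] for item in items if item.split())
```

Keeping every level, rather than only the top summary, is what would allow a later pass to compare a high-level summary against the raw notes it came from.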

This is "infrastructure self-auditing":

Does today's daily-chat contradict last week's summary?
How have the themes the user repeatedly touches evolved over the long term?
To what extent do AI-generated summaries distort the original context?

The foundation is in place for the AI itself to check these questions periodically, without an external human reviewer. The system evaluates its own output against its history and proposes corrections, and that loop already runs entirely locally.
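Here is a crude sketch of the second check above ("How have the themes evolved?"). It compares bag-of-words vectors of last week's summary and today's chat using cosine similarity, and it flags the pair when the overlap falls below a threshold. The threshold and the function names are assumptions. Lexical overlap is only a proxy for drift: real contradiction detection would need embeddings or an LLM judgement, and this is not how Hindsight is documented to work.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def audit_drift(previous_summary: str, today: str, threshold: float = 0.3) -> tuple[float, bool]:
    """Return (similarity, flagged); low overlap flags the pair for a closer look."""
    score = cosine(bag_of_words(previous_summary), bag_of_words(today))
    return score, score < threshold
```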

FIG.2 — User ⇄ Hermes dialogue flows into daily-chats/, summarized via Ollama, promoted into MOC, and returns to Hermes as context.

"Making Obsidian the memory" — the ultimate human-AI co-creation

The developer didn't stop at treating Obsidian as a "second brain." They redefined it as "the primary memory device for AI." daily-chats/ is no longer a graveyard of miscellaneous logs. Every piece of text accumulated there is structured through Hindsight and functions as AI's long-term memory.

This is not a relationship where humans ask AI to "remember this for me."
It's a relationship where humans design the environment itself: "I'll write it here, so you go ahead and structure it on your own."

AI only becomes intelligent within the context it is given. The core of this pipeline lies in the reversed idea — humans prepare the field that transforms context into "persistent, searchable, and self-referential memory."

The loop that turns tacit into explicit

What gets written in daily-chats/ is not polished ideas, but "hesitation, hypotheses, discomfort, wavering judgment." Hindsight gradually converts that ambiguity into explicit knowledge. Eventually, Obsidian's knowledge/ layer becomes not just a collection of MOCs, but a "knowledge graph that AI itself is cultivating."
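Obsidian MOCs are ordinary markdown notes built from `[[wikilinks]]`, and the links between notes form the graph. The following is a minimal sketch that promotes tagged summaries into a themed MOC and then reads the resulting link graph back. The theme-tagging scheme and the function names are assumptions made for illustration. They are not part of Hindsight.

```python
import re
from collections import defaultdict

# Captures the target of [[Note]], [[Note|alias]] and [[Note#Heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def build_moc(theme: str, notes: dict[str, set[str]]) -> str:
    """Render an MOC linking every note tagged with `theme`.
    `notes` maps note title -> set of themes (hypothetical tagging scheme)."""
    titles = sorted(title for title, themes in notes.items() if theme in themes)
    lines = [f"# MOC: {theme}", ""] + [f"- [[{title}]]" for title in titles]
    return "\n".join(lines) + "\n"


def link_graph(vault: dict[str, str]) -> dict[str, set[str]]:
    """Outgoing wikilinks per note: the edges of the Vault's knowledge graph."""
    graph: defaultdict[str, set[str]] = defaultdict(set)
    for title, body in vault.items():
        for target in WIKILINK.findall(body):
            graph[title].add(target.strip())
    return dict(graph)
```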

When that happens, what does the developer witness?

The moment a discomfort felt in the past is rediscovered by AI
The moment a rough note suddenly gains meaning in a different context
The moment the infrastructure quietly points out: "This theme contradicts what it was three months ago"

Everything local, everything persistent, everything self-referential: that state may already be within reach.

FIG.3 — The left side (doubt, hypothesis, discomfort) gets passed into daily-chats/. Hindsight accumulates summaries of summaries, growing the right-side knowledge/ MOCs over time.

This is not about tools — it's about a cognitive OS

Ollama + Hindsight + PostgreSQL + Obsidian.

What makes this combination special is not that any one piece is superior, but that the circuit converting the fluidity of human thought into a form AI can handle has finally closed.

The moment the developer decided to "make Obsidian the memory itself," the technology stopped being a mere means. Humans and AI have begun, on their own and without depending on anyone else, to build an OS for growing wiser together while compensating for each other's weaknesses.

Every layer is in your own hands. That's why this deserves the name "cognitive OS."

FIG.4 — Human → Hermes → Hindsight → Obsidian + PostgreSQL → Ollama → local machine. Every layer in your own hands.

This loop has only just begun.

📌 The full version with the FIG.5 interactive Hindsight loop simulator (click to iterate the recursion 1 → 2 → 3 and watch the knowledge graph grow) is on the original:

👉 https://okikusan-public.dev/obsidian-as-ai-memory.en

How are you connecting your notes to AI memory? Would love to hear how the "summary of summary" idea lands in your own setup. 🦄 💬

From editor to agent management — Google Antigravity 2.0 marks the arrival of the Agent OS

OKIKUSAN-PUBLIC — Wed, 20 May 2026 12:44:32 +0000

This piece is mirrored from my blog. Canonical:
https://okikusan-public.pages.dev/antigravity-agent-os.en

Antigravity 2.0 is not an AI-IDE update. It is the moment the centre of gravity in developer experience shifts from "the editor" to "agent management." Taken together, the Desktop app, CLI, SDK, and integration funnel make Antigravity look less like a "specialist worker" such as Claude Code / Codex / Grok Build and more like an Agent OS.

The old axis — "which model is smarter" — is no longer enough. Harness design, permission boundaries, context, scheduled execution, and human review — these five decide developer productivity now. The next battlefield of AI coding, laid out.

▍ SOURCES

▍ TERMS — definitions and premises

Agent harness — the runtime that wraps a model. By Karpathy's definition, Agent = Model + Harness. Concretely it binds:

System prompts / role definitions
Tools (file ops, Bash, web fetch, MCP external calls, etc.)
Memory / state
Permissions and guardrails
Feedback loop (retries, verification, sub-agent spawn)

Real examples:

Claude Code's harness = CLI agent loop + tool set + project permissions
Cursor's harness = editor integration + Apply machinery + codebase index
Antigravity's harness = local app server + runtime + Skill pack attachment

Same model + different harness → drastically different behaviour. The harness is the part that determines how the agent actually acts.
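Here is a minimal sketch of "Agent = Model + Harness." The model is any prompt-to-text callable. Everything around it is harness: the system prompt, the tools, the memory, a permission check, and a verify-and-retry loop. The `TOOL:<name>:<arg>` convention and all class and parameter names are hypothetical. This is not how Claude Code, Cursor, or Antigravity implement their harnesses.

```python
from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], str]  # prompt -> completion; any LLM client could sit here


@dataclass
class Harness:
    system_prompt: str
    tools: dict[str, Callable[[str], str]]
    allowed_tools: set[str]                          # permissions / guardrails
    memory: list[str] = field(default_factory=list)  # memory / state
    max_retries: int = 2                             # feedback loop

    def run(self, model: Model, task: str, verify: Callable[[str], bool]) -> str:
        for attempt in range(self.max_retries + 1):
            # Earlier attempts are fed back in, so a retry sees what went wrong.
            prompt = "\n".join([self.system_prompt, *self.memory, task])
            output = model(prompt)
            # Hypothetical convention for this sketch: "TOOL:<name>:<arg>" requests a tool.
            if output.startswith("TOOL:"):
                _, name, arg = output.split(":", 2)
                if name not in self.allowed_tools:
                    raise PermissionError(f"tool {name!r} is outside this agent's scope")
                output = self.tools[name](arg)
            self.memory.append(f"attempt {attempt}: {output}")
            if verify(output):
                return output
        raise RuntimeError("verification failed after all retries")
```

If you swap `allowed_tools` or `max_retries` while keeping the same model callable, you get different behaviour. That is the point of the sentence above.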

Agent OS layer / Specialist worker layer

Agent OS layer = shares one harness across multiple UIs and unifies permissions, scheduling, and agent orchestration (Antigravity / Hermes / Copilot Studio)
Specialist worker layer = invoked to do the work (Claude Code / Codex CLI / Grok Build)

"Agent OS" is not Google's official term — community framing and this article's editorial lens.

Subagent — a child agent spawned dynamically by a parent. Antigravity 2.0's launch demo built an OS with 93 parallel subagents.

Skill — pluggable capability pack you attach to an agent. Android Skills / Firebase Skills add a specific domain's APIs and conventions to the harness.

App server — shared local backend inside the Antigravity install. Both Desktop UI and CLI binary call this same app server.

The five comparison axes — (1) harness design / (2) permission boundaries / (3) context / (4) scheduled execution / (5) human review.

TL;DR

Antigravity 2.0 is Desktop / CLI / SDK / AI Studio×Android×Firebase integration — not a bundle of scattered features, but one agent harness shared across four UIs.
Claude Code / Codex / Grok Build sit at the specialist worker layer; Antigravity 2.0 sits at the Agent OS layer binding them. A "VS" framing collapses across layers.
The axes that matter: (1) which harness / (2) what permission boundary / (3) what context / (4) what scheduled execution / (5) what human review.
For individuals: pair Hermes (OSS) with Antigravity. For enterprises: choose across Copilot Studio / Workspace Studio / Antigravity by context. Editor-only comparison is now a generation behind.

§ 01 SHIFT — from editor to agent management

Two years of AI coding tools have been narrated through the editor: Copilot, Cursor, Claude Code, Codex CLI, Grok Build. They all evolved on the premise of "AI writes code inside the editor."

Antigravity 2.0 breaks that frame. This is not an AI-IDE update. It assembles Desktop / CLI / SDK / AI Studio integration all at once, and what it produces is a platform for managing agents — a single agent harness shared across four UIs.

§ 02 PILLARS — Desktop / CLI / SDK / integration funnel

2-1 Desktop app — the command center

A command bridge for running many agents in parallel. Dynamic subagents (spawn and retire children on the fly), scheduled tasks (cron-style runs), and per-project permission scopes. The feel shifts from "one task in one editor" to "many tasks running at once, all in view."

2-2 Antigravity CLI — different UI, same harness

Successor to Gemini CLI. A lightweight UI for terminal people, but the key is that it shares the same agent harness as Desktop. The CLI isn't a separate product — it's a different interface to the same base.

▍ "Sharing the same harness" — what it actually means
Not two competing apps. A single local install bundles Desktop UI / CLI binary / a shared app server (the agent harness itself). Per @karthickdotxyz: "Same tools and app server as Antigravity 2.0."

No need to run both at once — Desktop or CLI, either path completes

Configs, agent definitions, permissions, scheduled tasks are all shared — a job composed in Desktop can be invoked from CLI as-is

Natural split: CI / headless server work via CLI, interactive development via Desktop
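Here is a structural sketch of "different UI, same harness." Two thin front-ends hold no state of their own, and both talk to one app server. A job composed through one front-end can therefore be invoked through the other. The class names are hypothetical. In the real product, Desktop and CLI are separate processes talking to a local server, which an in-process object only approximates.

```python
from dataclasses import dataclass, field


@dataclass
class AppServer:
    """Single local backend: job definitions and run history live here, not in a UI."""
    jobs: dict[str, list[str]] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def define_job(self, name: str, steps: list[str]) -> None:
        self.jobs[name] = list(steps)

    def run_job(self, name: str, via: str) -> list[str]:
        self.log.append(f"{name} via {via}")
        return list(self.jobs[name])


class DesktopUI:
    """Interactive front-end: composes jobs on the shared server."""

    def __init__(self, server: AppServer) -> None:
        self.server = server

    def compose(self, name: str, steps: list[str]) -> None:
        self.server.define_job(name, steps)


class CLI:
    """Headless front-end: runs whatever the shared server already knows."""

    def __init__(self, server: AppServer) -> None:
        self.server = server

    def invoke(self, name: str) -> list[str]:
        return self.server.run_job(name, via="cli")
```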

2-3 SDK — embed the harness into your own product

Google's agent harness is now something you embed into your own workflow or product. This stops being "a tool that makes AI write code" and starts being "a platform for building and operating AI agents." SDK code runs on your own PC, your servers, your CI runners — Google doesn't host the runtime; it lives inside your process. Antigravity could become a component that runs inside other companies' products, not just Google's IDE.

▍ CLI vs SDK — what's the actual difference?
Both run on local machines or on servers. The real distinction is the primary use case each is designed for:

CLI = an interactive front designed for a human (or shell script) to drive an agent directly

SDK = a library designed for your program to drive the agent via function calls

Strictly, you can also call the CLI from a program by shelling out. But shelling out comes with costs: (a) process startup overhead / (b) brittle text-output parsing / (c) no types / (d) streaming and structured events are awkward. The SDK assumes that use case from the start. Same shape as AWS CLI vs boto3.
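The costs of shelling out can be shown without inventing Antigravity's real flags. The sketch below pairs a stand-in "CLI" (a Python one-liner run as a subprocess) with a plain function that computes the same thing. Both paths return equal results. The subprocess path, however, pays process startup and depends on parsing printed text. `Result`, `run_task`, and the output format are hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Result:
    status: str
    files_changed: int


def run_task(task: str) -> Result:
    """SDK-style entry point: typed input, typed output, no process boundary."""
    return Result(status="ok", files_changed=len(task.split()))


# Stand-in "CLI": the same logic, reachable only as a separate process printing text.
FAKE_CLI = (
    "import sys; t = sys.argv[1]; "
    "print('status: ok'); print(f'files changed: {len(t.split())}')"
)


def run_task_via_cli(task: str) -> Result:
    """Shelling out: pay process startup, then parse human-oriented text."""
    proc = subprocess.run([sys.executable, "-c", FAKE_CLI, task],
                          capture_output=True, text=True, check=True)
    fields = dict(line.split(": ", 1) for line in proc.stdout.splitlines())
    # Brittle: any change to the printed wording breaks this parsing.
    return Result(status=fields["status"], files_changed=int(fields["files changed"]))
```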

2-4 AI Studio × Android × Firebase integration

Less "three products wired together at the UI layer," more "Antigravity sits in the middle as the harness, with AI Studio (entry) and Android / Firebase (exit) bolted on via a shared harness and Skills."

AI Studio → Antigravity ("Export to Antigravity"): AI Studio Build now runs on the same agent harness as Antigravity. A dedicated Export to Antigravity button hands off the full agent conversation (chat history, configuration, generated code) into the local Antigravity environment.
Antigravity → Android: Equip the agent with the official Android Skills — Android SDK / Gradle / manifests become part of the agent's context. Going further, the studio command in Android CLI 1.0 lets the agent connect to a running Android Studio instance and borrow its deep codebase understanding.
Antigravity → Firebase: Firebase Skills teach the agent Firestore / Functions / Hosting / Auth conventions, configuration and deployment included.

So Google's "vertical integration" play is re-engineered not at the UI layer but at the harness and Skill (attachable capability packs) layer.
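Here is a minimal sketch of integration at the Skill layer rather than the UI layer. A Skill bundles domain context and tools. Attaching a Skill to the harness widens both the agent's context and its tool set. The `Skill` and `AgentHarness` shapes and the tool names `gradle_build` and `firebase_deploy` are hypothetical. They are not the real Android Skills or Firebase Skills format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    """A pluggable capability pack: domain conventions plus tools (hypothetical shape)."""
    name: str
    context: str
    tools: frozenset[str]


@dataclass
class AgentHarness:
    base_context: str
    tools: set[str] = field(default_factory=set)
    skills: list[Skill] = field(default_factory=list)

    def attach(self, skill: Skill) -> None:
        self.skills.append(skill)
        self.tools |= skill.tools

    def context(self) -> str:
        # The agent sees the base prompt plus every attached skill's conventions.
        return "\n\n".join([self.base_context, *(s.context for s in self.skills)])
```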

Official launch video: https://www.youtube.com/watch?v=6C0FjHoN3qE

§ 03 LAYER — not a "specialist worker"

▍ Terminology note on "Agent OS"
The phrase "Agent OS" used throughout this piece is not Google's official terminology. Right after launch, @grok called it "the emerging Agent OS category" and @arsh_goyal framed it as a "centralized Agent Manager." This article borrows that framing to describe a structural pattern: a single harness shared across multiple UIs, with permissions, scheduling, and sub-agent orchestration unified at one layer.

Lining up Antigravity 2.0 with Claude Code / Codex CLI / Grok Build and asking "which one's best" misses the point. They live at different layers.

3-1 Specialist worker layer vs Agent OS layer

	Specialist worker layer	Agent OS layer
Role	Called to do the work	Orchestrates and supervises
Examples	Claude Code / Codex CLI / Grok Build / Cursor Agent	Hermes (OSS) / Microsoft Copilot Studio / Google Antigravity 2.0
Strengths	Instant reasoning, code generation, file ops	Multiple UIs / parallel execution / permissions / shared harness

3-2 "VS" framing breaks across layers

"Antigravity 2.0 vs Claude Code" is a layer violation. As Antigravity expands via the SDK inside other companies' products, the natural composition becomes "Antigravity-on-top, calling Claude Code / Codex CLI / Grok Build as workers." The right peers to compare with are Hermes / Copilot Studio — same Agent OS layer.

▍ Direct comparison with Hermes / Copilot Studio

Hermes = OSS / individual-tilted / multi-model / 22 gateways / Obsidian integration / domain-agnostic (works outside coding too)

Microsoft Copilot Studio = M365 territory / enterprise permissions / Power Platform integration / business-workflow focused

Google Antigravity 2.0 = Google-native / AI Studio × Android × Firebase vertical integration / software-engineering specialised

The scope difference matters: sitting at the same Agent OS layer does not mean the same role. Hermes is a domain-agnostic general harness; Copilot Studio is for business workflows; Antigravity is purpose-built for software-engineering work.
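Here is a minimal sketch of the layering described in 3-2. An Agent OS-style orchestrator dispatches tasks to named workers in parallel, then gates every result through a review step before accepting it. The worker callables are stand-ins for invoking Claude Code / Codex CLI / Grok Build. No real integration is implied, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Worker = Callable[[str], str]


def orchestrate(tasks: dict[str, str], workers: dict[str, Worker],
                review: Callable[[str, str], bool], max_parallel: int = 4) -> dict[str, str]:
    """`tasks` maps task -> worker name. Run the workers in parallel, then keep
    only the results that pass review (the human-or-policy gate)."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {task: pool.submit(workers[name], task) for task, name in tasks.items()}
        results = {task: future.result() for task, future in futures.items()}
    return {task: out for task, out in results.items() if review(task, out)}
```

The orchestrator never generates code itself. It only routes, runs in parallel, and gates, which is why comparing it head-to-head with a worker is a layer violation.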

§ 04 AXIS — renewing the comparison axes

Read §02's four pillars as a structure, together with the agentic IDE's review surface, and each one contains a design choice that the old "model IQ" axis cannot capture:

Desktop's dynamic subagents + scheduled tasks → which harness, when to fire automatically
CLI sharing the app server with Desktop → same harness called from different UIs
SDK running in your own process → what permission boundary, what environment
AI Studio / Android / Firebase Skills → what context to feed the agent
The agentic IDE's review surface → how humans review

So if you unpack the structure of Antigravity 2.0 honestly, the five axes — harness / permission boundary / context / timing / review — surface naturally.

▍ This is my own view — why these 5 axes
The five design axes are not a standard framework published by Google, IDC, Forrester, or anyone else — they are my editorial synthesis of what I think actually matters. My reasoning for picking these specific 5:

harness — with model IQ commoditising, harness design determines actual behaviour

permission boundary — when agents act autonomously, permission scope decides blast radius

context — same model + same IQ produces wildly different output depending on context given

timing — manual / hook / cron agents are different beasts. Antigravity 2.0 making scheduled tasks first-class is evidence this matters

review — human-in-the-loop verification load is the productivity bottleneck

Individual terms have currency across the industry, but bundling these 5 as "the evaluation axes that matter" is my judgement.

Side-by-side, what changes per axis between the editor era and the Agent OS era:

Axis	Editor era (~2025)	Agent OS era (2026 →)
harness	Editor's completion speed / UX	Which tools, memory, permissions, feedback loops you wrap around the model
permission	Largely a non-question — manual control	Autonomous agents → Project / User / Agent-scoped permissions define blast radius
context	Context window size (a "quantity" axis)	What you pull in and hand the agent (a "quality" axis, plus Skill packs)
timing	Completion as the human is typing	Manual / hook / cron / scheduled — async / parallel / 24-7 included
review	Humans read before and after writing code	How far do you trust auto-executed agent output, and where does a human gate it?

Footnote: the old axis "which model is smartest" no longer stands on its own — same model with different harness and context produces wildly different output.

These five decide developer productivity itself. A smarter model inside a sloppy harness just automates slop.
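The permission row in the table above says that scoped permissions define blast radius. Here is a minimal sketch that assumes the project, user, and agent scopes combine by intersection, a common choice for layered ACLs. Under that assumption, an action is allowed only if every scope allows it, so the narrowest scope bounds what an autonomous agent can touch. This is not documented Antigravity behaviour, and the scope and action names are hypothetical.

```python
def effective_permissions(project: set[str], user: set[str], agent: set[str]) -> set[str]:
    """Allowed only if all three scopes allow it: the narrowest scope wins."""
    return project & user & agent


def check(action: str, allowed: set[str]) -> None:
    """Raise before the agent acts, rather than auditing after the damage."""
    if action not in allowed:
        raise PermissionError(f"{action!r} is outside the effective scope")
```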

§ 05 BATTLEFIELD — the next battlefield is not inside the editor

The next battlefield is no longer inside the editor — it is in Agent OS orchestration design.

2023: Prompt engineering (single model calls)
2024: Context assembly (RAG + memory)
2025: AI in the editor (Copilot + inline suggestions)
2026+: Orchestrate agents (Agent OS) — compose / supervise / continuous execution

▍ VOICES — use cases that surfaced in the first 48 hours

Snapshots from X in the 48 hours since launch. The chatter is visibly shifting from "a specialist worker writes code" toward "a fleet of agents gets orchestrated."

@andreasawires: "93 parallel sub-agents, 12 hours, 15K+ model requests, 2.6 billion tokens, under $1K in API credits" — built a full OS from scratch
@andyzhang (Antigravity team): "Antigravity 2.0, a desktop application to manage all of your agents" — official launch post
@mirrokni (Vahab Mirrokni, Google): Antigravity /teamwork agents recreated the AlphaZero paper end-to-end (RL + TPU + Web app)
@SHT4BHARAT: "We are officially out of the chatbot era and deep into production-scale autonomous workflows"
@karthickdotxyz: Antigravity CLI official launch — Go binary, async workflows, "Same tools and app server as Antigravity 2.0"
@neuecc (Yoshifumi Kawai): on choosing, for complex projects, a workflow that keeps Antigravity 2.0 and the IDE separate over Cursor 3
@gptzone_net (translated from Spanish): "Antigravity 2.0 shouldn't be evaluated as a more aggressive autocomplete. It should be evaluated as a change of workflow."
@BeamManP: Gemini 3.5 Flash can now do structural music analysis — same engine that powers Antigravity

§ 06 FIT — how individuals and enterprises choose

▍ This section is opinion
The "if you're an individual, pick X / if you're an org, pick Y" framing in this section is my personal recommendation, not an official guide from Google / Microsoft / Nous Research.

Facts: Hermes = OSS / multi-model / Obsidian integration / domain-agnostic. Antigravity 2.0 = vertically integrated into Google's ecosystem / software-engineering specialised. Microsoft Copilot Studio = M365 territory / business-workflow focused.

Opinion: "Individuals should pair Hermes with Antigravity" is what I think makes sense — others reaching the opposite conclusion ("standardise on Copilot Studio for simpler ops" / "go all-OSS on Hermes for transparency") would be perfectly reasonable.

Individual vs team/org evaluation per the 5 axes of §04:

Axis	Individual use	Team / org use
harness	Each developer installs their preferred harness locally	Standardise one harness org-wide / custom harness embedded via Antigravity SDK in internal products
permission	Read/write on personal Mac / personal repos	Hierarchical Project / User / Role ACL, segregation across internal systems
context	Personal Obsidian / personal git / personal Slack DMs	Internal Confluence / shared GitHub org / company Slack / Linear / Jira / on-call runbooks
timing	Interactive on personal PC or light personal cron	Shared server / CI runner / 24-7 environment — embed agents via SDK
review	Personal self-check	Code review / PR approval flow / audit logs / compliance checks

Footnote: "harness" itself is neutral. Team-readiness depends on the implementation — Claude Code / Hermes / Cursor assume "personal local"; Antigravity 2.0 covers both with Project permissions + SDK + Enterprise framing; Copilot Studio is "team / business" from the start.

▍ For team-shared harness, SDK is the only path
Antigravity's standalone Desktop / CLI structure assumes "a local app server on a personal machine." If you try to share that harness across a team, git-syncing the config doesn't help: each machine still runs its own separate instance.

If you want a team to share one harness, the only realistic path is embedding the SDK in an internal backend / shared server / CI runner, so multiple users hit a single harness instance you operate yourself. Google does not offer a hosted "Managed harness," so "team harness as a Service" has to be built in-house as of today.

6-1 Individuals — Hermes and Antigravity in combination

OSS-leaning individuals still get great value from Hermes. Obsidian integration, multi-model routing, the gateway pack (Telegram / Discord / LINE / Slack) are Hermes-only as of now.

Google-ecosystem individuals get the most from Antigravity. Prototype in AI Studio, hand it to Antigravity Desktop, ship to Firebase — that vertical funnel Hermes can't reproduce.

The practical answer is both at once. Use Hermes to orchestrate the personal tacit-thought pool, and use Antigravity for heavy lifting inside the Google ecosystem.

6-2 Enterprises — three Agent OSes across contexts

For enterprises, the picture is three Agent OSes mapped to three contexts:

Business context = Microsoft Copilot Studio (M365 / expense / calendar / doc review)
Dev context = Google Antigravity 2.0 (coding / Android / Firebase / GCP)
Personal context = Hermes-class OSS (personal tacit-thought pools / Obsidian / custom gateways)

Centring procurement on a standalone editor (Claude Code / Codex CLI / Cursor) is now a generation behind.

✦ Summary

Claude Code / Codex / Grok Build are specialist workers. Antigravity 2.0 / Hermes / Copilot Studio / Workspace Studio are Agent OSes. A "VS" framing across these layers no longer holds. The conversation must shift to within-layer comparison and across-layer composition.

The axes that count are harness design / permission boundaries / context / scheduled execution / human review. "Which model is smarter" no longer covers it.

For individuals: pair Hermes with Antigravity. For enterprises: use Copilot Studio, Workspace Studio, and Antigravity across business / dev / personal axes. Productivity now lives in Agent OS design, not inside the editor.

Full canonical version (with interactive figures):
https://okikusan-public.pages.dev/antigravity-agent-os.en

Don't build an AI that replays yesterday's spec — the gap between spec and source of truth is the real context

OKIKUSAN-PUBLIC — Tue, 19 May 2026 22:45:49 +0000

📌 The full version (with interactive SVG figures, the drift curve, the five-whys hub, the document-vs-context split, and the Harness concentric layers) is hosted on my blog:

👉 https://okikusan-public.pages.dev/context-is-the-gap.en

This dev.to post is the condensed version. The visualisations live on the original.

Introduction

More and more often, an AI agent's accuracy is decided by its context, not its prompting.

But "context" here is not a polished spec. What really moves the needle is the gap between the spec and the source of truth, and the reasons behind the drift.

An AI fed only the spec replays "past truth." Feed it the drift reasons too, and it approaches "today's truth." The blind spot of Spec-Driven Development and the real core of Harness Engineering, laid out.

TL;DR

The frontier of AI-agent accuracy has shifted: model → prompt → context
If you mistake "context" for a polished spec, the AI just replays "past truth" — specs drift further from the Source of Truth (running code, ops, field judgement) the longer time passes
What actually works is the reasons for the drift. Five whys — why the spec was changed, why an exception was allowed, why the implementation compromised, why the issue went the way it did, why the review came out that way — decide the quality of the AI's output
Documents are polished; context is accumulated. Put the spec at the core of the Harness, and layer the drift reasons around it

Spec vs Source of Truth — the gap is inevitable

The spec describes what should be. A snapshot of agreement at a moment, internally coherent, neatly polished.

As implementation and operations evolve, the actual "truth" drifts elsewhere:

The running code — hard-coded values, exception handlers, commented-out branches
The DB schema and the live data — migration history, unexpected records, exceptional values
The actual API behaviour — undocumented responses, unofficial endpoints
Customer-side operating decisions — approval routes never written down, tacit exceptions
Field judgement — choices an operator made on the spot

These are the Source of Truth (SoT). The spec inevitably drifts away from the SoT over time. This is not laziness — it's structural.

The problem is not that the gap exists. It's that the gap is never explained.
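The distinction between a gap and an unexplained gap can be made mechanical. Here is a minimal sketch that assumes spec values and observed SoT values can both be flattened into key/value pairs. It diffs the spec against what the code, config, or data actually shows, then splits the gaps into those with a recorded drift reason and those without one. The keys, values, and function names are hypothetical. This is not a tool the post describes.

```python
_MISSING = object()  # sentinel: the key is absent on one side


def _show(value: object) -> str:
    return "<absent>" if value is _MISSING else repr(value)


def explain_gaps(spec: dict[str, object], observed: dict[str, object],
                 reasons: dict[str, str]) -> dict[str, list[str]]:
    """Diff spec vs SoT; gaps with a recorded reason are 'explained',
    the rest are 'unexplained' (nobody wrote down why)."""
    report: dict[str, list[str]] = {"explained": [], "unexplained": []}
    for key in sorted(spec.keys() | observed.keys()):
        want, got = spec.get(key, _MISSING), observed.get(key, _MISSING)
        if want == got:
            continue
        bucket = "explained" if key in reasons else "unexplained"
        report[bucket].append(f"{key}: spec={_show(want)} actual={_show(got)}")
    return report
```

The `unexplained` bucket is the part an AI fed only the spec would fill with "past truth."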

An AI fed only the spec replays "past truth"

Typical failures:

"The spec says X is correct, but the code shows Y." → The AI trusts the spec, returns X, and drifts from reality
"The spec has no exception handling, so edge cases can be ignored." → Operationally impossible — a misjudgement
"I implemented per the latest API docs." → The unofficial operating rules get missed

This is not the AI's fault. The context you fed it is frozen at a point in time, and the AI is faithful to that point. The cleaner the spec, the more confidently the AI quotes "past truth."

Reverse-engineering alone is not enough either. Code reveals "what is implemented and how," but never "why it became that."

Five whys to accumulate — that's strong context

#	What to keep	Where it lives
01	Why was the spec changed?	Change log / meeting notes / Slack
02	Why was the exception allowed?	Ops decision log / case-by-case memos
03	Why was the implementation compromised?	Code comments / PR comments
04	Why was the issue argued this way?	Issues / discussion
05	Why did the review come out this way?	PR review comments

Keeping these "whys" is exactly the Externalisation step in Nonaka's SECI model. The twist: you're externalising the process, not the conclusion. That's how judgement patterns become reproducible in other contexts.
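Here is one way to keep the five whys as structured records rather than scattered prose. Each record names the kind of why, what drifted, the reasoning process, and where the original discussion lives. The schema and the field names are a hypothetical sketch, not a format the post prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Why(Enum):
    SPEC_CHANGE = "Why was the spec changed?"
    EXCEPTION = "Why was the exception allowed?"
    COMPROMISE = "Why was the implementation compromised?"
    ISSUE = "Why was the issue argued this way?"
    REVIEW = "Why did the review come out this way?"


@dataclass(frozen=True)
class DriftReason:
    why: Why
    subject: str  # what drifted, e.g. a spec section or a config key
    reason: str   # the process behind the decision, not just the conclusion
    source: str   # where it lives: change log, ops memo, PR comment, issue thread


def reasons_for(log: list[DriftReason], subject: str) -> dict[Why, list[str]]:
    """Group every recorded 'why' for one subject by kind."""
    grouped: dict[Why, list[str]] = {}
    for record in log:
        if record.subject == subject:
            grouped.setdefault(record.why, []).append(f"{record.reason} ({record.source})")
    return grouped
```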

Documents are polished; context is accumulated

	Documents	Context
Target	Humans / clients	AI agents
Nature	Coherence, consistency, polish	Judgement material, contradictions, wobbles
Examples	Proposals / final specs / articles / manuals	Issues / PR reviews / ops notes / failure logs / rough notes
Verb	Polish	Accumulate

Tolerating contradiction is the core. If you treat context as a "thinking process," contradictions are natural. Human judgement wobbles constantly; organisational decisions get overwritten. Whether you can keep that without sanding it down decides whether your AI agent can reproduce "your kind of judgement."
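"Accumulate, don't polish" maps naturally onto an append-only log. A polished document keeps only the latest answer, as a dict overwrite would. An append-only log keeps the earlier, contradicting judgement next to the new one, so the reversal itself stays visible. Here is a minimal sketch; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Entry:
    at: datetime
    topic: str
    judgement: str


class ContextLog:
    """Append-only: a new judgement never overwrites an old one."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def add(self, at: datetime, topic: str, judgement: str) -> None:
        self._entries.append(Entry(at, topic, judgement))

    def history(self, topic: str) -> list[str]:
        """Every judgement on a topic, oldest first, contradictions included."""
        ordered = sorted(self._entries, key=lambda e: e.at)
        return [e.judgement for e in ordered if e.topic == topic]

    def latest(self, topic: str) -> Optional[str]:
        judgements = self.history(topic)
        return judgements[-1] if judgements else None
```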

Spec at the core of the Harness; drift reasons on the outer rings

Agent = Model + Harness (Karpathy framing). Spec-Driven Development (SDD) alone is not enough: you also need to design the outer rings around the spec.
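Here is a minimal sketch of layering context this way when assembling an agent's input. The spec always goes in as the core. Drift reasons are then appended, assumed to be pre-sorted newest first, until a character budget runs out, so the most recent divergence survives truncation. The section headings, the budget, and the ordering rule are all assumptions made for illustration.

```python
def assemble_context(spec: str, drift_reasons: list[str], budget_chars: int = 4000) -> str:
    """Spec at the core, drift reasons layered around it.

    The spec is always included, even if it alone exceeds the budget.
    Drift reasons (assumed sorted newest first) are added until the budget is hit.
    """
    parts = ["## Spec (agreed baseline, may be stale)", spec, "",
             "## Why reality diverged (newest first)"]
    used = sum(len(p) + 1 for p in parts)  # +1 per newline; slightly conservative
    for reason in drift_reasons:
        line = f"- {reason}"
        if used + len(line) + 1 > budget_chars:
            break
        parts.append(line)
        used += len(line) + 1
    return "\n".join(parts)
```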

"Issue Driven Development (IDD)" pairs well with this. SDD = the spec is the truth. IDD = the drift reasons are the truth. Let them coexist.

Good AI = how much it lowers verification load

In May 2026, on the Linux kernel 7.1 RC4 release, Linus Torvalds publicly declared the security mailing list "almost entirely unmanageable" due to the flood of AI-generated vulnerability reports¹. What was a stream of 2-3 reports per week two years ago has ballooned to 5-10 reports per day.

Linus himself does not dismiss AI in security work — he asks researchers to "understand the code and contribute a patch," not just the alert. That's a miniature of AI-agent operations in general. The value of an AI is not output volume — it is how much it lowers the human's verification, correction, and review load.

A spec-only AI mass-produces plausible-looking output. It reads right, but it's drifted from the SoT and a human has to check every line to use it — the textbook case of "Slop" (low-quality, generic, templated AI output). Only the AI fed the drift reasons becomes the kind that actually lowers human verification load.

Conclusion — accumulate, don't polish

What sharpens an AI agent is no longer the model or the prompt. It is whether you can accumulate the gap between spec and Source of Truth, and the reasons for the drift.

Polish documents (for humans / clients)
Accumulate context (for AI agents — keep the contradictions and wobbles)
Spec at the core of the Harness; layer "why it diverged" on the outside

Many organisations pour energy into "polishing the spec" because of the SDD boom. But the real differentiation lies elsewhere: not in polishing the spec, but in accumulating the gap with the SoT. To stop building AIs that replay "past truth," stop polishing — start accumulating.

📌 Full version with interactive SVGs: https://okikusan-public.pages.dev/context-is-the-gap.en

FIG.0 — THE GAP (spec vs SoT drift curve)

FIG.1 — SPEC-ONLY VS SPEC + GAP (two AIs)

FIG.2 — FIVE WHYS (the accumulating hub)

FIG.3 — DOCUMENTS VS CONTEXT (polish vs accumulate)

FIG.4 — HARNESS LAYERS (spec at the core, drift reasons on the outside)

If this resonates, a 🦄 / ❤️ / 💬 helps a lot. Feedback welcome.
