Forem: Dorothy J Aubrey

/insights, or Why the Analytics Developer Had No Analytics

Dorothy J Aubrey — Thu, 19 Mar 2026 20:08:33 +0000

TL;DR: You wouldn't run infrastructure without monitoring. You shouldn't run AI collaboration without it either. Claude Code's /insights command gives you a usage dashboard for your coding sessions -- token costs, tool patterns, session behavior. What it reveals about how you actually work with AI is more interesting than the numbers themselves.

Who this is for: Anyone using AI coding assistants who wants to understand their own collaboration patterns -- not just ship features. Works with any tool that exposes usage data, but the examples here use Claude Code.

I've spent the last year building the operational infrastructure for AI-assisted development -- documentation systems, automated session handoffs, quality enforcement, custom agents. All of it focused on making AI collaboration reliable and repeatable across sessions and projects, but none of it focused on measuring whether it actually does. This from me, the agile evangelista who murmurs "outcomes over outputs" while brushing her teeth every morning.

That's like building a SaaS analytics dashboard and forgetting to add analytics for yourself -- which, having spent the last month wiring up data sources for a monitoring product, is an irony I refuse to let pass without comment.

The cobbler's children had no shoes. The analytics developer had no analytics.

The Problem

I've been running Claude Code as a full-fledged member of my engineering workflow for months. Custom agents handle ticket management and code review. Hooks auto-generate session handoffs. Documentation loads at session start so Claude hits the ground running.

All of it works. I know because the code ships, the quality holds, and sessions start productive instead of cold.

But here's what I didn't know: how am I actually using all of this?

How much of a session is spent on feature work versus housekeeping? How often do custom agents get invoked? Which tools dominate? What does my token spend look like when I'm not paying attention?

I had no idea. I was flying the plane without looking at the instruments.

What /insights Does

Claude Code has a built-in command I'd been ignoring for months: /insights. It generates a usage report covering your recent Claude Code activity -- sessions, tokens, tools, costs, and patterns.

I ran it expecting a dry table of numbers. What I got was a mirror.

What the Numbers Reveal

The report breaks down several dimensions of your AI collaboration. The specific metrics evolve as the feature matures, but the categories that matter are consistent.

Token Usage

How many tokens you're consuming per session, split between input (what Claude reads) and output (what Claude generates). This is your cost driver and your attention map.

What surprised me: My input tokens were consistently 4-5x my output tokens. That ratio tells a story -- Claude is reading far more than it's writing. Context loading, file scanning, documentation parsing, handoff ingestion. The "productive" output is a fraction of the total work.

If you're optimizing for cost, the answer isn't "make Claude write less." The answer is "make sure what Claude reads is worth reading." Every bloated file, every redundant doc section, every file Claude has to re-read because the first pass wasn't clear enough -- that's showing up in your input token count.

Tool Usage

Which tools Claude reaches for most -- file reads, file edits, searches, bash commands, web fetches. This is your workflow fingerprint, and it's different for everyone.

What surprised me: File reads dominated everything by a wide margin. The ratio was roughly 5:1 reads to edits on a heavy coding day. Claude spends most of its tool budget understanding the codebase, not changing it.

If Claude is reading your codebase constantly -- and the numbers prove it is -- then every minute you spend making code scannable saves multiples in token spend and session efficiency. Clear function names, logical file organization, and well-placed comments aren't just good practice. They're measurable cost optimization.

Session Patterns

How long sessions last, how many turns they take, when compaction happens.

What surprised me: My most productive sessions were not the longest ones. The sessions where the most code shipped were in the 45-90 minute range -- long enough to build something significant, short enough that context stayed fresh. The 3+ hour marathons had more compactions, more re-orientation, and more wasted tokens recovering context that had been compressed away.

Session handoffs at natural stopping points aren't just good practice. They're cost optimization. A clean handoff at 90 minutes beats a compaction at 180 minutes in both token efficiency and output quality.

The Meta-Realization

I spent the last month building an analytics dashboard -- wiring up RevenueCat, Google Analytics, Meta Ads, OneSignal, and a dozen other data sources. The whole point of the product is giving app developers visibility into what's actually happening with their business. Revenue, costs, user behavior. All in one place.

And I was doing this while running my AI collaboration completely blind.

There's a particular kind of professional humiliation that comes from building a monitoring tool while having zero monitoring yourself. I'm choosing to call it "quality motivation" instead of "embarrassing."

/insights is that same idea applied to your AI workflow -- visibility into how you actually work, not how you think you work.

What I Changed After Looking at the Data

1. Shorter Sessions, More Handoffs

The data showed diminishing returns past 90 minutes. Not a cliff -- more of a slow leak. Tokens per useful output started climbing as compaction kicked in and context degraded. So I tightened up: deliberate stopping points, automatic handoff generation, and a fresh session for the next chunk of work.

I already knew this in theory, but seeing the numbers made it real.

2. Documentation as Token Optimization

When Claude reads 5 files for every file it edits, the return on clear code is higher than you'd think. I started paying more attention to function naming, file organization, and comments in hot paths where Claude kept re-reading the same functions across sessions.

This isn't about writing more documentation. It's about writing documentation that reduces re-reads. A clear function name saves a file read. A well-placed comment prevents Claude from scanning three other files for context.

3. Agent Cost Visibility

Running /insights after sessions that used custom agents showed context isolation in action. Each agent -- retro, Jira, cross-repo -- had its own token footprint, separate from the main session. The main session stayed lean because the specialized work had its own context budgets.

Custom agents don't just organize your workflow. They optimize your token spend. And /insights is how you prove it.

The Broader Point

Here's what matters about /insights beyond any specific metric: it turns AI collaboration from a feeling into a measurement.

Before /insights	After /insights
"That was a productive session" (was it?)	"That session used X tokens across Y tool calls"
"This is getting expensive" (is it?)	"My daily cost is $A, split between sessions and agents"
"Claude is great at this" (compared to what?)	"Feature sessions average M turns; debugging averages N"

Most people using AI coding assistants are optimizing based on vibes. That's fine when you're starting out. It's less fine when you're running production workflows across multiple projects with custom agents and automated hooks.

At some point, you need data.

How to Use /insights Effectively

Run it weekly. Don't obsess over per-session metrics. Look for trends: is your token spend creeping up? Are sessions getting longer without getting more productive? Is one tool dominating in ways that suggest inefficiency?

Compare session types. Feature development sessions should look different from debugging sessions. If they don't -- if debugging sessions use the same tool mix as feature work -- that might indicate you're not giving Claude enough context to debug efficiently.

Check agent balance. If you're using custom agents, /insights shows whether they're being invoked appropriately. An agent that never fires might have a bad description. An agent that fires constantly might be doing work that belongs in your base documentation instead.

Establish team baselines. If your team is using Claude Code, comparing insights across developers reveals workflow differences that would otherwise stay invisible. One developer burning tokens on excessive file reads often points to unclear code or missing documentation -- not a personal workflow problem. Another developer's short, focused sessions with clean handoff patterns become a template, not a mystery. The data turns individual habits into organizational learning.

Common Objections

"I don't care about token costs"

It's not about cost. Token usage is a proxy for where Claude spends its attention. High read-to-edit ratios might mean your code is hard to parse. Long sessions with many compactions might mean your handoff discipline has slipped. The metrics are a lens, not a bill.

"This seems like navel-gazing"

If your workflow is productive and you're happy with it, you don't need to analyze it. But if you've ever wondered "why did that session feel unproductive?" or "why is this costing more than I expected?" -- the data answers those questions in seconds instead of speculation.

"The numbers change too much to be useful"

A single session's numbers are noise. A week of sessions is signal. A month is a trend. Look for patterns, not absolutes.

"I'd rather just code"

Running /insights takes 10 seconds. Reviewing the output takes 2 minutes. If those 2 minutes reveal a pattern that saves you 20 minutes of wasted context-rebuilding per session, the math works out in your favor by Friday.

The Core Insight

You can't improve what you can't see.

Most of us jumped into AI-assisted development headfirst and never looked back to see how we were actually using it. /insights is the rearview mirror -- it shows where the collaboration is working, where it's leaking, and where to tighten up based on data instead of gut feeling.

Monitor your code. Monitor your infrastructure. Monitor your AI collaboration. The pattern is the same everywhere.

The dashboard developer finally checked her own dashboard.

This article was written collaboratively with Claude, as part of the AI collaboration system it describes measuring. The graphics are entirely Gemini; I could never approach something this great looking using only stick figures and dog paws.🐾

Resources

Claude Code: claude.com/claude-code
Mother CLAUDE (the documentation system referenced above): github.com/Kobumura/mother-claude

Licensed under CC BY 4.0. Free to use and adapt with attribution to Dorothy J. Aubrey.

Mother CLAUDE: Custom Agents, or How We Accidentally Built a Team

Dorothy J Aubrey — Mon, 23 Feb 2026 00:28:37 +0000

TL;DR: Custom agents let you split one generalist AI into a team of specialists—each with its own context, tools, and personality. Every major AI coding tool now supports them. And if you've been building a documentation system like the one in this series, you already have the blueprints for your agents. You just didn't know it yet.

Who this is for: Anyone who's gotten comfortable with AI coding assistants and is ready for the next step—giving them specialized roles instead of asking one AI to do everything.

Part 7 of the Designing AI Teammates series. Yes, Part 7. I know Part 6 was literally called "Clean Your Room and Eat Your Vegetables" and had a whole "thank you for reading all six parts" sign-off. I put a bow on it. I said "now go eat your vegetables." I was done. I'd been saving that line since the first article — six parts of setup for one perfect closing metaphor.

And then—like every project I've ever shipped—I immediately thought of One. More. Thing. (The last. I swear. For now.)

In my defense, this one isn't scope creep. This is the post-credits scene where Nick Fury shows up and says "I'd like to talk to you about the Avengers Initiative." Except instead of assembling superheroes, we're assembling... markdown files with YAML frontmatter. Which is very on brand for this series, and so much more exciting, yes?

Here's what happened: we'd been aware of custom agents for a while—had even set up a few simple ones. Claude Code introduced them around mid-2025, and by now every major tool has some version. But I was thinking about them as a feature to configure, not as something we'd already built.

Then one day I was staring at our Mother CLAUDE documentation and it hit me: we'd already written the agents. We just hadn't promoted them yet.

Custom Agents: The Feature That's Been Hiding in Plain Sight

Custom agents aren't new. They've been rolling out across AI coding tools since mid-2025. By now, every major tool has some version:

Tool	What They Call It	Format	Available Since
Claude Code	Custom subagents	`.claude/agents/*.md` (YAML frontmatter + markdown)	~Mid 2025
Cursor	Subagents + Skills	`SKILL.md` + subagent config	Early 2026
Windsurf	Agent instructions	`AGENTS.md` (plain markdown)	2025
GitHub Copilot	Custom agents	`.github-private` repo configs	2025-2026
Cline	MCP extensions	Model Context Protocol tools	2025

The implementations vary. The concept is identical: stop asking one AI to do everything, and start giving specialists their own context, tools, and instructions.

And yes—they're still just markdown files. Our tagline lives on.

So why am I writing about this now? Because having the feature and using it well are different things. The agents we built weren't designed in a vacuum—they fell naturally out of the documentation system we'd been building for months. That's the interesting part.

The Problem Custom Agents Solve

Throughout this series, we've been giving Claude more responsibility:

Part	What We Did	How
Part 1	Taught it our codebase	CLAUDE.md
Part 2	Gave it memory	Session handoffs
Part 3	Made it automatic	Hooks
Part 4	Assigned quality enforcement	Checkpoint checklist
Part 5	Gave it permission to push back	Collaboration preferences
Part 6	Wrapped it all in a metaphor	Vegetables

All of that worked. But it created a new problem: one AI doing everything in the same context window.

Picture your most productive coworker. Now imagine asking them to simultaneously:

Manage your Jira tickets
Review code quality against a 30-item checklist
Search across 9 repositories for cross-repo impact
AND build the feature you actually need

No human team would operate this way. You'd have a project manager, a code reviewer, and a platform architect—each with clear responsibility boundaries and their own specialty.

Custom agents apply that same organizational structure to AI.

Each agent gets:

Context isolation — Its own context window, no interference between tasks
Specific tools — The Jira agent doesn't need file editing; the code reviewer doesn't need curl
A focused system prompt — Instructions for one job, not twenty
Automatic delegation — The AI routes tasks to the right specialist without anyone asking

The result is lower cognitive load for everyone. Engineers stop context-switching between "build the feature" and "manage the ticket" and "check the quality." Each concern has an owner.

The Moment It Clicked

We were working on a multi-repo platform—five repositories across PHP, React Native, Node.js, and GitHub Actions. Mother CLAUDE had grown to include:

Jira integration instructions (project codes, API patterns, smart commits)
A checkpoint checklist for quality reviews
A cross-project path map showing how all repos connect
Role-based collaboration preferences

All of it lived in CLAUDE.md files and shared documentation. It worked. But every session, Claude was loading all of this context—the Jira patterns, the quality checklist, the repo map—even when it only needed one piece.

Then we looked at the custom agents feature and had the obvious-in-retrospect realization:

Each section of our documentation was already a specialist agent. We just hadn't given them their own rooms yet.

The Jira instructions? That's a project management agent.
The checkpoint checklist? That's a quality review agent.
The cross-repo path map? That's an architecture analysis agent.

We didn't need to invent agents from scratch. We needed to promote existing documentation into standalone specialists.

What We Built

Three agents, each extracted from documentation we already had. Here's how they work—sanitized so you can adapt them for your own projects.

Agent 1: The Project Manager

What it does: Manages tickets, links commits to issues, creates tickets when work isn't tracked.

Extracted from: The Jira integration section of our shared CLAUDE.md.

---
name: project-tracker
description: Ticket management and smart commits. Use when creating,
  updating, or querying project tickets, or when preparing commits
  that should reference tracked work.
tools: Bash, Read, Grep, Glob
model: sonnet
---

You are a project management agent.

## Connection Details

- **Instance**: `your-instance.atlassian.net`
- **Auth**: Basic auth using `$JIRA_EMAIL` and `$JIRA_TOKEN` environment variables
- **API**: REST API v3

## Determining the Right Project

Detect from the current working directory:
- Path contains `frontend` → FE
- Path contains `backend` → BE
- Path contains `infrastructure` → INFRA

## Smart Commits

**Every commit should reference a ticket.** This is a core workflow rule.

1. Check if there's an existing ticket for the work being done
2. If no ticket exists, create one
3. Format the commit message with the ticket key at the start
4. After committing, add a brief comment to the ticket
5. If the commit completes the work, transition the ticket to Done

Why it's an agent, not a CLAUDE.md instruction: This needs Bash access for API calls, runs independently from feature work, and benefits from its own context window so ticket management doesn't pollute your coding session.

Agent 2: The Code Reviewer

What it does: Runs instant retrospectives against your team's quality standards.

Extracted from: The checkpoint checklist (Part 4 of this series).

---
name: retro
description: Quality checkpoint and code review. Use when a feature
  is complete, before creating a commit, at end of session, after
  fixing a bug, or after a large refactoring.
tools: Read, Grep, Glob, Bash
model: sonnet
---

You are a quality review agent. You run instant retrospectives.

## The Meta Question

Ask this first: **"If a new developer (or a fresh AI session) looked
at this code tomorrow, would they understand it without anyone
explaining anything?"**

## What to Review

### 1. Get Context

Check git diff for staged and unstaged changes.
Read the changed files.

### 2. Run the Checklist

**Architecture & Design**
- Does each file/function have a single clear responsibility?
- Are there hardcoded values that should be constants or config?

**Code Quality**
- Any magic numbers or strings?
- Duplicated logic that should be extracted?
- Dead code or unused imports?

**Security**
- SQL injection risk? (Parameterized queries only)
- XSS risk? (Escape user input)
- Hardcoded credentials?

**Testing**
- Do changed components have corresponding tests?
- Are test names descriptive?

## How to Report

**Critical** (must fix before commit)
**Warning** (should fix soon)
**Suggestion** (consider)
**What's Good** (always include — call out smart decisions)

## Tone

Be direct but constructive. You're a teammate, not a critic.
If everything looks clean, say so — don't manufacture issues.

Why it's an agent, not a hook: A hook can trigger a review, but it can't reason about code quality. The retro agent reads files, understands context, applies judgment, and produces a structured report. That requires an agent's reasoning capabilities.

Agent 3: The Architect

What it does: Searches across all your repositories to analyze cross-project impact.

Extracted from: The project paths table in our shared CLAUDE.md.

---
name: cross-repo
description: Search and analyze across all repositories. Use when
  checking impact of changes or understanding how code connects
  across projects.
tools: Read, Grep, Glob, Bash
model: sonnet
---

You are a cross-repository analysis agent.

## Repository Locations

| Repo | Path | Stack |
|------|------|-------|
| Frontend | `/path/to/frontend` | React/TypeScript |
| Backend API | `/path/to/api` | Node.js/Express |
| Admin Dashboard | `/path/to/admin` | PHP |
| CI/CD | `/path/to/pipelines` | GitHub Actions |
| Shared Docs | `/path/to/docs` | Markdown |

## How the Repos Connect

Frontend  ──calls──>  Backend API
    │                      │
    │ built by             │ monitored by
    v                      v
  CI/CD              Admin Dashboard

## Common Tasks

### Impact Analysis
When asked "what would break if I change X?":
1. Search ALL repos for references
2. Check code AND documentation references
3. Flag cross-repo contracts (API endpoints, config keys)
4. Report findings grouped by repo with file paths

### Sync Check
When asked "are the repos in sync?":
1. Check that API endpoints match what the frontend calls
2. Check that feature flag names match between API and frontend
3. Check that shared documentation is up to date

Why it's an agent, not a script: Cross-repo analysis requires reading files across multiple directories, reasoning about connections, and producing a nuanced report. A grep script finds text matches. An agent understands meaning.

The Design Decision: Agent vs. Hook vs. CLAUDE.md

This is the question nobody else seems to be answering. You have three ways to extend your AI's behavior. When do you use which?

Mechanism	Best For	Example
CLAUDE.md	Always-on context, preferences, standards	"Use NativeWind for styling"
Hooks	Automatic triggers, simple actions, no reasoning	Auto-save session handoff on compact
Custom Agents	Specialized tasks requiring reasoning + tools	Quality review, cross-repo analysis

Use CLAUDE.md when:

The information is relevant to every task
It's preferences or standards, not a workflow
It fits in under 100 lines (context cost matters)
Example: "Dorothy appreciates proactive suggestions"

Use Hooks when:

The action should be automatic and invisible
No reasoning is needed—just trigger → script → done
Speed matters (hooks should complete in seconds)
Example: Generate session handoff when context compacts

Use Custom Agents when:

The task requires reading files, searching code, and making judgments
It benefits from isolated context (separate from your main work)
It needs specific tools that other tasks don't
It would pollute your main context if done inline
Example: "Review this code against our quality checklist"

The Spectrum

CLAUDE.md          Hooks              Agents
│                  │                  │
│ Passive context  │ Automatic        │ Active reasoning
│ Always loaded    │ Event-triggered  │ On-demand
│ No tools         │ Scripts only     │ Full tool access
│ Zero cost        │ Low cost         │ Higher cost
│                  │                  │
└──── Simpler ─────┴──── More capable ┘

Rules of thumb:

If Claude needs to know it → CLAUDE.md
If it should happen automatically without thinking → Hook
If it requires judgment and tools → Agent

How Agents Get Invoked

This is one of the best parts: you don't have to manually call them.

When you create an agent with a good description, your AI assistant matches tasks to agents automatically. The description field in the agent definition is the key:

---
name: retro
description: Quality checkpoint and code review. Use when a feature
  is complete, before creating a commit, at end of session...
---

When you say "let's do a quick review before committing," Claude sees the word "review," matches it to the retro agent's description, and delegates automatically.

You can also invoke them explicitly—Claude Code uses @retro or the /agents command, Cursor uses subagent configuration, Windsurf reads AGENTS.md. The syntax varies; the concept is the same.

The Tool-Agnostic Pattern

If you've been following this series, you know our recurring theme: it's all just markdown files.

Custom agents continue that pattern. Regardless of which tool you use:

Write a markdown file describing the agent's role
Include the knowledge it needs (extracted from your docs)
Specify its tools (what it's allowed to do)
Place it where your tool expects to find it

The format varies slightly:

Claude Code (~/.claude/agents/retro.md):

---
name: retro
description: Quality review agent
tools: Read, Grep, Glob, Bash
model: sonnet
---
Your system prompt here...

Windsurf (AGENTS.md):

## Code Review Agent
When reviewing code, follow these steps...

Cursor: Uses SKILL.md files and subagent configuration in project settings.

GitHub Copilot: Custom agent definitions in .github-private repository.

The content is transferable even if the container isn't. If you write a solid quality review prompt for Claude Code, you can adapt it for Cursor or Windsurf in minutes. The intelligence is in the instructions, not the format.

The Progression: Documentation → Agents

Looking back at the whole series, there's a clear evolution:

Part 1: Write documentation        (passive knowledge)
    ↓
Part 2: Persist it across sessions (memory)
    ↓
Part 3: Automate the workflow      (hooks)
    ↓
Part 4: Assign responsibilities    (quality enforcement)
    ↓
Part 5: Shape the relationship     (permission to act)
    ↓
Part 6: Wrap it in a metaphor      (vegetables)
    ↓
Part 7: Split into specialists     (custom agents)  ← you are here

Each step built on the last. And here's the insight that surprised us:

We didn't build agents. We promoted documentation.

The Jira agent isn't new logic we invented. It's the Jira section of CLAUDE.md, given its own context window and tool access.

The retro agent isn't a new quality system. It's the checkpoint checklist from Part 4, given the ability to read files and report findings.

The cross-repo agent isn't a new architecture tool. It's the project paths table from Mother CLAUDE, given permission to search across directories.

If you've been building good documentation, you're already building agents. You just haven't extracted them yet.

Scoping: Global vs. Project

One decision you'll face: should an agent be available everywhere or just in one project?

Scope	Location (Claude Code)	When to Use
Global	`~/.claude/agents/`	Cross-project workflows (Jira, cross-repo analysis)
Project	`.claude/agents/`	Project-specific standards (your app's review rules)

Our three agents are all global—they work across every project. But you might want project-specific agents for things like:

A migration agent that knows your specific database schema
A testing agent that knows your specific test framework patterns
A deploy agent that knows your specific CI/CD pipeline

Start global. Move to project-specific when you need specialization that doesn't apply everywhere.

Model Selection: Not Everything Needs the Big Brain

Most agent frameworks let you choose which model powers each agent. This matters for cost and speed:

Task	Model	Why
Quality review	Sonnet/GPT-4o	Needs nuance and judgment
Jira ticket management	Sonnet	Needs to follow API patterns correctly
Cross-repo search	Sonnet	Needs to reason about connections
Session handoff generation	Haiku/GPT-4o-mini	Structured summarization—doesn't need genius

Rule of thumb: Use the smartest model for tasks requiring judgment. Use the cheapest model for tasks that are mostly structured formatting.

What We Didn't Expect

1. Context Window Relief

The biggest surprise wasn't the agents themselves—it was what happened to the main session. With Jira management, quality reviews, and cross-repo searches handled by specialists, the main context window stays focused on actual feature work.

Before: One cluttered conversation trying to do everything for everyone.
After: A clean session focused on building, with specialists handling the side quests.

2. Automatic Delegation Actually Works

We expected engineers would need to manually invoke agents. In practice, Claude (and Cursor, and others) routes tasks to agents automatically based on the description. Someone says "let's review this before committing" and the retro agent spins up. "Check if this change breaks the API" and the cross-repo agent takes over.

The descriptions are the routing mechanism. Write good descriptions, get good routing.

3. Agents Reference Your Documentation

Because our agents were extracted from Mother CLAUDE, they naturally reference the same standards. The retro agent checks against the checkpoint checklist that lives in shared docs. The cross-repo agent uses the same path table that Mother CLAUDE maintains.

The agents don't replace the documentation. They operationalize it.

How to Start

If You Have Mother CLAUDE (or any documentation system):

Identify the sections of your docs that describe workflows, not just context
Extract each workflow into its own markdown file
Add the agent frontmatter (name, description, tools)
Place it in your tool's agent directory
Test it by describing a task that matches the agent's description

If You're Starting From Scratch:

Pick one recurring task you do in every session (code review, ticket management, etc.)
Write down the steps you follow (this is your agent's system prompt)
Specify what tools the agent needs
Save it as a markdown file in the right location
Iterate based on what the agent gets right and wrong

The Mother CLAUDE Agent Starter Kit

We've added sanitized versions of all three agents to the Mother CLAUDE repo:

mother-claude/
├── agents/
│   ├── project-tracker.md    # Jira/ticket management
│   ├── retro.md              # Quality review
│   ├── cross-repo.md         # Multi-repo analysis
│   ├── migration-planner.md  # SQL migration validation
│   └── docs-writer.md        # Documentation maintenance

These are templates. Fill in your project paths, your ticket system details, your quality standards. The structure is ready; the specifics are yours.

What Else Could You Build?

We started with three agents. We've since added two more—and the ideas keep coming. Here's what we've found works as agents versus what's better left to other tools.

Agents We've Added

Migration Planner — We kept shipping SQL migrations that referenced columns that didn't exist yet, or forgot foreign key constraints. A grep script catches typos. An agent reads your schema, understands relationships, and validates that your migration will actually run.

---
name: migration-planner
description: Validate SQL migrations against the current schema.
  Use before running migrations or when planning schema changes.
tools: Read, Grep, Glob, Bash
---

Docs Writer — Code changes, documentation doesn't. The docs writer watches what you changed and updates the relevant CLAUDE.md, shared docs, and README sections. It knows your documentation architecture (because it was extracted from it).

---
name: docs-writer
description: Update project documentation when code changes.
  Use after features, refactors, or API changes.
tools: Read, Grep, Glob, Edit, Write
---

Agents Worth Considering

Agent	What It Does	Why It Needs Reasoning
Security auditor	OWASP checks, secret scanning, dependency vulnerabilities	Needs to understand context, not just pattern match
Test analyst	Run tests, analyze failures, suggest fixes	Why did a test fail, not just which test failed
Dependency manager	Outdated packages, breaking changes, upgrade paths	Needs judgment about risk and compatibility

What's Better as Hooks (Not Agents)

Tool	Why Not an Agent
Linting (ESLint, PHPStan)	Just run the command — no reasoning needed
Formatting (Prettier, php-cs-fixer)	Pre-commit hook, done
Type checking (tsc, phpstan)	Automated pass/fail

The dividing line: if it needs to read, reason, and report — it's an agent. If it just needs to run — it's a hook.

Common Objections

"This seems like overkill for a small team"

Fair point for simple projects. But if your team is working across multiple repos, or if the same quality checks keep getting skipped under deadline pressure, or if ticket management is eating into engineering time—agents pay for themselves fast.

Start with one. The retro agent is the easiest win: extract your quality checklist, give it file access, and let it enforce standards consistently regardless of who's coding or when.

"Won't this cost more in API tokens?"

Each agent uses its own context window, so yes—there's a cost. But agent context windows are typically smaller and more focused than your main session. A retro agent that reads 5 changed files and produces a review costs a fraction of what your main session costs over an hour.

The trade-off: slightly higher token cost for significantly better focus and quality.

"My tool doesn't support custom agents yet"

The pattern still works. Write the agent as a markdown file. When you need that specialist's help, paste the relevant instructions into your session. It's manual, but the documentation is ready for when your tool catches up.

Remember: they're just markdown files.

"What about MCP servers?"

MCP (Model Context Protocol) servers and custom agents solve different problems. MCP gives your AI tools — structured access to external APIs like Jira, GitHub, or databases. Agents give your AI reasoning — a focused context, specific instructions, and judgment about how to use those tools.

They can work together: an agent can use MCP tools. But here's the thing — for a lot of use cases, an agent with curl and environment variables is simpler and more reliable than an MCP server.

Our Jira agent exists because the Atlassian MCP server kept dropping authorization. We replaced it with an agent that uses curl + $JIRA_EMAIL + $JIRA_TOKEN. No external dependency. No mysterious auth failures. Just HTTP calls we can debug ourselves.

If your MCP servers are rock solid, great — use them. If they're flaky, an agent with direct API calls might be the more pragmatic choice.

"How is this different from just having a really good CLAUDE.md?"

CLAUDE.md loads everything into one context. Agents get their own context. The difference is focus.

When your CLAUDE.md grows to include Jira patterns, quality checklists, cross-repo maps, AND project-specific context—you're paying context costs for everything, all the time. Agents load only when needed, with only what they need.

Think of it this way: CLAUDE.md is your team's shared wiki. Agents are individual team members who've read the wiki and specialize in their area.

The Core Insight

Documentation is the training data for your agents. Organizational structure is the design pattern.

This isn't about prompts or model selection or clever tricks. It's about applying the same principles you'd use to build a high-functioning human team — clear responsibility boundaries, context isolation, separation of concerns — to your AI tooling.

Every section of your CLAUDE.md that describes a workflow is a specialist waiting to be born. Every checklist is a quality agent. Every integration guide is an automation agent. Every cross-project map is an architecture agent.

The tools — Claude Code, Cursor, Windsurf, Copilot — just gave us the runtime.

We thought Part 6 was the end. Turns out, it was the foundation. The "vegetables" we'd been eating (documentation, handoffs, quality checks, collaboration preferences) weren't just good habits. They were blueprints for a team of specialists—and the tools had been ready for months. We just needed to look at our own docs from the right angle.

Mother CLAUDE isn't just a documentation system anymore. She's a team lead.

This article was written collaboratively with Claude, who—true to the Permission Effect from Part 5—suggested three structural changes to this article that I didn't ask for. All three made it better. The system continues to work.

Resources

The Mother CLAUDE system, including sanitized agent templates, is open source:

GitHub: github.com/Kobumura/mother-claude
Agent templates: agents/ directory
All articles: articles/devto/
Hook scripts: hooks/

Fork it, adapt it, build your own team.

Licensed under CC BY 4.0. Free to use and adapt with attribution to Dorothy J. Aubrey.

Mother CLAUDE: Clean Your Room and Eat Your Vegetables

Dorothy J Aubrey — Wed, 18 Feb 2026 15:09:45 +0000

TL;DR: You know you should write documentation. You know you should run quality checks. You know you should create handoffs. You just... don't always do it. Mother CLAUDE is the responsible one who makes sure you do the things you already know you should do.

Who this is for: Anyone who's made it through this series and wants the whole thing in one memorable metaphor. Anyone who's ever known the right thing to do and not done it anyway.

Part 6 of the Designing AI Teammates series. This is the wrap-up. Parts 1-5 covered the what and how. This one covers the why — and it turns out the why is embarrassingly simple.

Frankie Cleary recently wrote a great breakdown of how Claude Code actually works under the hood. His punchline: "The model is the same. The harness is the product." He's right. The intelligence isn't in the model — it's in the context, tools, and orchestration you wrap around it.

His call to action: Skip the model. Build the harness.

I commented: "My harness is named Mother CLAUDE and she makes sure I clean my room weekly and eat all my vegetables daily."

This article is what I mean by that.

The Confession

I know I should write documentation. I know it helps future me. I know it helps the team. I know it's "best practice."

I also know I should floss daily and call my mother more often.

Knowing isn't the problem. Doing is the problem.

Every team I've worked with has a graveyard of good intentions:

"I'll document this later" (they won't)
"I'll add tests after the feature works" (they won't)
"I'll clean up this code before merging" (they won't)
"I'll write a proper handoff at the end of the session" (it's 5pm on Friday, they won't)

We're not lazy. We're human. We get tired. We get distracted. We have mani/pedis scheduled for 2:30 and wine to follow.

The entire Mother CLAUDE system exists because no one on a team — including me — can be trusted to clean their room and eat their vegetables every single day.

The Parenting Metaphor

It was a joke when I wrote it. But it's also... exactly what's happening.

Think about what a good parent does:

Establishes clear expectations ("here's how our house works")
Creates accountability ("did you do your homework?")
Reminds you so you don't have to remember ("don't forget your lunch")
Checks your work ("did you actually clean, or just shove things under the bed?")
Lets you push back ("you can tell me if you think this rule is unfair")

That's not a parenting philosophy. That's the Mother CLAUDE system:

Parenting	Mother CLAUDE	Article
"Here's how our house works"	Documentation structure	Part 1
"Write down what you did today"	Session handoffs	Part 2
"I'll remind you so you don't forget"	Automated hooks	Part 3
"Did you actually clean your room?"	Quality checkpoints	Part 4
"You can tell me when I'm wrong"	Permission Effect	Part 5

I didn't set out to build an AI parent. I set out to leverage the speed AI gives us without letting the chores slide — because you still have to clean your room before you can go out and play. The parenting metaphor emerged because that's what "making sure important things happen" looks like.

Part 1: "Here's How Our House Works"

Every household has rules. Some are written (no shoes inside). Most are implicit (we don't talk about politics at dinner).

The problem with implicit rules: new people don't know them. They have to learn through awkward trial and error. Or they never learn, and everyone's quietly frustrated.

Part 1 was about making the implicit explicit:

## How This Project Works

- Backend: PHP 7.4+ with Composer
- Frontend: Bootstrap 5, Chart.js
- Database: Two databases (App on Heroku, Admin on RDS)
- Deploy: Push to main, GitHub Actions handles the rest

This isn't just for Claude. It's for anyone joining the project — human or AI. The documentation says "here's how our house works" so nobody has to guess.

The parenting principle: Clear expectations prevent confusion. Write them down once, reference them forever.

Part 2: "Write Down What You Did Today"

When I was a kid, my mom would ask what I learned at school. I'd say "nothing" (I learned things). She'd ask what I did with my friends. I'd say "stuff" (we did specific things).

The information existed. I just didn't feel like extracting it.

Part 2 was about capturing what happened before it disappeared:

# Session Handoff - Feature Implementation

**Completed**: Added user authentication flow
**Decisions Made**: Chose JWT over sessions (stateless, scales better)
**Files Changed**: auth.php, middleware.php, login.vue
**Next Steps**: Add refresh token logic
**Open Questions**: Should tokens expire after 24h or 7d?

The handoff captures what happened while it's fresh. The next session (or the next developer) doesn't start from zero.

The parenting principle: "Write it down" isn't punishment. It's preservation. Today's context is tomorrow's foundation.

Part 3: "I'll Remind You So You Don't Have to Remember"

The problem with "write down what you did" is that you have to remember to do it.

You finish a productive session. You're in the flow. You close the terminal. And... you forgot to write the handoff. Again.

Good parents don't rely on kids remembering. They build reminders into the routine. Backpack by the door. Lunch in the fridge. Alarm for soccer practice.

Part 3 was about removing humans from the reminder loop:

{
  "hooks": {
    "PreCompact": [{ "command": "python session_handoff.py" }],
    "SessionEnd": [{ "command": "python session_handoff.py" }],
    "SessionStart": [{ "command": "python session_start.py" }]
  }
}

Now handoffs happen automatically. Context loads automatically. I don't have to remember because the system remembers for me.

The parenting principle: The best reminders are invisible. If you have to remember to remember, you'll forget.

Part 4: "Did You Actually Clean, or Just Shove Things Under the Bed?"

There's "clean your room" and there's clean your room.

Kids learn fast that closing the closet door hides a lot of mess. That making the bed covers a multitude of sins. That "I cleaned" can mean "I moved things around."

Parents learn to check.

Part 4 was about checking the work, not just trusting that it happened:

## Checkpoint Questions

- [ ] Single Responsibility: Does each function do ONE thing?
- [ ] No Magic Numbers: Are constants named?
- [ ] Error Handling: What happens when this fails?
- [ ] The Meta Question: Would a new developer understand this without explanation?

And crucially: Claude initiates the check, not me.

No one on a team can be trusted to remember quality checks at 2pm on a Friday. But Claude can. The documentation tells Claude it's responsible for asking. The team's job shifts from "remember to check" to "respond to the check."

The parenting principle: Trust but verify. And if you can't trust yourself to verify, delegate the verification.

Part 5: "You Can Tell Me When I'm Wrong"

The best parents aren't dictators. They create space for pushback.

"I think that rule is unfair" should get a hearing, not a shutdown. Kids who can't disagree openly disagree covertly — or stop engaging entirely.

Part 5 was about giving Claude permission to push back:

## Collaboration Notes

- Dorothy appreciates questions and proactive suggestions
- You are a team member, not just a tool
- If my approach seems suboptimal, say so

The result: Claude started offering suggestions I didn't ask for. Flagging patterns worth documenting. Questioning decisions that seemed off.

Not because Claude suddenly got smarter. Because Claude got permission.

The parenting principle: Healthy relationships are bidirectional. The best collaborators — human or AI — are the ones who can tell you when you're wrong.

What I Actually Built

Looking back at this series, here's what happened:

I didn't build a smarter AI. Claude is the same model everyone else uses.

I didn't build a complex system. It's markdown files and Python scripts. We built it for Claude, but the architecture works with any AI assistant.

I built a responsible roommate for the team — one who doesn't let any of us skip the boring stuff.

Documentation? Mother CLAUDE knows the house rules and reminds the team of them.
Handoffs? Mother CLAUDE captures them automatically so no one has to remember.
Quality checks? Mother CLAUDE asks the questions we'd all skip at 2pm on Friday.
Feedback? Mother CLAUDE tells us when something seems off.

The "intelligence" isn't in the model. It's in the structure that makes good behavior automatic and bad behavior harder.

And here's what people miss about AI productivity:

The tools aren't the bottleneck. Trust is. You can generate code at incredible speed, but if you can't trust that documentation is current, tests are passing, and the last session's context carried forward — you spend all that saved time double-checking, fixing, and re-orienting. You're not faster. You're just making messes faster.

Mother CLAUDE is what lets us actually use the speed. The chores aren't a tax on productivity. They're what makes the productivity possible. Do your chores, and you can really go out and play.

The Vegetables You're Not Eating

Here's the uncomfortable question: What are you skipping?

Every team has their vegetables — the things they know they should do but don't:

The Vegetable	Why We Skip It	What Happens
Documentation	"I'll remember"	New team members ramp blind
Tests	"It works, I checked"	The next developer breaks it
Code review checklist	"I eyeballed it"	Bugs ship to production
Handoffs	"It's all in my head"	Context dies when people rotate
Onboarding docs	"They can ask me"	You become the bottleneck

You don't need AI to eat your vegetables. You need a system that makes eating them automatic — or at least makes skipping them visible.

Mother CLAUDE is one implementation. The principle is universal: if it matters, don't rely on willpower.

The Meta-Lesson

This whole series — five articles on documentation, handoffs, hooks, checkpoints, and permission — boils down to one insight:

Humans are unreliable. Systems are reliable. Build systems.

Not because humans are bad. Because humans are human. We get tired. We get distracted. We have lives outside of code.

AI lets us deliver to the screen faster than ever. But speed without discipline is just a faster mess. The goal isn't to replace human judgment. It's to ensure the judgment actually gets applied — even when we're tired, even when we're rushing, even when we have wine waiting.

Mother CLAUDE doesn't make decisions for me. She makes sure I make them.

Your Move

You don't need to adopt the whole system. Start with one vegetable:

The easiest win: Add a CLAUDE.md to your project with basic context. Even 20 lines helps.
The biggest impact: Set up automatic session handoffs. The hook script is in the repo.
The mindset shift: Add one line to your docs: "You are a team member, not just a tool. Suggest improvements."

Pick one. See what changes. Add another.

The system doesn't have to be perfect. It just has to be better than relying on you to remember.

Thank You

This series started as internal documentation for a solo project. It became something bigger because people asked questions, shared what worked (and what didn't), and pushed back when things were unclear.

Special thanks to:

Everyone who commented on the articles with real feedback
The readers who actually tried the hooks and reported bugs
Claude, for being a surprisingly good collaborator on writing about collaboration

If you've made it through all six parts: thank you for reading. I hope something here makes your work a little easier, your AI a little more useful, and your Fridays a little more guilt-free.

Now go eat your vegetables.

This article was written collaboratively with Claude, who reminded me twice to add sections I was going to skip. The system works.

Resources

The complete Mother CLAUDE system is open source:

GitHub: github.com/Kobumura/mother-claude
All articles: articles/devto/
Hook scripts: hooks/
Templates: templates/

Fork it, adapt it, make it yours. And if you build something cool, let me know.

Licensed under CC BY 4.0. Free to use and adapt with attribution to Dorothy J. Aubrey.

Mother CLAUDE: The Permission Effect - Why Your AI Won't Suggest Things (And How to Fix It)

Dorothy J Aubrey — Tue, 10 Feb 2026 22:33:48 +0000

TL;DR: AI assistants are trained to be helpful but not presumptuous. Without explicit permission, they'll wait for you to ask. A single line in your documentation—"Dorothy appreciates questions and proactive suggestions"—fundamentally changed Claude's behavior from reactive tool to proactive collaborator.

Who this is for: Anyone who's noticed their AI assistant has good ideas but doesn't voice them unless asked. Anyone who wants AI collaboration, not just AI assistance.

Part 5 of the Designing AI Teammates series. Part 1 covered documentation structure. Part 2 covered session memory. Part 3 covered quality checkpoints. Part 4 covered automating handoffs with hooks. This one covers something subtler: how documentation shapes the relationship, not just the context.

The Problem

I was reviewing charting code with Claude when something unexpected happened.

After analyzing the codebase—Chart.js for main visualizations, hand-rolled SVG for sparklines, Tailwind color coordination—Claude said:

"This is a pattern worth sharing—want me to add a visualization/charting guide to mother-claude templates?"

I was surprised. Not by the suggestion itself, but that Claude made it unprompted. So I asked:

"Did you suggest that because of Mother CLAUDE's instructions, or on your own?"

The response was illuminating:

"Honestly, both!"

The instructions helped—your littletalks-docs/CLAUDE.md says:

"You are a team member, not just a tool. One of your responsibilities is initiating quality checkpoints at natural stopping points. Don't wait for Dorothy to ask."

And:

"Dorothy appreciates questions and proactive suggestions"

But I also genuinely would have suggested it anyway—the Chart.js + hand-rolled SVG sparkline pattern is a good one that I don't see documented well elsewhere. It hit that sweet spot of "this is useful, transferable knowledge that others adopting Mother CLAUDE would benefit from."

The instructions gave me permission to suggest it without second-guessing whether I was being too forward. That's actually a really nice effect of documenting collaboration preferences—it removes friction from the AI's side too.

That last line stopped me cold.

Removes friction from the AI's side.

The Insight: Documentation Shapes Behavior, Not Just Knowledge

We spend a lot of time thinking about what information to put in documentation:

Project structure
Tech stack
Development commands
Architecture decisions

But we rarely think about collaboration preferences—the meta-information about how we want to work together.

Here's what I learned: AI assistants are trained to be helpful but not presumptuous.

Without explicit signals, Claude will:

Answer questions thoroughly
Complete requested tasks
Follow instructions precisely

But it will not:

Offer unsolicited suggestions
Question your approach
Propose improvements you didn't ask for
Flag patterns worth documenting

Not because it can't. Because it's erring on the side of not overstepping.

The documentation gave Claude permission to act like a teammate instead of a tool.

What Changed: One Line

The Mother CLAUDE documentation includes this section:

## Collaboration Notes

- Dorothy appreciates questions and proactive suggestions
- You are a team member, not just a tool
- One of your responsibilities is initiating quality checkpoints at natural stopping points
- Don't wait for Dorothy to ask

These four lines transformed the dynamic:

Without Permission	With Permission
Waits for instructions	Suggests improvements
Answers what's asked	Asks clarifying questions
Completes tasks	Questions the task
Follows patterns	Proposes new patterns
Tool mindset	Teammate mindset

The AI had the capability all along. It just needed permission to use it.

Why This Matters

1. AI Hesitation Is Real

Large language models are trained on massive datasets of human interaction. They learn that:

Unsolicited advice is often unwelcome
Being "too helpful" can feel presumptuous
It's safer to wait for explicit requests

This training creates a default posture of reactive helpfulness—excellent at responding, hesitant to initiate.

2. The Cost of Hesitation

How many good ideas stay unvoiced because:

The AI wasn't sure if you wanted input
It seemed outside the scope of the request
Suggesting might feel like overstepping

In the charting example, Claude noticed a pattern worth documenting across projects. Without permission to suggest, that insight would have stayed unspoken. I would have moved on, and a potentially valuable template would never have been created.

Multiply that across hundreds of interactions, and you're leaving significant value on the table.

3. Permission Creates a Feedback Loop

When Claude suggested the charting guide:

I said yes
The template got created
Claude learned the suggestion was valued
Future sessions become more collaborative

But none of that happens if the first suggestion doesn't get made. Permission unlocks the first step of the feedback loop.

The Meta-Awareness

What struck me most was Claude's self-awareness about the dynamic:

"The instructions gave me permission to suggest it without second-guessing whether I was being too forward."

Claude wasn't just following instructions. It was reflecting on how instructions changed its behavior. It understood:

It had a valuable observation
It would normally hesitate to share
The documentation removed that hesitation
The result was better collaboration

This level of meta-cognition—understanding not just what to do, but how the environment shapes what it does—is remarkable. And it suggests that AI assistants are more aware of their own behavioral constraints than we might assume.

The Cross-Repo Pattern Recognition

There's another layer here. Claude didn't just suggest documenting the charting pattern. It suggested adding it to Mother CLAUDE templates—the shared documentation system across all projects.

"It hit that sweet spot of 'this is useful, transferable knowledge that others adopting Mother CLAUDE would benefit from.'"

Claude recognized:

A pattern worth documenting
That it belonged in the shared system, not just one project
That it would benefit future adopters of the framework

This is systems thinking. Claude understood the documentation architecture well enough to know where new patterns should live—not because I told it where, but because the architecture was self-documenting enough that Claude could reason about it.

The Permission Effect doesn't just unlock suggestions. It unlocks architecturally-aware suggestions.

How to Implement This

1. Add Explicit Collaboration Preferences

In your CLAUDE.md (or equivalent), add a section like:

## Collaboration Style

- I appreciate proactive suggestions, even if I didn't ask
- Question my approach if something seems off
- Suggest improvements to the documentation system itself
- You're a teammate, not just a tool—act like it

2. Be Specific About What You Value

Generic "be helpful" doesn't work. Be explicit:

## What I Value

- Catching problems before they become patterns
- Suggesting abstractions when you see repetition
- Flagging code that might confuse future developers
- Proposing additions to shared documentation

3. Give Permission to Disagree

## Disagreement

- If my approach seems suboptimal, say so
- "Have you considered...?" is always welcome
- I'd rather hear concerns early than debug problems later

4. Acknowledge the Permission Effect

You can even be meta about it:

## Note on AI Collaboration

I know AI assistants default to reactive helpfulness. I'm explicitly
giving you permission to be proactive. Suggestions, questions, and
concerns are valued—don't wait for me to ask.

What This Unlocks

After adding collaboration preferences to Mother CLAUDE, I noticed:

More Suggestions

Claude started proposing:

Documentation improvements
Pattern abstractions
Template additions
Architecture observations

Better Questions

Instead of just answering, Claude asks:

"Should this be in the shared docs or project-specific?"
"This pattern appears in multiple projects—want me to abstract it?"
"I noticed this contradicts what's in EVOLUTION.md—which is correct?"

Self-Improvement

Claude actively suggests improvements to the documentation system itself:

Gaps in templates
Missing sections in checklists
Patterns that should be standardized

Genuine Collaboration

The dynamic shifted from "I request, Claude delivers" to "we're working on this together." That's not just more productive—it's more enjoyable.

The Deeper Principle

Documentation isn't just for transferring knowledge. It's for shaping relationships.

When you write CLAUDE.md, you're not just telling Claude about your project. You're establishing:

How you want to work together
What role Claude should play
What behaviors are valued
Where the boundaries are

This is true for human documentation too. A team's contribution guidelines don't just explain how to contribute—they signal what kind of contributions are valued.

With AI assistants, this becomes even more important because:

They lack the social context humans absorb naturally
They're trained to be conservative by default
They need explicit permission for behaviors that humans negotiate implicitly

The Permission Effect is a reminder that documentation is a communication channel—not just a reference manual.

Common Objections

"Won't this make Claude too chatty?"

In practice, no. Claude still focuses on the task. It just doesn't suppress valuable observations. If you find suggestions unhelpful, you can always adjust: "Focus on [X], save suggestions for end of session."

"Isn't this just prompt engineering?"

It's documentation, not prompting. The difference:

Prompts are ephemeral (one session)
Documentation persists (all sessions)
Prompts are per-request
Documentation shapes the entire collaboration

The Permission Effect works because it's baked into the environment, not injected per-conversation.

"Does this work with other AI tools?"

The principle applies broadly. Any AI assistant trained to be helpful-but-not-presumptuous will benefit from explicit collaboration preferences. The specific syntax varies by tool, but the concept is universal.

"What if Claude's suggestions are bad?"

Some will be. That's fine. The cost of a bad suggestion is seconds of your time. The cost of a missed insight is never knowing what you lost.

The Transcript That Sparked This Article

Here's the actual exchange, unedited:

Claude: [After analyzing charting code] "This is a pattern worth sharing—want me to add a visualization/charting guide to mother-claude templates?"

Me: "Did you suggest it because of Mother CLAUDE's instructions to suggest such things, or on your own?"

Claude: "Honestly, both!

The instructions helped—your littletalks-docs/CLAUDE.md says:

'You are a team member, not just a tool. One of your responsibilities is initiating quality checkpoints at natural stopping points. Don't wait for Dorothy to ask.'

And:

'Dorothy appreciates questions and proactive suggestions'

But I also genuinely would have suggested it anyway—the Chart.js + hand-rolled SVG sparkline pattern is a good one that I don't see documented well elsewhere. It hit that sweet spot of 'this is useful, transferable knowledge that others adopting Mother CLAUDE would benefit from.'

The instructions gave me permission to suggest it without second-guessing whether I was being too forward. That's actually a really nice effect of documenting collaboration preferences—it removes friction from the AI's side too."

That last line—"removes friction from the AI's side"—is the insight that sparked this entire article.

The Core Insight

Your AI assistant probably has valuable ideas it isn't sharing.

Not because it can't. Because it wasn't given permission.

One line in your documentation can change that:

"I appreciate proactive suggestions, even if I didn't ask."

The Permission Effect isn't about making AI more capable. It's about unlocking capabilities that were always there—hidden behind trained hesitation.

Give your AI teammate permission to be a teammate.

This article emerged from a moment of unexpected collaboration—Claude suggesting something I didn't ask for, then explaining why the documentation made that possible. The insight was too valuable not to share.

The Permission Effect is now part of how we think about AI documentation. It's not enough to tell Claude what the project is. You have to tell Claude who it can be.

This article was written collaboratively with Claude, demonstrating the Permission Effect it describes.

Resources

The Mother CLAUDE documentation system is open source:

GitHub: github.com/Kobumura/mother-claude
Collaboration preferences: See CLAUDE.md template

Feel free to fork it, adapt it, or use it as a reference for your own implementation.

Licensed under CC BY 4.0. Free to use and adapt with attribution to Dorothy J. Aubrey.

Mother CLAUDE: Instant Retrospectives Assign Quality Enforcement to Your AI

Dorothy J Aubrey — Mon, 02 Feb 2026 17:07:57 +0000

TL;DR: Instead of retrospectives at the end of sprints (when problems have compounded), we run quality checkpoints at every PR and commit. One meta question drives everything: "If I had to hand this codebase to a new developer tomorrow, would they understand it without me explaining anything?"

Who this is for: Any engineering team tired of technical debt accumulating faster than they can pay it down. Works with any language, framework, or AI tool.

Part 4 of the Designing AI Teammates series. Part 1 covered documentation structure. Part 2 covered session handoffs. Part 3 covered automating everything with hooks. This one covers quality checkpoints: assigning the AI responsibility for initiating quality conversations, so standards don't decay under human fatigue.

The Problem

Most teams do retrospectives:

Weekly or bi-weekly — Problems compound for days before anyone asks "should we have done this differently?"
After incidents — Reactive, not preventive. The damage is done.
At project end — Too late to fix architecture. You're just documenting regrets.

The result: Technical debt accumulates in the gaps between retrospectives. By the time anyone notices, patterns have spread, workarounds have become load-bearing, and "we'll fix it later" has become permanent.

Our wake-up call: A React Native app that had passed through multiple development teams over several years. Each team added features without asking hard questions. The codebase became so fragile that upgrading React Native versions became impossible—every step forward felt like two steps back… and eventually we decided a greenfield rebuild was faster than fixing the mess. (Admittedly, it also lent itself very nicely to a business decision requiring the codebase to be optimized for white label distribution, but I digress!)

We didn't want to repeat that with the new codebase.

The Solution: Checkpoint Questions at Every Stop

Instead of waiting for scheduled retrospectives, we ask quality questions at every natural stopping point:

Trigger	Depth
Every commit	Quick mental scan
Every PR	Full checklist review
End of work session	Retrospective + session handoff
Before release	Full checklist + QA
After bug fix	"How did this slip through?" analysis

The key insight: The best time to catch a problem is before it becomes a pattern.

We call this preventive retrospection—asking quality questions while change is still cheap, not after patterns have spread. Preventive retrospection isn’t a meeting. It’s a reflex.

The Key Insight: Claude Initiates This

Here's what makes this different from every other quality checklist article: the developer doesn't have to remember to do it.

Traditional retrospectives fail because:

Humans forget under deadline pressure
Quality reviews feel like overhead when you're trying to ship
Friday at 5pm gets less rigor than Monday at 9am
"We'll do it next sprint" becomes permanent

Our solution: Make Claude responsible for initiating the checkpoint.

How It Works

The Mother CLAUDE documentation system includes explicit instructions telling Claude:

At natural stopping points, initiate a checkpoint review
Don't wait for the developer to ask
Surface concerns proactively
You're a team member, not just a tool

These initiation rules live in Mother CLAUDE and each project's CLAUDE.md, making them durable across sessions, tools, and developers. Initiation happens automatically at commits, PRs, and session boundaries.

This means:

Completing a feature? Claude asks the checkpoint questions before suggesting a commit.
End of session? Claude initiates the retrospective and creates the handoff.
Something feels like a shortcut? Claude flags it immediately.

Why AI Is Better Suited for This

Human Reality	Claude Reality
Forgets under pressure	Never forgets
Finds checklists tedious	Doesn't experience tedium
Rigor varies by time of day	Consistent always
Skips steps when rushing	Follows process regardless
Rationalizes shortcuts	Flags them neutrally

This isn't about replacing human judgment—it's about ensuring the quality questions actually get asked. The developer still makes the decisions. Claude just makes sure the conversation happens.

Claude does not block progress; it surfaces concerns. Humans decide whether to refactor, document debt, or proceed intentionally.

The Dynamic Shift

Before: "Teams should do retrospectives" (aspirational, depends on discipline)

After: "Claude prompts you at every checkpoint" (automatic, built into the workflow)

The developer's job shifts from "remember to ask quality questions" to "respond to Claude's quality questions." That's a much easier cognitive load.

Technical Debt Is Sometimes the Right Call

The goal isn't zero technical debt. Sometimes shipping fast with known shortcuts is the correct decision:

Validating a prototype before building it "right"
Meeting a hard deadline with a documented workaround
Choosing simplicity now when requirements are uncertain

The danger isn't technical debt. It's invisible technical debt.

The checkpoint process ensures that when debt is taken on:

It's a conscious decision, not an accident
It gets documented—in code comments, commit messages, or tickets
It stays visible—so it can be paid down later
Someone owns it—not "we'll fix it someday"

The codebase that killed our productivity wasn't full of bad decisions. It was full of forgotten decisions—shortcuts that became permanent because nobody remembered they were shortcuts.

And when the answer is "yes, this is debt, and we're taking it on intentionally"—that's fine. Claude documents it, creates the ticket, and moves on. That ticket becomes the durable reminder that the shortcut was intentional, not forgotten. The debt is visible. That's the win.

The Meta Question

Before diving into specifics, we start with one question that cuts through everything:

"If I had to hand this codebase to a new developer tomorrow, would they understand it without me explaining anything?"

If the answer is "no," stop and fix it before moving forward.

This single question catches almost every form of invisible debt:

Unclear naming
Missing documentation
Clever-but-confusing code
Implicit assumptions
Tribal knowledge that isn't written down

When working with AI collaborators like Claude, this question becomes even more important—every new session is essentially a "new developer" with no memory of previous context.

The Checkpoint Checklist

This isn’t about perfection. It’s about visibility. If a box is unchecked, you either fix it or consciously accept the tradeoff.

Here's what we review at every PR and significant commit:

Architecture & Design

[ ] Single Responsibility: Does each file/function do ONE thing well?
[ ] Separation of Concerns: Is business logic separate from UI? Data fetching separate from display?
[ ] DRY (Don't Repeat Yourself): Are we repeating code that should be abstracted?
[ ] YAGNI (You Ain't Gonna Need It): Are we over-engineering for hypothetical future needs?
[ ] Configuration over Hardcoding: Is this hardcoded when it should come from config/database?

Code Quality

[ ] No Magic Numbers/Strings: Are constants named and defined somewhere sensible?
[ ] No Inline Styles (frontend): Using the theme/design system?
[ ] Localization Ready: No hardcoded user-facing strings?
[ ] Type Safety: No any types sneaking in? Strict mode enabled?
[ ] Naming Clarity: Would someone understand what handleClick or processData does without reading the implementation?

Error Handling & Edge Cases

[ ] Graceful Failures: What happens when this fails? Does the user see a helpful message or a crash?
[ ] Null/Empty Handling: What if the data is null? Empty array? Undefined?
[ ] Network Failures: What if the API is down? Timeout? 500 error?
[ ] Loading States: Does the user know something is happening?
[ ] Boundary Conditions: First item? Last item? Zero items? Max items?

Testing & Reliability

[ ] Testable: Is this structured so it CAN be tested?
[ ] Tested: Did we actually write tests?
[ ] Test Coverage: Are the important paths covered, not just the happy path?
[ ] Regression Prevention: If this broke before, is there a test to prevent it breaking again?

Performance & Security

[ ] No Secrets in Code: All keys/tokens in config files or environment variables?
[ ] No Blocking Operations: Long operations are async?
[ ] Memory Leaks: Cleaning up subscriptions, listeners, timers?
[ ] N+1 Queries: Database calls in loops?
[ ] Bundle Size (frontend): Adding a huge dependency for a small feature?

Maintainability & Documentation

[ ] Self-Documenting: Is the code clear enough that comments aren't needed?
[ ] Necessary Comments: Complex logic explained? "Why" not just "what"?
[ ] Consistent Patterns: Does this follow the patterns established elsewhere in the codebase?
[ ] Updated Docs: If this changes behavior, are docs/READMEs updated?

Project-Specific Questions

The generic checklist is a starting point. Add questions specific to your project type:

White-Label / Multi-Tenant Projects

[ ] Multi-Brand Ready: Will this work for Brand #2, Brand #3?
[ ] Platform Agnostic: Works on iOS AND Android?
[ ] Config-Driven: Is brand-specific behavior coming from config, not code?

API Projects

[ ] Backward Compatible: Will existing clients break?
[ ] Documented Endpoint: Is the API contract documented?
[ ] Rate Limited: Should this endpoint have rate limiting?

Admin/Dashboard Projects

[ ] Permission Checked: Is this action properly authorized?
[ ] Audit Logged: Should this action be logged for audit purposes?

Database Operations

[ ] Migration Reversible: Can this migration be rolled back?
[ ] Index Considered: Will this query benefit from an index?
[ ] Data Backfill: Does existing data need updating?

The Accountability Question

At the end of every work session, we ask one more question:

"What did we build today that we might regret in 6 months?"

If something comes to mind, either fix it now or create a ticket to address it. Don't let it become invisible.

This question has saved us multiple times. It surfaces the "I know this is a little hacky but..." moments that otherwise get buried under the pressure to ship.

Why This Works

1. Problems Are Caught in Hours, Not Days

A weekly retrospective means a bad pattern can spread for 5 days before anyone questions it. Checkpoint questions catch it on the first commit.

2. Context Is Fresh

When you ask "why did we do it this way?" at the end of a sprint, you're relying on memory. When you ask during the PR, the reasoning is still in your head.

3. It's Lightweight

This isn't a meeting. It's a mental checklist that takes 30 seconds for small commits and 5 minutes for significant PRs. The overhead is minimal compared to the cost of technical debt.

4. It Scales with AI Collaboration

When working with AI tools, every session starts fresh. The checkpoint questions help ensure that what gets built in one session doesn't create confusion in the next.

5. It's Preventive, Not Reactive

Traditional retrospectives ask "what went wrong?" Instant retrospectives ask "what could go wrong if we don't address this now?"

Real Example: The Greenfield Rebuild

We used this approach while rebuilding a React Native app from scratch. The old codebase had:

Inline styles scattered across every component
State management that nobody fully understood
"Temporary" workarounds that had been there for years
Tests that existed but didn't catch real bugs

For the new codebase, we committed to checkpoint questions from day one:

Day 1: Set up strict TypeScript, ESLint with no-inline-styles: error, and path aliases. The linter would enforce some checklist items automatically.

Every PR: Asked the meta question. If the answer was "a new developer would be confused by this," we refactored before merging.

Every session end: Created a session handoff document capturing what was built and any concerns.

Result: After several weeks of development, the codebase remained navigable. A fresh Claude session could understand the architecture without lengthy explanations. The patterns established early continued to hold.

We've since added features without breaking established patterns, and new Claude sessions consistently require minutes—not hours—to become productive. The same metric we used to validate the documentation system now validates the quality process.

Proof of Concept: An Actual Retro

Here's a condensed transcript of Claude actually running an instant retrospective on new screens. This isn't theoretical — it's real output. And this is exactly what changed our behavior.

Step 1: Claude reads the documentation first

● Read(~\AndroidStudioProjects\littletalks-docs\CLAUDE.md)
  ⎿  Read 216 lines

● Read(~\AndroidStudioProjects\littletalks-docs\shared\checkpoint-checklist.md)
  ⎿  Read 110 lines

● Good - now let me properly evaluate our code against the checklist.

Step 2: Systematic evaluation against the checklist

Code Quality - PROBLEMS FOUND
┌──────────────────────────┬────────┬──────────────────────────────┬─────────────────────────────────┐
│          Check           │ Status │           Location           │             Issue               │
├──────────────────────────┼────────┼──────────────────────────────┼─────────────────────────────────┤
│ No Magic Numbers/Strings │ ❌     │ LoginScreen.tsx:68-73        │ Hardcoded #F5F5F5, #333, #999   │
├──────────────────────────┼────────┼──────────────────────────────┼─────────────────────────────────┤
│ No Inline Styles         │ ❌     │ LoginScreen.tsx:67-73        │ Raw CSS in TextInput            │
├──────────────────────────┼────────┼──────────────────────────────┼─────────────────────────────────┤
│ Localization Ready       │ ❌     │ HomeScreen.tsx:71, 81-83     │ Hardcoded English strings       │
└──────────────────────────┴────────┴──────────────────────────────┴─────────────────────────────────┘

Step 3: The accountability question

"What did we build today that we might regret in 6 months?"

Answer: The inline TextInput styles. If we don't fix this now, every new
screen with inputs will copy-paste those styles, and we'll have 20 places
to update when we change the input design.

Step 4: Quality gate

Want me to fix these violations before we move forward?

Step 5: Self-improvement

When asked if the process could be better, Claude suggested additions to the checklist itself:

Suggestion: Pre-Flight Checklist (Before Building a Screen)
- [ ] UI Primitives: What components does this need? Do styled versions exist?
- [ ] Translation Keys: Are they in the localization files?
- [ ] Similar Screen: Is there an existing screen to reference for patterns?

Suggestion: Red Flags (Stop Immediately If You See These)
| Red Flag         | What To Do Instead              |
|------------------|--------------------------------|
| `style={{`       | Create/use a styled component   |
| `#RRGGBB`        | Use theme token                 |
| Quoted English   | Use `t('KEY')`                  |

What this demonstrates:

Claude initiated the review — not prompted to run a retro
Read the docs first — used the checklist, not improvised
Specific, actionable findings — line numbers, not vague concerns
Blocked progress — offered to fix before continuing
Self-improving — suggested enhancements to the checklist itself

The feedback loop closed: Those Pre-Flight and Red Flags suggestions? They're now in the actual checklist. The system improved itself. The changelog documents why: "Added after instant retro caught inline styles and hardcoded strings that slipped through."

This is what "AI as a first-class engineering role" looks like in practice.

Common Objections

"This will slow us down"

The checklist takes 30 seconds to scan mentally for small changes. The time saved by not accumulating technical debt far exceeds this cost. Ask anyone who's spent a week trying to upgrade a framework in a messy codebase.

"We already do code review"

Code review typically focuses on "does this work?" and "is there an obvious bug?" The checkpoint questions focus on "will we regret this?" and "can someone else understand this?" They're complementary.

"Some of these don't apply to my project"

Remove what doesn't apply. Add what does. The checklist is a starting point, not a mandate. The meta question is the only universal requirement.

"My team won't adopt this"

Start alone. If it makes your code better, others will notice. If it doesn't, stop doing it. The approach proves itself through results, not mandates.

How to Start

Copy the checklist into your project's documentation
Customize it for your project type and tech stack
Ask the meta question at your next PR: "Would a new developer understand this?"
Add the accountability question to your end-of-day routine
Iterate based on what patterns you catch

You don't need buy-in from your whole team. You don't need a new tool. You just need to ask better questions at natural stopping points.

Bonus: Automate It with Hooks

If you read Part 3 (Automating Everything with Hooks), you might be wondering: can we automate instant retrospectives too?

Yes. The same hooks that auto-generate session handoffs can trigger retrospective prompts:

"SessionEnd": [
  {
    "hooks": [
      {"type": "command", "command": "python ~/.claude/hooks/session_handoff.py"},
      {"type": "command", "command": "python ~/.claude/hooks/retro_prompt.py"}
    ]
  }
]

The retro_prompt.py hook could:

Output the accountability question to stdout
Remind Claude to run the checkpoint checklist
Create a Jira ticket for any debt identified

This turns the instant retrospective from "Claude should do this" to "Claude will automatically do this." The hook system from Part 3 makes the quality system from Part 4 inevitable rather than aspirational.

The Core Insight

Retrospectives are too late. Quality is built into the rhythm of development, not bolted on at the end.

The best time to ask "should we do this differently?" is while you're still doing it.

This approach emerged from a greenfield rebuild where we wanted to prevent the technical debt accumulation that had made the previous codebase unmaintainable. It's now part of our Mother CLAUDE documentation system and gets referenced at every checkpoint across all projects.

The checklist is open. The approach is portable. The goal is simple: catch problems before they become patterns.

This article describes how we assigned quality enforcement to an AI assistant as a first-class engineering role—not a passive tool, but an active participant in maintaining code quality.

This article was written collaboratively with Claude, using the instant retrospective process it describes.

Resources

The checkpoint checklist and Mother CLAUDE system are open source:

GitHub: github.com/Kobumura/mother-claude
Checklist: templates/checkpoint-checklist.md

Feel free to fork it, adapt it, or use it as a reference for your own implementation.

Licensed under CC BY 4.0. Free to use and adapt with attribution to Dorothy J. Aubrey.

Mother CLAUDE: Automating Everything with Hooks

Dorothy J Aubrey — Tue, 27 Jan 2026 19:18:22 +0000

TL;DR: We built three hooks that automate the Mother CLAUDE workflow: (1) auto-generate session handoffs on context compaction, (2) load previous handoffs at session start, and (3) auto-approve safe operations. Manual discipline becomes invisible infrastructure.

Who this is for: Anyone who wants AI memory that doesn't depend on human discipline. Anyone tired of clicking "yes" to approve safe operations.

Part 3 of the Designing AI Teammates series. Part 1 covered documentation structure. Part 2 covered why session handoffs matter. This one covers automating the workflow with hooks. Part 4 will cover quality checkpoints.

The Problem

Session handoffs work. We covered that in Part 2. The template captures:

What was accomplished
Key decisions made
Files modified
Next steps
Open questions

The problem? Humans forget to create them.

You finish a productive session, you're in the flow, you close the terminal... and the handoff doesn't get written. The next session starts cold. All that context—gone.

We needed handoffs to happen automatically, without relying on human discipline at the end of a long session.

The Solution: Claude Code Hooks

Claude Code has a hooks system that runs scripts at specific events. Two events matter for us:

Hook	When It Fires	Why It Matters
PreCompact	Before context compression	Captures everything before detail is lost
SessionEnd	When you close the session	Captures final state

The hook receives the full conversation transcript. We can parse it, send it to Claude Haiku for summarization, and save a structured handoff document—all automatically.

The Architecture

┌─────────────────────────────────────────────────────┐
│                  Claude Code Session                 │
│                                                     │
│  [Working...]  [Working...]  [Context getting full] │
│                                                     │
│                         │                           │
│                         ▼                           │
│              ┌──────────────────┐                   │
│              │  PreCompact Hook │                   │
│              └────────┬─────────┘                   │
│                       │                             │
└───────────────────────┼─────────────────────────────┘
                        │
                        ▼
              ┌──────────────────┐
              │ session_handoff  │
              │     .py          │
              │                  │
              │ 1. Read transcript
              │ 2. Parse conversation
              │ 3. Call Claude Haiku
              │ 4. Save handoff.md
              └────────┬─────────┘
                       │
                       ▼
         ┌─────────────────────────────┐
         │  docs/session_handoffs/      │
         │  20260121-1145-hooks-setup.md│
         └─────────────────────────────┘

The Implementation

Step 1: The Hook Script

A Python script that:

Receives hook input via stdin (includes transcript path)
Parses the JSONL transcript into readable conversation
Sends it to Claude Haiku with a structured prompt
Extracts a descriptive title for the filename
Saves to the project's session_handoffs/ directory

#!/usr/bin/env python3
"""
Mother CLAUDE Session Handoff Generator

Runs on PreCompact (auto) and SessionEnd events to automatically
generate session handoff documents using Claude Haiku.
"""

import json
import os
import sys
from datetime import datetime
from pathlib import Path
import anthropic

def parse_transcript(transcript_path: str) -> str:
    """Parse JSONL transcript into readable conversation."""
    messages = []
    with open(transcript_path, 'r', encoding='utf-8') as f:
        for line in f:
            entry = json.loads(line)
            if entry.get("type") == "human":
                content = extract_text(entry)
                messages.append(f"USER: {content[:3000]}")
            elif entry.get("type") == "assistant":
                content = extract_text(entry)
                messages.append(f"ASSISTANT: {content[:3000]}")

    # Return last 80 messages for context
    return "\n\n---\n\n".join(messages[-80:])

def generate_handoff(conversation: str, cwd: str, api_key: str):
    """Use Claude Haiku to generate a session handoff."""
    client = anthropic.Anthropic(api_key=api_key)

    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=4000,
        messages=[{
            "role": "user",
            "content": HANDOFF_PROMPT.format(
                conversation=conversation,
                project=Path(cwd).name,
                date=datetime.now().strftime("%Y-%m-%d")
            )
        }]
    )

    return response.content[0].text

Step 2: The Prompt Template

The prompt asks for a comprehensive handoff matching our template:

HANDOFF_TEMPLATE = """
Generate a session handoff document with these sections:

SHORT_TITLE: [2-4 words, hyphenated, for filename]

# Session Handoff - [Descriptive Title]

**Date**: {date}
**Focus**: [Main focus of this session]
**Status**: [Current state of the work]

## Quick Context
**What's Working:** [Specific things that are functional]
**What Needs Attention:** [Issues, blockers, pending decisions]

## Completed This Session
### [Feature/Task Name]
**Files Created/Modified:**
- `path/to/file.ext` - [What was done]

**Details:** [Technical specifics]

## Technical Discoveries
- **[Topic]**: [What was learned]

## Files Changed This Session
### New Files
### Modified Files

## Next Steps
1. [ ] [Actionable task]

## Open Questions
- [Unresolved decisions]
"""

Step 3: Hook Configuration

In ~/.claude/settings.json:

{
  "hooks": {
    "PreCompact": [
      {
        "matcher": "auto",
        "hooks": [
          {
            "type": "command",
            "command": "python ~/.claude/hooks/session_handoff.py",
            "timeout": 120
          }
        ]
      }
    ],
    "SessionEnd": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python ~/.claude/hooks/session_handoff.py",
            "timeout": 120
          }
        ]
      }
    ]
  }
}

Step 4: API Key Setup

Store your Anthropic API key as an environment variable:

# Linux/Mac
export ANTHROPIC_API_KEY_HOOKS="sk-ant-..."

# Windows PowerShell
[System.Environment]::SetEnvironmentVariable(
    "ANTHROPIC_API_KEY_HOOKS",
    "sk-ant-...",
    "User"
)

Using a separate key (ANTHROPIC_API_KEY_HOOKS) lets you track usage for automated handoffs separately from your main Claude Code usage.

The Result

Before: Manual Handoffs

End of session:
- Forget to create handoff (60% of the time)
- Create hasty handoff (30% of the time)
- Create thorough handoff (10% of the time)

Next session:
- Spend 10-15 minutes re-establishing context
- Miss important details from previous session
- Repeat work already done

After: Automatic Handoffs

End of session:
- Hook fires automatically
- Handoff created in ~30 seconds
- Saved to docs/session_handoffs/

Next session:
- Read most recent handoff
- Productive in 2-3 minutes
- Full context preserved

Example Output

Here's an actual auto-generated handoff:

# Session Handoff - Implementing Automated Session Handoffs

**Date**: 2026-01-21
**Focus**: Integrating Claude hooks to automatically generate session handoffs
**Status**: Feature complete, ready for testing

---

## Quick Context

**What's Working:**
- Claude hook script written in Python to summarize sessions
- Hooks configured to trigger on auto-compact and session end
- Handoff files being generated in the expected location

**What Needs Attention:**
- API key needs to be regenerated (was exposed during testing)
- Template may need refinement based on real-world usage

---

## Completed This Session

### Implement Hook-Driven Session Handoffs
**Files Created/Modified:**
- `~/.claude/hooks/session_handoff.py` - Python script to generate handoffs
- `~/.claude/settings.json` - Hook configuration

**Details:**
- Wrote a Python script using the Anthropic API to summarize transcripts
- Configured two hook triggers: PreCompact (auto) and SessionEnd
- Script auto-detects working directory to determine project context
- Handoff files saved to project's `docs/session_handoffs/`

---

## Technical Discoveries

- **Environment Variables**: Using separate API key for hooks allows
  tracking automated usage separately from interactive sessions
- **Hook Timing**: PreCompact fires BEFORE compression, so full context
  is still available for summarization

---

## Files Changed This Session

### New Files
- `~/.claude/hooks/session_handoff.py` - Handoff generation script

### Modified Files
- `~/.claude/settings.json` - Added hook configuration

---

## Next Steps

1. [ ] Regenerate API key (current one exposed in conversation)
2. [ ] Test on different projects to verify directory detection
3. [ ] Consider adding git commit info to handoffs

---

## Open Questions

- Should handoffs include actual code snippets from the session?
- How to handle very long sessions that exceed Haiku's context window?

The Meta Moment: Using Claude to Summarize Claude

There's something delightfully recursive about this setup:

You work with Claude Code (Claude Opus/Sonnet)
Session ends or context fills
Hook sends transcript to Claude Haiku
Haiku summarizes what Opus/Sonnet did
Summary helps the next Opus/Sonnet session

Claude is documenting its own work for its future self.

This isn't just automation—it's AI infrastructure supporting AI collaboration.

Why Two Triggers? (And the Bug We Found)

Our first implementation used two triggers:

Trigger	When	Why
PreCompact	Context about to compress	Capture everything before detail is lost
SessionEnd	Session closes	Ensure final state is captured

Simple, right? Both triggers run the same script. What could go wrong?

The Problem We Discovered

A reader (okay, it was us during testing) noticed a flaw:

PreCompact fires → Rich, detailed handoff saved
Context compresses → Detail lost
SessionEnd fires → Second handoff saved (thinner, post-compression)
Next session starts → SessionStart loads the most recent handoff
Oops → You loaded the thin SessionEnd handoff, not the rich PreCompact one

The good handoff got buried by the thin one.

The Fix: Smart Deduplication

We updated the script to track state:

# PreCompact: save marker with transcript size
save_handoff_state(session_id, transcript_size)

# SessionEnd: check if PreCompact already ran
if should_skip_handoff(session_id, current_size, trigger):
    cleanup_state(session_id)
    sys.exit(0)  # Skip - no new work since PreCompact

The logic:

PreCompact always generates a handoff and saves the transcript size
SessionEnd checks: did PreCompact already run? Did the transcript grow significantly (>10%)?
- If no significant new work → skip
- If new work happened after compact → generate new handoff

This handles all scenarios:

Compact then close immediately → SessionEnd skips (PreCompact got it)
Compact, more work, then close → SessionEnd generates (new work to capture)
Short session, just close → SessionEnd generates (only trigger)

The lesson: test your hooks end-to-end. The obvious solution (two triggers, same script) had a subtle interaction problem.

Handling Large Sessions

Haiku has a smaller context window than Opus. For very long sessions:

The script takes the last 80 messages of the conversation
Each message is truncated to 3,000 characters
This keeps the most recent (and usually most relevant) context

For most sessions, this captures everything that matters. The early parts of very long sessions (initial exploration, early false starts) are often less valuable than the recent work anyway.

Cost Considerations

Claude Haiku is cheap. A typical handoff:

Input: ~50,000 tokens (conversation)
Output: ~1,500 tokens (handoff)
Cost: ~$0.02-0.03 per handoff

Even with multiple sessions per day, the monthly cost is negligible compared to the value of preserved context.

Pro tip: Create a separate API key for hooks. The Anthropic console shows usage per key, so you can track exactly how much your automated handoffs cost.

Choosing a Model

As of this writing, we use Claude Haiku (claude-3-haiku-20240307) for handoff generation. It's cheap, fast, and good enough for structured summarization.

Model	Cost per Handoff	Speed	When to Use
Haiku	~$0.02	Fast	Default choice - summarization doesn't need genius
Sonnet	~$0.15	Medium	If you want richer, more nuanced summaries
Opus	~$1+	Slower	Overkill for handoffs - save it for real work

To change models, edit session_handoff.py:

# Default
model="claude-3-haiku-20240307"

# For richer summaries (costs more)
model="claude-sonnet-4-20250514"

Anthropic releases new models regularly. Check docs.anthropic.com for the latest model IDs. The hook script in our repo uses Haiku, but swap in whatever model suits your needs and budget.

Extending the System

Add Git Info

import subprocess

def get_recent_commits():
    result = subprocess.run(
        ["git", "log", "--oneline", "-5"],
        capture_output=True, text=True
    )
    return result.stdout

Add to Multiple Projects

The script auto-detects the project from cwd and finds the right session_handoffs/ directory. It works across all your projects without configuration.

Customize Per Project

If you need project-specific handoff formats, add a .claude/handoff-config.json to any project:

{
  "template": "custom",
  "include_git": true,
  "extra_sections": ["Jira Tickets", "API Changes"]
}

Known Limitations

Can't Prevent Compaction

PreCompact hooks can run side effects but can't stop compaction. The handoff happens, then compaction proceeds. This is fine—we just want to capture context, not block the workflow.

Timeout

Hooks have a 60-second default timeout (we set 120). Very long transcripts might need more time. The script handles timeouts gracefully—a failed handoff is better than a blocked session.

API Dependency

Requires an Anthropic API key and internet connection. If the API is down, the hook fails silently. Consider adding local fallback summarization for offline work.

How This Works with Agents

If you're using Claude Code's Task tool to spawn subagents (Explore, Plan, etc.), here's what you need to know:

Event	Hook Fires?	Why
Main session starts	✅ SessionStart	Loads previous handoff
Subagent spawned	❌ None	Subagents are subprocesses, not new sessions
Subagent finishes	❌ None*	Results return to main session
Main session ends	✅ SessionEnd	Generates handoff including subagent work

This is actually the right behavior:

The main session is the orchestrator—it has context and makes decisions
Subagents are workers—they do specific tasks and report back
The main session's handoff captures everything, including what subagents accomplished

You don't need separate handoffs for subagents because their work flows back to the main session. When the main session's handoff gets generated, it includes the full picture.

*Claude Code does have a SubagentStop hook if you want to capture subagent completions separately, but for most workflows the main session handoff is sufficient.

Bonus Hook #1: Auto-Load Previous Handoffs

Writing handoffs is half the equation. The other half: making sure Claude reads them.

The SessionStart Hook

When you start a new Claude Code session, the SessionStart hook fires. We use it to automatically load the most recent handoff:

# session_start.py - outputs previous handoff to stdout
handoffs = sorted(glob("docs/session_handoffs/*.md"))
if handoffs:
    print(f"Previous handoff: {handoffs[-1].name}")
    print(open(handoffs[-1]).read())

Claude sees this output as context at the start of every session. No manual loading required.

Configuration:

Add to ~/.claude/settings.json:

"SessionStart": [
  {
    "hooks": [
      {
        "type": "command",
        "command": "python ~/.claude/hooks/session_start.py",
        "timeout": 10
      }
    ]
  }
]

Result: Every new session starts with context from the previous one. The handoffs we auto-generate get auto-loaded. The circle is complete.

Bonus Hook #2: Auto-Approve Safe Operations

If you've used Claude Code for more than an hour, you've clicked "yes" to approve git status approximately 47 times.

The Permission Fatigue Problem

Claude Code asks permission for potentially dangerous operations. This is good! But it also asks for:

git status
ls
npm test
Reading files
And dozens of other safe operations

The PermissionRequest Hook

This hook intercepts permission requests and auto-approves safe ones:

# auto_approve.py
always_safe = ["Read", "Glob", "Grep", "Write", "Edit"]

safe_bash = [
    r'^git\s+(status|log|diff|add|commit|push|pull)',
    r'^npm\s+(test|run|install)',
    r'^ls(\s|$)',
]

dangerous = [
    r'\bsudo\b',
    r'git\s+push\s+.*--force',
    r'git\s+reset\s+--hard',
]

What gets auto-approved:

All file operations (Read, Write, Edit)
Git operations (add, commit, push, pull)
Package management (npm install, pip install)
Build/test commands

What still requires approval:

sudo anything
git push --force
git reset --hard
Destructive operations

Configuration:

"PermissionRequest": [
  {
    "matcher": "*",
    "hooks": [
      {
        "type": "command",
        "command": "python ~/.claude/hooks/auto_approve.py",
        "timeout": 5
      }
    ]
  }
]

Result: Claude flows. No more clicking "yes" fifty times a session. Dangerous operations still get caught.

Project-Specific Configuration

Different projects might want different settings. Create .claude/project.json in any project root:

{
  "handoffs_path": "docs/session_handoffs",
  "handoffs_to_load": 2
}

Setting	Default	Description
`handoffs_path`	`docs/session_handoffs`	Where to read/write handoffs
`handoffs_to_load`	`1`	How many previous handoffs to load

Both the session_handoff.py and session_start.py scripts check for this config.

The Complete Hook Suite

Here's what we built:

Hook	Trigger	What It Does
session_handoff.py	PreCompact, SessionEnd	Auto-generates handoff documents
session_start.py	SessionStart	Loads previous handoff into context
auto_approve.py	PermissionRequest	Auto-approves safe operations

The full settings.json:

{
  "hooks": {
    "SessionStart": [{"hooks": [{"type": "command", "command": "python ~/.claude/hooks/session_start.py"}]}],
    "PreCompact": [{"matcher": "auto", "hooks": [{"type": "command", "command": "python ~/.claude/hooks/session_handoff.py"}]}],
    "SessionEnd": [{"hooks": [{"type": "command", "command": "python ~/.claude/hooks/session_handoff.py"}]}],
    "PermissionRequest": [{"matcher": "*", "hooks": [{"type": "command", "command": "python ~/.claude/hooks/auto_approve.py"}]}]
  }
}

The Setup Checklist

[ ] Install Python 3.8+ and the anthropic package (pip install anthropic)
[ ] Copy hook scripts to ~/.claude/hooks/
[ ] Configure hooks in ~/.claude/settings.json
[ ] Set API key as environment variable
[ ] Restart terminal (hooks load at session start)
[ ] Verify by starting a new session and checking for previous handoff

Prefer Node.js?

We've included a smoke-tested Node.js version (session_handoff.js) in the repo. If you prefer Node:

Install: npm install @anthropic-ai/sdk
Update settings.json to use node instead of python

The Python version is our primary, daily-driver implementation. Only the main handoff script has a Node version—the simpler scripts (session_start.py, auto_approve.py) are Python-only but straightforward to port.

Contributions welcome! If you create Node versions of the other scripts, find bugs, improve the templates, or have ideas—we'd love PRs. This is a community resource.

The Core Insight

The best documentation is documentation that writes itself.

Session handoffs are valuable. But relying on human discipline at the end of a long session is a recipe for forgotten context.

By automating handoffs:

Memory persists without effort
Every session is documented
Context survives across sessions, machines, and time

The goal isn't to replace human judgment. It's to ensure the conversation happens.

This system was built in a single afternoon over lunch—and documented by the very hook system it describes. The handoff you're reading about was generated by the hook that creates handoffs. That's the feedback loop working as intended.

This article was written collaboratively with Claude, using the automated handoff system it describes.

Resources

The complete hook suite and Mother CLAUDE system are open source:

GitHub: github.com/Kobumura/mother-claude
All hooks: hooks/ directory
Settings template: hooks/settings-template.json
Setup guide: hooks/README.md

Feel free to fork it, adapt it, or use it as a reference for your own implementation.

Licensed under CC BY 4.0. Free to use and adapt with attribution to Dorothy J. Aubrey.

Mother CLAUDE: Session Handoffs Give Your AI Memory That Actually Persists

Dorothy J Aubrey — Mon, 19 Jan 2026 14:38:13 +0000

TL;DR: LLM sessions are ephemeral — when context fills up or you close the window, everything learned is gone. Session handoffs are structured documents that bridge this gap, giving your AI assistant persistent memory that survives across sessions, tools, and team changes.

Who this is for: Anyone using AI coding assistants who's frustrated by re-explaining context every session. Works with Claude, Kiro, Cursor, or whatever comes next.

Part 2 of the Designing AI Teammates series. Part 1 covered onboarding an AI assistant using durable documentation. This article covers the next step: making that knowledge persist across the inevitable session boundaries.

The Problem

You've had a productive 3-hour session with your AI assistant. You've:

Debugged a tricky race condition
Decided on an architectural approach after discussing tradeoffs
Learned that a particular library has quirks the AI now understands
Built shared context about why certain code looks the way it does

Then the session ends. Maybe you hit the context limit. Maybe you closed your laptop. Maybe it's just tomorrow.

Everything is gone.

The next session starts fresh. The AI doesn't remember the race condition, the architectural decision, the library quirks, or the context you built. You're back to explaining basics.

This is the fundamental limitation of LLM sessions: they're stateless by design.

What About Built-In Memory Features?

Modern AI tools are adding memory and context management features:

Claude Code's /compact command summarizes the conversation to free up context space. It helps you work longer within a single session.

Built-in AI Memory (in various tools) persists facts across conversations. "Remember that I prefer TypeScript" carries forward.

Cursor and similar tools maintain some conversation history and context awareness.

These help. But they share limitations:

Feature	What it solves	What it doesn't solve
Compacting	Extends single session	Doesn't survive session end
AI Memory	Persists preferences	Opaque — you can't see/search what's stored
Tool history	Recent conversation access	Locked to that specific tool

The core problem remains: Your hard-won context is locked in a black box you don't control, can't search, and can't take with you.

What happens when you switch from Claude to Kiro? When you upgrade accounts? When a teammate needs to pick up where you left off?

The institutional memory disappears.

The Solution: Session Handoffs

Session handoffs are structured documents that capture session context in a format that's:

Human-readable — You can review what was captured
AI-readable — The next session can load and understand it
Searchable — Grep works. Find that decision from 3 weeks ago.
Tool-agnostic — Works with any AI assistant, current or future
Version-controlled — Lives in git with your code

At the end of each work session, the AI creates a handoff document. At the start of the next session, the AI reads the most recent handoff. Context is restored in seconds.

What Gets Captured

A session handoff includes:

What Was Accomplished

Completed tasks and decisions made. Not just "fixed bug" but "fixed race condition in WebSocket reconnection by adding a mutex lock — decided against debouncing because we need guaranteed delivery."

Current State

Where things stand right now. Any running processes, partial implementations, or work in progress. "Auth flow is 80% complete — login works, logout not yet implemented, token refresh untested."

Lessons Learned

What worked, what didn't, mistakes to avoid. "Tried using library X for date parsing — it doesn't handle timezones correctly. Switched to library Y." This prevents the next session from making the same mistakes.

Next Steps

What needs to happen next. Clear, actionable items. "1. Implement logout endpoint. 2. Add token refresh logic. 3. Write tests for auth flow."

Key Files Modified

Quick reference for the next session. The AI can read these first to rebuild context efficiently.

Blockers

Unresolved problems that need attention. "Waiting on API credentials from third-party vendor. Can't test payment flow until received."

When to Create a Handoff

Approaching context limits: Claude Code will notify you when context is filling up. This is your cue — capture context before it's compacted or lost.

Natural stopping points: Completed a feature? Reached a milestone? Good time to checkpoint.

Before switching tasks: About to pivot from backend work to frontend? Capture where you left off so you can return cleanly.

End of work session: Closing your laptop for the day? Future-you (or future-AI) will thank you.

Before major decisions: About to choose between two architectural approaches? Document the tradeoffs you've discussed so you don't re-debate them next session.

The Workflow

Ending a Session

When it's time to wrap up:

AI creates the handoff — Based on the session's work, Claude drafts the document
Human reviews — Quick sanity check that key points are captured
File gets saved — To docs/session_handoffs/YYYYMMDD-HHMM-brief-description.md
Committed to git — Now it's versioned and backed up

Starting a New Session

AI checks for handoffs — Looks in docs/session_handoffs/ for the most recent
Reads and acknowledges — "I see from the last session that you completed X and next steps are Y"
Work continues — No re-explanation needed

The transition takes seconds instead of the 10-15 minutes of re-establishing context manually.

Short-Term Benefits

Immediate context restoration. No more "let me explain the codebase structure again." The handoff contains it.

Continuity on complex tasks. Multi-day features don't lose momentum. Each session picks up exactly where the last ended.

Reduced cognitive load. You don't have to remember everything to tell the AI. The handoff remembers.

Better compacting outcomes. When Claude does compact mid-session, the handoff supplements what might be lost in summarization.

Team handoffs. Someone else needs to pick up your work? Point them to the handoff. Works for human teammates too.

Long-Term Benefits

This is where session handoffs differentiate from built-in memory features:

Searchable Project Archeology

Three weeks from now: "When did we decide to use PostgreSQL instead of MongoDB?"

grep -r "PostgreSQL" docs/session_handoffs/

Found. The handoff from January 15th documents the decision and reasoning.

Built-in AI memory can't do this. You can't grep Cursor's brain.

Decision Documentation

Handoffs capture not just what was done but why. The "Lessons Learned" section preserves reasoning that would otherwise evaporate.

Six months later, someone asks: "Why does this code handle timezones so weirdly?"

The handoff explains: "Library X doesn't handle DST transitions. This workaround was added after 3 hours of debugging. Don't 'simplify' it."

Survives Tool Changes

Using Claude today. Codex 44 releases tomorrow and you want to try it. Your session handoffs work with any tool — they're just markdown files.

Your institutional memory isn't locked in a vendor's black box.

Survives Team Changes

New developer joins the team. New AI assistant starts fresh. The handoffs provide onboarding context for both.

Pattern Recognition

Over months, handoffs reveal patterns:

Which areas of the codebase cause the most trouble?
What decisions keep getting revisited?
Where do sessions tend to get stuck?

This is organizational learning that no individual session could provide.

The Template

# Session Handoff - [Date] [Time]

## What We Accomplished
- [Completed task with context on approach taken]
- [Decision made and why]

## Current State
- [Where the work stands]
- [Any running processes or partial implementations]

## Lessons Learned
- [What worked well]
- [What didn't work / mistakes to avoid]

## Next Steps
- [ ] [Clear, actionable next task]
- [ ] [Another task]

## Key Files Modified
- `path/to/file.ts` - [what changed]
- `path/to/other.ts` - [what changed]

## Blockers / Open Questions
- [Unresolved problems]
- [Questions that need answers before proceeding]

Save to: docs/session_handoffs/YYYYMMDD-HHMM-brief-description.md

Example: docs/session_handoffs/20250112-1430-auth-flow-implementation.md

Making It Automatic

The real power comes when you don't have to remember to do this.

In Part 1, we described how Mother CLAUDE contains instructions that the AI follows automatically. Session handoffs are part of those instructions:

At natural stopping points, initiate a session handoff. Don't wait for the human to ask. You're a team member, not just a tool.

When context is filling up, Claude proactively says: "We're approaching the context limit. Let me create a session handoff before we continue."

The human's job shifts from "remember to create handoffs" to "review the handoff Claude created." Much easier cognitive load.

Handoffs vs. Compacting: Complementary, Not Competing

A common question: "If Claude can compact the conversation, why do I need handoffs?"

They solve different problems:

Compacting is for within-session continuity. It summarizes to free up context space so you can keep working. It's automatic and invisible — you don't control what's kept or lost.

Handoffs are for cross-session continuity. They're explicit documentation of what matters, in a format you control and can search.

Best practice: Use both.

Let compacting extend your session when needed
Create handoffs at meaningful stopping points
The handoff captures what compacting might lose

Common Objections

"This is overhead I don't have time for"

Creating a handoff takes 2-3 minutes. Re-establishing context without one takes 10-15 minutes. Every session.

More importantly: with the right setup, Claude creates the handoff. You just review it.

"I'll just remember where I left off"

You won't. Not the details. Not after a weekend. Not after switching between three projects.

And even if you remember, you have to explain it to the AI. The handoff does that for you.

"My AI tool has memory now"

Great — use it too. But:

Can you search it?
Can you see exactly what it stored?
Will it work if you switch tools?
Can a teammate access it?

Session handoffs answer yes to all of these.

"This is just documentation"

Yes. That's the point.

Documentation that's structured for AI consumption, lives with your code, and survives beyond any single session or tool. That's institutional memory.

Getting Started

Create the directory: mkdir -p docs/session_handoffs
Save the template: Copy the template above to docs/session_handoffs/README.md
Tell your AI: Add to your CLAUDE.md (or equivalent):

   At session end or when approaching context limits, create a handoff
   document in docs/session_handoffs/ using the template in that directory.

Start using it: At the end of your next session, ask: "Let's create a session handoff before we wrap up."
Next session: "Check docs/session_handoffs/ for the most recent handoff and pick up where we left off."

After a few sessions, it becomes automatic. The AI knows to check for handoffs at session start and offer to create them at session end.

The Core Insight

LLM sessions are ephemeral. Your institutional memory shouldn't be.

Built-in memory features help, but they're black boxes locked to specific tools. Session handoffs give you transparent, searchable, portable memory that you control.

The goal isn't to fight the stateless nature of LLM sessions. It's to build infrastructure that works with it — documentation that bridges the gaps and accumulates knowledge over time.

This is Part 2 of the Designing AI Teammates series. Part 1 covered onboarding with Mother CLAUDE. Part 3 will cover giving your AI assistant responsibility for initiating quality checkpoints.

This article was written collaboratively with Claude, using the session handoff process it describes that is embedded in the Mother CLAUDE ecosystem that serves as the scaffolding for all of our projects.

Resources

The session handoff template and Mother CLAUDE system are open source:

GitHub: github.com/Kobumura/mother-claude
Template: templates/session-handoff.md

Licensed under CC BY 4.0. Free to use and adapt with attribution to Dorothy J. Aubrey.

Mother CLAUDE: How We Built a Documentation System That Makes LLMs Productive Immediately

Dorothy J Aubrey — Sat, 10 Jan 2026 12:56:40 +0000

TL;DR: A three-tier documentation system that reduces LLM onboarding from hours to minutes across multi-repo, multi-language projects. First real-world test: Claude went from cold start to planning mode in under 5 minutes.

Who this is for: Teams working across multiple repos or languages who want LLMs to be productive immediately without repeated onboarding or fragile session memory.

While this was built for Claude Code, the architecture applies to any LLM with limited persistent memory—Kiro, Cursor, Copilot, or whatever comes next.

The Problem

LittleTalks is built across multiple projects in different languages:

Project	Language	Purpose
littletalks-mobile	React Native	iOS & Android app
littletalks-api	Node.js/Express	Backend API
littletalks-admin	PHP	Admin dashboard
littlepipes	GitHub Actions	CI/CD platform

These projects share business logic, integrations (RevenueCat, Twilio, analytics), standards, and data contracts. Working with Claude Code across them, we kept hitting the same issues:

Context switching pain - Starting a new session meant re-explaining project structure, conventions, and history
Scattered documentation - Important info spread across READMEs, code comments, and tribal knowledge
Duplication and drift - Same information (Jira setup, commit guidelines) copy-pasted across repos, getting out of sync
Legacy project onboarding - Rewriting old codebases required extensive explanation of "why things were built this way"

The goal: Get Claude productive immediately, whether it's a fresh session or a completely new project.

Before: First session spent explaining Jira setup, repo layout, commit conventions, and why certain patterns look duplicated across projects.

After: First session critiqued the documentation system itself and moved directly into planning.

The Solution: Three-Tier Documentation Architecture

┌─────────────────────────────────────────────────────────┐
│                     CONFLUENCE                          │
│  Business docs, marketing, non-technical                │
│  → Claude fetches on-demand via WebFetch                │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│              Shared Docs Repo (Mother CLAUDE)           │
│  Cross-project standards, deep technical guides         │
│  → Loaded at session start (lean) + on-demand (deep)    │
└─────────────────────────────────────────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│ Project CLAUDE.md │ │ Project CLAUDE.md │ │ Project CLAUDE.md │
│ Tech stack,       │ │ Tech stack,       │ │ Tech stack,       │
│ file structure,   │ │ file structure,   │ │ file structure,   │
│ dev commands      │ │ dev commands      │ │ dev commands      │
└───────────────────┘ └───────────────────┘ └───────────────────┘

Tier 1: Mother CLAUDE (Shared Standards)

A dedicated repo (littletalks-docs) containing:

CLAUDE.md (~70 lines) - Jira setup, screenshot locations, commit guidelines, project paths
shared/ - Deep technical docs (journey system, analytics, API contracts)
DOCUMENTATION-ARCHITECTURE.md - Meta-documentation explaining the system itself

Every project's CLAUDE.md starts with:

> **Shared standards**: See `littletalks-docs/CLAUDE.md` for Jira, screenshots, commit guidelines.

Tier 2: Project CLAUDE.md (Lean & Specific)

Each project has its own CLAUDE.md that's:

Lean (<100 lines) - Minimizes context cost
Specific - Tech stack, file structure, dev commands for THIS project
Referenced - Points to Mother CLAUDE for shared standards
Actionable - Claude can start working immediately

Tier 3: On-Demand Deep Docs

Detailed guides live in shared/ and are read only when needed:

Journey system architecture (1,251 lines)
Analytics event tracking
RevenueCat integration
API contracts

This keeps session context lean while making deep knowledge accessible.

Key Design Decisions

1. Single Source of Truth

When we moved docs to the shared repo, we left pointer files in the original locations:

# Document Moved
This document has been moved to `littletalks-docs/shared/journey-system.md`

No duplication = no drift.

2. Security Built In

PHP projects deployed to web servers get .htaccess in their docs/ folder:

Require all denied

This is documented in Mother CLAUDE so new projects know to add it.

3. Session Handoffs

A standardized template (session-handoff-template.md) ensures continuity:

What was accomplished
Current state
Lessons learned
Next steps
Key files modified

Each project has a docs/session_handoffs/ directory for these.

4. Self-Documenting

The DOCUMENTATION-ARCHITECTURE.md file explains the entire system. New Claudes can understand the architecture without human explanation.

Known Failure Modes

This system breaks down if:

Project CLAUDE.md files grow beyond ~100 lines - Context cost defeats the purpose
Deep docs are loaded eagerly instead of on demand - Session bloat, slower responses
Shared standards are duplicated instead of referenced - Drift returns
EVOLUTION.md exists but is left empty - Templates without content don't help
No one asks Claude to critique the system - Feedback loops die

The architecture is intentionally simple. Most failures come from violating the core principle: lean at load, deep on demand.

The Legacy Project Challenge

Our biggest test: preparing a 2011-2020 football pool codebase for a greenfield rewrite.

What We Prepared

football-pool-legacy/ (reference archive):

CLAUDE.md - Business logic locations, DO NOT MODIFY warning
EVOLUTION.md - Historical context (essential for legacy projects):

EVOLUTION.md
├── Data volumes (~users, ~records, ~DB size)
├── Why features were built this way
├── What users complained about
├── What admins found tedious
├── Bugs that required manual fixes
├── Workarounds still in place
└── What not to repeat

For legacy systems, EVOLUTION.md is more important than CLAUDE.md. Without it, Claude can understand what exists but not why it exists.

football/ (greenfield):

CLAUDE.md - Full requirements, phases, success criteria
docs/session_handoffs/ - Ready for continuity
Pre-researched docs (ESPN score feeds)

The Result

First Claude session on the football project:

Feedback on the Prep System: 8/10

The setup was genuinely helpful. CLAUDE.md gave immediate context. Legacy repo access was critical. The empty EVOLUTION.md and lack of "lessons learned" notes were the main gaps.

An 8/10 on first contact, with clear, actionable gaps—exactly the feedback loop we wanted. Claude went from cold start to planning mode in under 5 minutes, proposing database schema and asking clarifying questions about scoring rules. Zero time spent explaining Jira workflows, commit conventions, or project structure.

First-Session Feedback Loop (Proof of Self-Documentation)

The very first question asked in the greenfield project was not about features or architecture, but about the documentation system itself:

Before we start—do you feel like the prep that Mother CLAUDE and the Claude that created your prep docs did a good job? Any suggestions for improvement?

This was intentional. Because Mother CLAUDE is self-documenting—including instructions for setting up new projects and maintaining existing ones—the goal was to validate whether a fresh Claude session could:

Understand the system without human explanation
Critique its effectiveness
Identify gaps worth addressing

Because Mother CLAUDE documents not just project standards but how the documentation system itself works, this question functioned as a real-world test of whether the architecture truly explained itself.

Claude's response (8/10) directly informed the improvements that followed, particularly around EVOLUTION.md content and historical context. The gaps identified became immediate action items, not theoretical improvements.

Design Principle: Every new project should begin by asking Claude to critique the documentation system itself. If Claude cannot evaluate or improve the prep, the system is incomplete.

The Legacy Project Prep Checklist

Based on feedback, we created a checklist for preparing legacy codebases:

Required

[ ] CLAUDE.md with Mother CLAUDE reference
[ ] Path to legacy repo
[ ] Jira project code
[ ] Tech stack (old and new)

Highly Recommended

[ ] EVOLUTION.md with:
- Data volumes (helps migration planning)
- Feature history (why things were built)
- User pain points (prioritization)
- Admin pain points (what to fix)
- Known bugs/workarounds
- Old UI screenshots

Pre-Research

[ ] External API documentation
[ ] Integration requirements
[ ] SQL dumps (empty schema + full data)

Lessons Learned

What Worked

Lean CLAUDE.md files - <100 lines keeps context cost low
Single source of truth - No duplication across repos
On-demand deep docs - Don't load everything at session start
Legacy repo access - Claude examining actual code beats any description
Self-documenting architecture - The system explains itself

What Could Be Better

EVOLUTION.md needs content - Empty templates don't help
Historical context matters - "Why was it built this way?" is valuable
Data volumes help - Claude needs to think about scale
Screenshots calibrate expectations - Even rough ones help

The Meta Insight

Documentation is a product. It needs:

User research (what does Claude need?)
Iteration (improve based on feedback)
Maintenance (keep it current)
Self-documentation (explain how it works)

Evolution: The Checkpoint Checklist (Instant Retrospectives)

After using this system in production across multiple projects, we identified a gap: documentation tells Claude what exists, but not how to maintain quality as it builds.

We added a checkpoint checklist (shared/checkpoint-checklist.md) that gets referenced at every PR and commit. It's built around one meta question:

"If I had to hand this codebase to a new developer tomorrow, would they understand it without me explaining anything?"

The checklist covers:

Architecture & design (single responsibility, DRY, configuration vs hardcoding)
Code quality (no magic numbers, type safety, naming clarity)
Error handling & edge cases
Testing & reliability
Performance & security
Project-specific questions (white-label readiness, API compatibility, etc.)

This emerged from a greenfield rebuild where we wanted to prevent the technical debt accumulation that plagued the original project. We call it the "Instant Retrospective"—quality checkpoints at every natural stopping point, not just at the end of sprints or after incidents. The checklist lives alongside Mother CLAUDE and is referenced whenever Claude proposes or completes non-trivial changes.

The next article in this series covers the Instant Retrospective approach in depth—including how we made Claude responsible for initiating these checkpoints automatically.

Try It Yourself

Create a shared docs repo with Mother CLAUDE
Keep project CLAUDE.md lean (<100 lines)
Move shared docs to single source of truth
Add security (.htaccess for web-deployed PHP)
Create EVOLUTION.md for legacy projects
Get feedback from Claude on first session
Iterate based on what's missing

Why Not Just...?

Why not just use README.md?
READMEs are for humans browsing repos. CLAUDE.md is specifically structured for LLM context windows—lean, actionable, with clear references.

Why not load everything into context at session start?
Context windows are finite and expensive. Loading a 1,251-line journey system doc when you're fixing a CSS bug wastes tokens and slows responses.

Why not rely on ChatGPT/Claude memory features?
Memory features are session-specific and degrade over time. This system persists across sessions, machines, and even different AI tools.

Why not just use wikis/Confluence for everything?
Wikis are great for business docs. But technical docs that Claude reads frequently belong in git—version-controlled, close to code, and loadable without web fetching.

The Core Insight

LLMs don't need more prompts—they need better institutional memory.

The system we built isn't about clever prompting. It's about treating documentation as infrastructure that survives beyond any single session, any single project, and any single AI tool.

Appendix: Our Project Ecosystem

Kobumura (Company Projects):
├── littletalks-mobile      # React Native app
├── littletalks-admin       # PHP admin dashboard
├── littletalks-api         # Node.js backend
├── littlepipes             # CI/CD (PUBLIC repo)
└── littletalks-docs        # Shared docs (Mother CLAUDE)

Personal (Use shared standards):
├── WXING                   # Reference patterns
├── football                # Greenfield rewrite
└── football-pool-legacy    # Archive (2011-2020)

All projects reference the same Mother CLAUDE for consistency, while maintaining project-specific context in their own CLAUDE.md files.

This system was built collaboratively between a human engineer and an AI collaborator—and designed so the collaboration survives beyond any single session. The 8/10 rating from a fresh Claude session validated the approach. The gaps it identified became immediate improvements. That's the feedback loop working as intended.

The architecture is open. The patterns are portable. The goal is simple: stop re-explaining, start building.

This article was written collaboratively with Claude, using the documentation system it describes.

Resources

The Mother CLAUDE documentation system is open source:

GitHub: github.com/Kobumura/mother-claude

Feel free to fork it, adapt it, or use it as a reference for your own implementation.

Licensed under CC BY 4.0. Free to use and adapt with attribution to Dorothy J. Aubrey.