Forem: Amitai Levy

Using Claude Code to Post-Mortem Its Own Mistakes

Amitai Levy — Thu, 05 Mar 2026 10:29:34 +0000

In a previous post, I wrote about building a Docker Compose dev environment using Claude Code, and how it took 15 commits across 10+ sessions because the first session produced everything as one intertwined system instead of building it up incrementally. That post told the story. This one is about the process: how I used Claude Code itself to analyze session logs, identify failure patterns, and turn them into targeted fixes.

The Session Logs Are There

Claude Code stores conversation transcripts as JSONL files in ~/.claude/projects/. Each session is a complete record — every user message, every tool call, every file read and edit. I had months of history across 8 repos, totaling hundreds of sessions.
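To get a feel for how much history is sitting there, a small inventory script is enough. The following is a minimal sketch, not part of the original workflow. It assumes one subdirectory per project under the log root, with the .jsonl transcripts directly inside it; the function name inventory_sessions is made up for illustration.

```python
from pathlib import Path


def inventory_sessions(root: Path) -> dict[str, list[tuple[str, int]]]:
    """Map each project directory under `root` to (file name, record count) pairs.

    ASSUMPTION: transcripts live at <root>/<project>/<session>.jsonl.
    A "record" here is simply a non-blank line in the file.
    """
    inventory: dict[str, list[tuple[str, int]]] = {}
    for path in sorted(root.glob("*/*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            record_count = sum(1 for line in handle if line.strip())
        inventory.setdefault(path.parent.name, []).append((path.name, record_count))
    return inventory


def largest_session(files: list[tuple[str, int]]) -> tuple[str, int]:
    """Return the (file name, record count) pair with the most records."""
    return max(files, key=lambda item: item[1])
```

Calling inventory_sessions(Path.home() / ".claude" / "projects") and then largest_session on one project's list is a quick way to find the sessions that are most worth analyzing.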

I wouldn't want to read through all of that manually. But Claude Code can.

Step 1: Gather Evidence

I started a new session and pointed Claude at the session log directory. The prompt was roughly: "look at conversation history to see the many iterations it took to get the dev-environment right. Dive into what went wrong."

It launched parallel agents to:

Walk the full git history (15 commits) and summarize what changed in each
Extract all user messages from the largest session files
Read CLAUDE.md files across all sibling repos to find related patterns

From the session logs, it found concrete evidence: 41 user messages in the main session, at least 7 redirections where I interrupted mid-execution, and 5 explicit "stop and simplify" requests. It identified the specific things that were over-engineered (custom shell script, shared Dockerfile, env file layering, port offset arithmetic) and mapped them to the commits where they were eventually removed.
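Extracting user messages from a transcript can also be done by hand. The sketch below shows one way it might look. The record layout is an assumption, not something the post documents: each line is taken to be a JSON object with "type": "user" and a "message" whose "content" is either a plain string or a list of blocks such as {"type": "text", "text": "..."}. Check a real file before relying on it.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_user_messages(session_file: Path) -> Iterator[str]:
    """Yield the text of each user-authored message in one session transcript.

    ASSUMED schema (verify against your own logs): records look like
    {"type": "user", "message": {"content": <str or list of blocks>}}.
    """
    with session_file.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or corrupt lines
            if not isinstance(record, dict) or record.get("type") != "user":
                continue
            message = record.get("message")
            content = message.get("content") if isinstance(message, dict) else None
            if isinstance(content, str):
                text = content
            elif isinstance(content, list):
                # Keep only text blocks; other block types (for example tool
                # results) are not something the user typed.
                text = "\n".join(
                    block.get("text", "")
                    for block in content
                    if isinstance(block, dict) and block.get("type") == "text"
                )
            else:
                text = ""
            if text.strip():
                yield text.strip()
```

Counting len(list(iter_user_messages(path))) per session gives a rough analogue of the "41 user messages" figure, though the exact number depends on how the real schema marks interruptions and tool output.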

Step 2: Find Cross-Repo Patterns

A single bad session might look like an isolated mistake. So I asked Claude to look across other repos for the same pattern.

It found it. In a different repo, Claude had delivered a full-stack feature — backend schemas, validation endpoints, frontend wizard, form integration — all in one pass. My response in that session: "This is a very big feature. Perhaps we can iterate implementation a bit, leave parts blank or mock then fill them in? It's hard to plan all the details at once." Same problem, different project.

Step 3: Categorize the Failure Modes

Claude categorized the user messages it extracted — redirections, corrections, explicit rejections, "why does this exist?" questions. This turned vague dissatisfaction into a ranked list of specific, recurring failure modes. Not "Claude is too verbose" but things like "when asked to build X, Claude builds the most thorough version of X instead of the simplest."
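In the post, Claude did this categorization itself. As a rough stand-in, a keyword pass can produce a similar ranked list. The sketch below is a deliberately crude illustration: the category names and regular expressions are hypothetical, picked to match the kinds of messages quoted in these posts, and a real analysis would need patterns tuned to your own phrasing.

```python
import re
from collections import Counter

# HYPOTHETICAL categories and patterns, for illustration only.
# Order matters: the first matching category wins.
CATEGORY_PATTERNS = {
    "question_existence": re.compile(r"\bwhy (do|does|is|are)\b", re.IGNORECASE),
    "simplify": re.compile(
        r"\b(simplify|simpler|too (big|complex|much)|drop it|just use)\b",
        re.IGNORECASE,
    ),
    "rejection": re.compile(r"\b(no|don't|do not|revert|undo|remove)\b", re.IGNORECASE),
    "redirection": re.compile(r"\b(stop|wait|instead|actually)\b", re.IGNORECASE),
}


def categorize(message: str) -> str:
    """Return the first category whose pattern matches, or "other"."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(message):
            return category
    return "other"


def rank_categories(messages: list[str]) -> list[tuple[str, int]]:
    """Count messages per category, most frequent first."""
    return Counter(categorize(message) for message in messages).most_common()
```

A keyword pass like this only gives counts. Turning "question_existence" into a statement like "Claude builds the most thorough version of X instead of the simplest" still takes reading the matched messages, which is the part the post delegates to Claude.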

Step 4: Turn Patterns into Rules

This is where the analysis becomes useful. Each failure mode maps to a specific, implementable fix. The key is to write rules that are concrete behavioral changes, not general principles. "Prefer simplicity" doesn't change behavior. "Propose a sequence of increments, implement the first one, check in before the next" does.

The rules went into my global CLAUDE.md — the config file that Claude Code loads into every conversation. They're untested — I don't know yet if they'll change behavior enough. But they're grounded in specific documented patterns, not vibes.

If You Want to Try This

The ingredients are simple: the session logs exist, Claude Code can read them, and you probably have enough history to surface patterns. The steps generalize:

Point Claude at session logs for a project where you felt like you fought the tool
Ask it to find cross-repo patterns, not just single-session issues
Have it categorize user messages by type (redirections, corrections, rejections)
For each failure mode, design a specific rule or workflow change — not a general principle

The output doesn't have to be CLAUDE.md rules. It could be hook scripts, skill definitions, per-repo instructions, or changes to how you prompt. The point is turning accumulated friction into targeted fixes.

In fact — I just started writing blog posts, and I'm writing them with Claude. These two posts took a fair amount of back-and-forth to get right. After a few more, I'll probably have enough session history to use this exact method to write myself a blog-writing skill.

The Wrong Unit of Work: What Happened When I Built My Dev Environment Using Claude Code

Amitai Levy — Tue, 03 Mar 2026 21:11:17 +0000

I use Claude Code daily for real product engineering. I'm building a clinical trials platform at PhaseV, and Claude Code is my primary coding partner across a multi-repo microservices stack.

Recently I asked it to set up a local Docker Compose dev environment with multi-instance support. It took 15 commits across 10+ sessions to get there. Not because the problem was hard, but because the first session produced everything as one intertwined system instead of building up to it incrementally.

What I Asked For

Several backend services, a frontend, and shared infrastructure. Each service in its own repo. I wanted a central repo that orchestrates everything with Docker Compose so a developer can run docker compose up. I mentioned that multi-instance support would be important — being able to run the stack twice for different feature branches.

What I Got

The first session produced everything at once:

A compose.yaml with includes for each service
A shared Dockerfile for all Python services
A custom shell script called pv with subcommands for managing instances
Port offset arithmetic and environment file layering for multi-instance support
Workspace selection logic for choosing which repos to include

I asked for a compose-based dev environment with multi-instance support. The implementation approach came as a package — one intertwined system. Each piece was defensible in isolation. Together they were complex, fragile, and hard to maintain.

The Simplification Marathon

What followed was 10+ sessions of me pulling things back out:

Session 2-3: "Why do we need the pv compose script? Can't we just use docker compose directly?" Claude explained the benefits. I said drop it. Dropped.

Session 4-5: "Why does each service use a shared Dockerfile? Let each repo own its own." Switched to per-repo Dockerfiles with Compose Watch for live reload. Removed the shared Dockerfile, entrypoint-dev.sh, and the pv rebuild command.

Session 6-7: "Why is infrastructure separate from apps? Just put them in the same Compose project." Merged infra into the main compose file. Removed container_name directives, the shared network, and the separate pv infra command.

Session 8: "Why do we need port variables for every service? Just let Docker assign random ports." Removed a dozen port variables from .env. Only the frontend kept a fixed port.

Session 9-10: "Why is .env so big? Most of these values are constants." Moved defaults into each service's compose file. The .env went from ~30 variables to 2.

Final state: A 15-line compose.yaml with includes, a compose.infra.yaml, a 2-line .env.example, and a README. Multi-instance works via Compose project names — no scripts, no port arithmetic.

What Happened

I went back and read the session transcripts — not just for this feature, but across other repos.

The unit of work is wrong. Claude's default behavior is to treat each task as a single delivery: plan, build, present. By the time I saw the full output, the pieces were intertwined and hard to pull apart. If Claude had delivered a bare compose file first and then added multi-instance support on top, I could have steered the implementation approach early instead of unwinding it after the fact.

What I'm Trying

After the analysis, I added a rule to my global CLAUDE.md — the config file that Claude Code loads into every conversation:

## Lead with increments

For complex features, act as a tech lead managing the process:

1. Propose a sequence of increments — not a detailed plan,
   just the milestones.
2. Implement the first increment autonomously. Build the
   smallest complete version that works. Deliver it.
3. Check in briefly before the next increment. Let the user
   decide if more is needed. Often the first increment is enough.

This is untested. I'm hoping it shifts Claude from "deliver the comprehensive solution" to "deliver the first working version and let the user steer from there." For the dev-environment case, that would have looked like: "Step 1: compose file that starts all services. Step 2: add profiles. Step 3: multi-instance support." By step 3, the basic setup would already be working, and I could have evaluated what multi-instance actually needs — instead of unwinding an over-engineered approach after the fact.

Whether a CLAUDE.md rule can actually change this behavior, I don't know yet. Discussion welcome.