<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Laurent Charignon</title>
    <description>The latest articles on Forem by Laurent Charignon (@laurent_charignon).</description>
    <link>https://forem.com/laurent_charignon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3570041%2F0a45385f-2d9e-4b0f-987f-cdeede2b6b4a.jpg</url>
      <title>Forem: Laurent Charignon</title>
      <link>https://forem.com/laurent_charignon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/laurent_charignon"/>
    <language>en</language>
    <item>
      <title>Building with LLMs at Scale: Part 5 - Learning and Knowledge Accumulation</title>
      <dc:creator>Laurent Charignon</dc:creator>
      <pubDate>Fri, 17 Oct 2025 03:14:50 +0000</pubDate>
      <link>https://forem.com/laurent_charignon/building-with-llms-at-scale-part-5-learning-and-knowledge-accumulation-6ml</link>
      <guid>https://forem.com/laurent_charignon/building-with-llms-at-scale-part-5-learning-and-knowledge-accumulation-6ml</guid>
      <description>&lt;p&gt;This is the final article in my series on working with LLMs at scale. &lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part1-pain-points"&gt;Part 1&lt;/a&gt; through &lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part4-experiments"&gt;Part 4&lt;/a&gt; focused on &lt;em&gt;building&lt;/em&gt; with LLMs—managing sessions, ensuring quality, coordinating parallel work.&lt;/p&gt;

&lt;p&gt;This article is about something different: using LLMs as &lt;strong&gt;learning tools&lt;/strong&gt;. Not just to write code, but to help you understand complex systems, internalize patterns, and build lasting knowledge.&lt;/p&gt;

&lt;h2&gt;The Learning Problem&lt;/h2&gt;

&lt;p&gt;As a developer, you're constantly learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New frameworks and libraries&lt;/li&gt;
&lt;li&gt;Complex algorithms&lt;/li&gt;
&lt;li&gt;System architecture patterns&lt;/li&gt;
&lt;li&gt;Domain-specific knowledge&lt;/li&gt;
&lt;li&gt;Best practices and anti-patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional learning methods are slow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reading documentation takes hours&lt;/li&gt;
&lt;li&gt;Watching tutorials is passive&lt;/li&gt;
&lt;li&gt;Trial-and-error is inefficient&lt;/li&gt;
&lt;li&gt;Taking notes manually is tedious&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What if LLMs could accelerate this? Not by just answering questions, but by creating personalized learning materials tailored to how &lt;em&gt;you&lt;/em&gt; learn best?&lt;/p&gt;

&lt;h2&gt;Pattern 1: Heavily-Annotated Code (Worksheets)&lt;/h2&gt;

&lt;p&gt;When learning a new codebase or pattern, I ask Claude to generate &lt;strong&gt;worksheets&lt;/strong&gt;—code with extensive inline comments explaining every decision.&lt;/p&gt;

&lt;h3&gt;Example: Understanding React Hooks&lt;/h3&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Explain React hooks to me
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generate a complete React component using hooks, but add inline comments
explaining:
- Why this hook is needed
- What problems it solves
- Common mistakes to avoid
- How it compares to class components
- Performance implications
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude produces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useCallback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;useMemo&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;UserProfile&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// useState: Manages local component state&lt;/span&gt;
  &lt;span class="c1"&gt;// Why: We need to track user data that changes over time&lt;/span&gt;
  &lt;span class="c1"&gt;// Replaces: this.state in class components&lt;/span&gt;
  &lt;span class="c1"&gt;// Common mistake: Forgetting state updates are asynchronous&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setUser&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// useEffect: Handles side effects (API calls, subscriptions)&lt;/span&gt;
  &lt;span class="c1"&gt;// Why: We need to fetch data after component mounts&lt;/span&gt;
  &lt;span class="c1"&gt;// Dependencies: [userId] means "re-run when userId changes"&lt;/span&gt;
  &lt;span class="c1"&gt;// Common mistake: Missing dependencies causes stale closures&lt;/span&gt;
  &lt;span class="c1"&gt;// Performance: Cleanup function prevents memory leaks&lt;/span&gt;
  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Start loading&lt;/span&gt;
    &lt;span class="nf"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Fetch user data&lt;/span&gt;
    &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/api/users/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;setUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nf"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Cleanup: Cancel fetch if component unmounts&lt;/span&gt;
    &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// In production, you'd actually cancel the fetch here&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt; &lt;span class="c1"&gt;// Dependency array: re-run when userId changes&lt;/span&gt;

  &lt;span class="c1"&gt;// useMemo: Memoizes expensive computations&lt;/span&gt;
  &lt;span class="c1"&gt;// Why: Prevents recalculating on every render&lt;/span&gt;
  &lt;span class="c1"&gt;// Only recalculates when 'user' changes&lt;/span&gt;
  &lt;span class="c1"&gt;// Common mistake: Overusing for cheap operations (adds overhead)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;displayName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useMemo&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Loading...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// Expensive operation example (yours might be more complex)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;firstName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toUpperCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

  &lt;span class="c1"&gt;// useCallback: Memoizes function references&lt;/span&gt;
  &lt;span class="c1"&gt;// Why: Prevents child components from re-rendering unnecessarily&lt;/span&gt;
  &lt;span class="c1"&gt;// Without this: New function created on every render&lt;/span&gt;
  &lt;span class="c1"&gt;// Performance: Only useful if passed to optimized child components&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handleRefresh&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useCallback&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;setLoading&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// Refresh logic here&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[]);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;loading&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;Loading&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;;
&lt;/span&gt;  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;User&lt;/span&gt; &lt;span class="nx"&gt;not&lt;/span&gt; &lt;span class="nx"&gt;found&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;;
&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;h1&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;displayName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/h1&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;      &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;button&lt;/span&gt; &lt;span class="nx"&gt;onClick&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;handleRefresh&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;Refresh&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/button&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't just code—it's a &lt;strong&gt;learning artifact&lt;/strong&gt;. I can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read it top to bottom to understand the flow&lt;/li&gt;
&lt;li&gt;See why each decision was made&lt;/li&gt;
&lt;li&gt;Learn common mistakes to avoid&lt;/li&gt;
&lt;li&gt;Understand performance implications&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;The Prompt Pattern&lt;/h3&gt;

&lt;p&gt;The key is being specific about what you want explained:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generate [concept] code with inline comments explaining:
- Why this approach (vs alternatives)
- Common mistakes
- Performance considerations
- When to use / not use
- How it connects to [related concept]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've used this pattern to learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rust's ownership system&lt;/li&gt;
&lt;li&gt;Clojure's transducers&lt;/li&gt;
&lt;li&gt;Kubernetes operators&lt;/li&gt;
&lt;li&gt;Distributed system patterns&lt;/li&gt;
&lt;li&gt;Database indexing strategies&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Pattern 2: Flashcards for Spaced Repetition&lt;/h2&gt;

&lt;p&gt;Reading annotated code helps understanding, but doesn't ensure &lt;em&gt;retention&lt;/em&gt;. For that, I use &lt;strong&gt;flashcards&lt;/strong&gt; with spaced repetition.&lt;/p&gt;

&lt;h3&gt;The Flashcard System&lt;/h3&gt;

&lt;p&gt;I built a flashcard CLI (see &lt;a href="https://github.com/charignon/flashcards" rel="noopener noreferrer"&gt;github.com/charignon/flashcards&lt;/a&gt;) that implements spaced repetition algorithms (SM-2, Leitner boxes).&lt;/p&gt;

&lt;p&gt;But manually creating flashcards is tedious. So I ask Claude to generate them.&lt;/p&gt;

&lt;h3&gt;Example: Learning System Design&lt;/h3&gt;

&lt;p&gt;I ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generate flashcards for distributed systems concepts. Format as JSON.
Focus on: CAP theorem, consensus algorithms, partitioning strategies.

Each card should have:
- A precise question
- A complete answer
- Tags for organization
- Notes with additional context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude generates:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"In the CAP theorem, what does 'Partition Tolerance' actually mean?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The system continues to operate despite network partitions (messages between nodes being lost or delayed). You must choose: consistency (all nodes see same data) or availability (all requests get responses) during a partition."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"cap-theorem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"distributed-systems"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fundamentals"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"notes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Common misconception: You can't 'give up' partition tolerance—network partitions will happen. You choose between CP (consistent but not available during partition) or AP (available but inconsistent during partition)."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"What is the key difference between Raft and Paxos consensus algorithms?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Raft uses leader election + log replication (simpler to understand). Paxos uses proposal numbers + acceptors (more flexible but complex). Raft is easier to implement correctly."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"consensus"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"raft"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"paxos"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"algorithms"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"notes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Raft was explicitly designed to be more understandable than Paxos. In practice, most systems use Raft (etcd, Consul) or Raft-like algorithms (Zab in ZooKeeper)."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"When should you use hash-based partitioning vs range-based partitioning?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hash partitioning: Uniform distribution, good for random access. Range partitioning: Preserves ordering, good for range queries. Choose based on your query patterns."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"partitioning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sharding"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"notes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Real example: User IDs → hash partition (uniform load). Timestamps → range partition (range queries like 'last 24 hours'). Hybrid approaches exist (compound keys)."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I import these directly into my flashcard system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;system_design_cards.json | ./flashcards bulk-create &lt;span class="nt"&gt;--project&lt;/span&gt; system-design
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
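&lt;p&gt;Because the cards are LLM-generated, it's worth sanity-checking the JSON before a bulk import. A minimal check might look like this (a hypothetical Python helper, not part of the flashcards CLI):&lt;/p&gt;

```python
import json

# Fields every generated card should carry (matches the prompt above).
REQUIRED = {"question", "answer", "tags", "notes"}

def validate_cards(raw):
    """Parse LLM-generated card JSON and flag structural problems."""
    cards = json.loads(raw)
    errors = []
    for i, card in enumerate(cards):
        missing = REQUIRED - card.keys()
        if missing:
            errors.append(f"card {i}: missing fields {sorted(missing)}")
        elif not isinstance(card["tags"], list):
            errors.append(f"card {i}: tags should be a list")
    return cards, errors
```

&lt;p&gt;Running a check like this before &lt;code&gt;bulk-create&lt;/code&gt; catches truncated or malformed model output early.&lt;/p&gt;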



&lt;p&gt;Now I can quiz myself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;./flashcards quiz &lt;span class="nt"&gt;--project&lt;/span&gt; system-design &lt;span class="nt"&gt;--limit&lt;/span&gt; 5

Question 1/5: In the CAP theorem, what does &lt;span class="s1"&gt;'Partition Tolerance'&lt;/span&gt; actually mean?

&lt;span class="o"&gt;[&lt;/span&gt;Show answer? y/n]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;The Spaced Repetition Loop&lt;/h3&gt;

&lt;p&gt;Here's how the system works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generate flashcards&lt;/strong&gt; from concept (Claude)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Import into flashcard CLI&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quiz yourself&lt;/strong&gt; (spaced repetition schedules reviews)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate confidence&lt;/strong&gt; (1-5): Low = see again soon, High = see much later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeat&lt;/strong&gt; over days/weeks until internalized&lt;/li&gt;
&lt;/ol&gt;
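&lt;p&gt;The scheduling in step 4 can be sketched as a minimal SM-2-style update (a simplified Python illustration; the CLI's actual implementation may differ):&lt;/p&gt;

```python
def sm2_update(interval, repetitions, ease, quality):
    """Return (next_interval_days, repetitions, ease) after one review.

    quality is the 0-5 confidence rating given during the quiz.
    """
    if quality in (0, 1, 2):
        # Failed recall: reset the streak and review again tomorrow.
        return 1, 0, ease
    if repetitions == 0:
        interval = 1
    elif repetitions == 1:
        interval = 6
    else:
        interval = round(interval * ease)
    # Nudge the ease factor; SM-2 clamps it at a floor of 1.3.
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return interval, repetitions + 1, ease
```

&lt;p&gt;A rating below 3 resets the card to the next day, while higher ratings stretch the interval multiplicatively: the "low = see again soon, high = see much later" behavior described above.&lt;/p&gt;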

&lt;p&gt;The beauty: I can generate flashcards for &lt;em&gt;any&lt;/em&gt; technical topic in minutes. No manual card creation. No finding existing decks that don't quite match my needs.&lt;/p&gt;

&lt;h3&gt;Topics I've Created Flashcard Decks For&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Algorithms&lt;/strong&gt;: Time complexity, specific algorithm details, when to use what&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System design&lt;/strong&gt;: Distributed systems patterns, scalability techniques&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Languages&lt;/strong&gt;: Rust ownership, Clojure idioms, Python async/await&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Databases&lt;/strong&gt;: Indexing strategies, transaction isolation levels, CAP tradeoffs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes&lt;/strong&gt;: Resource types, networking concepts, operator patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interview prep&lt;/strong&gt;: LeetCode patterns, behavioral question frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The Meta-Pattern: AI as a Learning Compiler&lt;/h2&gt;

&lt;p&gt;These patterns share a common insight:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLMs can transform any learning goal into a personalized learning artifact.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You provide a concept to learn, your preferred style (flashcards, annotated code), and your knowledge gaps. Claude generates materials tailored to you—at the right complexity, in the format you learn best from.&lt;/p&gt;

&lt;h2&gt;Practical Workflow&lt;/h2&gt;

&lt;p&gt;My typical learning workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generate annotated code&lt;/strong&gt;: Complete implementation with explanatory comments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract flashcards&lt;/strong&gt;: Key concepts become cards for spaced repetition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quiz over time&lt;/strong&gt;: Spaced repetition embeds knowledge long-term&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This takes 30-60 minutes to generate materials, then ongoing review. Traditional learning (reading papers, watching videos, manual note-taking) would take hours or days to achieve the same depth.&lt;/p&gt;

&lt;h2&gt;The Flashcard System Details&lt;/h2&gt;

&lt;p&gt;Since I mentioned my flashcard system, here's how it works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple spaced repetition algorithms (SM-2, Leitner, fixed intervals)&lt;/li&gt;
&lt;li&gt;Project organization (tag by topic)&lt;/li&gt;
&lt;li&gt;Adaptive quiz sessions (repeat failed items immediately)&lt;/li&gt;
&lt;li&gt;JSON import/export (easy to generate with LLMs)&lt;/li&gt;
&lt;li&gt;CLI for terminal-based studying&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create from Claude-generated JSON&lt;/span&gt;
flashcards bulk-create &lt;span class="nt"&gt;--file&lt;/span&gt; cards.json &lt;span class="nt"&gt;--project&lt;/span&gt; rust

&lt;span class="c"&gt;# Start adaptive quiz session&lt;/span&gt;
flashcards quiz &lt;span class="nt"&gt;--project&lt;/span&gt; rust &lt;span class="nt"&gt;--limit&lt;/span&gt; 10

&lt;span class="c"&gt;# Review a specific card&lt;/span&gt;
flashcards review &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;--confidence&lt;/span&gt; 4

&lt;span class="c"&gt;# See statistics&lt;/span&gt;
flashcards stats &lt;span class="nt"&gt;--project&lt;/span&gt; rust
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The adaptive sessions are key: if you get a card wrong (confidence &amp;lt; 3), the session immediately re-queues it. You practice until you get it right; then spaced repetition takes over for long-term retention.&lt;/p&gt;
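&lt;p&gt;That re-queue loop can be sketched like so (a simplified Python illustration, not the CLI's actual code):&lt;/p&gt;

```python
from collections import deque

def adaptive_session(cards, rate):
    """Quiz until every card is rated 3 or higher; return review counts.

    rate(card, attempt) stands in for the user's 1-5 confidence rating.
    """
    queue = deque(cards)
    reviews = dict.fromkeys(cards, 0)
    while queue:
        card = queue.popleft()
        reviews[card] += 1
        if rate(card, reviews[card]) in (1, 2):
            queue.append(card)  # low confidence: re-queue immediately
    return reviews
```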

&lt;h2&gt;Key Learnings&lt;/h2&gt;

&lt;p&gt;After a year of using LLMs for learning:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Annotated code beats plain code.&lt;/strong&gt; The "why" matters more than the "what."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flashcards with spaced repetition work.&lt;/strong&gt; Generate hundreds of cards in minutes, internalize them over weeks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Active learning beats passive.&lt;/strong&gt; Quizzing yourself is more effective than rereading notes.&lt;/p&gt;

&lt;h2&gt;Conclusion: Beyond Code Generation&lt;/h2&gt;

&lt;p&gt;This series has been about working with LLMs at scale—managing multiple sessions, ensuring quality, building coordination patterns. But it's also been about a fundamental shift in how we work with code and knowledge.&lt;/p&gt;

&lt;p&gt;LLMs aren't just code generators. They're:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Productivity multipliers (Part 2)&lt;/li&gt;
&lt;li&gt;Quality enforcers (Part 3)&lt;/li&gt;
&lt;li&gt;Coordination layers (Part 3)&lt;/li&gt;
&lt;li&gt;Learning accelerators (Part 5)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The developers who thrive in this new world won't be the ones who use LLMs as fancy autocomplete. They'll be the ones who build workflows, tooling, and patterns that let them work at 10x their previous speed while maintaining quality and continuously learning.&lt;/p&gt;

&lt;p&gt;My system—tmux integration, telemetry, memento, smoke tests, code review workflows, flashcards—is just one approach. Yours will look different based on your needs, your style, your projects.&lt;/p&gt;

&lt;p&gt;But the core principles apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make the invisible visible (ergonomics, telemetry)&lt;/li&gt;
&lt;li&gt;Enforce quality automatically (smoke tests)&lt;/li&gt;
&lt;li&gt;Share context between sessions (memento)&lt;/li&gt;
&lt;li&gt;Learn continuously (flashcards, annotated code)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The future of software development is human-AI collaboration. Not humans being replaced, but humans being amplified. The question isn't whether to use LLMs—it's how to use them most effectively.&lt;/p&gt;

&lt;p&gt;I hope this series has given you ideas, tools, and patterns you can adapt to your own workflow.&lt;/p&gt;

&lt;p&gt;Now go build something amazing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All tools mentioned are open source:&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- &lt;a href="https://github.com/charignon/memento" rel="noopener noreferrer"&gt;Memento&lt;/a&gt; - Note system for LLM context&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- &lt;a href="https://github.com/charignon/flashcards" rel="noopener noreferrer"&gt;Flashcards&lt;/a&gt; - Spaced repetition CLI&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- &lt;a href="https://github.com/charignon/dotfiles" rel="noopener noreferrer"&gt;Dotfiles&lt;/a&gt; - Tmux scripts and config&lt;/em&gt;&lt;br&gt;
&lt;em&gt;- &lt;a href="https://github.com/charignon/appdaemon" rel="noopener noreferrer"&gt;AppDaemon&lt;/a&gt; - Home automation examples&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Find me on &lt;a href="https://github.com/charignon" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; or read more on my &lt;a href="https://charignon.github.io/" rel="noopener noreferrer"&gt;blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>llm</category>
      <category>learning</category>
    </item>
    <item>
      <title>Building with LLMs at Scale: Part 4 - Experiments and Works-in-Progress</title>
      <dc:creator>Laurent Charignon</dc:creator>
      <pubDate>Fri, 17 Oct 2025 03:06:48 +0000</pubDate>
      <link>https://forem.com/laurent_charignon/building-with-llms-at-scale-part-4-experiments-and-works-in-progress-40mp</link>
      <guid>https://forem.com/laurent_charignon/building-with-llms-at-scale-part-4-experiments-and-works-in-progress-40mp</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part1-pain-points"&gt;Part 1&lt;/a&gt;, &lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part2-ergonomics"&gt;Part 2&lt;/a&gt;, and &lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part3-abstractions"&gt;Part 3&lt;/a&gt;, I covered pain points and solutions that work reliably. This article is different—it's about &lt;strong&gt;experiments&lt;/strong&gt;, &lt;strong&gt;works-in-progress&lt;/strong&gt;, and &lt;strong&gt;lessons from things that didn't quite pan out&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not every tool needs to be polished. Some are scaffolding for better ideas. Some solve problems that disappear with faster models. And some teach valuable lessons even when they fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Project Explorer: Solving Yesterday's Problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Original Problem
&lt;/h3&gt;

&lt;p&gt;Before Sonnet 4.5, exploring a codebase with Claude was slow. Reading 20 files meant 20 sequential API calls, token limits to manage, and 10+ minutes of setup time.&lt;/p&gt;

&lt;p&gt;Workarounds emerged: naming key files with an &lt;code&gt;@&lt;/code&gt; prefix (&lt;code&gt;@README.md&lt;/code&gt;, &lt;code&gt;@main.go&lt;/code&gt;) so they'd appear first in directory listings, making them easier for Claude to discover. Some users created special "guide" files that aggregated important context.&lt;/p&gt;

&lt;p&gt;I built &lt;code&gt;project-ingest&lt;/code&gt; (inspired by &lt;a href="https://gitingest.com" rel="noopener noreferrer"&gt;gitingest.com&lt;/a&gt;) to solve this. The tool would output a single markdown document with the project structure, key file contents, and dependency graph. Claude could ingest this in one shot instead of reading files incrementally.&lt;/p&gt;
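&lt;p&gt;The core of the idea fits in a page. Here's an illustrative sketch (the real tool also builds the dependency graph and filters far more intelligently):&lt;/p&gt;

```python
"""Minimal sketch of project-ingest: walk a repo, emit one markdown doc."""
import os

SKIP_DIRS = {".git", "node_modules", "__pycache__"}  # noise directories
MAX_BYTES = 20_000                                   # don't inline large files
FENCE = "`" * 3  # built at runtime to avoid nesting literal fences here

def ingest(root: str) -> str:
    structure, contents = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune noise directories in place so os.walk skips them
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            structure.append(f"- {rel}")
            try:
                if os.path.getsize(path) > MAX_BYTES:
                    continue
                with open(path, encoding="utf-8") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable: listed but not inlined
            contents += [f"### {rel}", "", FENCE, text.rstrip(), FENCE, ""]
    title = os.path.basename(os.path.abspath(root))
    parts = [f"# Project: {title}", "", "## Structure", ""]
    return "\n".join(parts + structure + ["", "## File Contents", ""] + contents)
```

&lt;p&gt;One invocation, one document, one read for Claude.&lt;/p&gt;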

&lt;h3&gt;
  
  
  What Changed
&lt;/h3&gt;

&lt;p&gt;Sonnet 4.5 changed the game, though I'm not entirely sure &lt;em&gt;how&lt;/em&gt;. Is it just faster at reading files? Does it batch requests differently? Does it handle context more efficiently? Whatever the implementation, the result is clear: &lt;strong&gt;it's fast enough that project ingestion overhead feels worse than just reading files directly&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (Sonnet 3.5):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;code&gt;project-ingest&lt;/code&gt; → 15 seconds&lt;/li&gt;
&lt;li&gt;Claude reads summary → 5 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total:&lt;/strong&gt; 20 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;After (Sonnet 4.5):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude reads 20 files directly → 8 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total:&lt;/strong&gt; 8 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ingester became slower than the problem it solved.&lt;/p&gt;

&lt;h3&gt;
  
  
  When It's Still Useful
&lt;/h3&gt;

&lt;p&gt;I haven't deleted &lt;code&gt;project-ingest&lt;/code&gt; because it remains valuable for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Very large codebases&lt;/strong&gt; (100+ files): Still faster to get a high-level view&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project snapshots&lt;/strong&gt;: Capturing codebase state at a point in time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation generation&lt;/strong&gt;: Creating an overview for human readers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-project analysis&lt;/strong&gt;: Comparing architecture across multiple projects&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But for everyday "help me understand this project" tasks? Obsolete.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Lesson
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Build for today's constraints, not tomorrow's.&lt;/strong&gt; The tool was perfect for its time, but model improvements made it obsolete. That's okay. The investment taught me patterns I applied elsewhere (like how to efficiently traverse project structures).&lt;/p&gt;

&lt;p&gt;When a tool becomes unnecessary because the problem disappeared, that's a success, not a failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code Review in Emacs: Closing the Loop
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Review Problem
&lt;/h3&gt;

&lt;p&gt;I'm browsing through a codebase—maybe one I wrote months ago, maybe one Claude just generated, maybe something I'm casually exploring. I spot issues: a function that could be clearer, error handling that's too generic, a repeated pattern that should be abstracted.&lt;/p&gt;

&lt;p&gt;The problem: &lt;strong&gt;I'm in discovery mode, not fix mode.&lt;/strong&gt; I don't want to stop and fix each issue immediately. I want to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Mark the issue at the exact line&lt;/strong&gt; while I'm looking at it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep browsing&lt;/strong&gt; without losing flow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Later, batch all issues together&lt;/strong&gt; and have an LLM fix them all at once&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is where the Code Review Logger comes in. It decouples discovery from fixing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Emacs Integration
&lt;/h3&gt;

&lt;p&gt;I built an Emacs mode (&lt;code&gt;code-review-logger.el&lt;/code&gt;) that tracks review comments in an org-mode file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight common_lisp"&gt;&lt;code&gt;&lt;span class="c1"&gt;;; While reviewing code in Emacs:&lt;/span&gt;
&lt;span class="c1"&gt;;; SPC r c - Log comment at current line&lt;/span&gt;
&lt;span class="c1"&gt;;; SPC r r - Log comment for selected region&lt;/span&gt;
&lt;span class="c1"&gt;;; SPC r o - Open review log&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;defun&lt;/span&gt; &lt;span class="nv"&gt;code-review-log-comment&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;comment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="s"&gt;"Log a review comment with file/line tracking"&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;let*&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;buffer-file-name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
         &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;line&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;line-number-at-pos&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;code-review-format-entry&lt;/span&gt; &lt;span class="nv"&gt;comment&lt;/span&gt; &lt;span class="nv"&gt;file&lt;/span&gt; &lt;span class="nv"&gt;line&lt;/span&gt; &lt;span class="s"&gt;"TODO"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates entries in &lt;code&gt;~/code_review.org&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;** TODO [[file:~/repos/memento/src/cli.py::127][cli.py:127]]
   :PROPERTIES:
   :PROJECT: memento
   :TIMESTAMP: [2025-09-30 Mon 14:23]
   :END:
   This error handling is too generic - catch specific exceptions

** TODO [[file:~/repos/memento/src/search.py::89][search.py:89]]
   :PROPERTIES:
   :PROJECT: memento
   :TIMESTAMP: [2025-09-30 Mon 14:25]
   :END:
   Add caching here - search is called repeatedly with same query
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Review code in Emacs&lt;/strong&gt; (with syntax highlighting, jump-to-def, all IDE features)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mark issues as I find them&lt;/strong&gt; (SPC r c for quick comment)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trigger the automated fix process&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   Read code-review-llm-prompt-template and follow it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Claude automatically&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Reads &lt;code&gt;~/code_review.org&lt;/code&gt; for all TODO items&lt;/li&gt;
&lt;li&gt;Fixes each issue in the actual code&lt;/li&gt;
&lt;li&gt;Runs &lt;code&gt;make test&lt;/code&gt; after every change&lt;/li&gt;
&lt;li&gt;Marks items as DONE only when tests pass&lt;/li&gt;
&lt;li&gt;Provides a summary of what was fixed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The entire workflow is encoded in a memento note (&lt;code&gt;code-review-llm-prompt-template&lt;/code&gt;) that Claude reads. This note contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The review format specification&lt;/li&gt;
&lt;li&gt;Priority order (correctness → architecture → security → performance)&lt;/li&gt;
&lt;li&gt;Testing requirements (always run &lt;code&gt;make test&lt;/code&gt;, never leave tests failing)&lt;/li&gt;
&lt;li&gt;Guidelines for what makes a good vs. bad review&lt;/li&gt;
&lt;li&gt;The complete fix-and-verify process&lt;/li&gt;
&lt;/ul&gt;
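&lt;p&gt;Condensed, the note looks something like this (a sketch; the real template is longer and the exact wording here is illustrative):&lt;/p&gt;

```markdown
# Code Review Fix Process

## Input
Read ~/code_review.org. Each `** TODO` entry has a file link, a line
number, a project name, and a comment describing the issue.

## Priority Order
1. Correctness
2. Architecture
3. Security
4. Performance

## Process (per TODO item)
1. Open the file at the referenced line.
2. Fix the issue described in the comment.
3. Run `make test`. If tests fail, fix them before moving on.
4. Mark the org entry DONE only when tests pass.

## Rules
- Never leave the codebase with failing tests.
- Finish with a summary of every item fixed.
```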

&lt;h3&gt;
  
  
  Why This Works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Batch processing&lt;/strong&gt; is more efficient than interactive fixes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude sees all issues at once and can plan holistically&lt;/li&gt;
&lt;li&gt;No back-and-forth during fixing&lt;/li&gt;
&lt;li&gt;Tests run after every change (not just at the end)&lt;/li&gt;
&lt;li&gt;Clear audit trail of what was fixed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Emacs integration&lt;/strong&gt; solves the "review without IDE" problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I'm in my editor with all my tools&lt;/li&gt;
&lt;li&gt;Jump to definitions, search references, check blame&lt;/li&gt;
&lt;li&gt;Clicking org links takes me directly to the code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Structured format&lt;/strong&gt; means Claude gets precise instructions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exact file paths (clickable org-mode links)&lt;/li&gt;
&lt;li&gt;Exact line numbers&lt;/li&gt;
&lt;li&gt;Context about the issue&lt;/li&gt;
&lt;li&gt;Project name for multi-repo workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Current State: Automated Fix Process
&lt;/h3&gt;

&lt;p&gt;The system is fully automated for the fix workflow. When I have pending reviews, I simply say:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read code-review-llm-prompt-template and follow it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads the standardized prompt from memento&lt;/li&gt;
&lt;li&gt;Processes all TODO items from &lt;code&gt;~/code_review.org&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Fixes issues, runs tests, marks items DONE&lt;/li&gt;
&lt;li&gt;Never leaves the codebase with failing tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight: &lt;strong&gt;encoding the entire workflow in a memento note makes it repeatable and consistent&lt;/strong&gt;. I don't need to remember the exact prompt or process—it's all documented and ready to execute.&lt;/p&gt;

&lt;p&gt;Future improvements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Auto-trigger on commit&lt;/strong&gt;: Git hook that checks for pending reviews before allowing commits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proactive review suggestions&lt;/strong&gt;: Claude analyzing code during normal sessions and adding items to the review log&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review metrics&lt;/strong&gt;: Track what types of issues are most common to improve coding patterns&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Diff Workflow: Bringing Changes Back to Emacs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Claude makes changes in the terminal. I want to review them in Emacs. How do I bridge that gap?&lt;/p&gt;

&lt;h3&gt;
  
  
  The Current Solution
&lt;/h3&gt;

&lt;p&gt;Simple but effective:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude generates changes, I run:&lt;/span&gt;
git diff &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/review.diff

&lt;span class="c"&gt;# In Emacs:&lt;/span&gt;
&lt;span class="c"&gt;# Open the diff file&lt;/span&gt;
&lt;span class="c"&gt;# Use Emacs diff-mode for navigation&lt;/span&gt;
&lt;span class="c"&gt;# Apply/reject hunks interactively&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works but feels clunky. I'm copying diffs manually, opening files, navigating around.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Want
&lt;/h3&gt;

&lt;p&gt;A tighter integration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Claude signals "I made changes"&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Emacs automatically shows the diff&lt;/strong&gt; in a split window&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;I review with full IDE context&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I approve/reject&lt;/strong&gt; specific changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude sees my feedback&lt;/strong&gt; and adjusts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP server for Emacs communication&lt;/li&gt;
&lt;li&gt;Claude code that can signal "review needed"&lt;/li&gt;
&lt;li&gt;Emacs mode that listens for review requests&lt;/li&gt;
&lt;li&gt;Two-way communication (Claude → Emacs → Claude)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've prototyped pieces of this, but nothing is production-ready yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Barrier
&lt;/h3&gt;

&lt;p&gt;Building reliable two-way communication between Claude and Emacs is hard:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Emacs server needs to be always-on&lt;/li&gt;
&lt;li&gt;Need protocol for structured messages&lt;/li&gt;
&lt;li&gt;Need to handle failures gracefully&lt;/li&gt;
&lt;li&gt;Race conditions when multiple Claudes talk to one Emacs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm experimenting with using memento as the message bus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude writes "review-needed" note&lt;/li&gt;
&lt;li&gt;Emacs polls memento for new reviews&lt;/li&gt;
&lt;li&gt;Emacs writes feedback to memento&lt;/li&gt;
&lt;li&gt;Claude reads feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clunky but doesn't require real-time communication.&lt;/p&gt;
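&lt;p&gt;A minimal sketch of the bus idea. Memento's real interface is MCP; here I approximate it with a directory of JSON files (the &lt;code&gt;~/.review-bus&lt;/code&gt; path is made up):&lt;/p&gt;

```python
"""File-based message bus sketch: one JSON file per pending message."""
import json
import os
import uuid

BUS_DIR = os.path.expanduser("~/.review-bus")  # hypothetical location

def post(kind: str, payload: dict) -> str:
    """Writer side (Claude): drop a message on the bus."""
    os.makedirs(BUS_DIR, exist_ok=True)
    msg_id = str(uuid.uuid4())
    path = os.path.join(BUS_DIR, f"{kind}-{msg_id}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"id": msg_id, "kind": kind, **payload}, f)
    os.rename(tmp, path)  # atomic publish: readers never see partial writes
    return msg_id

def poll(kind: str) -> list:
    """Reader side (Emacs): consume all pending messages of one kind."""
    if not os.path.isdir(BUS_DIR):
        return []
    messages = []
    for name in sorted(os.listdir(BUS_DIR)):
        if name.startswith(kind + "-") and name.endswith(".json"):
            path = os.path.join(BUS_DIR, name)
            with open(path) as f:
                messages.append(json.load(f))
            os.remove(path)  # consume exactly once
    return messages
```

&lt;p&gt;Emacs would call &lt;code&gt;poll&lt;/code&gt; on a timer and &lt;code&gt;post&lt;/code&gt; feedback back the same way. The atomic rename matters: a reader polling mid-write never sees a truncated message.&lt;/p&gt;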

&lt;h2&gt;
  
  
  What Didn't Work: Session Auto-Resume
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Idea
&lt;/h3&gt;

&lt;p&gt;When I restart my computer, I lose all tmux sessions. What if Claude could auto-resume?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Before shutdown, save session state:&lt;/span&gt;
tmux-save-sessions  &lt;span class="c"&gt;# Captures all window/pane layouts&lt;/span&gt;

&lt;span class="c"&gt;# After restart:&lt;/span&gt;
tmux-restore-sessions  &lt;span class="c"&gt;# Recreates everything&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each session would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Restore to the correct directory&lt;/li&gt;
&lt;li&gt;Read the last prompt from history&lt;/li&gt;
&lt;li&gt;Show a summary: "You were working on memento refactoring"&lt;/li&gt;
&lt;/ul&gt;
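&lt;p&gt;The save/restore scripts boil down to something like this (a Python sketch; my actual dotfiles versions are shell scripts, and this simplified version captures only session names and working directories, not pane layouts):&lt;/p&gt;

```python
"""Sketch of tmux session save/restore via the tmux CLI."""
import json
import subprocess

def parse_panes(output: str) -> dict:
    """Map session name -> cwd of its first listed pane."""
    sessions = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, cwd = line.split("\t", 1)
        sessions.setdefault(name, cwd)
    return sessions

def save_sessions(path: str = "/tmp/tmux-sessions.json") -> None:
    # tmux format variables: #{session_name}, #{pane_current_path}
    out = subprocess.run(
        ["tmux", "list-panes", "-a", "-F",
         "#{session_name}\t#{pane_current_path}"],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(path, "w") as f:
        json.dump(parse_panes(out), f)

def restore_sessions(path: str = "/tmp/tmux-sessions.json") -> None:
    with open(path) as f:
        sessions = json.load(f)
    for name, cwd in sessions.items():
        # -d: detached, -s: session name, -c: start directory
        subprocess.run(["tmux", "new-session", "-d", "-s", name, "-c", cwd],
                       check=False)  # session may already exist
```

&lt;p&gt;The layouts come back. The conversations, as explained below, do not.&lt;/p&gt;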

&lt;h3&gt;
  
  
  Why It Failed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Context loss is too severe.&lt;/strong&gt; Even if I restore the directory and prompt, Claude doesn't remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What code was already written&lt;/li&gt;
&lt;li&gt;What decisions were made&lt;/li&gt;
&lt;li&gt;What tests were run&lt;/li&gt;
&lt;li&gt;What bugs were found&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd need to capture and replay the entire conversation, which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Huge token usage (replaying thousands of tokens)&lt;/li&gt;
&lt;li&gt;Slow startup (processing all that history)&lt;/li&gt;
&lt;li&gt;Potential for Claude to make different decisions on replay&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Lesson
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Session continuity requires more than just state restoration.&lt;/strong&gt; You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit checkpoints (memento notes with "current status")&lt;/li&gt;
&lt;li&gt;Clear handoff documents ("Session ended here, next steps are...")&lt;/li&gt;
&lt;li&gt;Project-specific context (not just conversation history)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of auto-resume, I now use &lt;strong&gt;explicit handoff notes&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Session Checkpoint: 2025-09-30 14:30&lt;/span&gt;

&lt;span class="gu"&gt;## What We Did&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Refactored CLI argument parsing to use argparse
&lt;span class="p"&gt;-&lt;/span&gt; All tests pass
&lt;span class="p"&gt;-&lt;/span&gt; Committed changes: git log -1

&lt;span class="gu"&gt;## What's Next&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Add JSON output support to all commands
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Update documentation
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Add integration tests

&lt;span class="gu"&gt;## Key Decisions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Using argparse instead of manual parsing for consistency
&lt;span class="p"&gt;-&lt;/span&gt; All commands must support --json flag

&lt;span class="gu"&gt;## Files Modified&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; src/cli.py (lines 1-89, 127-145)
&lt;span class="p"&gt;-&lt;/span&gt; src/parser.py (new file)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next session reads this note and picks up where we left off. Works better than trying to resume the conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experiments in Progress
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. MCP Coordination Server
&lt;/h3&gt;

&lt;p&gt;Building an MCP server specifically for coordinating parallel LLM sessions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Hypothetical API
&lt;/span&gt;&lt;span class="n"&gt;coordinator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;claim_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/parser.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;coordinator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_barrier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refactor-complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;coordinator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_barrier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refactor-complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;coordinator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_session_status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# → "in_progress" | "blocked" | "completed"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This would solve the "stepping on each other" problem when running parallel sessions.&lt;/p&gt;
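&lt;p&gt;To pin down the semantics, here's an in-memory sketch of that API. A real MCP server would persist this state and expose the methods as tools; I've also split the blocking &lt;code&gt;wait_for_barrier&lt;/code&gt; into non-blocking &lt;code&gt;arrive&lt;/code&gt;/&lt;code&gt;barrier_open&lt;/code&gt; so the sketch works in a single process:&lt;/p&gt;

```python
"""In-memory sketch of the session coordination API."""

class Coordinator:
    def __init__(self):
        self.claims = {}    # file path -> owning session
        self.barriers = {}  # name -> {"required": set, "arrived": set}
        self.status = {}    # session -> "in_progress" | "blocked" | "completed"

    def claim_file(self, path, session):
        """Claim exclusive write access; False if another session holds it."""
        owner = self.claims.get(path)
        if owner is not None and owner != session:
            return False
        self.claims[path] = session
        self.status.setdefault(session, "in_progress")
        return True

    def release_file(self, path, session):
        if self.claims.get(path) == session:
            del self.claims[path]

    def add_barrier(self, name, required):
        self.barriers[name] = {"required": set(required), "arrived": set()}

    def arrive(self, name, session):
        """A session reports it reached the barrier."""
        self.barriers[name]["arrived"].add(session)

    def barrier_open(self, name):
        """True once every required session has arrived."""
        b = self.barriers[name]
        return b["required"] <= b["arrived"]

    def get_session_status(self, session):
        return self.status.get(session, "unknown")
```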

&lt;h3&gt;
  
  
  2. Telemetry Mining
&lt;/h3&gt;

&lt;p&gt;I have months of telemetry data (see Part 2). Now I want to mine it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Which prompts lead to longest sessions?
# Which projects have the most rework?
# When do I context-switch most?
# Correlation between session length and memory usage?
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The goal: optimize my workflow based on data, not intuition.&lt;/p&gt;
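&lt;p&gt;A sketch of the kind of analysis I have in mind. The schema here (JSONL records with &lt;code&gt;project&lt;/code&gt;, &lt;code&gt;started&lt;/code&gt;, and &lt;code&gt;ended&lt;/code&gt; timestamps) is hypothetical; my actual telemetry format differs:&lt;/p&gt;

```python
"""Telemetry mining sketch over hypothetical JSONL session records."""
import json
from collections import Counter
from datetime import datetime

def load_sessions(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def session_minutes(s):
    start = datetime.fromisoformat(s["started"])
    end = datetime.fromisoformat(s["ended"])
    return (end - start).total_seconds() / 60

def context_switches(sessions):
    """Count transitions between different projects, in start order."""
    ordered = sorted(sessions, key=lambda s: s["started"])
    return sum(1 for prev, cur in zip(ordered, ordered[1:])
               if prev["project"] != cur["project"])

def minutes_by_project(sessions):
    totals = Counter()
    for s in sessions:
        totals[s["project"]] += session_minutes(s)
    return totals
```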

&lt;h3&gt;
  
  
  3. LLM-Generated Architecture Docs
&lt;/h3&gt;

&lt;p&gt;After a major refactor, can Claude generate architecture documentation automatically?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read all files in src/. Generate an architecture document explaining:
- Key components and their responsibilities
- Data flow through the system
- API boundaries
- Design decisions and trade-offs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Early experiments are promising. The docs aren't perfect but are good starting points.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Embrace obsolescence.&lt;/strong&gt; If a tool becomes unnecessary, that's progress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Perfect is the enemy of done.&lt;/strong&gt; The code review logger was useful long before the fix process was fully automated. Ship it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tight integration is hard.&lt;/strong&gt; Two-way communication between tools (Claude ↔ Emacs) requires careful design.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explicit beats implicit.&lt;/strong&gt; Session handoff notes work better than trying to auto-resume from history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data reveals patterns.&lt;/strong&gt; Telemetry showed me I context-switch too often—now I batch similar tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Part 5&lt;/strong&gt; (final article) covers using Claude as a learning tool: generating flashcards, creating annotated worksheets, and building a spaced-repetition system for technical concepts.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The code review logger is in the &lt;a href="https://github.com/charignon/memento/tree/main/emacs-config" rel="noopener noreferrer"&gt;memento repo&lt;/a&gt;. The project ingester is at &lt;code&gt;~/bin/project-ingest&lt;/code&gt;. The tmux session tools are in my &lt;a href="https://github.com/charignon/dotfiles" rel="noopener noreferrer"&gt;dotfiles&lt;/a&gt;. All MIT licensed—use freely.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>learning</category>
      <category>llm</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Building with LLMs at Scale: Part 3 - Higher-Level Abstractions</title>
      <dc:creator>Laurent Charignon</dc:creator>
      <pubDate>Fri, 17 Oct 2025 03:06:40 +0000</pubDate>
      <link>https://forem.com/laurent_charignon/building-with-llms-at-scale-part-3-higher-level-abstractions-2c67</link>
      <guid>https://forem.com/laurent_charignon/building-with-llms-at-scale-part-3-higher-level-abstractions-2c67</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part1-pain-points"&gt;Part 1&lt;/a&gt; I described the pain points of working with multiple LLM sessions. &lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part2-ergonomics"&gt;Part 2&lt;/a&gt; covered the ergonomics layer that made individual sessions manageable.&lt;/p&gt;

&lt;p&gt;But ergonomics alone isn't enough when you're running 5-10 parallel Claude sessions. You need &lt;strong&gt;coordination&lt;/strong&gt;, &lt;strong&gt;quality enforcement&lt;/strong&gt;, and &lt;strong&gt;shared context&lt;/strong&gt;. This article covers the higher-level abstractions that make LLM teams actually work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Smoke Test Paradigm: Designing Software for Rapid Iteration
&lt;/h2&gt;

&lt;p&gt;Here's the key insight: &lt;strong&gt;software design principles that help human developers also help LLMs&lt;/strong&gt;. The same things that trip up human coders—complex interfaces, tight coupling, unclear contracts—trip up LLMs too.&lt;/p&gt;

&lt;p&gt;When building software that LLMs will write and modify, the classic principles still apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modular code&lt;/strong&gt;: Small, well-defined components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple interfaces&lt;/strong&gt;: Clear inputs and outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loose coupling&lt;/strong&gt;: Changes in one area don't cascade&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast feedback&lt;/strong&gt;: Know immediately when something breaks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference is velocity. LLMs can iterate 10x faster than humans—but only if the feedback loop is tight. That's where &lt;strong&gt;smoke tests&lt;/strong&gt; become critical.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Smoke Tests Over Unit Tests?
&lt;/h3&gt;

&lt;p&gt;I tried comprehensive unit test suites. They worked, but the overhead was crushing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing tests took longer than writing features&lt;/li&gt;
&lt;li&gt;Tests became brittle as code evolved&lt;/li&gt;
&lt;li&gt;Mocking and fixtures added complexity&lt;/li&gt;
&lt;li&gt;False positives made me ignore failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem: unit tests are designed for human-paced development. When Claude can refactor an entire module in 30 seconds, waiting 5 minutes for a full test suite kills momentum.&lt;/p&gt;

&lt;p&gt;Instead, I adopted &lt;strong&gt;smoke tests&lt;/strong&gt;: simple, end-to-end checks that verify the system works. Run in seconds. Clear pass/fail. No ambiguity.&lt;/p&gt;

&lt;p&gt;Example from my flashcards project (&lt;code&gt;test/smoke_test.sh&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Smoke test: Does the basic workflow work?&lt;/span&gt;

&lt;span class="c"&gt;# Create a flashcard&lt;/span&gt;
./flashcards create &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--question&lt;/span&gt; &lt;span class="s2"&gt;"What is 2+2?"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--answer&lt;/span&gt; &lt;span class="s2"&gt;"4"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--project&lt;/span&gt; &lt;span class="s2"&gt;"math"&lt;/span&gt;

&lt;span class="c"&gt;# Get quiz items&lt;/span&gt;
./flashcards quiz &lt;span class="nt"&gt;--limit&lt;/span&gt; 1 | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"What is 2+2?"&lt;/span&gt;

&lt;span class="c"&gt;# Review it&lt;/span&gt;
./flashcards review &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nt"&gt;--confidence&lt;/span&gt; 5

&lt;span class="c"&gt;# Check it's in the list&lt;/span&gt;
./flashcards list | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"What is 2+2?"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Smoke test passed!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No mocking. No fixtures. No complex assertions. Just: &lt;strong&gt;Does it work end-to-end?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Make Test Convention
&lt;/h3&gt;

&lt;p&gt;Every project has a &lt;code&gt;Makefile&lt;/code&gt; with a &lt;code&gt;test&lt;/code&gt; target:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="nl"&gt;test&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Running smoke tests..."&lt;/span&gt;
    &lt;span class="p"&gt;@&lt;/span&gt;./test/smoke_test.sh
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ All tests passed"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude knows this convention. After every code change, it automatically runs &lt;code&gt;make test&lt;/code&gt;. If tests fail, Claude must fix them before continuing.&lt;/p&gt;

&lt;p&gt;This simple pattern has caught hundreds of regressions. Claude refactors a function? Tests catch it. Claude renames a variable? Tests catch it. Claude adds a feature? Tests verify it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Works
&lt;/h3&gt;

&lt;p&gt;Smoke tests have unique advantages for LLM workflows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fast&lt;/strong&gt;: Run in seconds, not minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear failures&lt;/strong&gt;: "Command failed" is unambiguous&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-documenting&lt;/strong&gt;: Reading the test shows how the system should work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy to maintain&lt;/strong&gt;: When features change, tests are obvious to update&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Catches real issues&lt;/strong&gt;: Integration problems that unit tests miss&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The trade-off: you don't get fine-grained coverage. But in my experience, that's fine. I'd rather have 90% confidence in 5 seconds than 99% confidence after 5 minutes of test runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Memento: Shared Context Between Sessions
&lt;/h2&gt;

&lt;p&gt;The core challenge of parallel LLM sessions: they don't know about each other.&lt;/p&gt;

&lt;p&gt;Session A refactors the authentication system. Session B adds a new feature that uses authentication. Session A's changes break Session B's code—but Session B has no idea until tests fail.&lt;/p&gt;

&lt;p&gt;I needed a shared knowledge base. Enter &lt;strong&gt;memento&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is Memento?
&lt;/h3&gt;

&lt;p&gt;Memento is my note-taking system built on &lt;a href="https://www.orgroam.com/" rel="noopener noreferrer"&gt;org-roam&lt;/a&gt;, which implements the &lt;a href="https://zettelkasten.de/introduction/" rel="noopener noreferrer"&gt;Zettelkasten method&lt;/a&gt; for networked thought. I expose it to Claude via MCP (Model Context Protocol).&lt;/p&gt;

&lt;p&gt;Think of it as a shared brain for all Claude sessions—a personal knowledge graph where notes link to each other, concepts build on each other, and every LLM session can read and contribute to the collective knowledge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F33dp1vw340ov47fc80cp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F33dp1vw340ov47fc80cp.png" alt="Memento note system showing interconnected knowledge graph" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Public notes&lt;/strong&gt; tagged with &lt;code&gt;PUBLIC&lt;/code&gt; are accessible via MCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Searchable&lt;/strong&gt; with full-text search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured&lt;/strong&gt; with org-mode properties and tags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version controlled&lt;/strong&gt; in git&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent&lt;/strong&gt; across sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Global Context Pattern
&lt;/h3&gt;

&lt;p&gt;Every Claude session starts by reading the &lt;code&gt;claude-global-context&lt;/code&gt; note:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight common_lisp"&gt;&lt;code&gt;&lt;span class="c1"&gt;;; Automatically loaded by Claude at session start&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;mcp__memento__note_get&lt;/span&gt; &lt;span class="ss"&gt;:note_id&lt;/span&gt; &lt;span class="s"&gt;"claude-global-context"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This note contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;My coding preferences&lt;/li&gt;
&lt;li&gt;Project structure&lt;/li&gt;
&lt;li&gt;Common pitfalls&lt;/li&gt;
&lt;li&gt;Tools available (memento, MCP servers, custom scripts)&lt;/li&gt;
&lt;li&gt;Reminders (never access &lt;code&gt;~/.roam&lt;/code&gt; directly, always use MCP)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I discover patterns, I add them to this note. Every future Claude session gets that knowledge automatically.&lt;/p&gt;

&lt;p&gt;Example from my global context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## 🧪 Testing Approach:&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Write tests for new features
&lt;span class="p"&gt;-&lt;/span&gt; Rely on smoke tests for projects (trigger with &lt;span class="sb"&gt;`make test`&lt;/span&gt;)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Whenever all tests pass after a change, make a commit with a descriptive message**&lt;/span&gt;

&lt;span class="gu"&gt;## 🔧 ELISP DEVELOPMENT WITH DOOMSCRIPT:&lt;/span&gt;
See the note tagged &lt;span class="sb"&gt;`elisp`&lt;/span&gt; for patterns and testing approaches
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Session-Specific Context
&lt;/h3&gt;

&lt;p&gt;For complex projects, I create dedicated notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;memento-clojure-patterns&lt;/code&gt;: Clojure idioms and anti-patterns&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;appdaemon-testing-guide&lt;/code&gt;: How to test Home Assistant automations&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mcp-server-patterns&lt;/code&gt;: How to build reliable MCP servers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When Claude works on these projects, I explicitly reference the notes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read the note `mcp-server-patterns` and apply those patterns
to this new server implementation.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude reads the note, absorbs the context, and applies it. The next Claude session working on the same project does the same thing—they're building on shared knowledge.&lt;/p&gt;

&lt;h3&gt;
  
  
  Coordination Patterns (Experimental)
&lt;/h3&gt;

&lt;p&gt;I'm experimenting with explicit coordination notes for parallel sessions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# working-on-memento-refactor&lt;/span&gt;

&lt;span class="gu"&gt;## Current State&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Session A: Refactoring CLI argument parsing (IN PROGRESS)
&lt;span class="p"&gt;-&lt;/span&gt; Session B: Adding new &lt;span class="sb"&gt;`bulk-update`&lt;/span&gt; command (WAITING)
&lt;span class="p"&gt;-&lt;/span&gt; Session C: Updating tests (COMPLETED)

&lt;span class="gu"&gt;## Decisions Made&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use argparse instead of manual parsing (Session A, 2025-09-28)
&lt;span class="p"&gt;-&lt;/span&gt; All commands must support JSON output (Session B, 2025-09-27)

&lt;span class="gu"&gt;## Upcoming Work&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Migrate all commands to new arg structure
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Add integration tests
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Update documentation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each session reads this note before starting work. Session A updates its status when done. Session B sees that and can proceed safely.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;informal&lt;/strong&gt; right now—I'm still exploring better patterns. Some ideas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Barrier functionality&lt;/strong&gt;: Session B blocks until Session A completes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lock mechanism&lt;/strong&gt;: Only one session can modify a file at once&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency tracking&lt;/strong&gt;: Session C depends on Session A and Session B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm considering building an MCP server specifically for project coordination. Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Hypothetical coordination MCP server
&lt;/span&gt;&lt;span class="n"&gt;mcp_coordinator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;claim_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/parser.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Other sessions get an error if they try to edit it
&lt;/span&gt;&lt;span class="n"&gt;mcp_coordinator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_barrier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refactor-complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required_sessions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;mcp_coordinator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for_barrier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refactor-complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Blocks until A and B finish
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
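
&lt;p&gt;To make the claim semantics concrete: a minimal sketch of how &lt;code&gt;claim_file&lt;/code&gt; could be backed by atomic lock-file creation. Everything here (class name, lock directory layout) is hypothetical, not a real server:&lt;/p&gt;

```python
import os

class FileClaimError(Exception):
    """Raised when another session already holds the claim."""

class Coordinator:
    """Sketch: file claiming backed by atomic lock-file creation."""

    def __init__(self, lock_dir):
        self.lock_dir = lock_dir
        os.makedirs(lock_dir, exist_ok=True)

    def _lock_path(self, path):
        # One lock file per claimed path, flattened into the lock directory
        return os.path.join(self.lock_dir, path.replace("/", "__") + ".lock")

    def claim_file(self, path, session_id):
        # O_CREAT | O_EXCL makes creation atomic: exactly one session wins
        try:
            fd = os.open(self._lock_path(path), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise FileClaimError(path + " is already claimed")
        with os.fdopen(fd, "w") as f:
            f.write(session_id)

    def release_file(self, path):
        os.remove(self._lock_path(path))
```

&lt;p&gt;A barrier could be layered on the same primitive: each session writes a marker file on completion, and &lt;code&gt;wait_for_barrier&lt;/code&gt; polls until all required markers exist.&lt;/p&gt;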



&lt;h2&gt;
  
  
  The Supervisor Pattern: Orchestrating LLM Teams
&lt;/h2&gt;

&lt;p&gt;When I need major changes, I run multiple Claude sessions in parallel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session A&lt;/strong&gt;: Implements feature X&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session B&lt;/strong&gt;: Writes tests for feature X&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session C&lt;/strong&gt;: Updates documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session D&lt;/strong&gt;: Reviews changes from A, B, and C&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the &lt;strong&gt;supervisor pattern&lt;/strong&gt;—but instead of manually coordinating, I use &lt;strong&gt;an LLM to generate prompts for other LLMs&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Meta-LLM Approach
&lt;/h3&gt;

&lt;p&gt;Here's the key insight: planning parallel work is itself an LLM task. So I have Claude generate the work breakdown and individual prompts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;I describe the goal&lt;/strong&gt; to a planning session: "Implement feature X with tests and docs"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The planner LLM creates&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;A work plan broken into phases (represented as a DAG)&lt;/li&gt;
&lt;li&gt;Individual prompt files for each parallel task&lt;/li&gt;
&lt;li&gt;Memento-based coordination scheme&lt;/li&gt;
&lt;li&gt;A supervisor prompt for monitoring progress&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I review and launch&lt;/strong&gt; using my automation tools&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This meta-approach scales much better than manual coordination. The planner understands dependencies, estimates complexity, and generates consistent prompt structures.&lt;/p&gt;
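
&lt;p&gt;The planner's dependency handling boils down to layering the DAG into parallelizable phases. Here's a sketch (the &lt;code&gt;plan&lt;/code&gt; dict below is illustrative, not the real &lt;code&gt;plan.json&lt;/code&gt; format):&lt;/p&gt;

```python
def phases_from_dag(tasks):
    """Group a task DAG into parallelizable phases (Kahn-style layering).

    tasks: dict mapping each task name to a list of dependency names.
    Returns a list of phases; every task in a phase can run in parallel.
    """
    remaining = {name: set(deps) for name, deps in tasks.items()}
    phases = []
    while remaining:
        # Any task whose dependencies have all completed is ready now
        ready = sorted(n for n, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle in plan")
        phases.append(ready)
        for n in ready:
            del remaining[n]
        for deps in remaining.values():
            deps.difference_update(ready)
    return phases

# Illustrative plan: tests and docs depend on the implementation,
# and review waits on both
plan = {
    "implement-x": [],
    "test-x": ["implement-x"],
    "docs-x": ["implement-x"],
    "review": ["test-x", "docs-x"],
}
```

&lt;p&gt;For this plan the layering produces three phases: &lt;code&gt;implement-x&lt;/code&gt; alone, then &lt;code&gt;docs-x&lt;/code&gt; and &lt;code&gt;test-x&lt;/code&gt; in parallel, then &lt;code&gt;review&lt;/code&gt;.&lt;/p&gt;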

&lt;h3&gt;
  
  
  The Tooling: claude-parallel
&lt;/h3&gt;

&lt;p&gt;I built &lt;code&gt;claude-parallel&lt;/code&gt; to automate the workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Generate the plan&lt;/span&gt;
claude-parallel plan &lt;span class="nt"&gt;-P&lt;/span&gt; myproject &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"requirements.txt"&lt;/span&gt;

&lt;span class="c"&gt;# This launches a planning Claude session that:&lt;/span&gt;
&lt;span class="c"&gt;# - Breaks work into phases and tasks&lt;/span&gt;
&lt;span class="c"&gt;# - Creates prompt files in ~/.projects/myproject/prompts/&lt;/span&gt;
&lt;span class="c"&gt;# - Generates plan.json with the dependency DAG&lt;/span&gt;
&lt;span class="c"&gt;# - Creates a supervisor.txt prompt for monitoring&lt;/span&gt;

&lt;span class="c"&gt;# Step 2: Dispatch work to parallel sessions&lt;/span&gt;
claude-parallel dispatch &lt;span class="nt"&gt;-p&lt;/span&gt; prompts/phase-1-task-auth.txt src/auth.py
claude-parallel dispatch &lt;span class="nt"&gt;-p&lt;/span&gt; prompts/phase-1-task-tests.txt tests/test_auth.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;dispatch&lt;/code&gt; command automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates a new tmux window&lt;/li&gt;
&lt;li&gt;Changes to the file's directory&lt;/li&gt;
&lt;li&gt;Launches Claude with the prompt&lt;/li&gt;
&lt;li&gt;Monitors completion via memento notes&lt;/li&gt;
&lt;/ul&gt;
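
&lt;p&gt;Under the hood, a dispatch reduces to a couple of tmux commands. A hypothetical sketch that only builds the command strings (the real tool also wires up monitoring and memento hooks, and the &lt;code&gt;claude&lt;/code&gt; invocation shown is an assumption):&lt;/p&gt;

```python
import os

def dispatch_commands(prompt_file, target_file, session="work"):
    """Build the tmux commands a dispatch would run (sketch, not the real tool).

    Returns the command strings instead of executing them, so the logic
    is easy to inspect; the claude invocation shown is an assumption.
    """
    window = os.path.splitext(os.path.basename(prompt_file))[0]
    workdir = os.path.dirname(os.path.abspath(target_file))
    return [
        # New window named after the prompt file, starting in the file's directory
        f"tmux new-window -t {session} -n {window} -c {workdir}",
        # Type the prompt into the new window and press Enter
        f"tmux send-keys -t {session}:{window} 'claude \"$(cat {prompt_file})\"' Enter",
    ]
```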

&lt;h3&gt;
  
  
  Tmux Automation
&lt;/h3&gt;

&lt;p&gt;For complex projects with many parallel sessions, I use &lt;code&gt;generate_tmuxinator_config&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate tmuxinator config from prompt files&lt;/span&gt;
generate_tmuxinator_config &lt;span class="nt"&gt;-n&lt;/span&gt; myproject prompts/&lt;span class="k"&gt;*&lt;/span&gt;.txt &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ~/.config/tmuxinator/myproject.yml

&lt;span class="c"&gt;# Launch all sessions at once&lt;/span&gt;
tmuxinator start myproject
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a tmux session with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One window per prompt file&lt;/li&gt;
&lt;li&gt;Proper window naming for easy navigation&lt;/li&gt;
&lt;li&gt;All sessions starting in the correct directory&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How I Do It Today
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Write high-level requirements&lt;/strong&gt; in a text file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run &lt;code&gt;claude-parallel plan&lt;/code&gt;&lt;/strong&gt; to generate work breakdown&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review the generated prompts&lt;/strong&gt; (adjust if needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Launch sessions&lt;/strong&gt; via &lt;code&gt;claude-parallel dispatch&lt;/code&gt; or &lt;code&gt;tmuxinator&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use memento for coordination&lt;/strong&gt; (automatically set up by the planner):

&lt;ul&gt;
&lt;li&gt;Sessions read/write status notes&lt;/li&gt;
&lt;li&gt;Sessions check phase completion before starting&lt;/li&gt;
&lt;li&gt;Blocker notes communicate issues&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rely on smoke tests&lt;/strong&gt; to catch integration issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor via tmux status indicators&lt;/strong&gt; (see Part 2) or run the supervisor prompt&lt;/li&gt;
&lt;/ol&gt;
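
&lt;p&gt;The coordination in step 5 amounts to a small state machine over a shared status note. A sketch with an in-memory stand-in for memento (in reality the sessions read and write the note through MCP):&lt;/p&gt;

```python
class StatusNote:
    """In-memory stand-in for a memento coordination note (sketch only)."""

    def __init__(self, tasks):
        # Task name mapped to one of: WAITING, IN PROGRESS, COMPLETED, BLOCKED
        self.status = {t: "WAITING" for t in tasks}

    def update(self, task, state):
        self.status[task] = state

    def phase_complete(self, tasks):
        # A phase is done only when every task in it has completed
        return all(self.status[t] == "COMPLETED" for t in tasks)

    def can_start(self, task, depends_on):
        # A session checks its dependencies before touching any files
        return self.status[task] == "WAITING" and self.phase_complete(depends_on)
```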

&lt;h3&gt;
  
  
  Persona-Driven Architecture
&lt;/h3&gt;

&lt;p&gt;Assigning roles to sessions improves output quality, but I use personas differently than you might expect.&lt;/p&gt;

&lt;p&gt;I use Robert C. Martin (Uncle Bob) as the &lt;strong&gt;planner and architect&lt;/strong&gt;. When breaking down a complex feature into parallel tasks, I ask the planner session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are Robert C. Martin (Uncle Bob). Review this feature request and break it
down into clean, well-separated tasks for parallel implementation. Focus on
SOLID principles and clear interfaces between components.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives me a work breakdown that follows clean architecture principles: small, focused components with clear responsibilities.&lt;/p&gt;

&lt;p&gt;Then for the &lt;strong&gt;worker sessions&lt;/strong&gt; (the ones actually implementing the tasks), I experiment with different prompts. Sometimes specific personas help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"You are obsessed with performance and correctness" for algorithm-heavy code&lt;/li&gt;
&lt;li&gt;"You are paranoid about edge cases and defensive programming" for input validation&lt;/li&gt;
&lt;li&gt;"You value simplicity above all else, avoid any unnecessary complexity" for utility functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other times, I just use the task description from the planner without additional persona framing. I'm still experimenting with what works best for different types of work.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Missing
&lt;/h3&gt;

&lt;p&gt;Current gaps in my supervisor pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No automatic conflict detection&lt;/strong&gt;: I manually ensure sessions don't edit the same files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No rollback mechanism&lt;/strong&gt;: If Session A breaks tests, I manually revert&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No progress tracking&lt;/strong&gt;: I eyeball tmux windows instead of having a dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No automatic merging&lt;/strong&gt;: I manually integrate changes from parallel sessions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are ripe for automation. The MCP coordination server would close gaps 1-3. Gap 4 might need a specialized "merger" session that reads the changes from all other sessions and integrates them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Knowledge Accumulation Over Time
&lt;/h2&gt;

&lt;p&gt;Traditional LLM conversations are ephemeral. Each session starts fresh. But with memento, knowledge compounds.&lt;/p&gt;

&lt;p&gt;Example workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Week 1&lt;/strong&gt;: I discover that MCP servers should validate input strictly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I add to global context&lt;/strong&gt;: "MCP servers must validate all inputs and return clear error messages"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 2&lt;/strong&gt;: Claude builds a new MCP server, automatically applies that pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 3&lt;/strong&gt;: I discover another pattern (connection pooling), add it to global context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Future sessions&lt;/strong&gt;: Apply both patterns automatically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Over months, my global context evolved from 50 lines to 500+ lines of hard-won knowledge. New Claude sessions are more productive from day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Memento Notes Index
&lt;/h3&gt;

&lt;p&gt;To make knowledge discoverable, I maintain a &lt;code&gt;memento-notes-index&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Development &amp;amp; Technical Guides&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**mcp-server-patterns**&lt;/span&gt;: Patterns for building reliable MCP servers
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**smoke-test-paradigm**&lt;/span&gt;: Why smoke tests work better than unit tests
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**elisp-testing-guide**&lt;/span&gt;: Fast testing with doomscript
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**code-review-guide**&lt;/span&gt;: How to review code and log issues for AI

&lt;span class="gu"&gt;## Quick Lookup by Use Case&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Building MCP servers → &lt;span class="sb"&gt;`mcp-server-patterns`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Emacs development → &lt;span class="sb"&gt;`elisp-testing-guide`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Testing frameworks → &lt;span class="sb"&gt;`smoke-test-paradigm`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Claude asks "How should I structure this?", I can say: "Check the notes index for relevant guides."&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;p&gt;After 6 months of parallel LLM workflows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smoke tests are a game-changer.&lt;/strong&gt; They catch 90% of issues with 10% of the effort of comprehensive test suites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared context is essential.&lt;/strong&gt; Without memento, each session reinvents the wheel. With it, knowledge compounds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Personas improve output quality.&lt;/strong&gt; "Be Uncle Bob" consistently produces cleaner code than "refactor this."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Informal coordination works at small scale.&lt;/strong&gt; For 3-5 parallel sessions, a shared note is enough. Beyond that, I'll need real tooling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every discovery should be captured.&lt;/strong&gt; If I solve a problem once, I never want to solve it again. Write it down in memento.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The patterns in this article work but aren't fully automated. I'm manually coordinating sessions, manually managing shared context, manually merging changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 4&lt;/strong&gt; covered experiments and works-in-progress: the project explorer tool, Emacs integration for code review, diff workflows, and ideas that didn't quite work out.&lt;/p&gt;

&lt;p&gt;This article, the final one in the series, shifted to learning: using Claude to generate flashcards, worksheets, and annotated code for studying complex topics.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The memento system is open source at &lt;a href="https://github.com/charignon/memento" rel="noopener noreferrer"&gt;github.com/charignon/memento&lt;/a&gt;. The global context patterns are in my &lt;a href="https://github.com/charignon/dotfiles/blob/main/CLAUDE.md" rel="noopener noreferrer"&gt;CLAUDE.md&lt;/a&gt;. The flashcards smoke tests are at &lt;a href="https://github.com/charignon/flashcards" rel="noopener noreferrer"&gt;github.com/charignon/flashcards&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building with LLMs at Scale: Part 2 - Ergonomics and Observability</title>
      <dc:creator>Laurent Charignon</dc:creator>
      <pubDate>Fri, 17 Oct 2025 02:56:32 +0000</pubDate>
      <link>https://forem.com/laurent_charignon/building-with-llms-at-scale-part-2-ergonomics-and-observability-1521</link>
      <guid>https://forem.com/laurent_charignon/building-with-llms-at-scale-part-2-ergonomics-and-observability-1521</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part1-pain-points"&gt;Part 1&lt;/a&gt; described the problems of managing multiple LLM sessions. This article shows the ergonomic layer that solves them: visual indicators, session recording, logging, and telemetry.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Complete Picture
&lt;/h2&gt;

&lt;p&gt;Here's my workflow using &lt;a href="https://github.com/tmux/tmux/wiki" rel="noopener noreferrer"&gt;tmux&lt;/a&gt; to manage multiple LLM sessions.&lt;/p&gt;

&lt;p&gt;Tmux is a terminal multiplexer—it lets you run multiple terminal sessions inside a single window and switch between them quickly. Think of it like having tabs in a browser, but for your terminal. You can have one tmux session with 10 different windows, each running a different LLM conversation, and easily switch between them with keyboard shortcuts.&lt;/p&gt;

&lt;p&gt;Here's how it works in practice:&lt;/p&gt;

&lt;p&gt;I open a new tmux window (like opening a new tab) and start an LLM session—maybe Claude Code working on a bug fix. Metrics tracking begins automatically in the background. As the LLM works, my tmux status bar (the line at the bottom of the terminal) shows a 🤖 emoji next to that window's name. I can glance at the status bar and instantly see that window 3 is busy with an LLM.&lt;/p&gt;

&lt;p&gt;When the LLM finishes and waits for my input, the emoji changes to 💬. If I'm currently in a different window (say, window 5 where I'm reviewing code), I just press &lt;code&gt;` n&lt;/code&gt; (backtick followed by n) to jump directly to the waiting session. No manually cycling through windows, no remembering which number it was.&lt;/p&gt;

&lt;p&gt;Every context switch gets recorded with a timestamp. A week later, when I need to understand what happened in that session—what prompts I gave, what the LLM suggested, what decisions were made—I can query the session history and replay the logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Visual Layer: Terminal Session Management
&lt;/h2&gt;

&lt;p&gt;Problem: 10 terminal windows (or in tmux terminology, 10 windows within one tmux session), each running a different LLM conversation. No visibility into which LLM needs attention.&lt;/p&gt;

&lt;p&gt;Solution: emoji indicators showing window state in the tmux status bar.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;💬 memento        &lt;span class="c"&gt;# LLM waiting for input&lt;/span&gt;
🤖 appdaemon      &lt;span class="c"&gt;# LLM actively working&lt;/span&gt;
📝 config         &lt;span class="c"&gt;# Editor open&lt;/span&gt;
🐍 analyzer       &lt;span class="c"&gt;# Python script running&lt;/span&gt;
⌨️ bash           &lt;span class="c"&gt;# Shell waiting for command&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Window Status Script
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;tmux-window-status&lt;/code&gt; script analyzes each tmux pane (a pane is like a split section within a window) and adds contextual emojis. Here's how it works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Capture recent output&lt;/strong&gt;: Grab the last 100 lines of text from the pane&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detect LLM patterns&lt;/strong&gt;: Look for LLM-specific text like &lt;code&gt;&amp;gt;&lt;/code&gt; prompts or dialog boxes asking "Do you want to..."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check the process&lt;/strong&gt;: See what command is actually running in that pane&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Return the right emoji&lt;/strong&gt;: Based on what we found, add the appropriate emoji to the window name&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's the key detection logic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;check_llm_waiting&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;local &lt;/span&gt;&lt;span class="nv"&gt;pane_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;local &lt;/span&gt;&lt;span class="nv"&gt;last_lines&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$pane_content&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-5&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;# Check for common LLM prompts&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$last_lines&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s2"&gt;"^&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="s2"&gt;*&lt;/span&gt;&lt;span class="nv"&gt;$|&lt;/span&gt;&lt;span class="s2"&gt;^&amp;gt; "&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        return &lt;/span&gt;0  &lt;span class="c"&gt;# LLM is waiting&lt;/span&gt;
    &lt;span class="k"&gt;fi&lt;/span&gt;

    &lt;span class="c"&gt;# Check for dialog boxes&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$last_lines&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qE&lt;/span&gt; &lt;span class="s2"&gt;"Do you want to|❯.*Yes"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        return &lt;/span&gt;0  &lt;span class="c"&gt;# Waiting for decision&lt;/span&gt;
    &lt;span class="k"&gt;fi

    return &lt;/span&gt;1  &lt;span class="c"&gt;# Not waiting&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
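
&lt;p&gt;Putting the pieces together, the window-status decision can be expressed as one small function. A Python sketch of the same logic (patterns simplified, and the process names to match are assumptions):&lt;/p&gt;

```python
def classify_pane(last_lines, command):
    """Return a status emoji for a pane, mirroring the legend above.

    last_lines: recent pane output lines; command: the pane's process name.
    Simplified sketch; the real script inspects processes more carefully.
    """
    if command in ("claude",):  # assumption: which binaries count as LLMs
        for line in last_lines[-5:]:
            stripped = line.strip()
            # A bare prompt or a confirmation dialog means "waiting for input"
            if (stripped == ">" or stripped.startswith("> ")
                    or "Do you want to" in line):
                return "💬"
        return "🤖"  # LLM process running, no prompt visible
    if command in ("vim", "nvim", "emacs"):
        return "📝"
    if command.startswith("python"):
        return "🐍"
    return "⌨️"
```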



&lt;h3&gt;
  
  
  Jump to Next Waiting Window
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;tmux-next-waiting&lt;/code&gt; script cycles through windows where an LLM is waiting. It loops through all your tmux windows, checks which ones have the 💬 emoji (meaning an LLM is waiting for input), and jumps to the next one after your current window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Find all windows with 💬 emoji (LLM waiting)&lt;/span&gt;
&lt;span class="nv"&gt;windows_waiting&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;window &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;tmux list-windows &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="s2"&gt;"#{window_index}"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nv"&gt;formatted_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;~/bin/tmux-window-status &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$window_name&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$pane_id&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$formatted_name&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"💬"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nv"&gt;windows_waiting&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$windows_waiting&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$window&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;fi
done&lt;/span&gt;

&lt;span class="c"&gt;# Jump to next waiting window after current&lt;/span&gt;
&lt;span class="c"&gt;# (wraps around to first if at end)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To use this, bind it to a tmux key in your tmux configuration (&lt;code&gt;~/.tmux.conf&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bind-key n run-shell "~/bin/tmux-next-waiting"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now pressing &lt;code&gt;` n&lt;/code&gt; (assuming you've set &lt;code&gt;`&lt;/code&gt; as your tmux prefix key) jumps to the next LLM session that needs attention. The prefix key is like a "modifier" that tells tmux "the next key is a command for you." With this setup, switching is fast: &lt;code&gt;` 1&lt;/code&gt; goes to window 1, &lt;code&gt;` TAB&lt;/code&gt; toggles to your last window, &lt;code&gt;` n&lt;/code&gt; finds the next waiting LLM.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Logging Layer: Complete Auditability
&lt;/h2&gt;

&lt;p&gt;Remember the problem from Part 1? Code written last week is unrecognizable without session history. You need to understand what the LLM did, what decisions were made, and why certain approaches were taken.&lt;/p&gt;

&lt;p&gt;The solution: record everything. I use &lt;a href="https://asciinema.org/" rel="noopener noreferrer"&gt;asciinema&lt;/a&gt;, a terminal session recorder, to capture complete LLM sessions. Unlike text logs (which just save the text), asciinema records the actual terminal output with timing information—think of it like a video recording of your terminal session. You can replay sessions later and see exactly what appeared on screen, when it appeared, and in what order.&lt;/p&gt;

&lt;p&gt;For complex refactoring sessions or experiments, I use this wrapper script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="c"&gt;# llm-record - Record LLM sessions with asciinema&lt;/span&gt;

&lt;span class="nv"&gt;RECORDING_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s1"&gt;'+%Y%m%d-%H%M%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;RECORDINGS_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;HOME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/llm-recordings"&lt;/span&gt;
&lt;span class="nv"&gt;RECORDING_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RECORDINGS_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RECORDING_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.cast"&lt;/span&gt;

asciinema rec &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"LLM Session: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RECORDING_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--idle-time-limit&lt;/span&gt; 10 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RECORDING_FILE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--idle-time-limit 10&lt;/code&gt; flag compresses long waits (like when the LLM is thinking or making API calls) to 10 seconds in playback. This makes replaying sessions much faster—you're not sitting through minutes of "Processing..." messages.&lt;/p&gt;

&lt;p&gt;When Claude Code encounters bugs or issues, I can extract the exact terminal transcript with &lt;code&gt;asciinema cat&lt;/code&gt; and share it. This works around a limitation in current LLM tools: they don't have built-in access to session history, so providing a complete transcript helps them understand what went wrong.&lt;/p&gt;
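&lt;p&gt;If asciinema itself isn't available where you're processing recordings, the format is easy to read directly: an asciicast v2 file is one JSON header line followed by one JSON event per line, where output events are tagged &lt;code&gt;"o"&lt;/code&gt;. A minimal sketch (the helper name is mine, not part of asciinema):&lt;/p&gt;

```python
import json

def cast_to_text(cast_path):
    """Extract the plain terminal transcript from an asciicast v2 file.

    Each event line is a JSON array [timestamp, event_type, data];
    we keep only output ("o") events and drop the timing data.
    """
    chunks = []
    with open(cast_path) as f:
        next(f)  # skip the JSON header line
        for line in f:
            line = line.strip()
            if not line:
                continue
            ts, event_type, data = json.loads(line)
            if event_type == "o":
                chunks.append(data)
    return "".join(chunks)
```

&lt;p&gt;Writing the result to a file gives you a shareable transcript without replaying the session.&lt;/p&gt;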

&lt;h2&gt;
  
  
  The Telemetry Layer: Metrics and Patterns
&lt;/h2&gt;

&lt;p&gt;Visual indicators solve the immediate "which window needs attention?" problem. But I wanted to understand deeper patterns: how many parallel sessions do I actually run? When am I most productive? Which projects consume the most time?&lt;/p&gt;

&lt;p&gt;To answer these questions, I built a telemetry system using &lt;a href="https://prometheus.io/" rel="noopener noreferrer"&gt;Prometheus&lt;/a&gt;—an open-source monitoring system originally built at SoundCloud. Prometheus collects metrics (numerical measurements) over time and lets you query them later. A background script runs every 15 seconds, collecting metrics about my tmux environment and LLM sessions.&lt;/p&gt;

&lt;p&gt;The script tracks session-level metrics like total tmux sessions, windows per session, and which sessions are actively attached. It also captures LLM-specific data: the number of active LLM processes, memory usage per session (RSS in MB), CPU usage, session duration in minutes, and the working directory for each session.&lt;/p&gt;
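&lt;p&gt;Collecting the session-level numbers is mostly a matter of asking tmux for machine-readable output. A sketch of the parsing step, assuming a &lt;code&gt;tmux list-sessions -F&lt;/code&gt; format string like the one in the comment (the field separator and dataclass are illustrative choices, not the author's actual script):&lt;/p&gt;

```python
from dataclasses import dataclass

# One session per line, produced by:
#   tmux list-sessions -F '#{session_name}|#{session_windows}|#{session_attached}'

@dataclass
class TmuxSession:
    name: str
    windows: int
    attached: bool  # at least one client attached

def parse_sessions(output):
    """Parse the formatted `tmux list-sessions` output above."""
    sessions = []
    for line in output.strip().splitlines():
        name, windows, attached = line.split("|")
        # session_attached is a client count; nonzero means attached
        sessions.append(TmuxSession(name, int(windows), attached != "0"))
    return sessions
```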

&lt;p&gt;All of this gets pushed to a Prometheus pushgateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;LLM_SESSIONS_TOTAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_sessions_total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total number of active LLM sessions&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;LLM_SESSION_MEMORY_MB&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_session_memory_mb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Memory usage per LLM session in MB&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;LLM_SESSION_DURATION_MINUTES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gauge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_session_duration_minutes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Duration of LLM session in minutes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
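&lt;p&gt;In practice a client library (presumably &lt;code&gt;prometheus_client&lt;/code&gt;, which provides the &lt;code&gt;Gauge&lt;/code&gt; class above) handles serialization and pushing. For illustration, here is a dependency-free sketch of the wire format the pushgateway accepts over HTTP PUT; the metric name mirrors the gauges above, and the gateway address in the comment is an assumption:&lt;/p&gt;

```python
def to_exposition(name, help_text, samples):
    """Render gauge samples in the Prometheus text exposition format.

    samples is a list of (labels, value) pairs; labels is a dict.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

# Pushing is then an HTTP PUT of this payload to the gateway's
# /metrics/job/JOBNAME endpoint, e.g. http://localhost:9091/metrics/job/llm_telemetry
# (urllib.request with method="PUT" is enough; no client library required).
```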



&lt;h3&gt;
  
  
  What This Reveals
&lt;/h3&gt;

&lt;p&gt;With proper dashboarding, the metrics answer practical questions. When are you most productive? You can see which times of day correlate with longer, more focused sessions. Which projects consume the most time? Resource usage aggregated by working directory shows exactly where hours go. Do you context-switch too much? Tracking window switches per hour reveals patterns you might not consciously notice.&lt;/p&gt;

&lt;p&gt;The data also catches problems early. If session memory usage steadily climbs over time, you know something's leaking. If you're consistently running 8+ parallel sessions, maybe your workflow needs simplification.&lt;/p&gt;

&lt;p&gt;Prometheus makes it easy to query historical patterns and correlate them with specific projects or time periods. The metrics themselves don't make you productive, but they reveal patterns that inform better workflow decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Visual indicators eliminate the "which window?" hunt&lt;/li&gt;
&lt;li&gt;Complete session history is invaluable for debugging&lt;/li&gt;
&lt;li&gt;Metrics reveal workflow patterns you don't consciously notice&lt;/li&gt;
&lt;li&gt;Record complex sessions, not everything&lt;/li&gt;
&lt;li&gt;Automation is essential; manual logging fails&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Are LLMs Making Us More Productive?
&lt;/h2&gt;

&lt;p&gt;The tools in this article—tmux integration, session recording, telemetry—exist because I'm managing 10 parallel LLM coding sessions. But that raises the obvious question: are LLMs actually making me more productive at writing code?&lt;/p&gt;

&lt;p&gt;Not automatically, and not for everyone. Handing an LLM to a developer without workflow engineering is like giving someone a race car without teaching them to drive. They might go faster on the straightaways, but they'll crash on the first turn.&lt;/p&gt;

&lt;p&gt;But if you know how to use them—if you build the right workflows, enforce quality with tests, coordinate multiple sessions, and maintain proper oversight—they're a game changer. The productivity gains are real, but they're not automatic. They come from deliberate workflow design.&lt;/p&gt;

&lt;p&gt;The ergonomic layer in this article is what makes those gains possible. Without visibility into session state, without audit trails, without metrics to understand patterns, you're flying blind. The tools don't make LLMs productive—they make &lt;em&gt;you&lt;/em&gt; productive when using LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The ergonomics layer makes individual sessions manageable. But coordinating multiple LLM sessions to work together without conflicts requires higher-level abstractions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part3-abstractions/"&gt;Part 3: Higher-Level Abstractions&lt;/a&gt;&lt;/strong&gt; covers shared context systems for long-term memory, the smoke test paradigm for quality, and patterns for running a "team" of LLM instances on a single project.&lt;/p&gt;




</description>
      <category>productivity</category>
      <category>monitoring</category>
      <category>llm</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Building with LLMs at Scale: Part 1 - The Pain Points</title>
      <dc:creator>Laurent Charignon</dc:creator>
      <pubDate>Fri, 17 Oct 2025 02:49:51 +0000</pubDate>
      <link>https://forem.com/laurent_charignon/building-with-llms-at-scale-part-1-the-pain-points-3c0o</link>
      <guid>https://forem.com/laurent_charignon/building-with-llms-at-scale-part-1-the-pain-points-3c0o</guid>
      <description>&lt;p&gt;Working with 10 parallel LLM coding sessions exposes problems that don't appear at smaller scale. Managing multiple conversations, maintaining context across sessions, and ensuring quality all require different approaches than single-session work.&lt;/p&gt;

&lt;p&gt;This series documents those problems and the solutions that emerged. The tools shown use Claude Code and Emacs, but the patterns apply broadly to any LLM workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pain Points
&lt;/h2&gt;

&lt;p&gt;The problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Managing Multiple Conversations&lt;/strong&gt; - 10 terminal windows, no visibility into which sessions need attention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lost Context&lt;/strong&gt; - No audit trail of past sessions or decisions made&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality &amp;amp; Regressions&lt;/strong&gt; - LLMs fix one thing, break another&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language-Specific Edit Challenges&lt;/strong&gt; - Parenthesis balance issues in Lisp&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project Exploration Speed&lt;/strong&gt; - 10+ minutes to load a 20-file project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Switching Between Sessions&lt;/strong&gt; - No shared knowledge between parallel sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review Without Full IDE Context&lt;/strong&gt; - Reviewing diffs without syntax highlighting and jump-to-def&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Long-Term Memory&lt;/strong&gt; - Every session starts from scratch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallelization Challenge&lt;/strong&gt; - Coordinating multiple LLMs working simultaneously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety and Access Control&lt;/strong&gt; - Too easy to grant access to private data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's dive into each of these.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 1: Managing Multiple Conversations
&lt;/h2&gt;

&lt;p&gt;Picture this: 10 terminal windows, each running a different LLM session. One is refactoring your note system, another is debugging a home automation script, a third is implementing a new feature. Zero visibility into which needs your attention.&lt;/p&gt;

&lt;p&gt;The problem becomes clear when context switching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which session is waiting for input?&lt;/li&gt;
&lt;li&gt;Which is still processing?&lt;/li&gt;
&lt;li&gt;Which finished 10 minutes ago and has been idle?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without state tracking across sessions, every context switch means manually checking each window. You switch to a session only to find the LLM finished 10 minutes ago while you were focused elsewhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 2: Lost Context
&lt;/h2&gt;

&lt;p&gt;Open a project you worked on last week with an LLM. The code looks unfamiliar. You don't remember writing it. Questions arise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What was the original prompt?&lt;/li&gt;
&lt;li&gt;Did I review this properly?&lt;/li&gt;
&lt;li&gt;What architectural decisions were made?&lt;/li&gt;
&lt;li&gt;Why this approach instead of alternatives?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without an audit trail of past sessions, there's no way to reconstruct the reasoning behind the code. You're essentially trusting that past-you made good decisions—but you have no record of what those decisions were.&lt;/p&gt;

&lt;p&gt;Automatic context compaction makes this worse. LLMs will drop older messages to fit within token limits, but I want explicit control over what gets retained from session to session, not an algorithm deciding what's "important."&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 3: Quality and Regressions
&lt;/h2&gt;

&lt;p&gt;Whack-a-mole development: LLMs fix one issue and silently break another. The problem wasn't the LLM's capabilities—it was my process. I was treating LLM sessions like conversations with a developer I trusted to test their own code.&lt;/p&gt;

&lt;p&gt;The first solution: treat every change like a pull request. Tests must pass.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# After every LLM change&lt;/span&gt;
make &lt;span class="nb"&gt;test&lt;/span&gt;  &lt;span class="c"&gt;# Must pass before continuing&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches regressions but doesn't solve architectural consistency. Code generated across dozens of separate sessions felt scattered, as if designed by a committee whose members never talked to each other.&lt;/p&gt;

&lt;p&gt;The second solution: persona-based prompts. Instead of "Refactor this code":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are Robert C. Martin (Uncle Bob). Review this code and refactor
it according to clean code principles.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference was striking. Suddenly: smaller functions, better separation of concerns, consistent naming conventions across the codebase.&lt;/p&gt;

&lt;p&gt;You can use different personas for different needs. Want paranoid security review? "You are a security-minded, paranoid QA engineer who trusts nothing." Need simplicity? "You are obsessed with reducing complexity and eliminating unnecessary abstractions." The persona focuses the LLM's attention on specific concerns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 4: Language-Specific Edit Challenges
&lt;/h2&gt;

&lt;p&gt;Lisp-based languages (Elisp, Clojure, Scheme) are harder for LLMs to edit because of parenthesis balance.&lt;/p&gt;

&lt;p&gt;The problem: Remove one closing paren and get "end-of-file during parsing" with no location. The error could be 200 lines away from the actual edit.&lt;/p&gt;

&lt;p&gt;The feedback loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;LLM edits code&lt;/li&gt;
&lt;li&gt;Compile fails&lt;/li&gt;
&lt;li&gt;Hunt for unbalanced paren manually&lt;/li&gt;
&lt;li&gt;Fix and retry&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This affects any language with nested structure spanning many lines: deeply nested JSON, XML, etc.&lt;/p&gt;

&lt;p&gt;The solution: validation tooling that gives precise error locations. Without that, you're debugging blind.&lt;/p&gt;
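&lt;p&gt;The core of such a validator is a stack that remembers &lt;em&gt;where&lt;/em&gt; each bracket was opened, so an unmatched paren can be reported at its exact line and column instead of "end-of-file during parsing". A minimal sketch (it deliberately ignores strings and comments, which a real Lisp-aware checker would have to handle):&lt;/p&gt;

```python
def check_balance(source):
    """Return (line, col, message) for the first imbalance, or None if balanced."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []  # (opening char, line, col)
    line, col = 1, 0
    for ch in source:
        col += 1
        if ch == "\n":
            line, col = line + 1, 0
        elif ch in "([{":
            stack.append((ch, line, col))
        elif ch in pairs:
            if not stack or stack[-1][0] != pairs[ch]:
                return (line, col, f"unmatched '{ch}'")
            stack.pop()
    if stack:
        ch, l, c = stack[-1]
        return (l, c, f"'{ch}' never closed")
    return None
```

&lt;p&gt;Feeding the checker's output back to the LLM turns "hunt for the paren manually" into a one-shot fix.&lt;/p&gt;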

&lt;h2&gt;
  
  
  Problem 5: Project Exploration Speed
&lt;/h2&gt;

&lt;p&gt;New codebase? Get ready to spend 10+ minutes on initial exploration. A 20-file project means feeding files one by one to the LLM, waiting for API calls, managing context windows.&lt;/p&gt;

&lt;p&gt;This creates a cold-start problem. Every new project, and every project switch, means a lengthy ramp-up period before the LLM has enough context to be productive.&lt;/p&gt;

&lt;p&gt;The solution: a way to efficiently snapshot and load project context—not just individual files, but the structure, key patterns, and architectural decisions all at once.&lt;/p&gt;
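&lt;p&gt;A rough sketch of what such a snapshot tool can look like: walk the project, list the structure, then concatenate the interesting files under clear headers so the whole thing can be fed to the LLM in one shot. The extension list and per-file size cap below are arbitrary choices:&lt;/p&gt;

```python
import os

def snapshot_project(root, extensions=(".py", ".el", ".md"), max_bytes=100_000):
    """Concatenate a project's files into one LLM-ready context document."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        # skip hidden directories like .git
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for fn in sorted(filenames):
            if fn.endswith(extensions):
                paths.append(os.path.join(dirpath, fn))
    # structure first, then contents under per-file headers
    parts = ["# Project structure\n"]
    parts += [f"- {os.path.relpath(p, root)}\n" for p in paths]
    for p in paths:
        rel = os.path.relpath(p, root)
        with open(p, encoding="utf-8", errors="replace") as f:
            parts.append(f"\n## {rel}\n{f.read(max_bytes)}\n")
    return "".join(parts)
```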

&lt;h2&gt;
  
  
  Problem 6: Context Switching Between Sessions
&lt;/h2&gt;

&lt;p&gt;I'd discover a great pattern in session A. Session B, working on a related problem, had no idea it existed.&lt;/p&gt;

&lt;p&gt;Each LLM conversation was an island. Problems with this isolation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can't share knowledge between sessions&lt;/li&gt;
&lt;li&gt;Contradictory decisions across different LLM instances&lt;/li&gt;
&lt;li&gt;Manual copy-paste required to propagate learnings&lt;/li&gt;
&lt;li&gt;If I made an architectural decision in conversation A, conversation B would make a different one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The solution: a shared context system where different LLM sessions can coordinate and learn from each other.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 7: Review Without Full IDE Context
&lt;/h2&gt;

&lt;p&gt;Code review without your IDE is code review on hard mode.&lt;/p&gt;

&lt;p&gt;The LLM generates a diff. You're looking at it in a terminal or web interface. You're missing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Syntax highlighting&lt;/li&gt;
&lt;li&gt;Jump-to-definition&lt;/li&gt;
&lt;li&gt;Project-wide search&lt;/li&gt;
&lt;li&gt;Static analysis&lt;/li&gt;
&lt;li&gt;Your configured linters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: The LLM renames &lt;code&gt;process()&lt;/code&gt; to &lt;code&gt;process_data()&lt;/code&gt;. Questions you can't answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What calls this function?&lt;/li&gt;
&lt;li&gt;Is this part of a larger refactoring?&lt;/li&gt;
&lt;li&gt;Did it affect other functions that depend on it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools like Cursor solve this with deep editor integration—the LLM changes happen natively in your IDE. But if you're using terminal-based LLM tools or trying to integrate with Emacs/Vim, you need a workflow to bring LLM-generated changes into your full development environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 8: No Long-Term Memory
&lt;/h2&gt;

&lt;p&gt;Sessions had amnesia. Yesterday's architectural decisions? Gone. Last week's patterns? Forgotten.&lt;/p&gt;

&lt;p&gt;Sure, I had a global CLAUDE.md file with preferences, but that was static. I couldn't easily capture evolving patterns like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"When working on MCP servers, always check the umcp wrapper patterns"&lt;/li&gt;
&lt;li&gt;"The smoke test paradigm works better than unit tests for these projects"&lt;/li&gt;
&lt;li&gt;"Remember that the memento CLI should never be called directly—use MCP"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These insights lived in my head, not in a form the LLM could access and build upon. Each new session started from zero, unable to leverage the accumulated knowledge from previous sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 9: Parallelization Challenge
&lt;/h2&gt;

&lt;p&gt;I wanted parallel LLM sessions building different parts of the same project. Chaos ensued.&lt;/p&gt;

&lt;p&gt;The ideal workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session A: implements a feature&lt;/li&gt;
&lt;li&gt;Session B: writes tests for that feature&lt;/li&gt;
&lt;li&gt;Session C: updates documentation&lt;/li&gt;
&lt;li&gt;Session D: reviews the changes from A, B, and C&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But coordinating multiple LLM sessions is harder than coordinating humans. Problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sessions can't see each other's progress&lt;/li&gt;
&lt;li&gt;No natural communication channel between sessions&lt;/li&gt;
&lt;li&gt;They'll happily work on the same file and create conflicts&lt;/li&gt;
&lt;li&gt;No way to express dependencies (Session B needs Session A to finish first)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The solution: orchestration patterns to divide tasks, prevent conflicts, and merge results without manual intervention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 10: Safety and Access Control
&lt;/h2&gt;

&lt;p&gt;When you're in flow, you say 'yes' to everything. That's how the LLM reads your private notes.&lt;/p&gt;

&lt;p&gt;Claude Code prompts have become like cookie consent banners or Terms of Service pages. You've seen the prompt 50 times today. "Do you want to let Claude read this file?" Yes. "Run this command?" Yes. "Search this directory?" Yes. Decision fatigue sets in. You stop reading carefully. You just click yes to make the prompt go away and get back to work.&lt;/p&gt;

&lt;p&gt;This is exactly how website designers exploit users with cookie banners—they know after the 10th website, you'll just click "Accept All" without reading. The same psychological pattern applies to LLM tool use.&lt;/p&gt;

&lt;p&gt;I discovered a serious problem when building my note management system. Despite explicit prompts telling the LLM "do NOT access private notes," I'd occasionally review logs and find it had read private files anyway. This wasn't malicious—the LLM was trying to be helpful, pattern-matched similar file paths, and I'd reflexively approved the request without carefully reading which specific file it wanted.&lt;/p&gt;

&lt;p&gt;Risk areas where this becomes dangerous:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Personal notes or journals&lt;/li&gt;
&lt;li&gt;Configuration files with API keys or tokens&lt;/li&gt;
&lt;li&gt;Any sensitive data mixed with development work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fundamental tension:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speed vs Safety&lt;/strong&gt;: Careful review of every action slows you down&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context vs Control&lt;/strong&gt;: The LLM needs broad context to be useful, but that increases risk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation vs Oversight&lt;/strong&gt;: You want automated workflows, but automation can bypass safety checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real solution isn't better logging—it's making the wrong thing impossible by design. Don't rely on prompts or careful review. Build systems where sensitive data simply can't be accessed.&lt;/p&gt;

&lt;p&gt;For my note system, I mark notes as PUBLIC in org-mode by setting a property. Only PUBLIC notes are accessible to the LLM via MCP. The system enforces this at the API level—no amount of prompt engineering or reflexive approval can expose private notes.&lt;/p&gt;

&lt;p&gt;But this pattern doesn't scale well to code. You can't mark every file in a codebase as PUBLIC or PRIVATE.&lt;/p&gt;

&lt;p&gt;A more scalable approach: leverage Unix file permissions. Make LLM tools run as a specific user or group with restricted permissions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Private files: &lt;code&gt;chmod 600&lt;/code&gt; (owner-only)&lt;/li&gt;
&lt;li&gt;Public files: &lt;code&gt;chmod 644&lt;/code&gt; (world-readable)&lt;/li&gt;
&lt;li&gt;LLM runs as different user/group: physically cannot read private files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This enforces access control at the OS level. The LLM tool literally can't open the file, regardless of prompts or approval. On Linux you could even add &lt;code&gt;chattr +i&lt;/code&gt; to sensitive files, though note that immutability only prevents modification and deletion, not reading.&lt;/p&gt;
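&lt;p&gt;On a POSIX system, a tool can check the same permission bits before even attempting a read. A sketch, assuming the LLM process runs as a user outside the file owner's group:&lt;/p&gt;

```python
import os
import stat

def readable_by_llm(path):
    """Would a process running as another user/group be able to read this?

    Approximates the scheme above: only world-readable files
    (e.g. chmod 644) are visible to the LLM user; chmod 600 files are not.
    """
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)
```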

&lt;p&gt;The challenge: this requires discipline in setting permissions and may conflict with normal development workflows. But it's the right direction—making violations impossible, not just logged.&lt;/p&gt;

&lt;p&gt;Other needed patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Directory-level access control (allow &lt;code&gt;~/projects/blog&lt;/code&gt;, block &lt;code&gt;~/.ssh&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Pattern-based restrictions (block &lt;code&gt;*.env&lt;/code&gt;, &lt;code&gt;*credentials*&lt;/code&gt;, &lt;code&gt;*secrets*&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;API-level enforcement that tools can't bypass&lt;/li&gt;
&lt;li&gt;Audit trails that make violations obvious&lt;/li&gt;
&lt;/ul&gt;
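&lt;p&gt;The first two of these can be sketched as a deny-by-default path filter; the allowed directory and patterns below are the examples from the list above, not a complete policy:&lt;/p&gt;

```python
import fnmatch
import os

# Illustrative policy: one allowed project directory, a few blocked patterns.
ALLOWED_DIRS = [os.path.expanduser("~/projects/blog")]
BLOCKED_PATTERNS = ["*.env", "*credentials*", "*secrets*", "*/.ssh/*"]

def access_allowed(path):
    """Deny by default: the path must sit under an allowed directory
    and match no blocked pattern."""
    path = os.path.abspath(os.path.expanduser(path))
    if any(fnmatch.fnmatch(path, pat) for pat in BLOCKED_PATTERNS):
        return False
    return any(path.startswith(d + os.sep) for d in ALLOWED_DIRS)
```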

&lt;p&gt;Until we solve this systematically, the onus is on us to be vigilant—and that's exhausting when you're trying to move fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solutions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ergonomics&lt;/strong&gt; (&lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part2-ergonomics"&gt;Part 2&lt;/a&gt;): Terminal integration showing LLM state, telemetry tracking all sessions, logging every command&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Abstractions&lt;/strong&gt; (&lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part3-abstractions"&gt;Part 3&lt;/a&gt;): Shared context between sessions, smoke test paradigm, coordinating parallel LLMs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiments&lt;/strong&gt; (&lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part4-experiments"&gt;Part 4&lt;/a&gt;): Project exploration tools, diff review workflows, lessons from failures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning&lt;/strong&gt; (&lt;a href="https://dev.to/post/2025-09-30-llm-workflow-part5-learning"&gt;Part 5&lt;/a&gt;): Flashcard generation, annotated code worksheets, spaced repetition&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The next articles show how each works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Part 2: Ergonomics Layer&lt;/strong&gt; &lt;em&gt;(coming soon)&lt;/em&gt; - Terminal integration for managing multiple LLM sessions, telemetry and logging infrastructure that makes everything auditable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 3: Higher-Level Abstractions&lt;/strong&gt; &lt;em&gt;(coming soon)&lt;/em&gt; - Shared context systems for long-term memory, smoke tests as the foundation of quality, patterns for coordinating multiple LLM sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 4: Works in Progress&lt;/strong&gt; &lt;em&gt;(coming soon)&lt;/em&gt; - Project exploration tools, diff review workflows, experiments that didn't quite pan out (but taught me valuable lessons).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 5: Learning &amp;amp; Knowledge&lt;/strong&gt; &lt;em&gt;(coming soon)&lt;/em&gt; - Using LLMs to generate flashcards, worksheets, and heavily annotated code for studying complex topics.&lt;/p&gt;




</description>
      <category>llm</category>
      <category>productivity</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Building a Smart Home with AppDaemon: From Baby Monitors to Language Learning</title>
      <dc:creator>Laurent Charignon</dc:creator>
      <pubDate>Fri, 17 Oct 2025 02:49:40 +0000</pubDate>
      <link>https://forem.com/laurent_charignon/building-a-smart-home-with-appdaemon-from-baby-monitors-to-language-learning-4jg6</link>
      <guid>https://forem.com/laurent_charignon/building-a-smart-home-with-appdaemon-from-baby-monitors-to-language-learning-4jg6</guid>
      <description>&lt;p&gt;Over the past year, I've been building a comprehensive home automation system using &lt;a href="https://appdaemon.readthedocs.io/" rel="noopener noreferrer"&gt;AppDaemon&lt;/a&gt;, a Python framework for Home Assistant automation. What started as simple light controls has evolved into a sophisticated ecosystem of 30+ interconnected applications managing everything from CPAP machines to interactive language learning games for my child.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AppDaemon?
&lt;/h2&gt;

&lt;p&gt;Home Assistant's built-in automations are great for simple rules, but when you need complex logic, state management, or want to reuse code across multiple automations, AppDaemon shines. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full Python programming capabilities&lt;/li&gt;
&lt;li&gt;Object-oriented design with inheritance&lt;/li&gt;
&lt;li&gt;Async/await support for concurrent operations&lt;/li&gt;
&lt;li&gt;Direct access to Home Assistant's state and service APIs&lt;/li&gt;
&lt;li&gt;Easy testing with pytest&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Architecture: Base Class Pattern
&lt;/h2&gt;

&lt;p&gt;At the core of my system is a &lt;code&gt;BaseApp&lt;/code&gt; class that all automations inherit from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BaseApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hass&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hass&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;ArgClass&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;global_vars&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;global_vars&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ArgClass&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;statsd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;statsd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StatsClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;statsd.lan&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9125&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instance_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filtered_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fields_&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ArgClass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filtered_args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern provides several benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Type Safety&lt;/strong&gt;: Using dataclasses for arguments ensures type checking and validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics Collection&lt;/strong&gt;: Built-in StatsD integration for monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent Logging&lt;/strong&gt;: Standardized logging across all apps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Reuse&lt;/strong&gt;: Common functionality available to all automations&lt;/li&gt;
&lt;/ol&gt;
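&lt;p&gt;Concretely, a subclass declares its expected configuration as a dataclass and lets &lt;code&gt;BaseApp&lt;/code&gt; filter and validate the raw &lt;code&gt;apps.yaml&lt;/code&gt; arguments. A sketch with invented names (each real app defines its own &lt;code&gt;ArgClass&lt;/code&gt;):&lt;/p&gt;

```python
import dataclasses
from dataclasses import dataclass

@dataclass
class MotionLightArgs:
    motion_sensor: str      # entity that triggers the light
    light_entity: str       # light to control
    off_delay_s: int = 120  # how long to wait before switching off

# apps.yaml hands every key to the app, including AppDaemon's own
# (module, class, ...); filtering down to the dataclass fields means
# a missing required key fails loudly at startup:
raw_args = {"module": "motion", "class": "MotionLight",
            "motion_sensor": "binary_sensor.hallway",
            "light_entity": "light.hallway"}
fields_ = {f.name for f in dataclasses.fields(MotionLightArgs)}
args = MotionLightArgs(**{k: v for k, v in raw_args.items() if k in fields_})
```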

&lt;h2&gt;
  
  
  Real-World Applications
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. CPAP Light Synchronization
&lt;/h3&gt;

&lt;p&gt;One of the more unique automations synchronizes bedroom lighting with my CPAP machine. When the CPAP starts (detected via power monitoring), the lights gradually dim to create ideal sleeping conditions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CPAPLightSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseApp&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpap_state_changed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpap_sensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cpap_state_changed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;old&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# CPAP is running
&lt;/span&gt;            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;turn_off&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;light.bedroom&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Multi-Language Learning Game
&lt;/h3&gt;

&lt;p&gt;Perhaps the most ambitious application is an interactive language learning system for my son Luca. It integrates a smart button (Hatch Baby Rest) with text-to-speech to teach vocabulary in multiple languages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LucaMultiLanguageLearningArgs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;hatch_button&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;media_player&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;dictionary_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;available_languages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;current_language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;french&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads JSON dictionaries for French, Spanish, Farsi, and other languages&lt;/li&gt;
&lt;li&gt;Tracks recently used words to avoid repetition&lt;/li&gt;
&lt;li&gt;Speaks word pairs (foreign word → English translation)&lt;/li&gt;
&lt;li&gt;Plays Spotify playlists during specific hours&lt;/li&gt;
&lt;li&gt;Handles Unicode characters gracefully for non-Latin scripts&lt;/li&gt;
&lt;/ul&gt;
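&lt;p&gt;The "avoid repetition" behavior can be sketched with a small recently-used window. The dictionary contents, class name, and window size below are illustrative, not taken from the actual app:&lt;/p&gt;

```python
import random

# Illustrative dictionary format: foreign word -> English translation.
french = {"chien": "dog", "chat": "cat", "pomme": "apple", "lait": "milk"}

class WordPicker:
    """Pick words at random, skipping the most recently used ones."""
    def __init__(self, words, history_size=2):
        self.words = words
        self.recent = []              # most recently spoken words
        self.history_size = history_size

    def next_word(self):
        candidates = [w for w in self.words if w not in self.recent]
        word = random.choice(candidates)
        self.recent.append(word)
        # Keep only the last N words in the no-repeat window
        self.recent = self.recent[-self.history_size:]
        return word, self.words[word]

picker = WordPicker(french)
word, translation = picker.next_word()
```

&lt;p&gt;Each press of the button would call &lt;code&gt;next_word()&lt;/code&gt; and hand the pair to text-to-speech, guaranteeing the same word never plays twice in a row.&lt;/p&gt;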

&lt;h3&gt;
  
  
  3. Motion-Activated Devices
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;MotionActivatedDevice&lt;/code&gt; class demonstrates the power of abstraction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MotionActivatedDevice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseApp&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;motion_detected&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;old&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;should_activate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;activate_device&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schedule_deactivation&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This single class powers multiple automations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bathroom lights that stay on longer during nighttime&lt;/li&gt;
&lt;li&gt;Office fans that run when occupied&lt;/li&gt;
&lt;li&gt;Display screens that wake on approach&lt;/li&gt;
&lt;/ul&gt;
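&lt;p&gt;The nighttime-aware timeout mentioned in the first bullet can be sketched as a pure policy function. The window boundaries and durations here are assumptions, not the app's actual values:&lt;/p&gt;

```python
from datetime import time

def is_night(now):
    """Night = 10pm to 7am. The 'or' handles the window wrapping midnight."""
    return now >= time(22, 0) or time(7, 0) > now

def deactivation_delay(now, day_seconds=120, night_seconds=600):
    """Bathroom-light policy from the article: stay on longer at night."""
    return night_seconds if is_night(now) else day_seconds
```

&lt;p&gt;Keeping the policy pure (no I/O, just a clock value in and a duration out) is what makes one &lt;code&gt;MotionActivatedDevice&lt;/code&gt; class reusable across lights, fans, and screens.&lt;/p&gt;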

&lt;h3&gt;
  
  
  4. Smart Environmental Control
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;CarbonDioxydeRegulator&lt;/code&gt; monitors CO₂ levels and automatically controls ventilation. It uses a hysteresis band (fan on above 1000 ppm, off only below 800 ppm) so readings hovering near a single threshold don't toggle the fan on and off repeatedly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;regulate_co2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;entity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;old&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;co2_level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;co2_level&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;turn_on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fan.office&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;co2_level&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;turn_off&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fan.office&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Testing Infrastructure
&lt;/h2&gt;

&lt;p&gt;Quality is ensured through comprehensive testing. The repository includes pytest-based tests that validate argument parsing, state changes, and service calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_args_types&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;apps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;module&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;importlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;import_module&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;apps.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;hasattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ArgClass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;__dataclass_fields__&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
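&lt;p&gt;Beyond argument checks, callback logic can be exercised without a running Home Assistant by mocking the service calls. This sketch is not from the repository; it replays the CO₂ regulator's hysteresis behavior against &lt;code&gt;AsyncMock&lt;/code&gt; stand-ins:&lt;/p&gt;

```python
import asyncio
from unittest.mock import AsyncMock

# Minimal stand-in for the CO2 regulator shown earlier.
class FakeRegulator:
    def __init__(self):
        self.turn_on = AsyncMock()
        self.turn_off = AsyncMock()

    async def regulate_co2(self, entity, attribute, old, new, *_):
        co2 = float(new)
        if co2 > 1000:
            await self.turn_on("fan.office")
        elif 800 > co2:
            await self.turn_off("fan.office")

app = FakeRegulator()
# Rising past the upper threshold should start the fan...
asyncio.run(app.regulate_co2("sensor.co2", "state", "900", "1200"))
app.turn_on.assert_awaited_once_with("fan.office")
# ...and only dropping below the lower threshold stops it.
asyncio.run(app.regulate_co2("sensor.co2", "state", "1200", "750"))
app.turn_off.assert_awaited_once_with("fan.office")
```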



&lt;h2&gt;
  
  
  Enhanced Git Hooks for Quality
&lt;/h2&gt;

&lt;p&gt;To maintain code quality, I've implemented a sophisticated pre-commit hook that goes beyond simple test execution. The hook provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual Progress Indicators&lt;/strong&gt;: Green checkmarks (✓) for passing checks, red crosses (✗) for failures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staged Test Results&lt;/strong&gt;: Shows exactly which tests passed or failed with formatted output&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Syntax Validation&lt;/strong&gt;: Checks all Python files for syntax errors before commit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debug Statement Detection&lt;/strong&gt;: Warns about leftover print statements or pdb imports&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detailed Error Reporting&lt;/strong&gt;: Highlights file paths and line numbers in test failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hook transforms the typical pytest output into a developer-friendly format that makes it immediately obvious what needs attention.&lt;/p&gt;
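&lt;p&gt;The syntax-validation step could be reproduced in a few lines of standard-library Python. The actual hook is a shell script, so treat this as a sketch of the idea rather than the hook itself:&lt;/p&gt;

```python
import os
import py_compile
import tempfile

def check_syntax(paths):
    """Try to byte-compile each file; return (ok, list of failures)."""
    errors = []
    for path in paths:
        try:
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as exc:
            errors.append((path, str(exc)))
    return (not errors), errors

# Demo on a deliberately broken file
bad = os.path.join(tempfile.mkdtemp(), "bad.py")
with open(bad, "w") as f:
    f.write("def broken(:\n    pass\n")
ok, errors = check_syntax([bad])
```

&lt;p&gt;Running this over the staged files catches syntax errors before pytest even starts, which is where most of the hook's speed comes from.&lt;/p&gt;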

&lt;h2&gt;
  
  
  Metrics and Observability
&lt;/h2&gt;

&lt;p&gt;Every automation reports metrics to a StatsD server, enabling monitoring of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automation execution frequency&lt;/li&gt;
&lt;li&gt;State change patterns&lt;/li&gt;
&lt;li&gt;Error rates and types&lt;/li&gt;
&lt;li&gt;Performance metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This data feeds into Grafana dashboards, providing insights into home patterns and automation effectiveness.&lt;/p&gt;
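&lt;p&gt;StatsD speaks a plain-text UDP protocol, so emitting a counter needs nothing beyond the standard library. The metric name and endpoint below are illustrative:&lt;/p&gt;

```python
import socket

def statsd_increment(metric, host="127.0.0.1", port=8125):
    """Fire-and-forget StatsD counter over UDP (plain-text line protocol)."""
    payload = f"{metric}:1|c".encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(payload, (host, port))
    sock.close()
    return payload

# Each automation could emit a counter on every trigger, e.g.:
packet = statsd_increment("appdaemon.cpap_light_sync.triggered")
```

&lt;p&gt;Because UDP is fire-and-forget, a down metrics server never blocks or breaks an automation.&lt;/p&gt;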

&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;p&gt;After a year of development, here are the key takeaways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with a solid foundation&lt;/strong&gt;: The BaseApp pattern has saved countless hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type hints are invaluable&lt;/strong&gt;: Dataclasses catch configuration errors early&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test everything&lt;/strong&gt;: Automated tests prevent regressions when updating complex automations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor everything&lt;/strong&gt;: Metrics reveal patterns you didn't know existed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep it maintainable&lt;/strong&gt;: Clear naming and documentation matter when returning to code months later&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Looking Forward
&lt;/h2&gt;

&lt;p&gt;The system continues to evolve. Upcoming projects include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voice-activated scene controls using local speech recognition&lt;/li&gt;
&lt;li&gt;Predictive heating/cooling based on weather forecasts and historical patterns&lt;/li&gt;
&lt;li&gt;Integration with security cameras for person detection&lt;/li&gt;
&lt;li&gt;Automated plant watering based on soil moisture sensors&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AppDaemon has transformed my home into a responsive environment that adapts to our family's needs. From ensuring optimal sleep conditions to helping my son learn new languages, these automations have become an invisible but essential part of daily life.&lt;/p&gt;

&lt;p&gt;The combination of Python's flexibility, Home Assistant's ecosystem, and AppDaemon's framework creates a powerful platform for home automation that goes far beyond simple "if this then that" rules.&lt;/p&gt;

&lt;p&gt;The full source code is available on &lt;a href="https://github.com/charignon/appdaemon" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, and I welcome contributions and questions from the community. Whether you're looking to automate a single room or your entire home, I hope these patterns and examples help you build something amazing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have questions about specific automations or want to share your own AppDaemon projects? Feel free to reach out or open an issue on the repository.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>iot</category>
      <category>automation</category>
      <category>python</category>
    </item>
    <item>
      <title>Prompt Tracker: Turn Your Coding Sessions into a Star Wars Opening Crawl</title>
      <dc:creator>Laurent Charignon</dc:creator>
      <pubDate>Fri, 17 Oct 2025 02:23:12 +0000</pubDate>
      <link>https://forem.com/laurent_charignon/prompt-tracker-turn-your-coding-sessions-into-a-star-wars-opening-crawl-3m8b</link>
      <guid>https://forem.com/laurent_charignon/prompt-tracker-turn-your-coding-sessions-into-a-star-wars-opening-crawl-3m8b</guid>
      <description>&lt;p&gt;What if your coding sessions looked like a Star Wars opening crawl? Prompts scrolling through time, color-coded by quality, with dramatic bubbles floating across a timeline like ships in hyperspace?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Tracker&lt;/strong&gt; transforms your Claude conversation history into an entertaining, interactive visualization—part productivity tool, part time-traveling dashboard, part cinematic experience. Watch your prompting habits unfold like an epic saga, rate your greatest hits, and learn from your mistakes—all while having way more fun than you should with a JSON log file.&lt;/p&gt;

&lt;p&gt;It's &lt;strong&gt;informative&lt;/strong&gt; (analytics on your AI usage), &lt;strong&gt;productive&lt;/strong&gt; (build a library of your best prompts), and genuinely &lt;strong&gt;entertaining&lt;/strong&gt; (watching your 3am debugging sessions visualized as a chaotic cluster of red bubbles is oddly satisfying).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Lost Conversations
&lt;/h2&gt;

&lt;p&gt;Claude Code stores every conversation in &lt;code&gt;~/.claude/history.jsonl&lt;/code&gt;. While this is great for persistence, it's terrible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Learning from patterns&lt;/strong&gt;: Which prompts led to the best outcomes?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finding past solutions&lt;/strong&gt;: "I remember asking about Docker networking last month..."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tracking productivity&lt;/strong&gt;: When do I have the most productive Claude sessions?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building a prompt library&lt;/strong&gt;: Which prompting techniques actually work for me?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your conversation history is a goldmine of data about how you work with AI. But without tools to analyze it, those insights remain buried.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Star Wars Effect: Your Coding Sessions as Cinema
&lt;/h2&gt;

&lt;p&gt;The core feature is an interactive D3.js timeline that transforms your workday into a cinematic experience. Picture the Star Wars opening crawl, but instead of "A long time ago in a galaxy far, far away," it's "A productive Tuesday in your terminal..."&lt;/p&gt;

&lt;p&gt;[Video: /videos/prompt-tracker-demo.mp4]&lt;/p&gt;

&lt;p&gt;Each bubble is a prompt floating through time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Size&lt;/strong&gt; = Prompt length (that desperate 500-word debugging plea? You'll see it)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Color&lt;/strong&gt; = Quality (green stars ⭐️, red disasters 💀, gray unknowns ❓)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Position&lt;/strong&gt; = When you asked it (2am cluster? We've all been there)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Swim lanes&lt;/strong&gt; = Projects (watch them flow like parallel storylines)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The fun part?&lt;/strong&gt; You can zoom into the chaos, pan across your day like a camera sweep, filter by project to see one saga at a time, and click any bubble to relive the moment—complete with a side panel where you can rate it from 1 to 5 stars.&lt;/p&gt;

&lt;p&gt;It's like watching a replay of your coding session, but with production value.&lt;/p&gt;

&lt;p&gt;Here's what it looks like in action:&lt;/p&gt;

&lt;p&gt;[Cast recording: /casts/prompt-tracker-cli-demo.cast]&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The CLI in action: checking stats, listing prompts, and generating timelines&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.laurentcharignon.com%2Fimages%2Fprompt-tracker-timeline.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.laurentcharignon.com%2Fimages%2Fprompt-tracker-timeline.png" alt="Web app opening with timeline visualization" width="800" height="400"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The interactive timeline showing a day's worth of conversations&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Prompt Tracker is a single Python script (no dependencies!) that syncs data from Claude's history file into a local SQLite database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Sync your Claude history&lt;/span&gt;
./prompt-tracker &lt;span class="nb"&gt;sync&lt;/span&gt;

&lt;span class="c"&gt;# Generate today's timeline (auto-opens in browser)&lt;/span&gt;
./prompt-tracker timeline 2025-10-02

&lt;span class="c"&gt;# Or start a web server for interactive browsing&lt;/span&gt;
./prompt-tracker serve &lt;span class="nt"&gt;--port&lt;/span&gt; 8080

&lt;span class="c"&gt;# Rate your favorite prompts&lt;/span&gt;
./prompt-tracker rate 42 5

&lt;span class="c"&gt;# Search for specific topics&lt;/span&gt;
./prompt-tracker list &lt;span class="nt"&gt;--search&lt;/span&gt; &lt;span class="s2"&gt;"docker"&lt;/span&gt;

&lt;span class="c"&gt;# Filter by project&lt;/span&gt;
./prompt-tracker list &lt;span class="nt"&gt;--project&lt;/span&gt; /Users/you/repos/myproject

&lt;span class="c"&gt;# Find your best work&lt;/span&gt;
./prompt-tracker list &lt;span class="nt"&gt;--min-rating&lt;/span&gt; 4

&lt;span class="c"&gt;# Export as JSON for custom analysis&lt;/span&gt;
./prompt-tracker list &lt;span class="nt"&gt;--format&lt;/span&gt; json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Web Interface Mode
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;serve&lt;/code&gt; command launches a local web server that turns prompt browsing into a fully interactive experience:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./prompt-tracker serve &lt;span class="nt"&gt;--port&lt;/span&gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens an interface where you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Navigate between different dates with URL routing (&lt;code&gt;/timeline/2025-10-02&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Rate prompts directly in the browser with AJAX updates&lt;/li&gt;
&lt;li&gt;Filter and explore without regenerating static HTML files&lt;/li&gt;
&lt;li&gt;Keep the server running while you work, refreshing to see new prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as the difference between watching a movie on DVD (static timelines) and streaming it (live web interface).&lt;/p&gt;
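&lt;p&gt;Since the tool has no dependencies, the URL routing behind &lt;code&gt;serve&lt;/code&gt; presumably sits on Python's built-in &lt;code&gt;http.server&lt;/code&gt;. Here is a sketch of the route-dispatch idea; the paths and handler names are assumptions, not the tool's actual routes:&lt;/p&gt;

```python
def route(path):
    """Map a request path to a (handler_name, argument) pair.
    Mirrors URL routing like /timeline/2025-10-02 described above."""
    if path.startswith("/timeline/"):
        return ("timeline", path.rsplit("/", 1)[-1])
    if path == "/":
        return ("index", None)
    return ("not_found", None)
```

&lt;p&gt;A &lt;code&gt;BaseHTTPRequestHandler&lt;/code&gt; subclass would call this from &lt;code&gt;do_GET&lt;/code&gt; and render the matching page; the AJAX rating endpoint would be another branch.&lt;/p&gt;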

&lt;h3&gt;
  
  
  The Database Schema
&lt;/h3&gt;

&lt;p&gt;At its core, Prompt Tracker maintains two simple tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Prompts table - imported from history.jsonl
&lt;/span&gt;&lt;span class="n"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;TABLE&lt;/span&gt; &lt;span class="nf"&gt;prompts &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="n"&gt;INTEGER&lt;/span&gt; &lt;span class="n"&gt;PRIMARY&lt;/span&gt; &lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="n"&gt;INTEGER&lt;/span&gt; &lt;span class="n"&gt;UNIQUE&lt;/span&gt; &lt;span class="n"&gt;NOT&lt;/span&gt; &lt;span class="n"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;display&lt;/span&gt; &lt;span class="n"&gt;TEXT&lt;/span&gt; &lt;span class="n"&gt;NOT&lt;/span&gt; &lt;span class="n"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="n"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;DATETIME&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Ratings and notes - your annotations
&lt;/span&gt;&lt;span class="n"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;TABLE&lt;/span&gt; &lt;span class="nf"&gt;prompt_metadata &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt_id&lt;/span&gt; &lt;span class="n"&gt;INTEGER&lt;/span&gt; &lt;span class="n"&gt;PRIMARY&lt;/span&gt; &lt;span class="n"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="n"&gt;INTEGER&lt;/span&gt; &lt;span class="nc"&gt;CHECK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;note&lt;/span&gt; &lt;span class="n"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="n"&gt;DATETIME&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separation means your annotations never touch the original history file—all ratings and notes stay local.&lt;/p&gt;
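&lt;p&gt;A sketch of that separation with the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module (using &lt;code&gt;BETWEEN&lt;/code&gt; for the rating check; the sample row values are invented):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prompts (
    id INTEGER PRIMARY KEY,
    timestamp INTEGER UNIQUE NOT NULL,
    display TEXT NOT NULL,
    project TEXT,
    created_at DATETIME
);
CREATE TABLE prompt_metadata (
    prompt_id INTEGER PRIMARY KEY,
    rating INTEGER CHECK(rating BETWEEN 1 AND 5),
    note TEXT,
    updated_at DATETIME
);
""")

conn.execute(
    "INSERT INTO prompts (timestamp, display, project) VALUES (?, ?, ?)",
    (1760671490, "Fix the Docker network bridge", "/repos/myproject"),
)
# Ratings live in their own table, so imported history rows are never touched.
conn.execute(
    "INSERT OR REPLACE INTO prompt_metadata (prompt_id, rating, note) "
    "VALUES (?, ?, ?)",
    (1, 5, "Great context in this one"),
)
rating = conn.execute(
    "SELECT rating FROM prompt_metadata WHERE prompt_id = 1"
).fetchone()[0]
```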

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Building a Prompt Library
&lt;/h3&gt;

&lt;p&gt;After rating hundreds of prompts, patterns emerge. I noticed my 5-star prompts often share characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Specific context&lt;/strong&gt;: "In this React app using TypeScript and Vite..."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear success criteria&lt;/strong&gt;: "The function should handle these three edge cases..."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example-driven&lt;/strong&gt;: "Like this pattern we used in auth.ts..."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now when I face a new problem, I search my highest-rated prompts for similar situations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;prompt-tracker list &lt;span class="nt"&gt;--min-rating&lt;/span&gt; 4 &lt;span class="nt"&gt;--search&lt;/span&gt; &lt;span class="s2"&gt;"typescript error handling"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reveals my own best practices—not generic advice, but patterns that have worked specifically for my projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Debugging Prompt Engineering
&lt;/h3&gt;

&lt;p&gt;Sometimes a prompt doesn't work. Was it too vague? Too specific? Missing context?&lt;/p&gt;

&lt;p&gt;The timeline visualization makes this obvious. I can see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clusters of low-rated prompts → I was struggling with something&lt;/li&gt;
&lt;li&gt;Long gaps between prompts → Complex task or Claude got stuck&lt;/li&gt;
&lt;li&gt;Rapid-fire sequences → Iterative refinement working well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Color-coding by rating reveals temporal patterns—are my morning prompts more effective than late-night ones?&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Tracking Productivity Patterns
&lt;/h3&gt;

&lt;p&gt;The 24-hour timeline shows when I'm most productive with Claude:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Morning (9-11am)&lt;/strong&gt;: Complex architecture discussions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Afternoon (2-4pm)&lt;/strong&gt;: Bug fixes and refactoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evening (7-9pm)&lt;/strong&gt;: Experimental projects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This data-driven insight helps me schedule challenging work during high-effectiveness windows.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Knowledge Transfer to Memento
&lt;/h3&gt;

&lt;p&gt;When I find a particularly valuable prompt, I can publish it to my Memento knowledge base:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;prompt-tracker publish 42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This extracts the prompt and creates a Memento note with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The original prompt text&lt;/li&gt;
&lt;li&gt;Any ratings and notes I've added&lt;/li&gt;
&lt;li&gt;Timestamp and project context&lt;/li&gt;
&lt;li&gt;Tags for searchability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These notes become part of my long-term LLM memory, available in future Claude sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Prompt Tracker embraces simplicity:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single-file executable&lt;/strong&gt;: The entire CLI is one Python script. No installation, no dependency hell—just &lt;code&gt;chmod +x&lt;/code&gt; and run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero dependencies for core features&lt;/strong&gt;: SQLite is built into Python. JSON parsing is standard library. The CLI works anywhere Python 3.7+ exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Progressive enhancement&lt;/strong&gt;: The timeline visualization loads D3.js from CDN. If you're offline, you still have full CLI functionality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local-first&lt;/strong&gt;: Everything stays on your machine. No cloud services, no analytics, no data leaving your computer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Statistics Dashboard
&lt;/h2&gt;

&lt;p&gt;Beyond visualizations, Prompt Tracker provides statistical insights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;prompt-tracker stats

Total prompts: 1,247
Rated prompts: 89 &lt;span class="o"&gt;(&lt;/span&gt;7.1%&lt;span class="o"&gt;)&lt;/span&gt;
Average rating: 3.18

Rating distribution:
  ★☆☆☆☆    8  ████████
  ★★☆☆☆   14  ██████████████
  ★★★☆☆   31  ███████████████████████████████
  ★★★★☆   26  ██████████████████████████
  ★★★★★   10  ██████████

Top projects by prompt count:
  blog-automation: 156
  memento:        142
  appdaemon:      89
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reveals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How much you rely on Claude&lt;/li&gt;
&lt;li&gt;Whether you're consistently getting value (rating trends)&lt;/li&gt;
&lt;li&gt;Which projects generate the most questions&lt;/li&gt;
&lt;li&gt;Opportunities to improve (lots of 1-star prompts? time to refine your approach)&lt;/li&gt;
&lt;/ul&gt;
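&lt;p&gt;The star-bar rows in the output above are straightforward to render from (rating, count) pairs; a small sketch:&lt;/p&gt;

```python
def histogram(rows):
    """Render star-bar lines like the stats output from (rating, count) pairs."""
    lines = []
    for rating, count in rows:
        stars = "★" * rating + "☆" * (5 - rating)
        lines.append(f"{stars} {count:4d}  " + "█" * count)
    return lines

bars = histogram([(1, 8), (2, 14), (3, 31), (4, 26), (5, 10)])
```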

&lt;h2&gt;
  
  
  Advanced Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Full-Text Search
&lt;/h3&gt;

&lt;p&gt;SQLite's built-in text search makes finding past conversations instant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find that Docker networking conversation from last month&lt;/span&gt;
prompt-tracker list &lt;span class="nt"&gt;--search&lt;/span&gt; &lt;span class="s2"&gt;"docker network"&lt;/span&gt; &lt;span class="nt"&gt;--date-from&lt;/span&gt; 2025-09-01

&lt;span class="c"&gt;# All your highest-rated debugging sessions&lt;/span&gt;
prompt-tracker list &lt;span class="nt"&gt;--min-rating&lt;/span&gt; 4 &lt;span class="nt"&gt;--search&lt;/span&gt; &lt;span class="s2"&gt;"bug"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
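&lt;p&gt;Under the hood, a search like this can be a simple &lt;code&gt;LIKE&lt;/code&gt; query. Whether the tool uses &lt;code&gt;LIKE&lt;/code&gt; or SQLite's FTS extension isn't shown here, so treat this as a sketch with invented rows:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prompts (id INTEGER PRIMARY KEY, display TEXT)")
conn.executemany(
    "INSERT INTO prompts (display) VALUES (?)",
    [("Why is my docker network unreachable?",),
     ("Refactor the auth module",),
     ("Explain docker compose healthchecks",)],
)

def search(conn, term):
    """Case-insensitive substring match via LIKE (SQLite default for ASCII)."""
    pattern = f"%{term}%"
    return conn.execute(
        "SELECT id, display FROM prompts WHERE display LIKE ?", (pattern,)
    ).fetchall()

hits = search(conn, "docker")
```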



&lt;h3&gt;
  
  
  Notes and Context
&lt;/h3&gt;

&lt;p&gt;Sometimes the prompt alone doesn't capture why it worked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add context about what made this prompt effective&lt;/span&gt;
prompt-tracker note 42 &lt;span class="s2"&gt;"Great example of providing full error stack trace"&lt;/span&gt;

&lt;span class="c"&gt;# Later, review notes on all your top prompts&lt;/span&gt;
prompt-tracker list &lt;span class="nt"&gt;--min-rating&lt;/span&gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notes become a meta-layer of learning—documenting not just what you asked, but why it was effective.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project Filtering
&lt;/h3&gt;

&lt;p&gt;The timeline's project-based swim lanes let you focus on specific areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click a project name to filter the timeline&lt;/li&gt;
&lt;li&gt;See all conversations related to one codebase&lt;/li&gt;
&lt;li&gt;Compare prompt patterns across different projects&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Word Cloud Analysis
&lt;/h2&gt;

&lt;p&gt;Bonus: Prompt Tracker includes scripts for linguistic analysis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate word cloud from all your prompts&lt;/span&gt;
python create_wordcloud.py

&lt;span class="c"&gt;# Technical terms only (filters out common words)&lt;/span&gt;
python create_technical_wordcloud.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These visualizations reveal your vocabulary patterns—do you ask about "bugs" or "issues"? "functions" or "methods"? The language you use affects Claude's responses.&lt;/p&gt;
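&lt;p&gt;The heavy lifting behind a word cloud is just frequency counting; the "technical" variant additionally drops common words. A rough sketch of the idea (the stop-word list here is illustrative, not the one the scripts ship with):&lt;/p&gt;

```python
import re
from collections import Counter

# Hypothetical stop-word list; real implementations use a much longer one.
STOP_WORDS = {"the", "a", "an", "to", "of", "in", "and", "is", "it", "for", "how"}

def word_frequencies(prompts, technical_only=False):
    """Count word occurrences across prompts, optionally dropping stop words."""
    counts = Counter()
    for prompt in prompts:
        for word in re.findall(r"[a-z]+", prompt.lower()):
            if technical_only and word in STOP_WORDS:
                continue
            counts[word] += 1
    return counts
```
&lt;p&gt;Feed the resulting counts to any word-cloud renderer and the dominant terms jump out immediately.&lt;/p&gt;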

&lt;h2&gt;
  
  
  Implementation Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Efficient Syncing
&lt;/h3&gt;

&lt;p&gt;The sync operation uses SQLite's &lt;code&gt;INSERT OR IGNORE&lt;/code&gt; to handle the append-only history file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;history_path&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;new_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
                INSERT OR IGNORE INTO prompts (timestamp, display, project, pasted_contents)
                VALUES (?, ?, ?, ?)
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;display&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;project&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;...))&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rowcount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;new_count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;new_count&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;UNIQUE&lt;/code&gt; constraint on timestamp ensures no duplicates, and &lt;code&gt;INSERT OR IGNORE&lt;/code&gt; makes re-running sync idempotent.&lt;/p&gt;
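&lt;p&gt;The idempotency hinges on the table declaring &lt;code&gt;timestamp&lt;/code&gt; as &lt;code&gt;UNIQUE&lt;/code&gt;. A minimal schema along these lines, with a quick demonstration that re-inserting is a no-op (column types beyond the uniqueness constraint are assumptions):&lt;/p&gt;

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS prompts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL UNIQUE,
    display TEXT NOT NULL,
    project TEXT DEFAULT '',
    pasted_contents TEXT DEFAULT '',
    rating INTEGER,
    note TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Re-inserting the same timestamp is silently skipped, so sync stays idempotent.
for _ in range(2):
    conn.execute(
        "INSERT OR IGNORE INTO prompts (timestamp, display) VALUES (?, ?)",
        ("2025-10-17T03:14:50Z", "example prompt"),
    )
count = conn.execute("SELECT COUNT(*) FROM prompts").fetchone()[0]  # 1, not 2
```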

&lt;h3&gt;
  
  
  Interactive Timeline Generation
&lt;/h3&gt;

&lt;p&gt;The timeline uses D3.js with a custom time scale:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Map timestamps to 24-hour positions&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;xScale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;d3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scaleLinear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;// 24 hours in milliseconds&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;width&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="c1"&gt;// Color by rating&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;colorScale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;d3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scaleLinear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#ff4444&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#ffaa00&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#44ff44&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each prompt becomes an SVG circle with zoom/pan support and hover interactions.&lt;/p&gt;
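&lt;p&gt;The mapping is easy to verify outside the browser: a prompt's x position is just its milliseconds since midnight, scaled to the chart width. A Python sketch equivalent to the D3 scale above:&lt;/p&gt;

```python
from datetime import datetime

DAY_MS = 24 * 60 * 60 * 1000  # 24 hours in milliseconds

def x_position(timestamp, width):
    """Map an ISO timestamp to its horizontal position on a 24-hour axis."""
    t = datetime.fromisoformat(timestamp)
    ms_since_midnight = (t.hour * 3600 + t.minute * 60 + t.second) * 1000
    return ms_since_midnight / DAY_MS * width
```
&lt;p&gt;Noon lands exactly halfway across the chart, midnight at the left edge.&lt;/p&gt;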

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Installation takes 30 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone and make executable&lt;/span&gt;
git clone https://github.com/charignon/prompt_tracker.git
&lt;span class="nb"&gt;cd &lt;/span&gt;prompt_tracker
&lt;span class="nb"&gt;chmod&lt;/span&gt; +x prompt-tracker

&lt;span class="c"&gt;# Sync your history&lt;/span&gt;
./prompt-tracker &lt;span class="nb"&gt;sync&lt;/span&gt;

&lt;span class="c"&gt;# Generate today's timeline&lt;/span&gt;
./prompt-tracker timeline &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y-%m-%d&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The timeline opens automatically in your default browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integration with Existing Tools
&lt;/h2&gt;

&lt;p&gt;Prompt Tracker complements other productivity tools:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Memento&lt;/strong&gt;: Export valuable prompts as Memento notes for long-term memory&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Git&lt;/strong&gt;: The SQLite database can be version-controlled and synced across machines&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Analytics&lt;/strong&gt;: Export data as JSON for custom analysis in R, Python, etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Note-taking&lt;/strong&gt;: Copy high-rated prompts into documentation, READMEs, guides&lt;/p&gt;
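&lt;p&gt;The JSON export is a few lines once the data is in SQLite. A sketch (the table and column names are assumptions, matching the schema used earlier in this post):&lt;/p&gt;

```python
import json
import sqlite3

def export_json(conn, path):
    """Dump all prompts as a JSON array for analysis in R, Python, etc."""
    conn.row_factory = sqlite3.Row  # rows become dict-convertible
    rows = [dict(r) for r in conn.execute("SELECT * FROM prompts")]
    with open(path, "w") as f:
        json.dump(rows, f, indent=2)
    return len(rows)
```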

&lt;h2&gt;
  
  
  What I've Learned
&lt;/h2&gt;

&lt;p&gt;After tracking 1,200+ prompts over several months:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rating prompts immediately is crucial.&lt;/strong&gt; Wait a day and you forget what made it good or bad. Rate right after you see the response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The best prompts are specific.&lt;/strong&gt; In my history, 5-star prompts average three times the length of 1-star prompts. Context matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Patterns emerge unexpectedly.&lt;/strong&gt; I didn't realize how often I ask about "testing patterns" until the word cloud showed it as my #2 topic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual feedback changes behavior.&lt;/strong&gt; Seeing red clusters on the timeline makes me more thoughtful about next prompts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge compounds.&lt;/strong&gt; Exporting prompts to Memento creates a feedback loop—Claude gets better context, leading to better prompts, which get added to Memento...&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Directions
&lt;/h2&gt;

&lt;p&gt;Ideas I'm exploring:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-powered clustering&lt;/strong&gt;: Use embeddings to group semantically similar prompts, revealing topics beyond keyword matching&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt templates&lt;/strong&gt;: Extract patterns from 5-star prompts to generate templates for common tasks&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collaborative learning&lt;/strong&gt;: Anonymous sharing of highly-rated prompts to learn from the community&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conversation threading&lt;/strong&gt;: Link related prompts across sessions to show how solutions evolved&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Export to Claude&lt;/strong&gt;: Feed your best prompts back into Claude as examples of effective communication&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Who knew analyzing your prompting habits could be this entertaining?&lt;/p&gt;

&lt;p&gt;Prompt Tracker proves that productivity tools don't have to be boring. By turning your Claude conversation history into a cinematic timeline—complete with color-coded bubbles, zoom controls, and Star Wars-level drama—it makes self-reflection actually &lt;em&gt;fun&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;You get the productivity benefits (learning from patterns, building a prompt library, tracking what works), the analytical insights (which topics dominate, when you're most effective, how your style evolves), and the sheer entertainment value of watching your 3am debugging sessions play out like an action sequence.&lt;/p&gt;

&lt;p&gt;Every prompt is a tiny experiment. With Prompt Tracker, you don't just see which experiments succeed—you watch them unfold across time like chapters in an epic saga. And honestly? That's way more engaging than staring at a JSON file.&lt;/p&gt;

&lt;p&gt;So go ahead: turn your coding sessions into cinema. Rate your greatest hits. Learn from your disasters. And maybe, just maybe, have some fun while becoming a better prompt engineer.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Prompt Tracker is open source at &lt;a href="https://github.com/charignon/prompt_tracker" rel="noopener noreferrer"&gt;github.com/charignon/prompt_tracker&lt;/a&gt;. It's a single Python script with zero dependencies. MIT licensed—use freely and contribute improvements.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>productivity</category>
      <category>visualization</category>
      <category>claude</category>
    </item>
  </channel>
</rss>
