<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Norrie Taylor</title>
    <description>The latest articles on Forem by Norrie Taylor (@norrietaylor).</description>
    <link>https://forem.com/norrietaylor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860360%2Fe1af65ad-46b9-463f-8f78-4941d5a1be8a.jpeg</url>
      <title>Forem: Norrie Taylor</title>
      <link>https://forem.com/norrietaylor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/norrietaylor"/>
    <language>en</language>
    <item>
      <title>Full-Proof: Distillery 0.4.0 and the Agent Memory Problem</title>
      <dc:creator>Norrie Taylor</dc:creator>
      <pubDate>Mon, 20 Apr 2026 05:18:38 +0000</pubDate>
      <link>https://forem.com/norrietaylor/full-proof-distillery-040-and-the-agent-memory-problem-477b</link>
      <guid>https://forem.com/norrietaylor/full-proof-distillery-040-and-the-agent-memory-problem-477b</guid>
      <description>&lt;h1&gt;
  
  
  Full-Proof: Distillery 0.4.0 and the Agent Memory Problem
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Release summary:&lt;/strong&gt; Distillery 0.4.0 shipped April 19, 2026. It's the release where the MCP tool surface becomes a public contract: stable tool names, consistent error codes, predictable response shapes. The release body lives on the &lt;a href="https://github.com/norrietaylor/distillery/releases/tag/v0.4.0" rel="noopener noreferrer"&gt;GitHub release page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A few weeks ago I wrote &lt;a href="https://forem.com/norrietaylor/building-a-second-brain-for-claude-code-1a4g"&gt;the first post in this series&lt;/a&gt; about why I built Distillery. The short version: the knowledge your team generates while working with Claude Code is mostly evaporating, and the fix is to capture it where the work happens, not in a separate tool.&lt;/p&gt;

&lt;p&gt;That was the capture story. This post is about the memory story, and why I spent a release hardening the surface instead of shipping features.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Karpathy's LLM Wiki actually argues
&lt;/h2&gt;

&lt;p&gt;The shape of the agent-memory conversation changed in April. Karpathy posted &lt;a href="https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f" rel="noopener noreferrer"&gt;his "LLM Wiki" idea&lt;/a&gt; (&lt;a href="https://news.ycombinator.com/item?id=47640875" rel="noopener noreferrer"&gt;HN discussion&lt;/a&gt;), and within a week the pattern had been cloned into at least five projects: &lt;a href="https://www.reddit.com/r/ClaudeAI/comments/1sddc33/i_built_a_modelagnostic_knowledge_base_platform/" rel="noopener noreferrer"&gt;Knowledge Raven&lt;/a&gt;, &lt;a href="https://github.com/AyanbekDos/memoriki" rel="noopener noreferrer"&gt;Memoriki&lt;/a&gt;, &lt;a href="https://github.com/opentrace/opentrace" rel="noopener noreferrer"&gt;OpenTrace KG-MCP&lt;/a&gt;, &lt;a href="https://github.com/Alidmo/OptiVault" rel="noopener noreferrer"&gt;OptiVault&lt;/a&gt;, and a handful of Obsidian harness variants. The argument, simplified:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw sources are not useful on their own. Chat logs, PRs, tickets, and docs are too lossy to reason over directly.&lt;/li&gt;
&lt;li&gt;The intermediate layer is an LLM-maintained wiki that compounds synthesis over time. You don't re-derive context per query. You maintain a living artifact.&lt;/li&gt;
&lt;li&gt;A query layer sits above that, CLAUDE.md-style and schema-first, trimmed for whatever context budget the model in use gives you.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My own investigation of the memory research last week kept landing on a related but distinct conclusion. Three-tier layered memory (fast index on top, episodes in the middle, raw transcripts on demand) is now the default pattern in serious memory systems. &lt;a href="https://www.reddit.com/r/ClaudeAI/comments/1sc2zfm/the_right_way_to_build_memory_claude_is_doing_it/" rel="noopener noreferrer"&gt;LongMemEval-S at 73%&lt;/a&gt; is a reasonable community-stack baseline (&lt;a href="https://github.com/vektori-ai/vektori" rel="noopener noreferrer"&gt;Vektori&lt;/a&gt;, BGE-M3 plus Flash-2.5-lite), and &lt;a href="https://github.com/milla-jovovich/mempalace" rel="noopener noreferrer"&gt;MemPalace's ChromaDB-backed stack&lt;/a&gt; has since posted 96.6% on LongMemEval R@5 in raw mode, which is the kind of jump worth watching. The &lt;a href="https://flowpatrol.ai/blog/claude-code-two-memories" rel="noopener noreferrer"&gt;leaked Claude Code memory internals&lt;/a&gt;, three subsystems plus a "Dream" consolidation pass, are almost exactly this architecture. The failures in agent memory aren't mostly retrieval: they sit in the reasoning layer between retrieval and action, and in the moment a session ends and all the volatile context evaporates.&lt;/p&gt;

&lt;p&gt;Distillery is built around Karpathy's compile-and-query pattern, not the Vektori three-tier stratification. &lt;code&gt;/distill&lt;/code&gt; writes, &lt;code&gt;/recall&lt;/code&gt; queries, &lt;code&gt;/pour&lt;/code&gt; synthesizes. Under the hood it's a single DuckDB table with typed entries, hybrid BM25 plus vector search, and deduplication thresholds, not L0/L1/L2 access tiers. The LongMemEval benchmark and transcript-mining integration we're tracking in &lt;a href="https://github.com/norrietaylor/distillery/issues/233" rel="noopener noreferrer"&gt;issue #233&lt;/a&gt; is how we plan to measure ourselves against the three-tier systems on the same yardstick.&lt;/p&gt;

&lt;p&gt;The operational side of the wiki pattern is where the interesting work actually lives, though, and it's where contributor feedback shaped what 0.x shipped. Every entry carries provenance (author, session ID, source). Every entry can be corrected without losing its history. Entries can be marked expired or unreliable without being deleted (&lt;a href="https://github.com/norrietaylor/distillery/issues/177" rel="noopener noreferrer"&gt;issue #177&lt;/a&gt;). Those aren't decorative. They're the primitives that let a shared knowledge base admit it was wrong, which is what separates a living memory layer from a static dump.&lt;/p&gt;

&lt;h2&gt;
  
  
  The memory layer is load-bearing
&lt;/h2&gt;

&lt;p&gt;If you're building agents, the memory layer sits under everything else. Planners read from it. Tools write to it. Evals depend on it being deterministic across runs. When &lt;a href="https://claude.com/blog/claude-managed-agents" rel="noopener noreferrer"&gt;Claude Managed Agents&lt;/a&gt; launched in April with memory labeled "research preview," the entire community ecosystem (memsearch, Honcho, Hippo, Memoriki, thebrain, Knowledge Raven, Octopoda, MemPalace) rushed to fill the gap, because nobody ships serious agent work on top of a preview.&lt;/p&gt;

&lt;p&gt;The same argument applies one layer down. If the memory layer you build on has drifting tool names, inconsistent error codes, response envelopes that change shape between minor versions, and defaults that flood your context window without warning, every downstream agent inherits that instability. Planners inherit it. Evals inherit it. Shared team knowledge bases inherit it.&lt;/p&gt;

&lt;p&gt;That's what this release is about.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stability pledge
&lt;/h2&gt;

&lt;p&gt;From 0.4.0 onward, the MCP tool surface is a public contract. That covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool names&lt;/li&gt;
&lt;li&gt;Parameter shapes and defaults&lt;/li&gt;
&lt;li&gt;Error codes&lt;/li&gt;
&lt;li&gt;Response envelopes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Breaking changes require a major version bump. Evolution happens additively: new optional parameters, new tools, new output modes behind an explicit opt-in. Skills and plugins can declare &lt;code&gt;min_server_version&lt;/code&gt; with confidence that the surface they compiled against will still be there.&lt;/p&gt;
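&lt;p&gt;To make the versioning rule concrete, here's a minimal sketch (not Distillery's actual code) of how a plugin might gate on &lt;code&gt;min_server_version&lt;/code&gt;: accept the server only if it shares the declared major version and sits at or above the minimum.&lt;/p&gt;

```python
# Illustrative sketch only: how a plugin could enforce min_server_version
# under the stability pledge. Function names are ours, not Distillery's.

def parse_version(v: str) -> tuple[int, ...]:
    """Turn "0.4.0" into (0, 4, 0) for ordered comparison."""
    return tuple(int(part) for part in v.split("."))

def is_compatible(server_version: str, min_server_version: str) -> bool:
    """Additive evolution: same major version, at or above the minimum."""
    server = parse_version(server_version)
    minimum = parse_version(min_server_version)
    # A major bump signals breaking changes, so require the same major.
    return server[0] == minimum[0] and server >= minimum
```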

&lt;p&gt;This is a commitment, not a feature flag. If something has to break, it breaks on a major, with a deprecation window and loud warnings on the deprecated path first.&lt;/p&gt;

&lt;h2&gt;
  
  
  What shipped in 0.4.0
&lt;/h2&gt;

&lt;p&gt;Sixty-plus PRs landed under the &lt;code&gt;staging/api-hardening&lt;/code&gt; line. The highlights, by category:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API surface hardening.&lt;/strong&gt; &lt;code&gt;distillery_store&lt;/code&gt;'s &lt;code&gt;dedup_action&lt;/code&gt; now means what it says (#332). &lt;code&gt;"stored"&lt;/code&gt; means a new row was written. &lt;code&gt;"merged"&lt;/code&gt; and &lt;code&gt;"linked"&lt;/code&gt; are reserved for true folds, not informational similarity hints; the similarity signal lives on &lt;code&gt;existing_entry_id&lt;/code&gt; and &lt;code&gt;similarity&lt;/code&gt; where it belongs. &lt;code&gt;distillery_list&lt;/code&gt; defaults to &lt;code&gt;output_mode="summary"&lt;/code&gt; (#311), which shrinks a typical &lt;code&gt;limit=50&lt;/code&gt; gh-sync response from roughly 300 KB of content to a few kilobytes of titles, tags, and previews. Error codes consolidated on a single &lt;code&gt;ToolErrorCode&lt;/code&gt; enum across every tool. &lt;code&gt;resolve_review&lt;/code&gt; is idempotent for no-op transitions (#333). Canonical &lt;code&gt;entry_type&lt;/code&gt; values are suggested on &lt;code&gt;INVALID_PARAMS&lt;/code&gt;.&lt;/p&gt;
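&lt;p&gt;A caller-side sketch of the post-#332 semantics. The response dict shape here is an assumption for illustration; only the field names (&lt;code&gt;dedup_action&lt;/code&gt;, &lt;code&gt;existing_entry_id&lt;/code&gt;, &lt;code&gt;similarity&lt;/code&gt;) come from the release notes.&lt;/p&gt;

```python
# Hypothetical handler for a distillery_store response under the 0.4.0
# contract: "stored" means a new row; "merged"/"linked" are true folds.

def describe_store_result(response: dict) -> str:
    action = response["dedup_action"]
    if action == "stored":
        # A new row was written; similarity is only an informational hint.
        hint = response.get("existing_entry_id")
        if hint is not None:
            return f"stored (near {hint}, similarity {response['similarity']:.2f})"
        return "stored"
    if action in ("merged", "linked"):
        # Reserved for true folds into an existing entry.
        return f"{action} into {response['existing_entry_id']}"
    raise ValueError(f"unknown dedup_action: {action}")
```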

&lt;p&gt;&lt;strong&gt;Storage quality.&lt;/strong&gt; Aborted transactions roll back and surface query failures in &lt;code&gt;distillery_status&lt;/code&gt; instead of swallowing them (#363). WAL is flushed after writes and preserved on recovery, with signature matching (#346). FTS WAL replay no longer fails on cold start (#349). The "ghost entry ID" class of bug is gone. &lt;code&gt;storage_bytes&lt;/code&gt; scopes to the filtered set when filters are active, so usage numbers stop lying about what you actually searched.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Feeds.&lt;/strong&gt; &lt;code&gt;gh-sync&lt;/code&gt; now runs async via server-side background jobs (#348), so long syncs don't block the caller. Poll &lt;code&gt;distillery_sync_status&lt;/code&gt; for progress. Liveness fields are populated across poll and sync paths (#334), so &lt;code&gt;/watch&lt;/code&gt; reports accurate freshness. Crucially for anyone using ambient intelligence: feed entries are now excluded from the interest profile, which means &lt;code&gt;/radar&lt;/code&gt; no longer drifts toward whatever feed happens to be loudest that week.&lt;/p&gt;
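&lt;p&gt;The async sync flow above reduces to a poll loop on the caller's side. This is a sketch under assumptions: the status payload shape (&lt;code&gt;{"state": ...}&lt;/code&gt;) is invented for illustration; only the tool name &lt;code&gt;distillery_sync_status&lt;/code&gt; comes from the release.&lt;/p&gt;

```python
import time

# Illustrative polling loop for the async gh-sync flow. get_status stands
# in for a call to distillery_sync_status; the payload shape is assumed.

def wait_for_sync(get_status, interval_s: float = 2.0, timeout_s: float = 300.0) -> dict:
    """Poll until the background sync job finishes or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status["state"] in ("completed", "failed"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("gh-sync did not finish within the timeout")
```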

&lt;p&gt;&lt;strong&gt;Scheduling.&lt;/strong&gt; &lt;code&gt;/setup&lt;/code&gt; and &lt;code&gt;/watch&lt;/code&gt; now configure Claude Code routines (#272) instead of CronCreate jobs or GitHub Actions webhook scheduling. Three routines ship: hourly feed poll, daily stale check, weekly maintenance. The webhook endpoints (&lt;code&gt;/hooks/poll&lt;/code&gt;, &lt;code&gt;/hooks/rescore&lt;/code&gt;, &lt;code&gt;/hooks/classify-batch&lt;/code&gt;) are deprecated and log warnings when hit. &lt;code&gt;/api/maintenance&lt;/code&gt; is retained for orchestrated ops.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this unlocks
&lt;/h2&gt;

&lt;p&gt;Stability is boring on its own. What makes it worth writing about is everything it lets other people do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dashboards.&lt;/strong&gt; There's a SvelteKit dashboard in progress (the &lt;code&gt;dashboard/&lt;/code&gt; directory in the repo is the seed). With the MCP surface contracted, the dashboard can ship pinned against &lt;code&gt;min_server_version=0.4.0&lt;/code&gt;, and nothing downstream will break when internal implementations move.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Community plugins.&lt;/strong&gt; If you want to build on top of the Distillery MCP tools, you can pin against the 0.4.0 contract with the same confidence you'd pin against any public SDK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory-layer integrations.&lt;/strong&gt; LangChain orchestrators, Letta-style stateful-agent frameworks, and any MCP-native runtime can now treat Distillery as a durable backend instead of a moving target.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Stability is the prerequisite for anyone else building on top.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx distillery-mcp@0.4.0
&lt;span class="c"&gt;# or&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;distillery-mcp&lt;span class="o"&gt;==&lt;/span&gt;0.4.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hosted demo server at &lt;code&gt;https://distillery-mcp.fly.dev/mcp&lt;/code&gt; has been redeployed to match. If you're coming from an older version, nothing you wrote against the 0.3.x surface should break. If it does, that's now a bug against the pledge, not a design decision, and I want to hear about it.&lt;/p&gt;

&lt;p&gt;The full release notes are on the &lt;a href="https://github.com/norrietaylor/distillery/releases/tag/v0.4.0" rel="noopener noreferrer"&gt;GitHub release&lt;/a&gt;. The discussion thread lives in &lt;a href="https://github.com/norrietaylor/distillery/discussions" rel="noopener noreferrer"&gt;GitHub Discussions&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Karpathy's point about the LLM Wiki is that knowledge compounds when the layer under your agent is treated as infrastructure, not as something you rebuild every sprint. That's the model 0.4.0 commits to. Pour a full-proof one.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>mcp</category>
      <category>devtools</category>
      <category>release</category>
    </item>
    <item>
      <title>Building a Second Brain for Claude Code</title>
      <dc:creator>Norrie Taylor</dc:creator>
      <pubDate>Sat, 04 Apr 2026 03:51:12 +0000</pubDate>
      <link>https://forem.com/norrietaylor/building-a-second-brain-for-claude-code-1a4g</link>
      <guid>https://forem.com/norrietaylor/building-a-second-brain-for-claude-code-1a4g</guid>
      <description>&lt;h2&gt;
  
  
  Building a Second Brain for Claude Code
&lt;/h2&gt;

&lt;p&gt;Every team I've worked on has the same problem. Someone makes a decision — a good one, usually — with a lot of context behind it. Why we chose DuckDB over Postgres. Why we inverted that dependency. Why the authentication flow goes through a middleware layer instead of a decorator. And then, six months later, someone asks "why does this work this way?" and the answer is... gone. Buried in Slack. Lost in a PR description nobody remembers. Living only in the head of the person who wrote it, if they're still on the team.&lt;/p&gt;

&lt;p&gt;In the age of agentic development, this problem has only been exacerbated. The time it takes to code up an epic is no longer the long pole in the SDLC tent. Knowledge is being generated at an accelerating rate, and no one can keep up.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The problem isn't generating knowledge. It's retaining it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I started building Distillery to solve this problem for my own work with Claude Code. It turned into something more interesting than I expected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Knowledge Lives in Chat
&lt;/h2&gt;

&lt;p&gt;When you work with Claude Code all day, you're having real conversations. You're debugging, designing, deciding. Some of those conversations contain genuinely valuable knowledge — not just the code produced, but the &lt;em&gt;reasoning&lt;/em&gt; behind it. The context that makes the code make sense.&lt;/p&gt;

&lt;p&gt;But that context lives in your chat history. It's ephemeral by design. Next session, fresh context. Next week, you can't find it. Next month, a new team member joins and has no idea why things work the way they do.&lt;/p&gt;

&lt;p&gt;The standard answers — wikis, Confluence, Notion — all share the same failure mode: they require friction to populate. A wiki doesn't help if writing to it is slower than the pace of actual work. Documentation that requires a separate workflow doesn't get written. Or it does, once, and then it rots.&lt;/p&gt;

&lt;p&gt;The irony is that the knowledge gets &lt;em&gt;generated&lt;/em&gt; constantly. Every time you and Claude Code figure out an approach, that's value. Every time you decide not to do something and explain why, that's value. It just doesn't get captured.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Insight: Capture Where Work Happens
&lt;/h2&gt;

&lt;p&gt;The fix isn't a better wiki. The fix is making capture so frictionless that it happens at the moment of insight, without switching context.&lt;/p&gt;

&lt;p&gt;If you're already talking to Claude Code, the capture tool should be Claude Code. Not a sidebar. Not a separate app. A slash command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/distill "Decided to use DuckDB for local storage because it supports vector similarity search natively via the VSS extension. Postgres would require pgvector and a separate service. For local dev and small-team deployments, DuckDB's file-based model is significantly simpler. Revisit if we need multi-writer concurrency."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The decision, the reasoning, the trade-offs, captured in the moment, from inside the tool you're already using. And because it's semantic search, you can find it later in natural language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/recall why did we choose DuckDB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the core insight Distillery is built on: capture-at-source, inside the assistant.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Distillery Does
&lt;/h2&gt;

&lt;p&gt;Distillery is a knowledge base system for Claude Code. It stores, searches, and classifies knowledge entries using DuckDB with vector similarity search. It includes ambient intelligence that monitors GitHub repos and RSS feeds for relevant developments. It exposes 18 MCP tools and 10 Claude Code slash commands. Install it with &lt;code&gt;pip install distillery-mcp&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here are the commands:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/distill&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Capture knowledge from the current session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/recall&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Semantic search across all knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/pour&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Multi-pass synthesis — summarize across many entries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/bookmark&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Store URLs with auto-generated summaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/minutes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Capture meeting notes with append support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/classify&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Review and triage the review queue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/watch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Manage monitored GitHub repos and RSS feeds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/radar&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ambient digest of what's changed in your feeds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/tune&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Adjust relevance scoring thresholds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/setup&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Onboarding wizard for MCP connectivity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The deduplication system is one of the things I'm most happy with. When you try to store something, Distillery checks semantic similarity against existing entries. Above 95% similarity, it skips — you already have this. Between 80% and 95%, it offers to merge — same concept, maybe new detail. Between 60% and 80%, it links — related, worth knowing about. This keeps the knowledge base from getting cluttered with near-duplicates, which is the main reason most personal knowledge systems eventually become unusable. (Details in the &lt;a href="https://norrietaylor.github.io/distillery/architecture/deduplication/" rel="noopener noreferrer"&gt;dedup docs&lt;/a&gt;.)&lt;/p&gt;
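&lt;p&gt;The three bands can be sketched as a tiny routing function. The thresholds match the ones above; the function name is ours, not Distillery's.&lt;/p&gt;

```python
# The dedup bands described above, as a minimal routing sketch.
# Thresholds come from the post (0.95 / 0.80 / 0.60); the name is illustrative.

def dedup_decision(similarity: float) -> str:
    if similarity >= 0.95:
        return "skip"    # you already have this
    if similarity >= 0.80:
        return "merge"   # same concept, maybe new detail
    if similarity >= 0.60:
        return "link"    # related, worth knowing about
    return "store"       # genuinely new entry
```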

&lt;p&gt;The &lt;code&gt;/pour&lt;/code&gt; command is the one that feels most like magic. You ask a synthesis question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/pour how does our authentication system work?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it does a multi-pass synthesis: first pass retrieves the most relevant entries via semantic search, second pass extracts the key points, third pass synthesizes a coherent answer with citations showing which entries contributed what. For complex topics where knowledge is distributed across many entries, this is significantly better than any single search result. See the &lt;a href="https://norrietaylor.github.io/distillery/skills/pour/" rel="noopener noreferrer"&gt;pour docs&lt;/a&gt; for examples.&lt;/p&gt;
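&lt;p&gt;Structurally, the three passes reduce to a small pipeline. This is a sketch with stubbed stages — &lt;code&gt;retrieve&lt;/code&gt;, &lt;code&gt;extract&lt;/code&gt;, and &lt;code&gt;synthesize&lt;/code&gt; stand in for the semantic-search and LLM steps; only the three-pass shape comes from the description above.&lt;/p&gt;

```python
# The three passes of /pour as a pipeline. Each stage is a pluggable
# callable here; in Distillery these are semantic search and LLM calls.

def pour(question, retrieve, extract, synthesize):
    entries = retrieve(question)                  # pass 1: semantic retrieval
    points = [extract(e) for e in entries]        # pass 2: key-point extraction
    return synthesize(question, points, entries)  # pass 3: cited synthesis
```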

&lt;h2&gt;
  
  
  Ambient Intelligence: The Game Changer
&lt;/h2&gt;

&lt;p&gt;The feature that changed how I work isn't capture or search — it's ambient intelligence. The &lt;code&gt;/watch&lt;/code&gt; and &lt;code&gt;/radar&lt;/code&gt; commands turn Distillery from a passive knowledge store into something that actively works for you.&lt;/p&gt;

&lt;p&gt;Here's the idea: you tell Distillery to watch sources — GitHub repos, RSS feeds, subreddits — and it polls them on a schedule. But it doesn't just dump everything into a feed. It &lt;em&gt;scores every item for relevance against your existing knowledge base&lt;/em&gt; using embedding similarity. If an item is semantically close to things you've already captured, it surfaces. If it's noise, it's filtered out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/watch add https://github.com/anthropics/claude-code
/watch add https://simonwillison.net/atom/everything
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scoring pipeline has layers that make it genuinely smart:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interest-boosted relevance.&lt;/strong&gt; Distillery mines your knowledge base to build an interest profile — your most-used tags (recency-weighted, so recent work matters more than old entries), your bookmarked domains, your tracked repos, your expertise areas. When a feed item matches entries tagged with your top interests, the relevance score gets boosted by up to 15%. This creates a positive feedback loop: the more you capture about a topic, the better Distillery gets at finding related content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust-weighted sources.&lt;/strong&gt; Not all sources are equal. You can set a &lt;code&gt;trust_weight&lt;/code&gt; per source — your team's own repos at 1.0, a secondary blog at 0.7. The poller multiplies all scores by trust weight, giving you a tunable signal-to-noise ratio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Two-tier thresholds.&lt;/strong&gt; Items scoring above 0.85 trigger alerts. Items between 0.60 and 0.85 are quietly stored for the next digest. Below 0.60, they're dropped. This prevents alert fatigue while still capturing everything worth knowing. Both thresholds are adjustable via &lt;code&gt;/tune&lt;/code&gt;.&lt;/p&gt;
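&lt;p&gt;Folding the layers above into one function makes the pipeline easier to see. The 15% boost cap, the trust multiplier, and the 0.85/0.60 thresholds come from this post; the composition order is an assumption for illustration.&lt;/p&gt;

```python
# Illustrative composition of the scoring layers: interest boost (capped
# at 15%), trust weight, then the two-tier alert/digest/drop routing.

def score_item(base_relevance: float, trust_weight: float, interest_boost: float) -> tuple[float, str]:
    """Return the final score and its routing: alert, digest, or drop."""
    boost = min(interest_boost, 0.15)                   # boost capped at 15%
    score = min(base_relevance * (1.0 + boost) * trust_weight, 1.0)
    if score >= 0.85:
        return score, "alert"
    if score >= 0.60:
        return score, "digest"
    return score, "drop"
```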

&lt;p&gt;&lt;strong&gt;Smart deduplication.&lt;/strong&gt; When the same story appears in three different feeds, Distillery catches it. A fast external ID check filters exact duplicates, then semantic dedup at 0.95 similarity catches rephrased duplicates. Batch-aware dedup prevents items from the same poll run from blocking each other.&lt;/p&gt;
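&lt;p&gt;A minimal sketch of that two-stage feed dedup: the cheap external-ID check first, then a semantic check at the 0.95 threshold. The similarity function is stubbed, and comparing only within the kept batch is a simplification of the batch-aware behavior.&lt;/p&gt;

```python
# Two-stage feed dedup sketch: exact-ID filter, then semantic filter.
# `similarity` is a stand-in for embedding similarity; shapes are ours.

def dedup_feed_batch(items, seen_ids, similarity):
    """Keep items whose external ID and content are both new."""
    kept = []
    for item in items:
        if item["external_id"] in seen_ids:
            continue  # exact duplicate: fast external-ID check
        if any(similarity(item["text"], k["text"]) >= 0.95 for k in kept):
            continue  # rephrased duplicate within this poll run
        seen_ids.add(item["external_id"])
        kept.append(item)
    return kept
```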

&lt;p&gt;When you're ready for your briefing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/radar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get a synthesized digest grouped by source, with cross-cutting themes highlighted and links to the original items. Add &lt;code&gt;--suggest&lt;/code&gt; and Distillery will recommend new sources based on your interest profile — repos you reference but don't track, domains you bookmark but don't follow.&lt;/p&gt;

&lt;p&gt;The whole system runs on a schedule — hourly polls, daily rescoring, weekly maintenance — without any manual intervention. Your knowledge base gets smarter over time because the interest profile evolves as you capture more. I've caught breaking changes in dependencies, relevant new tools, and team-relevant discussions days before I would have otherwise. See the &lt;a href="https://norrietaylor.github.io/distillery/skills/watch/" rel="noopener noreferrer"&gt;feed system docs&lt;/a&gt; and &lt;a href="https://norrietaylor.github.io/distillery/skills/radar/" rel="noopener noreferrer"&gt;radar docs&lt;/a&gt; for the full details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture: Four Layers, Clean Separation
&lt;/h2&gt;

&lt;p&gt;The architecture is four layers:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbp5e0hpr8gmale7ohfyp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbp5e0hpr8gmale7ohfyp.png" alt=" " width="800" height="574"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The skills are Markdown files with YAML frontmatter — they're instructions for Claude Code, not code. The MCP server is where the actual logic lives. The core protocols are Python &lt;code&gt;Protocol&lt;/code&gt; classes (structural subtyping, not ABCs) that define the contract between layers. The backends implement those protocols.&lt;/p&gt;

&lt;p&gt;I chose MCP for the transport layer for a specific reason: it means Distillery tools are available to any MCP-compatible client, not just Claude Code. And because FastMCP handles the wire protocol, I can focus on the tool logic.&lt;/p&gt;

&lt;p&gt;The transport choice — stdio or HTTP — matters for deployment. Stdio is simpler: the MCP server runs as a subprocess of Claude Code, single user, no authentication needed. HTTP transport enables multi-user deployment: multiple Claude Code instances connect to a shared server, authentication is handled via GitHub OAuth, and the knowledge base is genuinely shared.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Stdio: local single-user
&lt;/span&gt;&lt;span class="n"&gt;distillery&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt;

&lt;span class="c1"&gt;# HTTP: team deployment with GitHub OAuth
&lt;/span&gt;&lt;span class="n"&gt;distillery&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For storage, DuckDB with the VSS extension handles both structured queries and vector similarity search in a single file. No separate vector database. No Postgres. For local dev and small-team deployment, this is the right trade-off — simple operations (backup is &lt;code&gt;cp distillery.db distillery.db.bak&lt;/code&gt;), good enough performance, no infrastructure overhead.&lt;/p&gt;

&lt;p&gt;The embedding layer is pluggable. Default is Jina AI's embedding API (good quality, generous free tier). OpenAI's embeddings work too. The &lt;code&gt;EmbeddingProvider&lt;/code&gt; protocol makes it straightforward to add others.&lt;/p&gt;
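&lt;p&gt;The &lt;code&gt;Protocol&lt;/code&gt; approach mentioned above can be sketched like this. The method name and signature are illustrative, not Distillery's actual &lt;code&gt;EmbeddingProvider&lt;/code&gt; definition; the point is the structural subtyping: any class with a matching method satisfies the contract, no inheritance required.&lt;/p&gt;

```python
from typing import Protocol

# Illustrative provider contract. Names and signatures are assumptions;
# only the Protocol-based design comes from the architecture description.

class EmbeddingProvider(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one embedding vector per input text."""
        ...

class FakeProvider:
    """Structural subtyping: no base class, just a matching method."""
    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]

def embed_all(provider: EmbeddingProvider, texts: list[str]) -> list[list[float]]:
    return provider.embed(texts)
```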

&lt;h2&gt;
  
  
  Team Access: GitHub OAuth
&lt;/h2&gt;

&lt;p&gt;Running Distillery locally is a one-person knowledge base. The interesting case is team access: shared knowledge that everyone on the team can read and write.&lt;/p&gt;

&lt;p&gt;The HTTP transport with GitHub OAuth handles this. You deploy, configure GitHub OAuth credentials, and every team member connects their Claude Code to the same server. Knowledge captured by one person is searchable by everyone. &lt;code&gt;/pour&lt;/code&gt; synthesizes across the whole team's collective knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Distillery v0.1.0 is &lt;a href="https://pypi.org/project/distillery-mcp/" rel="noopener noreferrer"&gt;now on PyPI&lt;/a&gt; — &lt;code&gt;pip install distillery-mcp&lt;/code&gt;. With the package published, the next priorities are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP directory listings&lt;/strong&gt; — Submitting to the major MCP directories (Glama, mcp.so, Smithery, modelcontextprotocol.io) so teams can discover Distillery where they're already looking for MCP servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team skills&lt;/strong&gt; — The next wave of skills is built around shared knowledge bases. &lt;code&gt;/whois&lt;/code&gt; builds an evidence-backed expertise map from contributions — who on the team knows what, backed by the entries they've captured. &lt;code&gt;/digest&lt;/code&gt; generates team activity summaries so everyone stays aware of what others are learning. &lt;code&gt;/briefing&lt;/code&gt; provides a dashboard view of the team's collective knowledge state. &lt;code&gt;/investigate&lt;/code&gt; compiles deep context on a domain by synthesizing across the whole team's entries. &lt;code&gt;/gh-sync&lt;/code&gt; connects GitHub issues and PRs as knowledge sources. These are designed for teams running a shared Distillery instance with GitHub OAuth, where the real value compounds — every team member's captures make everyone else's searches and syntheses better. See the &lt;a href="https://norrietaylor.github.io/distillery/roadmap/" rel="noopener noreferrer"&gt;roadmap&lt;/a&gt; for the full list.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Classification improvements&lt;/strong&gt; — The LLM-based classification engine uses a confidence threshold (default 60%) to decide what goes to the review queue. I want to experiment with few-shot examples in the classification prompt to improve precision on domain-specific knowledge.&lt;/p&gt;
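&lt;p&gt;The confidence gate reduces to a one-line routing decision. The 60% default comes from the paragraph above; the function name is ours.&lt;/p&gt;

```python
# The classification confidence gate: auto-accept confident results,
# queue the rest for /classify review. Threshold default is 0.60.

def route_classification(confidence: float, threshold: float = 0.60) -> str:
    """Auto-accept confident classifications; queue the rest for review."""
    return "accept" if confidence >= threshold else "review_queue"
```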

&lt;p&gt;The full roadmap is at &lt;a href="https://norrietaylor.github.io/distillery/roadmap/" rel="noopener noreferrer"&gt;norrietaylor.github.io/distillery/roadmap/&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;From the Claude Code plugin marketplace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude plugin marketplace add norrietaylor/distillery
claude plugin &lt;span class="nb"&gt;install &lt;/span&gt;distillery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the onboarding wizard from a Claude Code session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The wizard checks MCP connectivity, detects whether you're on stdio or HTTP, and walks you through configuration. If you want to connect to the demo server to try it without any setup, &lt;code&gt;/setup&lt;/code&gt; handles that too. (The demo server at &lt;code&gt;distillery-mcp.fly.dev&lt;/code&gt; is for evaluation only — don't store anything sensitive there.)&lt;/p&gt;

&lt;p&gt;You'll need a Jina AI API key for embeddings (the free tier is generous), and you'll need to configure the MCP server in Claude Code's settings. The &lt;a href="https://norrietaylor.github.io/distillery/getting-started/local-setup/" rel="noopener noreferrer"&gt;Local Setup Guide&lt;/a&gt; walks through the full process.&lt;/p&gt;




&lt;p&gt;The thing I keep coming back to is that the knowledge problem is fundamentally a friction problem. Every system that requires you to leave your current context to capture knowledge will fail. The capture has to be where the work is.&lt;/p&gt;

&lt;p&gt;For teams using Claude Code, that's already solved: the capture tool is Claude Code. Distillery just gives that capture tool a place to store things, and a way to get them back.&lt;/p&gt;

&lt;p&gt;I'd love to hear what use cases you're running into — find me on GitHub at &lt;a href="https://github.com/norrietaylor/distillery" rel="noopener noreferrer"&gt;norrietaylor/distillery&lt;/a&gt;, check out the &lt;a href="https://norrietaylor.github.io/distillery/" rel="noopener noreferrer"&gt;full documentation&lt;/a&gt;, or drop a comment below.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>mcp</category>
      <category>devtools</category>
      <category>python</category>
    </item>
  </channel>
</rss>
