<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ted Murray</title>
    <description>The latest articles on Forem by Ted Murray (@tadmstr).</description>
    <link>https://forem.com/tadmstr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3829882%2Fdb4e2cf2-7eef-42a4-82d8-d4776fcd3222.png</url>
      <title>Forem: Ted Murray</title>
      <link>https://forem.com/tadmstr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tadmstr"/>
    <language>en</language>
    <item>
      <title>Your AI Agent Has Your API Keys (And So Does Every Other Agent)</title>
      <dc:creator>Ted Murray</dc:creator>
      <pubDate>Tue, 21 Apr 2026 13:28:36 +0000</pubDate>
      <link>https://forem.com/tadmstr/your-ai-agent-has-your-api-keys-and-so-does-every-other-agent-1h51</link>
      <guid>https://forem.com/tadmstr/your-ai-agent-has-your-api-keys-and-so-does-every-other-agent-1h51</guid>
      <description>&lt;p&gt;Open your Claude Code &lt;code&gt;settings.json&lt;/code&gt;. Look at the &lt;code&gt;env&lt;/code&gt; blocks under your MCP servers. Every API key, every database token, every webhook URL you've put there — your agent has all of them, right now, in its process environment.&lt;/p&gt;

&lt;p&gt;That might sound obvious. You configured it that way. But think about what it actually means.&lt;/p&gt;

&lt;p&gt;You've got an MCP server for file operations and one for notifications. The notification server needs a webhook URL. The file server doesn't. But Claude Code doesn't scope credentials to individual servers — it loads the full environment and passes it to the session. Your agent has the webhook URL even if it never sends a notification. It has database tokens for backends it never queries. It holds the Grafana service account token whether or not it ever touches a dashboard.&lt;/p&gt;

&lt;p&gt;This is fine if you trust the agent completely and nothing ever goes wrong. But "nothing ever goes wrong" is a strange assumption to build on. A hallucinated tool call, a prompt injection in a tool response, a confused agent that decides to "help" by writing to a backend it shouldn't know about — the blast radius isn't one credential. It's every credential you've configured across every MCP server.&lt;/p&gt;

&lt;p&gt;And that's with a single agent. Add more and it gets worse.&lt;/p&gt;




&lt;h2&gt;
  
  
  It gets worse at scale
&lt;/h2&gt;

&lt;p&gt;I was designing the build layer of &lt;a href="https://github.com/TadMSTR/homelab-agent" rel="noopener noreferrer"&gt;homelab-agent&lt;/a&gt; — a platform where Claude Code agents run durable, multi-phase infrastructure builds. The design called for agent pools: multiple instances of the same agent type running in parallel, each working on a different build phase.&lt;/p&gt;

&lt;p&gt;The single-agent credential problem multiplied immediately. Every agent in the pool holds every credential. But new problems appeared too:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool visibility.&lt;/strong&gt; A read-only research agent sees write tools for infrastructure backends it has no business touching. Every agent carries the full tool surface, including everything that can cause damage if called incorrectly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource collisions.&lt;/strong&gt; No boundaries between agent workspaces. Agent A can read files Agent B wrote. Two agents running in parallel can overwrite each other's working data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit fragmentation.&lt;/strong&gt; Tool calls are scattered across logs from a dozen server processes, if they're logged at all. Reconstructing what a specific agent did is manual work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token overhead.&lt;/strong&gt; Every agent session loads tool schemas from every configured MCP server. With 12 servers contributing their full tool lists, you're burning 15–30K tokens per session before the agent does anything. At 20 concurrent agents, that's 300–600K tokens of pure initialization overhead — just so each agent can be told about tools it'll never use.&lt;/p&gt;

&lt;p&gt;I looked at what existed. Aggregation gateways combine servers but don't scope anything. Access control proxies filter which tools an agent can call, but filtering a tool doesn't prevent Agent A from reading Agent B's files through the tools it &lt;em&gt;is&lt;/em&gt; allowed to use. Enterprise gateways solve governance at scale, but they assume cloud deployment and a team — not a single operator running a homelab.&lt;/p&gt;

&lt;p&gt;Nothing combined all four: &lt;strong&gt;tool filtering + resource scoping + credential isolation + unified audit logging&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building the fix with the thing it fixes
&lt;/h2&gt;

&lt;p&gt;I asked Claude what a proper tool management framework for multi-agent setups should look like. It immediately understood the scope of the problem and what solving it completely would require.&lt;/p&gt;

&lt;p&gt;That conversation became &lt;a href="https://github.com/TadMSTR/scoped-mcp" rel="noopener noreferrer"&gt;scoped-mcp&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's the part that still feels slightly recursive: I built it using the same multi-agent pattern it's designed to protect. A research agent evaluated the problem space — existing MCP gateways, scoping patterns, credential isolation approaches. A dev agent implemented the code. Each agent ran with scoped access to only the resources it needed for its role.&lt;/p&gt;

&lt;p&gt;The tool was built by agents operating under the exact constraints it enforces.&lt;/p&gt;




&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;One &lt;code&gt;scoped-mcp&lt;/code&gt; process per agent, started at session time. The agent connects to it over stdio the same way it connects to any MCP server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent process (AGENT_ID=build-01, AGENT_TYPE=build)
    │
    ▼
┌────────────────────────────────────────┐
│  scoped-mcp                            │
│                                        │
│  ① Load manifest for AGENT_TYPE        │
│  ② Register only the allowed modules   │
│  ③ Inject credentials into modules     │
│  ④ Every tool call:                    │
│     → enforce resource scope           │
│     → execute tool logic               │
│     → write audit log entry            │
└────────────────────────────────────────┘
    │           │           │
    ▼           ▼           ▼
 filesystem   sqlite      ntfy
 (scoped)    (scoped)   (scoped)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Manifests&lt;/strong&gt; declare what an agent type is allowed to do. A YAML file per agent role. Nothing outside the manifest loads — tools that aren't listed don't exist from the agent's perspective.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# manifests/research-agent.yml&lt;/span&gt;
&lt;span class="na"&gt;agent_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;research&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read-only&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;agent"&lt;/span&gt;

&lt;span class="na"&gt;modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;filesystem&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;base_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/data/agents&lt;/span&gt;

  &lt;span class="na"&gt;sqlite&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;db_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/data/sqlite&lt;/span&gt;

  &lt;span class="na"&gt;ntfy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-{agent_id}"&lt;/span&gt;
      &lt;span class="na"&gt;max_priority&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;

&lt;span class="na"&gt;credentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;env&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;mode: read&lt;/code&gt; and only read tools register. The agent can't call &lt;code&gt;write_file&lt;/code&gt; or &lt;code&gt;execute&lt;/code&gt; because those tools were never mounted. It's not access control layered on top — the tools literally don't exist in the agent's session.&lt;/p&gt;
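
&lt;p&gt;As a rough sketch of that mechanism (the &lt;code&gt;@tool&lt;/code&gt; decorator appears in the module example later in this post; the registry logic here is illustrative, not scoped-mcp's actual source):&lt;/p&gt;

```python
# Sketch of mode-based tool registration: methods tagged "write" are never
# mounted when the manifest says mode: read. Names (tool, register, the
# MODE_ALLOWS table) are illustrative assumptions, not scoped-mcp internals.
MODE_ALLOWS = {"read": {"read"}, "write": {"read", "write"}}

def tool(mode):
    """Tag a method with the access mode it requires."""
    def wrap(fn):
        fn._tool_mode = mode
        return fn
    return wrap

class FilesystemModule:
    @tool(mode="read")
    def read_file(self, path): ...
    @tool(mode="write")
    def write_file(self, path, data): ...

def register(module_cls, manifest_mode):
    """Return only the tool names the manifest mode permits."""
    allowed = MODE_ALLOWS[manifest_mode]
    return sorted(
        name for name, fn in vars(module_cls).items()
        if getattr(fn, "_tool_mode", None) in allowed
    )

print(register(FilesystemModule, "read"))   # ['read_file']
print(register(FilesystemModule, "write"))  # ['read_file', 'write_file']
```

&lt;p&gt;Under &lt;code&gt;mode: read&lt;/code&gt;, &lt;code&gt;write_file&lt;/code&gt; never enters the registry — there is nothing to deny because there is nothing to call.&lt;/p&gt;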

&lt;p&gt;Compare what two different manifests produce:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;research-agent.yml          →   4 tools registered&lt;/span&gt;
  &lt;span class="s"&gt;filesystem&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read              read_file, list_dir&lt;/span&gt;
  &lt;span class="s"&gt;sqlite&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read                  query&lt;/span&gt;
  &lt;span class="s"&gt;ntfy                          ntfy_send&lt;/span&gt;

&lt;span class="s"&gt;build-agent.yml             →   8 tools registered&lt;/span&gt;
  &lt;span class="s"&gt;filesystem&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write             read_file, list_dir, write_file, delete_file&lt;/span&gt;
  &lt;span class="s"&gt;sqlite&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write                 query, execute&lt;/span&gt;
  &lt;span class="s"&gt;ntfy                          ntfy_send&lt;/span&gt;
  &lt;span class="s"&gt;slack_webhook                 slack_send&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same framework, same codebase — completely different tool surfaces. The research agent has no way to write files, execute SQL, or post to Slack. Those capabilities don't exist in its session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource scoping&lt;/strong&gt; is automatic. The filesystem module applies &lt;code&gt;PrefixScope&lt;/code&gt; — every path resolves under &lt;code&gt;agents/{agent_id}/&lt;/code&gt;. Path traversal attacks (&lt;code&gt;../&lt;/code&gt;) are caught by resolving to absolute paths before comparing. Symlink escapes are caught by walking each component and checking whether any symlink target resolves outside the agent root.&lt;/p&gt;
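
&lt;p&gt;The core of that check is small. A minimal sketch, assuming Python 3.11's &lt;code&gt;Path.is_relative_to&lt;/code&gt; — the real &lt;code&gt;PrefixScope&lt;/code&gt; additionally walks each path component for symlink handling:&lt;/p&gt;

```python
from pathlib import Path

def scope_path(agent_root: str, requested: str) -> Path:
    """Resolve a requested path and refuse anything outside the agent root."""
    root = Path(agent_root).resolve()
    # resolve() collapses ../ segments and follows symlinks before comparing
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes agent scope: {requested}")
    return candidate

scope_path("/data/agents/build-01", "notes/plan.md")       # fine
# scope_path("/data/agents/build-01", "../ops-01/secret")  # raises PermissionError
```

&lt;p&gt;Comparing resolved absolute paths, rather than string prefixes, is what defeats the &lt;code&gt;../&lt;/code&gt; tricks.&lt;/p&gt;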

&lt;p&gt;The SQLite module gives each agent its own database file at &lt;code&gt;{db_dir}/agent_{agent_id}.db&lt;/code&gt;. Two agents can't read or write each other's data regardless of what SQL they construct. The module also parses SQL with sqlglot to block &lt;code&gt;PRAGMA&lt;/code&gt;, &lt;code&gt;ATTACH&lt;/code&gt;, &lt;code&gt;DETACH&lt;/code&gt;, &lt;code&gt;DROP&lt;/code&gt;, and multi-statement batches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Credential injection&lt;/strong&gt; happens at the proxy layer. API keys, tokens, webhook URLs — loaded once by the &lt;code&gt;scoped-mcp&lt;/code&gt; process from environment variables or a secrets file. Modules receive credentials through their context. The agent process never sees them. If you try to read &lt;code&gt;INFLUXDB_TOKEN&lt;/code&gt; from the agent's environment, it won't be there.&lt;/p&gt;
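
&lt;p&gt;A minimal sketch of that split — the suffix convention mirrors the audit-redaction rule, but the partitioning shown here is illustrative, not scoped-mcp's actual loader:&lt;/p&gt;

```python
# Sketch: the proxy process partitions its environment into credentials
# (kept proxy-side, handed to modules) and everything else. The suffix
# list and variable names are illustrative assumptions.
SECRET_SUFFIXES = ("_TOKEN", "_PASSWORD", "_SECRET", "_KEY")

def split_env(env: dict) -> tuple[dict, dict]:
    creds = {k: v for k, v in env.items() if k.endswith(SECRET_SUFFIXES)}
    safe = {k: v for k, v in env.items() if k not in creds}
    return creds, safe

creds, safe = split_env({"INFLUXDB_TOKEN": "abc123", "PATH": "/usr/bin"})
# creds holds INFLUXDB_TOKEN for module use; safe is all the agent could see
```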

&lt;p&gt;&lt;strong&gt;Audit logging&lt;/strong&gt; produces two structured JSONL streams: one for what agents did (every tool call, every scope check), one for what the server did (startup, config, errors). Credentials are automatically redacted — any key ending in &lt;code&gt;_TOKEN&lt;/code&gt;, &lt;code&gt;_PASSWORD&lt;/code&gt;, &lt;code&gt;_SECRET&lt;/code&gt;, &lt;code&gt;_KEY&lt;/code&gt; gets replaced with &lt;code&gt;&amp;lt;redacted&amp;gt;&lt;/code&gt; before it hits the log.&lt;/p&gt;
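
&lt;p&gt;The redaction rule is easy to sketch. The marker below is a plain placeholder for readability here; the real logs use the angle-bracketed form:&lt;/p&gt;

```python
SECRET_SUFFIXES = ("_TOKEN", "_PASSWORD", "_SECRET", "_KEY")

def redact(entry: dict) -> dict:
    """Mask secret-suffixed keys before the entry is written to the audit log."""
    return {
        k: ("[redacted]" if k.endswith(SECRET_SUFFIXES) else v)
        for k, v in entry.items()
    }

redact({"tool": "influxdb_query", "INFLUXDB_TOKEN": "abc123"})
# {'tool': 'influxdb_query', 'INFLUXDB_TOKEN': '[redacted]'}
```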




&lt;h2&gt;
  
  
  Seeing it work
&lt;/h2&gt;

&lt;p&gt;Three infrastructure modules, one agent, one workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─ ops-agent (AGENT_ID=ops-01) ─────────────────────────────────────┐
│                                                                    │
│  1. influxdb_query(bucket="metrics",                              │
│       filters=[{"field": "_measurement",                          │
│                 "op": "==", "value": "docker_cpu"}])              │
│     → discovers container X averaging 94% CPU                     │
│                                                                    │
│  2. grafana_create_dashboard(                                      │
│       title="Container Health",                                   │
│       panels=[{"title": "CPU by Container", ...}])                │
│     → dashboard created in folder agent-ops-01/                   │
│                                                                    │
│  3. ntfy_send(title="High CPU: container X",                      │
│       message="Averaging 94% over last hour.")                    │
│     → operator receives push notification                         │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent queried metrics in the buckets it was allowed to see. It built a Grafana dashboard in its own folder — it can't touch operator dashboards or other agents' dashboards. It sent an alert through the ntfy topic assigned in its manifest.&lt;/p&gt;

&lt;p&gt;At no point did it hold an InfluxDB token, see a Grafana API key, or know the ntfy server URL. A second agent running in parallel has a completely separate scope. They can't collide.&lt;/p&gt;




&lt;h2&gt;
  
  
  The audit that proved it needed to exist
&lt;/h2&gt;

&lt;p&gt;I ran a security audit against v0.1.0 the same day it shipped. 18 findings. One critical, three high, eight medium, six low.&lt;/p&gt;

&lt;p&gt;The critical finding: SQLite isolation was broken. The original design used schema-level scoping in a shared database file. An agent could issue an unqualified table reference that resolved against another agent's schema. The fix was simple and total — give each agent its own database file. No shared state, no schema tricks.&lt;/p&gt;

&lt;p&gt;The high findings included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flux injection in InfluxDB&lt;/strong&gt; — raw Flux query strings accepted from agents. Replaced with structured &lt;code&gt;{field, op, value}&lt;/code&gt; filter dicts, validated and escaped before rendering.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSRF gaps in the HTTP proxy&lt;/strong&gt; — the blocklist missed IPv6-mapped IPv4, link-local, CGNAT, and NAT64 ranges. DNS rebinding attacks could bypass the allowlist between init and invocation. Fixed with per-request re-resolution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A decorator that lied&lt;/strong&gt; — the &lt;code&gt;@audited&lt;/code&gt; wrapper was documented as enforcing scope but never actually called &lt;code&gt;enforce()&lt;/code&gt;. The fix was honest: remove the false claim, make the contract explicit — modules are responsible for calling &lt;code&gt;enforce()&lt;/code&gt; in every tool method.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All 18 findings were remediated and v0.2.0 shipped the same day. v0.2.1 and v0.3.0 audits came back clean.&lt;/p&gt;

&lt;p&gt;I publish the full audit history in &lt;code&gt;docs/security-audit.md&lt;/code&gt;. Not because it makes the project look polished — it doesn't. It makes it look honest. When a tool's core value is security, showing the receipts matters more than showing a clean record.&lt;/p&gt;




&lt;h2&gt;
  
  
  What ships with it
&lt;/h2&gt;

&lt;p&gt;Ten built-in modules:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Storage&lt;/strong&gt; — &lt;code&gt;filesystem&lt;/code&gt; (read, write, list, delete within a scoped directory tree), &lt;code&gt;sqlite&lt;/code&gt; (per-agent database with SQL validation)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Notifications&lt;/strong&gt; — &lt;code&gt;ntfy&lt;/code&gt;, &lt;code&gt;smtp&lt;/code&gt;, &lt;code&gt;matrix&lt;/code&gt;, &lt;code&gt;slack_webhook&lt;/code&gt;, &lt;code&gt;discord_webhook&lt;/code&gt; (write-only by design — agents send alerts, they never see webhook URLs or SMTP passwords)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt; — &lt;code&gt;http_proxy&lt;/code&gt; (allowlisted outbound HTTP with SSRF prevention), &lt;code&gt;grafana&lt;/code&gt; (dashboard CRUD scoped to an agent-owned folder), &lt;code&gt;influxdb&lt;/code&gt; (time-series query/write restricted to an allowlisted bucket set)&lt;/p&gt;

&lt;p&gt;Writing a custom module is about 20 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scoped_mcp.modules._base&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolModule&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scoped_mcp.scoping&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;NamespaceScope&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisModule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ToolModule&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;scoping&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NamespaceScope&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;required_credentials&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDIS_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;redis.asyncio&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;aioredis&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aioredis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REDIS_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get a value (scoped to agent namespace).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;scoped_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scoping&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent_ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scoped_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Set a key-value pair (scoped to agent namespace).&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;scoped_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scoping&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent_ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scoped_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add it to a manifest with &lt;code&gt;mode: read&lt;/code&gt; and only &lt;code&gt;get_key&lt;/code&gt; registers. &lt;code&gt;set_key&lt;/code&gt; doesn't exist from the agent's perspective.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;scoped-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point Claude Code at it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"scoped-mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"--manifest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"manifests/research-agent.yml"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AGENT_ID"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"research-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AGENT_TYPE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"research"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo includes a &lt;a href="https://github.com/TadMSTR/scoped-mcp/blob/main/examples/claude-code/multi-agent-setup.md#verifying-isolation" rel="noopener noreferrer"&gt;5-minute isolation verification walkthrough&lt;/a&gt; — you can confirm filesystem scoping and credential non-exposure without reading a line of source code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/TadMSTR/scoped-mcp" rel="noopener noreferrer"&gt;github.com/TadMSTR/scoped-mcp&lt;/a&gt; — MIT licensed, Python 3.11+, on PyPI.&lt;/p&gt;

&lt;p&gt;If you're running a single Claude Code session, you probably don't need this yet. If you're running multiple agents with defined roles and they're all sharing the same tool surface — the access problem is already there. You just might not have looked at it yet.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Read the full series:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 1: &lt;a href="https://dev.to/tadmstr/i-built-an-agentic-infrastructure-platform-in-42-days-im-a-windows-sysadmin-45lg"&gt;I Built an Agentic Infrastructure Platform in 42 Days&lt;/a&gt; — the origin story&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 2: &lt;a href="https://dev.to/tadmstr/i-built-an-ai-memory-system-because-my-brain-needed-it-first-glc"&gt;I Built an AI Memory System Because My Brain Needed It First&lt;/a&gt; — the memory deep dive&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 3: &lt;a href="https://dev.to/tadmstr/how-to-give-claude-code-a-memory-197l"&gt;How to Give Claude Code a Memory&lt;/a&gt; — the practical how-to&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 4: &lt;a href="https://dev.to/tadmstr/im-designing-a-platform-i-cant-build-alone-thats-the-point-3ai9-temp-slug-1965355"&gt;I'm Designing a Platform I Can't Build Alone&lt;/a&gt; — cognitive augmentation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 5: &lt;a href="https://dev.to/tadmstr/what-actually-survived-a-memory-system-retrospective-7j5"&gt;What Actually Survived: A Memory System Retrospective&lt;/a&gt; — 10 weeks in production&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>security</category>
      <category>claude</category>
    </item>
    <item>
      <title>I Built an AI Memory System. Then I Forgot About It.</title>
      <dc:creator>Ted Murray</dc:creator>
      <pubDate>Sun, 19 Apr 2026 15:43:07 +0000</pubDate>
      <link>https://forem.com/tadmstr/i-built-an-ai-memory-system-then-i-forgot-about-it-1g6d</link>
      <guid>https://forem.com/tadmstr/i-built-an-ai-memory-system-then-i-forgot-about-it-1g6d</guid>
      <description>&lt;p&gt;The memory system I built for Claude has been running since February.&lt;/p&gt;

&lt;p&gt;Every prior phase — from the first CLAUDE.md file to the nightly sync pipeline to the knowledge graph — followed the same pattern: use the system, hit friction, build the next layer. I expected that cycle to keep going. I expected to keep building.&lt;/p&gt;

&lt;p&gt;The cycle did slow down. Not because I gave up — because the core architecture stopped generating friction. The tier structure, the search interface, the promotion pipeline — they just work. What changed was everything around them.&lt;/p&gt;

&lt;p&gt;This is a retrospective on what it looks like when the system is actually working, what evolved since I wrote about the original architecture, and the scale it's reached.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Success State Is Invisible
&lt;/h2&gt;

&lt;p&gt;The best version of a memory system is one you stop thinking about.&lt;/p&gt;

&lt;p&gt;When I was building it, the system was always on my mind — something to design, extend, debug, curate. There was a period where I was writing more about the memory system than I was using it. That's normal during construction.&lt;/p&gt;

&lt;p&gt;The shift happened gradually. At some point I stopped checking whether the nightly pipeline ran. Stopped verifying that working notes got indexed. I'd start a session and the agent would surface a decision from weeks ago that I'd completely forgotten about — and instead of thinking &lt;em&gt;the memory system worked&lt;/em&gt;, I'd just think &lt;em&gt;right, I forgot about that, good.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's the success state. The system becomes infrastructure — background, assumed, noticed only when it fails.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;The system has reached a scale I didn't anticipate when I designed it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;124 distilled files&lt;/strong&gt; — permanent, git-backed knowledge. Architectural decisions, build close-outs, security audit findings, infrastructure state. Each one reviewed and promoted by the nightly pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;121 working memory files&lt;/strong&gt; — agent-curated notes with 90-day expiry. Build context, session findings, research output, project state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;57 documentation caches&lt;/strong&gt; — pre-fetched vendor docs for every service I work with regularly, indexed alongside memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;28 lines of core context&lt;/strong&gt; — the always-visible file, capped at 40 lines. This is the one file every agent sees on every turn.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I wrote the how-to post, the distilled tier had a handful of files. Now it's the system's largest store of permanent knowledge. The nightly pipeline reviews working notes, promotes what qualifies, expires what's stale, and deduplicates what overlaps. Ten distillation runs in the last two weeks alone — Helm build phases, infrastructure decisions, security findings, all moving from working memory to permanent record.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Changed Since the How-To Post
&lt;/h2&gt;

&lt;p&gt;Four things are meaningfully different from what I described in &lt;a href="https://dev.to/tadmstr/how-to-give-claude-code-a-memory-197l"&gt;How to Give Claude Code a Memory&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-time indexing.&lt;/strong&gt; The original design had memsearch capturing session summaries at the end of a session. You'd write something important mid-session, and by the next session it would be searchable. Useful. But there was a lag.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;memsearch-watch&lt;/code&gt; eliminated the lag. It's a PM2 process that watches memory directories and re-indexes within five seconds of any write. Context I write mid-session is searchable before the session ends. That's not a small change — it means the system can reference its own notes from thirty minutes ago, not just from last week. The memory tier now participates in the current conversation, not just future ones.&lt;/p&gt;
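&lt;p&gt;The watcher is simple to keep alive under PM2. A minimal ecosystem entry might look like this (script path and flags are illustrative, not the actual memsearch-watch internals):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// ecosystem.config.js -- hypothetical PM2 entry for a memory watcher
module.exports = {
  apps: [{
    name: "memsearch-watch",
    script: "/path/to/memsearch-watch.js",    // illustrative path
    args: "--watch ~/.claude/memory --debounce 5000",
    autorestart: true,
    max_memory_restart: "200M"
  }]
};
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;code&gt;pm2 start ecosystem.config.js&lt;/code&gt; followed by &lt;code&gt;pm2 save&lt;/code&gt; keeps it running across reboots.&lt;/p&gt;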

&lt;p&gt;&lt;strong&gt;Unified search.&lt;/strong&gt; The original design had two separate tools for two purposes: memsearch for automatic recall at session start, qmd for intentional search mid-session. The mental model was: memsearch surfaces what you should know, qmd finds what you're looking for.&lt;/p&gt;

&lt;p&gt;The distinction is real, but operating two separate tools with two separate invocation patterns adds friction. The &lt;code&gt;archival-search&lt;/code&gt; skill now does both in one pass — queries memsearch and qmd simultaneously, merges results, labels each by tier. The two-tool model still exists under the hood; the interface collapsed to one. That's the right direction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The docs cache.&lt;/strong&gt; This one I didn't anticipate when I designed the memory system, and it might be the most useful extension I've added.&lt;/p&gt;

&lt;p&gt;The problem: agents kept fetching web documentation for services I use constantly — Grafana, Docker Compose, SWAG, Authelia, nearly sixty others. Some of those requests were slow. Some required web access the agent didn't always have. Some returned SEO garbage instead of actual docs.&lt;/p&gt;

&lt;p&gt;The solution was to pre-fetch and cache documentation for every service I work with regularly — chunked markdown, organized by service and topic, stored in &lt;code&gt;~/.claude/memory/docs/&lt;/code&gt;. memsearch indexes it alongside working memory. Now when an agent needs to know how to configure an Authelia forward auth proxy, it checks the local docs cache first — same search interface as memory, same results format, always available, always fast.&lt;/p&gt;
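&lt;p&gt;The layout is nothing fancy (service and topic names here are examples, not the exact cache contents):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.claude/memory/docs/
├── grafana/
│   ├── alerting.md
│   └── provisioning.md
├── authelia/
│   └── forward-auth.md
├── swag/
│   └── proxy-confs.md
└── ...            # one directory per service, chunked by topic
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;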

&lt;p&gt;The insight is that vendor documentation is memory. It's context the agent needs to do its job. Treating it as a first-class memory tier rather than a web resource you fetch on demand changes how agents operate — they become less dependent on real-time web access and more capable in environments where that access is restricted or slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The knowledge graph embeddings.&lt;/strong&gt; The original Graphiti deployment used default embeddings. I've since switched to bge-m3 running on local hardware via Ollama. Search quality improved — entity lookups and relationship queries return more relevant results, especially for infrastructure topology questions like "what connects to SWAG?" or "what changed on atlas last week?" Running the embedder locally also means the graph doesn't depend on any external API.&lt;/p&gt;
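&lt;p&gt;The Ollama side of that switch is two commands (the embeddings endpoint shown is Ollama's standard API; how Graphiti gets pointed at it depends on your deployment):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# pull the embedding model once
ollama pull bge-m3

# sanity-check that embeddings come back locally
curl http://localhost:11434/api/embeddings \
  -d '{"model": "bge-m3", "prompt": "what connects to SWAG?"}'
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;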




&lt;h2&gt;
  
  
  What Held Up Exactly as Designed
&lt;/h2&gt;

&lt;p&gt;The tier structure is unchanged.&lt;/p&gt;

&lt;p&gt;Session tier (auto-captured, 30-day retention), working tier (agent-curated, 90-day expiry, YAML frontmatter), distilled tier (permanent, git-backed), core context (40-line cap, always-visible). The knowledge graph for relationships.&lt;/p&gt;

&lt;p&gt;I re-read the original day-seven design doc while writing this. The structure it describes is what I'm running. The tools changed, the scale changed, the automation layer changed. But the principle — stable knowledge here, working knowledge there, automated promotion between them — is word-for-word the same.&lt;/p&gt;

&lt;p&gt;That held through a period where the platform shipped a dozen new components, built a multi-phase infrastructure system on a second host, and published two new public repos. The tier model didn't need to change to accommodate any of it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Use Differently Than I Expected
&lt;/h2&gt;

&lt;p&gt;I expected the distilled tier to be my most-referenced memory from the start. Permanent, git-backed, settled knowledge — that's the crown jewel, right?&lt;/p&gt;

&lt;p&gt;In practice, core context is the daily workhorse. Twenty-eight lines that every agent sees on every turn — user profile, active projects, key constraints, recent decisions. It's the file that gives agents enough orientation to be useful immediately without searching for anything.&lt;/p&gt;

&lt;p&gt;The distilled tier turned out to matter differently than I expected. I don't search it directly very often. But it's where the pipeline puts things so they're &lt;em&gt;findable&lt;/em&gt; when I do need them — and that happens at exactly the moments where not finding something would cost real time. A security audit finding from three weeks ago. The close-out notes from a build that another build now depends on. The architectural decision that explains why a service is configured the way it is.&lt;/p&gt;

&lt;p&gt;124 files is a safety net I use rarely and value constantly.&lt;/p&gt;

&lt;p&gt;The nightly pipeline was the other surprise. I expected it to be a source of ongoing maintenance — reviewing what it promoted, correcting misclassifications. Instead, it's a background process I trust. It broke once during a PM2 consolidation, and I fixed it. That's the whole maintenance log.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where It Still Has Friction
&lt;/h2&gt;

&lt;p&gt;Two things that haven't been fully solved:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Entity naming in the knowledge graph.&lt;/strong&gt; The graph works best when entity names are consistent — "Grafana" not "grafana", "SWAG" not "the reverse proxy." When different agents add facts about the same entity under slightly different names, the graph accumulates near-duplicate nodes. There's a nightly dedup step, but it catches structural duplicates, not naming variations. I'm working around it with naming conventions enforced through CLAUDE.md — Title Case for all entity names in episode text. It works. It's not elegant.&lt;/p&gt;
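&lt;p&gt;The CLAUDE.md rule itself is only a few lines. Mine reads roughly like this (wording illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Knowledge graph conventions
- Title Case for every entity name in episode text: "Grafana", never "grafana".
- Canonical service names, not descriptions: "SWAG", never "the reverse proxy".
- One entity, one name. Check existing node names before introducing a new spelling.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;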

&lt;p&gt;&lt;strong&gt;The session tier is noisy.&lt;/strong&gt; memsearch captures sessions automatically, which means it captures everything — including sessions where nothing worth remembering happened. The 30-day retention handles eventual cleanup, but search results sometimes surface sessions that are noise rather than signal. A relevance filter on session-tier results would help. I haven't built it yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Comes Next
&lt;/h2&gt;

&lt;p&gt;The memory system handles one question well: &lt;em&gt;what do I know?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The question it handles less well: &lt;em&gt;what do I know about the relationships between things I know?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's the knowledge graph's job — and the graph is working, but treating it as an optional add-on (as I did in the how-to post) undersells what it's actually for. The switch to local embeddings made it more capable. The naming friction is the main thing holding it back.&lt;/p&gt;

&lt;p&gt;The other thing I didn't expect: the memory system became a design input for other tools. &lt;a href="https://github.com/TadMSTR/scoped-mcp" rel="noopener noreferrer"&gt;scoped-mcp&lt;/a&gt; — a per-agent tool proxy I published recently — exists partly because the memory architecture proved that giving each agent a defined scope produces better outcomes than giving every agent access to everything. The pattern keeps showing up. Scope the context, scope the tools, scope the resources. Let each agent do its job without carrying everyone else's baggage.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Read the full series:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 1: &lt;a href="https://dev.to/tadmstr/i-built-an-agentic-infrastructure-platform-in-42-days-im-a-windows-sysadmin-45lg"&gt;I Built an Agentic Infrastructure Platform in 42 Days&lt;/a&gt; — the origin story&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 2: &lt;a href="https://dev.to/tadmstr/i-built-an-ai-memory-system-because-my-brain-needed-it-first-glc"&gt;I Built an AI Memory System Because My Brain Needed It First&lt;/a&gt; — the memory deep dive&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 3: &lt;a href="https://dev.to/tadmstr/how-to-give-claude-code-a-memory-197l"&gt;How to Give Claude Code a Memory&lt;/a&gt; — the practical how-to&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 4: &lt;a href="https://dev.to/tadmstr/im-designing-a-platform-i-cant-build-alone-thats-the-point-3ai9-temp-slug-1965355"&gt;I'm Designing a Platform I Can't Build Alone&lt;/a&gt; — cognitive augmentation&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>homelab</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Didn't Want to Pay for Web Search in My Own Homelab, So I Built the Pipeline</title>
      <dc:creator>Ted Murray</dc:creator>
      <pubDate>Thu, 16 Apr 2026 14:13:27 +0000</pubDate>
      <link>https://forem.com/tadmstr/i-didnt-want-to-pay-for-web-search-in-my-own-homelab-so-i-built-the-pipeline-5h2l</link>
      <guid>https://forem.com/tadmstr/i-didnt-want-to-pay-for-web-search-in-my-own-homelab-so-i-built-the-pipeline-5h2l</guid>
      <description>&lt;p&gt;I was setting up web search for &lt;a href="https://github.com/danny-avila/LibreChat" rel="noopener noreferrer"&gt;LibreChat&lt;/a&gt; — a self-hosted chat interface for AI models. The config has three required components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search provider&lt;/strong&gt;: Serper (paid API) or SearXNG (self-hosted)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scraper&lt;/strong&gt;: Firecrawl — hosted API, costs money&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reranker&lt;/strong&gt;: Jina AI or Cohere — both paid APIs (a reranker re-scores search results by relevance to the query, rather than trusting whatever order the search engine returned)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I run a homelab specifically so I don't depend on paid APIs for things I can own. The search provider was easy — SearXNG is self-hosted and free. But the scraper and reranker had no obvious self-hosted path.&lt;/p&gt;

&lt;p&gt;I asked Claude if there was a free way to run Firecrawl. It found &lt;a href="https://github.com/mendableai/firecrawl" rel="noopener noreferrer"&gt;firecrawl-simple&lt;/a&gt;, a lightweight local deployment of the same tool. Perfect.&lt;/p&gt;

&lt;p&gt;For the reranker, I asked Claude to explain what Jina and Cohere were actually doing. When I said I didn't want to call another external API, it offered to just build one — a small FlashRank wrapper exposing a Jina-compatible &lt;code&gt;/v1/rerank&lt;/code&gt; endpoint. That became the reranker that's been running in my stack ever since.&lt;/p&gt;

&lt;p&gt;That was the seed. What I have now is &lt;code&gt;searxng-mcp&lt;/code&gt; — a full private web search pipeline packaged as an MCP server. MCP (Model Context Protocol) is how AI clients like Claude Code connect to external tools; the server exposes web search as a set of callable tools that agents can use during a session. It's used by Claude Code agents and LibreChat every day.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;searxng-mcp&lt;/code&gt; exposes five tools over stdio MCP:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Query SearXNG, rerank results with a local ML model, return top N&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_and_fetch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Same as search, then fetch full page content for the top 1–3 results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_and_summarize&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search, fetch, then synthesize a structured summary via Ollama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fetch_url&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fetch and extract readable markdown from any URL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;clear_cache&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Purge the search or fetch cache when you need fresh results&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The design principle throughout: every external component is optional, and the server degrades gracefully when any of them are unavailable. If the reranker is down, you get results in SearXNG's native order. If Ollama isn't running, &lt;code&gt;search_and_summarize&lt;/code&gt; falls back to raw fetched content. Nothing hard-fails.&lt;/p&gt;
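&lt;p&gt;The pattern is the same for every optional dependency: try it, and on any failure return what you already have. A sketch of the reranker case (simplified, not the actual searxng-mcp source):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Degrade gracefully: any reranker problem means native SearXNG order.
// Assumes a Jina-compatible /v1/rerank endpoint, as described above.
async function rerankOrFallback(query, results, rerankerUrl) {
  if (!rerankerUrl) return results;            // reranker not configured
  try {
    const docs = results.map(function (r) { return r.snippet; });
    const res = await fetch(rerankerUrl + "/v1/rerank", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query: query, documents: docs })
    });
    if (!res.ok) return results;               // reranker answered but unhappy
    const ranked = await res.json();
    return ranked.results.map(function (r) { return results[r.index]; });
  } catch (err) {
    return results;                            // reranker down entirely
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;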




&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP client (stdio)
      │
      ▼
  searxng-mcp ──────────────→ Valkey                → result cache (search 1h, fetch 24h)
      │
      ├── expand (optional) → Ollama (qwen3:4b)      → rewritten query
      ├── search ───────────→ SearXNG               → raw results
      ├── rerank ───────────→ Reranker              → ranked results
      │                       (fallback: SearXNG order)
      ├── fetch content ────┬→ GitHub API            → markdown
      │                     ├→ Firecrawl            → page markdown (tier 1)
      │                     ├→ Crawl4AI             → page markdown (tier 2)
      │                     └→ Raw HTTP             → page text (tier 3)
      └── summarize (opt.) → Ollama (qwen3:14b)     → synthesized summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The interesting part is the fetch cascade.&lt;/p&gt;




&lt;h2&gt;
  
  
  The fetch cascade
&lt;/h2&gt;

&lt;p&gt;Fetching web content for an AI agent turns out to be harder than it sounds. Firecrawl handles the majority of pages well — it renders JavaScript, extracts clean markdown, deals with most anti-bot measures. But some pages block it anyway. When that happens, Firecrawl returns &lt;code&gt;success: true&lt;/code&gt; with empty content rather than throwing an error. That's a soft failure, not a hard one, and it took me a while to catch it.&lt;/p&gt;

&lt;p&gt;The cascade handles this: if Firecrawl returns empty content, fall through to &lt;a href="https://github.com/unclecode/crawl4ai" rel="noopener noreferrer"&gt;Crawl4AI&lt;/a&gt;, which uses a different extraction approach and handles JS-heavy pages differently. If Crawl4AI also fails or isn't configured, fall through to raw HTTP — just fetch the page and strip the HTML. Not perfect, but something.&lt;/p&gt;

&lt;p&gt;Three tiers, each cheaper than the last, each a fallback for the one above it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    A([URL]) --&amp;gt; B{github.com?}
    B --&amp;gt;|yes, repo root| C[GitHub API\nfetch README]
    B --&amp;gt;|yes, file blob| D[raw.githubusercontent.com]
    B --&amp;gt;|no| E[Firecrawl]
    E --&amp;gt; F{content\nreturned?}
    F --&amp;gt;|yes| G([page markdown])
    F --&amp;gt;|empty or blocked| H{Crawl4AI\nconfigured?}
    H --&amp;gt;|yes| I[Crawl4AI]
    I --&amp;gt; J{content\nreturned?}
    J --&amp;gt;|yes| K([page markdown])
    J --&amp;gt;|no| L[Raw HTTP fetch\nno redirects]
    H --&amp;gt;|no| L
    C --&amp;gt; M([page markdown])
    D --&amp;gt; N([raw file content])
    L --&amp;gt; O([page text])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub URLs bypass the cascade entirely — repo roots fetch the README via the GitHub API, file blobs fetch from &lt;code&gt;raw.githubusercontent.com&lt;/code&gt;. No Firecrawl needed.&lt;/p&gt;





&lt;h2&gt;
  
  
  Reranking and why it matters
&lt;/h2&gt;

&lt;p&gt;SearXNG aggregates results from multiple search engines. The order it returns them in is... whatever order the upstream engines agreed on. That's fine for casual browsing, not great for an AI agent that's going to fetch and read the top result.&lt;/p&gt;

&lt;p&gt;The key mechanism: for every search, the server fetches 3x the requested results from SearXNG (capped at 20) to give the reranker a larger candidate pool. The reranker then re-scores all of them using a cross-encoder ML model that understands the relationship between the query and each result, and returns only the top N. A result that matches your query semantically surfaces above a result that just happens to rank well with Google — but only because it had more candidates to sort through in the first place.&lt;/p&gt;
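&lt;p&gt;The pool-sizing rule is small enough to state in code (a sketch of the behavior described above, not the actual implementation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Over-fetch so the reranker has candidates to work with.
function candidateCount(requested, factor = 3, cap = 20) {
  return Math.min(requested * factor, cap);
}
// candidateCount(5) is 15: ask SearXNG for 15, return the reranked top 5.
// candidateCount(10) is 20: the cap keeps the pool bounded.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;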

&lt;p&gt;In v3.2.0, I added recency weighting — a small exponential decay score based on &lt;code&gt;publishedDate&lt;/code&gt; blended with the relevance score (weight 0.15 by default). Fresh results surface within relevance-close clusters without overriding large relevance gaps. It's skipped automatically when you've set a &lt;code&gt;time_range&lt;/code&gt;, since the result pool is already date-filtered.&lt;/p&gt;
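&lt;p&gt;The blend is roughly this shape (the 30-day half-life is an illustrative assumption; the 0.15 weight is the documented default):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Exponential recency decay blended with the reranker's relevance score.
function blendScore(relevance, ageDays, weight = 0.15, halfLifeDays = 30) {
  const recency = Math.exp(-Math.LN2 * ageDays / halfLifeDays);
  return (1 - weight) * relevance + weight * recency;
}
// A fresh result gets the full recency bonus; a year-old one gets almost none.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;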




&lt;h2&gt;
  
  
  Domain profiles
&lt;/h2&gt;

&lt;p&gt;Which search results are useful depends on what you're looking for. If I'm researching a Docker networking issue, I want results from the Docker docs, GitHub issues, and Linux sysadmin communities — not marketing pages that happen to mention Docker.&lt;/p&gt;

&lt;p&gt;Domain profiles let you apply a named boost/block list per query:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;homelab&lt;/code&gt; — surfaces self-hosted and Linux documentation, suppresses content farms&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dev&lt;/code&gt; — surfaces Stack Overflow, MDN, npm docs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You pass &lt;code&gt;domain_profile: "homelab"&lt;/code&gt; on any query and the domain filter applies. Profiles are defined in &lt;code&gt;domains.json&lt;/code&gt;, which hot-reloads every 5 seconds — you can tune them without restarting the server.&lt;/p&gt;
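&lt;p&gt;A trimmed-down &lt;code&gt;domains.json&lt;/code&gt; might look like this (key names and domains are illustrative; check the repo for the exact schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "homelab": {
    "boost": ["docs.docker.com", "wiki.archlinux.org", "github.com"],
    "block": ["example-content-farm.com"]
  },
  "dev": {
    "boost": ["stackoverflow.com", "developer.mozilla.org", "docs.npmjs.com"],
    "block": []
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;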




&lt;h2&gt;
  
  
  Query expansion
&lt;/h2&gt;

&lt;p&gt;For &lt;code&gt;search&lt;/code&gt; and &lt;code&gt;search_and_fetch&lt;/code&gt;, there's an optional &lt;code&gt;expand&lt;/code&gt; parameter. When true, Ollama (qwen3:4b) generates 2-3 typed query variants — a technical rephrasing, a product/version-specific form, and a community phrasing (how someone would ask it on a forum). Those variants run in parallel with the original query, and the result pools are merged and deduplicated by URL before reranking.&lt;/p&gt;

&lt;p&gt;It's not a serial rewrite — it's a parallel fan-out. If your first phrasing misses relevant results that a slightly different framing would surface, expansion catches them. Most useful for research queries; for precise lookups it adds latency (~3s) with less benefit.&lt;/p&gt;

&lt;p&gt;You can also set &lt;code&gt;EXPAND_QUERIES=true&lt;/code&gt; to enable it globally.&lt;/p&gt;




&lt;h2&gt;
  
  
  SSRF protections
&lt;/h2&gt;

&lt;p&gt;This server runs on your local network and fetches arbitrary URLs. That creates SSRF risk — an attacker (or a confused agent) could potentially get it to fetch &lt;code&gt;http://192.168.1.1/admin&lt;/code&gt; or &lt;code&gt;http://localhost:2375&lt;/code&gt; (an exposed Docker daemon API, in the worst case).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;fetch_url&lt;/code&gt; and &lt;code&gt;search_and_fetch&lt;/code&gt; enforce a URL filter that blocks private IP ranges: &lt;code&gt;10.x&lt;/code&gt;, &lt;code&gt;192.168.x&lt;/code&gt;, &lt;code&gt;172.16–31.x&lt;/code&gt;, &lt;code&gt;localhost&lt;/code&gt;, &lt;code&gt;127.x&lt;/code&gt;, IPv6 private ranges (&lt;code&gt;::1&lt;/code&gt;, &lt;code&gt;fc00::/7&lt;/code&gt;, &lt;code&gt;fe80::/10&lt;/code&gt;), and non-HTTP protocols.&lt;/p&gt;

&lt;p&gt;The IPv6 case caught me during a security pass — &lt;code&gt;URL.hostname&lt;/code&gt; returns brackets for IPv6 addresses (e.g., &lt;code&gt;[::1]&lt;/code&gt;), so naive regex matching against &lt;code&gt;::1&lt;/code&gt; doesn't work. The fixed version matches the bracket-wrapped form.&lt;/p&gt;

&lt;p&gt;There's also redirect blocking in the raw HTTP fetcher — &lt;code&gt;rawFetch()&lt;/code&gt; refuses to follow redirects, preventing SSRF bypass via redirect chains to internal addresses. And Crawl4AI &lt;code&gt;task_id&lt;/code&gt; values are validated before being interpolated into the poll URL to prevent path traversal.&lt;/p&gt;
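&lt;p&gt;The bracket handling is easy to get wrong, so here's the shape of the fix (a simplified sketch; the real filter also covers &lt;code&gt;192.168.x&lt;/code&gt;, &lt;code&gt;172.16–31.x&lt;/code&gt;, and more):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// URL.hostname keeps the brackets on IPv6 literals, so match the
// bracket-wrapped form, not the bare address.
function isBlockedHost(urlString) {
  const host = new URL(urlString).hostname;   // "http://[::1]/x" yields "[::1]"
  if (host === "localhost" || host === "[::1]") return true;
  if (host.startsWith("127.") || host.startsWith("10.")) return true;
  return host.startsWith("[fc") || host.startsWith("[fd") || host.startsWith("[fe80");
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;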




&lt;h2&gt;
  
  
  Caching
&lt;/h2&gt;

&lt;p&gt;Valkey (Redis-compatible) is optional but worthwhile. Search results are cached for 1 hour, fetched pages for 24 hours. For the kind of research queries AI agents run — often the same topic from slightly different angles over a session — this saves meaningful latency and avoids hammering SearXNG and Firecrawl with redundant requests.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;clear_cache&lt;/code&gt; tool lets you purge when you need fresh results on a fast-moving topic.&lt;/p&gt;




&lt;h2&gt;
  
  
  MCP client setup
&lt;/h2&gt;

&lt;p&gt;For Claude Code, the recommended setup uses &lt;code&gt;claude mcp add-json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add-json searxng &lt;span class="nt"&gt;--scope&lt;/span&gt; user &lt;span class="s1"&gt;'{
  "command": "node",
  "args": ["/path/to/searxng-mcp/build/src/index.js"],
  "env": {
    "SEARXNG_URL": "http://localhost:8081",
    "FIRECRAWL_URL": "http://localhost:3002",
    "RERANKER_URL": "http://localhost:8787",
    "OLLAMA_URL": "http://localhost:11434",
    "VALKEY_URL": "redis://localhost:6379",
    "CRAWL4AI_URL": "http://localhost:11235"
  }
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This writes to &lt;code&gt;~/.claude.json&lt;/code&gt;. Don't add it to &lt;code&gt;~/.claude/settings.json&lt;/code&gt; — that file isn't used for MCP env var injection in Claude Code.&lt;/p&gt;

&lt;p&gt;For LibreChat, add it to &lt;code&gt;librechat.yaml&lt;/code&gt; under &lt;code&gt;mcpServers&lt;/code&gt; with &lt;code&gt;type: stdio&lt;/code&gt;.&lt;/p&gt;
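&lt;p&gt;That entry looks roughly like this (env values match the Claude Code example; trim to the services you actually run):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;mcpServers:
  searxng:
    type: stdio
    command: node
    args:
      - /path/to/searxng-mcp/build/src/index.js
    env:
      SEARXNG_URL: "http://localhost:8081"
      RERANKER_URL: "http://localhost:8787"
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;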




&lt;h2&gt;
  
  
  What runs in practice
&lt;/h2&gt;

&lt;p&gt;The full required stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SearXNG&lt;/strong&gt; — must have JSON format enabled in &lt;code&gt;settings.yml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Firecrawl&lt;/strong&gt; (firecrawl-simple) — local deployment, no API key needed for local instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reranker&lt;/strong&gt; — FlashRank wrapper, reference implementation in &lt;a href="https://github.com/TadMSTR/homelab-agent/tree/main/docker/reranker" rel="noopener noreferrer"&gt;homelab-agent/docker/reranker&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Optional:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Valkey&lt;/strong&gt; — caching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crawl4AI&lt;/strong&gt; — second-tier fetch fallback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; — query expansion and summarization (requires &lt;code&gt;qwen3:4b&lt;/code&gt; and/or &lt;code&gt;qwen3:14b&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The server starts fine without the optional components and tells you clearly when a feature isn't available because its dependency isn't configured.&lt;/p&gt;




&lt;h2&gt;
  
  
  The repo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/TadMSTR/searxng-mcp" rel="noopener noreferrer"&gt;github.com/TadMSTR/searxng-mcp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed. The full homelab stack it runs in — including the reranker Docker image — is documented in &lt;a href="https://github.com/TadMSTR/homelab-agent" rel="noopener noreferrer"&gt;homelab-agent&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you're already running SearXNG, the jump to a full agent-ready search pipeline is smaller than it looks.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>homelab</category>
      <category>selfhosted</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Give Claude Code a Memory</title>
      <dc:creator>Ted Murray</dc:creator>
      <pubDate>Fri, 27 Mar 2026 23:57:03 +0000</pubDate>
      <link>https://forem.com/tadmstr/how-to-give-claude-code-a-memory-197l</link>
      <guid>https://forem.com/tadmstr/how-to-give-claude-code-a-memory-197l</guid>
      <description>&lt;p&gt;I wrote about &lt;a href="https://dev.to/tadmstr/i-built-an-ai-memory-system-because-my-brain-needed-it-first-glc"&gt;why I built a memory system for Claude&lt;/a&gt;. The short version: Claude's built-in memory extracts facts automatically with no audit trail, no version control, and no way to scope what different agents see. I wanted control.&lt;/p&gt;

&lt;p&gt;This post is the practical companion. If you want to build your own, here's how mine works, what each piece does, and the order I'd set it up in if I were starting today.&lt;/p&gt;

&lt;p&gt;Everything here is open source. The full stack is documented in &lt;a href="https://github.com/TadMSTR/homelab-agent" rel="noopener noreferrer"&gt;homelab-agent&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You're Building
&lt;/h2&gt;

&lt;p&gt;A memory system with three properties:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Persistent&lt;/strong&gt; — context survives across sessions. Monday's agent knows what Friday's agent decided.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Searchable&lt;/strong&gt; — agents find relevant context automatically, not by loading everything into the prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scoped&lt;/strong&gt; — different agents see different things. Your infrastructure agent doesn't need your code review history.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The system has three tiers of memory, two search tools, and an optional knowledge graph. You don't need all of it on day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Minimum Viable Memory
&lt;/h2&gt;

&lt;p&gt;If you set up nothing else, do this. It takes ten minutes and gets you 70% of the value.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLAUDE.md files
&lt;/h3&gt;

&lt;p&gt;Claude Code reads &lt;code&gt;CLAUDE.md&lt;/code&gt; files automatically. One in your home directory for global context. One in each project directory for project-specific context. This is your foundation.&lt;/p&gt;

&lt;p&gt;Your global &lt;code&gt;~/.claude/CLAUDE.md&lt;/code&gt; should contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who you are and how you work (role, preferences, communication style)&lt;/li&gt;
&lt;li&gt;Your infrastructure overview (hosts, IPs, key services)&lt;/li&gt;
&lt;li&gt;Rules that apply everywhere (don't push to main, don't SSH without asking)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your project CLAUDE.md files should contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What this project is and what the agent's scope covers&lt;/li&gt;
&lt;li&gt;Project-specific conventions and constraints&lt;/li&gt;
&lt;li&gt;Pointers to where relevant documentation lives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't memory in the dynamic sense — it's stable configuration. But it's the single highest-impact thing you can do. Every session starts with this context loaded automatically.&lt;/p&gt;
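&lt;p&gt;To make that concrete, a skeleton global file might look like this (contents are illustrative placeholders, not my actual config):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ~/.claude/CLAUDE.md

## About me
Sysadmin. Prefer direct answers and small diffs over long explanations.

## Infrastructure
- atlas (192.168.x.x): Docker host, services behind SWAG
- backups: nightly, off-host

## Global rules
- Never push to main; open a PR.
- Ask before SSH-ing into any host.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;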

&lt;h3&gt;
  
  
  Directory-based working memory
&lt;/h3&gt;

&lt;p&gt;Create a memory directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.claude/memory/
├── shared/           # Cross-agent knowledge
└── agents/
    ├── dev/          # Dev agent's notes
    ├── research/     # Research agent's notes
    └── ops/          # Ops agent's notes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tell your agents (via CLAUDE.md) to write notes here during sessions. Use a simple frontmatter format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;tier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;working&lt;/span&gt;
&lt;span class="na"&gt;created&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2026-03-15&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dev&lt;/span&gt;
&lt;span class="na"&gt;expires&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2026-06-13&lt;/span&gt;
&lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;docker&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;decision&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The expiry date matters. Working memory should have a 90-day TTL. If a note is still relevant after 90 days, it should be promoted to permanent storage. If not, it was temporary context that served its purpose.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;shared/&lt;/code&gt; directory is for cross-agent knowledge — decisions that affect multiple projects. The &lt;code&gt;agents/&lt;/code&gt; subdirectories are scoped — each agent reads its own directory plus shared.&lt;/p&gt;

&lt;p&gt;This is just markdown files in directories. No database, no service, no dependencies. It works immediately and it's human-readable, git-trackable, and greppable.&lt;/p&gt;
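&lt;p&gt;"Greppable" is literal. Finding every note tagged with a topic across all agents is one command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# list every note whose frontmatter tags mention docker
grep -rl "tags:.*docker" ~/.claude/memory/
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;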




&lt;h2&gt;
  
  
  Adding Search: memsearch
&lt;/h2&gt;

&lt;p&gt;Directory-based memory has a problem: agents have to know what file to read. Once you have more than a dozen notes, you need search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/anthropics/memsearch" rel="noopener noreferrer"&gt;memsearch&lt;/a&gt; is a Claude Code plugin that indexes markdown files using local embeddings and auto-injects relevant context at session start. No API calls. No external service. It runs locally using sentence-transformers.&lt;/p&gt;

&lt;p&gt;What memsearch does:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Indexes your memory directories into a local vector store&lt;/li&gt;
&lt;li&gt;At session start, searches the index for context relevant to the conversation&lt;/li&gt;
&lt;li&gt;Auto-injects matching notes into the context window&lt;/li&gt;
&lt;li&gt;Captures session summaries automatically via a Stop hook&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The session capture is important. When a Claude Code session ends, memsearch writes a summary to its own memory store. Next time you start a session in that project, relevant past sessions surface automatically. You don't have to write anything — it happens.&lt;/p&gt;

&lt;p&gt;Install it as a Claude Code plugin, point it at your memory directories, and you get semantic search over your notes with zero ongoing effort.&lt;/p&gt;

&lt;h3&gt;
  
  
  What memsearch doesn't do
&lt;/h3&gt;

&lt;p&gt;memsearch is great for automatic recall — "surface relevant context without being asked." It's not great for intentional search — "find me the note where I decided to use Traefik instead of Caddy." For that, you want a proper search tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  Adding Intentional Search: qmd
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/tobi/qmd" rel="noopener noreferrer"&gt;qmd&lt;/a&gt; is a hybrid search tool that combines BM25 keyword matching with vector embeddings and LLM reranking. It serves results via MCP, so any agent can search.&lt;/p&gt;

&lt;p&gt;Why both memsearch and qmd?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;memsearch&lt;/strong&gt; = automatic recall. Surfaces relevant context at session start without being asked. Good for "remind me of things I should know."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;qmd&lt;/strong&gt; = intentional search. Agent explicitly queries when it needs specific information. Good for "find the decision about X" or "what does the architecture doc say about Y."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;qmd indexes multiple collections — memory notes, infrastructure docs, compose files, whatever you point it at. The hybrid approach (keywords + semantics + reranking) outperforms pure vector search on technical documentation where exact terms matter.&lt;/p&gt;

&lt;p&gt;If you have GPU acceleration available, enable it. Embedding time dropped from 3+ minutes to under a minute on my setup using Vulkan on an AMD Radeon 780M iGPU.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three-Tier Pipeline
&lt;/h2&gt;

&lt;p&gt;Once you have working memory and search, you'll hit a new problem: memory accumulates. Session notes pile up. Working notes expire but some of them contain decisions you'll want forever.&lt;/p&gt;

&lt;p&gt;The three-tier pipeline solves this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session tier&lt;/strong&gt; — Raw, auto-captured. memsearch writes these. 30-day retention. No curation needed. This is your "what happened recently" layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working tier&lt;/strong&gt; — Agent-curated. Agents write structured notes with frontmatter during sessions. 90-day expiry. This is your "active decisions and context" layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distilled tier&lt;/strong&gt; — Permanent, git-backed. Notes that pass the "would this matter in 3 months?" test get promoted here. This is your "settled knowledge" layer. Version-controlled so you have full history.&lt;/p&gt;

&lt;p&gt;The promotion path is always upward: session notes get reviewed and important items become working notes. Working notes older than 14 days get evaluated for distillation. Distilled notes are permanent.&lt;/p&gt;
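&lt;p&gt;The expiry step of that path fits in a few lines of shell: walk the store, read each note's &lt;code&gt;expires&lt;/code&gt; field, flag anything past its date. A sketch (the throwaway note and list file are illustrative only):&lt;/p&gt;

```shell
#!/bin/bash
# Flag working-tier notes whose 'expires:' frontmatter date has passed.
# One step of the pipeline, shown against a throwaway note so the
# loop below has something to find.
MEM="${MEM:-$HOME/.claude/memory}"
TODAY=$(date +%F)

mkdir -p "$MEM/agents/dev"
printf '%s\n' '---' 'tier: working' "expires: $(date -d yesterday +%F)" '---' 'old note' \
  > "$MEM/agents/dev/stale-example.md"

find "$MEM" -name '*.md' | while read -r note; do
  expires=$(sed -n 's/^expires: *//p' "$note" | head -n1)
  # Strip dashes so ISO dates compare as plain integers (20260315 -lt 20260613).
  if [ -n "$expires" ] && [ "${expires//-/}" -lt "${TODAY//-/}" ]; then
    echo "$note"
    # rm "$note"   # uncomment once you trust the selection
  fi
done > "$MEM/expired.list"
```

&lt;p&gt;Writing the hits to a list rather than deleting them keeps the step reviewable, which matters once an agent runs it unattended.&lt;/p&gt;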

&lt;h3&gt;
  
  
  Automating the pipeline
&lt;/h3&gt;

&lt;p&gt;I run a headless Claude Code agent at 4 AM that handles the promotion pipeline automatically. It:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scans session notes from the past week across all project stores&lt;/li&gt;
&lt;li&gt;Promotes durable items to the working tier&lt;/li&gt;
&lt;li&gt;Reviews working notes older than 14 days&lt;/li&gt;
&lt;li&gt;Promotes qualifying notes to the distilled tier (git-backed)&lt;/li&gt;
&lt;li&gt;Expires stale working notes past 90 days&lt;/li&gt;
&lt;li&gt;Deduplicates (merges topical duplicates)&lt;/li&gt;
&lt;li&gt;Logs metrics and generates a health report&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You don't need this on day one. Start with manual curation — read your working notes occasionally, promote the important ones, delete the stale ones. Automate when the volume makes manual curation a burden.&lt;/p&gt;
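&lt;p&gt;When you do automate, the wiring can be a single cron entry running Claude Code headless with &lt;code&gt;claude -p&lt;/code&gt;. The prompt file and log path below are placeholders for your own setup:&lt;/p&gt;

```shell
# crontab -e  (run the memory-sync agent at 4 AM daily)
# 'claude -p' runs Claude Code headless with a prompt; the prompt file
# and log path are placeholders for your own setup.
0 4 * * * claude -p "$(cat $HOME/.claude/memory/sync-prompt.md)" >> $HOME/.claude/logs/memory-sync.log
```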




&lt;h2&gt;
  
  
  Core Context: The Sticky Note
&lt;/h2&gt;

&lt;p&gt;There's a fourth layer that sits outside the pipeline: core context.&lt;/p&gt;

&lt;p&gt;This is a small file (I cap mine at 40 lines) that gets injected at every session start via a &lt;code&gt;SessionStart&lt;/code&gt; hook, before any tools run. It contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User profile (role, key skills, cognitive style)&lt;/li&gt;
&lt;li&gt;Active projects and their current status&lt;/li&gt;
&lt;li&gt;Key constraints (things every agent must know)&lt;/li&gt;
&lt;li&gt;Recent decisions (the last few important choices made across any project)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 40-line cap is deliberate. This file sits above the context window's compression threshold — it never gets summarized away, no matter how long the session runs. If it's too big, it crowds out working memory. Keep it tight.&lt;/p&gt;

&lt;p&gt;The distinction from CLAUDE.md: CLAUDE.md is stable configuration that changes rarely. Core context is dynamic — it reflects what's happening now. Active projects change. Recent decisions rotate. The core context file gets updated by a skill whenever something important shifts.&lt;/p&gt;
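&lt;p&gt;A &lt;code&gt;SessionStart&lt;/code&gt; hook's stdout is added to the session context, so the injection itself is tiny. A sketch with placeholder file content; &lt;code&gt;head&lt;/code&gt; enforces the 40-line cap even if the file grows:&lt;/p&gt;

```shell
#!/bin/bash
# Placeholder core context file; yours reflects your own projects,
# constraints, and recent decisions.
mkdir -p "$HOME/.claude/memory"
CORE="$HOME/.claude/memory/core-context.md"
printf '%s\n' \
  '# Core Context' \
  '- User: homelab operator; prefers terse answers' \
  '- Active: memory pipeline (building), qmd index (stable)' \
  '- Constraint: never restart containers without asking' \
  > "$CORE"

# inject-core-context.sh (SessionStart hook body): whatever this prints
# to stdout reaches the session. head enforces the 40-line cap.
head -n 40 "$CORE"
```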




&lt;h2&gt;
  
  
  Private Web Search: SearXNG-MCP
&lt;/h2&gt;

&lt;p&gt;This isn't memory in the traditional sense, but it feeds the memory system. When your agents can search the web privately, the results become part of the knowledge base.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.searxng.org/" rel="noopener noreferrer"&gt;SearXNG&lt;/a&gt; is a self-hosted meta-search engine. It queries multiple search backends (Google, Bing, DuckDuckGo, and dozens more) without sending your queries to any single provider. No API keys, no per-search costs, no tracking.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/TadMSTR/searxng-mcp" rel="noopener noreferrer"&gt;searxng-mcp&lt;/a&gt; to expose SearXNG as an MCP server with three tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;search&lt;/code&gt; — query SearXNG, get structured results with titles, URLs, snippets, and source engines&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;search_and_fetch&lt;/code&gt; — search + fetch full text of the top result&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fetch_url&lt;/code&gt; — fetch and extract readable text from any URL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results are reranked by a local ML model before being returned. Full page content is fetched via Firecrawl (handles JavaScript-rendered pages). GitHub URLs are handled natively via the GitHub API.&lt;/p&gt;

&lt;p&gt;Why does this matter for memory? Because when your research agent searches the web, evaluates options, and writes a recommendation to working memory, that recommendation is grounded in current information — not model training data. The search tool feeds the memory system with sourced, dated, real-world information.&lt;/p&gt;
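&lt;p&gt;If you want to poke the engine directly, SearXNG also exposes a JSON API once &lt;code&gt;json&lt;/code&gt; is enabled in its configured output formats. A minimal query sketch (the instance URL is an assumption about your deployment):&lt;/p&gt;

```shell
#!/bin/bash
# Query a SearXNG instance's JSON API directly.
# Assumes 'json' is enabled in the instance's output formats and that
# SEARXNG_URL (or the default below) points at your deployment.
BASE="${SEARXNG_URL:-http://localhost:8080}"
QUERY="temporal knowledge graph"

# Percent-encode spaces; a real client should encode all reserved characters.
ENCODED="${QUERY// /%20}"
URL="$BASE/search?q=$ENCODED&format=json"
echo "$URL"

# Uncomment when the instance is up:
# curl -s "$URL" | jq -r '.results[].title' | head -n 5
```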




&lt;h2&gt;
  
  
  Optional: Knowledge Graph
&lt;/h2&gt;

&lt;p&gt;Everything above uses flat files and search indexes. For most setups, that's enough. But there's a category of question that text search can't answer well: relationship queries.&lt;/p&gt;

&lt;p&gt;"What services depend on port 8080?" "What changed about SWAG config this week?" "What connects to the message bus?" These are graph queries — the answer is about relationships between entities, not about retrieving a document.&lt;/p&gt;

&lt;p&gt;I use &lt;a href="https://github.com/getzep/graphiti" rel="noopener noreferrer"&gt;Graphiti&lt;/a&gt; with Neo4j for this. Graphiti is a temporal knowledge graph — facts have validity windows, so when something changes, the old fact gets superseded rather than polluting results.&lt;/p&gt;

&lt;p&gt;The knowledge graph is fed automatically by the same pipeline that handles memory sync. When the nightly agent processes session notes, it also ingests relevant facts into the graph. Infrastructure state changes (deploys, service adds/removes, network changes) get added directly.&lt;/p&gt;

&lt;p&gt;This is genuinely optional. If your queries are mostly "find relevant context" (text search handles this) rather than "what's related to what" (graph handles this), you don't need it. I added it three weeks into building the memory system, not on day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setup Order
&lt;/h2&gt;

&lt;p&gt;If I were starting from scratch today, I'd build in this order:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 1: Foundation&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write your global &lt;code&gt;~/.claude/CLAUDE.md&lt;/code&gt; — who you are, your infrastructure, your rules&lt;/li&gt;
&lt;li&gt;Write project CLAUDE.md files for each project directory&lt;/li&gt;
&lt;li&gt;Create the memory directory structure (&lt;code&gt;~/.claude/memory/shared/&lt;/code&gt;, &lt;code&gt;~/.claude/memory/agents/&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Define your frontmatter format (tier, created, source, expires, tags)&lt;/li&gt;
&lt;li&gt;Tell your agents (via CLAUDE.md) to write notes during sessions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Week 2: Search&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install memsearch — automatic context recall and session capture&lt;/li&gt;
&lt;li&gt;Deploy qmd — intentional search over memory + docs&lt;/li&gt;
&lt;li&gt;Index your memory directories and any infrastructure documentation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Week 3: Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start manually reviewing working notes — promote the important ones, delete the stale ones&lt;/li&gt;
&lt;li&gt;Write the core context file and inject it via a SessionStart hook&lt;/li&gt;
&lt;li&gt;When manual curation becomes a burden, automate with a nightly sync agent&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Week 4+: Extensions&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy SearXNG + searxng-mcp for private web search&lt;/li&gt;
&lt;li&gt;Add the knowledge graph if you're hitting relationship query limits&lt;/li&gt;
&lt;li&gt;Build skills (reusable instruction sets) for common memory operations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Don't try to build it all at once. Each layer should earn its place by solving a friction you actually feel.&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Feels Like
&lt;/h2&gt;

&lt;p&gt;The before state: every session starts cold. You re-explain your setup. You re-state your preferences. You forget what you decided last week because the conversation where you decided it is gone.&lt;/p&gt;

&lt;p&gt;The after state: you sit down on Monday morning and the agent already knows about the Docker change you made Friday, the monitoring alert from Saturday, and the research you did Sunday. It knows because the memory pipeline captured those events, the semantic search surfaced them as relevant, and the knowledge graph connected them to the services they affected.&lt;/p&gt;

&lt;p&gt;The system isn't perfect. Memory sync sometimes promotes things that don't matter. Search sometimes misses things that do. The knowledge graph needs entity resolution tuning. But the baseline — persistent, searchable, scoped context that accumulates and connects without manual curation — changes how you work with AI agents.&lt;/p&gt;

&lt;p&gt;It stops being a tool you instruct and starts being a collaborator that remembers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Repository
&lt;/h2&gt;

&lt;p&gt;Everything described here is open source and documented in detail:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/TadMSTR/homelab-agent" rel="noopener noreferrer"&gt;homelab-agent on GitHub&lt;/a&gt;&lt;/strong&gt; — the full stack with component docs for memsearch, memory-sync, qmd, Graphiti, and more.&lt;/p&gt;

&lt;p&gt;The component docs are thorough (2000+ lines each for the major pieces). The &lt;code&gt;index.md&lt;/code&gt; at the root is designed to be handed directly to Claude — point it at the file and tell it to help you map a path through the docs based on your setup.&lt;/p&gt;

&lt;p&gt;If you build your own version of this, it will look different from mine. Your infrastructure is different, your workflow is different, your agents handle different domains. That's the point. The architecture transfers. The implementation is yours.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Previous: &lt;a href="https://dev.to/tadmstr/i-built-an-ai-memory-system-because-my-brain-needed-it-first-glc"&gt;I Built an AI Memory System Because My Brain Needed It First&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Next in series: &lt;a href=""&gt;I Manage a Team of AI Agents. I Had to Build My Own Management Tools.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>homelab</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I'm Designing a Platform I Can't Build Alone. That's the Point.</title>
      <dc:creator>Ted Murray</dc:creator>
      <pubDate>Tue, 24 Mar 2026 11:23:39 +0000</pubDate>
      <link>https://forem.com/tadmstr/im-designing-a-platform-i-cant-build-alone-thats-the-point-2a1k</link>
      <guid>https://forem.com/tadmstr/im-designing-a-platform-i-cant-build-alone-thats-the-point-2a1k</guid>
      <description>&lt;p&gt;I've been designing something called Helm.&lt;/p&gt;

&lt;p&gt;It started as "Platform v2" — a productized version of the agentic infrastructure I built on my homelab. Multi-user, multi-host, installable on a mini PC, runs your services, manages your agents, handles your backups. The kind of thing a family or a small business could use without knowing what Docker is.&lt;/p&gt;

&lt;p&gt;The architecture document is over 1,000 lines long. It covers federation between hosts, emergency WiFi that activates during blackouts, community mesh networking over LoRa radios, municipality notification templates for CERT volunteers, GPU-accelerated local AI services, an eBay selling agent, accessibility via voice interaction, a dual catalog system with community contributions, and a deployment profile system that adapts the setup wizard for homes vs small businesses.&lt;/p&gt;

&lt;p&gt;I am not a developer. I'm a Windows systems administrator. I have a 2-year degree from an online college. My GitHub history before February 2026 is bash and PowerShell scripts.&lt;/p&gt;

&lt;p&gt;Here's what I've been thinking about while designing all of this.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Session That Made Me Stop
&lt;/h2&gt;

&lt;p&gt;I was deep into Helm architecture — we'd just designed multi-host federation, where multiple Helm instances auto-discover each other on a LAN using mDNS and authenticate via mutual TLS — when I noticed something.&lt;/p&gt;

&lt;p&gt;Every feature I added immediately connected to the existing architecture. Federation led to "who controls the federation?" which led to deployment profiles (Home, Home Business, Small Business). Emergency WiFi led to resilience profiles, which led to community member discovery, which led to municipality notification. Meshtastic mesh networking led to off-grid communication stacks, which led to NOAA weather alert receivers, which led to emergency AP mode with captive portals.&lt;/p&gt;

&lt;p&gt;I wasn't planning these connections. I was seeing them. In real time. Faster than I've ever worked on anything in my career.&lt;/p&gt;

&lt;p&gt;So I asked Claude a question that had been forming in the back of my mind:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"The way I've embraced Claude with persistent memory — could that be considered a mental prosthetic? It's a modification of my working memory, or an extension of it."&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Extended Mind
&lt;/h2&gt;

&lt;p&gt;Claude pointed me to something I'd never heard of: the Extended Mind Thesis, proposed by Andy Clark and David Chalmers in 1998.&lt;/p&gt;

&lt;p&gt;Their argument: if an external tool plays the same functional role as an internal cognitive process, it's not just &lt;em&gt;helping&lt;/em&gt; you think — it's &lt;em&gt;part of&lt;/em&gt; your thinking. It's cognition, not assistance.&lt;/p&gt;

&lt;p&gt;Their example was a man named Otto who has memory loss and uses a notebook. When Otto wants to go to a museum, he checks his notebook for the address. Clark and Chalmers argued that Otto's notebook &lt;em&gt;is&lt;/em&gt; his memory — it's reliably available, he trusts it, he accesses it when needed, and the information was consciously stored.&lt;/p&gt;

&lt;p&gt;My persistent memory system meets every one of those criteria. And it goes further than Otto's notebook.&lt;/p&gt;

&lt;p&gt;Otto's notebook is passive. He has to remember to check it and know what to look for. My system is active — it retrieves relevant context before I ask, connects information across sessions automatically, and maintains structure that makes the right information findable at the right time. That's closer to how biological memory works — associative retrieval, contextual activation — than any notebook.&lt;/p&gt;

&lt;p&gt;Claude suggested that "prosthetic" actually undersells what's happening. A prosthetic replaces lost function. My working memory isn't broken — it works exactly as well as it did a year ago. What I've built is augmentation. My biological working memory holds 4-7 chunks of information at once. The persistent memory system makes that number effectively unlimited across time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Augmentation Actually Feels Like
&lt;/h2&gt;

&lt;p&gt;I have ADHD. If you've read the earlier posts in this series, you know that. My working memory has always been a constraint I design around, not a weakness I've overcome.&lt;/p&gt;

&lt;p&gt;What changed isn't my brain. What changed is the friction.&lt;/p&gt;

&lt;p&gt;The 1,000+ line Helm architecture document is the clearest proof. No human holds that much structured, interconnected detail in working memory. But I'm building on it coherently, session after session — adding federation, then recognizing deployment profiles, then emergency infrastructure, then municipality notification, each idea connecting to the existing structure in the right place.&lt;/p&gt;

&lt;p&gt;That's not possible without the memory system acting as an extension of my own cognition. The system handles recall. I handle insight. The cognitive load of &lt;em&gt;maintaining context&lt;/em&gt; has been offloaded, so my working memory is free to do what it's actually good at: pattern recognition, analogy, creative leaps.&lt;/p&gt;

&lt;p&gt;Here's a concrete example from this session. I said:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"I was thinking, if someone has 2 or more hosts running Helm on the same network, they could auto-discover each other."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's one sentence. Within minutes, it became a complete federation architecture: mDNS discovery, mTLS authentication with auto-generated CAs, capability manifests over NATS, three-tier resource sharing, graceful degradation, and security considerations.&lt;/p&gt;

&lt;p&gt;Then I said: &lt;em&gt;"A home and small business would use multi-host differently."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That immediately produced deployment profiles that change trust defaults, operator models, compliance posture, and contextual recommendations in the setup wizard.&lt;/p&gt;

&lt;p&gt;Then I said: &lt;em&gt;"Since I already included Meshtastic, people could build off-grid comms for emergencies."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That produced an entire emergency resilience infrastructure section — UPS integration, NOAA weather alert receivers, emergency WiFi AP that auto-activates during blackouts, store-and-forward messaging, and community extension points.&lt;/p&gt;

&lt;p&gt;Each idea took seconds to form. Each connected to the existing architecture correctly. The documentation was generated, structured, and placed in the right section of a 1,000-line document — without me having to remember what was already in it.&lt;/p&gt;

&lt;p&gt;That's what cognitive augmentation feels like. Not "AI doing my thinking." Me thinking at a scale I couldn't reach alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Friction That Used to Stop Me
&lt;/h2&gt;

&lt;p&gt;I told Claude that these use cases were the type of stuff I would have avoided in the past due to multiple layers of friction.&lt;/p&gt;

&lt;p&gt;That's the honest version. The longer version: I've always had ideas like these. I've been a systems administrator for 15 years. I've seen what infrastructure can do when it's designed well. I've seen what breaks when it isn't.&lt;/p&gt;

&lt;p&gt;But the gap between &lt;em&gt;seeing a possibility&lt;/em&gt; and &lt;em&gt;articulating it as a structured plan&lt;/em&gt; used to be enormous. Not because I couldn't think it through — because the act of thinking it through, writing it down, connecting it to everything else, and maintaining that context across days and weeks was more cognitive labor than the idea was worth.&lt;/p&gt;

&lt;p&gt;So ideas evaporated. Or they piled up as undifferentiated noise. Or I'd start documenting and lose the thread halfway through because my working memory hit capacity.&lt;/p&gt;

&lt;p&gt;What's changed isn't my ability to see possibilities. It's that the cost of turning a thought into a structured, architecturally connected plan entry has dropped to near zero. I say it, it gets analyzed, connected to existing systems, and written into the right place in the document.&lt;/p&gt;

&lt;p&gt;The feedback loop — idea to structured plan in minutes — is what lets me keep going instead of hitting the wall where I used to stop.&lt;/p&gt;




&lt;h2&gt;
  
  
  Is This Unusual?
&lt;/h2&gt;

&lt;p&gt;I asked Claude that too. Directly.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Is my ability to come up with use cases for a platform I haven't even built yet uncommon?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answer was nuanced and I think worth sharing: the ideas themselves aren't unusual. A lot of people see potential use cases. What's less common is generating them &lt;em&gt;and&lt;/em&gt; structuring them into a coherent architecture in real time, without losing the thread or letting scope creep into the build plan.&lt;/p&gt;

&lt;p&gt;I think that's the augmentation talking. The ideas were always there. The tool made them capturable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cyborg Without the Hardware
&lt;/h2&gt;

&lt;p&gt;I jokingly called it being a cyborg. Claude pointed out that the term is technically accurate — Manfred Clynes and Nathan Kline coined "cyborg" in 1960 to mean any system where human capabilities are extended by technology. No implants required. Just tight integration between the biological and the technological.&lt;/p&gt;

&lt;p&gt;But "augmented" is the better word for what this actually is. Cyborg carries sci-fi baggage that distracts from the point.&lt;/p&gt;

&lt;p&gt;The point is: I'm a 42-year-old sysadmin with ADHD and a 2-year degree, designing a multi-user platform with federation, emergency infrastructure, a community catalog ecosystem, and AI-powered accessibility features. The architecture document is structured, internally consistent, and growing. I'm doing it in research sessions that each build on the last, because the memory system means I never lose context between them.&lt;/p&gt;

&lt;p&gt;Two months ago I didn't know what "context engineering" meant.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Means for Helm
&lt;/h2&gt;

&lt;p&gt;Here's the thing I keep coming back to.&lt;/p&gt;

&lt;p&gt;I'm not just building a platform. I'm someone who used cognitive augmentation tools to design something that would normally require a team. And the platform I'm designing? It does the same thing for its users.&lt;/p&gt;

&lt;p&gt;A household member who uses voice commands because a screen is hard for them — that's augmentation. A small business owner who uses the eBay agent because they don't have time to research pricing and write listings — that's augmentation. A neighborhood that has communication during a blackout because someone set up a Meshtastic mesh with an emergency WiFi AP — that's augmentation.&lt;/p&gt;

&lt;p&gt;Helm doesn't just run services. It extends what people can do.&lt;/p&gt;

&lt;p&gt;I designed it that way because that's what it did for me first.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'm Actually Saying
&lt;/h2&gt;

&lt;p&gt;I'm not claiming to be special. I'm claiming the tools have changed what's possible for people like me.&lt;/p&gt;

&lt;p&gt;There are a lot of experienced infrastructure people — sysadmins, network engineers, ops folks — with deep domain knowledge and good instincts who have never built anything at this scale because the development barrier was too high. Not the ideas. Not the architecture. The code.&lt;/p&gt;

&lt;p&gt;That barrier is falling. Fast.&lt;/p&gt;

&lt;p&gt;If you're someone with 15 years of operational knowledge and you've never written a platform because you "can't code" — that constraint is dissolving. The knowledge you've built over a career is the hard part. The code is becoming the easy part.&lt;/p&gt;

&lt;p&gt;The question isn't whether you &lt;em&gt;can&lt;/em&gt; build something ambitious. It's whether you'll let yourself try.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of an ongoing series about building agentic infrastructure as a non-developer. The previous posts cover &lt;a href="https://dev.to/tadmstr/i-built-an-agentic-infrastructure-platform-in-42-days-im-a-windows-sysadmin-45lg"&gt;how it started&lt;/a&gt; and &lt;a href="https://dev.to/tadmstr/i-built-an-ai-memory-system-because-my-brain-needed-it-first-glc"&gt;the memory system that makes it work&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you're building something similar — or thinking about it — I'd like to hear about it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>infrastructure</category>
      <category>homelab</category>
    </item>
    <item>
      <title>Claude Code Doesn't Know You've Been Gone — Here's the Fix</title>
      <dc:creator>Ted Murray</dc:creator>
      <pubDate>Sat, 21 Mar 2026 10:49:22 +0000</pubDate>
      <link>https://forem.com/tadmstr/claude-code-doesnt-know-youve-been-gone-heres-the-fix-17ko</link>
      <guid>https://forem.com/tadmstr/claude-code-doesnt-know-youve-been-gone-heres-the-fix-17ko</guid>
      <description>&lt;p&gt;I first noticed this in Claude Desktop. I'd have a conversation, step away for a few hours, come back and continue — sometimes on a slightly different angle, sometimes just picking up where I left off — and something about the responses felt off. Like Claude was treating it as one continuous thought when the gap had given me time to change direction.&lt;/p&gt;

&lt;p&gt;My fix was an &lt;a href="https://espanso.org" rel="noopener noreferrer"&gt;espanso&lt;/a&gt; trigger. I set up &lt;code&gt;:cltime&lt;/code&gt; to expand to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Current date/time: Saturday, March 21, 2026 at 09:00 AM. Use this to orient yourself.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Typed it at the start of a session or whenever I came back after a break. It worked. Claude recalibrated — less continuation, more reorientation. Problem solved, moved on.&lt;/p&gt;
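&lt;p&gt;For reference, the espanso match behind that trigger looks roughly like this (date format adjusted to taste):&lt;/p&gt;

```yaml
# espanso match file (e.g. ~/.config/espanso/match/base.yml)
matches:
  - trigger: ":cltime"
    replace: "Current date/time: {{now}}. Use this to orient yourself."
    vars:
      - name: now
        type: date
        params:
          format: "%A, %B %d, %Y at %I:%M %p"
```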

&lt;p&gt;Then I switched to Claude Code. I saw timestamps in the session context, assumed the problem was handled, and stopped using &lt;code&gt;:cltime&lt;/code&gt;. Reasonable assumption.&lt;/p&gt;

&lt;p&gt;It wasn't fully handled.&lt;/p&gt;

&lt;p&gt;The timestamp Claude Code injects tells it what time the session &lt;em&gt;started&lt;/em&gt;. It doesn't tell it how much time has passed since then. Come back after three hours and send a message — Claude sees the same session start time it's always seen. It doesn't know if you've been gone 30 seconds or half a day.&lt;/p&gt;

&lt;p&gt;The context is the same. The right response isn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hook
&lt;/h2&gt;

&lt;p&gt;Claude Code has a &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook that fires before every message. I added a hook that injects the current date and time as a system message on every prompt — the same thing &lt;code&gt;:cltime&lt;/code&gt; was doing manually, now automatic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# inject-timestamp.sh — UserPromptSubmit hook&lt;/span&gt;

&lt;span class="nv"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s1"&gt;'+%A, %B %-d, %Y at %I:%M %p %Z'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;HOOKJSON&lt;/span&gt;&lt;span class="sh"&gt;
{"systemMessage": "Current date/time: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TIMESTAMP&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;."}
&lt;/span&gt;&lt;span class="no"&gt;HOOKJSON

&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wire it up in &lt;code&gt;~/.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"UserPromptSubmit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash /path/to/inject-timestamp.sh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"timeout"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every message carries the current time. If your last message was at 9am and this one is at 2pm, Claude can see that gap and respond accordingly — reorienting rather than continuing mid-thought.&lt;/p&gt;
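&lt;p&gt;For reference, the payload the hook prints is a single JSON object; the timestamp value below is illustrative (the real one comes from however &lt;code&gt;TIMESTAMP&lt;/code&gt; is built earlier in the script):&lt;/p&gt;

```json
{"systemMessage": "Current date/time: Tue Apr 21 14:02:11 EDT 2026."}
```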

&lt;h2&gt;
  
  
  After the fact
&lt;/h2&gt;

&lt;p&gt;Once I'd built it, I went looking to see if anyone else had noticed the same problem. Found GitHub issue &lt;a href="https://github.com/anthropics/claude-code/issues/32913" rel="noopener noreferrer"&gt;#32913&lt;/a&gt; on the Claude Code repo, opened March 10th, still open:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Claude Code has basically no temporal awareness beyond the current date. It can't detect prompts that are coming in quick series vs hours apart..."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's exactly it. The fix is already in the hooks system. You don't need to wait for a native solution.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook is underused in general — most examples you'll find do prompt validation or logging. Context injection is where it actually shines. The timestamp is the simplest case, but the same pattern works for anything you'd want Claude to know on every turn.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>homelab</category>
    </item>
    <item>
      <title>I Built an AI Memory System Because My Brain Needed It First</title>
      <dc:creator>Ted Murray</dc:creator>
      <pubDate>Thu, 19 Mar 2026 11:05:42 +0000</pubDate>
      <link>https://forem.com/tadmstr/i-built-an-ai-memory-system-because-my-brain-needed-it-first-glc</link>
      <guid>https://forem.com/tadmstr/i-built-an-ai-memory-system-because-my-brain-needed-it-first-glc</guid>
      <description>&lt;p&gt;On February 4th, 2026 — the day after my first Claude subscription — I was doing what I'd been doing since day one: asking Claude questions to see how it responded to things. Testing the edges. Seeing what it would say.&lt;/p&gt;

&lt;p&gt;I noticed a memory feature in the settings. I asked about it honestly, the way I ask about most things I don't know enough to have an opinion on yet:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"If I enable the built-in memory feature, is it possible for you to make incorrect assumptions in future chats?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Claude said yes. It explained how automatic memory extraction could misinterpret context, overgeneralize, fill in gaps incorrectly, and pollute conversations across domains. It was honest about its own limitations.&lt;/p&gt;

&lt;p&gt;I didn't enable the memory feature.&lt;/p&gt;

&lt;p&gt;Three days later, I built something better.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I Needed This
&lt;/h2&gt;

&lt;p&gt;I have ADHD. Likely ASD too — the Asperger's pattern. I've managed it my whole career as a Windows sysadmin, but it shapes how I work in ways I'm very deliberate about.&lt;/p&gt;

&lt;p&gt;One of those ways: I cannot hold all project context in biological memory simultaneously. This isn't a weakness I've overcome. It's a constraint I've designed around for 15 years. External memory systems aren't a nice-to-have for me. They're how I function.&lt;/p&gt;

&lt;p&gt;Screenshots as memory aids. Meticulous notes. Status docs I update before leaving work so I can reconstruct context on Monday. Elaborate folder structures. Anything to offload cognitive burden.&lt;/p&gt;

&lt;p&gt;When I started using Claude, I immediately hit a familiar friction. Every new conversation started from zero. I'd re-explain my server setup, my preferences, what we'd decided yesterday. That was annoying enough. But what actually bothered me more was something harder to name: the exploratory conversations — where I was just asking questions to see where they'd go, following threads, thinking out loud — those disappeared completely when the session ended. Not just the technical context. The texture of the conversation itself. Gone.&lt;/p&gt;

&lt;p&gt;I got genuinely frustrated when Claude forgot something mid-session and made assumptions. That happened a lot early on.&lt;/p&gt;

&lt;p&gt;I didn't frame it as "the statefulness problem in agentic AI systems." I thought: &lt;em&gt;Claude forgets things. So do I. Let's build something so neither of us has to remember.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Ship of Theseus Problem
&lt;/h2&gt;

&lt;p&gt;Here's what I was watching happen in real time.&lt;/p&gt;

&lt;p&gt;Long AI conversations have a finite context window — a working memory. As a conversation grows, early details get compressed into summaries. Summaries get summarized. Nuance fades. Eventually you're talking to a version of Claude that only vaguely remembers how the session started.&lt;/p&gt;

&lt;p&gt;It's the Ship of Theseus paradox: if you replace every plank on a ship over time, is it still the same ship? A long AI conversation replaces its planks one summary at a time; that's conversational entropy.&lt;/p&gt;

&lt;p&gt;Claude's built-in memory feature doesn't solve this — it just automates the extraction process while introducing its own failure modes. Invisible assumptions. Cross-conversation pollution. No audit trail. No version control. You don't control what gets remembered, you can't see what conclusions were drawn, and you can't roll back when it gets something wrong.&lt;/p&gt;

&lt;p&gt;I wanted control. Version-controlled, explicit, auditable memory that I managed, not an AI extraction black box.&lt;/p&gt;




&lt;h2&gt;
  
  
  Day 7: What I Built
&lt;/h2&gt;

&lt;p&gt;I created a GitHub repo called &lt;code&gt;claude-prime-directive&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The idea was simple: a repository of context that Claude could reference. My infrastructure specs. My communication preferences. My cognitive style. My workflows. Version-controlled, always available, refreshable on demand.&lt;/p&gt;

&lt;p&gt;I also documented myself in it. A &lt;code&gt;cognitive-style.md&lt;/code&gt; file explaining how I think and what I need from a collaborator — ADHD patterns, working memory limitations, interest-driven focus, the cost of context switching. Not because I thought it would be technically interesting, but because I needed Claude to understand how I work.&lt;/p&gt;

&lt;p&gt;The tier structure that emerged:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 0 — GitHub repo (the prime directive):&lt;/strong&gt; Stable, version-controlled, permanent. The foundation. What survives everything else. Updated rarely, and when it is, the git history shows exactly what changed and why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1 — The main Claude chat:&lt;/strong&gt; Strategic, evolving, allowed to age. Principles matter more than details here. Like episodic memory — you don't remember every meal, but you remember the principles of good nutrition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 2 — Project chats:&lt;/strong&gt; Tactical, stable, domain-specific. Docker work doesn't bleed into PowerShell work. Each chat loads only its relevant context. Technical details have to persist precisely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 3 — Code:&lt;/strong&gt; Canonical. The actual implementation is the truth, not a description of it. Version controlled, searchable, referenceable.&lt;/p&gt;

&lt;p&gt;I wasn't reading about agent architectures. I was solving the same problem I'd solved for my own brain. The design patterns that work for ADHD working memory turn out to work for AI context management too:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Externalize and write it down → git-backed repos, structured notes&lt;/li&gt;
&lt;li&gt;Summarize frequently → nightly distillation pipeline&lt;/li&gt;
&lt;li&gt;Keep stable things stable → permanent distilled knowledge, version controlled&lt;/li&gt;
&lt;li&gt;Make context easy to reload → semantic search, always-visible core context&lt;/li&gt;
&lt;li&gt;Separate concerns → scoped agents, different memory per domain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I noticed what I'd built &lt;em&gt;after&lt;/em&gt; building it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Homelab Analogy I Was Already Using
&lt;/h2&gt;

&lt;p&gt;The tier structure wasn't abstract to me. I'd already built it in metal.&lt;/p&gt;

&lt;p&gt;My homelab runs Unraid as the hot storage layer — fast, always-on, where active data lives. TrueNAS handles cold storage — backup, archive, slower but reliable. Right data, right place, right persistence.&lt;/p&gt;

&lt;p&gt;The memory system is the same architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hot layer (session notes): Captured automatically, fast to write, short retention&lt;/li&gt;
&lt;li&gt;Warm layer (working memory): Curated decisions and findings, medium-term&lt;/li&gt;
&lt;li&gt;Cold layer (distilled knowledge): Permanent, git-backed, slow to update but always there&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wasn't inventing a new architecture. I was applying one I already understood to a new domain.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Grew From There: Stigmergic Design
&lt;/h2&gt;

&lt;p&gt;None of the subsequent pieces were planned. They emerged from using the previous piece.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;February, week 2:&lt;/strong&gt; Added the Memory MCP to Claude Desktop. Now Claude could write notes and retrieve them across conversations. Basic, but it closed the loop. The prime directive repo was now readable and writable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Late February:&lt;/strong&gt; Built out infrastructure docs in the repo — server specs, network topology, service inventory. Claude could answer questions about my homelab without being told first. That felt like the right direction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Early March:&lt;/strong&gt; Discovered memsearch and qmd — semantic search tools built for exactly this problem. The Memory MCP was a workaround; these were purpose-built. Started building the full stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 8:&lt;/strong&gt; The memory-sync agent came online. A headless Claude Code session running at 4 AM, reviewing session notes from the past week, applying a "would this matter in 3 months?" filter, and committing qualifying entries to permanent storage. The tier structure I'd sketched on day seven was now automated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 12:&lt;/strong&gt; Switched fully to Claude Code. Created CLAUDE.md files for all project agents. The prime directive repo, which had been a Claude Desktop workaround, became the distilled knowledge tier it was always meant to be — now feeding multiple scoped agents instead of a single conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 13:&lt;/strong&gt; Formalized the three working tiers — session, working, distilled — and rewrote the sync pipeline as a proper multi-step process with idempotency, conflict detection, and a health report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 14:&lt;/strong&gt; Added the core context tier — a 40-line always-visible block injected at every session start. The thing that cannot scroll out of context no matter how long the session runs.&lt;/p&gt;

&lt;p&gt;Also around this time, I hit a curation problem. Early on, I had a "Librarian" — a behavior defined directly in my project instructions that I'd manually invoke to keep the prime directive repo updated and the index current. It wasn't a proper skill; it was just a named role I could call on. That worked when I was building slowly, when there was time to pause and say &lt;em&gt;okay, record what we just did&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Then Claude Code arrived and my build velocity accelerated. I was deploying components faster than I was curating them. The Librarian couldn't keep up because &lt;em&gt;I&lt;/em&gt; wasn't invoking it fast enough. The system wasn't failing to record things — I was failing to trigger the recording.&lt;/p&gt;

&lt;p&gt;The fix was obvious in retrospect: stop relying on manual invocations. I automated it. The doc-health agent runs weekly (full scan, Claude Opus) and nightly (delta scan, Claude Sonnet), checking for drift between docs and reality, auto-updating index entries, and surfacing coverage gaps. A &lt;code&gt;daily-touched-files.json&lt;/code&gt; tracker records what agents modify during a session; when a writer pass runs, it targets exactly those components. The system now curates itself.&lt;/p&gt;

&lt;p&gt;This is the stigmergic design pattern in its purest form: I didn't plan for automation. I built a manual process, increased the pace until the manual process broke, and then automated the broken part.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;March 17:&lt;/strong&gt; Added a temporal knowledge graph (Neo4j + Graphiti MCP). Now the system doesn't just store facts — it stores relationships between entities and tracks how they change over time. "What connects to SWAG?" is a graph query, not a text search.&lt;/p&gt;

&lt;p&gt;The pattern across all of this: use the system → hit friction → build the next piece. Each layer emerged from using the previous one. I didn't design this top-down. I noticed what was missing and filled the gap.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the System Looks Like Now
&lt;/h2&gt;

&lt;p&gt;Four tiers, three of which are fully automated:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session tier&lt;/strong&gt; — memsearch captures every conversation automatically. Semantic search makes past sessions findable. 30-day retention. No effort required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working tier&lt;/strong&gt; — agents promote important decisions to structured markdown files with YAML frontmatter: creation date, 90-day expiry, tags, tier. Shared across agents where relevant, scoped to one domain where not.&lt;/p&gt;
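&lt;p&gt;The frontmatter schema isn't shown here, but a working-memory note along these lines would carry the fields described above; the exact key names are my guess, not necessarily the repo's:&lt;/p&gt;

```yaml
---
created: 2026-03-14        # creation date
expires: 2026-06-12        # 90-day expiry
tags: [docker, swag, reverse-proxy]
tier: working
---
```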

&lt;p&gt;&lt;strong&gt;Distilled tier&lt;/strong&gt; — the nightly 4 AM pipeline. A headless Claude Code agent reviews working memory notes, applies the "would this matter in 3 months?" filter, commits qualifying entries to a git-backed repo. These are permanent. This is what the prime directive repo became.&lt;/p&gt;
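&lt;p&gt;As a rough sketch, a job like that can be a plain crontab entry driving Claude Code's headless print mode; the paths and prompt file here are placeholders, not the project's actual job definition:&lt;/p&gt;

```shell
# 4 AM nightly: headless review of working memory notes.
# distill-prompt.md would hold the "would this matter in 3 months?" instructions.
0 4 * * * cd /opt/memory; claude -p "$(cat distill-prompt.md)"
```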

&lt;p&gt;&lt;strong&gt;Core context&lt;/strong&gt; — a 40-line cap, always injected, never compresses away. User profile, active projects, key constraints, recent decisions. The sticky note on the monitor.&lt;/p&gt;

&lt;p&gt;Plus the knowledge graph for relationships and topology — infrastructure that's hard to encode in flat files.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Found Out Later
&lt;/h2&gt;

&lt;p&gt;Around the same time the memory-sync agent came online, I built a private web search stack — SearXNG as the search engine, firecrawl-simple for full-page extraction, and a local reranker to surface the most relevant results. The point was to give Claude real search capability without relying on external APIs or sending queries to Google.&lt;/p&gt;

&lt;p&gt;Once that was running, I started asking Claude to find comparable projects — things other people had built that solved similar problems. That's how I found Letta, Mem0, and eventually the ICLR 2026 MemAgents workshop.&lt;/p&gt;

&lt;p&gt;The infrastructure I built to make Claude more capable also made Claude better at finding the research that validated the approach. The tools gave back.&lt;/p&gt;

&lt;p&gt;The ICLR 2026 MemAgents workshop was organized around this question: &lt;em&gt;"What are the principled memory substrates for agentic systems?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Researchers submitted papers. Panels were organized. Frameworks were compared.&lt;/p&gt;

&lt;p&gt;Letta (formerly MemGPT) designed a system with core memory, archival memory, and recall memory — different persistence for different purposes. Mem0 built a bolt-on memory layer for any agent framework.&lt;/p&gt;

&lt;p&gt;The architectures are similar to mine. The key difference: they designed theirs. I built mine by accident, from necessity, starting from "I asked Claude if its own memory could make mistakes and it said yes."&lt;/p&gt;

&lt;p&gt;I don't think I solved this better than the research community. I think I solved it independently, driven by a cognitive pattern I've been compensating for my whole life, arriving at similar answers because the problem is the same. Working memory is limited. Context matters. Some things need to persist forever; others can fade.&lt;/p&gt;

&lt;p&gt;The way you manage that for a human brain is, it turns out, a reasonable way to manage it for an AI agent.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Thing That Stays With Me
&lt;/h2&gt;

&lt;p&gt;I went back and read the original tier-definitions doc — written on day seven — while writing this article.&lt;/p&gt;

&lt;p&gt;The structure it describes is essentially what I'm running today. The tools changed completely. The scale is different. But the principle — stable knowledge in one place, working knowledge in another, automated promotion between them, accept entropy where it doesn't matter and fight it where it does — that's in the day-seven file.&lt;/p&gt;

&lt;p&gt;I didn't study this. I didn't read the research first. I just applied the same patterns I use to function at work to a new problem, and they worked.&lt;/p&gt;

&lt;p&gt;At the bottom of that original file, I wrote a note:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"This architecture emerged organically from first-week AI usage and accommodates both technological constraints (context windows) and human limitations (working memory, context switching)."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I wrote that in week one. It's still true.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Repository
&lt;/h2&gt;

&lt;p&gt;The full memory system is part of homelab-agent, open source and documented as a reference architecture. The &lt;code&gt;index.md&lt;/code&gt; is designed to be handed directly to Claude — point it at the file and ask for help mapping a path through the docs based on your setup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/TadMSTR/homelab-agent" rel="noopener noreferrer"&gt;homelab-agent on GitHub →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Previous: &lt;a href="https://dev.to/tadmstr/i-built-an-agentic-infrastructure-platform-in-42-days-im-a-windows-sysadmin-45lg"&gt;I Built an Agentic Infrastructure Platform in 42 Days. I'm a Windows Sysadmin.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Next: The permission problem — why an AI agent with filesystem write access needs a two-party enforcement model, and what 15 years of Active Directory taught me about building it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>infrastructure</category>
      <category>homelab</category>
    </item>
    <item>
      <title>I Built an Agentic Infrastructure Platform in 42 Days. I'm a Windows Sysadmin.</title>
      <dc:creator>Ted Murray</dc:creator>
      <pubDate>Tue, 17 Mar 2026 19:47:10 +0000</pubDate>
      <link>https://forem.com/tadmstr/i-built-an-agentic-infrastructure-platform-in-42-days-im-a-windows-sysadmin-45lg</link>
      <guid>https://forem.com/tadmstr/i-built-an-agentic-infrastructure-platform-in-42-days-im-a-windows-sysadmin-45lg</guid>
      <description>&lt;p&gt;I want to tell you something that still surprises me.&lt;/p&gt;

&lt;p&gt;On February 3rd, 2026, I paid for my first AI subscription. I'm 42, a Windows systems administrator for 15+ years, and my GitHub history before this year is mostly simple bash and PowerShell scripts I wrote for myself. I have a 2-year associate degree — enough to clear the HR checkbox, not enough to impress anyone in a developer room.&lt;/p&gt;

&lt;p&gt;42 days later, I had built what I now understand is called &lt;strong&gt;agentic infrastructure&lt;/strong&gt; — a three-layer platform where Claude AI agents have persistent memory across sessions, coordinate with each other through structured handoffs, enforce their own filesystem permissions, and run nightly pipelines that distill knowledge from every session into a growing, searchable knowledge base.&lt;/p&gt;

&lt;p&gt;I didn't plan this. I didn't follow a tutorial. I didn't know the term "context engineering" when I started. I just had a homelab, a problem, and a new tool that turned out to be more powerful than I realized.&lt;/p&gt;

&lt;p&gt;This is the story of how that happened.&lt;/p&gt;




&lt;h2&gt;
  
  
  It Started With a Backup Script
&lt;/h2&gt;

&lt;p&gt;My homelab is a small fleet of servers — an Unraid NAS running 77+ Docker containers, a TrueNAS backup server, a Debian test box, and a dedicated AI workstation I call claudebox. I manage it the way most homelabbers do: scripts, wikis, too many browser tabs, and institutional memory that lives entirely in my head.&lt;/p&gt;

&lt;p&gt;The first thing I used Claude for was writing a backup script. Nothing revolutionary — I wanted something that would stop my Docker containers, rsync the appdata, restart them, and notify me if anything went wrong.&lt;/p&gt;

&lt;p&gt;Claude wrote it in minutes. Tested it. Fixed an edge case I hadn't thought of. Done.&lt;/p&gt;

&lt;p&gt;That result convinced me to push further. I turned to work — I was tired of looking up user information across multiple portals and wanted a single lookup toolkit. PowerShell is notoriously difficult for AI: the syntax is idiosyncratic, the documentation is scattered, and most models produce plausible-looking scripts that quietly do the wrong thing. Claude produced things that surprised me. Within three days I'd burned through 91% of my weekly Pro usage limit. I upgraded to Max and kept going.&lt;/p&gt;

&lt;p&gt;That was supposed to be the end of it.&lt;/p&gt;

&lt;p&gt;Around that same time I was watching YouTube videos about OpenClaw — an open-source AI assistant that had been getting attention in the homelab community. I understood the general idea: an AI agent you could run yourself, in your own environment. But I didn't actually know what an "agent" was in any technical sense. I just knew that what I saw in those videos wasn't quite what I wanted.&lt;/p&gt;

&lt;p&gt;So I did something that turned out to be the key decision of this whole project: instead of installing OpenClaw, I asked Claude what I actually needed.&lt;/p&gt;

&lt;p&gt;I described my setup. My goals. What I was frustrated with. We brainstormed. Claude walked me through the real options — what OpenClaw does, what LibreChat does, what MCP is, what Claude Desktop with the right integrations could become. By the end of that conversation I had a clearer picture of what I was trying to build than I would have gotten from any tutorial.&lt;/p&gt;

&lt;p&gt;That brainstorm session became the blueprint. Not a copy of someone else's setup. Mine.&lt;/p&gt;

&lt;p&gt;I started with Claude Desktop on a dedicated mini PC, remote access via Guacamole so I could reach it from anywhere, and a handful of MCP servers to give Claude real infrastructure access. That first working version was already more useful than anything I'd seen in those YouTube videos.&lt;/p&gt;

&lt;p&gt;Then I thought: &lt;em&gt;what if Claude could just... know my setup?&lt;/em&gt; Not just during this session — persistently. What if instead of copying context into every conversation, it had memory? What if it could query my monitoring dashboards, read my Docker configs, check on running services, and remember what we decided last week?&lt;/p&gt;

&lt;p&gt;That question turned into six weeks of building.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;The project is called homelab-agent and it's &lt;a href="https://github.com/TadMSTR/homelab-agent" rel="noopener noreferrer"&gt;open source on GitHub&lt;/a&gt;. Let me describe what it actually is, because "homelab assistant" undersells it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Claude With Real Infrastructure Access
&lt;/h3&gt;

&lt;p&gt;The foundation is Claude Desktop with MCP (Model Context Protocol) servers — structured tool integrations that give Claude direct, programmatic access to infrastructure instead of copy-paste workflows.&lt;/p&gt;

&lt;p&gt;I connected Claude to Netdata (real-time system metrics), Grafana (dashboards and alerts), my Unraid and TrueNAS APIs, GitHub, and a custom HTTP server I wrote that handles shell commands, file reads, and process management on the host. I also added SearXNG — a self-hosted meta-search engine — so Claude can search the web without calling home to Google.&lt;/p&gt;

&lt;p&gt;The result: Claude stops being a chatbot you explain things to and becomes an operator that already knows your setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: A Self-Hosted AI Platform
&lt;/h3&gt;

&lt;p&gt;On top of that foundation, I run a Docker-based service stack: LibreChat (a self-hosted multi-provider chat UI), Authelia for SSO, SWAG as the reverse proxy, and observability tooling (Grafana, InfluxDB, Loki). This gives the whole household access to multiple AI providers through a single interface with one login — not just my Claude Desktop session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: The Multi-Agent Engine
&lt;/h3&gt;

&lt;p&gt;This is where things got interesting.&lt;/p&gt;

&lt;p&gt;I use Claude Code — Anthropic's AI coding tool — as a multi-agent platform. Different "agents" handle different domains: one for homelab operations, one for development, one for research, one for memory management. Each agent has its own context file (CLAUDE.md) that scopes what it knows and what it's allowed to do.&lt;/p&gt;

&lt;p&gt;And each agent has memory.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Memory Problem
&lt;/h2&gt;

&lt;p&gt;Here's the thing about AI agents: they're stateless by default. Every new conversation starts from zero. You re-explain your setup, your preferences, what you decided last week. It's like having a brilliant contractor who forgets everything between visits.&lt;/p&gt;

&lt;p&gt;I designed a four-tier memory system to solve this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session tier&lt;/strong&gt; — Every conversation is automatically summarized to disk. Semantic search makes past sessions retrievable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Working tier&lt;/strong&gt; — Agents promote important decisions and findings to structured markdown files with YAML frontmatter: creation date, expiry (90 days), tags, tier classification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distilled tier&lt;/strong&gt; — A headless Claude Code agent runs every night at 4 AM. It reviews working memory notes, applies a "would this matter in 3 months?" filter, and commits qualifying entries to a git-backed permanent knowledge base.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core context&lt;/strong&gt; — A 40-line always-visible block injected at every session start via a hook. User profile, active projects, key constraints, recent decisions. Never scrolls out of context.&lt;/p&gt;
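&lt;p&gt;Assuming Claude Code's &lt;code&gt;SessionStart&lt;/code&gt; hook event (whose stdout gets added to context), the wiring is analogous to a &lt;code&gt;UserPromptSubmit&lt;/code&gt; hook; the file path is illustrative:&lt;/p&gt;

```json
"hooks": {
  "SessionStart": [
    {
      "matcher": "",
      "hooks": [
        { "type": "command", "command": "cat ~/.claude/core-context.md", "timeout": 5 }
      ]
    }
  ]
}
```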

&lt;p&gt;Knowledge flows upward through tiers automatically. By Monday morning, Claude already knows about Friday's Docker stack change and Saturday's monitoring alert. There's no manual curation required.&lt;/p&gt;

&lt;p&gt;I found out later that the ICLR 2026 MemAgents workshop — a machine learning research conference — was specifically organized around this problem: "principled memory substrates for agentic systems." Academics wrote papers about it. I had accidentally built a working implementation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Permission Problem
&lt;/h2&gt;

&lt;p&gt;Once Claude has filesystem write access, you need to think carefully about what it can touch.&lt;/p&gt;

&lt;p&gt;I built the Agent Workspace Protocol: declarative &lt;code&gt;AGENT_WORKSPACE.md&lt;/code&gt; marker files at seven filesystem roots that define what access is allowed. Each agent also has a manifest declaring what it claims to need. An edit can only proceed if &lt;em&gt;both&lt;/em&gt; the workspace marker and the agent manifest agree — stricter of the two wins.&lt;/p&gt;
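&lt;p&gt;The decision rule itself is tiny. A minimal sketch with invented marker and manifest data (the real protocol reads these from &lt;code&gt;AGENT_WORKSPACE.md&lt;/code&gt; files and per-agent manifests):&lt;/p&gt;

```python
# Toy model of the two-party check: an action is allowed only if BOTH
# the workspace marker and the agent's manifest grant it, so the
# stricter of the two always wins. Paths and agents are illustrative.

WORKSPACE_MARKERS = {
    "/opt/stacks": {"read", "write"},
    "/etc/swag": {"read"},  # marker allows read-only here
}

AGENT_MANIFESTS = {
    "homelab-ops": {"/opt/stacks": {"read", "write"}, "/etc/swag": {"read", "write"}},
    "research": {"/opt/stacks": {"read"}},
}

def allowed(agent, root, action):
    marker = WORKSPACE_MARKERS.get(root, set())
    manifest = AGENT_MANIFESTS.get(agent, {}).get(root, set())
    # Both parties must grant the action independently.
    return action in marker and action in manifest
```

&lt;p&gt;Even though the homelab-ops manifest claims write access to &lt;code&gt;/etc/swag&lt;/code&gt;, the read-only marker wins and the edit is refused.&lt;/p&gt;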

&lt;p&gt;An hourly background job (Python script, PM2 cron) validates all markers, auto-commits any tracking drift in git-backed directories, cross-references manifests against markers for conflicts, and emits structured security events to InfluxDB and Loki tagged with CIA-triad classifications (confidentiality, integrity, availability).&lt;/p&gt;

&lt;p&gt;There's also rogue agent detection wired in — disabled while it calibrates a baseline from two weeks of normal operation, then it'll flag agents that suddenly start touching paths they've never touched before.&lt;/p&gt;

&lt;p&gt;I'm familiar with identity-based access control from years of managing Office 365, Entra ID, and Azure — scoped roles, least-privilege policies, who can touch what. I applied that same thinking here. I haven't seen this filesystem-level two-party model in any comparable AI project. It emerged because I thought about what could go wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Self-Healing Documentation Problem
&lt;/h2&gt;

&lt;p&gt;Infrastructure docs rot. Services get added, configs get changed, and the documentation lags behind until it's actively misleading.&lt;/p&gt;

&lt;p&gt;I have a doc-health agent that runs weekly (full scan, Claude Opus) and nightly (delta scan, Claude Sonnet). It checks for drift between docs and reality, coverage gaps for new services, stale references to changed infrastructure, and leaked internal IPs or API keys. It auto-commits mechanical fixes (index entries) and surfaces everything else as a report.&lt;/p&gt;

&lt;p&gt;The interesting part is the feedback loop: when agents modify infrastructure, they append to a &lt;code&gt;daily-touched-files.json&lt;/code&gt; tracker. When a writer agent runs, it updates docs for changed components and triggers a targeted re-scan to confirm its own work. The nightly scan catches anything remaining and resets the tracker.&lt;/p&gt;
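&lt;p&gt;The tracker itself can be very simple; a shape like this would carry everything the writer pass needs (keys and paths are my illustration, not the repo's exact schema):&lt;/p&gt;

```json
{
  "date": "2026-03-20",
  "touched": [
    { "agent": "homelab-ops", "path": "docker/swag/nginx.conf" },
    { "agent": "dev", "path": "scripts/backup-appdata.sh" }
  ]
}
```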

&lt;p&gt;The system verifies its own corrections.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Actually Took
&lt;/h2&gt;

&lt;p&gt;The honest answer: I don't write code in the traditional sense. Claude writes the code. I provide the vision, the infrastructure instincts, the architectural decisions, and the problem framing.&lt;/p&gt;

&lt;p&gt;What I brought was 15 years of Windows systems administration — thinking about failure modes, permissions models, backup strategies, retention policies, operational health monitoring. Every design pattern in homelab-agent traces back to something I've seen break in a production environment.&lt;/p&gt;

&lt;p&gt;The AI cost tracking pipeline (Claude Code session logs → Telegraf → InfluxDB → Grafana) exists because I've always metered infrastructure costs. The nightly backup script with stop-rsync-restart sequencing exists because I've seen live copies get corrupted. The two-party permission model exists because I've managed multi-admin environments where whoever touches something last owns it.&lt;/p&gt;
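&lt;p&gt;For the Telegraf leg of that cost pipeline, something along these lines would tail session logs and ship them to InfluxDB. The file glob, bucket, and names are placeholders; only the plugin names (&lt;code&gt;inputs.tail&lt;/code&gt;, &lt;code&gt;outputs.influxdb_v2&lt;/code&gt;) are standard Telegraf:&lt;/p&gt;

```toml
[[inputs.tail]]
  files = ["/home/claude/.claude/logs/*.jsonl"]   # placeholder path
  data_format = "json"
  name_override = "claude_usage"

[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "${INFLUX_TOKEN}"
  organization = "homelab"
  bucket = "ai-costs"
```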

&lt;p&gt;I didn't learn systems thinking in 42 days. I've been developing it for 15 years. I just found a medium that let it show.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Didn't Expect
&lt;/h2&gt;

&lt;p&gt;I expected to build a homelab assistant. I didn't expect to be ahead of academic research on agent memory systems.&lt;/p&gt;

&lt;p&gt;I expected to learn some Docker and maybe a little Python. I didn't expect to end up designing permission models and self-healing architectures that I can't find equivalents of in any comparable project.&lt;/p&gt;

&lt;p&gt;I expected AI to be a productivity tool. I didn't expect it to be a creative medium — one where infrastructure instincts and systems thinking translate directly into novel technical designs.&lt;/p&gt;

&lt;p&gt;Anthropic's own internal research, published in December 2025, found that 27% of Claude-assisted work consists of tasks that &lt;em&gt;wouldn't have happened otherwise&lt;/em&gt; — work that's too exploratory, too niche, or too cost-prohibitive without AI assistance. Every component of homelab-agent is in that 27%. This project doesn't exist in any form without Claude.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Repository
&lt;/h2&gt;

&lt;p&gt;homelab-agent is open source and documented as a reference architecture. It's not a one-click installer — it's a documented system you can understand and adapt. The README explains all three layers. There's a getting-started guide with explicit stopping points if you want Layer 1 without the full stack. Component deep-dives cover every service with configuration details and design decisions.&lt;/p&gt;

&lt;p&gt;If you want to jump in without reading everything, the repo has an &lt;code&gt;index.md&lt;/code&gt; — a machine-readable navigation file designed to be handed directly to an AI assistant. Point Claude at it and say "help me figure out which components to adopt based on my setup." It'll ask about your hardware, your existing services, and your goals, then map a path through the docs. That's the intended on-ramp.&lt;/p&gt;

&lt;p&gt;If you're a homelabber who wants Claude to actually know your setup, start there.&lt;br&gt;
If you're an AI infrastructure builder looking at agent memory patterns or permission models, the architecture docs are the interesting part.&lt;br&gt;
If you're hiring for agentic infrastructure roles and you made it this far — hi.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/TadMSTR/homelab-agent" rel="noopener noreferrer"&gt;homelab-agent on GitHub →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is the first post in a series. Next: the memory architecture in depth — how four tiers of knowledge accumulation work together and what it looks like to build memory for AI systems from first principles.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>infrastructure</category>
      <category>homelab</category>
    </item>
  </channel>
</rss>
