<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Filippo Venturini</title>
    <description>The latest articles on Forem by Filippo Venturini (@filippo_venturini).</description>
    <link>https://forem.com/filippo_venturini</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3801289%2F2a18909e-639a-46d7-9767-3704f854ea80.jpg</url>
      <title>Forem: Filippo Venturini</title>
      <link>https://forem.com/filippo_venturini</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/filippo_venturini"/>
    <language>en</language>
    <item>
      <title>How I built a local memory layer for AI agents — and why vaults changed everything</title>
      <dc:creator>Filippo Venturini</dc:creator>
      <pubDate>Mon, 02 Mar 2026 10:03:16 +0000</pubDate>
      <link>https://forem.com/filippo_venturini/how-i-built-a-local-memory-layer-for-ai-agents-and-why-vaults-changed-everything-1apg</link>
      <guid>https://forem.com/filippo_venturini/how-i-built-a-local-memory-layer-for-ai-agents-and-why-vaults-changed-everything-1apg</guid>
      <description>&lt;p&gt;Every serious project I've worked on with LLM agents hits the same wall eventually.&lt;/p&gt;

&lt;p&gt;The agent is smart. It reasons well. It follows instructions. But in every new session it starts from zero, with no memory of what happened before, no context from previous runs, and none of the knowledge it built over time.&lt;/p&gt;

&lt;p&gt;The naive fix is to stuff everything into the system prompt. It works until it doesn't — context windows fill up, costs spike, and you're manually curating what to include every time. The slightly less naive fix is RAG: retrieve relevant chunks before each call. Better, but now you have a retrieval problem on top of your agent problem, and a single shared vector store that every agent reads from indiscriminately.&lt;/p&gt;

&lt;p&gt;I built CtxVault because I wanted something different. Not just retrieval — infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vault abstraction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The core idea is simple: a vault is an isolated, self-contained memory unit. Its own directory, its own vector index, its own history. You can have one per agent, one per project, or share one across multiple agents as a coordination layer. You decide the topology.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ctxvault init personal
ctxvault init work
ctxvault init shared-knowledge
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each vault is just a folder on your machine. Drop documents in, index them, and they become semantically queryable. The agent can write new content at runtime, and that content is immediately searchable — no reindexing, no manual step.&lt;/p&gt;

&lt;p&gt;This matters more than it seems. When you have a single shared store with metadata filtering to separate "agent A's memory" from "agent B's memory", you're one misconfigured filter away from cross-contamination. Vaults make isolation structural, not configurational.&lt;/p&gt;
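
&lt;p&gt;To make the difference concrete, here is a minimal toy sketch, not CtxVault's actual implementation. It contrasts a shared store that separates agents with a metadata filter against one independent store per vault. The names &lt;code&gt;SharedStore&lt;/code&gt; and &lt;code&gt;VaultRegistry&lt;/code&gt; are hypothetical, and plain substring matching stands in for semantic search.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharedStore:
    """One store for everyone; isolation depends on callers passing the right filter."""
    records: list = field(default_factory=list)

    def write(self, owner: str, text: str) -> None:
        self.records.append({"owner": owner, "text": text})

    def query(self, term: str, owner: Optional[str] = None) -> list:
        # Forgetting `owner` silently returns every agent's records.
        return [
            r["text"]
            for r in self.records
            if term in r["text"] and (owner is None or r["owner"] == owner)
        ]


@dataclass
class VaultRegistry:
    """One independent store per vault; there is no filter to misconfigure."""
    vaults: dict = field(default_factory=dict)

    def write(self, vault: str, text: str) -> None:
        self.vaults.setdefault(vault, []).append(text)

    def query(self, vault: str, term: str) -> list:
        # Only the named vault is ever read.
        return [t for t in self.vaults.get(vault, []) if term in t]


shared = SharedStore()
shared.write("agent-a", "pasta dough tears when rolled thin")
shared.write("agent-b", "pasta order for the office lunch")

registry = VaultRegistry()
registry.write("personal", "pasta dough tears when rolled thin")
registry.write("work", "pasta order for the office lunch")

leaked = shared.query("pasta")                   # filter omitted: both agents' data
isolated = registry.query("personal", "pasta")   # only the personal vault
```

&lt;p&gt;In the shared store, one missing argument is enough to mix the two agents' memories. In the per-vault layout the caller has to name a vault, so there is no equivalent mistake to make.&lt;/p&gt;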

&lt;p&gt;&lt;strong&gt;Three ways to talk to it&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I wanted CtxVault to work at every level of the stack, so I built three integration modes.&lt;/p&gt;

&lt;p&gt;CLI for humans:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ctxvault query personal &lt;span class="s2"&gt;"what am I learning to cook?"&lt;/span&gt;
ctxvault list work
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;HTTP API for agent pipelines: start the server and initialize a vault first. Initialization is a one-time step, via CLI or API; after that the vault persists.&lt;/p&gt;

&lt;p&gt;Now your LangChain or LangGraph agents can write and query via REST:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;API&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:8000/ctxvault&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/init&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vault_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vault_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User is optimizing a FastAPI service. Main bottleneck is DB connection pooling.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vault_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;what was the performance issue we discussed?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
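
&lt;p&gt;If several agents repeat the same calls, the three requests above can be wrapped in a small client. The sketch below is a convenience built on assumptions, not part of CtxVault: the class name &lt;code&gt;VaultClient&lt;/code&gt;, the helper &lt;code&gt;http_post&lt;/code&gt; and the injectable &lt;code&gt;post&lt;/code&gt; parameter are hypothetical, while the endpoint paths and JSON field names are the ones used in the snippet above. It also assumes every endpoint answers with a JSON body.&lt;/p&gt;

```python
import json
import urllib.request


def http_post(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Assumption: the server always replies with a JSON body.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class VaultClient:
    """Hypothetical thin wrapper around the init, write and query endpoints."""

    def __init__(self, vault_name, base_url="http://127.0.0.1:8000/ctxvault", post=http_post):
        self.vault_name = vault_name
        self.base_url = base_url.rstrip("/")
        self._post = post  # injectable so the client can be exercised without a server

    def _call(self, endpoint: str, **fields) -> dict:
        # Every request carries the vault name, so a client is bound to one vault.
        payload = {"vault_name": self.vault_name, **fields}
        return self._post(f"{self.base_url}/{endpoint}", payload)

    def init(self) -> dict:
        return self._call("init")

    def write(self, filename: str, content: str) -> dict:
        return self._call("write", filename=filename, content=content)

    def query(self, query: str) -> list:
        # Assumes the response carries a "results" key, as in the snippet above.
        return self._call("query", query=query)["results"]
```

&lt;p&gt;Binding the vault name in the constructor means an agent holding a &lt;code&gt;VaultClient("agent-memory")&lt;/code&gt; cannot address another vault by accident, which mirrors the isolation the vault abstraction is meant to give.&lt;/p&gt;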



&lt;p&gt;MCP for no-code agent autonomy. Add one server entry to your &lt;code&gt;mcp.json&lt;/code&gt; and any MCP-compatible client, such as Claude Desktop or Cursor, gets direct vault access with no integration code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ctxvault"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ctxvault-mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent decides autonomously when to write, when to query, when to recall. You stay in control because every vault is a directory you can inspect and edit at any time.&lt;/p&gt;
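
&lt;p&gt;Since a vault is a plain directory, auditing what an agent has written needs nothing beyond the standard library. The sketch below lists a vault's files with their sizes, newest first. The function name &lt;code&gt;list_vault_files&lt;/code&gt; is hypothetical, and the path you pass in is an assumption: where vault folders live on disk is not covered here, so point it at wherever your vault actually is.&lt;/p&gt;

```python
from pathlib import Path


def list_vault_files(vault_dir):
    """Return (relative_path, size_in_bytes) for every file under vault_dir, newest first."""
    root = Path(vault_dir)
    # Walk the whole tree; this includes index files as well as documents.
    files = [p for p in root.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.relative_to(root).as_posix(), p.stat().st_size) for p in files]
```

&lt;p&gt;Running it after a session gives a quick view of what changed most recently, which is often enough to spot a note the agent should not have saved.&lt;/p&gt;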

&lt;p&gt;&lt;strong&gt;Why local-first&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every alternative I evaluated requires a cloud account, sends data to an external service, or both. For personal projects, side projects, and anything involving sensitive documents, that's a non-starter.&lt;/p&gt;

&lt;p&gt;CtxVault runs entirely on your machine. ChromaDB for vector storage, sentence-transformers for embeddings, FastAPI for the HTTP layer. No API keys, no telemetry, no vendor dependency. Install it and it works offline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it actually feels like&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj5ymwy9jcx18fwwgu4g6.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj5ymwy9jcx18fwwgu4g6.gif" alt="Claude Desktop using ctxvault MCP server — agent saves and recalls context across sessions" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Persistent memory across sessions — shown with Claude Desktop, works with any MCP-compatible client.&lt;/p&gt;

&lt;p&gt;The moment that made me realize the abstraction was right: I told Claude Desktop that I was learning to make fresh pasta and struggling with the sfoglia tearing when rolled thin. Closed the chat. Opened a new one. Asked "how's my pasta going?" — it knew exactly where I left off, because it had written the context to a vault and queried it when I came back.&lt;/p&gt;

&lt;p&gt;That's the thing about memory as infrastructure: when it works, it disappears. The agent just knows. You stop re-explaining. You stop copy-pasting context between sessions. The conversation has history even when the chat window doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's next&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The current version handles retrieval well. What it doesn't handle yet is lifecycle: knowing what's worth keeping versus noise, merging stale chunks, and archiving old content. That's the next real problem, and it's harder than retrieval. If you've thought about this, I'd genuinely like to hear your approach.&lt;/p&gt;

&lt;p&gt;CtxVault is open source, MIT licensed, and available on PyPI.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;ctxvault
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/Filippo-Venturini/ctxvault"&gt;github.com/Filippo-Venturini/ctxvault&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
