<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Salt Creative</title>
    <description>The latest articles on Forem by Salt Creative (@salt_creative).</description>
    <link>https://forem.com/salt_creative</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3859799%2F3e64efcf-32f8-4e90-881a-6b30a0e54c26.png</url>
      <title>Forem: Salt Creative</title>
      <link>https://forem.com/salt_creative</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/salt_creative"/>
    <language>en</language>
    <item>
      <title>Adding a Free Overflow Model to Your MCP Server: Gemma via the Gemini API</title>
      <dc:creator>Salt Creative</dc:creator>
      <pubDate>Sun, 12 Apr 2026 02:46:56 +0000</pubDate>
      <link>https://forem.com/salt_creative/adding-a-free-overflow-model-to-your-mcp-server-gemma-via-the-gemini-api-463f</link>
      <guid>https://forem.com/salt_creative/adding-a-free-overflow-model-to-your-mcp-server-gemma-via-the-gemini-api-463f</guid>
      <description>&lt;p&gt;Most agentic workflows have a single failure mode nobody plans for: the primary LLM hits its rate limit mid-session and everything stops. You can't log a result. You can't draft the next section. The workflow is blocked until the window resets. After hitting this enough times, I started treating it as an architecture problem rather than a billing problem.&lt;/p&gt;

&lt;p&gt;The fix turned out to be simpler than I expected.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Insight Hidden in the Gemini Docs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While auditing our Google AI Studio integration, I noticed that Gemma — Google's open-weight model family — is served through the exact same API endpoint as Gemini. Same Python SDK, same API key, different &lt;code&gt;model&lt;/code&gt; string. And Gemma 3 27B costs $0 per million tokens on the free tier. If you already have a Gemini API key, you already have free access to a capable open-weight model. No new credentials, no additional SDK, no separate account.&lt;/p&gt;

&lt;p&gt;That's the whole unlock.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Registering the Tool in FastMCP&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Adding &lt;code&gt;query_gemma&lt;/code&gt; to a FastMCP server is a thin wrapper — roughly fifteen lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_gemma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-3-27b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Send a prompt to Gemma. Use for generation tasks to reduce primary LLM token usage.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;model&lt;/code&gt; parameter defaults to &lt;code&gt;gemma-3-27b-it&lt;/code&gt; but accepts the full family:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma-3-1b-it&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Minimal tasks, fastest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma-3-4b-it&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Classification, simple formatting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma-3-12b-it&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;General use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma-3-27b-it&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default — best Gemma 3 quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma-4-26b-a4b-it&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemma 4, efficient&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemma-4-31b-it&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemma 4, highest quality&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
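&lt;p&gt;If you want the routing decision to live in code rather than in the prompt, a thin helper can map task tiers to model strings. A minimal sketch (the tier names are my own convention; only the model IDs come from the table above):&lt;/p&gt;

```python
# Hypothetical task tiers mapped to the Gemma model IDs listed above.
GEMMA_BY_TIER = {
    "minimal": "gemma-3-1b-it",
    "simple": "gemma-3-4b-it",
    "general": "gemma-3-12b-it",
    "quality": "gemma-3-27b-it",
}

def pick_gemma(tier: str = "quality") -> str:
    """Return a Gemma model string for a task tier, falling back to the
    best-quality default when the tier is unknown."""
    return GEMMA_BY_TIER.get(tier, "gemma-3-27b-it")
```

The orchestrating LLM can then call the tool with `model=pick_gemma("simple")` for cheap classification work and reserve the 27B model for drafting.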

&lt;p&gt;After adding the tool, reconnect your MCP connector to reload the manifest. That's the entire deployment.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Workflow Split That Makes This Useful&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The important constraint: &lt;code&gt;query_gemma&lt;/code&gt; is text in, text out. Gemma has no access to your tool registry. It can't call other MCP tools, query your data layer, or read session state. It only knows what you explicitly pass in the prompt.&lt;/p&gt;

&lt;p&gt;This forces a clean separation that turns out to be the right design anyway. The primary LLM handles tool calls, data retrieval, QA, and logging. Gemma handles generation-heavy tasks — drafting, summarizing, classifying, formatting. The primary LLM does less of the expensive token work. When it hits rate limits, Gemma absorbs the generation queue while the primary LLM recovers.&lt;/p&gt;

&lt;p&gt;The split also makes each model's role legible. If something fails, you know immediately which layer to look at.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Gap That Remains&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The free tier rate limits are real. Gemma 3 models allow 5–15 requests per minute depending on model size. For interactive workflows, that's usually fine. For anything resembling batch processing, you'll hit the ceiling fast and need retry logic.&lt;/p&gt;
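&lt;p&gt;For anything batch-shaped, a generic backoff wrapper is usually enough. A sketch, assuming the SDK raises an exception when the rate limit is hit (the exact exception class varies by SDK version, so this catches broadly):&lt;/p&gt;

```python
import time

def with_backoff(call, attempts=4, base_delay=2.0):
    """Retry a zero-argument callable with exponential backoff.

    Intended for free-tier rate limits (5-15 RPM): on failure, wait
    base_delay seconds, then 2x, 4x, ... before giving up and re-raising.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the original error
            time.sleep(base_delay * (2 ** attempt))

# usage (hypothetical): with_backoff(lambda: query_gemma("Summarize: ..."))
```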

&lt;p&gt;The deeper limitation is context. Gemma doesn't know what your other tools returned unless you tell it. Every &lt;code&gt;query_gemma&lt;/code&gt; call needs to be self-contained — task description, relevant data, output format, all passed explicitly. That's more prompt engineering overhead than calling a context-aware primary LLM, and it matters for complex tasks.&lt;/p&gt;
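&lt;p&gt;One way to keep that overhead manageable is to standardize the envelope every call uses, so the orchestrating model fills in three slots instead of composing a prompt from scratch. A sketch of such a helper (the field labels are my own convention, not part of any API):&lt;/p&gt;

```python
def build_gemma_prompt(task: str, data: str, output_format: str) -> str:
    """Assemble a self-contained prompt. Task, relevant data, and output
    format must all travel in the prompt, since Gemma has no session state
    and no access to the tool registry."""
    return (
        f"Task: {task}\n\n"
        f"Relevant data:\n{data}\n\n"
        f"Output format: {output_format}"
    )
```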




&lt;p&gt;&lt;strong&gt;What This Is and Isn't&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't a replacement for your primary LLM. For tasks requiring tool calls, structured reasoning over live data, or anything where the model needs to know what happened earlier in the session — you still need the primary stack.&lt;/p&gt;

&lt;p&gt;For pure generation tasks, it works well and it's free. The practical framing: treat it as a relief valve on your token budget, not a second brain.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Build your overflow capacity the same way you build your primary stack — thin interfaces, clear contracts, explicit failure modes. A model you can swap in when the primary one is saturated is worth more than a more powerful model you can't afford to run continuously.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>llm</category>
      <category>mcp</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
    <item>
      <title>From Monolithic Prompts to Modular Context: A Practical Architecture for Agent Memory</title>
      <dc:creator>Salt Creative</dc:creator>
      <pubDate>Fri, 10 Apr 2026 17:46:19 +0000</pubDate>
      <link>https://forem.com/salt_creative/from-monolithic-prompts-to-modular-context-a-practical-architecture-for-agent-memory-1lcp</link>
      <guid>https://forem.com/salt_creative/from-monolithic-prompts-to-modular-context-a-practical-architecture-for-agent-memory-1lcp</guid>
      <description>&lt;p&gt;Most teams building on top of LLMs treat the system prompt as a static artifact — write it once, tune it occasionally, move on. That works fine for simple assistants. It breaks down the moment your agent needs to operate across multiple domains, maintain state across sessions, and actually &lt;em&gt;learn&lt;/em&gt; from its mistakes rather than repeating them.&lt;/p&gt;

&lt;p&gt;After running a production agentic workflow for several months, I rebuilt the memory layer from scratch. Here's what I learned.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Problem with Monolithic Context&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The original system had a single large context file loaded at the start of every session. It contained everything: infrastructure details, client rules, workflow protocols, historical session logs, SEO doctrine, tool documentation — all of it, every time.&lt;/p&gt;

&lt;p&gt;This violates a principle that should be obvious but isn't: &lt;strong&gt;context is an attention budget, not a storage bin.&lt;/strong&gt; Research on context rot (Chroma, 2024) shows that LLM recall degrades nonlinearly as context length increases. You're not just adding tokens — you're diluting attention across an increasingly noisy signal space. Every irrelevant token you load competes with every relevant one.&lt;/p&gt;

&lt;p&gt;The other problem: a monolithic file has no mutation mechanism. It grows. It never gets smarter. Failures get logged as narrative and immediately buried under new entries. The system had no immune memory.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Architecture: Six Files, Three Load Tiers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The redesign splits context across six files organized by load trigger — not by topic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 1 — Always loaded (~1,000 tokens total):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Core identity file&lt;/em&gt;: project structure, infrastructure, tool index, session rules. Session rules appear first — not buried — because of the "lost in the middle" attention gradient documented by Liu et al. (2023).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Failure pattern file&lt;/em&gt;: every entry is a real production failure encoded as a structured triple: &lt;code&gt;Failure | Trigger | Rule&lt;/code&gt;. Always loaded. Consulted before tool calls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tier 2 — Client-scoped (loaded via explicit switch):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Client context file&lt;/em&gt;: domain-specific rules, approved sources, active work log, client-specific failure patterns. Never loaded during other client sessions. Zero cross-contamination.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tier 3 — Task-scoped (loaded by session type):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Workflow file&lt;/em&gt;: embedded executable sequences, not passive documentation. The NLP audit loop is written as a runnable checklist, not prose.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Architecture file&lt;/em&gt;: decision gates written as prompts. Before any structural recommendation, the agent runs a three-step investment check — hardcoded as a trigger, not a suggestion.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Session notes&lt;/em&gt;: ephemeral working memory. Cleared each session. Holds active threads, decisions made, blockers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total always-on footprint: ~1,000 tokens. Task-scoped files add 500–700 tokens only when relevant. This is a ~60% reduction from the monolithic baseline, with higher signal density at every tier.&lt;/p&gt;
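&lt;p&gt;The load-trigger logic itself stays tiny. A sketch of a tier-aware loader, assuming plain markdown files in a context directory (the file names here are illustrative, not the actual ones):&lt;/p&gt;

```python
from pathlib import Path

# Illustrative file names for the tiered layout described above.
TIER1 = ["core-identity.md", "failure-patterns.md"]  # always loaded
TIER3 = {                                            # loaded by session type
    "audit": ["workflows.md"],
    "architecture": ["architecture.md"],
}

def compose_context(context_dir, client=None, session_type=None):
    """Concatenate context files by load trigger: tier 1 always,
    tier 2 only for the active client, tier 3 only for the session type."""
    root = Path(context_dir)
    files = list(TIER1)
    if client:
        files.append(f"clients/{client}.md")         # tier 2, client-scoped
    files.extend(TIER3.get(session_type, []))
    parts = []
    for name in files:
        path = root / name
        if path.exists():
            parts.append(path.read_text())
    return "\n\n".join(parts)
```

Because a client file is appended only when that client is active, cross-contamination between client sessions is impossible by construction.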




&lt;p&gt;&lt;strong&gt;The Failure Pattern File: Applied RCL&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The most important file in the system isn't the one with the most information — it's the one that encodes what went wrong.&lt;/p&gt;

&lt;p&gt;Recent work on Reflective Context Learning (RCL, arXiv:2604.03189) formalizes something practitioners have been doing informally: treating context optimization as a training loop. The forward pass executes the agent. The backward pass reflects on the trace and identifies which context entry was absent or wrong. The optimizer step mutates that entry.&lt;/p&gt;

&lt;p&gt;The failure pattern file is the mutation log. Each entry follows a strict schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Failure | Trigger | Rule |
|---------|---------|------|
| Silent tool timeout | Batch API call | Single requests only — 
  no error is thrown on batch failure |
| OAuth token expiry | ~90 day intervals | Re-authenticate before 
  session; token refresh is not automatic |
| Entity misclassification | Repeated superlative phrases | 
  Rewrite entire sentence — removing one word doesn't 
  clear the pattern |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Critically, the &lt;code&gt;update_context&lt;/code&gt; tool now accepts an optional &lt;code&gt;failure&lt;/code&gt; parameter. When something breaks mid-session, the agent writes to both the session log and the failure pattern file simultaneously. The mutation is captured at the moment of failure — not reconstructed from memory at session end.&lt;/p&gt;
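&lt;p&gt;The dual-write is a small change in code. A sketch of the shape (not the actual implementation; the file paths and the failure-triple format are assumptions based on the schema above):&lt;/p&gt;

```python
from datetime import datetime

def update_context(entry, failure=None,
                   log_path="project-context.md",
                   failure_path="failure-patterns.md"):
    """Append a dated entry to the session log; if a failure triple is
    supplied, append it to the failure pattern table in the same call,
    so the mutation is captured at the moment of failure."""
    today = datetime.now().strftime("%Y-%m-%d")
    with open(log_path, "a") as f:
        f.write(f"\n**{today}:** {entry}")
    if failure:
        # failure is a (what, trigger, rule) triple matching the schema above
        what, trigger, rule = failure
        with open(failure_path, "a") as f:
            f.write(f"\n| {what} | {trigger} | {rule} |")
    return "logged"
```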

&lt;p&gt;This is what Meta's engineering team described in their April 2026 post on tribal knowledge capture: the most valuable context isn't what the system does when it works — it's what causes it to fail silently.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The "Compass, Not Encyclopedia" Constraint&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Meta's framework for context file design: 25–35 lines per file, four sections maximum — Quick Commands, Key Files, Non-Obvious Patterns, See Also. Every line earns its place or gets cut.&lt;/p&gt;

&lt;p&gt;The instinct when building these systems is to add more. More rules, more examples, more edge cases. That instinct is wrong. A 4,000-token context file with 80% signal is worse than a 1,000-token file with 95% signal, because attention is not uniformly distributed across tokens. The model doesn't read your context file the way a human reads a document. It attends to it — and attention degrades with distance and density.&lt;/p&gt;

&lt;p&gt;The design principle that follows: &lt;strong&gt;never put passive information where active instructions belong.&lt;/strong&gt; If a rule matters, write it as a trigger. If a workflow matters, write it as a sequence. Documentation is for humans. Context is for attention.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What This Changes in Practice&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three things that work better with modular context:&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;session startup is declarative.&lt;/strong&gt; Instead of one large file that's always partially irrelevant, the agent loads exactly what it needs. A client-specific session loads the client file. An audit session loads the workflow file. The core file stays small and stable.&lt;/p&gt;

&lt;p&gt;Second, &lt;strong&gt;failures compound into capability.&lt;/strong&gt; Every production issue that gets structured into the failure pattern file makes the next session marginally more reliable. The system gets harder to break over time without any model fine-tuning — purely through context engineering.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;the system is auditable.&lt;/strong&gt; Because context is modular and versioned in git, you can trace exactly what information was available to the agent during any session. When something goes wrong, you can identify whether the missing rule existed in the failure log, and if not, add it.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Gap That Remains&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The honest limitation: Claude.ai's MCP connector, as currently implemented, loads one context file automatically. The sub-files require explicit tool calls to retrieve. This means the agent must be instructed to load its own context — it doesn't happen natively.&lt;/p&gt;

&lt;p&gt;The workaround is a &lt;code&gt;get_subcontext&lt;/code&gt; tool that reads any file in the context directory by name. It works, but it's a patch on a deeper architectural gap: LLM interfaces don't yet treat context as a first-class, dynamically composable resource. They treat it as a static field.&lt;/p&gt;

&lt;p&gt;That's the next frontier. Not larger context windows — smarter context routing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building something similar? The patterns here — tiered load triggers, structured failure logs, embedded executable prompts — generalize beyond any specific stack. The core insight is simple: treat your context files the way a good engineer treats a codebase. Small, modular, version-controlled, and self-documenting.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>agents</category>
      <category>architecture</category>
      <category>llm</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>I Built a Self-Updating Memory System for Claude Using a Custom MCP Server</title>
      <dc:creator>Salt Creative</dc:creator>
      <pubDate>Fri, 03 Apr 2026 16:59:51 +0000</pubDate>
      <link>https://forem.com/salt_creative/i-built-a-self-updating-memory-system-for-claude-using-a-custom-mcp-server-4kj2</link>
      <guid>https://forem.com/salt_creative/i-built-a-self-updating-memory-system-for-claude-using-a-custom-mcp-server-4kj2</guid>
      <description>&lt;p&gt;I've been running a &lt;a href="https://www.sltcreative.com/beyond-the-dashboard-using-model-context-protocol-mcp-to-give-claude-direct-access-to-your-gsc-data" rel="noopener noreferrer"&gt;custom MCP server&lt;/a&gt; connected to Claude.ai for several months as part of a proprietary SEO intelligence platform. The setup works well for structured data queries — rankings, crawl issues, keyword gaps — but session memory was the weak point.&lt;/p&gt;

&lt;p&gt;I had a flat markdown file that stored project context: open tasks, doctrine, session history. It loaded at the start of every session via a &lt;code&gt;get_context&lt;/code&gt; tool. The problem? Updating it required manually editing the file on the server after every session. It kept drifting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix was simpler than I expected.&lt;/strong&gt;&lt;br&gt;
I added a single append-only tool called &lt;code&gt;update_context&lt;/code&gt; to my FastMCP server. It takes one argument — a plain text summary of what happened in the session. The tool auto-injects the date and appends a dated entry to the Session History section of the context file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Appends a dated session history entry to the context file.
    Date is auto-injected — pass only the summary text.
    Format: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[what was done]. Next: [what to monitor or do next].&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;context_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;home&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;project-context.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;**&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:** &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;marker&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;## Session History&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;marker&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: Session History section not found.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;insert_point&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rfind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;**2025-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;insert_point&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;insert_point&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;marker&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;end_of_line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;insert_point&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;insert_point&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;end_of_line&lt;/span&gt;

    &lt;span class="n"&gt;new_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;insert_point&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;insert_point&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
    &lt;span class="n"&gt;context_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Context updated: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now at the end of every Claude session I just say "log this session" and Claude calls the tool directly. No copy-pasting, no opening files, no forgetting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The full memory stack looks like this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;get_context&lt;/code&gt; — loads the flat markdown file at session start (project history, open TODOs, doctrine)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;update_context&lt;/code&gt; — appends session summary at session end&lt;/li&gt;
&lt;li&gt;DuckDB — structured queryable data (rankings, crawl data, keyword gaps)&lt;/li&gt;
&lt;li&gt;Claude.ai native memory — personal preferences and recurring facts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three complementary layers (the flat context file read via &lt;code&gt;get_context&lt;/code&gt; and written via &lt;code&gt;update_context&lt;/code&gt;, DuckDB, and Claude.ai native memory), with no RAG, no vector database, no additional infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not RAG?&lt;/strong&gt;&lt;br&gt;
I evaluated it. For a single-project setup with well-curated context, RAG is overkill. The flat file loads instantly, costs nothing, and you control exactly what the model knows. RAG earns its place when you have hundreds of unstructured documents you need to search semantically. I'm not there yet.&lt;br&gt;
The flat file + append tool beats RAG for the majority of single-project use cases.&lt;/p&gt;

&lt;p&gt;The whole append tool is about 30 lines of Python. If you're already running a FastMCP server and maintaining a context file manually, this is a straightforward upgrade worth doing.&lt;br&gt;
Happy to answer questions in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>productivity</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
