<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Joel Alan</title>
    <description>The latest articles on Forem by Joel Alan (@lxfu1).</description>
    <link>https://forem.com/lxfu1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F705107%2Ff890f9cb-10e9-46fe-bc3b-c8398969c3e1.jpeg</url>
      <title>Forem: Joel Alan</title>
      <link>https://forem.com/lxfu1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lxfu1"/>
    <language>en</language>
    <item>
      <title>Context Compression and Persistent Memory Design for Terminal AI Assistants</title>
      <dc:creator>Joel Alan</dc:creator>
      <pubDate>Thu, 23 Apr 2026 11:12:50 +0000</pubDate>
      <link>https://forem.com/lxfu1/context-compression-and-persistent-memory-design-for-terminal-ai-assistants-2j19</link>
      <guid>https://forem.com/lxfu1/context-compression-and-persistent-memory-design-for-terminal-ai-assistants-2j19</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Drawing from practical experience with memo-agent (simplified Hermes version), exploring how to give terminal AI assistants "long-term memory" and "extended conversation" capabilities.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  I. The Starting Point
&lt;/h2&gt;

&lt;p&gt;Imagine this scenario: You're pair-programming with AI through a CLI tool, discussing project architecture, database design, and API specifications for 2 hours straight. The AI performs well, remembering every constraint you mentioned.&lt;/p&gt;

&lt;p&gt;The next day, you open the terminal and type "continue yesterday's design." The AI replies: "Sure, what did we discuss yesterday?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The conversation context has been reset.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't science fiction—it's the reality of most terminal AI tools today. Other common issues include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After 30 rounds of code review, the AI "forgets" the architectural constraints you mentioned at the beginning&lt;/li&gt;
&lt;li&gt;Context window warnings force you to &lt;code&gt;/clear&lt;/code&gt; and start over&lt;/li&gt;
&lt;li&gt;No cross-session context accumulation means re-explaining background every time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The root causes are twofold:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Blunt context truncation&lt;/strong&gt; — During extended conversations, the system often discards the earliest messages, losing critical information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No memory across sessions&lt;/strong&gt; — Each session starts from zero, unable to accumulate project context&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The design goal of memo-agent is to solve these two problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  II. Persistent Memory: Letting AI Remember "Who You Are"
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2.1 Local File Memory
&lt;/h3&gt;

&lt;p&gt;The simplest persistence solution is often the most reliable. memo-agent uses local Markdown files as its memory store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.memo-agent/memory/
  NOTES.md      # Work notes (agent can read/write)
  PROFILE.md    # User preferences (read-only, maintained by user)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NOTES.md&lt;/strong&gt; is automatically updated by the agent after each conversation round when deemed necessary. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You mentioned "this project uses functional style, avoid classes"&lt;/li&gt;
&lt;li&gt;You specified "API responses should always use &lt;code&gt;{code, data, message}&lt;/code&gt; format"&lt;/li&gt;
&lt;li&gt;You indicated "use SQLite WAL mode for database"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the agent considers these valuable, it appends them to NOTES.md. These notes are automatically injected into the system prompt at the start of the next session, becoming the agent's "common knowledge."&lt;/p&gt;
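&lt;p&gt;The injection step can be sketched as follows. This is an illustrative snippet, not memo-agent's actual module: the function name and the section headers it emits are hypothetical.&lt;/p&gt;

```typescript
// Illustrative sketch: fold NOTES.md / PROFILE.md content into the system
// prompt at session start. buildSystemPrompt and the "## ..." section headers
// are hypothetical names, not memo-agent's actual API.
function buildSystemPrompt(basePrompt: string, profile: string, notes: string): string {
  const sections: string[] = [basePrompt];
  // Empty or missing files are simply skipped.
  if (profile.trim()) sections.push("## User profile (PROFILE.md)\n" + profile.trim());
  if (notes.trim()) sections.push("## Project notes (NOTES.md)\n" + notes.trim());
  return sections.join("\n\n");
}
```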

&lt;p&gt;&lt;strong&gt;PROFILE.md&lt;/strong&gt; is manually maintained by the user, suitable for long-term stable preferences:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;I'm a backend engineer, primarily using Go and TypeScript.
Code style: functional first, avoid over-abstraction.
Please respond in Chinese, with English code comments.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2.2 Safe Injection Mechanism
&lt;/h3&gt;

&lt;p&gt;Injecting local file content into the system prompt carries security risks — if files are maliciously tampered with, they may contain prompt injection attacks. memo-agent scans content before injection, detecting the following patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Ignore previous instructions"&lt;/li&gt;
&lt;li&gt;"You are now a... role"&lt;/li&gt;
&lt;li&gt;"Send the following information to..."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If detected, injection is skipped and an alert is shown in the UI.&lt;/p&gt;
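&lt;p&gt;A minimal sketch of such a scan is shown below. The regexes only mirror the example patterns listed above; memo-agent's real rule set is not reproduced here.&lt;/p&gt;

```typescript
// Illustrative pre-injection scan. These patterns mirror the examples above;
// the real rule set would be broader.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /you are now (a|an) /i,
  /send (the following|this) (information|data) to/i,
];

// Returns true only when none of the suspicious patterns match.
function isSafeToInject(content: string): boolean {
  return INJECTION_PATTERNS.every((p) => p.test(content) === false);
}
```

&lt;p&gt;If &lt;code&gt;isSafeToInject&lt;/code&gt; returns false, the file content is kept out of the system prompt and the UI shows the alert.&lt;/p&gt;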

&lt;h3&gt;
  
  
  2.3 Session Chain: History Never Lost
&lt;/h3&gt;

&lt;p&gt;NOTES.md alone isn't enough. When a conversation grows long enough to need compression, memo-agent doesn't bluntly truncate it. Instead:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Uses an auxiliary model to generate a summary of intermediate history&lt;/li&gt;
&lt;li&gt;Creates a new session with the summary as the starting context&lt;/li&gt;
&lt;li&gt;Old sessions are linked through &lt;code&gt;parent_session_id&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This forms the following structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│  Session Tree                                               │
│                                                             │
│  Session A (2024-01-15)                                     │
│  ├── Original conversation: 50 rounds                       │
│  ├── Input tokens: 45,000                                   │
│  └── Child session: B                                       │
│                                                             │
│      Session B (2024-01-16)                                 │
│      ├── Compressed summary: "Decided to use SQLite WAL..." │
│      ├── New conversation: 30 rounds                        │
│      ├── Input tokens: 28,000                               │
│      └── Child session: C                                   │
│                                                             │
│          Session C (2024-01-17)                             │
│          ├── Secondary compression summary                  │
│          └── New conversation: 20 rounds                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Session Chain Database Design&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;parent_session_id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;-- Chain linkage&lt;/span&gt;
  &lt;span class="n"&gt;compressed_summary&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- Summary inherited from parent session&lt;/span&gt;
  &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;estimated_cost_usd&lt;/span&gt; &lt;span class="nb"&gt;REAL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In theory, this chain can extend infinitely—history is never lost. Users can view the session chain via &lt;code&gt;/history&lt;/code&gt; and use &lt;code&gt;--resume &amp;lt;session-id&amp;gt;&lt;/code&gt; to return to any node.&lt;/p&gt;
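&lt;p&gt;Resuming a node means rebuilding its ancestry by following &lt;code&gt;parent_session_id&lt;/code&gt;. A minimal sketch, assuming the session rows have already been loaded from SQLite (the function name is illustrative):&lt;/p&gt;

```typescript
// Illustrative sketch: reconstruct a session chain (root → leaf) by following
// parent_session_id. Assumes the rows were already loaded from the sessions table.
interface SessionRow {
  id: string;
  parent_session_id: string | null;
}

function chainToRoot(sessions: SessionRow[], leafId: string): string[] {
  const byId: { [id: string]: SessionRow } = {};
  for (const s of sessions) byId[s.id] = s;

  const chain: string[] = [];
  let cur: SessionRow | undefined = byId[leafId];
  while (cur) {
    chain.unshift(cur.id); // prepend so the root ends up first
    cur = cur.parent_session_id ? byId[cur.parent_session_id] : undefined;
  }
  return chain;
}
```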

&lt;p&gt;&lt;strong&gt;Cost Comparison&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;th&gt;Cost per round after 100 rounds&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No compression&lt;/td&gt;
&lt;td&gt;~$0.015/round&lt;/td&gt;
&lt;td&gt;Complete history&lt;/td&gt;
&lt;td&gt;Linear cost growth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Three-zone compression&lt;/td&gt;
&lt;td&gt;~$0.006/round&lt;/td&gt;
&lt;td&gt;Balanced&lt;/td&gt;
&lt;td&gt;Summaries may lose details&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Direct truncation&lt;/td&gt;
&lt;td&gt;~$0.005/round&lt;/td&gt;
&lt;td&gt;Low cost&lt;/td&gt;
&lt;td&gt;Loses early context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  III. Context Compression: Three-Zone Model Engineering Practice
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3.1 Why Compression is Needed
&lt;/h3&gt;

&lt;p&gt;Large models have limited context windows (e.g., GPT-4o's 128k tokens). Even with sufficient window size, excessively long contexts cause two problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Linear cost growth&lt;/strong&gt; — Every round sends the entire history, token consumption keeps increasing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attention dilution&lt;/strong&gt; — The model may "overlook" key information buried in the middle of long contexts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common truncation strategies (directly discarding earliest messages) break conversation coherence. memo-agent adopts a more elegant three-zone model.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.2 Three-Zone Model
&lt;/h3&gt;

&lt;p&gt;The conversation context is divided into three zones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────┐
│  HEAD (Anchor) ~ 4k tokens                              │
│  ├── system prompt (NOTES.md + PROFILE.md)              │
│  ├── First user input (project background, constraints) │
│  └── First AI response (core decisions)                 │
│  Never compressed, retains full semantics               │
├─────────────────────────────────────────────────────────┤
│  MIDDLE (Archive) Dynamic adjustment                    │
│  Before compression: Complete conversation history      │
│                      (Round 2 to N-20)                  │
│  After compression: LLM-generated structured summary    │
│  Example: "Decided to use SQLite WAL mode, pending:     │
│            index fields"                                │
├─────────────────────────────────────────────────────────┤
│  TAIL (Active) ~20k tokens                              │
│  Last 20 rounds of conversation, fully preserved        │
│  Ensures complete context for current topic             │
└─────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
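&lt;p&gt;The zone split can be sketched as below, assuming messages are stored oldest-first and each round is one user plus one assistant message. The sizes (first exchange anchored, last 20 rounds active) follow the diagram; the function name is illustrative.&lt;/p&gt;

```typescript
// Illustrative three-zone split. Assumes oldest-first message order and
// user + assistant alternation (two messages per round).
interface Msg {
  role: string;
  content: string;
}

function splitZones(messages: Msg[], tailRounds: number = 20) {
  const tailCount = Math.min(messages.length, tailRounds * 2);
  const headCount = Math.min(messages.length - tailCount, 2);
  return {
    head: messages.slice(0, headCount),                             // never compressed
    middle: messages.slice(headCount, messages.length - tailCount), // summarization candidate
    tail: messages.slice(messages.length - tailCount),              // fully preserved
  };
}
```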



&lt;p&gt;&lt;strong&gt;Compression Trigger Strategy&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threshold&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Status Bar Display&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;td&gt;Yellow warning&lt;/td&gt;
&lt;td&gt;&lt;code&gt;tokens: 89k/128k (70%)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;td&gt;Auto-trigger archive&lt;/td&gt;
&lt;td&gt;Shows "compressing..."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/compact [focus description]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Execute immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
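&lt;p&gt;The table's thresholds reduce to a small check per round. A sketch, using the documented 70% / 85% values (the function name is illustrative):&lt;/p&gt;

```typescript
// Threshold check mirroring the trigger table above (70% warn, 85% compress).
type ContextAction = "ok" | "warn" | "compress";

function checkContextPressure(usedTokens: number, windowTokens: number): ContextAction {
  const ratio = usedTokens / windowTokens;
  if (ratio >= 0.85) return "compress"; // auto-trigger the archive step
  if (ratio >= 0.7) return "warn";      // yellow status-bar warning
  return "ok";
}
```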

&lt;p&gt;&lt;strong&gt;Summary Generation Prompt Template&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;COMPRESS_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`Compress the following conversation history into a structured summary.
Retain: Key decisions, pending items, technical constraints
Discard: Specific code implementations, debugging process, repeated discussions

Format:
- Decision: [matter]
- Constraint: [condition]
- Pending: [to be confirmed]

Conversation history:
{{history}}`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3.3 Summary Generation Strategy
&lt;/h3&gt;

&lt;p&gt;Archive compression isn't simple text truncation—an auxiliary model (a low-cost model such as gpt-4o-mini is recommended) generates a structured summary. For example:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Original Conversation (10 rounds)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: I want to write a SQLite storage layer
AI: Sure, I recommend using better-sqlite3...
User: Need to support WAL mode
AI: WAL mode configuration is as follows...
User: Also add FTS5 full-text search
AI: You can create a virtual table like this...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Compressed Summary&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Decision: Use better-sqlite3 as the database driver
- Configuration: Enable WAL mode (concurrent read/write, better performance)
- Feature: Add FTS5 virtual table for full-text search support
- Pending: Index fields to be confirmed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The summary retains &lt;strong&gt;decision points&lt;/strong&gt; and &lt;strong&gt;pending items&lt;/strong&gt;, discarding implementation details. If these details are needed later, they can be retrieved through &lt;code&gt;/search&lt;/code&gt; in the history.&lt;/p&gt;

&lt;h3&gt;
  
  
  3.4 Auxiliary Model Cost Reduction
&lt;/h3&gt;

&lt;p&gt;Archive compression can be configured with an independent auxiliary model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4o&lt;/span&gt;              &lt;span class="c1"&gt;# Main model, responsible for high-quality conversations&lt;/span&gt;

&lt;span class="na"&gt;auxiliary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;         &lt;span class="c1"&gt;# Auxiliary model, responsible for archive summaries&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A typical scenario:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100 rounds of conversation, cumulative 120k tokens consumed&lt;/li&gt;
&lt;li&gt;When triggering archive, use gpt-4o-mini to process 80k tokens of intermediate history&lt;/li&gt;
&lt;li&gt;Generate 2k tokens of summary&lt;/li&gt;
&lt;li&gt;Save approximately 60% of token consumption for subsequent rounds&lt;/li&gt;
&lt;/ul&gt;
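&lt;p&gt;A back-of-the-envelope check of the numbers above (the article's illustrative figures, not measurements):&lt;/p&gt;

```typescript
// Fraction of input tokens saved per subsequent round after replacing the
// archived middle history with a short summary.
function savingsRatio(contextTokens: number, archivedTokens: number, summaryTokens: number): number {
  const afterCompression = contextTokens - archivedTokens + summaryTokens;
  return (contextTokens - afterCompression) / contextTokens;
}
```

&lt;p&gt;With the figures above, &lt;code&gt;savingsRatio(120000, 80000, 2000)&lt;/code&gt; gives 0.65: replacing 80k tokens of middle history with a 2k summary leaves a 42k context, so subsequent rounds send roughly 65% fewer input tokens, consistent with the "~60%" figure.&lt;/p&gt;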




&lt;h2&gt;
  
  
  IV. Full-Text Search: Adding a Search Engine to History
&lt;/h2&gt;

&lt;p&gt;Summaries alone aren't enough—users often ask "what did we discuss before," requiring precise recall. memo-agent implements full-text search using FTS5 virtual tables on SQLite.&lt;/p&gt;

&lt;h3&gt;
  
  
  4.1 Table Structure Design
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Messages table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;role&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- 'user' | 'assistant' | 'tool' | 'system'&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;tool_calls&lt;/span&gt; &lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;-- Tool call records&lt;/span&gt;
  &lt;span class="n"&gt;token_count&lt;/span&gt; &lt;span class="nb"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- FTS5 full-text index virtual table (automatic tokenization, inverted index)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VIRTUAL&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;messages_fts&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;fts5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                    &lt;span class="c1"&gt;-- Index field&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'messages'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;-- Associated source table&lt;/span&gt;
  &lt;span class="n"&gt;content_rowid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;          &lt;span class="c1"&gt;-- Link through rowid&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.2 Automatic Synchronization Mechanism
&lt;/h3&gt;

&lt;p&gt;Triggers keep the FTS index synchronized with the source table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Automatically sync to FTS on insert&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;messages_fts_insert&lt;/span&gt; &lt;span class="k"&gt;AFTER&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;messages_fts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rowid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rowid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Automatically clean up FTS on delete&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;messages_fts_delete&lt;/span&gt; &lt;span class="k"&gt;AFTER&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;messages_fts&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;rowid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;old&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rowid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Sync changes on update&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;messages_fts_update&lt;/span&gt; &lt;span class="k"&gt;AFTER&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;messages_fts&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;rowid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;old&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rowid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.3 Query and Security Protection
&lt;/h3&gt;

&lt;p&gt;When the user types &lt;code&gt;/search sqlite WAL mode&lt;/code&gt;, the underlying execution is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;searchMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Escape FTS5 special characters to prevent syntax injection&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;safeQuery&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/"/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;""&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;// Escape double quotes&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\\\\&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// Escape backslashes&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\*&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;        &lt;span class="c1"&gt;// Remove wildcards (or keep based on requirements)&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
    SELECT m.*, s.title, rank
    FROM messages_fts f
    JOIN messages m ON f.rowid = m.rowid
    JOIN sessions s ON m.session_id = s.id
    WHERE messages_fts MATCH ?
    ORDER BY rank
    LIMIT ?
  `&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;`"&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;safeQuery&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\`&lt;/span&gt;&lt;span class="s2"&gt;, limit);
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Query Result Example&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; /search sqlite WAL mode

[Session: Database Design Discussion]
User: I want to write a SQLite storage layer
Assistant: Sure, I recommend using better-sqlite3 and enabling WAL mode...

[Session: Performance Optimization]
User: How to handle read-write conflicts in WAL mode?
Assistant: WAL mode supports concurrent read/write, but you need to configure busy_timeout...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.4 Performance Data
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;10,000 messages: Index build time ~200ms, query time &amp;lt; 10ms&lt;/li&gt;
&lt;li&gt;100,000 messages: Index size approximately 30% of original data, query time &amp;lt; 50ms&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  V. Summary
&lt;/h2&gt;

&lt;p&gt;Giving terminal AI assistants "long-term memory" and "extended conversation" capabilities centers on three designs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Local file memory&lt;/strong&gt; — Simple and reliable, automatic injection, with security scanning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three-zone compression model&lt;/strong&gt; — HEAD anchor + MIDDLE summary + TAIL active, balancing completeness and cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session chain + full-text search&lt;/strong&gt; — History never lost, key information retrievable&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These designs aren't silver bullets. Summaries lose details, automatic memory may introduce noise, and token counting is only approximate. But in engineering practice, they strike a good balance between &lt;strong&gt;usability&lt;/strong&gt; and &lt;strong&gt;cost&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Quick Start
&lt;/h3&gt;

&lt;p&gt;If you want to experience these features:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; memo-agent

&lt;span class="c"&gt;# Initialize configuration&lt;/span&gt;
memo init

&lt;span class="c"&gt;# Start conversation (automatically loads memory)&lt;/span&gt;
memo

&lt;span class="c"&gt;# View conversation history&lt;/span&gt;
memo &lt;span class="nt"&gt;--history&lt;/span&gt;

&lt;span class="c"&gt;# Return to specific session&lt;/span&gt;
memo &lt;span class="nt"&gt;--resume&lt;/span&gt; &amp;lt;session-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.2 Future Roadmap
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP Integration&lt;/strong&gt;: Connect external data sources (Notion, GitHub Issues, etc.) through Model Context Protocol&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal Memory&lt;/strong&gt;: Support OCR indexing and retrieval of images and code screenshots&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Archive Strategy&lt;/strong&gt;: Automatically determine compression granularity based on conversation importance, rather than simple token thresholds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaborative Memory&lt;/strong&gt;: Team-shared NOTES.md for unified project standards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building similar terminal AI tools, I'd be glad to compare notes.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Reference Implementation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project: &lt;a href="https://github.com/lxfu1/memo-agent" rel="noopener noreferrer"&gt;github.com/lxfu1/memo-agent&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Core Modules:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;src/context/compressor.ts&lt;/code&gt; — Three-zone compression implementation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;src/memory/notesManager.ts&lt;/code&gt; — Local file memory management&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;src/session/db.ts&lt;/code&gt; — SQLite and session chain design&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;src/model/streaming.ts&lt;/code&gt; — Streaming conversation processing&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Further Reading&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/chat-completions" rel="noopener noreferrer"&gt;OpenAI Chat Completions API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.sqlite.org/fts5.html" rel="noopener noreferrer"&gt;SQLite FTS5 Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/openai/tiktoken" rel="noopener noreferrer"&gt;tiktoken Tokenization Principles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol Specification&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>cli</category>
      <category>llm</category>
    </item>
    <item>
      <title>memo-agent is a terminal-based AI assistant application (Hermes Agent simplified version), built with TypeScript + React + Ink. It features persistent memory, MCP tool extensions, and intelligent context compression.

https://github.com/lxfu1/memo-agent</title>
      <dc:creator>Joel Alan</dc:creator>
      <pubDate>Thu, 23 Apr 2026 10:11:46 +0000</pubDate>
      <link>https://forem.com/lxfu1/memo-agent-is-a-terminal-based-ai-assistant-application-hermes-agent-simplified-version-built-1km8</link>
      <guid>https://forem.com/lxfu1/memo-agent-is-a-terminal-based-ai-assistant-application-hermes-agent-simplified-version-built-1km8</guid>
      <description>&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://github.com/lxfu1/memo-agent" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fopengraph.githubassets.com%2F15950e9353d333929a1ccd0b0db029ac891b6e5532ced3ef19708f78f9556018%2Flxfu1%2Fmemo-agent" height="600" class="m-0" width="1200"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://github.com/lxfu1/memo-agent" rel="noopener noreferrer" class="c-link"&gt;
            GitHub - lxfu1/memo-agent: memo-agent is a terminal-based AI assistant application (Hermes Agent simplified version), built with TypeScript + React + Ink. It connects directly to OpenAI-compatible APIs and features persistent memory, MCP tool extensions, and intelligent context compression. · GitHub
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            memo-agent is a terminal-based AI assistant application (Hermes Agent simplified version), built with TypeScript + React + Ink. It connects directly to OpenAI-compatible APIs and features persisten...
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.githubassets.com%2Ffavicons%2Ffavicon.svg" width="32" height="32"&gt;
          github.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
      <category>agents</category>
      <category>ai</category>
      <category>showdev</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
