<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Victor García</title>
    <description>The latest articles on Forem by Victor García (@micelclaw).</description>
    <link>https://forem.com/micelclaw</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3820897%2Fa121360b-8d01-406f-a889-9304625f2e47.png</url>
      <title>Forem: Victor García</title>
      <link>https://forem.com/micelclaw</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/micelclaw"/>
    <language>en</language>
    <item>
      <title>Designing an AI approval system: when should your agent ask for permission?</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Tue, 07 Apr 2026 10:38:11 +0000</pubDate>
      <link>https://forem.com/micelclaw/designing-an-ai-approval-system-when-should-your-agent-ask-for-permission-k94</link>
      <guid>https://forem.com/micelclaw/designing-an-ai-approval-system-when-should-your-agent-ask-for-permission-k94</guid>
      <description>&lt;p&gt;An AI agent that can only read data is safe but useless. An AI agent that can send emails, delete files, format disks, and configure VPNs is useful but terrifying. The entire value of a personal AI operating system comes from the agent acting on your behalf — and the entire risk comes from the same thing.&lt;/p&gt;

&lt;p&gt;We needed a system that says "yes" fast to everyday operations and "are you sure?" to dangerous ones. Not a blanket confirmation on everything (that just trains the user to click "approve" without reading). Not unrestricted access either (one prompt injection away from &lt;code&gt;rm -rf /&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;This post is about the 4-level approval system we built, the dual-layer architecture (shell + API), and the surprisingly difficult design decision of where to draw the line between "just do it" and "ask me first."&lt;/p&gt;

&lt;h2&gt;The two attack surfaces&lt;/h2&gt;

&lt;p&gt;An AI agent in our system can cause damage in two completely different ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shell execution.&lt;/strong&gt; The agent uses the runtime's &lt;code&gt;exec&lt;/code&gt; tool to run commands on the host machine. This is raw power — &lt;code&gt;curl&lt;/code&gt;, &lt;code&gt;ls&lt;/code&gt;, &lt;code&gt;grep&lt;/code&gt;, but also &lt;code&gt;rm&lt;/code&gt;, &lt;code&gt;dd&lt;/code&gt;, &lt;code&gt;python3 -c 'import os; os.system("...")'&lt;/code&gt;. The attack surface is the entire operating system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API operations.&lt;/strong&gt; The agent calls our REST API via &lt;code&gt;curl&lt;/code&gt;. The &lt;code&gt;curl&lt;/code&gt; command itself is harmless — it's the endpoint that's dangerous. &lt;code&gt;POST /storage/pools&lt;/code&gt; creates a RAID array. &lt;code&gt;DELETE /files/:id&lt;/code&gt; removes a file. &lt;code&gt;POST /emails/send&lt;/code&gt; sends an email you can't unsend. The business logic is the attack surface.&lt;/p&gt;

&lt;p&gt;These need separate control mechanisms because they have different risk profiles and different mitigation strategies.&lt;/p&gt;

&lt;h2&gt;Layer 1: Shell control&lt;/h2&gt;

&lt;p&gt;By default, the agent can only execute a small set of safe binaries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Safe bins: curl, jq, cat, echo, date, wc, head, tail, grep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The agent talks to the outside world through &lt;code&gt;curl&lt;/code&gt; to our REST API. Everything else — file manipulation, package installation, network commands, scripting — is blocked at the runtime level.&lt;/p&gt;

&lt;p&gt;There's an "Unrestricted Shell Mode" toggle in Settings → Security. It's deliberately scary: the toggle is marked in red, requires the user's password (not just a click), and shows a warning explaining that this allows the agent to execute any command on the system.&lt;/p&gt;

&lt;p&gt;Even in unrestricted mode, destructive commands (&lt;code&gt;rm&lt;/code&gt;, &lt;code&gt;dd&lt;/code&gt;, &lt;code&gt;mkfs&lt;/code&gt;, &lt;code&gt;fdisk&lt;/code&gt;) always require per-operation confirmation from the user. Full freedom doesn't mean no guardrails — it means the agent can attempt anything, but the user decides on dangerous operations.&lt;/p&gt;

&lt;p&gt;The key design principle: most users never enable unrestricted mode. The agent does everything it needs through the API. Shell access is a power-user feature for people who know what they're doing.&lt;/p&gt;
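&lt;p&gt;As a rough sketch, the two shell gates (the allowlist, plus always-confirm destructive binaries) could look like this — the bin lists mirror the ones above, but &lt;code&gt;checkCommand&lt;/code&gt; is an illustrative name, not the actual runtime's API:&lt;/p&gt;

```typescript
// Illustrative sketch of the two shell gates. SAFE_BINS and
// DESTRUCTIVE_BINS mirror the lists in the post; checkCommand is a
// hypothetical name, not the actual runtime's API.
const SAFE_BINS = new Set([
  "curl", "jq", "cat", "echo", "date", "wc", "head", "tail", "grep",
]);

const DESTRUCTIVE_BINS = new Set(["rm", "dd", "mkfs", "fdisk"]);

type ShellVerdict = "allow" | "block" | "needs-confirmation";

function checkCommand(command: string, unrestricted: boolean): ShellVerdict {
  // First token is the binary; strip any path prefix like /bin/
  const first = command.trim().split(/\s+/)[0] ?? "";
  const bin = first.split("/").pop() ?? "";

  // Destructive binaries need per-operation confirmation
  // even in unrestricted mode
  if (DESTRUCTIVE_BINS.has(bin)) return "needs-confirmation";

  // Unrestricted mode: anything else may run
  if (unrestricted) return "allow";

  // Default mode: only the safe allowlist runs
  return SAFE_BINS.has(bin) ? "allow" : "block";
}
```

&lt;p&gt;Note that &lt;code&gt;rm&lt;/code&gt; comes back as &lt;code&gt;needs-confirmation&lt;/code&gt; even with the unrestricted flag set — full freedom doesn't mean no guardrails.&lt;/p&gt;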

&lt;h2&gt;Layer 2: Operation approvals&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. Every API operation has an approval level:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Auto&lt;/td&gt;
&lt;td&gt;Execute immediately, no record. Reads, searches, listings.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Logged&lt;/td&gt;
&lt;td&gt;Execute immediately, log to audit trail. Creates, updates, non-destructive writes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Confirm&lt;/td&gt;
&lt;td&gt;Pause and ask the user "Are you sure?" before executing. Sends, deletes, infrastructure changes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Secure&lt;/td&gt;
&lt;td&gt;Pause, ask for confirmation AND a numeric PIN. Format disk, delete volume, system reset.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The approval level is checked by a Fastify middleware (preHandler) that runs before the route handler. If the request comes from an agent token and the operation requires Level 2+, the middleware returns a &lt;code&gt;202 Accepted&lt;/code&gt; with an &lt;code&gt;approval_id&lt;/code&gt; instead of executing the operation. The agent then asks the user for confirmation through whatever channel they're chatting on — Telegram, WhatsApp, or the web dashboard.&lt;/p&gt;
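&lt;p&gt;A minimal sketch of the operation-to-level lookup the middleware relies on — the routes and defaults mirror the tables in this post, but the internals of &lt;code&gt;getOperationLevel&lt;/code&gt; here are hypothetical (the real version also consults per-user overrides from Settings):&lt;/p&gt;

```typescript
// Hypothetical sketch of the operation-to-level lookup used by the
// middleware. Routes and defaults mirror the tables in the post; the
// real implementation also consults per-user overrides from Settings.
type Level = 0 | 1 | 2 | 3;

interface Rule {
  method: string;
  pattern: RegExp;
  level: Level;
}

const DEFAULT_RULES: Rule[] = [
  { method: "POST", pattern: /^\/notes$/, level: 1 },
  { method: "POST", pattern: /^\/emails\/send$/, level: 2 },
  { method: "DELETE", pattern: /^\/files\/[^/]+$/, level: 2 },
  { method: "POST", pattern: /^\/storage\/volumes\/[^/]+\/delete$/, level: 3 },
];

function getOperationLevel(method: string, url: string): Level {
  const path = url.split("?")[0] ?? url; // ignore the query string
  const rule = DEFAULT_RULES.find(
    (r) => r.method === method && r.pattern.test(path),
  );
  // Reads and unlisted operations fall through to Level 0 (auto)
  return rule ? rule.level : 0;
}
```

&lt;p&gt;So &lt;code&gt;GET /notes?search=project&lt;/code&gt; resolves to Level 0 and runs immediately, while &lt;code&gt;POST /emails/send&lt;/code&gt; resolves to Level 2 and pauses for approval.&lt;/p&gt;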

&lt;h3&gt;How it looks in practice&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Level 0 — Auto (reading notes):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "What notes do I have about the project?"
Agent: GET /notes?search=project&amp;amp;format=compact
→ Executes immediately. User sees results.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No friction. Reads are always auto.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 1 — Logged (creating a note):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Save a note about today's meeting decisions"
Agent: POST /notes {title: "Meeting decisions", content: "..."}
→ Executes immediately. Logged to audit trail.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Still no friction — the user asked for it. But the audit trail records that agent &lt;code&gt;francis&lt;/code&gt; created note &lt;code&gt;abc123&lt;/code&gt; at 14:32. If something goes wrong, there's accountability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 2 — Confirm (sending an email):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Send Ana the budget update"
Agent: POST /emails/send {to: "ana@techcorp.com", subject: "Budget Q3", ...}
→ 202 Accepted {approval_id: "req_xyz"}

Agent: "I'm about to send this email to Ana García:
        Subject: Budget Q3
        [Preview of the body]
        Should I send it?"

User: "Yes"
Agent: POST /approvals/req_xyz/approve
→ Email sent.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One confirmation step. The agent shows what it's about to do. The user says yes or no. This catches the common case where the agent misunderstood the intent — "I said Ana, not María" — without making every email a five-step process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 3 — Secure (deleting a volume):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Delete the old backup volume"
Agent: POST /storage/volumes/vol_old/delete
→ 202 Accepted {approval_id: "req_abc", level: 3}

Agent: "⚠️ This will permanently delete volume vol_old (2.3TB).
        This cannot be undone.
        Please confirm with your security PIN."

User: "4829"
Agent: POST /approvals/req_abc/approve {pin: "4829"}
→ Volume deleted.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two-factor: the user confirms AND enters their PIN. The PIN is a 4-6 digit numeric code set during initial setup, stored hashed with bcrypt. In messaging channels, the PIN message is deleted from chat history after verification (when the channel supports it).&lt;/p&gt;
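&lt;p&gt;To sketch the PIN check: the real system stores the PIN hashed with bcrypt; the self-contained example below substitutes Node's built-in &lt;code&gt;scrypt&lt;/code&gt; so it runs with no third-party dependencies — the shape of the verification is the same:&lt;/p&gt;

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Sketch of hashed-PIN storage and verification. The real system uses
// bcrypt; this example substitutes Node's built-in scrypt so it runs
// with no third-party dependencies.
function hashPin(pin: string): string {
  const salt = randomBytes(16);
  const hash = scryptSync(pin, salt, 32);
  return salt.toString("hex") + ":" + hash.toString("hex");
}

function verifyPin(pin: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false;
  const candidate = scryptSync(pin, Buffer.from(saltHex, "hex"), 32);
  // Constant-time comparison avoids leaking how many digits matched
  return timingSafeEqual(candidate, Buffer.from(hashHex, "hex"));
}
```

&lt;p&gt;The per-request salt means two users with the same PIN produce different hashes, and the constant-time compare keeps response timing from leaking partial matches.&lt;/p&gt;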

&lt;h2&gt;The approval lifecycle&lt;/h2&gt;

&lt;p&gt;Every Level 2+ operation creates an &lt;code&gt;approval_request&lt;/code&gt; record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;approval_requests&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;              &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;         &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;requested_by&lt;/span&gt;    &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- 'agent:francis'&lt;/span&gt;
    &lt;span class="k"&gt;operation&lt;/span&gt;       &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- 'POST /emails/send'&lt;/span&gt;
    &lt;span class="k"&gt;level&lt;/span&gt;           &lt;span class="nb"&gt;SMALLINT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;-- 2 or 3&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;         &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;-- "Send email to Ana: Budget Q3"&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;          &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                    &lt;span class="c1"&gt;-- Request body snapshot&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;          &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resolved_at&lt;/span&gt;     &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;pin_verified&lt;/span&gt;    &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expires_at&lt;/span&gt;      &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;      &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lifecycle is:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qbx1xet8glw8nzh3kwz.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qbx1xet8glw8nzh3kwz.webp" alt="Timeouts, reminder, escalation, expiry"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent triggers operation
    ↓
Middleware: level &amp;gt;= 2? → Create approval_request (status: pending)
    ↓
Notify user via WebSocket (Dash) + Gateway RPC (Telegram/WhatsApp)
    ↓
User approves, rejects, or ignores
    ↓
├── Approved → Execute operation, status: approved
├── Rejected → Return error to agent, status: rejected
└── Timeout (30min) → Auto-expire, status: expired
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three timeout stages prevent approvals from hanging forever:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Reminder&lt;/td&gt;
&lt;td&gt;5 minutes&lt;/td&gt;
&lt;td&gt;Send a reminder to the user&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Escalation&lt;/td&gt;
&lt;td&gt;15 minutes&lt;/td&gt;
&lt;td&gt;Notify the system owner (if different from the user)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expiry&lt;/td&gt;
&lt;td&gt;30 minutes&lt;/td&gt;
&lt;td&gt;Auto-reject the request&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All three are configurable in Settings → Security → Approval Timeouts.&lt;/p&gt;
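&lt;p&gt;A sketch of how a periodic sweep might apply the three stages to a pending approval — the intervals mirror the defaults above, while the record shape and function names are illustrative:&lt;/p&gt;

```typescript
// Sketch of the three-stage timeout sweep over pending approvals.
// Intervals mirror the defaults in the table; the record shape and
// function names are illustrative.
interface PendingApproval {
  id: string;
  createdAt: number; // epoch milliseconds
  reminderSent: boolean;
  escalated: boolean;
  status: "pending" | "expired";
}

const MINUTE = 60_000;
const TIMEOUTS = {
  reminder: 5 * MINUTE,    // nudge the user
  escalation: 15 * MINUTE, // notify the system owner
  expiry: 30 * MINUTE,     // auto-reject
};

// Returns the actions a periodic sweep should take for one approval.
function sweep(approval: PendingApproval, now: number): string[] {
  const actions: string[] = [];
  if (approval.status !== "pending") return actions;
  const age = now - approval.createdAt;

  if (age >= TIMEOUTS.expiry) {
    approval.status = "expired";
    actions.push("expire");
  } else if (age >= TIMEOUTS.escalation && !approval.escalated) {
    approval.escalated = true;
    actions.push("escalate");
  } else if (age >= TIMEOUTS.reminder && !approval.reminderSent) {
    approval.reminderSent = true;
    actions.push("remind");
  }
  return actions;
}
```

&lt;p&gt;Each stage fires at most once, and an expired request falls out of the sweep entirely on subsequent passes.&lt;/p&gt;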

&lt;h2&gt;What's configurable and what isn't&lt;/h2&gt;

&lt;p&gt;The default levels are sensible but not everyone agrees on what's "dangerous." A power user who sends 50 emails a day wants email sending at Level 1 (logged, no confirmation). A cautious user wants it at Level 2 (confirm every one).&lt;/p&gt;

&lt;p&gt;Settings → Security shows a table of operations with dropdown selectors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Operation                    Default    Your Level
─────────────────────────────────────────────────
Create note                  Logged     [1 - Logged ▼]
Send email                   Confirm    [2 - Confirm ▼]
Delete files (permanent)     Confirm    [2 - Confirm ▼]
Enable VPN                   Confirm    [2 - Confirm ▼]
Add VPN peer                 Confirm    [2 - Confirm ▼]
Delete volume                Secure     [3 - Secure  ▼]
Format disk                  Secure     [3 - Secure  ▼]
System reset                 Secure     [3 - Secure  ▼]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two constraints prevent dangerous configurations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 3 operations can't go below Level 2.&lt;/strong&gt; You can downgrade "Delete volume" from Secure (3) to Confirm (2), but not to Logged (1) or Auto (0). Destructive, irreversible operations always require at least one confirmation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 0 operations can't be upgraded.&lt;/strong&gt; Read operations are always auto. Making &lt;code&gt;GET /notes&lt;/code&gt; require confirmation would break the system — the agent would need approval to answer "what notes do I have?"&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Changing approval levels is itself a Level 2 operation — the system asks for confirmation before letting you change the security settings.&lt;/p&gt;
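&lt;p&gt;The two constraints above can be sketched as a validation step on the settings endpoint (the operation keys and function name here are illustrative, not the real configuration schema):&lt;/p&gt;

```typescript
// Sketch of the two guardrails on level changes. DEFAULT_LEVELS stands
// in for the per-operation defaults shown in the settings table; the
// keys are illustrative, not the real configuration schema.
type ApprovalLevel = 0 | 1 | 2 | 3;

const DEFAULT_LEVELS: { [operation: string]: ApprovalLevel } = {
  "read-notes": 0,
  "create-note": 1,
  "send-email": 2,
  "delete-volume": 3,
};

function validateLevelChange(
  operation: string,
  requested: ApprovalLevel,
): ApprovalLevel {
  const def = DEFAULT_LEVELS[operation];
  if (def === undefined) throw new Error("unknown operation: " + operation);

  // Constraint 1: Level 3 defaults can be relaxed to 2, never below
  if (def === 3 && requested < 2) {
    throw new Error(operation + " always requires at least one confirmation");
  }
  // Constraint 2: Level 0 reads stay at auto — upgrading them would
  // make the agent ask permission to answer questions
  if (def === 0 && requested !== 0) {
    throw new Error(operation + " is a read and always runs automatically");
  }
  return requested;
}
```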

&lt;h2&gt;Why agents can't approve their own requests&lt;/h2&gt;

&lt;p&gt;This sounds obvious but it's the most important security decision in the system: &lt;strong&gt;an agent API key cannot approve an approval request.&lt;/strong&gt; Only JWT tokens (human login via Dash) or system tokens can call &lt;code&gt;POST /approvals/:id/approve&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If the agent could approve its own requests, a prompt injection attack could chain: trigger the operation → intercept the approval request → approve it. The human-in-the-loop is only meaningful if the human is the one doing the approving.&lt;/p&gt;

&lt;p&gt;In messaging channels (Telegram, WhatsApp), the approval flows through the agent — the user says "yes" in the chat, and the agent calls the approve endpoint. But the approve endpoint verifies that the approval was triggered by a user message, not by the agent itself. The &lt;code&gt;requested_by&lt;/code&gt; field records which agent requested the operation, and the same agent cannot resolve it.&lt;/p&gt;
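&lt;p&gt;A sketch of that guard on the approve endpoint — the field names are illustrative, but the rules follow the text: agent API keys can't resolve approvals, and nothing identified as the requesting agent may resolve its own request:&lt;/p&gt;

```typescript
// Sketch of the guard on POST /approvals/:id/approve. Field names are
// illustrative; the rules follow the post: agent API keys never
// approve, and the requesting agent can never resolve its own request.
interface ApproveContext {
  authType: "jwt" | "agent" | "system";
  actor: string; // e.g. "user:victor" or "agent:francis"
}

interface ApprovalRecord {
  id: string;
  requestedBy: string; // e.g. "agent:francis"
  status: "pending" | "approved" | "rejected" | "expired";
}

function canApprove(ctx: ApproveContext, approval: ApprovalRecord): boolean {
  // Expired or already-resolved requests can't be approved
  if (approval.status !== "pending") return false;
  // Agent API keys never resolve approvals
  if (ctx.authType === "agent") return false;
  // Belt and braces: even a privileged token acting as the requesting
  // agent can't resolve that agent's own request
  if (ctx.actor === approval.requestedBy) return false;
  return true;
}
```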

&lt;h2&gt;The middleware: 15 lines that matter&lt;/h2&gt;

&lt;p&gt;The approval check is a Fastify preHandler that runs on every route with an assigned level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simplified — the real version handles edge cases&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;approvalMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Skip for human (JWT) requests — the Dash IS the confirmation&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;authType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;jwt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getOperationLevel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;level&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Auto or Logged — proceed&lt;/span&gt;

  &lt;span class="c1"&gt;// Level 2 or 3: create approval request and pause&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;approval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createApprovalRequest&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;requestedBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`agent:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiKeyName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;buildSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Notify user&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;notifyApproval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;approval&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Return 202 — the agent knows to ask the user&lt;/span&gt;
  &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;202&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;approval_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;approval&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;level&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;approval&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;hint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;level&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; 
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Ask user to confirm with PIN&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; 
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Ask user to confirm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;hint&lt;/code&gt; field in the 202 response tells the agent's skill what kind of confirmation to request. Level 2: ask for a yes/no. Level 3: ask for the PIN.&lt;/p&gt;

&lt;p&gt;Human requests from the Dash skip the middleware entirely. When you click "Send" in the email composer, you ARE the confirmation. The approval system only gates agent-initiated operations.&lt;/p&gt;

&lt;h2&gt;The skill: teaching the agent to ask&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;claw-approvals&lt;/code&gt; skill teaches the agent how to handle the approval flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Approval Protocol&lt;/span&gt;

When you receive a 202 response with an approval_id:
&lt;span class="p"&gt;1.&lt;/span&gt; Show the user what you're about to do (use the summary field)
&lt;span class="p"&gt;2.&lt;/span&gt; For Level 2: ask "Should I proceed?"
&lt;span class="p"&gt;3.&lt;/span&gt; For Level 3: ask "Please confirm with your security PIN"
&lt;span class="p"&gt;4.&lt;/span&gt; On "yes" or PIN: POST /approvals/{id}/approve (with pin if Level 3)
&lt;span class="p"&gt;5.&lt;/span&gt; On "no" or "cancel": POST /approvals/{id}/reject
&lt;span class="p"&gt;6.&lt;/span&gt; Never approve your own requests — always wait for user input
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skill also handles &lt;code&gt;/pending&lt;/code&gt; (show pending approvals), &lt;code&gt;/history&lt;/code&gt; (show past approvals), and edge cases like expired requests.&lt;/p&gt;

&lt;h2&gt;What I learned about where to draw the line&lt;/h2&gt;

&lt;p&gt;The hardest part wasn't building the system — it was deciding which operations go at which level. Some were obvious:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read anything → Level 0.&lt;/strong&gt; No debate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format disk → Level 3.&lt;/strong&gt; No debate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The middle ground is where every conversation happened:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Email sending: Level 1 or Level 2?&lt;/strong&gt; We went with Level 2 (Confirm) as the default. Email is irreversible — you can't unsend it. A misunderstood intent ("send Ana the budget" when you meant "draft Ana the budget") has real consequences. But we made it configurable because power users find the confirmation annoying.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating a note: Level 0 or Level 1?&lt;/strong&gt; We went with Level 1 (Logged). Creating a note is harmless — but logging it means the audit trail shows everything the agent did. If the agent starts creating garbage notes due to a bug, the log tells you when it started. Zero friction, full accountability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VPN operations: Level 2.&lt;/strong&gt; Enabling or disabling VPN changes network topology. Adding a peer grants network access to a device. These aren't destructive (you can undo them) but they affect security posture. Confirmation is appropriate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deleting files: Level 2, not Level 3.&lt;/strong&gt; Files have soft delete with restore capability. "Delete" doesn't mean "gone forever" — it means "moved to trash." If deletion were permanent (bypassing soft delete), it would be Level 3.&lt;/p&gt;

&lt;p&gt;The general heuristic: &lt;strong&gt;reversible actions need less approval than irreversible ones.&lt;/strong&gt; Creating is Level 1 (you can delete it). Sending is Level 2 (you can't unsend it). Destroying is Level 3 (it's gone forever).&lt;/p&gt;
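
<p>That heuristic fits in a lookup table. The operation names below are the examples from this post, not the full policy:</p>

```python
# The reversibility heuristic as a lookup. Operation names are the
# examples discussed in this post, not the complete policy table.

APPROVAL_LEVELS = {
    "note.create": 1,   # reversible: you can delete it
    "file.delete": 2,   # soft delete: restorable from trash
    "email.send":  2,   # irreversible: you can't unsend it
    "vpn.toggle":  2,   # undoable, but changes security posture
    "disk.format": 3,   # destructive: gone forever
}

def required_level(operation):
    # Unknown operations default to Confirm: err on the side of asking
    return APPROVAL_LEVELS.get(operation, 2)
```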

&lt;h2&gt;The dual-layer interaction&lt;/h2&gt;

&lt;p&gt;The two layers (shell + API) work together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;User:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Check my disk space"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Agent:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;exec&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"df -h"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Layer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"df"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;safe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;bins&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Execute&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;immediately&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Agent:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;GET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;/storage/overview?format=compact&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Layer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Level&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(read)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Execute&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;immediately&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Agent:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You have 450GB free on /data. The backup volume is 78% full."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No approvals needed — both layers allowed the operations silently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;User:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Set up a RAID1 mirror with the two new drives"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Agent:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;POST&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;/storage/pools&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;name:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mirror1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;level:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"raid1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;devices:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sda"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sdb"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Layer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Level&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(Confirm)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;202&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Accepted&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Agent:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I'm about to create a RAID1 pool using sda and sdb. 
        This will erase all data on both drives. Confirm?"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;User:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Yes"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Pool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;created.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in unrestricted shell mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;User:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Install htop on the server"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Agent:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;exec&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"apt-get install htop"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Layer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Not&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;safe&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;bins&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;unrestricted&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Ask&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;confirmation&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Agent:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I want to run: apt-get install htop
        This will install a package on your system. OK?"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;User:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Go ahead"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Layer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Approved&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Execute&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both layers enforce independently. An operation that passes Layer 1 (shell) can still be blocked by Layer 2 (API). An operation that bypasses Layer 2 (because it's a direct shell command) is still caught by Layer 1.&lt;/p&gt;
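
<p>A minimal sketch of that independent enforcement (the safe-bins list and function names are illustrative, not the real configuration):</p>

```python
# Both layers enforce independently; an operation must satisfy every
# layer it touches. The safe-bins list and names are illustrative.

SAFE_BINS = {"df", "du", "ls", "cat", "free", "uptime"}

def shell_layer(command, unrestricted):
    binary = command.split()[0]
    if binary in SAFE_BINS:
        return "execute"
    # Outside safe bins: unrestricted mode asks, restricted mode blocks
    return "confirm" if unrestricted else "block"

def api_layer(level):
    # Levels 0 and 1 run silently; levels 2 and 3 pause for approval
    return "execute" if level in (0, 1) else "confirm"
```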

&lt;h2&gt;What I'd do differently&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I'd add batch approvals from the start.&lt;/strong&gt; When the agent needs to send 15 emails from a mail merge, asking for 15 individual confirmations is absurd. A "batch approve" mechanism ("Send these 15 emails? Here's the list.") should have been in v1. It's now on the backlog.&lt;/p&gt;
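
<p>One possible shape for that batch request, sketched in Python (this is a design note for the backlog item, not shipped code):</p>

```python
# Not shipped: one possible shape for a batch approval, where a single
# confirmation covers a named list of operations.

batch = {
    "summary": "Send these 15 emails (mail merge)",
    "level": 2,
    "operations": [
        {"op": "email.send", "to": f"recipient{i}@example.com"}
        for i in range(15)
    ],
}

def resolve_batch(batch, user_confirmed):
    # All-or-nothing: one "yes" approves the whole list, one "no" rejects it
    return batch["operations"] if user_confirmed else []
```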

&lt;p&gt;&lt;strong&gt;I'd make the approval history more visible.&lt;/strong&gt; The audit trail exists (&lt;code&gt;GET /approvals/history&lt;/code&gt;), and there's an "Approvals History" section in the sidebar, but it should be more prominent. A weekly summary — "Your agent executed 340 operations this week: 280 auto, 55 logged, 5 confirmed" — would build trust and help users understand their agent's activity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd reconsider the messaging channel PIN flow.&lt;/strong&gt; Sending a PIN via Telegram is not ideal — it's visible in chat history even if the agent tries to delete the message. For Level 3 operations, maybe the system should redirect to the Dash where a secure input modal exists, rather than accepting PINs in plain text through messaging.&lt;/p&gt;

&lt;h2&gt;What's next: the sandbox&lt;/h2&gt;

&lt;p&gt;There's a missing piece between "restricted mode" (curl only) and "unrestricted mode" (everything, with confirmations). What if the agent could have a place to go completely wild — install packages, run services, break things — without any risk to your real system?&lt;/p&gt;

&lt;p&gt;We're planning &lt;strong&gt;sandbox environments&lt;/strong&gt;: Docker containers that the agent can create from Settings → Security. Not one — as many as you need. Each sandbox is an isolated machine with full root access. &lt;code&gt;apt install&lt;/code&gt;, &lt;code&gt;pip install&lt;/code&gt;, &lt;code&gt;systemctl&lt;/code&gt;, custom scripts, databases, web servers — anything goes. Zero approval, zero restrictions, zero risk to the host.&lt;/p&gt;

&lt;p&gt;The workflow we're designing around:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;Sandbox "deploy-test"     → experimenting with nginx + certbot config&lt;/span&gt;
&lt;span class="s"&gt;Sandbox "ml-pipeline"     → building a data processing pipeline with pandas&lt;/span&gt;
&lt;span class="s"&gt;Sandbox "new-skill"       → developing and testing a new agent skill&lt;/span&gt;

&lt;span class="na"&gt;Each one&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;independent, disposable, unrestricted.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent knows which sandbox it's working in. Commands routed to a sandbox go to that container. Commands on the real system go through normal approval layers. The two worlds don't touch — no shared volumes, no network bridge to internal services, no mount to &lt;code&gt;/data&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The key is the &lt;strong&gt;promote-to-production&lt;/strong&gt; flow. Once you've got something working in the sandbox — a configuration, a script, a service setup — you tell the agent "promote this to production." At that point, and only at that point, normal approval rules kick in. The agent needs Level 2 confirmation to copy files to the host, Level 2 to install a package on the real system, Level 3 to modify infrastructure. The sandbox is the drafting table; production is the real thing.&lt;/p&gt;

&lt;p&gt;If a sandbox gets trashed, nuke it and spawn a fresh one in seconds. The sandboxes are cheap — a base Debian image with internet access and a persistent volume for the workspace. Multiple sandboxes can run simultaneously for different experiments without interfering with each other or with the host.&lt;/p&gt;
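
<p>Since this isn't implemented yet, here is only a sketch of how spawning one might look, assuming plain Docker under the hood. The flags and the resource cap are assumptions, not the final design:</p>

```python
# Hypothetical: building the docker run invocation for a disposable
# sandbox. The flags are plausible Docker options, not the real design.

def sandbox_command(name, image="debian:bookworm"):
    return [
        "docker", "run", "-d",
        "--name", f"sandbox-{name}",
        "--network", "bridge",       # internet access, no link to host services
        "--memory", "4g",            # assumed per-sandbox resource cap
        "-v", f"sandbox-{name}-ws:/workspace",  # persistent workspace volume
        image, "sleep", "infinity",
    ]
```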

&lt;p&gt;This isn't implemented yet, and there are open design questions: should sandboxes have read-only access to the real API (for testing skills against real data)? Should there be resource limits per sandbox (CPU, RAM, disk)? What's the UX for promoting — file-by-file or snapshot the whole container? We'd love input from anyone who's built agent sandboxing — this is genuinely uncharted territory for personal AI systems.&lt;/p&gt;

&lt;h2&gt;The takeaway&lt;/h2&gt;

&lt;p&gt;An AI approval system needs two properties: it must be &lt;strong&gt;fast for safe operations&lt;/strong&gt; (no friction on reads, minimal friction on writes) and &lt;strong&gt;deliberate for dangerous ones&lt;/strong&gt; (explicit confirmation, PIN for irreversible actions, timeout for stale requests).&lt;/p&gt;

&lt;p&gt;Four levels handle this: auto (reads), logged (writes), confirm (irreversible), secure (destructive + PIN). Two layers: shell control for host-level commands, API control for business operations. One principle: agents cannot approve their own requests.&lt;/p&gt;

&lt;p&gt;The system processes ~95% of operations at Level 0 or 1 — invisible to the user. The 5% that require confirmation are the operations where a mistake actually matters: sending an email to the wrong person, deleting a volume, configuring network access. Those 5% are where trust is built or broken.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: building a process manager — how we manage Docker containers, systemd services, and Ollama models from a single dashboard with auto-start, health monitoring, and graceful shutdown.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>architecture</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>PII-aware routing: how to use cloud AI and keep your sensitive data local</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Fri, 27 Mar 2026 14:23:21 +0000</pubDate>
      <link>https://forem.com/micelclaw/pii-aware-routing-how-to-use-cloud-ai-and-keep-your-sensitive-data-local-1m40</link>
      <guid>https://forem.com/micelclaw/pii-aware-routing-how-to-use-cloud-ai-and-keep-your-sensitive-data-local-1m40</guid>
      <description>&lt;p&gt;Here's the tension at the heart of every personal AI system: cloud models are better at reasoning, but your data is private. A self-hosted system can run everything locally — but a 2B parameter model on a mini-PC isn't going to draft a nuanced email response or analyze a complex financial situation the way a frontier model can.&lt;/p&gt;

&lt;p&gt;The naive solutions are both bad. "Send everything to the cloud" means your diary entries, medical notes, and financial records pass through someone else's servers. "Run everything locally" means accepting worse reasoning on tasks where model quality actually matters.&lt;/p&gt;

&lt;p&gt;We built a third option: a PII-aware routing layer that classifies every piece of data by sensitivity, routes it to the right model, and pseudonymizes anything sensitive that needs cloud reasoning power.&lt;/p&gt;

&lt;h2&gt;The classification: four levels, zero LLM calls&lt;/h2&gt;

&lt;p&gt;Every record in the system gets a sensitivity level. The classification is entirely deterministic — regex patterns and domain rules. No LLM in the classification loop, because sending data to an LLM to decide if the data is too sensitive to send to an LLM is a circular problem.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;th&gt;Example domains&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;low&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Public or low-risk data&lt;/td&gt;
&lt;td&gt;Events, bookmarks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;normal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Common personal data&lt;/td&gt;
&lt;td&gt;Notes, contacts, files, diary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;high&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sensitive personal data&lt;/td&gt;
&lt;td&gt;Emails, financial transactions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;critical&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Never leaves the device&lt;/td&gt;
&lt;td&gt;Medical/health data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each domain has a default sensitivity level. Events are &lt;code&gt;low&lt;/code&gt; — knowing you have a meeting at 3pm isn't particularly sensitive. Emails are &lt;code&gt;high&lt;/code&gt; — they contain names, addresses, business context, and sometimes confidential information. Health entries are &lt;code&gt;critical&lt;/code&gt; — always local, no exceptions.&lt;/p&gt;

&lt;p&gt;But domains are just the baseline. The classifier also scans content for PII patterns that override the default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Email addresses     → elevate to high minimum
Phone numbers       → elevate to high minimum
Credit card numbers → elevate to high minimum
IBAN codes          → elevate to high minimum
SSN / DNI / NIE     → elevate to high minimum
Medical terminology → elevate to critical
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A note titled "Grocery list" stays at &lt;code&gt;normal&lt;/code&gt;. A note containing "Dr. García prescribed 20mg omeprazole" gets elevated to &lt;code&gt;critical&lt;/code&gt; because the regex matched medical terminology. The content drives the classification, not just the domain.&lt;/p&gt;

&lt;p&gt;This is deliberately conservative. The regex patterns over-match — "Dr." triggers medical detection even if it's "Dr. Pepper." False positives mean data gets routed locally when it could have gone to the cloud. False negatives mean sensitive data leaks. Over-matching is the correct failure mode.&lt;/p&gt;
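
<p>A minimal sketch of that classifier, with heavily simplified stand-in patterns (the real rule set covers phones, card numbers, IBAN, and national IDs as well):</p>

```python
import re

# Simplified sketch of the deterministic classifier: domain defaults
# plus regex elevation. These two patterns stand in for a larger set.

LEVELS = ["low", "normal", "high", "critical"]
DOMAIN_DEFAULTS = {"events": "low", "notes": "normal",
                   "emails": "high", "health": "critical"}

ELEVATION_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.\w+"), "high"),                    # email address
    (re.compile(r"\bDr\.|\bprescribed\b|\bblood pressure\b", re.I),
     "critical"),                                                     # medical terms
]

def classify(domain, text):
    level = DOMAIN_DEFAULTS.get(domain, "normal")
    for pattern, floor in ELEVATION_PATTERNS:
        if pattern.search(text):
            # Elevate to at least the pattern's floor, never downgrade
            level = max(level, floor, key=LEVELS.index)
    return level
```

Note the deliberate over-match: "Dr. Pepper" trips the medical pattern just like "Dr. López" does, which is the intended failure mode.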

&lt;h2&gt;The routing decision&lt;/h2&gt;

&lt;p&gt;Once classified, the router decides where each piece of data goes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;low / normal  → Cloud LLM — best reasoning
high          → Cloud LLM WITH pseudonymization — good reasoning, protected data
critical      → Local model only (Ollama) — or skip if Ollama unavailable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The decision isn't binary "local vs cloud." There's a middle path: pseudonymize the sensitive parts, send to the cloud for reasoning, and de-pseudonymize the response before the user sees it.&lt;/p&gt;

&lt;p&gt;This matters because most tasks involving sensitive data don't need the sensitive parts for reasoning. "Summarize this email thread" needs the content structure and topic — not the actual names and email addresses. "What's the sentiment of this diary entry?" needs the emotional content — not the specific people mentioned.&lt;/p&gt;
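
<p>As a function, the routing table is tiny. Destination names follow this post; the availability flag stands in for a real Ollama health check:</p>

```python
# The routing table as a function. Destinations follow the post; the
# availability flag is a stand-in for a real Ollama health check.

def route(sensitivity, ollama_available):
    if sensitivity in ("low", "normal"):
        return "cloud"
    if sensitivity == "high":
        return "cloud_pseudonymized"
    # critical: local only; skip rather than fall back to the cloud
    return "local" if ollama_available else "skip"
```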

&lt;h2&gt;The pseudonymizer&lt;/h2&gt;

&lt;p&gt;When a &lt;code&gt;high&lt;/code&gt; sensitivity record needs cloud processing, the pseudonymizer replaces PII with consistent tokens:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Entity type&lt;/th&gt;
&lt;th&gt;Pseudonym format&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Person&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Person_XXXX&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"Ana García" → &lt;code&gt;Person_A3F2&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Email&lt;/td&gt;
&lt;td&gt;&lt;code&gt;email_XXXX@example.com&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"&lt;a href="mailto:ana@techcorp.com"&gt;ana@techcorp.com&lt;/a&gt;" → &lt;code&gt;email_7B1C@example.com&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phone&lt;/td&gt;
&lt;td&gt;&lt;code&gt;+00-XXXX-0000&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"+34 612 345 678" → &lt;code&gt;+00-E5D9-0000&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Organization&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Org_XXXX&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"TechCorp" → &lt;code&gt;Org_4C8A&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Location&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Location_XXXX&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;"Calle Sagasta 15" → &lt;code&gt;Location_B2E1&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three properties make this work:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consistency.&lt;/strong&gt; The same value always produces the same pseudonym (SHA-256 of the original value, truncated). "Ana García" is always &lt;code&gt;Person_A3F2&lt;/code&gt;, in every record, in every session. This means the cloud model can reason about relationships: "Person_A3F2 sent 3 emails to Person_B7D1 about Org_4C8A" preserves the structure even though the names are hidden.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reversibility.&lt;/strong&gt; The &lt;code&gt;pseudonym_map&lt;/code&gt; table stores every mapping. When the cloud model's response comes back, the system replaces all pseudonyms with real values before storing or displaying the result. The user never sees &lt;code&gt;Person_A3F2&lt;/code&gt; — they see "Ana García."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Persistence.&lt;/strong&gt; Mappings survive across sessions. If "Ana García" was pseudonymized yesterday and appears again today, she gets the same pseudonym. This means the cloud model can build consistent context across multiple interactions without ever learning the real name.&lt;/p&gt;

&lt;p&gt;The detection itself uses regex — no LLM call. It's the same NER-lite approach as the sensitivity classifier: pattern matching for emails, phones, card numbers, and named entity patterns for persons and organizations. Not perfect, but fast and deterministic.&lt;/p&gt;
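
<p>A sketch of the three properties together, using an in-memory dict where the real system uses the <code>pseudonym_map</code> table (the exact truncation length and casing are assumptions):</p>

```python
import hashlib

# Sketch of the pseudonymizer: consistent tokens via truncated SHA-256
# and a reversible map (a dict here, the pseudonym_map table in the system).

pseudonym_map = {}  # token mapped back to the original value

def pseudonymize(value, entity_type="Person"):
    # Same value, same digest, same token, in every record and session
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:4].upper()
    token = f"{entity_type}_{digest}"
    pseudonym_map[token] = value
    return token

def depseudonymize(text):
    # Restore every known token before storing or displaying a response
    for token, original in pseudonym_map.items():
        text = text.replace(token, original)
    return text
```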

&lt;h2&gt;What this looks like in practice&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: Calendar event (low sensitivity)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User asks: "What's on my calendar tomorrow?"&lt;/p&gt;

&lt;p&gt;The system fetches tomorrow's events. Events are &lt;code&gt;low&lt;/code&gt; sensitivity. The full data — titles, locations, attendees — goes straight to the cloud model. No pseudonymization needed. The model reasons about the schedule and responds with a natural summary.&lt;/p&gt;

&lt;p&gt;Cost: one cloud API call. Privacy: no sensitive data exposed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: Email analysis (high sensitivity)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User asks: "Summarize the email thread about the partnership."&lt;/p&gt;

&lt;p&gt;The email thread is &lt;code&gt;high&lt;/code&gt; sensitivity (default for emails). Before sending to the cloud model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Original: "Ana García &amp;lt;ana@techcorp.com&amp;gt; wrote: Hi Paco, 
regarding the TechCorp partnership with NexaTech..."

Pseudonymized: "Person_A3F2 &amp;lt;email_7B1C@example.com&amp;gt; wrote: 
Hi Person_0D4E, regarding the Org_4C8A partnership with Org_9F3B..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cloud model receives the pseudonymized version. It can still analyze the thread structure, identify that Person_A3F2 is negotiating with Person_0D4E, and summarize the key points. The reasoning quality is nearly identical — the model doesn't need to know the real names to understand the negotiation dynamics.&lt;/p&gt;

&lt;p&gt;The response comes back with pseudonyms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Person_A3F2 proposed a revenue-sharing model with Org_9F3B. 
Person_0D4E agreed in principle but requested..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system de-pseudonymizes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Ana García proposed a revenue-sharing model with NexaTech. 
Paco agreed in principle but requested..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cost: one cloud API call + ~2ms pseudonymization. Privacy: no real names or emails left the device.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3: Health data (critical sensitivity)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User asks: "What medications am I taking?"&lt;/p&gt;

&lt;p&gt;Health entries are &lt;code&gt;critical&lt;/code&gt;. They never leave the device, period. The system routes to the local Ollama model. If Ollama is unavailable, the query fails gracefully — it does NOT fall back to the cloud.&lt;/p&gt;

&lt;p&gt;The local model's response might be less polished, but for medical data retrieval, the task is usually simple: find the records and list them. A 2B model handles that fine.&lt;/p&gt;

&lt;p&gt;Cost: one local model call. Privacy: absolute — zero data exposure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario 4: Note with accidental PII (elevated sensitivity)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User creates a note: "Meeting with Dr. López about the lab results. Blood pressure 140/90."&lt;/p&gt;

&lt;p&gt;The note's domain is &lt;code&gt;normal&lt;/code&gt;, but the content contains medical terminology ("Dr.", "lab results", "blood pressure"). The classifier elevates it to &lt;code&gt;critical&lt;/code&gt;. From this point on, this note is treated like health data — local only.&lt;/p&gt;

&lt;p&gt;The user didn't tag it as medical. They didn't configure anything. The system caught it automatically. Conservative false positives are the design choice: if a note mentions "Dr. Pepper," it gets elevated too. That's a minor inconvenience (one note processed locally instead of in the cloud) with zero privacy risk.&lt;/p&gt;

&lt;h2&gt;The audit trail&lt;/h2&gt;

&lt;p&gt;Every routing decision is logged:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;What it records&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;domain&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Which data domain (notes, emails, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;record_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Which specific record&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sensitivity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Classified sensitivity level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;action&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;What happened: &lt;code&gt;sent_pseudonymized&lt;/code&gt;, &lt;code&gt;sent_plain&lt;/code&gt;, &lt;code&gt;blocked&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;destination&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Where it went: &lt;code&gt;embeddings&lt;/code&gt;, &lt;code&gt;contextual_retrieval&lt;/code&gt;, &lt;code&gt;sleep_time&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;pii_routing_log&lt;/code&gt; table creates a complete audit of what data was exposed to which processing pipeline. If you ever need to answer "did my medical data ever touch a cloud service?", the answer is in the log.&lt;/p&gt;

&lt;p&gt;This is also how we verify the system works correctly. The log shows every routing decision. If a &lt;code&gt;critical&lt;/code&gt; record ever appears with action &lt;code&gt;sent_plain&lt;/code&gt; and a cloud destination, that's a bug — and the log caught it.&lt;/p&gt;
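
<p>That verification can be expressed as a query over the log rows. Field names follow the table above; which destinations count as cloud-bound is an assumption here:</p>

```python
# Invariant check over pii_routing_log rows. Field names follow the
# audit table; the set of cloud-bound destinations is assumed.

CLOUD_BOUND = {"embeddings", "contextual_retrieval"}

def find_violations(log_rows):
    # A critical record sent in plain text to a cloud pipeline is a bug
    return [
        row for row in log_rows
        if row["sensitivity"] == "critical"
        and row["action"] == "sent_plain"
        and row["destination"] in CLOUD_BOUND
    ]
```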

&lt;h2&gt;Where routing applies&lt;/h2&gt;

&lt;p&gt;PII-aware routing isn't just for chat interactions. It applies everywhere the system sends data to an LLM:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embeddings.&lt;/strong&gt; When generating semantic embeddings, the text is classified before being sent to the embedding model. If you're using a cloud embedding API (future option), &lt;code&gt;high&lt;/code&gt; and &lt;code&gt;critical&lt;/code&gt; records get embedded locally via Ollama instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contextual retrieval.&lt;/strong&gt; The HyDE pipeline (generating hypothetical answers for better search) uses LLM calls. If the search touches sensitive domains, those calls route through the pseudonymizer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sleep-time compute.&lt;/strong&gt; The background intelligence jobs process records during idle periods. The enrichment job (re-extracting entities from hot records) respects the same routing rules — a &lt;code&gt;critical&lt;/code&gt; record only gets re-extracted if Ollama is available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Entity extraction.&lt;/strong&gt; When the CRUD hooks pipeline sends text to the LLM for entity extraction, the same classification applies. A health-related note gets extracted locally.&lt;/p&gt;

&lt;p&gt;The routing layer sits between every LLM consumer in the system and the actual model call. It's middleware — invisible to the features that use it, enforced consistently everywhere.&lt;/p&gt;

&lt;h2&gt;The multi-agent dimension&lt;/h2&gt;

&lt;p&gt;With a multi-agent topology (7 agents in our system), PII routing gets another layer: agent scoping.&lt;/p&gt;

&lt;p&gt;Each agent has a scoped token that defines what domains it can access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Francis (main)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="s"&gt;notes:*, events:*, emails:*, contacts:*, diary:*&lt;/span&gt;
&lt;span class="na"&gt;Sentinel (infra)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;storage:*, hal:*, network:*&lt;/span&gt;
&lt;span class="na"&gt;Dalí (creative)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;    &lt;span class="s"&gt;photos:read, files:read&lt;/span&gt;
&lt;span class="na"&gt;Ledger (finance)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;finance:*, crm:*&lt;/span&gt;
&lt;span class="na"&gt;Darwin (analytics)&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;graph:*, insights:*&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sentinel can't access emails. Dalí can't read the diary. This is enforced at the API level — even if a prompt injection tricks Dalí into requesting diary entries, the scoped token blocks it.&lt;/p&gt;

&lt;p&gt;Combined with PII routing, this creates defense in depth:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Agent scoping&lt;/strong&gt; prevents access to domains the agent shouldn't touch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sensitivity classification&lt;/strong&gt; catches PII regardless of domain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pseudonymization&lt;/strong&gt; protects data that needs cloud processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit logging&lt;/strong&gt; records everything for verification&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A prompt injection attack would need to bypass all four layers to exfiltrate sensitive data. The scoping blocks the API call. The classification catches the content. The pseudonymizer strips the PII. The audit log records the attempt.&lt;/p&gt;
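&lt;p&gt;Layer 1 is the cheapest to sketch. Assuming a scope grammar of &lt;code&gt;domain:*&lt;/code&gt; or &lt;code&gt;domain:action&lt;/code&gt; (as in the table above — these scope sets are hypothetical), the API-side check is a set lookup:&lt;/p&gt;

```python
# Hypothetical scope sets, mirroring two of the agents above.
SCOPES = {
    "sentinel": {"storage:*", "hal:*", "network:*"},
    "dali": {"photos:read", "files:read"},
}

def allowed(agent, domain, action):
    """A scoped token grants either a whole domain (domain:*)
    or a single action (e.g. domain:read)."""
    scopes = SCOPES.get(agent, set())
    return f"{domain}:*" in scopes or f"{domain}:{action}" in scopes
```

&lt;p&gt;Even if Dalí is tricked into asking, a request for &lt;code&gt;diary:read&lt;/code&gt; fails before any query runs.&lt;/p&gt;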

&lt;h2&gt;
  
  
  What we explicitly don't do
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;We don't use ML for classification.&lt;/strong&gt; A fine-tuned classifier could be more accurate than regex patterns. But it would need to see the data to classify it — which means sending potentially sensitive data to a model before deciding if it's safe to send to a model. Regex is dumber but has zero data exposure during classification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We don't redact — we pseudonymize.&lt;/strong&gt; Redaction (&lt;code&gt;[REDACTED]&lt;/code&gt;) destroys information the cloud model needs for reasoning. Pseudonymization preserves structure ("Person_A sent an email to Person_B") while hiding identity. The cloud model can still reason about relationships, quantities, and patterns.&lt;/p&gt;
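&lt;p&gt;A trimmed sketch of the idea — the real pseudonymizer also handles entity types, persistence, and collisions, and the salt handling shown here is an assumption:&lt;/p&gt;

```python
import hashlib

class Pseudonymizer:
    """SHA-256-based pseudonyms: the same value always maps to the
    same alias, so cross-record structure survives in cloud prompts."""
    def __init__(self, salt="per-user-secret"):
        self.salt = salt
        self.reverse = {}  # alias -> original, kept local for de-aliasing

    def alias(self, value, kind="Person"):
        digest = hashlib.sha256((self.salt + value).encode()).hexdigest()[:8]
        name = f"{kind}_{digest}"
        self.reverse[name] = value
        return name
```

&lt;p&gt;"Ana García" becomes something like &lt;code&gt;Person_3f2a…&lt;/code&gt; — stable across calls, so "Person_A emailed Person_B" stays coherent to the cloud model.&lt;/p&gt;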

&lt;p&gt;&lt;strong&gt;We don't let the user override &lt;code&gt;critical&lt;/code&gt;.&lt;/strong&gt; You can change a record's sensitivity from &lt;code&gt;normal&lt;/code&gt; to &lt;code&gt;high&lt;/code&gt; manually. You cannot downgrade &lt;code&gt;critical&lt;/code&gt; to anything else. Health data stays local regardless of user preferences. This is a deliberate paternalistic choice — the privacy risk of accidentally exposing medical data outweighs the convenience of sending it to a better model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We don't route based on the LLM provider's privacy policy.&lt;/strong&gt; Whether provider A's privacy policy is better than provider B's is irrelevant. The system treats all cloud LLMs identically: external services that should never see &lt;code&gt;critical&lt;/code&gt; data and should only see &lt;code&gt;high&lt;/code&gt; data in pseudonymized form. Trust the math, not the terms of service.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I'd add per-field sensitivity, not just per-record.&lt;/strong&gt; Currently, a contact record is &lt;code&gt;normal&lt;/code&gt; even though the &lt;code&gt;phones&lt;/code&gt; field is arguably more sensitive than the &lt;code&gt;company&lt;/code&gt; field. Per-field classification would let us pseudonymize just the phone number while sending the company name to the cloud. More precise, but also more complex — the pseudonymizer would need to understand JSON field structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd build a sensitivity dashboard earlier.&lt;/strong&gt; The &lt;code&gt;pii_routing_log&lt;/code&gt; has all the data, but there's no UI for it yet. A dashboard showing "this week: 450 records processed, 380 sent plain, 65 pseudonymized, 5 blocked" would build user trust and make the privacy system tangible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd make the regex patterns configurable.&lt;/strong&gt; Different users have different sensitivity needs. A doctor might want "aspirin" to be flagged as medical. A pharmacist might want it treated as normal. The current patterns are one-size-fits-all, which means they're too aggressive for some users and not aggressive enough for others.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;The privacy problem in personal AI isn't "local vs cloud." It's "which data goes where." Most of your data is fine to send to a cloud model — your calendar events and bookmark titles aren't secrets. Some data needs protection but can still benefit from cloud reasoning — pseudonymize it and send the structure without the identity. A small fraction of data should never leave your device — route it locally and accept the quality trade-off.&lt;/p&gt;

&lt;p&gt;Three components: a regex classifier (zero LLM calls, deterministic), a SHA-256 pseudonymizer (consistent, reversible, persistent), and a routing table (domain defaults + content elevation). No ML, no fine-tuning, no privacy policy trust assumptions.&lt;/p&gt;

&lt;p&gt;The system processes your medical notes with a 2B local model and your calendar queries with a cloud model. It knows the difference because a regex matched "blood pressure" — not because it asked an AI what's sensitive.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: designing an AI approval system — when should your agent ask for permission, and how do you build a confirmation workflow that doesn't slow everything down?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>privacy</category>
      <category>ai</category>
      <category>architecture</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>Sleep-time compute for personal data: what your AI should do while you sleep</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Thu, 26 Mar 2026 12:01:52 +0000</pubDate>
      <link>https://forem.com/micelclaw/sleep-time-compute-for-personal-data-what-your-ai-should-do-while-you-sleep-13fj</link>
      <guid>https://forem.com/micelclaw/sleep-time-compute-for-personal-data-what-your-ai-should-do-while-you-sleep-13fj</guid>
      <description>&lt;p&gt;Your personal AI assistant sits idle most of the day. You send it a message, it responds, then it waits. For hours. Maybe all night. The compute is there — the model is loaded, the database is running, the server is warm. But nothing happens until you type the next message.&lt;/p&gt;

&lt;p&gt;That's test-time compute: work done when the user asks for it. Letta's research (arXiv:2504.13171) showed that shifting processing to idle periods — sleep-time compute — achieves 5× fewer tokens at test time and 15% more correct answers. But their implementation only processes conversation memory. Nobody had applied it to structured personal data.&lt;/p&gt;

&lt;p&gt;We did.&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea
&lt;/h2&gt;

&lt;p&gt;Instead of the agent doing all its thinking when you ask a question, it does most of the thinking in the background — during idle periods when you're not using the system. When you finally ask "what's going on with Project Tempest?", the answer is already half-assembled.&lt;/p&gt;

&lt;p&gt;The system maintains four background jobs that run every 30 minutes, but only when you're idle:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;Job&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Enrich connections&lt;/td&gt;
&lt;td&gt;Finds hot records with few graph links, re-runs entity extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Generate summary&lt;/td&gt;
&lt;td&gt;Compiles a weekly overview from the change log&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Detect patterns&lt;/td&gt;
&lt;td&gt;Discovers entities that co-occur but aren't linked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Update preferences&lt;/td&gt;
&lt;td&gt;Learns behavioral patterns from your data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each job consumes tokens from a configurable budget (default: 5,000 tokens per execution). When the budget runs out, lower-priority jobs get skipped. This means enriching connections (the most impactful job) always runs, while preference learning (the least time-sensitive) gets skipped first if resources are tight.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trigger: idle detection
&lt;/h2&gt;

&lt;p&gt;The engine only runs when you're not using the system. If you're actively writing notes or reading emails, the background jobs wait. This matters for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource contention.&lt;/strong&gt; The LLM (whether local via Ollama or remote via API) is a shared resource. Background jobs competing with user requests for model access would add latency to the interactions you actually care about.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Relevance.&lt;/strong&gt; Sleep-time processing works on data that has settled. Running entity extraction on a note you're still editing wastes tokens — the note will change again in 30 seconds. Waiting until you're idle means processing stable data.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The idle detector is simple: if no user activity (API requests from the dashboard, agent messages, WebSocket heartbeats) has occurred in the last N minutes, the user is idle. The scheduler checks this condition before executing each run.&lt;/p&gt;
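&lt;p&gt;A sketch of that detector — the 15-minute threshold is an assumption, and the real activity sources feed it via whatever hook the server exposes:&lt;/p&gt;

```python
import time

class IdleDetector:
    """Idle = no activity (API request, agent message, WebSocket
    heartbeat) for `threshold` seconds."""
    def __init__(self, threshold=900.0):
        self.threshold = threshold
        self.last_activity = time.monotonic()

    def touch(self):
        """Called by every activity source on each event."""
        self.last_activity = time.monotonic()

    def is_idle(self, now=None):
        now = time.monotonic() if now is None else now
        return (now - self.last_activity) >= self.threshold
```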

&lt;h2&gt;
  
  
  Job 1: Enrich connections
&lt;/h2&gt;

&lt;p&gt;The highest-priority job. It finds records that are "hot" (recently accessed, heat score &amp;gt; 0.3) but poorly connected in the knowledge graph (fewer than 3 entity links). These are records you care about but that the system doesn't fully understand yet.&lt;/p&gt;
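&lt;p&gt;In SQLite terms (table and column names simplified from the real schema), the candidate selection looks like this:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record_heat  (record_id TEXT, heat REAL);
CREATE TABLE entity_links (record_id TEXT, entity_id TEXT);
-- note-1: hot but thin; note-2: cold; note-3: hot and well-connected
INSERT INTO record_heat VALUES ('note-1', 0.5), ('note-2', 0.1), ('note-3', 0.6);
INSERT INTO entity_links VALUES
  ('note-1','e1'), ('note-1','e2'),
  ('note-3','e1'), ('note-3','e2'), ('note-3','e3'), ('note-3','e4');
""")

# Hot (heat > 0.3) but under-connected (fewer than 3 entity links).
candidates = conn.execute("""
    SELECT h.record_id
    FROM record_heat h
    LEFT JOIN entity_links l ON l.record_id = h.record_id
    WHERE h.heat > 0.3
    GROUP BY h.record_id
    HAVING COUNT(l.entity_id) < 3
""").fetchall()
```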

&lt;p&gt;Here's what the query returned on a real run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Optimal Fuse Burn Rate Calculations v3    heat: 0.5  links: 2
Recipe: Rodney's Smoked Eyebrows Marinade heat: 0.5  links: 2
Banned Substances List (and Why Rodney…)  heat: 0.5  links: 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three notes flagged. Each has a warm heat score (recently accessed) but only 2 entity links, while the average for this user is 5+. The initial extraction caught the obvious entities, but a second pass might find connections to people, projects, or locations that were mentioned implicitly.&lt;/p&gt;


&lt;p&gt;The engine re-enqueues them to the async extraction pipeline at &lt;code&gt;priority: low&lt;/code&gt; — they won't compete with real-time user actions. When the extraction worker picks them up, it sends the full note content to the LLM for a more thorough entity pass than the initial CRUD hook provides.&lt;/p&gt;

&lt;p&gt;Why prioritize this job? Because the knowledge graph is the foundation of search ranking, the digest engine, and the agent's contextual awareness. A poorly connected hot record means the system is blind to something you're actively working on. Enriching it improves everything downstream.&lt;/p&gt;

&lt;p&gt;Cost on this run: &lt;strong&gt;600 tokens&lt;/strong&gt; (3 records × ~200 tokens each). Execution time: &lt;strong&gt;16ms&lt;/strong&gt; (just the enqueue — the actual extraction happens later).&lt;/p&gt;

&lt;h2&gt;
  
  
  Job 2: Generate summary
&lt;/h2&gt;

&lt;p&gt;Aggregates a week of change log activity and the most active entities from the knowledge graph into a single pre-computed insight.&lt;/p&gt;

&lt;p&gt;On this run, the change log query returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;        &lt;span class="s"&gt;201 inserts, 5 updates, 152 deletes&lt;/span&gt;
&lt;span class="na"&gt;rss&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;          &lt;span class="s"&gt;141 inserts&lt;/span&gt;
&lt;span class="na"&gt;emails&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;       &lt;span class="s"&gt;61 inserts, 32 updates, 7 deletes&lt;/span&gt;
&lt;span class="na"&gt;notes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;        &lt;span class="s"&gt;2 updates&lt;/span&gt;
&lt;span class="na"&gt;kanban_cards&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1 update&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the top graph entities by recent activity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Madrid              (location)      28 mentions
micelclaw           (organization)  12 mentions
Meta Platforms, Inc.(organization)   7 mentions
Instagram           (location)       6 mentions
Victoria            (person)         1 mention
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things jump out. First: 201 file inserts and 152 file deletes in one week — that's a bulk operation or a sync cycle, not manual activity. The summary captures this so the agent can mention it if asked "what happened this week?" without scanning 600+ change log rows at query time.&lt;/p&gt;

&lt;p&gt;Second: "Instagram" classified as a location is an entity extraction error — the kind of noise the enrichment job (Job 1) and the merge candidates system are designed to catch over time.&lt;/p&gt;

&lt;p&gt;The summary gets stored as a &lt;code&gt;weekly_summary&lt;/code&gt; insight with a 7-day TTL. No LLM call needed — it's pure SQL aggregation.&lt;/p&gt;

&lt;p&gt;Cost on this run: &lt;strong&gt;50 tokens&lt;/strong&gt; (fixed). Execution time: &lt;strong&gt;14ms&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Job 3: Detect patterns
&lt;/h2&gt;

&lt;p&gt;The most interesting job. It self-joins &lt;code&gt;entity_links&lt;/code&gt; to find pairs of entities that co-occur in 3 or more records but have no direct link between them — latent connections nobody made explicit.&lt;/p&gt;
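&lt;p&gt;The shape of that query, sketched against SQLite (the real schema and edge table differ):&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_links (record_id TEXT, entity_id TEXT);
CREATE TABLE entity_edges (a TEXT, b TEXT);  -- direct graph edges
-- rodney and dolores share 3 records; rodney and benny only 1
INSERT INTO entity_links VALUES
  ('r1','rodney'), ('r1','dolores'),
  ('r2','rodney'), ('r2','dolores'),
  ('r3','rodney'), ('r3','dolores'),
  ('r4','rodney'), ('r4','benny');
""")

# Pairs co-occurring in >= 3 records with no direct edge between them.
pairs = conn.execute("""
    SELECT a.entity_id, b.entity_id, COUNT(*) AS shared
    FROM entity_links a
    JOIN entity_links b
      ON a.record_id = b.record_id AND a.entity_id < b.entity_id
    WHERE NOT EXISTS (
        SELECT 1 FROM entity_edges e
        WHERE (e.a = a.entity_id AND e.b = b.entity_id)
           OR (e.a = b.entity_id AND e.b = a.entity_id)
    )
    GROUP BY a.entity_id, b.entity_id
    HAVING COUNT(*) >= 3
""").fetchall()
```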

&lt;p&gt;On this run, five patterns emerged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Rodney ↔ Dolores     co-occur in 12 records, no direct link
Rodney ↔ Linda       co-occur in 11 records, no direct link
Rodney ↔ BoomClaw    co-occur in 10 records, no direct link
Warehouse B ↔ Rodney co-occur in 10 records, no direct link
Benny ↔ Rodney       co-occur in 10 records, no direct link
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every pattern radiates from "Rodney" — he's a hub entity that appears alongside four other entities across 10-12 records without any direct graph edge connecting them. The extraction pipeline created links from each note/email to "Rodney" and to "Dolores" independently, but never linked Rodney to Dolores directly. The co-occurrence pattern reveals the relationship that was hiding in plain sight.&lt;/p&gt;

&lt;p&gt;Each pair becomes a &lt;code&gt;connection_discovered&lt;/code&gt; insight with a 14-day TTL. The next time you search for "Rodney," the graph traversal finds Dolores, Linda, BoomClaw, Warehouse B, and Benny — even though no single record ever says "Rodney works with Dolores."&lt;/p&gt;

&lt;p&gt;This is the job that produces the "how did it know that?" moments. The answer is always the same: it counted co-occurrences while you weren't looking.&lt;/p&gt;

&lt;p&gt;The query itself — a self-join on &lt;code&gt;entity_links&lt;/code&gt; filtered by &lt;code&gt;NOT EXISTS&lt;/code&gt; — took &lt;strong&gt;138ms&lt;/strong&gt;. That's the heaviest operation in the pipeline, and it runs during idle time, when nobody notices. At query time, the connections are already in the graph.&lt;/p&gt;

&lt;p&gt;Cost on this run: &lt;strong&gt;30 tokens&lt;/strong&gt; (fixed — pure SQL, no LLM).&lt;/p&gt;

&lt;h2&gt;
  
  
  Job 4: Update preferences
&lt;/h2&gt;

&lt;p&gt;The system learns behavioral patterns by analyzing your data over time. On this run, two patterns were detected:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing time distribution (last 30 days):&lt;/strong&gt; All 50 notes were created at hour 12 UTC. That's not a preference — that's a signal so strong it maxed out confidence immediately. The system now knows that if you're going to write a note, it's probably at noon.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tag frequency:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;safety&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;11, personal&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;9, r-and-d&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;6, strategy&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5, humor&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both get persisted via UPSERT with incremental confidence — each observation nudges the score up by 0.05, capped at 0.95. After multiple runs, the preferences look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;scheduling / preferred_writing_hour = "12"&lt;/span&gt;
  &lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.95 (max), evidence&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;20,650 observations&lt;/span&gt;

&lt;span class="s"&gt;organization / preferred_tags = ["safety","personal","r-and-d","strategy","humor"]&lt;/span&gt;
  &lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.95 (max), evidence&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;417 observations&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent uses these when it needs to make decisions. Scheduling a reminder? It knows noon is when you're active. Suggesting tags for a new note? It offers your most-used tags first. Creating a diary entry template? It matches your writing style.&lt;/p&gt;

&lt;p&gt;If a preference is wrong, you delete it via the API. The system may re-learn it later if the pattern persists, but with reduced confidence — the deletion counts as negative feedback.&lt;/p&gt;
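&lt;p&gt;The update rule itself is tiny. A sketch of the UPSERT semantics, in memory rather than SQL (the reset-on-changed-value behavior is an assumption):&lt;/p&gt;

```python
def observe(prefs, key, value, step=0.05, cap=0.95):
    """Each matching observation nudges confidence up by `step`,
    capped at `cap`; a changed value starts the preference over."""
    cur = prefs.get(key)
    if cur and cur["value"] == value:
        cur["confidence"] = min(cap, cur["confidence"] + step)
        cur["evidence"] += 1
    else:
        prefs[key] = {"value": value, "confidence": step, "evidence": 1}
    return prefs[key]
```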

&lt;p&gt;Cost on this run: &lt;strong&gt;20 tokens&lt;/strong&gt; (fixed — pure SQL, no LLM).&lt;/p&gt;

&lt;h2&gt;
  
  
  The real numbers: 700 tokens, 184 milliseconds
&lt;/h2&gt;

&lt;p&gt;Here's the actual pipeline summary from the run above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌────────────────────┬────────┬────────┬───────────────────────────────┐
│ Job                │ Tokens │ ms     │ Output                        │
├────────────────────┼────────┼────────┼───────────────────────────────┤
│ enrich_connections │ 600    │ 16     │ 3 notes re-enqueued           │
│ generate_summary   │ 50     │ 14     │ 1 weekly_summary insight      │
│ detect_patterns    │ 30     │ 138    │ 5 connection_discovered       │
│ update_preferences │ 20     │ 16     │ 2 preferences updated         │
├────────────────────┼────────┼────────┼───────────────────────────────┤
│ Total              │ 700    │ 184    │                               │
└────────────────────┴────────┴────────┴───────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;700 out of the 5,000 token budget — 14%. The full pipeline completed in under 200 milliseconds. Three of the four jobs are pure SQL with zero LLM calls. Only &lt;code&gt;enrich_connections&lt;/code&gt; queues work for the model, and even that just enqueues — the actual extraction runs later at low priority.&lt;/p&gt;

&lt;p&gt;Every execution gets logged to &lt;code&gt;sleep_time_jobs&lt;/code&gt; for auditability. If a job fails, the error is recorded and the next job still runs — the pipeline is fault-tolerant by design.&lt;/p&gt;

&lt;h2&gt;
  
  
  The token budget: sleep-time vs test-time
&lt;/h2&gt;

&lt;p&gt;Every sleep-time execution has a capped budget — 5,000 tokens by default, configurable per user. This prevents runaway costs from background processing. The jobs run in priority order and stop when the budget is exhausted.&lt;/p&gt;
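&lt;p&gt;A sketch of that loop — the job signature and skip-rather-than-abort policy are simplifications of the real scheduler:&lt;/p&gt;

```python
def run_pipeline(jobs, budget=5000):
    """Run (priority, name, cost, fn) jobs in priority order; skip a
    job when its estimated cost no longer fits the remaining budget.
    A failing job is recorded, not fatal: the next job still runs."""
    spent, ran, skipped = 0, [], []
    for priority, name, cost, fn in sorted(jobs, key=lambda j: j[0]):
        if spent + cost > budget:
            skipped.append(name)
            continue
        try:
            fn()
        except Exception:
            pass  # the real system logs the error to sleep_time_jobs
        spent += cost
        ran.append(name)
    return spent, ran, skipped
```

&lt;p&gt;With the run above (600 + 50 + 30 + 20 = 700 tokens), nothing is skipped; with a tight budget, preference learning is the first to go.&lt;/p&gt;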

&lt;p&gt;The insight from Letta's research holds: spending tokens during idle time dramatically reduces what you need to spend during active conversations. When the agent already knows that Rodney is connected to Dolores across 12 records (because Job 3 discovered it overnight), answering "who works with Rodney?" costs a graph traversal query (~5ms) instead of a full cross-domain LLM analysis (~3,000 tokens).&lt;/p&gt;

&lt;p&gt;We track sleep-time and test-time token usage separately in the token metrics dashboard, so you can see the trade-off directly: more sleep-time tokens → fewer test-time tokens → faster, cheaper responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three-stage digest
&lt;/h2&gt;

&lt;p&gt;The sleep-time engine powers an evolved version of the Digest Engine — the system that tells the agent "here's what changed since you last checked."&lt;/p&gt;

&lt;p&gt;The original digest was simple: scan the change log, format a markdown file, write it to the agent's workspace. The agent reads it on the next heartbeat.&lt;/p&gt;

&lt;p&gt;The v2 digest is a three-stage pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1 — Selection.&lt;/strong&gt; Filter changes by relevance using configurable rules stored in a &lt;code&gt;digest_rules&lt;/code&gt; table. VIP emails (from your boss, from specific contacts) trigger immediate notification via PostgreSQL LISTEN/NOTIFY. Routine changes (a synced contact updated its phone number) get buffered for the periodic digest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2 — Correlation.&lt;/strong&gt; Use the LLM to discover cross-domain connections between changes. "You received an email from Ana García. Ana is attending tomorrow's meeting. You have 2 unfinished notes about the project she's working on." This stage is why sleep-time matters — the correlation discovery happens in the background, not when the agent is trying to respond to you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 3 — Scoring.&lt;/strong&gt; Rate each item by urgency, cross-domain relevance, and historical feedback (did the user act on similar insights before?). The output shifts from "what changed" to "what matters and why."&lt;/p&gt;

&lt;p&gt;The scored digest gets written to DIGEST.md in the agent's workspace. The agent reads it and decides what to surface. Urgent items might trigger an immediate notification. Low-score items accumulate for a daily summary.&lt;/p&gt;
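&lt;p&gt;Stage 3 can be sketched as a weighted score plus a threshold split — the weights and the urgency threshold here are assumptions, not the shipped values:&lt;/p&gt;

```python
URGENT = 0.7  # assumed threshold, not the real system's value

def score_item(item, weights=(0.5, 0.3, 0.2)):
    """Combine urgency, cross-domain relevance, and historical
    feedback (each already normalized to [0, 1])."""
    w_u, w_r, w_f = weights
    return (w_u * item["urgency"]
            + w_r * item["cross_domain"]
            + w_f * item["feedback"])

def triage(items):
    """High scores notify immediately; the rest accumulate
    for the daily summary."""
    urgent, daily = [], []
    for it in items:
        (urgent if score_item(it) >= URGENT else daily).append(it["id"])
    return urgent, daily
```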

&lt;h2&gt;
  
  
  The world model
&lt;/h2&gt;

&lt;p&gt;One output of the sleep-time engine is a materialized "world model" — a living document that summarizes your current state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Active projects and their status&lt;/li&gt;
&lt;li&gt;Key people and recent interactions&lt;/li&gt;
&lt;li&gt;Upcoming deadlines and events&lt;/li&gt;
&lt;li&gt;Behavioral patterns and preferences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The world model is updated incrementally. Each sleep-time run only modifies the sections affected by recent changes. The agent references it as persistent context — a pre-computed summary of "what's going on in your life right now" that doesn't need to be recomputed every conversation.&lt;/p&gt;

&lt;p&gt;This is inspired by Daniel Miessler's PAI framework (MISSION.md, GOALS.md, PROJECTS.md pattern), adapted to structured data. Instead of the user maintaining these documents manually, the system generates them from real data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zero cost when idle
&lt;/h2&gt;

&lt;p&gt;The most important design decision: when nothing has changed, the engine does nothing. Zero tokens. Zero queries. The scheduler checks for pending changes in the change log before executing any job. No changes → skip the entire run.&lt;/p&gt;

&lt;p&gt;This means the system's cost is proportional to your activity, not to time. A weekend where you don't use the system costs nothing. A busy Monday with 50 emails and 10 notes triggers multiple enrichment passes. The cost follows the value.&lt;/p&gt;

&lt;p&gt;Similarly, the digest delivery to the agent is conditional. No changes → no DIGEST.md written → no system event → the agent doesn't wake up → zero tokens consumed. This was a deliberate choice over a heartbeat model where the agent would check for updates periodically regardless.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I'd add entity type validation in the summary job.&lt;/strong&gt; The real run showed "Instagram" classified as a &lt;code&gt;location&lt;/code&gt; — a clear extraction error that propagated into the weekly summary. A simple validation step (is this entity type plausible for this name?) would catch obvious misclassifications before they pollute insights. The data exists to fix this; we just haven't built the filter yet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd build the pattern detection job first, not the enrichment job.&lt;/strong&gt; Enrichment (Job 1) improves the knowledge graph incrementally. Pattern detection (Job 3) produces visible, surprising insights that users actually react to. "Ana García is connected to Project Tempest" is a moment of delight. "We added 2 more entity links to your note about Tempest" is invisible maintenance. Leading with delight would have made the sleep-time engine feel valuable sooner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd make the token budget adaptive.&lt;/strong&gt; Right now it's a flat 5,000 tokens per run. A smarter approach: scale the budget with the amount of pending work. 3 new records → 1,000 tokens. 50 new records after a sync → 10,000 tokens. The budget should match the opportunity, not be a fixed ceiling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd add a "sleep-time log" visible in the dashboard.&lt;/strong&gt; Currently, the only way to see what the engine did is through the insights API and the sleep_time_jobs table. A visible log ("Last night I discovered 3 new connections, updated 2 preferences, and generated your weekly summary") would build trust and make the background processing feel tangible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;A personal AI system that only works when you talk to it is wasting 95% of its available compute. The data is sitting in PostgreSQL. The model is loaded in Ollama. The knowledge graph has gaps that a 30-token LLM call could fill. Why wait for the user to ask?&lt;/p&gt;

&lt;p&gt;Sleep-time compute shifts the work from "the user asked a question and now we scramble" to "we already know the answer because we connected the dots overnight." Four jobs, a token budget, an idle detector, and a three-stage digest pipeline. The system gets smarter while you sleep.&lt;/p&gt;

&lt;p&gt;The insight that makes it all work: spending tokens when nobody is waiting for a response is categorically cheaper — in latency, in user experience, and in total cost — than spending them when someone is staring at a loading spinner.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: PII-aware routing — how we send sensitive data to local models and everything else to the cloud, without the user having to think about it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>postgres</category>
      <category>selfhosted</category>
    </item>
    <item>
<title>Hybrid search with RRF: combining pgvector, tsvector, and a knowledge graph in one query</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Tue, 24 Mar 2026 12:04:03 +0000</pubDate>
      <link>https://forem.com/micelclaw/hybrid-search-with-rrf-combining-pgvector-tsvector-and-a-knowledge-graph-in-one-query-1d80</link>
      <guid>https://forem.com/micelclaw/hybrid-search-with-rrf-combining-pgvector-tsvector-and-a-knowledge-graph-in-one-query-1d80</guid>
      <description>&lt;p&gt;Here's a search query: "beach trip."&lt;/p&gt;

&lt;p&gt;Full-text search finds nothing — no record contains the word "beach." But there's a note that says "Qué calor en Valencia, el agua estaba perfecta." Semantic search finds it because the embedding for "beach trip" is close to the embedding for a hot day at the beach in Valencia.&lt;/p&gt;

&lt;p&gt;Now a different query: "Ana García."&lt;/p&gt;

&lt;p&gt;Semantic search returns a dozen vaguely related records. Full-text search returns the 3 records that literally contain "Ana García." But neither shows you that Ana attended last week's meeting, is CC'd on 5 email threads, and appears in tomorrow's calendar — connections that only the knowledge graph knows about.&lt;/p&gt;

&lt;p&gt;No single search method is enough. We needed all three, plus a way to combine them that doesn't require manual tuning.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four signals
&lt;/h2&gt;

&lt;p&gt;Our search pipeline produces four independent scores for every candidate result:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;What it catches&lt;/th&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;pgvector cosine similarity&lt;/td&gt;
&lt;td&gt;Meaning-based matches ("beach" → "calor en Valencia")&lt;/td&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Full-text&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;tsvector + GIN + ts_rank&lt;/td&gt;
&lt;td&gt;Exact keyword matches, fast and precise&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Graph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;entity_links overlap&lt;/td&gt;
&lt;td&gt;Relational connections ("Ana García" → meetings she attended)&lt;/td&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Heat&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;record_heat table&lt;/td&gt;
&lt;td&gt;Temporal relevance (recently accessed records)&lt;/td&gt;
&lt;td&gt;Free (display), Pro (in ranking)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Free-tier users get full-text search only — which is still fast and well-ranked thanks to tsvector with weighted columns (title gets weight A, content gets weight B, tags get weight C). Pro users get all four signals fused together.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipeline
&lt;/h2&gt;

&lt;p&gt;The search happens in seven steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query: "Ana García project update"
    │
    ├── 1. Vector search ──→ top-50 by cosine similarity
    ├── 2. Full-text search ──→ top-N by ts_rank (UNION ALL across domains)
    └── 3. Graph discovery ──→ N candidates via entity_links
                │
                ▼
         4. Deduplicate by (domain, record_id)
                │
                ▼
         5. Rank-normalize each signal to [0, 1]
                │
                ▼
         6. Detect degenerate signals
                │
                ▼
         7. Weighted fusion + multi-signal bonus
                │
                ▼
            Final ranked results with provenance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me walk through each step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Vector search
&lt;/h3&gt;

&lt;p&gt;The query text is embedded on-the-fly using the same model that embeds records (&lt;code&gt;qwen3-embedding:0.6b&lt;/code&gt;, 1024 dimensions). Then a cosine similarity query runs against the embeddings table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 0.3 minimum threshold filters garbage. The top 50 candidates move to the next step. If Ollama is down and we can't embed the query, this signal is simply skipped — the other signals still work.&lt;/p&gt;
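&lt;p&gt;That graceful degradation is easy to sketch. A minimal Python sketch with stand-in functions (&lt;code&gt;embed_query&lt;/code&gt;, &lt;code&gt;vector_candidates&lt;/code&gt;, and &lt;code&gt;fulltext_candidates&lt;/code&gt; are hypothetical placeholders, not our actual API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def embed_query(query):
    # Stand-in for the real embedding call; here it simulates Ollama being down.
    raise ConnectionError("embedding service unreachable")

def vector_candidates(embedding):
    return []  # stand-in for the pgvector query

def fulltext_candidates(query):
    return [("note", 1), ("event", 7)]  # stand-in for the UNION ALL query

def gather_signals(query):
    """Collect candidates per signal; a failed signal is skipped, not fatal."""
    signals = {}
    try:
        signals["semantic"] = vector_candidates(embed_query(query))
    except ConnectionError:
        pass  # embedding model unavailable: drop only the semantic signal
    signals["fulltext"] = fulltext_candidates(query)
    return signals

print(gather_signals("beach trip"))  # {'fulltext': [('note', 1), ('event', 7)]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;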

&lt;h3&gt;
  
  
  Step 2: Full-text search
&lt;/h3&gt;

&lt;p&gt;A UNION ALL query across all domain tables, using PostgreSQL's native full-text search:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'note'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ts_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;notes&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;search_vector&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt; &lt;span class="n"&gt;plainto_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'simple'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;deleted_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'event'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ts_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;search_vector&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt; &lt;span class="n"&gt;plainto_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'simple'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;deleted_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="c1"&gt;-- ... contacts, emails, files, diary, bookmarks, kanban_cards&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We use &lt;code&gt;plainto_tsquery('simple', ...)&lt;/code&gt; instead of language-specific configurations. The &lt;code&gt;simple&lt;/code&gt; configuration doesn't stem words, which matters for multilingual data — Spanish and English records coexist, and stemming rules for one language would butcher the other.&lt;/p&gt;

&lt;p&gt;Each domain table has a &lt;code&gt;search_vector tsvector&lt;/code&gt; column maintained by a trigger (or &lt;code&gt;GENERATED ALWAYS AS ... STORED&lt;/code&gt; for newer tables). The vectors are weighted: title gets &lt;code&gt;'A'&lt;/code&gt;, description/content gets &lt;code&gt;'B'&lt;/code&gt;, tags get &lt;code&gt;'C'&lt;/code&gt;. A match in the title ranks higher than a match in the body.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Graph discovery
&lt;/h3&gt;

&lt;p&gt;This signal is different — it doesn't match text, it matches relationships.&lt;/p&gt;

&lt;p&gt;The query is matched against &lt;code&gt;graph_entities.normalized_name&lt;/code&gt;. If "Ana García" matches a Person entity, we find all records linked to that entity via &lt;code&gt;entity_links&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Find entities mentioned in the query&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;graph_entities&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;normalized_name&lt;/span&gt; &lt;span class="k"&gt;ILIKE&lt;/span&gt; &lt;span class="s1"&gt;'%ana garcia%'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;deleted_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Find all records linked to those entities&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;source_type&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;record_id&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;entity_links&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;target_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'graph_entity'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ANY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;entity_ids&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;graph_score&lt;/code&gt; for each result is the overlap ratio: how many of the query's entities appear in the result's connections, divided by the total entities found in the query.&lt;/p&gt;
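&lt;p&gt;The overlap ratio itself is a few lines of Python (a sketch, not our production code):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def graph_score(query_entity_ids, record_entity_ids):
    """Overlap ratio: query entities linked to this record, over entities in the query."""
    if not query_entity_ids:
        return 0.0
    shared = set(query_entity_ids).intersection(record_entity_ids)
    return len(shared) / len(set(query_entity_ids))

# Two entities in the query, one of them linked to this record:
print(graph_score([101, 102], [102, 999]))  # 0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;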

&lt;h3&gt;
  
  
  Step 4: Deduplication
&lt;/h3&gt;

&lt;p&gt;The three signals produce candidate sets that overlap. A note containing "Ana García" might appear in vector search (semantically similar), full-text search (exact keyword match), and graph search (linked to the Ana García entity). We deduplicate by &lt;code&gt;(domain, record_id)&lt;/code&gt; and track which signals produced each candidate.&lt;/p&gt;
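&lt;p&gt;A minimal sketch of that merge step, keyed by &lt;code&gt;(domain, record_id)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def deduplicate(per_signal):
    """Merge per-signal candidate lists, tracking which signals found each record."""
    merged = {}
    for signal, candidates in per_signal.items():
        for domain, record_id in candidates:
            merged.setdefault((domain, record_id), set()).add(signal)
    return merged

found = deduplicate({
    "semantic": [("note", 42)],
    "fulltext": [("note", 42), ("event", 7)],
})
print(found[("note", 42)])  # the note was found by both signals
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;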

&lt;h3&gt;
  
  
  Step 5: Rank-based normalization
&lt;/h3&gt;

&lt;p&gt;Here's where it gets interesting. We do NOT normalize by raw scores. We normalize by rank position:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;normalized_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;position&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The top result in a signal gets 1.0. The bottom gets nearly 0. A candidate absent from a signal gets 0.&lt;/p&gt;

&lt;p&gt;Why rank-based instead of min-max normalization? Because cosine similarity scores cluster. In a typical query, the top-50 vector search results might have similarities between 0.54 and 0.64 — a spread of just 0.10. Min-max normalization would stretch this to 0.0–1.0, making the difference between rank 1 and rank 50 look huge when it's actually tiny.&lt;/p&gt;

&lt;p&gt;Rank-based normalization treats position as the signal, not magnitude. First place is first place, whether it scored 0.99 or 0.55.&lt;/p&gt;
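&lt;p&gt;In code, the normalization is one dict comprehension over an already-ranked list of keys (a sketch):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def rank_normalize(ranked_keys):
    """Position-based normalization: first place is first place, whatever it scored."""
    n = len(ranked_keys)
    return {key: (n - position) / n for position, key in enumerate(ranked_keys)}

print(rank_normalize(["a", "b", "c", "d"]))
# {'a': 1.0, 'b': 0.75, 'c': 0.5, 'd': 0.25}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;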

&lt;h3&gt;
  
  
  Step 6: Degenerate signal detection
&lt;/h3&gt;

&lt;p&gt;Sometimes a signal doesn't discriminate. If all 50 vector search results have cosine similarities within 5% of each other, the signal is noise — everything "looks the same" to the embedding model.&lt;/p&gt;

&lt;p&gt;When this happens, the weight assigned to the degenerate signal gets redistributed proportionally to the other signals. The search doesn't fail — it just relies more on the signals that are actually informative.&lt;/p&gt;
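&lt;p&gt;A rough sketch of both halves, assuming "within 5% of each other" means the relative spread of the raw scores (the exact definition may differ in the real implementation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import operator

def is_degenerate(raw_scores, threshold=0.05):
    """A signal is noise when its relative score spread is under the threshold."""
    top, bottom = max(raw_scores), min(raw_scores)
    if top == 0:
        return True
    # true when the spread ratio is strictly below the 5% threshold
    return operator.lt((top - bottom) / top, threshold)

def redistribute(weights, dead_signals):
    """Hand a suppressed signal's weight proportionally to the live signals."""
    live = {s: w for s, w in weights.items() if s not in dead_signals}
    total = sum(live.values())
    return {s: w / total for s, w in live.items()}

print(is_degenerate([0.61, 0.60, 0.60, 0.59]))  # True: everything looks the same
weights = {"semantic": 0.25, "fulltext": 0.25, "graph": 0.25, "heat": 0.25}
print(redistribute(weights, {"semantic"}))  # each live signal ends up at one third
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;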

&lt;h3&gt;
  
  
  Step 7: Weighted fusion + multi-signal bonus
&lt;/h3&gt;

&lt;p&gt;The final score combines all four signals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;α&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="n"&gt;heat&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;β&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="n"&gt;semantic&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;γ&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="n"&gt;fulltext&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;δ&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;α&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;β&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;γ&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;δ&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Default weights: all equal at 0.25 each. But the user can adjust them via the dashboard's ranking sliders — crank heat to prioritize recent activity, drop graph to ignore relationship signals, etc.&lt;/p&gt;

&lt;p&gt;Then comes the multi-signal bonus:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signals that found this result&lt;/th&gt;
&lt;th&gt;Multiplier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 signal only&lt;/td&gt;
&lt;td&gt;×1.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2 signals&lt;/td&gt;
&lt;td&gt;×1.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3 signals&lt;/td&gt;
&lt;td&gt;×1.50&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A result that appears in vector search, full-text search, AND graph search gets a 50% bonus. This rewards results that are independently confirmed by multiple methods — they're almost certainly relevant.&lt;/p&gt;

&lt;p&gt;Heat doesn't count for the bonus calculation. It's a temporal signal, not a relevance signal — a record being hot doesn't mean it matches the query.&lt;/p&gt;
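&lt;p&gt;Putting the fusion formula and the bonus together as a sketch (signal names mirror the formula above; heat is excluded from the hit count as described):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;RELEVANCE_SIGNALS = ("semantic", "fulltext", "graph")

def fuse(normalized, weights):
    """Weighted fusion over normalized signals, plus the multi-signal bonus."""
    total_weight = sum(weights.values())
    base = sum(weights[s] * normalized.get(s, 0.0) for s in weights) / total_weight
    # Heat never counts toward the bonus: it is temporal, not relevance.
    hits = sum(1 for s in RELEVANCE_SIGNALS if normalized.get(s))
    bonus = 1 + 0.25 * max(0, hits - 1)
    return base * bonus

weights = {"heat": 0.25, "semantic": 0.25, "fulltext": 0.25, "graph": 0.25}
score = fuse({"semantic": 1.0, "fulltext": 0.8}, weights)
print(round(score, 4))  # 0.5625: two signals agree, so the base gets a 1.25x bump
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;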

&lt;h2&gt;
  
  
  Two search modes
&lt;/h2&gt;

&lt;p&gt;The system exposes two search endpoints that use this pipeline differently:&lt;/p&gt;

&lt;h3&gt;
  
  
  Standard search: &lt;code&gt;GET /search&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The default. Uses Reciprocal Rank Fusion with fixed weights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rrf_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Σ&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank_i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="n"&gt;where&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
&lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rrf_score&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="n"&gt;heat_score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;RRF is elegant: it ignores raw scores entirely, only caring about rank position. A result that's #1 in vector search and #3 in full-text search gets &lt;code&gt;1/(60+1) + 1/(60+3) = 0.0164 + 0.0159 = 0.0323&lt;/code&gt;. The K=60 constant smooths the curve — the difference between rank 1 and rank 10 is meaningful but not extreme.&lt;/p&gt;

&lt;p&gt;Heat acts as a post-fusion tiebreaker with a maximum 10% boost. It never dominates relevance. A cold but highly relevant result always beats a hot but marginally relevant one.&lt;/p&gt;
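&lt;p&gt;The standard-mode scoring, sketched in Python (the example reproduces the #1-and-#3 arithmetic above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def rrf_score(ranks, k=60):
    """Reciprocal Rank Fusion: only rank positions matter, never raw scores."""
    return sum(1 / (k + rank) for rank in ranks)

def final_score(ranks, heat, k=60):
    # Heat is a post-fusion tiebreaker capped at a 10% boost.
    return rrf_score(ranks, k) * (1 + 0.1 * heat)

print(round(rrf_score([1, 3]), 4))         # 0.0323 (rank 1 in one signal, rank 3 in another)
print(round(final_score([1, 3], 1.0), 4))  # 0.0355 at maximum heat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;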

&lt;h3&gt;
  
  
  Advanced search: &lt;code&gt;GET /search/advanced&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a59tag4s2lr5xnghxhb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0a59tag4s2lr5xnghxhb.png" alt=" " width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The full power mode. Four independent sliders, degenerate signal detection, multi-signal bonus. Used by the Search module in the dashboard where users can see and control exactly how results are ranked.&lt;/p&gt;

&lt;p&gt;Every result includes full provenance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provenance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"heat_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.52&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vector_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.68&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fulltext_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.52&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"graph_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.11&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Complete transparency. The user can see that a result ranked high because of semantic similarity (0.68) despite low graph connectivity (0.11), and adjust weights accordingly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Free tier: surprisingly good without vectors
&lt;/h2&gt;

&lt;p&gt;Free users don't get semantic search, graph search, or heat-weighted ranking. They get tsvector + GIN full-text search across all domains, with &lt;code&gt;ts_rank()&lt;/code&gt; scoring and weighted columns.&lt;/p&gt;

&lt;p&gt;This sounds like a big downgrade. In practice, it's surprisingly solid:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Searches for names, titles, and specific terms work perfectly — full-text search is exact.&lt;/li&gt;
&lt;li&gt;Weighted tsvector means a match in the title ranks above a match in the body.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;plainto_tsquery('simple', ...)&lt;/code&gt; handles both Spanish and English without configuration.&lt;/li&gt;
&lt;li&gt;The UNION ALL across domains means one search bar finds notes, emails, events, contacts, and files.&lt;/li&gt;
&lt;li&gt;GIN indexes keep lookups sublinear — fast even with tens of thousands of records.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The meaningful Pro differentiator is semantic search: finding "calor en Valencia" when you search for "beach trip." That's genuinely impossible with keyword matching. But for the 80% of searches where people type exactly what they're looking for, free tier search works fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Temporal expansion: the "wow" moment
&lt;/h2&gt;

&lt;p&gt;Here's a feature that surprised us with how useful it turned out to be.&lt;/p&gt;

&lt;p&gt;If the query mentions an entity that has upcoming events (within ±7 days), temporally close results get a boost:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;temporal_boost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt; &lt;span class="err"&gt;×&lt;/span&gt; &lt;span class="n"&gt;proximity&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice: you search for "Ana García." The system finds that Ana has a meeting with you tomorrow. Notes, emails, and contacts related to Ana that were created or accessed in the last week get boosted. The search results naturally organize around "here's everything relevant to Ana before your meeting tomorrow."&lt;/p&gt;
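&lt;p&gt;The formula leaves &lt;code&gt;proximity&lt;/code&gt; undefined here; one plausible reading, linear decay across the ±7-day window, looks like this (an assumption for illustration, not necessarily the shipped curve):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def temporal_boost(days_to_event, window=7):
    """Assumed linear decay: full boost for an event today, none past the window."""
    proximity = max(0.0, 1 - abs(days_to_event) / window)
    return 1 + 0.15 * proximity

print(temporal_boost(0))   # maximum boost of about 1.15
print(temporal_boost(10))  # 1.0 (outside the window, no boost)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;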

&lt;p&gt;We didn't plan this as a feature — it fell out of having the knowledge graph and the calendar in the same database. But it consistently produces the kind of results that make people say "how did it know I needed that?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The tuning knobs
&lt;/h2&gt;

&lt;p&gt;Five parameters control the pipeline behavior:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MIN_SIMILARITY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.3&lt;/td&gt;
&lt;td&gt;Cosine sim threshold. Higher = less vector noise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RRF_FETCH_SIZE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;Candidates per signal. Lower = faster, fewer candidates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RRF_K&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;RRF smoothing constant. Higher = smoother rank differences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-signal factor&lt;/td&gt;
&lt;td&gt;0.25&lt;/td&gt;
&lt;td&gt;Bonus per additional signal (1 + 0.25 × (count-1))&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Degenerate threshold&lt;/td&gt;
&lt;td&gt;5%&lt;/td&gt;
&lt;td&gt;Signal suppression when score range is too narrow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We've left these at their defaults since implementation. The multi-signal bonus and degenerate detection handle most edge cases automatically. If we ever need to tune, the provenance metadata on every result tells us exactly which signal is helping or hurting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;All of this runs inside a single PostgreSQL instance, on the same machine that serves the API:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vector search (50 candidates)&lt;/td&gt;
&lt;td&gt;~15-30ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full-text search (UNION ALL, 7 tables)&lt;/td&gt;
&lt;td&gt;~5-15ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph discovery&lt;/td&gt;
&lt;td&gt;~5-10ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fusion + scoring&lt;/td&gt;
&lt;td&gt;~2-5ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total (standard search)&lt;/td&gt;
&lt;td&gt;~30-60ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total (advanced search, 4 signals)&lt;/td&gt;
&lt;td&gt;~50-80ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No Elasticsearch. No Solr. No separate search service. pgvector and tsvector are PostgreSQL extensions that run in the same process as the rest of the database. One backup strategy, one connection pool, one operational concern.&lt;/p&gt;

&lt;p&gt;For a personal system with a few thousand records per domain, this is more than fast enough. If we ever hit scale problems (unlikely for single-user), the first optimization would be reducing &lt;code&gt;RRF_FETCH_SIZE&lt;/code&gt; from 50 to 20 — cutting candidate generation in half with minimal quality loss.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I'd implement full-text search from the very beginning, not as a free-tier afterthought.&lt;/strong&gt; We built vector search first (Phase 3) and added tsvector later as a "fallback for free users." Turns out full-text search is essential even for Pro users — it catches exact matches that embeddings miss. "Show me the email from &lt;a href="mailto:patricia@work.com"&gt;patricia@work.com&lt;/a&gt;" is a full-text query, not a semantic one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd add provenance to the standard search too, not just advanced.&lt;/strong&gt; We initially only exposed the score breakdown in &lt;code&gt;/search/advanced&lt;/code&gt;. When we added it to the standard &lt;code&gt;/search&lt;/code&gt; endpoint (in the provenance field), debugging search quality became ten times easier. Every bug report went from "the search is bad" to "this result has vector_score 0.8 but graph_score 0, why?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd explore reranking with a cross-encoder.&lt;/strong&gt; Our pipeline does retrieval + fusion but no reranking. A small cross-encoder model (like ms-marco-MiniLM) could re-score the top 20 results for higher precision. We deferred this because the current quality is good enough and adding another model to the Ollama queue would increase latency. But for a future post-MVP iteration, it's the obvious next step.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;The trick to hybrid search isn't the individual signals — pgvector and tsvector are well-documented, and knowledge graph traversal is just recursive CTEs. The trick is the fusion: how you combine signals with different scales, different failure modes, and different strengths.&lt;/p&gt;

&lt;p&gt;Reciprocal Rank Fusion solves the scale problem — ranks instead of raw scores. Degenerate signal detection solves the failure mode problem — a noisy signal gets suppressed instead of poisoning results. Multi-signal bonus solves the confidence problem — results confirmed by multiple methods are almost certainly good.&lt;/p&gt;

&lt;p&gt;Four signals, one UNION ALL, one PostgreSQL instance, under 100ms. The search that finds "calor en Valencia" when you type "beach trip" — and also shows you that Ana García has a meeting with you tomorrow.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: sleep-time compute for personal data — what your AI should be doing while you sleep, and why idle cycles are the most valuable compute you have.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>search</category>
      <category>pgvector</category>
      <category>ai</category>
    </item>
    <item>
      <title>Your AI agent is wasting 90% of its tokens on field names</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Fri, 20 Mar 2026 13:56:54 +0000</pubDate>
      <link>https://forem.com/micelclaw/7o-your-ai-agent-is-wasting-90-of-its-tokens-on-field-names-2o2a</link>
      <guid>https://forem.com/micelclaw/7o-your-ai-agent-is-wasting-90-of-its-tokens-on-field-names-2o2a</guid>
      <description>&lt;p&gt;We built the compact API format (previous post) and felt good about ourselves. API responses were 78% smaller. Tokens saved. Problem solved.&lt;/p&gt;

&lt;p&gt;Then we actually measured where our agent's tokens were going.&lt;/p&gt;

&lt;p&gt;The API responses weren't the problem. The skills were.&lt;/p&gt;

&lt;h2&gt;
  
  
  The audit
&lt;/h2&gt;

&lt;p&gt;We ran a token audit across all 31 skills in the system. Here's what we found:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Main agent (Francis) — 11 skills loaded:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;always:true&lt;/code&gt; skills (injected on every single message): &lt;strong&gt;~17,500 tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Total including on-demand skills: &lt;strong&gt;~20,600 tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Global (all 31 skills across all agents):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;always:true&lt;/code&gt; skills (12 total): &lt;strong&gt;~27,500 tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;All skills combined: &lt;strong&gt;~50,000 tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's 25% of Sonnet's context window consumed by skills alone. Before the user says a word. Before the agent reads a single note or email. A quarter of the available context is just instructions on how to call APIs.&lt;/p&gt;

&lt;p&gt;Add the workspace identity files — SOUL.md, IDENTITY.md, USER.md, TOOLS.md, AGENTS.md, BOOTSTRAP.md — and you're looking at another 3-5K tokens. So the agent starts every conversation with roughly &lt;strong&gt;20-22K tokens of system prompt&lt;/strong&gt;. That's over 10% of the context window, gone.&lt;/p&gt;

&lt;p&gt;The user's actual message? Usually 20-50 tokens. A rounding error.&lt;/p&gt;

&lt;h2&gt;
  
  
  The top 5 offenders
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;always&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;claw-search&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;~5,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;claw-hal&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;~4,200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;claw-approvals&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;~3,050&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;claw-files&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;~2,850&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;claw-mail&lt;/td&gt;
&lt;td&gt;true&lt;/td&gt;
&lt;td&gt;~2,780&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;claw-search&lt;/code&gt; alone burns 5,100 tokens every message. It's the biggest skill because it handles cross-domain search routing — deciding whether to query the user's data (notes, emails, contacts) or the agent's own workspace memory. That routing logic is complex and takes words to explain.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;claw-hal&lt;/code&gt; (hardware abstraction — storage, docker, network) is second because it covers multiple subsystems. When someone asks "how's my disk?", HAL needs to know about volumes, SMART data, mount points, and Docker containers. That's a lot of endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters more than API response size
&lt;/h2&gt;

&lt;p&gt;Think about the token flow of a typical interaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "What meetings do I have today?"

System prompt:  ~20,000 tokens (skills + identity)
User message:        ~8 tokens
API call:          ~300 tokens (compact format response)
Agent response:     ~50 tokens
─────────────────────────────────
Total:          ~20,358 tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API response is 1.5% of the total. Even if we made it zero tokens, we'd save almost nothing. The system prompt is 98% of the cost.&lt;/p&gt;
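&lt;p&gt;The same arithmetic, spelled out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;budget = {"system_prompt": 20000, "user_message": 8, "api_response": 300, "agent_response": 50}
total = sum(budget.values())
shares = {part: round(100 * tokens / total, 1) for part, tokens in budget.items()}
print(total)                   # 20358
print(shares["api_response"])  # 1.5
print(shares["system_prompt"]) # 98.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;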

&lt;p&gt;This is why the title of the previous post was slightly misleading. Yes, compact format saves 78% on API responses. But API responses are the small slice. The real token budget is dominated by the system prompt — and within that, by the skills.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we did about it
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The &lt;code&gt;always:true&lt;/code&gt; / &lt;code&gt;always:false&lt;/code&gt; split
&lt;/h3&gt;

&lt;p&gt;The most impactful decision: most skills don't need to be loaded on every message.&lt;/p&gt;

&lt;p&gt;If you say "save a note about the meeting," the agent needs &lt;code&gt;claw-notes&lt;/code&gt;. It does not need &lt;code&gt;claw-photos&lt;/code&gt;, &lt;code&gt;claw-diary&lt;/code&gt;, &lt;code&gt;claw-bookmarks&lt;/code&gt;, &lt;code&gt;claw-storage&lt;/code&gt;, or &lt;code&gt;home-assistant&lt;/code&gt;. Loading all of them wastes context on instructions the agent will never use for this interaction.&lt;/p&gt;

&lt;p&gt;OpenClaw's skill system supports an &lt;code&gt;always&lt;/code&gt; flag in the skill metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openclaw"&lt;/span&gt;&lt;span class="pi"&gt;:{&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;always"&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;true&lt;/span&gt;&lt;span class="pi"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skills marked &lt;code&gt;always:true&lt;/code&gt; are injected into every prompt. Skills marked &lt;code&gt;always:false&lt;/code&gt; are only activated when the conversation context matches their description. The routing model (a fast, cheap classifier) reads the user's message and decides which skills to load.&lt;/p&gt;

&lt;p&gt;Our split:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;always:true (every message)&lt;/th&gt;
&lt;th&gt;always:false (on demand)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;claw-notes, claw-calendar, claw-mail, claw-contacts, claw-drive, claw-search&lt;/td&gt;
&lt;td&gt;claw-diary, claw-photos, claw-bookmarks, claw-storage, claw-hal, claw-graph, home-assistant&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The first group are things people expect to always work: "save a note," "what's on my calendar," "check my email." If these weren't always loaded, the agent would sometimes miss obvious requests.&lt;/p&gt;

&lt;p&gt;The second group are contextual: "how's my disk?" activates storage. "Show me photos from last week" activates photos. The routing model triggers them based on keywords.&lt;/p&gt;
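
&lt;p&gt;A minimal sketch of that routing step, assuming simple keyword matching (in practice a small classifier model decides; the keyword lists here are illustrative):&lt;/p&gt;

```python
# Sketch of on-demand skill routing: always:true skills load on every
# message; always:false skills load only when the message matches their
# trigger keywords. Keyword lists are illustrative, not the real router.
ALWAYS_ON = ["claw-notes", "claw-calendar", "claw-mail",
             "claw-contacts", "claw-drive", "claw-search"]

ON_DEMAND = {
    "claw-storage": ["disk", "volume", "storage", "space"],
    "claw-photos": ["photo", "picture", "image"],
    "claw-diary": ["diary", "journal"],
}

def skills_for(message):
    text = message.lower()
    triggered = [skill for skill, keywords in ON_DEMAND.items()
                 if any(k in text for k in keywords)]
    return ALWAYS_ON + triggered

skills_for("How's my disk?")  # → the 6 always-on skills + claw-storage
```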

&lt;p&gt;Result: the main agent's always-on cost dropped from ~20,600 to ~17,500 tokens. Still a lot, but 3,100 tokens saved on every single message adds up fast across hundreds of daily interactions.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Writing skills for tokens, not for humans
&lt;/h3&gt;

&lt;p&gt;The SKILL.md file is not documentation. It's a prompt. Every word costs money.&lt;/p&gt;

&lt;p&gt;Our early skills looked like documentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Creating a note&lt;/span&gt;

To create a new note, send a POST request to the notes endpoint.
The request body should contain the title and content fields.
The title is optional — if not provided, the system will use the
first line of the content as the title.

&lt;span class="gu"&gt;### Example&lt;/span&gt;

POST /api/v1/notes
Content-Type: application/json

{
  "title": "Meeting notes from Q1 review",
  "content": "Discussed budget allocation...",
  "tags": ["work", "q1"]
}

&lt;span class="gu"&gt;### Response&lt;/span&gt;

201 Created
{
  "data": {
    "id": "550e8400-...",
    "title": "Meeting notes from Q1 review",
    ...
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's ~150 tokens to say "POST /notes with title, content, and tags." After optimization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;### Create note&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`POST /notes`&lt;/span&gt; body: &lt;span class="sb"&gt;`{title?, content, tags?[]}`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Response: &lt;span class="sb"&gt;`201`&lt;/span&gt; with created note
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;~30 tokens. Same information. The agent doesn't need prose explaining what a POST request is. It doesn't need example JSON responses — it knows what a 201 looks like. It needs the method, the path, the body fields, and which ones are optional.&lt;/p&gt;

&lt;p&gt;The guidelines we adopted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No full JSON response examples.&lt;/strong&gt; The agent doesn't need them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include body fields for POST/PATCH&lt;/strong&gt; — the agent does need those.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;?&lt;/code&gt; suffix for optional fields&lt;/strong&gt; — &lt;code&gt;title?&lt;/code&gt; instead of "title (optional)."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One line per operation&lt;/strong&gt; when possible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No prose connectors&lt;/strong&gt; — "To create a note, you should..." becomes "Create: &lt;code&gt;POST /notes&lt;/code&gt;"&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. The compact instruction in every skill
&lt;/h3&gt;

&lt;p&gt;Every skill now instructs the agent to use &lt;code&gt;?format=compact&lt;/code&gt; for listings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## API optimization&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; List: always use &lt;span class="sb"&gt;`?format=compact`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Detail: &lt;span class="sb"&gt;`GET /:id`&lt;/span&gt; (full JSON) only when needed
&lt;span class="p"&gt;-&lt;/span&gt; Do NOT use &lt;span class="sb"&gt;`format=compact`&lt;/span&gt; on POST/PATCH/DELETE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures the savings from the compact API format (post 6) are actually realized. Without this instruction, the agent defaults to full JSON responses — it doesn't know compact exists unless the skill tells it.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Multi-agent delegation
&lt;/h3&gt;

&lt;p&gt;The nuclear option for token optimization: the main agent never loads certain skills at all, because a different agent owns them.&lt;/p&gt;

&lt;p&gt;Our multi-agent topology has 7 agents. The main agent (Francis) is a router — it handles common requests and delegates specialized ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Atlas&lt;/strong&gt; handles search, knowledge graph, and research&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sentinel&lt;/strong&gt; handles infrastructure, HAL, Docker, network&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dalí&lt;/strong&gt; handles photos, media, creative tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ledger&lt;/strong&gt; handles finance, invoicing, crypto&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Darwin&lt;/strong&gt; handles analytics, insights, sleep-time intelligence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Francis keeps 11 skills. The heavy ones like &lt;code&gt;claw-hal&lt;/code&gt; (4,200 tokens) move to Sentinel, who only loads when infrastructure questions come up. &lt;code&gt;claw-photos&lt;/code&gt; and visual intelligence move to Dalí. The search skill stays with Francis because search is needed in almost every interaction.&lt;/p&gt;

&lt;p&gt;The main agent's prompt drops from ~20K to ~17.5K tokens. Still significant, but saving ~2.5K tokens on every single message is meaningful, especially when using cloud models billed per token.&lt;/p&gt;

&lt;p&gt;Full disclosure: the multi-agent topology is still early. We've defined the roles and the skill distribution, but we haven't battle-tested delegation patterns, error propagation between agents, or the overhead of agent-to-agent communication. There are almost certainly optimizations we're missing — whether it's smarter skill chunking, dynamic skill loading based on conversation history, or something we haven't thought of at all. If you've built multi-agent systems and see room for improvement, we'd genuinely love to hear about it in the comments.&lt;/p&gt;
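
&lt;p&gt;Delegation itself can be as simple as a lookup from a classified request domain to the specialist agent. A sketch, using the agent names above (the domain labels are illustrative):&lt;/p&gt;

```python
# Sketch of main-agent delegation: Francis maps a request domain to the
# specialist that owns the heavy skills for it. Domain labels are
# illustrative; anything unlisted stays with Francis.
DELEGATES = {
    "infrastructure": "Sentinel",   # claw-hal, Docker, network
    "media": "Dalí",                # claw-photos, creative tasks
    "finance": "Ledger",
    "research": "Atlas",
    "analytics": "Darwin",
}

def assign_agent(domain):
    return DELEGATES.get(domain, "Francis")

assign_agent("infrastructure")  # → "Sentinel"
assign_agent("smalltalk")       # → "Francis"
```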

&lt;h2&gt;
  
  
  The counterintuitive insight: bigger models handle it better
&lt;/h2&gt;

&lt;p&gt;Here's something we didn't expect: Sonnet (the larger model) processes large skill contexts more efficiently than Haiku (the smaller, supposedly faster model).&lt;/p&gt;

&lt;p&gt;When the system prompt is ~20K tokens across 12 skills, the root cause of latency isn't API performance or network — it's the model processing the skill context. Haiku, despite being "faster" per token, takes longer to reason through a large, complex system prompt. Sonnet processes the same context and produces a better-routed response in less wall-clock time.&lt;/p&gt;

&lt;p&gt;This means the intuition of "use the small model for simple routing" breaks down when the routing itself requires understanding a large skill corpus. The small model saves on per-token cost but loses on latency and accuracy. For our use case — personal OS with 12+ skills — Sonnet as the primary agent model is strictly better than Haiku, despite the higher per-token price.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers after optimization
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Main agent always-on skills&lt;/td&gt;
&lt;td&gt;~20,600 tokens&lt;/td&gt;
&lt;td&gt;~17,500 tokens&lt;/td&gt;
&lt;td&gt;-15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens per skill (avg)&lt;/td&gt;
&lt;td&gt;~1,700&lt;/td&gt;
&lt;td&gt;~1,400&lt;/td&gt;
&lt;td&gt;-18%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skills always:true&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;6 (main) / 12 (global)&lt;/td&gt;
&lt;td&gt;-60%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API response (10 events)&lt;/td&gt;
&lt;td&gt;~4,200 tokens&lt;/td&gt;
&lt;td&gt;~800 tokens&lt;/td&gt;
&lt;td&gt;-81%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System prompt total&lt;/td&gt;
&lt;td&gt;~25,000 tokens&lt;/td&gt;
&lt;td&gt;~20,000 tokens&lt;/td&gt;
&lt;td&gt;-20%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 20% reduction in system prompt is nice, but the real win is architectural: understanding that skills are the dominant cost and designing the multi-agent topology, the always/on-demand split, and the skill writing guidelines around that reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I'd measure token consumption from day one.&lt;/strong&gt; We built 12 skills before ever counting how many tokens they consumed together. If we'd measured after the third skill, we'd have adopted the concise writing style immediately instead of rewriting everything later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd design the multi-agent topology earlier.&lt;/strong&gt; The decision to split agents was driven by token costs, but it should have been driven by separation of concerns. Sentinel handling all infrastructure makes sense regardless of tokens — it's a different expertise domain. We arrived at the right architecture for the wrong reason.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd add a token budget per skill in the manifest.&lt;/strong&gt; Right now there's no mechanism to warn when a skill exceeds a reasonable size. A &lt;code&gt;max_tokens: 3000&lt;/code&gt; field in the manifest would force skill authors (including us) to stay concise. If your skill is over budget, you need to split it or trim it.&lt;/p&gt;
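
&lt;p&gt;A rough sketch of what that check could look like, using the common ~4-characters-per-token heuristic (a real check would run the model's tokenizer; the &lt;code&gt;max_tokens&lt;/code&gt; field is the proposal above, not an existing mechanism):&lt;/p&gt;

```python
# Rough per-skill token budget check. Uses a ~4 chars/token estimate;
# a real implementation would use the model's tokenizer. The max_tokens
# manifest field is hypothetical.
def estimate_tokens(text):
    return len(text) // 4

def check_budget(skill_md, max_tokens=3000):
    used = estimate_tokens(skill_md)
    if used > max_tokens:
        raise ValueError(f"skill is ~{used} tokens, budget is {max_tokens}")
    return used

check_budget("### Create note\n- `POST /notes` body: `{title?, content, tags?[]}`\n")
```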

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;When you're building an AI agent system, the optimization hierarchy is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;System prompt size&lt;/strong&gt; (~20K tokens, 98% of most interactions) — reduce always-on skills, write concisely, use multi-agent delegation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill activation routing&lt;/strong&gt; — load only what's needed for this specific message&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API response format&lt;/strong&gt; — compact, diff-aware, progressive disclosure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model selection&lt;/strong&gt; — sometimes the bigger model is faster because it handles context better&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most optimization guides start at #3. The actual money is at #1.&lt;/p&gt;

&lt;p&gt;Your agent isn't wasting tokens on API responses. It's wasting them on instructions it doesn't need for this particular message. Fix the prompt, then fix the API.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: hybrid search with Reciprocal Rank Fusion — how we combined pgvector, tsvector, the knowledge graph, and heat scoring into a single search pipeline.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>optimization</category>
    </item>
    <item>
      <title>From JSON to compact: reducing API payloads 60% for LLM consumption</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Thu, 19 Mar 2026 12:17:24 +0000</pubDate>
      <link>https://forem.com/micelclaw/6o-from-json-to-compact-reducing-api-payloads-60-for-llm-consumption-cgk</link>
      <guid>https://forem.com/micelclaw/6o-from-json-to-compact-reducing-api-payloads-60-for-llm-consumption-cgk</guid>
      <description>&lt;p&gt;Every time your AI agent calls your API, it pays for the response in tokens. Not just the useful data — every &lt;code&gt;"id":&lt;/code&gt;, every &lt;code&gt;"created_at":&lt;/code&gt;, every &lt;code&gt;null&lt;/code&gt; field it didn't ask for. JSON is designed for humans reading documentation, not for LLMs processing structured data.&lt;/p&gt;

&lt;p&gt;We measured our API responses before optimization. A typical "show me my events today" call returned 3 events in ~1,200 tokens. After compact format: ~280 tokens. Same information, 77% fewer tokens.&lt;/p&gt;

&lt;p&gt;This post is about how we built that format, what we tried and killed, and the surprising insight that changed how we think about API design for agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem: JSON is expensive
&lt;/h2&gt;

&lt;p&gt;Here's what a standard events response looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"550e8400-e29b-41d4-a716-446655440000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Standup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Discord"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"start_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-18T09:00:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"end_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-18T09:30:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"all_day"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"recurrence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"freq"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"weekly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mon"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"wed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"fri"&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"confirmed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"source_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"abc123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"calendar_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Work"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"reminders"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"attendees"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"custom_fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-15T10:00:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"updated_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-17T10:00:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"synced_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-18T06:00:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"deleted_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"offset"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"free"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One event. 18 fields. Of those, the agent needs maybe 5 to answer "what's on my calendar?": title, start time, end time, location, and whether it recurs. The other 13 fields — nulls, internal IDs, sync timestamps — are noise.&lt;/p&gt;

&lt;p&gt;Multiply this by 20 events and every API call burns tokens on data the agent will never use.&lt;/p&gt;

&lt;h2&gt;
  
  
  The solution: &lt;code&gt;?format=compact&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Add one query parameter and the response transforms entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /api/v1/events?date=today&amp;amp;format=compact
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"compact"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3 events (today)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lines"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"09:00-09:30 Standup [Discord] 🔁 id:550e8400"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"14:00-15:00 Comida con Ana [La Mar] id:660e8400"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"📅 all-day: Cumple Mamá id:770e8400"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"550e8400-e29b-41d4-a716-446655440000"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"660e8400-e29b-41d4-a716-446655440001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"770e8400-e29b-41d4-a716-446655440002"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"offset"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"has_more"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three events in under 300 tokens. The agent can read the &lt;code&gt;summary&lt;/code&gt; line and respond "You have 3 events today" without parsing anything. If it needs to modify an event, the &lt;code&gt;ids&lt;/code&gt; array has the UUIDs in the same order as the lines.&lt;/p&gt;

&lt;p&gt;Each domain has its own line format:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Line format&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Events&lt;/td&gt;
&lt;td&gt;&lt;code&gt;HH:MM-HH:MM title [location] 🔁&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;09:00-09:30 Standup [Discord] 🔁&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;📌 "title" (tags) [relative_date]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;📌 "Meeting notes Q1" (trabajo) [2w ago]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Emails&lt;/td&gt;
&lt;td&gt;&lt;code&gt;from: "subject" [FOLDER] status 📎&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Juan: "Budget Q3" [INBOX] unread 📎&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contacts&lt;/td&gt;
&lt;td&gt;&lt;code&gt;name — company — email&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Ana García — TechCorp — ana@tech.com&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Files&lt;/td&gt;
&lt;td&gt;&lt;code&gt;filename (size) [type]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;report.pdf (2.3MB) [pdf]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diary&lt;/td&gt;
&lt;td&gt;&lt;code&gt;date mood "preview"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;2026-02-18 😊 "Great day at the office..."&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The line templates are fixed per domain. They include exactly the fields an agent needs for a conversational summary, and nothing else.&lt;/p&gt;
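
&lt;p&gt;As a sketch, the events template can be rendered like this (field names follow the full-JSON example above; the real server-side serializer may differ):&lt;/p&gt;

```python
# Sketch of the events line template:
# "HH:MM-HH:MM title [location] 🔁 id:short" or "📅 all-day: title id:short".
# Field names follow the full-JSON example; the real serializer may differ.
def event_line(event):
    if event.get("all_day"):
        line = f"📅 all-day: {event['title']}"
    else:
        start = event["start_at"][11:16]   # "2026-02-18T09:00:00Z" → "09:00"
        end = event["end_at"][11:16]
        line = f"{start}-{end} {event['title']}"
    if event.get("location"):
        line += f" [{event['location']}]"
    if event.get("recurrence"):
        line += " 🔁"
    return line + f" id:{event['id'][:8]}"

event_line({
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "title": "Standup", "location": "Discord",
    "start_at": "2026-02-18T09:00:00Z", "end_at": "2026-02-18T09:30:00Z",
    "all_day": False, "recurrence": {"freq": "weekly"},
})  # → "09:00-09:30 Standup [Discord] 🔁 id:550e8400"
```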

&lt;h2&gt;
  
  
  What the agent does with it
&lt;/h2&gt;

&lt;p&gt;The key insight is progressive disclosure. Compact format is the first step; full JSON is the second.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "What's on my calendar today?"
Agent: GET /events?date=today&amp;amp;format=compact
       → reads summary: "3 events (today)"
       → responds: "You have 3 events today: Standup at 9, Comida con Ana at 2, and it's Mamá's birthday."

User: "Move the lunch to 3pm"
Agent: PATCH /events/660e8400-e29b-41d4-a716-446655440001
       → body: {"start_at": "2026-02-18T15:00:00Z", "end_at": "2026-02-18T16:00:00Z"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent used compact for the listing (cheap) and a direct PATCH for the mutation (needs the UUID, which compact provides). It never needed to fetch the full JSON for any event.&lt;/p&gt;

&lt;p&gt;Every skill instructs the agent to use &lt;code&gt;?format=compact&lt;/code&gt; for listings and &lt;code&gt;GET /:id&lt;/code&gt; (full JSON) only when it needs details about a specific record. This pattern — compact for browse, full for drill-down — is progressive disclosure implemented through the API, not the UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we tried and killed
&lt;/h2&gt;

&lt;p&gt;Before arriving at compact format, we explored three ideas. All of them died.&lt;/p&gt;

&lt;h3&gt;
  
  
  TOON (Token-Oriented Object Notation)
&lt;/h3&gt;

&lt;p&gt;TOON is a serialization format designed for LLMs. It defines headers once and streams rows — like a TSV with a schema preamble. Research claims 30-60% fewer tokens than JSON.&lt;/p&gt;

&lt;p&gt;We evaluated it and dropped it. The savings over our compact format were marginal (~3-5 tokens per line), but TOON introduced real problems: special characters need escaping, the parser has to handle edge cases with newlines in content, and the agent needs to understand a custom format instead of reading natural-language lines.&lt;/p&gt;

&lt;p&gt;Compact lines are readable in English (or Spanish). TOON rows are not. For an LLM, readability matters more than raw compression.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token Budget (&lt;code&gt;?token_budget=N&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;The idea: tell the server "I have 500 tokens left, fit the response in that." The server would progressively reduce fields, then limit results, then summarize content to fit within the budget.&lt;/p&gt;

&lt;p&gt;We killed it because it violates separation of responsibilities. The server doesn't know which fields matter to the agent in a given context. Sometimes the agent needs attendees and not descriptions. Sometimes the reverse. Making the server decide which fields to cut is a design smell.&lt;/p&gt;

&lt;p&gt;The agent already has &lt;code&gt;?fields=&lt;/code&gt;, &lt;code&gt;?limit=&lt;/code&gt;, and &lt;code&gt;?format=compact&lt;/code&gt; to control granularity. It doesn't need the server to guess on its behalf.&lt;/p&gt;

&lt;h3&gt;
  
  
  Semantic cache (pgvector similarity)
&lt;/h3&gt;

&lt;p&gt;The idea: cache API responses by query embedding. If a new query is semantically similar to a cached one, return the cached response. Research shows 31% of queries are semantically similar.&lt;/p&gt;

&lt;p&gt;For a single-user personal system, the hit rate would be 5-15% at best. And the overhead of embedding each query (50-200ms via Ollama) is slower than just running the database query directly (5-20ms). We'd be adding latency to save latency.&lt;/p&gt;

&lt;p&gt;We kept the SHA-256 exact-match cache as a possibility for the future (when the sleep-time engine starts making repetitive queries), but the semantic layer was pure overhead.&lt;/p&gt;
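
&lt;p&gt;That exact-match layer is cheap to sketch: hash the normalized request, look it up, fetch on miss. The normalization and cache policy here are illustrative, not our actual implementation:&lt;/p&gt;

```python
# Sketch of a SHA-256 exact-match response cache: key is the hash of
# the normalized request, value is the serialized response.
# Normalization and eviction policy are illustrative.
import hashlib
import json

_cache = {}

def cache_key(method, path, params):
    normalized = json.dumps([method, path, sorted(params.items())])
    return hashlib.sha256(normalized.encode()).hexdigest()

def cached(method, path, params, fetch):
    key = cache_key(method, path, params)
    if key not in _cache:
        _cache[key] = fetch()
    return _cache[key]

# Identical queries hit the cache; any parameter change misses it.
cached("GET", "/events", {"date": "today", "format": "compact"}, lambda: "...")
```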

&lt;h2&gt;
  
  
  Diff-aware responses: &lt;code&gt;?since=&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Compact format reduces the size of each response. But there's another dimension: reducing how often the agent needs to ask at all.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;?since=&lt;/code&gt; parameter lets the agent say "I last checked at 14:00 — what changed?"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /api/v1/events?since=2026-02-18T14:00:00Z
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"updated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"660e8400-..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Comida con Ana"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"start_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-18T15:00:00Z"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"deleted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"response_timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-02-18T14:35:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_changes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent stores the &lt;code&gt;response_timestamp&lt;/code&gt; from each call and passes it as &lt;code&gt;?since=&lt;/code&gt; next time. If nothing changed, the response is essentially empty. The response_timestamp is captured before the query executes — conservative by design, so changes are never lost (at worst, the agent sees a duplicate).&lt;/p&gt;

&lt;p&gt;Combined with compact format, the pattern becomes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First call: &lt;code&gt;GET /events?format=compact&lt;/code&gt; → full compact listing&lt;/li&gt;
&lt;li&gt;Subsequent calls: &lt;code&gt;GET /events?since=&amp;lt;last_timestamp&amp;gt;&lt;/code&gt; → only changes&lt;/li&gt;
&lt;li&gt;If changes exist and agent needs context: &lt;code&gt;GET /events/:id&lt;/code&gt; → full detail on specific records&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For a system where the agent polls every 30 minutes and maybe 2 records changed, this turns a 1,200-token response into a 150-token one.&lt;/p&gt;
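&lt;p&gt;The loop can be sketched in a few lines of TypeScript. The &lt;code&gt;nextPollUrl&lt;/code&gt; helper is hypothetical; the endpoint and parameters follow the examples above:&lt;/p&gt;

```typescript
// Incremental polling: full compact listing on the first call,
// diffs only on every subsequent one. `nextPollUrl` is a sketch,
// not the project's actual client code.
function nextPollUrl(base: string, lastTimestamp: string | null): string {
  return lastTimestamp === null
    ? `${base}/events?format=compact`
    : `${base}/events?since=${encodeURIComponent(lastTimestamp)}`;
}

// The agent stores meta.response_timestamp from each response and
// feeds it back in as ?since= on the next poll.
```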

&lt;h2&gt;
  
  
  Token counting: the &lt;code&gt;X-Token-Count&lt;/code&gt; header
&lt;/h2&gt;

&lt;p&gt;Every response includes an &lt;code&gt;X-Token-Count&lt;/code&gt; header with the approximate token count of the response body. It's a heuristic (±15% accuracy), not an exact count — we use character-based estimation rather than running a tokenizer on every response.&lt;/p&gt;

&lt;p&gt;The agent doesn't make binary decisions based on this number. It's informational — "this response cost me approximately 450 tokens" — so the agent can track its consumption over time and adjust strategy. If it notices that contacts responses are consistently heavy, it might switch to compact or reduce limits.&lt;/p&gt;

&lt;p&gt;We briefly considered using tiktoken for exact counting and decided against it. The precision wasn't worth the overhead on every response, and the agent doesn't need exact numbers — it needs a signal.&lt;/p&gt;
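&lt;p&gt;A character-based estimator of this kind is a one-liner. The divisor of 4 (roughly four characters per token for English text and JSON) is an assumption for illustration, not necessarily the system's calibrated value:&lt;/p&gt;

```typescript
// Rough token estimate: ~4 characters per token. Cheap enough to
// run on every response, accurate enough to serve as a signal.
function estimateTokens(body: string): number {
  return Math.ceil(body.length / 4);
}

// e.g. res.setHeader("X-Token-Count", String(estimateTokens(json)));
```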

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;Measured across 7 domain endpoints with realistic data volumes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;JSON (tokens)&lt;/th&gt;
&lt;th&gt;Compact (tokens)&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Events (10)&lt;/td&gt;
&lt;td&gt;~4,200&lt;/td&gt;
&lt;td&gt;~800&lt;/td&gt;
&lt;td&gt;81%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notes (10)&lt;/td&gt;
&lt;td&gt;~5,800&lt;/td&gt;
&lt;td&gt;~1,200&lt;/td&gt;
&lt;td&gt;79%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Emails (10)&lt;/td&gt;
&lt;td&gt;~8,500&lt;/td&gt;
&lt;td&gt;~1,500&lt;/td&gt;
&lt;td&gt;82%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Contacts (10)&lt;/td&gt;
&lt;td&gt;~3,200&lt;/td&gt;
&lt;td&gt;~900&lt;/td&gt;
&lt;td&gt;72%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Files (10)&lt;/td&gt;
&lt;td&gt;~2,800&lt;/td&gt;
&lt;td&gt;~700&lt;/td&gt;
&lt;td&gt;75%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Diary (10)&lt;/td&gt;
&lt;td&gt;~4,000&lt;/td&gt;
&lt;td&gt;~1,100&lt;/td&gt;
&lt;td&gt;73%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search (10)&lt;/td&gt;
&lt;td&gt;~6,000&lt;/td&gt;
&lt;td&gt;~1,000&lt;/td&gt;
&lt;td&gt;83%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Average reduction: ~78%. The title says 60% because that's the conservative number for small result sets (3-5 records). As result sets grow, the savings compound.&lt;/p&gt;

&lt;p&gt;Emails save the most because they have the most fields (30+ columns, most of which are null for any given record). Contacts save the least because they're already relatively compact in JSON.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design decisions worth noting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Compact and &lt;code&gt;?fields=&lt;/code&gt; are mutually exclusive.&lt;/strong&gt; Compact templates are fixed per domain. Allowing custom field selection within compact would mean building a dynamic template engine — complexity for a feature nobody asked for. If you need specific fields, use &lt;code&gt;?fields=&lt;/code&gt;. If you need minimal tokens, use &lt;code&gt;?format=compact&lt;/code&gt;. Not both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compact is ignored silently on non-list endpoints.&lt;/strong&gt; &lt;code&gt;GET /notes/:id?format=compact&lt;/code&gt; returns normal JSON. No error, no warning. The agent shouldn't need to remember which endpoints support which formats — it just adds &lt;code&gt;format=compact&lt;/code&gt; to everything and the server does the right thing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Errors are always full JSON.&lt;/strong&gt; Even with &lt;code&gt;?format=compact&lt;/code&gt;, errors return the standard error envelope with code, message, details, and hint. The agent needs structured error information more than it needs minimal tokens in failure cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;ids&lt;/code&gt; array is the bridge.&lt;/strong&gt; The compact lines are for the LLM to read and summarize. The &lt;code&gt;ids&lt;/code&gt; array is for the LLM to act — it maps 1:1 to the lines, so "the second event" maps to &lt;code&gt;ids[1]&lt;/code&gt;. This dual-track design (human-readable lines + machine-actionable IDs) is what makes compact format actually useful rather than just cheap.&lt;/p&gt;
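&lt;p&gt;Concretely, the dual-track shape can be sketched like this. Only the &lt;code&gt;ids&lt;/code&gt; array is described above; the &lt;code&gt;lines&lt;/code&gt; field name is illustrative:&lt;/p&gt;

```typescript
// Compact list response: human-readable lines plus a parallel ids
// array. Index i of `lines` corresponds to ids[i].
type CompactList = { lines: string[]; ids: string[] };

// "the second event" → ordinal 2 → ids[1]
function resolveOrdinal(res: CompactList, ordinal: number): string {
  return res.ids[ordinal - 1];
}
```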

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I'd build compact format before anything else.&lt;/strong&gt; We built it during the API Intelligence phase (Cluster A), but it should have been in Phase 1. Every skill we wrote before compact format included verbose JSON examples that the agent had to parse. The day we shipped compact, we updated every skill to use it and immediately saw lower latency and better agent responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd kill more ideas faster.&lt;/strong&gt; TOON, token budget, and semantic cache each took a design session to spec out and a decision to kill. The design sessions weren't wasted — they clarified what we actually needed — but we could have killed them in 30 minutes of back-of-envelope calculation instead of writing full specs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;If you're building an API that LLM agents will consume, you're probably shipping too much data. JSON is great for browsers and SDKs. It's terrible for language models that pay per token.&lt;/p&gt;

&lt;p&gt;The fix is embarrassingly simple: a query parameter that switches the response format from structured objects to human-readable lines with an ID array. Fixed templates per domain. No custom serializer, no protocol change, no content negotiation. Just a different &lt;code&gt;if&lt;/code&gt; branch in your response handler.&lt;/p&gt;
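&lt;p&gt;A minimal sketch of that branch, with an illustrative type and template rather than the actual handler:&lt;/p&gt;

```typescript
type Note = { id: string; title: string; updated_at: string };

// The whole feature: one branch in the serializer. Compact returns
// fixed-template lines plus a parallel ids array; anything else gets
// the normal JSON envelope.
function serializeNotes(notes: Note[], format: string | undefined) {
  if (format === "compact") {
    return {
      lines: notes.map((n) => `${n.title} (${n.updated_at})`),
      ids: notes.map((n) => n.id),
    };
  }
  return { data: notes };
}
```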

&lt;p&gt;The deeper insight: the agent doesn't need your data model. It needs a summary it can repeat to the user and IDs it can use to take action. Everything else is overhead.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: your AI agent is wasting 90% of its tokens — and it's not the API's fault. It's the skills.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>ai</category>
      <category>typescript</category>
      <category>architecture</category>
    </item>
    <item>
<title>Heat scoring: teaching your data to forget (gracefully)</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Wed, 18 Mar 2026 17:26:00 +0000</pubDate>
      <link>https://forem.com/micelclaw/heat-scoring-teaching-your-data-to-forget-gracefully-1093</link>
      <guid>https://forem.com/micelclaw/heat-scoring-teaching-your-data-to-forget-gracefully-1093</guid>
      <description>&lt;p&gt;Here's something that bothered us early on: a search for "project meeting" returned every meeting note we'd ever written, sorted by relevance. The note from this morning and the one from eight months ago scored identically — because semantically, they're equally "about project meetings."&lt;/p&gt;

&lt;p&gt;But they're not equally useful. The one from this morning matters right now. The one from eight months ago is a fossil. We needed a way to express that difference without breaking search when someone genuinely wants to find old records.&lt;/p&gt;

&lt;p&gt;The solution was to give every record a temperature.&lt;/p&gt;

&lt;h2&gt;
  
  
  The concept
&lt;/h2&gt;

&lt;p&gt;Every record in every domain table — notes, events, contacts, emails, files, diary entries — has a heat score. It's a number between 0 and 1 that answers one question: &lt;strong&gt;how relevant is this record right now?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Heat rises when you interact with a record (open it, edit it, create it). Heat decays exponentially over time when you don't. A note you wrote this morning is hot. A note you haven't touched in six months is cold. A contact you email every week stays permanently warm.&lt;/p&gt;

&lt;p&gt;Three tiers classify records:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Heat score&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hot&lt;/td&gt;
&lt;td&gt;&amp;gt; 0.7&lt;/td&gt;
&lt;td&gt;Actively used right now&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warm&lt;/td&gt;
&lt;td&gt;&amp;gt; 0.2&lt;/td&gt;
&lt;td&gt;Used recently-ish&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold&lt;/td&gt;
&lt;td&gt;≤ 0.2&lt;/td&gt;
&lt;td&gt;Hasn't been touched in a while&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The idea is inspired by MemoryOS (EMNLP 2025), which applies memory tiers to conversational AI. We adapted it to structured personal data — records instead of chat messages, database rows instead of context windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The formula
&lt;/h2&gt;

&lt;p&gt;The heat score combines two signals: how often you've accessed a record, and how recently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;effective_access = access_count × max(0, 1 - (hours_since_last_increment / 4383))
raw_heat = effective_access × e^(-λ × hours_since_last_access)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;code&gt;λ = ln(2) / 168&lt;/code&gt; — a half-life of one week (168 hours).&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;effective_access&lt;/code&gt; part handles frequency. A contact you've accessed 100 times counts for a lot — but that count itself decays linearly over a year. Six months of neglect and those 100 accesses are worth 50. A year of neglect and they're worth zero. This prevents records from staying hot forever just because they were used intensively once.&lt;/p&gt;

&lt;p&gt;The exponential decay part handles recency. Something accessed right now has full heat. Something accessed a week ago has half. Two weeks ago, a quarter. The exponential curve means heat drops fast in the first few days, then slowly flattens out.&lt;/p&gt;

&lt;p&gt;The combined effect is interesting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Contact accessed 100 times, last access 6 months ago:&lt;/strong&gt; effective_access ≈ 50, but the exponential decay from 6 months of recency drives heat_score close to 0. It's cold.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Note accessed 3 times, last access today:&lt;/strong&gt; effective_access = 3, multiplied by nearly full recency. Heat_score is moderate. It's warm, trending hot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email accessed once, created this morning:&lt;/strong&gt; effective_access = 1, recency nearly full. It sits at the low end of warm; one more access tips it into solid warm.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Normalization: the 0-1 range
&lt;/h3&gt;

&lt;p&gt;The raw formula produces unbounded values — a record accessed 50 times today gives a raw heat of ~50. That's useless for ranking. So we normalize:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;normalized = raw_heat / SCALE_FACTOR    // SCALE_FACTOR = 10.0 (calibrated)
heat_score = normalized ≤ 0.8 ? normalized : 0.8 + (normalized - 0.8) × 0.5
heat_score = min(heat_score, 1.0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The soft cap at 0.8 is deliberate. Getting from 0.0 to 0.8 takes normal usage. Getting from 0.8 to 1.0 takes double the activity. This prevents "thermal saturation" — a record you obsessively check doesn't permanently peg the meter at 1.0, leaving room for other records to be relatively hotter.&lt;/p&gt;

&lt;p&gt;The hard cap at 1.0 is enforced in the database: &lt;code&gt;CHECK (heat_score &amp;gt;= 0 AND heat_score &amp;lt;= 1)&lt;/code&gt;.&lt;/p&gt;
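&lt;p&gt;Put together, the whole computation fits in one function. This is a direct transcription of the formulas above, not the production code:&lt;/p&gt;

```typescript
const LAMBDA = Math.log(2) / 168; // recency half-life: one week (168 h)
const COUNT_DECAY_HOURS = 8766;   // access_count decays linearly over ~a year
const SCALE_FACTOR = 10.0;        // calibrated normalization divisor

function heatScore(
  accessCount: number,
  hoursSinceLastIncrement: number,
  hoursSinceLastAccess: number,
): number {
  const effectiveAccess =
    accessCount * Math.max(0, 1 - hoursSinceLastIncrement / COUNT_DECAY_HOURS);
  const rawHeat = effectiveAccess * Math.exp(-LAMBDA * hoursSinceLastAccess);
  const normalized = rawHeat / SCALE_FACTOR;
  // Soft cap: progress beyond 0.8 is half as fast; hard cap at 1.0.
  const soft = normalized <= 0.8 ? normalized : 0.8 + (normalized - 0.8) * 0.5;
  return Math.min(soft, 1.0);
}
```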

&lt;h2&gt;
  
  
  The table
&lt;/h2&gt;

&lt;p&gt;Heat data lives in its own table, not in the domain tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;record_heat&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;              &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;domain&lt;/span&gt;          &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;record_id&lt;/span&gt;       &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;access_count&lt;/span&gt;    &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;last_accessed&lt;/span&gt;   &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;last_increment&lt;/span&gt;  &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;heat_score&lt;/span&gt;      &lt;span class="nb"&gt;REAL&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_tier&lt;/span&gt;     &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'cold'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;      &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt;      &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same pattern as the embeddings table: &lt;code&gt;(domain, record_id)&lt;/code&gt; instead of foreign keys. No cascading deletes to manage — when a note is deleted, the heat row becomes an orphan that the cleanup cron handles.&lt;/p&gt;

&lt;p&gt;Why a separate table instead of adding columns to each domain table? Because heat is cross-cutting. Every domain behaves the same way. A single cron job, a single update function, a single query pattern. If heat lived in 7+ domain tables, every change to the formula would be 7+ ALTER statements.&lt;/p&gt;

&lt;h2&gt;
  
  
  When heat changes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  On access (instant)
&lt;/h3&gt;

&lt;p&gt;Every time you open a record — &lt;code&gt;GET /notes/:id&lt;/code&gt;, &lt;code&gt;GET /contacts/:id&lt;/code&gt;, whatever — a post-handler fires:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Increment &lt;code&gt;access_count&lt;/code&gt; by 1&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;last_accessed&lt;/code&gt; to now&lt;/li&gt;
&lt;li&gt;Recalculate &lt;code&gt;heat_score&lt;/code&gt; and &lt;code&gt;memory_tier&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Upsert into &lt;code&gt;record_heat&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's one SQL query. Under 1 millisecond. The user never notices.&lt;/p&gt;

&lt;p&gt;Creating or editing a record also bumps heat. A record you just wrote or just modified reflects active intent — it should be hot.&lt;/p&gt;

&lt;p&gt;Deleting doesn't bump heat. The decay cron will cool it naturally.&lt;/p&gt;
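&lt;p&gt;The on-access path (steps 1–4 above) can be sketched in memory. In production it is a single &lt;code&gt;INSERT ... ON CONFLICT&lt;/code&gt; against &lt;code&gt;record_heat&lt;/code&gt;; here a &lt;code&gt;Map&lt;/code&gt; stands in for the table, and the score calculation is left abstract:&lt;/p&gt;

```typescript
type Tier = "hot" | "warm" | "cold";
type HeatRow = { accessCount: number; lastAccessed: Date; heatScore: number; memoryTier: Tier };

// Tier thresholds from the table earlier in the post.
function tierFor(score: number): Tier {
  return score > 0.7 ? "hot" : score > 0.2 ? "warm" : "cold";
}

// Stand-in for the record_heat table, keyed by (domain, record_id).
const heat = new Map<string, HeatRow>();

// Fired by the post-handler after GET /notes/:id, GET /contacts/:id, etc.
// `score` is the freshly recalculated heat_score.
function touch(domain: string, recordId: string, score: number): HeatRow {
  const key = `${domain}:${recordId}`;
  const prev = heat.get(key);
  const row: HeatRow = {
    accessCount: (prev?.accessCount ?? 0) + 1, // step 1
    lastAccessed: new Date(),                  // step 2
    heatScore: score,                          // step 3
    memoryTier: tierFor(score),
  };
  heat.set(key, row);                          // step 4: upsert
  return row;
}
```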

&lt;h3&gt;
  
  
  On cron (every 6 hours)
&lt;/h3&gt;

&lt;p&gt;A scheduled job recalculates heat for every row in &lt;code&gt;record_heat&lt;/code&gt;. This is necessary because the exponential decay is time-based — without periodic recalculation, a record that nobody accesses would stay at whatever heat it had when last touched. The cron ensures everything drifts toward cold over time.&lt;/p&gt;

&lt;p&gt;The cron also applies the linear decay to &lt;code&gt;access_count&lt;/code&gt; — proportionally reducing it based on time elapsed since the last recalculation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Source tracking
&lt;/h3&gt;

&lt;p&gt;Not all accesses are equal. We track the source of each heat event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;HeatSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user_dash&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agent_primary&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agent_creative&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; 
                &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sleep_time&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;sync&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A user opening a contact in the dashboard is a strong signal of interest. The sleep-time engine scanning the same contact for background analysis is not. The heat system accepts the source parameter so that downstream consumers (like the proactive preference learner) can distinguish genuine human interest from automated access.&lt;/p&gt;

&lt;p&gt;Sync imports get lower initial heat than user-created records, because the user didn't actively choose to create them.&lt;/p&gt;

&lt;h2&gt;
  
  
  How heat changes search
&lt;/h2&gt;

&lt;p&gt;Heat never filters search results. If you search for "apartment lease" and the only match is a cold note from a year ago, it absolutely must appear. Hiding results because they're cold would be insane.&lt;/p&gt;

&lt;p&gt;Instead, heat acts as a &lt;strong&gt;post-fusion tiebreaker&lt;/strong&gt; in the search ranking algorithm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;final_score = rrf_score × (1 + 0.1 × heat_score)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is step 4 in the Reciprocal Rank Fusion pipeline. The &lt;code&gt;0.1&lt;/code&gt; multiplier means heat can boost a result by at most 10%. It breaks ties between equally relevant results — the hotter one wins — but it can never override genuine relevance. A cold but highly relevant result still beats a hot but marginally relevant one.&lt;/p&gt;
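&lt;p&gt;As code, the tiebreaker is a single expression:&lt;/p&gt;

```typescript
// Post-fusion heat boost: at most +10%, so heat breaks ties between
// comparably relevant results but never overrides relevance.
function finalScore(rrfScore: number, heatScore: number): number {
  return rrfScore * (1 + 0.1 * heatScore);
}
```

&lt;p&gt;Two results with the same RRF score are separated by heat, but a cold result with a clearly higher RRF score still wins.&lt;/p&gt;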

&lt;p&gt;In the advanced search mode, heat becomes one of four independently weighted signals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;score = (α × heat + β × semantic + γ × fulltext + δ × graph) / (α + β + γ + δ)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user can drag sliders to adjust the weight of each signal. Want to find recent-activity-first? Crank heat to 1.0. Want to find the most semantically relevant regardless of age? Drop heat to 0. The weights are transparent — every search result shows its raw scores for all four signals.&lt;/p&gt;
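&lt;p&gt;The advanced-mode blend is the same idea with user-controlled weights (field names here are illustrative):&lt;/p&gt;

```typescript
type Signals = { heat: number; semantic: number; fulltext: number; graph: number };

// Weighted average of the four signals; weights come from the
// dashboard sliders. Dividing by the weight sum keeps the result in
// the same 0-1 range as the inputs.
function advancedScore(w: Signals, s: Signals): number {
  const total = w.heat + w.semantic + w.fulltext + w.graph;
  return (
    (w.heat * s.heat +
      w.semantic * s.semantic +
      w.fulltext * s.fulltext +
      w.graph * s.graph) / total
  );
}
```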

&lt;h2&gt;
  
  
  How heat changes the digest
&lt;/h2&gt;

&lt;p&gt;The Digest Engine — the background job that compiles "here's what changed since you last checked" — uses heat in two ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prioritization.&lt;/strong&gt; When 50 things changed overnight, the digest needs to decide what to mention first. Heat-weighted changes surface near the top. "Ana García, who you've been emailing all week, sent a new message" ranks above "A contact you haven't opened in 4 months was synced."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cold filtering.&lt;/strong&gt; By default, the search endpoint excludes records in the "accessed but cold" zone (heat &amp;gt; 0 but &amp;lt; 0.2). These are records you interacted with once and then forgot. They still exist and are findable if you search explicitly — but they don't clutter routine results. A &lt;code&gt;?include_cold=true&lt;/code&gt; parameter brings them back for exhaustive searches.&lt;/p&gt;

&lt;h2&gt;
  
  
  How heat changes the API
&lt;/h2&gt;

&lt;p&gt;Every list endpoint gains a &lt;code&gt;?tier=&lt;/code&gt; filter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /notes?tier=hot          → only hot notes
GET /contacts?tier=warm      → warm + hot contacts
GET /events?tier=cold        → only cold events (rarely useful, but available)
GET /notes                   → all notes (no change from before)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI agent uses this heavily. When the user says "what's on my plate?", the agent queries &lt;code&gt;?tier=hot&lt;/code&gt; across all domains to get a quick snapshot of active items without wading through the entire database. When the user says "find that contract from last year", the agent searches without tier filtering because cold records are exactly what's needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The edge case: heat on edges
&lt;/h2&gt;

&lt;p&gt;Records aren't the only thing with heat. Graph edges — the connections between entities in the knowledge graph — also have a heat score.&lt;/p&gt;

&lt;p&gt;When you open a note and then immediately open a related email (within 60 seconds), the edge between those two records gets warmer. The dashboard tracks these co-navigation patterns automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lastViewedRecord&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;lastViewedRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/links/heat-edge&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;from_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lastViewedRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;from_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lastViewedRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;to_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;currentRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;to_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;currentRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hot edges become visible in the graph visualization — glowing connections between frequently co-accessed records. Cold edges fade until they're almost invisible. It turns the knowledge graph into a heat map of your actual workflow, not just a static web of extracted entities.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I'd start with simpler thresholds and tune later.&lt;/strong&gt; We went through three iterations of tier boundaries (the original design had unbounded scores with thresholds at 5.0 and 0.5, then we normalized to 0-1 with a soft cap, then adjusted thresholds to 0.7/0.2). Starting with "anything accessed in the last week is hot, anything in the last month is warm, everything else is cold" would have been fine for v1.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd add heat to the search provenance from day one.&lt;/strong&gt; Heat was initially invisible in search results — you could feel its effect on ranking but couldn't see the raw score. Adding provenance (the breakdown of all four search signals) happened later, and it was immediately obvious it should have been there from the start. Transparency in ranking builds trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Heat scoring is maybe 150 lines of code: one table, one upsert function, one cron job, one post-fusion multiplier. But it fundamentally changes the system's relationship with time.&lt;/p&gt;

&lt;p&gt;Without heat, every record is equally present. Your database is a flat archive — 2,000 notes, all with the same weight. With heat, the database has a sense of "now." Recent activity floats to the surface. Old activity sinks naturally. The AI agent, the search engine, and the digest all benefit from the same signal: what matters to you right now.&lt;/p&gt;

&lt;p&gt;The formula doesn't need to be clever. Exponential decay with a weekly half-life and a soft cap. That's it. The cleverness is in where you apply it — as a tiebreaker in search, as a filter in the agent's queries, as a priority signal in the digest, as a visual indicator in the dashboard.&lt;/p&gt;

&lt;p&gt;Data that knows how to forget is data that stays useful.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: from JSON to compact — how we reduced API payloads by 60% for LLM consumption, and why your AI agent is wasting most of its tokens on field names.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>architecture</category>
      <category>ai</category>
      <category>search</category>
    </item>
    <item>
<title>Building a personal knowledge graph with just PostgreSQL (no Neo4j needed)</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Wed, 18 Mar 2026 13:04:24 +0000</pubDate>
      <link>https://forem.com/micelclaw/4o-building-a-personal-knowledge-graph-with-just-postgresql-no-neo4j-needed-22b2</link>
      <guid>https://forem.com/micelclaw/4o-building-a-personal-knowledge-graph-with-just-postgresql-no-neo4j-needed-22b2</guid>
      <description>&lt;p&gt;At some point during development, we needed to answer questions that search couldn't: "Who's connected to this project?" "What links this email to tomorrow's meeting?" "Which people keep appearing together across my notes?"&lt;/p&gt;

&lt;p&gt;Semantic search finds records by meaning. Full-text search finds them by keywords. But neither can traverse relationships. For that, you need a graph.&lt;/p&gt;

&lt;p&gt;The obvious choice was Neo4j, or at least Apache AGE (which adds Cypher queries to PostgreSQL). We evaluated both and went with... two tables and &lt;code&gt;WITH RECURSIVE&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This post explains why, and shows the actual schema, queries, and API that power the knowledge graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not a graph database?
&lt;/h2&gt;

&lt;p&gt;We seriously considered Apache AGE. It adds OpenCypher support directly inside PostgreSQL — no separate service, same database. The competitive analysis even had it as "Innovation #2: Bi-temporal personal knowledge graph via Apache AGE."&lt;/p&gt;

&lt;p&gt;We rejected it for three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scale.&lt;/strong&gt; This is a personal system. One user, maybe a family. We're talking about fewer than 10,000 entity nodes and 50,000 edges. At that scale, a recursive CTE with depth 2-3 takes microseconds. The performance argument for a graph engine simply doesn't exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dependencies.&lt;/strong&gt; Apache AGE is a PostgreSQL extension that needs to be compiled and installed. On a bare-metal mini-PC with different architectures (x86, ARM), that's a maintenance headache. Our system already depends on two extensions (pgcrypto and pgvector). Adding a third for a feature that SQL handles natively felt wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Type safety.&lt;/strong&gt; We use Drizzle ORM for everything. Apache AGE queries return untyped results from Cypher strings embedded in SQL. Our recursive CTEs return typed rows that Drizzle understands. No impedance mismatch, no parsing layer, no surprise nulls.&lt;/p&gt;

&lt;p&gt;The decision was documented as "ADR B-4: Apache AGE rejected in favor of SQL-native approach with recursive CTEs." The roadmap originally referenced AGE — we had to go back and update it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The schema: two tables
&lt;/h2&gt;

&lt;p&gt;The entire knowledge graph lives in two tables. One for nodes, one for edges.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nodes: &lt;code&gt;graph_entities&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;graph_entities&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;                &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;entity_type&lt;/span&gt;       &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;-- person, project, location, topic&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;              &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;-- "Ana García", "Micelclaw OS", "Zaragoza"&lt;/span&gt;
    &lt;span class="n"&gt;normalized_name&lt;/span&gt;   &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;            &lt;span class="c1"&gt;-- "ana garcia", "micelclaw os", "zaragoza"&lt;/span&gt;
    &lt;span class="n"&gt;properties&lt;/span&gt;        &lt;span class="n"&gt;JSONB&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'{}'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;-- {email, company, role, coordinates...}&lt;/span&gt;
    &lt;span class="n"&gt;merge_history&lt;/span&gt;     &lt;span class="n"&gt;JSONB&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'[]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;-- [{merged_from, date, reason}]&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;        &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt;        &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;deleted_at&lt;/span&gt;        &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalized_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four entity types: &lt;code&gt;person&lt;/code&gt;, &lt;code&gt;project&lt;/code&gt;, &lt;code&gt;location&lt;/code&gt;, &lt;code&gt;topic&lt;/code&gt;. That's it. We considered adding more (organization, event, document) and decided against it. Four types cover 95% of real-world connections in personal data. The &lt;code&gt;properties&lt;/code&gt; JSONB handles the rest — a person can have an email and company, a location can have coordinates.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;normalized_name&lt;/code&gt; column is the key to entity resolution: lowercase, no accents, trimmed. When the extraction pipeline finds "Ana García" in a note and "ana garcia" in an email, the UNIQUE constraint ensures they map to the same node.&lt;/p&gt;
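
&lt;p&gt;The normalization itself fits in a few lines of TypeScript (the function name is illustrative; the real pipeline may differ in details):&lt;/p&gt;

```typescript
// Lowercase, strip accents via Unicode decomposition, trim, collapse spaces.
function normalizeName(name: string): string {
  return name
    .normalize("NFD")                 // "García" becomes "Garci" + combining accent + "a"
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining diacritical marks
    .toLowerCase()
    .trim()
    .replace(/\s+/g, " ");            // collapse internal runs of whitespace
}

normalizeName("  Ana García "); // → "ana garcia"
```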

&lt;h3&gt;
  
  
  Edges: &lt;code&gt;entity_links&lt;/code&gt; (extended)
&lt;/h3&gt;

&lt;p&gt;We already had an &lt;code&gt;entity_links&lt;/code&gt; table from the initial schema — it was one of the original 13 tables. It stored simple connections like "this note mentions this contact." For the knowledge graph, we extended it with three columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;entity_links&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;link_type&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'manual'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- 'manual' | 'extracted' | 'inferred' | 'structural'&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;entity_links&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="nb"&gt;REAL&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- 0.0 to 1.0&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;entity_links&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;created_by&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'system'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- 'user' | 'llm' | 'system'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three ALTER statements. That's the entire migration for turning a flat links table into a knowledge graph edge store.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;source_type&lt;/code&gt; and &lt;code&gt;target_type&lt;/code&gt; columns (VARCHAR, no CHECK constraint) now accept &lt;code&gt;graph_entity&lt;/code&gt; as a valid type. This means an edge can connect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A note → a graph entity (Person mentioned in text)&lt;/li&gt;
&lt;li&gt;A graph entity → a graph entity (Person works at Project)&lt;/li&gt;
&lt;li&gt;An event → a graph entity (Event located in City)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The relationship taxonomy follows a subject → verb → object convention:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Relationship&lt;/th&gt;
&lt;th&gt;Direction&lt;/th&gt;
&lt;th&gt;Created by&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;mentions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;record → entity&lt;/td&gt;
&lt;td&gt;LLM extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;attended_by&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;event → person&lt;/td&gt;
&lt;td&gt;Sync (calendar attendees)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;located_in&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;record → location&lt;/td&gt;
&lt;td&gt;LLM extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;works_at&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;person → project/org&lt;/td&gt;
&lt;td&gt;LLM extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;collaborates_with&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;person → person&lt;/td&gt;
&lt;td&gt;Inferred from co-occurrence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;relates_to&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;entity → entity&lt;/td&gt;
&lt;td&gt;Manual or sleep-time engine&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We also reserved three relationships for future use: &lt;code&gt;contradicts&lt;/code&gt;, &lt;code&gt;follows_up&lt;/code&gt;, &lt;code&gt;supersedes&lt;/code&gt; — for a Zettelkasten-style auto-linking feature that isn't built yet but whose namespace is already protected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The queries: recursive CTEs
&lt;/h2&gt;

&lt;p&gt;Three query patterns cover every graph operation we need.&lt;/p&gt;

&lt;h3&gt;
  
  
  Expansion: "Who is connected to Ana García?"
&lt;/h3&gt;

&lt;p&gt;Given a node, find all neighbors up to depth N:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;RECURSIVE&lt;/span&gt; &lt;span class="n"&gt;graph_walk&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;-- Base: direct connections from the starting entity&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;entity_links&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'graph_entity'&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;

    &lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;

    &lt;span class="c1"&gt;-- Recursive: follow edges from discovered nodes&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;gw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;graph_walk&lt;/span&gt; &lt;span class="n"&gt;gw&lt;/span&gt;
    &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;entity_links&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;
        &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_type&lt;/span&gt;
       &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;gw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;    &lt;span class="c1"&gt;-- default: 2&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;graph_walk&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;target_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At depth 2 with 10K nodes, this query returns in under 10ms. We set the default to 2 and the maximum to 3. Depth 3 is rarely useful for personal data — it usually returns the entire graph.&lt;/p&gt;

&lt;h3&gt;
  
  
  Path finding: "What connects this email to that meeting?"
&lt;/h3&gt;

&lt;p&gt;BFS to find the shortest path between two nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;RECURSIVE&lt;/span&gt; &lt;span class="n"&gt;path_search&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;ARRAY&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;target_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;entity_links&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;from_id&lt;/span&gt;

    &lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;

    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;ps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;path_search&lt;/span&gt; &lt;span class="n"&gt;ps&lt;/span&gt;
    &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;entity_links&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;
        &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;max_depth&lt;/span&gt;    &lt;span class="c1"&gt;-- default: 4&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ANY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;-- cycle prevention&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;path_search&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;to_id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;NOT (el.source_id = ANY(ps.path))&lt;/code&gt; clause prevents infinite loops. The &lt;code&gt;LIMIT 1&lt;/code&gt; with &lt;code&gt;ORDER BY depth&lt;/code&gt; gives us the shortest path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subgraph: "Show me everything around this project"
&lt;/h3&gt;

&lt;p&gt;For the visualization view in the dashboard (a force-directed graph using &lt;code&gt;react-force-graph-2d&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Get the top N entities by mention count, centered on an optional entity&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;mention_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heat_score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;graph_entities&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;entity_links&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'graph_entity'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'graph_entity'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;record_heat&lt;/span&gt; &lt;span class="n"&gt;rh&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;rh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'graph_entity'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;record_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deleted_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;heat_score&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;mention_count&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then a second query fetches all edges between the returned nodes. The dashboard renders nodes sized by mention count and colored by heat score — hot nodes glow, cold nodes fade.&lt;/p&gt;

&lt;h2&gt;
  
  
  How nodes get created
&lt;/h2&gt;

&lt;p&gt;Nodes enter the graph through three pipelines, only one of which involves an LLM:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Contacts → Person nodes (no LLM)
&lt;/h3&gt;

&lt;p&gt;When a contact is created or synced from Google, the CRUD hook creates a Person node directly from structured data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Contact {display_name: "Ana García", company: "TechCorp", emails: [{address: "ana@techcorp.com"}]}
    ↓
graph_entities {entity_type: "person", name: "Ana García", normalized_name: "ana garcia",
                properties: {email: "ana@techcorp.com", company: "TechCorp"}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No model needed. The data is already structured.&lt;/p&gt;
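
&lt;p&gt;The hook reduces to a pure mapping, sketched here in TypeScript. The interfaces and function name are illustrative, not the actual codebase:&lt;/p&gt;

```typescript
// Hedged sketch of the CRUD hook's mapping: a synced contact becomes a
// Person node directly from structured fields, with no LLM involved.
interface Contact {
  display_name: string;
  company?: string;
  emails: { address: string }[];
}

interface GraphEntity {
  entity_type: string;
  name: string;
  normalized_name: string;
  properties: Record<string, string>;
}

function contactToPersonNode(c: Contact): GraphEntity {
  return {
    entity_type: "person",
    name: c.display_name,
    // Same normalization as the entity-resolution key: lowercase, no accents, trimmed.
    normalized_name: c.display_name
      .normalize("NFD")
      .replace(/[\u0300-\u036f]/g, "")
      .toLowerCase()
      .trim(),
    properties: {
      ...(c.emails[0] ? { email: c.emails[0].address } : {}),
      ...(c.company ? { company: c.company } : {}),
    },
  };
}
```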

&lt;h3&gt;
  
  
  2. Event attendees → Person nodes (no LLM)
&lt;/h3&gt;

&lt;p&gt;Calendar events have attendees as structured JSONB. Each attendee is resolved against existing Person nodes by email, or created as a new node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event attendee {email: "ana@techcorp.com", name: "Ana"}
    ↓
Match against graph_entities WHERE properties-&amp;gt;&amp;gt;'email' = 'ana@techcorp.com'
    ↓
Found → reuse existing node (even if name differs slightly)
Not found → create new Person node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Text extraction → All entity types (LLM)
&lt;/h3&gt;

&lt;p&gt;This is the async pipeline from the previous blog post. The 2B model extracts persons, projects, locations, and topics from notes, emails, diary entries, and file contents. Each extracted entity is upserted by &lt;code&gt;(entity_type, normalized_name)&lt;/code&gt; — if it already exists, the properties are merged.&lt;/p&gt;
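
&lt;p&gt;We assume here that the property merge behaves like PostgreSQL's JSONB &lt;code&gt;||&lt;/code&gt; operator: a shallow merge where the newly extracted properties win on key conflicts. A minimal sketch of those semantics in TypeScript:&lt;/p&gt;

```typescript
type Properties = Record<string, unknown>;

// Shallow merge with JSONB-||-style semantics (an assumption, not confirmed
// behavior): keys from `incoming` overwrite keys in `existing`; nested
// objects are replaced wholesale, not deep-merged.
function mergeProperties(existing: Properties, incoming: Properties): Properties {
  return { ...existing, ...incoming };
}

mergeProperties(
  { email: "ana@techcorp.com", company: "TechCorp" },
  { company: "TechCorp S.L.", role: "CTO" }
);
// → { email: "ana@techcorp.com", company: "TechCorp S.L.", role: "CTO" }
```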

&lt;p&gt;The key insight: pipelines 1 and 2 mean the graph has a solid foundation of real, structured Person nodes before the LLM ever runs. When the LLM extracts "Ana García" from a note, it matches against a node that already exists from the contacts sync. No orphan entities, no duplicates — just a new edge connecting the note to an existing person.&lt;/p&gt;

&lt;h2&gt;
  
  
  Entity resolution: the hard problem
&lt;/h2&gt;

&lt;p&gt;Extraction creates nodes. Entity resolution decides whether two nodes are the same thing. We handle this in three levels:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 1 — Deterministic (automatic).&lt;/strong&gt; The &lt;code&gt;UNIQUE(entity_type, normalized_name)&lt;/code&gt; constraint handles exact matches. "Ana García" and "ana garcía" always map to the same node. For persons, email matching overrides name matching — if two nodes have the same email, they're the same person regardless of name differences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 2 — Suggested merges (semi-automatic).&lt;/strong&gt; An endpoint returns pairs of same-type entities with high name similarity (using &lt;code&gt;pg_trgm&lt;/code&gt;'s &lt;code&gt;similarity()&lt;/code&gt; function, threshold &amp;gt; 0.4):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;GET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;/graph/merge-candidates&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;entity_a:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;name:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ana García"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;mention_count:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;entity_b:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;name:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ana G."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;mention_count:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;similarity:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.87&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
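
&lt;p&gt;For intuition, here is an approximation of &lt;code&gt;pg_trgm&lt;/code&gt;'s scoring in TypeScript: pad each word, extract trigrams, and take the Jaccard ratio of the two trigram sets. The real extension differs in edge cases, so treat this as a sketch:&lt;/p&gt;

```typescript
// Approximate pg_trgm trigram extraction: lowercase, split into words, pad
// each word with two leading spaces and one trailing space, take 3-grams.
function trigrams(text: string): Set<string> {
  const grams = new Set<string>();
  for (const word of text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean)) {
    const padded = `  ${word} `;
    for (let i = 0; i + 3 <= padded.length; i++) {
      grams.add(padded.slice(i, i + 3));
    }
  }
  return grams;
}

// similarity() ≈ |shared trigrams| / |union of trigrams|.
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

similarity("Ana García", "Ana G."); // ≈ 0.42, above the 0.4 threshold
```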



&lt;p&gt;A merge redirects all edges from the absorbed node to the surviving node, records the event in &lt;code&gt;merge_history&lt;/code&gt;, and hard-deletes the duplicate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Level 3 — Sleep-time resolution (automatic, low-priority).&lt;/strong&gt; A background job periodically reviews Person nodes, calculates cross-similarities, and proposes merge candidates. The AI agent can also trigger merges when context makes it obvious ("Juan" and "Juan Pérez" in the same conversation).&lt;/p&gt;

&lt;p&gt;What we explicitly don't try to resolve: nicknames ("Pepe" = "José"), role-based references ("the accountant"), and ambiguous entities ("Santiago" — person or city?). Those are either handled by the agent in conversation or corrected manually. The graph needs to be conservative — an incorrect merge destroys data, while a duplicate is just noise that can be cleaned up later.&lt;/p&gt;

&lt;h2&gt;
  
  
  The API
&lt;/h2&gt;

&lt;p&gt;Nine endpoints expose the graph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET  /graph/entities              — Search by name, filter by type
GET  /graph/entities/:id          — Entity detail with direct connections
GET  /graph/connections           — Expansion traversal (depth 1-3)
GET  /graph/path                  — Shortest path between two entities
GET  /graph/subgraph              — Nodes + edges for visualization
GET  /graph/stats                 — Counts, orphans, pending queue
GET  /graph/merge-candidates      — Similar entity pairs
POST /graph/merge                 — Fuse two entities
POST /graph/cleanup               — Delete orphan nodes (0 connections)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The graph is a Pro feature. Free tier users still get &lt;code&gt;entity_links&lt;/code&gt; (manual connections between records), but the automatic entity extraction, the graph visualization, and the traversal queries require Pro.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the graph enables
&lt;/h2&gt;

&lt;p&gt;Once you have a knowledge graph, things that were impossible become trivial:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Who's involved in this project?"&lt;/strong&gt; — expansion query from a Project entity, depth 1, filter by Person type. Returns everyone who's been mentioned in connection with the project across all domains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What connects this email to that meeting?"&lt;/strong&gt; — path query. Returns: email → mentions → Person:Ana García → attended_by → Event:Sprint Review. Two hops, one shared person.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Show me everything about Ana García"&lt;/strong&gt; — entity detail. Returns: 15 mentions across notes and emails, 3 events attended, works at TechCorp, collaborates with Javier Losada. All discovered automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search ranking.&lt;/strong&gt; The hybrid search uses graph connectivity as one of four signals (along with semantic similarity, full-text relevance, and heat score) in a Reciprocal Rank Fusion algorithm. A note that mentions entities connected to your recent activity ranks higher.&lt;/p&gt;
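
&lt;p&gt;RRF itself is only a few lines: each signal contributes 1/(k + rank) for every document it ranks, and documents are ordered by the summed score. A generic sketch (k = 60 is the constant from the original RRF paper; the rankings here are invented):&lt;/p&gt;

```typescript
// Generic Reciprocal Rank Fusion: fuse several rankings into one by
// summing 1 / (k + rank) per signal for each document id.
function rrf(rankings: string[][], k = 60): string[] {
  const scores: { [id: string]: number } = {};
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      // rank is 1-based, so the top result contributes 1 / (k + 1)
      scores[id] = (scores[id] || 0) + 1 / (k + i + 1);
    });
  }
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}
```

A document that appears in several rankings beats one that tops a single ranking, which is exactly why the graph-connectivity signal can lift a note without dominating the result.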

&lt;p&gt;The graph also feeds the sleep-time engine (background jobs that discover cross-domain correlations), the proactive digest ("Ana García, who you're meeting tomorrow, sent you an email yesterday"), and the AI agent's contextual awareness.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;At the current scale of development:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Graph entities&lt;/td&gt;
&lt;td&gt;~150-300 per active user&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Entity links&lt;/td&gt;
&lt;td&gt;~500-1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expansion query (depth 2)&lt;/td&gt;
&lt;td&gt;&amp;lt; 10ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Path query (depth 4)&lt;/td&gt;
&lt;td&gt;&amp;lt; 15ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subgraph for visualization (100 nodes)&lt;/td&gt;
&lt;td&gt;&amp;lt; 20ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory overhead&lt;/td&gt;
&lt;td&gt;0 (it's just PostgreSQL)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No new service to deploy. No new port to manage. No new backup strategy. It's tables in the same database that holds everything else, queried with the same ORM, backed up with the same &lt;code&gt;pg_dump&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I'd add &lt;code&gt;pg_trgm&lt;/code&gt; from day one.&lt;/strong&gt; We needed it for merge candidates (fuzzy name matching) and ended up adding it as a later migration. It's a small extension with zero downsides — should have been in the initial schema alongside pgcrypto and pgvector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd index &lt;code&gt;entity_links&lt;/code&gt; more aggressively.&lt;/strong&gt; The default indexes cover the UNIQUE constraint, but expansion queries benefit from separate indexes on &lt;code&gt;(source_type, source_id)&lt;/code&gt; and &lt;code&gt;(target_type, target_id)&lt;/code&gt;. We added them later when profiling showed sequential scans on the edges table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd build the visualization earlier.&lt;/strong&gt; The force-directed graph view in the dashboard was one of the last features built, but it should have been one of the first. Seeing the graph visually — nodes clustering around projects, people forming communities — was the moment the knowledge graph went from "interesting data structure" to "this actually understands my data." It would have motivated better extraction quality earlier.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;A knowledge graph doesn't need a graph database. At personal scale (&amp;lt; 10K nodes, &amp;lt; 50K edges), PostgreSQL with recursive CTEs is fast, simple, and — critically — already there. No new infrastructure, no new dependency, no new operational burden.&lt;/p&gt;

&lt;p&gt;Two tables (&lt;code&gt;graph_entities&lt;/code&gt; + extended &lt;code&gt;entity_links&lt;/code&gt;), three traversal patterns (expansion, path, subgraph), three entity creation pipelines (structured sync, calendar attendees, LLM extraction), and three levels of entity resolution (deterministic, suggested, sleep-time).&lt;/p&gt;

&lt;p&gt;The graph is the connective tissue between everything else in the system — search, heat scoring, the AI agent, the digest engine. And it's all just rows in PostgreSQL.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: heat scoring — how we taught records to fade like memories, and why a simple exponential decay formula changes how search works.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>knowledgegraph</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Entity extraction with a 2B model: benchmarks from a personal knowledge graph</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Sat, 14 Mar 2026 21:48:43 +0000</pubDate>
      <link>https://forem.com/micelclaw/entity-extraction-with-a-2b-model-benchmarks-from-a-personal-knowledge-graph-2f27</link>
      <guid>https://forem.com/micelclaw/entity-extraction-with-a-2b-model-benchmarks-from-a-personal-knowledge-graph-2f27</guid>
      <description>&lt;p&gt;When you're building a personal knowledge graph — the kind that automatically discovers that "Ana García" appears in your emails, your calendar, and tomorrow's meeting notes — you need entity extraction. The industry answer is to throw GPT-4 at it and move on. But when your system runs on a mini-PC in someone's living room, you need something that fits in 2GB of RAM.&lt;/p&gt;

&lt;p&gt;We benchmarked &lt;code&gt;qwen3-vl:2b-instruct-q4_K_M&lt;/code&gt; — a 2-billion parameter multimodal model, quantized to 4-bit — running locally through Ollama. The same model that describes our photos also extracts entities from text. One model, two jobs, less RAM.&lt;/p&gt;

&lt;p&gt;Here's what we found.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;We built a benchmark suite with two tasks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Text extraction&lt;/strong&gt; — 15 cases across notes, emails, and diary entries. Mix of Spanish and English. Each case has human-annotated ground truth: which persons, projects, locations, and topics should be extracted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vision extraction&lt;/strong&gt; — 10 photos ranging from restaurant dinners to construction sites to landscape shots. Each photo goes through two stages: the model describes the image, then a second pass extracts entities from that description.&lt;/p&gt;

&lt;p&gt;The extraction prompt is deliberately simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Extract named entities from the following text. Return ONLY a JSON object:
- persons: array of person names mentioned
- projects: array of project/product names mentioned
- locations: array of place names mentioned
- topics: array of key topics/themes (max 3)

Rules:
- Only extract what is EXPLICITLY mentioned
- Do not invent or infer entities not present
- Normalize names (capitalize properly)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Matching uses embedding similarity (qwen3-embedding, 1024d) with a 0.75 threshold instead of exact string matching. "Parte Vieja" matches "Parte Vieja" obviously, but "edge caching" also matches "edge caching approach" because the embeddings are close enough.&lt;/p&gt;
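
&lt;p&gt;The scoring loop behind the numbers below can be sketched as a greedy match against ground truth with a pluggable similarity function. The real matcher uses embedding cosine similarity; &lt;code&gt;sim&lt;/code&gt; is injected here so the sketch stays self-contained:&lt;/p&gt;

```typescript
// Greedy one-to-one matching of predicted entities to ground truth, then
// F1 from the match count. `sim` stands in for the embedding comparison.
function f1Score(
  predicted: string[],
  truth: string[],
  sim: (a: string, b: string) => number,
  threshold = 0.75
): number {
  const used: { [t: string]: boolean } = {};
  let matches = 0;
  for (const p of predicted) {
    for (const t of truth) {
      if (!used[t]) {
        if (sim(p, t) >= threshold) {
          used[t] = true; // each ground-truth entity matches at most once
          matches++;
          break;
        }
      }
    }
  }
  const precision = predicted.length === 0 ? 0 : matches / predicted.length;
  const recall = truth.length === 0 ? 0 : matches / truth.length;
  const denom = precision + recall;
  return denom === 0 ? 0 : (2 * precision * recall) / denom;
}
```

Note how an "extra" prediction only lowers precision; this is the mechanism behind the thorough-model-versus-terse-annotator pattern discussed later.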

&lt;h2&gt;
  
  
  Text extraction: the numbers
&lt;/h2&gt;

&lt;p&gt;Overall F1: &lt;strong&gt;0.645&lt;/strong&gt;. Zero parse errors across all 15 cases — the model always returned valid JSON. Average latency: 2-4 seconds per case on CPU.&lt;/p&gt;

&lt;p&gt;But the overall F1 hides a story. Let's break it down by entity type:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Entity type&lt;/th&gt;
&lt;th&gt;Avg F1&lt;/th&gt;
&lt;th&gt;What happened&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Persons&lt;/td&gt;
&lt;td&gt;~0.87&lt;/td&gt;
&lt;td&gt;Near-perfect. The model's strongest category by far&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Locations&lt;/td&gt;
&lt;td&gt;~0.72&lt;/td&gt;
&lt;td&gt;Handles Spanish geography beautifully&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Projects&lt;/td&gt;
&lt;td&gt;~0.65&lt;/td&gt;
&lt;td&gt;Good when names are explicit, invents sometimes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Topics&lt;/td&gt;
&lt;td&gt;~0.30&lt;/td&gt;
&lt;td&gt;Weakest — but also the most subjective category&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Persons: the killer feature
&lt;/h3&gt;

&lt;p&gt;The model nails names. Full names, first names, Spanish names with accents — it gets them right:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Marta Ibáñez", "Javier Losada", "Rubén" — all extracted from a construction note. ✓&lt;/li&gt;
&lt;li&gt;"Carmen Pueyo", "Víctor García", "Diego Martínez" — from an email thread. ✓&lt;/li&gt;
&lt;li&gt;"Tom Preston-Werner" — from an English conference note. ✓&lt;/li&gt;
&lt;li&gt;"José Miguel Aguirre" — from a text full of nicknames. ✓&lt;/li&gt;
&lt;li&gt;"Roberto Casas", "Víctor", "Lucía", "Sandra" — four people from a sprint review email. All four. ✓&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where it stumbles: a diary entry mentioning "Papá" and "Mamá" — the model extracted them as persons. Technically correct (they are persons), but the human ground truth didn't include them because they're not named individuals. This is a recurring pattern: &lt;strong&gt;the model extracts more than the human annotated&lt;/strong&gt;, which hurts precision without being wrong.&lt;/p&gt;

&lt;p&gt;The other pattern: the model extracted "Javier" as a separate person from "Javier Losada". Both in the same note. That's an entity resolution problem, not an extraction problem — and the knowledge graph handles it downstream with merge candidates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Locations: surprisingly good at Spanish geography
&lt;/h3&gt;

&lt;p&gt;"Valdespartera", "Villanueva de Gállego", "La Ternasca", "Parte Vieja", "Urgull", "Benasque", "Añisclo" — these aren't exactly world-famous cities. They're neighborhoods, hiking valleys, and small towns in Aragon and the Basque Country. The model got them all.&lt;/p&gt;

&lt;p&gt;It also correctly classified "San Sebastián" as a location (not a person), "Ordesa" as a location (not a project), and "calle San Miguel" as a variant of "San Miguel." The embedding similarity matching helped here — "calle San Miguel" and "San Miguel" have near-perfect similarity.&lt;/p&gt;

&lt;p&gt;One amusing misclassification: "eu-west-1" (an AWS region) was extracted as a location. I mean... it is a location. Just not the kind we meant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Projects: good when explicit, creative when not
&lt;/h3&gt;

&lt;p&gt;When the text says "Micelclaw OS" or "MACP Protocol" or "OpenClaw Gateway", the model finds them with 100% accuracy. Named projects are easy.&lt;/p&gt;

&lt;p&gt;The problem is when the model decides something is a project that isn't. "Pilotaje" (a construction technique) got classified as a project. "Txuletón" (a steak cut) became a project. "Barna" (slang for Barcelona) appeared as a project. The model is trying to be helpful — if it can't figure out which category something fits, it hedges by putting it in projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  Topics: where F1 lies
&lt;/h3&gt;

&lt;p&gt;Topics scored ~0.30 F1. That sounds terrible. But look at what actually happened:&lt;/p&gt;

&lt;p&gt;A diary entry about a trip to San Sebastián. Human ground truth: &lt;code&gt;["viaje", "desconexión"]&lt;/code&gt; (trip, disconnecting). Model output: &lt;code&gt;["pintxos", "txuletón", "playa"]&lt;/code&gt; (pintxos, steak, beach).&lt;/p&gt;

&lt;p&gt;Both are correct summaries of the same diary entry. The human abstracted ("it was a trip about disconnecting"), the model got specific ("there were pintxos and beach"). The embedding similarity between "viaje" and "txuletón" is 0.67 — below the 0.75 threshold — so it counts as a miss.&lt;/p&gt;

&lt;p&gt;This pattern repeats across almost every case. The human writes abstract topics; the model extracts concrete ones. For a knowledge graph, the model's approach is arguably better — "pintxos" is more searchable than "desconexión."&lt;/p&gt;

&lt;h3&gt;
  
  
  Bilingual without trying
&lt;/h3&gt;

&lt;p&gt;We mixed Spanish and English cases without telling the model which language to expect. It handled both without issues. "Tom Preston-Werner" from an English note and "José Miguel Aguirre" from a Spanish one were extracted with the same accuracy. The extraction prompt is in English; the input text is in whatever language the user writes in. The model doesn't care.&lt;/p&gt;

&lt;h3&gt;
  
  
  The nickname challenge
&lt;/h3&gt;

&lt;p&gt;The hardest test case was a Spanish note full of nicknames: "Pepe", "Tere", "Boli", "Txe", plus the full name "José Miguel Aguirre."&lt;/p&gt;

&lt;p&gt;The model extracted "Pepe" and "José Miguel Aguirre" as separate persons — it didn't connect the nickname to the full name. It found "Tere" and "Txe" but missed "Boli." Three out of four nicknames is honestly better than expected for a 2B model.&lt;/p&gt;

&lt;p&gt;Resolving "Pepe" = "José Miguel Aguirre" is entity resolution, not extraction. That's handled by the knowledge graph's merge candidate system — when two person nodes co-occur frequently, the system flags them for manual or automated merging.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vision extraction: description first, entities second
&lt;/h2&gt;

&lt;p&gt;The photo pipeline works in two stages: the model describes the image, then the same extraction prompt runs on that description. This means the quality of entity extraction depends entirely on the quality of the description.&lt;/p&gt;
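
&lt;p&gt;The two-stage flow is simple to express once the model calls are injected. &lt;code&gt;describeImage&lt;/code&gt; and &lt;code&gt;extractEntities&lt;/code&gt; below are stand-ins for the real Ollama calls, not an actual client API:&lt;/p&gt;

```typescript
// Two-stage vision pipeline: description first, entity extraction second.
// The model calls are injected so the sketch stays self-contained.
function photoToEntities(
  imagePath: string,
  describeImage: (img: string) => string,
  extractEntities: (text: string) => object
) {
  // Stage 1: the vision model turns the image into a free-text description.
  const description = describeImage(imagePath);
  // Stage 2: the same extraction prompt runs on that description, which is
  // why extraction quality is capped by description quality.
  return { description: description, entities: extractEntities(description) };
}
```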

&lt;p&gt;Overall vision F1: &lt;strong&gt;0.532&lt;/strong&gt;. But the descriptions themselves are far better than the F1 suggests.&lt;/p&gt;

&lt;h3&gt;
  
  
  The descriptions are impressive
&lt;/h3&gt;

&lt;p&gt;A photo of an olive grove landscape:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"This image captures a vast, sunlit landscape of rolling hills and valleys, likely in a rural or agricultural region. The scene is dominated by rows of olive trees planted in a dense, geometric pattern across the slopes."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A construction site photo:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"A red and silver laser level is set up on a tripod, indicating precise work is being done. The site is surrounded by dirt, sand, and a few trees."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model identified olive trees, a laser level on a tripod, and even recognized a 3D structural engineering model from a screenshot. For 2B parameters quantized to 4-bit, running on CPU, this is remarkable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where vision extraction breaks down
&lt;/h3&gt;

&lt;p&gt;The main issue: when photos contain people, the model says "four people sitting at a table" or "three people walking on a boardwalk." It counts them, describes what they're doing, but can't identify them. This is expected — face recognition requires a separate pipeline (we use InsightFace for that).&lt;/p&gt;

&lt;p&gt;The problem for the benchmark is that "four people" gets extracted as a person entity, which counts as a false positive against a ground truth that says "no specific persons." This systematically tanks the persons F1 for vision.&lt;/p&gt;

&lt;h3&gt;
  
  
  The ground truth problem
&lt;/h3&gt;

&lt;p&gt;Here's what we learned about benchmarking entity extraction: &lt;strong&gt;the human is the bottleneck, not the model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For photo 7 (construction site), the human annotated objects as: &lt;code&gt;["tripod", "briefcase", "net", "brick", "concrete"]&lt;/code&gt;. The model found: &lt;code&gt;["red brick wall", "large concrete block", "red and silver laser level", "tripod", "dirt", "sand", "trees", "house", "sunlight", "clear sky"]&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The model extracted 10 objects where the human listed 5. The model's list is more complete and more accurate — "red and silver laser level" is a better description than what the human wrote. But the F1 score penalizes the model for being thorough, because every "extra" extraction hurts precision.&lt;/p&gt;

&lt;p&gt;This is a fundamental issue with evaluating extraction against human annotations. The human annotates what they think is important. The model extracts what is present. For a knowledge graph that needs to be comprehensive, the model's approach is correct — you want to capture everything and let the search ranking decide what's relevant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency: the real constraint
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Average&lt;/th&gt;
&lt;th&gt;Range&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Text extraction&lt;/td&gt;
&lt;td&gt;~3s&lt;/td&gt;
&lt;td&gt;1.5–4.2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision description&lt;/td&gt;
&lt;td&gt;~2.4s&lt;/td&gt;
&lt;td&gt;1.4–6.2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision extraction&lt;/td&gt;
&lt;td&gt;~1.8s&lt;/td&gt;
&lt;td&gt;1.1–4.4s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All on CPU, all sequential through a single Ollama instance with a priority semaphore. These numbers are for the async pipeline — the user never waits for them. A note gets created in ~50ms; the entity extraction happens 2-4 seconds later in the background.&lt;/p&gt;

&lt;p&gt;The first request after a cold start took 66 seconds (model loading into RAM). After that, Ollama keeps the model loaded and subsequent requests are fast. This is why we keep a single model in memory — loading and unloading models per task would destroy latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we'd change
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Lower the similarity threshold for topics.&lt;/strong&gt; The 0.75 threshold is too strict for abstract concepts. "Viaje" and "pintxos" are obviously related in context, but their embeddings are only 0.67 similar. For persons and locations, 0.75 is fine. For topics, 0.60 might be more appropriate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Post-process the "N people" pattern.&lt;/strong&gt; When the vision model says "four women" or "three people," the extraction prompt shouldn't classify that as a person entity. A simple regex filter on the extraction output would fix the most common false positive.&lt;/p&gt;
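
&lt;p&gt;A sketch of that post-filter (the pattern list is illustrative and would grow with real data):&lt;/p&gt;

```typescript
// Drop "person" entities that are really count phrases like "four people".
// The regex is a hypothetical starting point, not the production filter.
const COUNT_PHRASE = /^(a few |several |\d+ |one |two |three |four |five |six |seven |eight |nine |ten )?(people|person|persons|men|women|man|woman|children|kids)$/i;

function filterPersons(persons: string[]): string[] {
  return persons.filter((p) => !COUNT_PHRASE.test(p.trim()));
}
```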

&lt;p&gt;&lt;strong&gt;Embrace the verbosity.&lt;/strong&gt; The model extracts more than a human would annotate. Instead of fighting this, design the knowledge graph to handle it — use confidence scores and the heat system to surface what matters and let the rest decay naturally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;A 2B parameter model, quantized to 4-bit, running on CPU:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persons:&lt;/strong&gt; F1 0.87 — production-ready for a personal knowledge graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Locations:&lt;/strong&gt; F1 0.72 — solid, handles non-English geography&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Projects:&lt;/strong&gt; F1 0.65 — good enough with downstream deduplication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topics:&lt;/strong&gt; F1 0.30 — misleading number, the model is actually more thorough than the human&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parse reliability:&lt;/strong&gt; 0 errors in 25 cases — always returns valid JSON&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; 2-4 seconds async, invisible to the user&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Is it as good as GPT-4? No. Is it good enough to build a personal knowledge graph that automatically discovers connections between your notes, emails, and calendar? Yes. And it runs on your hardware, processes your data locally, and costs zero per extraction.&lt;/p&gt;

&lt;p&gt;For a personal system processing maybe 50-100 new records per day, a 2B model with 3-second extraction time and ~0.87 F1 on the entities that matter most (people and places) is more than enough. The knowledge graph doesn't need to be perfect — it needs to be useful. And "Ana García appears in 3 emails and tomorrow's meeting" is useful even if the system also extracted "txuletón" as a project.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: how we built a knowledge graph using just PostgreSQL — no Neo4j, no Apache AGE, just recursive CTEs and an entity_links table.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>nlp</category>
      <category>ollama</category>
      <category>selfhosted</category>
    </item>
    <item>
      <title>The 4-slot hook pipeline: how every CRUD operation feeds four systems at once</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Sat, 14 Mar 2026 13:06:25 +0000</pubDate>
      <link>https://forem.com/micelclaw/2o-the-4-slot-hook-pipeline-how-every-crud-operation-feeds-four-systems-at-once-31n8</link>
      <guid>https://forem.com/micelclaw/2o-the-4-slot-hook-pipeline-how-every-crud-operation-feeds-four-systems-at-once-31n8</guid>
      <description>&lt;p&gt;Here's a problem that sneaks up on you when you're building a data-heavy application: every time you create or update a record, a bunch of other things need to happen. The record needs to be embedded for semantic search. Its heat score needs to be updated. Entities (people, places, projects) need to be extracted from the text. And the change needs to be logged so the digest engine knows what happened.&lt;/p&gt;

&lt;p&gt;The naive approach is to do all of that inline — right there in the route handler, after the INSERT. We tried that. It was a mistake.&lt;/p&gt;

&lt;p&gt;This post is about the pipeline we built to replace it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with inline processing
&lt;/h2&gt;

&lt;p&gt;Our first version of the notes endpoint looked something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The "just do it all here" approach&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/notes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;note&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;notes&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;returning&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Now embed it...&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;note&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;note&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// 50-200ms&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;recordId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;note&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Log the change...&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;changeLog&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;recordId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;note&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;create&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;note&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two problems. First, if Ollama is down or slow, the user waits. Or worse, the request fails — even though the note was already saved. Second, this same embedding + changelog logic needs to exist in every single route handler. Notes, events, contacts, emails, files, diary entries. That's a lot of duplicated code that's going to drift.&lt;/p&gt;

&lt;p&gt;And we hadn't even added heat tracking or entity extraction yet. Those would be two more blocks of code copy-pasted across every route.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pipeline
&lt;/h2&gt;

&lt;p&gt;We replaced all of that with a single function call: &lt;code&gt;runPostHooks(ctx)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Every domain route — notes, events, contacts, emails, files, diary entries — calls it after the database operation succeeds. It looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;handler    → INSERT/UPDATE/DELETE in domain table
              ↓
postHandler → runPostHooks(ctx)
              ↓
           ┌─────────────────────────────────────────┐
           │  Slot 1: Embedding (enqueue async job)  │
           │  Slot 2: Heat tracking (upsert, &amp;lt;1ms)   │
           │  Slot 3: Entity extraction (enqueue)    │
           │  Slot 4: Changelog (fire-and-forget)    │
           └─────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each slot runs inside its own try/catch. If slot 1 fails (Ollama is down), slot 2 still runs. If slot 3 throws (extraction model crashed), slot 4 still logs the change. The user's response is never affected — the INSERT already happened, the reply already went out.&lt;/p&gt;

&lt;p&gt;Here's the context object that every slot receives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;CrudHookContext&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;      &lt;span class="c1"&gt;// 'notes', 'events', 'contacts', ...&lt;/span&gt;
  &lt;span class="nl"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;insert&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;update&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;delete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;recordId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;HeatSource&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// 'user_dash', 'sync', 'agent_primary', ...&lt;/span&gt;
  &lt;span class="nl"&gt;record&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Every slot knows what happened (action), to what (domain + recordId), by whom (userId), and through which channel (source). The &lt;code&gt;record&lt;/code&gt; field carries the full row for slots that need to extract text from it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What each slot does
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Slot 1: Embedding
&lt;/h3&gt;

&lt;p&gt;Extracts text from the record and enqueues an async job. The enqueue itself takes ~0.1ms — the actual embedding generation happens later in the background via the AsyncQueue.&lt;/p&gt;

&lt;p&gt;The text extraction is domain-specific:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;What gets embedded&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;notes&lt;/td&gt;
&lt;td&gt;title + content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;events&lt;/td&gt;
&lt;td&gt;title + description + location&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;contacts&lt;/td&gt;
&lt;td&gt;display_name + company + job_title + notes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;emails&lt;/td&gt;
&lt;td&gt;subject + body_plain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;files&lt;/td&gt;
&lt;td&gt;filename + extracted content (if available)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;diary&lt;/td&gt;
&lt;td&gt;date + content&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
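&lt;p&gt;The table above can be sketched as one domain switch (a hypothetical helper; the function name and field access are assumptions, not the real code):&lt;/p&gt;

```typescript
// Illustrative per-domain text extraction for embedding, following the table.
function extractEmbeddingText(domain: string, record: Record<string, any>): string {
  const parts: Array<string | undefined> = (() => {
    switch (domain) {
      case "notes":    return [record.title, record.content];
      case "events":   return [record.title, record.description, record.location];
      case "contacts": return [record.display_name, record.company, record.job_title, record.notes];
      case "emails":   return [record.subject, record.body_plain];
      case "files":    return [record.filename, record.extracted_content]; // if available
      case "diary":    return [record.entry_date, record.content];
      default:         return [];
    }
  })();
  // Drop missing fields, join the rest into one embeddable string.
  return parts.filter(Boolean).join("\n").trim();
}
```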

&lt;p&gt;The AsyncQueue processes jobs sequentially through an Ollama client with a semaphore (concurrency=1). Embeddings get high priority. If Ollama is unreachable, the job is retried 3 times with exponential backoff, then dropped with a log entry. The record is still fully usable — it just won't appear in semantic search until the next successful embed.&lt;/p&gt;
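&lt;p&gt;The retry policy is the classic three-attempts-with-exponential-backoff shape. A sketch, with assumed delays (the post doesn't state the actual intervals):&lt;/p&gt;

```typescript
// Sketch: retry up to 3 times with exponential backoff, then drop with a log.
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function withRetry<T>(
  job: () => Promise<T>,
  attempts = 3,
  baseMs = 500, // assumed base delay
): Promise<T | undefined> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await job();
    } catch (err) {
      if (i === attempts - 1) {
        // Dropped, not fatal: the record stays usable, it just misses
        // semantic search until the next successful embed.
        console.warn("embedding job dropped after retries", err);
        return undefined;
      }
      await sleep(baseMs * 2 ** i); // 500ms, 1s, 2s, ...
    }
  }
}
```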

&lt;h3&gt;
  
  
  Slot 2: Heat tracking
&lt;/h3&gt;

&lt;p&gt;A single upsert to the &lt;code&gt;record_heat&lt;/code&gt; table: increment &lt;code&gt;access_count&lt;/code&gt;, update &lt;code&gt;last_accessed&lt;/code&gt;, recalculate the heat score. Under 1ms. This is the cheapest slot by far, but it powers the entire memory tier system — hot, warm, and cold records that influence search ranking and the digest engine.&lt;/p&gt;
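&lt;p&gt;The upsert itself can be a single statement, which is why it stays under a millisecond. A sketch (table and column names follow the post; the exact SQL and the score recalculation are assumptions):&lt;/p&gt;

```typescript
// Hypothetical shape of the heat upsert: one round trip, no read-modify-write.
const HEAT_UPSERT = `
  INSERT INTO record_heat (domain, record_id, access_count, last_accessed)
  VALUES ($1, $2, 1, now())
  ON CONFLICT (domain, record_id) DO UPDATE
     SET access_count  = record_heat.access_count + 1,
         last_accessed = now()
`;

// Inside the slot this is one fire-and-forget query, e.g.:
// await db.query(HEAT_UPSERT, [ctx.domain, ctx.recordId]);
```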

&lt;p&gt;One detail: the &lt;code&gt;source&lt;/code&gt; matters. User-driven creates, updates, and individual reads bump heat at full strength, while sync imports use &lt;code&gt;source: 'sync'&lt;/code&gt;, which the heat system treats differently (lower initial heat, since the user didn't actively create the record).&lt;/p&gt;

&lt;h3&gt;
  
  
  Slot 3: Entity extraction
&lt;/h3&gt;

&lt;p&gt;Enqueues an async job (same AsyncQueue as embeddings, but lower priority). The extraction worker sends the record's text to an LLM and gets back a structured list of entities — people, projects, locations, topics — that become nodes in the knowledge graph.&lt;/p&gt;

&lt;p&gt;Each domain has different extraction behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Notes, emails, diary entries&lt;/strong&gt;: Full LLM extraction from text content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contacts&lt;/strong&gt;: No LLM needed — the contact's structured data (name, email, company) directly becomes a Person node in the graph&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Events&lt;/strong&gt;: Attendees are resolved directly against existing graph entities by email/name; the rest goes through LLM&lt;/li&gt;
&lt;/ul&gt;
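&lt;p&gt;That dispatch might look like this (a sketch: the &lt;code&gt;GraphEntity&lt;/code&gt; shape, field names, and the attendee handling are simplified assumptions):&lt;/p&gt;

```typescript
// Illustrative per-domain entity extraction dispatch.
interface GraphEntity {
  type: "person" | "project" | "location" | "topic";
  name: string;
  confidence: number;
}

async function extractEntities(
  domain: string,
  record: Record<string, any>,
  llmExtract: (text: string) => Promise<GraphEntity[]>,
): Promise<GraphEntity[]> {
  if (domain === "contacts") {
    // No LLM: the structured contact row becomes a Person node directly.
    return [{ type: "person", name: record.display_name, confidence: 1 }];
  }
  if (domain === "events") {
    // Attendees resolve directly by email/name; free text still goes to the LLM.
    const attendees: GraphEntity[] = (record.attendees ?? []).map(
      (name: string): GraphEntity => ({ type: "person", name, confidence: 1 }),
    );
    const text = [record.title, record.description].filter(Boolean).join("\n");
    return [...attendees, ...(await llmExtract(text))];
  }
  // Notes, emails, diary: full LLM extraction from text content.
  const text = [record.title, record.subject, record.content, record.body_plain]
    .filter(Boolean).join("\n");
  return llmExtract(text);
}
```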

&lt;p&gt;Entity extraction runs with lower priority than embeddings because it's more expensive (1-10 seconds per record vs ~50ms for an embedding) and less time-sensitive. The graph can be a few seconds behind without anyone noticing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Slot 4: Changelog
&lt;/h3&gt;

&lt;p&gt;A simple INSERT into the &lt;code&gt;change_log&lt;/code&gt; table. Domain, record ID, action, user ID, timestamp, and a human-readable summary generated by &lt;code&gt;extractSummary()&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Summary format&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;notes&lt;/td&gt;
&lt;td&gt;Note title&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;events&lt;/td&gt;
&lt;td&gt;Event title + formatted start date&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;contacts&lt;/td&gt;
&lt;td&gt;Display name&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;emails&lt;/td&gt;
&lt;td&gt;Subject line&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;files&lt;/td&gt;
&lt;td&gt;Filename&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;diary&lt;/td&gt;
&lt;td&gt;Date + content preview (first 100 chars)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
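&lt;p&gt;A plausible shape for &lt;code&gt;extractSummary()&lt;/code&gt;, following the table (field names like &lt;code&gt;start_time&lt;/code&gt; and &lt;code&gt;entry_date&lt;/code&gt; are assumptions):&lt;/p&gt;

```typescript
// Sketch of the human-readable summary per domain.
function extractSummary(domain: string, record: Record<string, any>): string {
  switch (domain) {
    case "notes":
      return record.title ?? "(untitled note)";
    case "events":
      return `${record.title} (${new Date(record.start_time).toDateString()})`;
    case "contacts":
      return record.display_name;
    case "emails":
      return record.subject;
    case "files":
      return record.filename;
    case "diary":
      // Date plus a preview of the first 100 characters.
      return `${record.entry_date}: ${String(record.content).slice(0, 100)}`;
    default:
      return String(record.id ?? "");
  }
}
```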

&lt;p&gt;The Digest Engine reads this table periodically and compiles a summary of what changed: "3 new emails, 1 event moved, 2 notes created." It's the backbone of the proactive notification system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The design that anticipated growth
&lt;/h2&gt;

&lt;p&gt;Here's the thing we got right, almost by accident: the slots were designed to be filled in later.&lt;/p&gt;

&lt;p&gt;When we first built the pipeline in Phase 4, only two slots were active:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Slot&lt;/th&gt;
&lt;th&gt;Phase 4&lt;/th&gt;
&lt;th&gt;After Cluster B&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;✅ Embedding&lt;/td&gt;
&lt;td&gt;✅ Embedding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;🔒 Reserved (no-op)&lt;/td&gt;
&lt;td&gt;✅ Heat tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;🔒 Reserved (no-op)&lt;/td&gt;
&lt;td&gt;✅ Entity extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;✅ Changelog&lt;/td&gt;
&lt;td&gt;✅ Changelog&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Slots 2 and 3 were literally empty functions — registered in the pipeline to document the execution order and reserve their position. When Cluster B (the data foundation for the knowledge graph and heat scoring) was implemented weeks later, those slots got filled in without touching a single line of the existing pipeline code or any route handler. No refactoring. No merge conflicts.&lt;/p&gt;

&lt;p&gt;This worked because the pipeline was designed around the &lt;code&gt;CrudHookContext&lt;/code&gt; interface. Every slot receives the same context. Adding a new slot means writing a function that takes &lt;code&gt;CrudHookContext&lt;/code&gt; and does something with it. That's the entire contract.&lt;/p&gt;
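&lt;p&gt;Reserving a slot can be as small as registering a no-op (all names below are illustrative, not the real registration code):&lt;/p&gt;

```typescript
// Sketch: reserved slots as no-ops that document execution order.
interface Ctx { domain: string; action: string; recordId: string }
type Hook = (ctx: Ctx) => void | Promise<void>;

const noop: Hook = () => {}; // reserves a position, does nothing

const enqueueEmbedding: Hook = (_ctx) => { /* enqueue async embedding job */ };
const writeChangelog: Hook = (_ctx) => { /* INSERT into change_log */ };

// Phase 4: slots 2 and 3 registered but empty.
const postHooks: Array<{ name: string; hook: Hook }> = [
  { name: "embedding",         hook: enqueueEmbedding },
  { name: "heat-tracking",     hook: noop }, // filled in later by Cluster B
  { name: "entity-extraction", hook: noop }, // filled in later by Cluster B
  { name: "changelog",         hook: writeChangelog },
];

// Filling a slot later is one assignment; no route handler changes, e.g.:
// postHooks[1].hook = trackHeat;
```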

&lt;h2&gt;
  
  
  What happens per HTTP verb
&lt;/h2&gt;

&lt;p&gt;Not every verb triggers every slot. Deleting a record doesn't need a new embedding. Listing records doesn't need heat tracking (only individual GETs do).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST (create):  → Slot 1 (embed) → Slot 2 (heat: count=1) → Slot 3 (extract) → Slot 4 (log)
GET /:id (read): → Slot 2 (heat: count++) only
PATCH (update): → Slot 1 (re-embed) → Slot 2 (heat: count++) → Slot 3 (re-extract) → Slot 4 (log)
DELETE (soft):  → Slot 4 (log) only
GET / (list):   → nothing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deletes don't re-embed or re-extract — the record is conceptually gone (soft-deleted). The heat cron will naturally cool it down. Listing doesn't trigger hooks at all — only individual record access bumps heat.&lt;/p&gt;
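&lt;p&gt;The same mapping as a lookup table (slot numbers as in the diagram above; a sketch, not the real dispatch code):&lt;/p&gt;

```typescript
// Which pipeline slots run for each kind of operation.
type Action = "create" | "read" | "update" | "delete" | "list";

const slotsFor: Record<Action, number[]> = {
  create: [1, 2, 3, 4], // embed, heat (count=1), extract, log
  read:   [2],          // individual GET bumps heat only
  update: [1, 2, 3, 4], // re-embed, heat++, re-extract, log
  delete: [4],          // soft delete: just log it
  list:   [],           // listing triggers nothing
};
```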

&lt;h2&gt;
  
  
  The Ollama bottleneck
&lt;/h2&gt;

&lt;p&gt;All AI-powered slots (embedding and extraction) funnel through a single Ollama instance with a priority queue and a semaphore (concurrency=1). This sounds like a bottleneck, and it is — by design.&lt;/p&gt;

&lt;p&gt;Ollama running a 0.6B embedding model (&lt;code&gt;qwen3-embedding:0.6b&lt;/code&gt;, 1024 dimensions) on a mini-PC can handle one job at a time reliably. Trying to parallelize would thrash the CPU and make everything slower. Sequential processing with priority ordering (embeddings first, extraction second) gives predictable latency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embedding: ~50ms per record&lt;/li&gt;
&lt;li&gt;Entity extraction: 1-10 seconds per record (done by the same vision model that describes photos — &lt;code&gt;qwen3-vl:2b&lt;/code&gt; — because running a single multimodal model saves RAM versus having separate text-only and vision models)&lt;/li&gt;
&lt;/ul&gt;
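&lt;p&gt;A single-flight priority queue of this shape fits in a few lines. A sketch (lower number means higher priority; retry and backoff elided):&lt;/p&gt;

```typescript
// Illustrative single-flight priority queue: concurrency = 1 by construction.
interface Job { priority: number; run: () => Promise<void> }

class SerialPriorityQueue {
  private jobs: Job[] = [];
  private running = false;

  enqueue(job: Job): void {
    this.jobs.push(job);
    this.jobs.sort((a, b) => a.priority - b.priority); // embeddings before extraction
    void this.drain();
  }

  private async drain(): Promise<void> {
    if (this.running) return; // the "semaphore": only one job in flight
    this.running = true;
    while (this.jobs.length) {
      const job = this.jobs.shift()!;
      try { await job.run(); } catch { /* retry/backoff elided */ }
    }
    this.running = false;
  }
}
```

&lt;p&gt;A running job is never preempted, but everything waiting behind it is reordered by priority, which is exactly the "embeddings first, extraction second" behavior described above.&lt;/p&gt;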

&lt;p&gt;When the user creates a note, the embedding is ready in under a second. The entity extraction might take a few more seconds, but the knowledge graph being slightly behind is invisible to the user.&lt;/p&gt;

&lt;p&gt;If Ollama is completely down, everything still works. The CRUD succeeds. The changelog is written. The heat is tracked. Only semantic search and the knowledge graph are degraded, and they'll catch up when Ollama comes back online thanks to the reindex endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not database triggers?
&lt;/h2&gt;

&lt;p&gt;PostgreSQL triggers could do some of this — especially the changelog. We considered it and decided against it for three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Triggers can't call external services.&lt;/strong&gt; Embedding requires Ollama. Entity extraction requires an LLM. Triggers are stuck inside the database.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Error handling is all-or-nothing.&lt;/strong&gt; A failing trigger rolls back the entire transaction. Our pipeline explicitly allows individual slot failures without affecting the core operation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Visibility.&lt;/strong&gt; Application-level hooks are easy to debug, log, and monitor. Trigger debugging is... less fun.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The changelog could legitimately be a trigger. But keeping all four slots in the same application-level pipeline means they're all visible in one file, they share the same error handling pattern, and they can be toggled or reordered without touching the database.&lt;/p&gt;

&lt;h2&gt;
  
  
  The payoff
&lt;/h2&gt;

&lt;p&gt;The 4-slot pipeline is maybe 200 lines of code. It replaced thousands of lines of duplicated inline processing across every route handler. Every new domain we add — kanban cards, RSS feed articles, bookmarks — gets embeddings, heat tracking, entity extraction, and changelog for free by adding one &lt;code&gt;runPostHooks(ctx)&lt;/code&gt; call.&lt;/p&gt;

&lt;p&gt;More importantly, it created clean extension points. When we needed PII-aware routing (Cluster E), it was a middleware in the preHandler — not a new slot. When we needed sleep-time intelligence (Cluster D), it consumed the changelog and heat data that the pipeline was already producing. The pipeline doesn't just feed four systems — it feeds the systems that feed the systems.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: how a 0.6B parameter model turned out to be better than we expected for entity extraction — and why bigger isn't always faster when skill context is the bottleneck.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>postgres</category>
      <category>typescript</category>
      <category>ai</category>
    </item>
    <item>
      <title>137 migrations and counting: building a personal OS schema from scratch</title>
      <dc:creator>Victor García</dc:creator>
      <pubDate>Fri, 13 Mar 2026 08:14:09 +0000</pubDate>
      <link>https://forem.com/micelclaw/137-migrations-and-counting-building-a-personal-os-schema-from-scratch-31lf</link>
      <guid>https://forem.com/micelclaw/137-migrations-and-counting-building-a-personal-os-schema-from-scratch-31lf</guid>
      <description>&lt;p&gt;I started this project on February 18th, 2026. The idea was embarrassingly simple: I wanted a personal assistant that could look up my contacts fast and without burning through tokens. Notes, emails, a diary — the basics. A small Fastify server, a PostgreSQL database, maybe ten tables.&lt;/p&gt;

&lt;p&gt;Somewhere along the way, I ended up with 137 SQL migrations.&lt;/p&gt;

&lt;p&gt;This is the story of how a "quick personal cloud" turned into what we now call a sovereign digital operating system — and what the schema evolution looks like when you're building the plane while flying it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Day 1: the schema that fit on a napkin
&lt;/h2&gt;

&lt;p&gt;Here's the first migration, &lt;code&gt;0000_initial_schema.sql&lt;/code&gt;. Thirteen tables. I remember thinking "this covers everything I'll ever need":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;"pgcrypto"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="nv"&gt;"vector"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;notes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;              &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt;           &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;         &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content_format&lt;/span&gt;  &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'markdown'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;source&lt;/span&gt;          &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'local'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tags&lt;/span&gt;            &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="n"&gt;pinned&lt;/span&gt;          &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;archived&lt;/span&gt;        &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;      &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;updated_at&lt;/span&gt;      &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;deleted_at&lt;/span&gt;      &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notes, events, contacts, emails, files, diary entries, a CRM table (ambition!), Home Assistant events, agent conversations, entity links, OAuth tokens, a license cache, and an embeddings table with pgvector.&lt;/p&gt;

&lt;p&gt;Look at that embeddings table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;              &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;domain&lt;/span&gt;          &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;record_id&lt;/span&gt;       &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content_text&lt;/span&gt;    &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;       &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_used&lt;/span&gt;      &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;      &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;UNIQUE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One table for all embeddings. A &lt;code&gt;domain&lt;/code&gt; column to tell notes apart from emails. Simple. Clean.&lt;/p&gt;

&lt;p&gt;That simplicity lasted about four days.&lt;/p&gt;

&lt;h2&gt;
  
  
  The decisions that shaped everything
&lt;/h2&gt;

&lt;p&gt;Three early decisions ended up defining the entire architecture. I didn't fully appreciate their impact at the time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Everything in &lt;code&gt;public&lt;/code&gt; schema
&lt;/h3&gt;

&lt;p&gt;We put all tables in PostgreSQL's default &lt;code&gt;public&lt;/code&gt; schema instead of creating separate schemas per domain (a &lt;code&gt;notes&lt;/code&gt; schema, an &lt;code&gt;emails&lt;/code&gt; schema, etc.). The reasoning was practical: every record in the system — notes, events, contacts, files — needs to go through the same intelligence pipeline. Embeddings. Entity extraction. Heat tracking. Knowledge graph edges. Change log entries.&lt;/p&gt;

&lt;p&gt;If each domain lived in its own schema, every cross-domain feature would need cross-schema joins and duplicated hooks. With everything in &lt;code&gt;public&lt;/code&gt;, a single set of CRUD hooks can process any record type through the same pipeline. A note and an email get embedded, heat-tracked, entity-extracted, and changelog'd by the exact same code path.&lt;/p&gt;

&lt;p&gt;This decision paid off enormously when we built the knowledge graph. An entity link between a contact and a calendar event is just a row in &lt;code&gt;entity_links&lt;/code&gt; — no cross-schema ceremony.&lt;/p&gt;

&lt;h3&gt;
  
  
  Soft delete everywhere
&lt;/h3&gt;

&lt;p&gt;Every single domain table has &lt;code&gt;deleted_at TIMESTAMPTZ&lt;/code&gt;. No exceptions. From day one.&lt;/p&gt;

&lt;p&gt;This wasn't wisdom — it was paranoia. But it turned out to be critical for the sync engine. When you sync with Gmail or Google Calendar, you need to know what was deleted locally so you can propagate that deletion. A hard delete means lost information. &lt;code&gt;deleted_at&lt;/code&gt; means you can diff between "this record was deleted" and "this record never existed."&lt;/p&gt;

&lt;p&gt;It also enabled restore functionality for free. &lt;code&gt;PATCH /notes/:id/restore&lt;/code&gt; is just &lt;code&gt;SET deleted_at = NULL&lt;/code&gt;.&lt;/p&gt;
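&lt;p&gt;The whole contract fits in three statements (sketched against the &lt;code&gt;notes&lt;/code&gt; table; only the &lt;code&gt;deleted_at&lt;/code&gt; column and the restore semantics come from the post, the SQL text is an assumption):&lt;/p&gt;

```typescript
// Illustrative soft-delete queries.
const SOFT_DELETE = `UPDATE notes SET deleted_at = now() WHERE id = $1 AND deleted_at IS NULL`;

// PATCH /notes/:id/restore is literally "un-set the tombstone":
const RESTORE = `UPDATE notes SET deleted_at = NULL WHERE id = $1`;

// Sync diff: records deleted since the last sync cursor, as opposed to
// records that never existed (which simply return no row at all).
const DELETED_SINCE = `SELECT id FROM notes WHERE deleted_at IS NOT NULL AND deleted_at > $1`;
```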

&lt;h3&gt;
  
  
  UUID primary keys on everything
&lt;/h3&gt;

&lt;p&gt;Not &lt;code&gt;SERIAL&lt;/code&gt;, not &lt;code&gt;BIGINT&lt;/code&gt;. UUIDs. Every table.&lt;/p&gt;

&lt;p&gt;The immediate benefit: entity_links can reference any table with just &lt;code&gt;(source_type, source_id)&lt;/code&gt;. No composite foreign keys, no lookup tables. The knowledge graph doesn't care if the source is a note or an email — it's always a UUID.&lt;/p&gt;

&lt;p&gt;The less obvious benefit: when we added multi-user support later, there were zero ID collision issues. Users get created on different instances, sync happens, and UUIDs just work.&lt;/p&gt;

&lt;h2&gt;
  
  
  The migration timeline: from 0 to 137
&lt;/h2&gt;

&lt;p&gt;Here's roughly how the schema evolved, grouped by what was happening:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migrations 0000–0010: The foundation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The initial thirteen tables. Then tsvector search columns for free-tier full-text search. Then the sync engine infrastructure — &lt;code&gt;change_log&lt;/code&gt;, &lt;code&gt;sync_connectors&lt;/code&gt;, &lt;code&gt;sync_history&lt;/code&gt;. Custom fields support. Email accounts. File snapshots and shared links. Photo albums and face clusters.&lt;/p&gt;

&lt;p&gt;At this point we had maybe 20 tables and the system could do CRUD, search, and sync. A functional personal cloud. I could have stopped here.&lt;/p&gt;

&lt;p&gt;I did not stop here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migrations 0050–0065: The intelligence layer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where things got wild. We added:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;change_log_affected_columns&lt;/code&gt; — tracking which specific fields changed, not just "this record was updated"&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;record_heat&lt;/code&gt; — heat scores that decay over time, turning every record into a memory that fades unless you interact with it&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;graph_entities&lt;/code&gt; — knowledge graph nodes (Person, Project, Location, Topic) extracted automatically by an LLM&lt;/li&gt;
&lt;li&gt;Extended &lt;code&gt;entity_links&lt;/code&gt; with link_type, confidence, strength, and &lt;code&gt;created_by&lt;/code&gt; columns&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;semantic_cache&lt;/code&gt; — caching API responses by query embedding similarity&lt;/li&gt;
&lt;li&gt;Reserved schemas for future domains (bookmarks, financial accounts, health data)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;record_heat&lt;/code&gt; table deserves its own blog post (coming soon), but here's the gist: every record has a heat score between 0 and 1. It decays exponentially over time. Every time you access a record, the heat bumps up. The search algorithm uses heat as a post-fusion multiplier — hot records rank higher than cold ones, all else being equal.&lt;/p&gt;
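&lt;p&gt;The decay itself is one line of math. A sketch with an assumed half-life (the post doesn't give the real constants):&lt;/p&gt;

```typescript
// Exponential heat decay, clamped to [0, 1]. The 72-hour half-life is an
// assumption for illustration, not the system's actual constant.
function heatScore(lastHeat: number, hoursSinceAccess: number, halfLifeHours = 72): number {
  const decayed = lastHeat * Math.pow(0.5, hoursSinceAccess / halfLifeHours);
  return Math.min(1, Math.max(0, decayed));
}
```

&lt;p&gt;Accessing a record bumps the stored heat back up; this function only describes how it fades in between.&lt;/p&gt;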

&lt;p&gt;The knowledge graph was another inflection point. Suddenly &lt;code&gt;entity_links&lt;/code&gt; wasn't just "this note mentions this contact" — it was "the system automatically discovered that Person:Ana García appeared in 3 emails, 2 notes, and tomorrow's calendar event, with confidence 0.85."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migrations 0080–0103: Proactive intelligence&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sleep-time jobs. Derived insights. User preferences. Proactive feedback. Agent tokens. Audit logs. Data sensitivity labels. PII routing logs. Pseudonym maps.&lt;/p&gt;

&lt;p&gt;This batch transformed the system from "a database you query" to "a database that thinks while you sleep." The sleep-time engine runs background LLM jobs during idle periods — discovering cross-domain correlations, extracting behavioral preferences, generating insights. All stored in tables that started as sketches on a whiteboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migrations 0110–0137: The OS layer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;App system tables. Email moderation. Clipboard items. Visual intelligence embeddings. Face detection infrastructure. Kanban boards with labels, comments, dependencies, and checklists. RSS feeds with categories and AI summarization.&lt;/p&gt;

&lt;p&gt;By migration 0137, the schema had grown from "a personal notes database" into something that manages your entire digital life — notes, emails, calendar, contacts, files, photos, diary, projects, feeds, and an AI that understands the connections between all of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;I'd plan the multi-user migration from day one.&lt;/strong&gt; Migration 0009 (&lt;code&gt;add_user_id_to_domain_tables&lt;/code&gt;) added &lt;code&gt;user_id&lt;/code&gt; to every single domain table. Every. Single. One. That's a lot of &lt;code&gt;ALTER TABLE&lt;/code&gt; statements, a lot of index recreations, and a lot of unique constraint changes (diary entries went from &lt;code&gt;UNIQUE(entry_date)&lt;/code&gt; to &lt;code&gt;UNIQUE(user_id, entry_date)&lt;/code&gt;). If I'd included &lt;code&gt;user_id&lt;/code&gt; in the initial schema, that migration wouldn't exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd version-track the embedding model from the start.&lt;/strong&gt; We added &lt;code&gt;model_version&lt;/code&gt; to the embeddings table later. But by then we'd already generated thousands of embeddings with one model. When we switched from &lt;code&gt;nomic-embed-text&lt;/code&gt; to &lt;code&gt;qwen3-embedding:0.6b&lt;/code&gt;, we had to re-embed everything. If the model version had been there from day one, the re-embedding could have been incremental.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd actually apply my migrations.&lt;/strong&gt; The single most common bug in development wasn't a code error — it was me forgetting to run a migration I'd already written. The schema in the SQL file said one thing; the database said another. We ended up with a simple &lt;code&gt;for f in core/drizzle/0*.sql; do psql $DATABASE_URL -f "$f"; done&lt;/code&gt; command, but I should have automated it from week one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;As of today:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL migration files&lt;/td&gt;
&lt;td&gt;137&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tables in the schema&lt;/td&gt;
&lt;td&gt;~50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Drizzle ORM schemas&lt;/td&gt;
&lt;td&gt;35+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Domains (notes, events, etc.)&lt;/td&gt;
&lt;td&gt;7 core + 5 extended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index definitions&lt;/td&gt;
&lt;td&gt;80+&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Some migrations took minutes to write. Others took days of back-and-forth to get right.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;If I could go back and tell February-18th me one thing, it would be: &lt;strong&gt;the schema is the product.&lt;/strong&gt; Not the API. Not the frontend. Not the AI. The schema.&lt;/p&gt;

&lt;p&gt;Every feature we built — heat scoring, knowledge graph, sleep-time intelligence, PII routing, multi-user isolation — started as a migration file. The tables defined the boundaries of what was possible. Getting the schema right (or at least right enough to iterate on) was the single highest-leverage activity in the entire project.&lt;/p&gt;

&lt;p&gt;137 migrations sounds like a lot. And it is. But each one was a small, deliberate step from "personal notes database" to something much bigger. And we're not done — migration 0138 is already drafted.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpyuiztuuegf7l10vgg24.webp" alt="notes module" width="800" height="450"&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is the first in a series of technical posts about building a self-hosted AI productivity OS. Next up: how a 4-slot hook pipeline makes every CRUD operation feed embeddings, heat tracking, entity extraction, and the changelog — without any of them blocking each other.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>architecture</category>
      <category>selfhosted</category>
    </item>
  </channel>
</rss>
