<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ryan Carter</title>
    <description>The latest articles on Forem by Ryan Carter (@sym).</description>
    <link>https://forem.com/sym</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F221965%2Faff2cd59-8dec-481b-a4d1-3431f61de6e5.jpg</url>
      <title>Forem: Ryan Carter</title>
      <link>https://forem.com/sym</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sym"/>
    <language>en</language>
    <item>
      <title>Turning Manual Ops Into a 10-Minute Task</title>
      <dc:creator>Ryan Carter</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:07:22 +0000</pubDate>
      <link>https://forem.com/sym/turning-manual-ops-into-a-10-minute-task-4eha</link>
      <guid>https://forem.com/sym/turning-manual-ops-into-a-10-minute-task-4eha</guid>
      <description>&lt;p&gt;I once turned a 2-week manual data update process into a 10-minute automated pipeline by writing a PHP script that ingested a vendor spreadsheet, normalized everything into a temporary MySQL database, and surfaced the result in a review dashboard before pushing to production. This post is the short version of that project — the tools I used, the approach, and the outcome — for any developer staring at a tedious manual ops process and wondering whether it's worth automating. (Spoiler: it almost always is.)&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before:&lt;/strong&gt; ~10 business days of careful manual data entry against a fragile legacy database, every six months.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After:&lt;/strong&gt; ~10-minute automated run, ~30-second push to prod, single dashboard for human review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stack:&lt;/strong&gt; PHP for ingestion and transforms, a temporary MySQL database for staging and validation, a web dashboard for human review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it worked:&lt;/strong&gt; The repetitive parts were genuinely repetitive (same enums, same transforms, same edge cases) and a human still got the final sign-off before anything hit production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outcome:&lt;/strong&gt; ~90%+ reduction in customer-facing data issues, plus dev hours and company time saved every cycle.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The BEFORE Process
&lt;/h2&gt;

&lt;p&gt;I worked for a marketing company whose job it was to keep a major restaurant's nutrition information up to date: ingredients, units of measure (UOMs), and caloric content. Each cycle we would get a new spreadsheet full of updates we needed to apply to the database, which drove the website's display for 8 million customers.&lt;/p&gt;

&lt;p&gt;This process typically took around 10 business days (two weeks) to complete all the changes, and we repeated it roughly every six months. It was a lot of manual work: checking, re-checking, and typing very carefully to preserve the already-dwindling data integrity without introducing new issues. The legacy code and database setup were picky and had to be handled a certain way so they would correctly feed the iOS app. Very tedious and exhausting to complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Approach To Improve
&lt;/h2&gt;

&lt;p&gt;After completing this a handful of times, I saw common assumptions I could make that would shortcut the time the process needed and automate many of the repetitive tasks. Automation would also improve the validity of the data (no human error) and make the final results easier to check, since everything could be verified against the original source of truth: the spreadsheet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools I Used
&lt;/h2&gt;

&lt;p&gt;The best tools I had at the time were PHP as the scripting language for the specific tasks, and a temporary MySQL database to help check and manipulate the data to speed things along.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;I wrote logic in PHP to ingest the spreadsheet data, match fields against common enums per category, and apply transforms for specific labels and description content, then piped the result into the final database only after it had been tested, reviewed for quality on a dashboard, and judged ready for production.&lt;/p&gt;

&lt;p&gt;Essentially the fix was to let the computer do as much processing as it could, have a human verify its work when done, and then automatically apply it to the target system, without the tedium of checking things one by one. &lt;/p&gt;
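
&lt;p&gt;The original script is long gone, but the shape of it was roughly this (a minimal PHP sketch; the file, table, and column names are illustrative, not the real schema):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&amp;lt;?php
// Minimal sketch: read the vendor spreadsheet as CSV, normalize each row,
// and stage it in a temporary MySQL table for review.
$pdo = new PDO('mysql:host=localhost;dbname=staging', 'user', 'pass');
$insert = $pdo-&amp;gt;prepare(
    'INSERT INTO staged_items (name, category, uom, calories) VALUES (?, ?, ?, ?)'
);

// Map free-text units onto a known enum, the kind of repetitive
// transform that used to be done by hand.
function normalize_uom(string $uom): string {
    $map = ['grams' =&amp;gt; 'g', 'ounces' =&amp;gt; 'oz', 'fluid ounces' =&amp;gt; 'fl_oz'];
    $key = strtolower(trim($uom));
    return $map[$key] ?? $key;
}

$handle = fopen('vendor_update.csv', 'r');
fgetcsv($handle); // skip the header row
while (($row = fgetcsv($handle)) !== false) {
    [$name, $category, $uom, $calories] = $row;
    $insert-&amp;gt;execute([
        trim($name),
        strtolower(trim($category)),
        normalize_uom($uom),
        (float) $calories,
    ]);
}
fclose($handle);
// A human reviews the staged rows on the dashboard before anything hits prod.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;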

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;With my script and DB system, the process took only a 10-minute run to crunch the data and display the final values. We could check it all on a web page, adjust anything that was off, and then push to the prod DB in around 30 seconds. It saved the devs hassle, saved the company time and money, and cut customer-facing data issues by roughly 90%+. Not a bad outcome in the end. It is one of the projects I am most proud of to date, and it was my own idea to work on it. This is the kind of thing I love to do.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why a temporary MySQL database instead of validating in code?
&lt;/h3&gt;

&lt;p&gt;Spreadsheets have repetition, contradictions, and edge cases that are far easier to spot with SQL than with imperative validation code. A staging table with constraints catches duplicates and bad references immediately, and the dashboard can run any ad-hoc query against it before production gets touched.&lt;/p&gt;
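
&lt;p&gt;As a concrete sketch (hypothetical columns, not the real schema), the staging table's constraints can do the first round of validation on their own:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical staging table: constraints reject bad rows at insert time.
CREATE TABLE staged_items (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    name     VARCHAR(255) NOT NULL,
    category ENUM('entree', 'side', 'beverage', 'dessert') NOT NULL,
    uom      ENUM('g', 'oz', 'fl_oz', 'each') NOT NULL,
    calories DECIMAL(7,2) NOT NULL CHECK (calories &amp;gt;= 0), -- enforced on MySQL 8.0.16+
    UNIQUE KEY uniq_item (name, category) -- duplicates fail immediately
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;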

&lt;h3&gt;
  
  
  Why keep a human in the loop if the script is reliable?
&lt;/h3&gt;

&lt;p&gt;The data feeds an iOS app used by 8 million customers. A bad row in production isn't a bug — it's a customer-facing nutrition error. The 30 seconds it takes a human to scan a dashboard is cheap insurance against the kind of mistake nobody wants to explain in a postmortem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Could you have used a more modern stack?
&lt;/h3&gt;

&lt;p&gt;Sure — Python with pandas would have been a natural fit, and Postgres would have given me more flexibility. But PHP and MySQL were what the company already ran, and the entire project shipped without asking anyone for new infrastructure. That's part of why it actually got built.&lt;/p&gt;

&lt;h3&gt;
  
  
  What would you change if you did it again today?
&lt;/h3&gt;

&lt;p&gt;I'd add automated diff-vs-previous-cycle reports so reviewers see only what changed, version-control the transform rules, and write per-row confidence scores so the dashboard can highlight low-confidence entries first. The core architecture — ingest → stage → review → push — would stay exactly the same.&lt;/p&gt;

</description>
      <category>php</category>
      <category>productivity</category>
      <category>automation</category>
      <category>database</category>
    </item>
    <item>
      <title>Sending email from alias via Gmail</title>
      <dc:creator>Ryan Carter</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:07:18 +0000</pubDate>
      <link>https://forem.com/sym/sending-email-from-alias-via-gmail-55pp</link>
      <guid>https://forem.com/sym/sending-email-from-alias-via-gmail-55pp</guid>
      <description>&lt;p&gt;To send email from an alias address through your Gmail account, generate a Google App Password, then add the alias under &lt;strong&gt;Settings → Accounts and Import → Send mail as&lt;/strong&gt; with &lt;code&gt;smtp.gmail.com:587&lt;/code&gt; as the SMTP server and the App Password (not your normal Gmail password) as the credential. The alias has to already forward to your Gmail, and your account needs 2-Step Verification enabled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;: The alias address is a forwarder pointing to your Gmail. You need 2-Step Verification enabled on your Google account.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://myaccount.google.com/apppasswords" rel="noopener noreferrer"&gt;myaccount.google.com/apppasswords&lt;/a&gt;, create an App Password, and copy the 16-character code &lt;strong&gt;without spaces&lt;/strong&gt;. You must have two-factor auth turned on.&lt;/li&gt;
&lt;li&gt;In Gmail: Settings → See all settings → Accounts and Import → "Send mail as" → Add another email address&lt;/li&gt;
&lt;li&gt;Enter the alias name (e.g. Ryan Carter, or however you want it to appear in the recipient's inbox) and the alias email address (e.g. &lt;a href="mailto:ryan@someplace.com"&gt;ryan@someplace.com&lt;/a&gt;), then click Next Step&lt;/li&gt;
&lt;li&gt;Fill in the SMTP dialog:

&lt;ul&gt;
&lt;li&gt;Server: &lt;strong&gt;smtp.gmail.com&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Port: &lt;strong&gt;587&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Username: your real Gmail address (e.g. &lt;a href="mailto:ryan1234@gmail.com"&gt;ryan1234@gmail.com&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Password: the App Password (no spaces, NOT your regular password)&lt;/li&gt;
&lt;li&gt;Security: TLS (usually selected by default; you normally don't need to change it)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Click Add Account — Gmail sends a verification email to your alias, which forwards it to your Gmail inbox&lt;/li&gt;
&lt;li&gt;Click the confirmation link in that email&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Done — the alias will now appear in the From dropdown when composing.&lt;/p&gt;
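
&lt;p&gt;The same server, port, and App Password also work from code with any SMTP client. Here's a minimal Node sketch using nodemailer (the addresses are placeholders), in case you ever want to script it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// npm install nodemailer
import nodemailer from "nodemailer";

const transporter = nodemailer.createTransport({
  host: "smtp.gmail.com",
  port: 587,
  secure: false, // port 587 upgrades to TLS via STARTTLS
  auth: {
    user: "ryan1234@gmail.com", // your real Gmail address
    pass: process.env.GMAIL_APP_PASSWORD, // the 16-character App Password, no spaces
  },
});

await transporter.sendMail({
  from: '"Ryan Carter" &amp;lt;ryan@someplace.com&amp;gt;', // the verified alias
  to: "someone@example.com",
  subject: "Hello from my alias",
  text: "Sent through Gmail SMTP; the From line shows the alias.",
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;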

&lt;h2&gt;
  
  
  Turning on 2-Step Verification:
&lt;/h2&gt;

&lt;p&gt;Go to myaccount.google.com/security → scroll down to "How you sign in to Google" → click "2-Step Verification" → follow the prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you use Google Workspace:
&lt;/h2&gt;

&lt;p&gt;It works the same way, with one caveat — your Workspace admin has to allow App Passwords. If the App Passwords option doesn't appear at myaccount.google.com/apppasswords, the admin has it disabled. They'd need to go to the Admin Console → Security → Authentication and enable "Allow users to manage their own app passwords."&lt;br&gt;
Also in Workspace there's an extra step: the admin needs to enable "Allow users to send mail through an external SMTP server" under Apps → Google Workspace → Gmail → Advanced settings. Otherwise the SMTP relay gets blocked.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why do I have to use an App Password instead of my regular Gmail password?
&lt;/h3&gt;

&lt;p&gt;Google blocks third-party access (including SMTP) using your normal account password by default — that's what 2-Step Verification is for. App Passwords are 16-character credentials scoped to a single app and revocable from your Google account, so they're safer than handing your real password to an SMTP client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I do this without enabling 2-Step Verification?
&lt;/h3&gt;

&lt;p&gt;No. App Passwords only exist on accounts with 2-Step Verification turned on. The "less secure app access" toggle that used to allow this was deprecated by Google in 2022.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will recipients see my real Gmail address anywhere?
&lt;/h3&gt;

&lt;p&gt;No — once the alias is verified, mail sent from it shows only the alias address in the From field. Recipients can still inspect the raw message headers and see Gmail's servers in the trace, but your real Gmail address is not exposed in the From, Reply-To, or visible headers.&lt;/p&gt;

&lt;h3&gt;
  
  
  What if I want replies to come back to my real Gmail and not the alias?
&lt;/h3&gt;

&lt;p&gt;That's the default if your alias is a forwarder pointing to your Gmail — replies go to the alias and get forwarded back. If you want replies to bypass the alias and hit your Gmail directly, set "Reply-To" to your real Gmail address when composing, or configure it on the Send-mail-as entry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I send from multiple aliases?
&lt;/h3&gt;

&lt;p&gt;Yes — repeat the same setup for each alias. Each one becomes a separate option in the From dropdown when composing. You can also pick a default From address per recipient under "When replying to a message."&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>tutorial</category>
      <category>beginners</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Multi-Model LLM Orchestration with OpenRouter</title>
      <dc:creator>Ryan Carter</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:07:14 +0000</pubDate>
      <link>https://forem.com/sym/multi-model-llm-orchestration-with-openrouter-g4l</link>
      <guid>https://forem.com/sym/multi-model-llm-orchestration-with-openrouter-g4l</guid>
      <description>&lt;p&gt;Multi-model LLM orchestration is the practice of routing AI requests to different models based on what each task needs — speed, cost, reasoning depth, or code quality. &lt;a href="https://openrouter.ai" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt; makes it practical by exposing models from Anthropic, OpenAI, Google, Meta, Mistral, and others through a single OpenAI-compatible API: one key, one bill, one client, and you swap models by changing a string. The implementation is a few dozen lines of code on top of the OpenAI SDK.&lt;/p&gt;

&lt;p&gt;This post walks through the pattern: defining named model slots, routing by task or complexity, streaming, fallback handling, and tracking cost across providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; Routing each AI request to the model best suited for that task instead of using one model for everything.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Cheaper at scale (small models for simple tasks), faster perceived latency (fast models for chat), better quality (right model for the job), and resilient (fall back across providers when one is down).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How OpenRouter helps:&lt;/strong&gt; One API key gives you access to 100+ models across providers using the OpenAI SDK. Model strings follow &lt;code&gt;provider/model-name&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two routing strategies:&lt;/strong&gt; By task type (&lt;code&gt;summarize&lt;/code&gt; → fast model, &lt;code&gt;reason&lt;/code&gt; → deep model) or by estimated complexity (token count thresholds).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production essentials:&lt;/strong&gt; Streaming for chat UIs, try/catch fallbacks for provider outages, and per-request cost logging via the &lt;code&gt;usage&lt;/code&gt; object OpenRouter returns.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why bother with multiple models?
&lt;/h2&gt;

&lt;p&gt;A few real reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost.&lt;/strong&gt; Frontier models like GPT-4o or Claude Opus are expensive at scale. For tasks that don't need that level of reasoning — summarization, classification, simple Q&amp;amp;A — a cheaper, faster model does the job at a fraction of the cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed.&lt;/strong&gt; Small models respond faster. If a user is waiting for a response, latency matters. Route quick tasks to a fast model and save the slow, expensive one for when it's actually needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quality.&lt;/strong&gt; Some models are better at specific things. Code generation, structured output, long-context reasoning, multilingual text — the best model for each task isn't always the same model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resilience.&lt;/strong&gt; If one provider has an outage or rate limit, you can fall back to another without rewriting your integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up OpenRouter
&lt;/h2&gt;

&lt;p&gt;Install the OpenAI SDK — OpenRouter is compatible with it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point it at OpenRouter's base URL with your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://openrouter.ai/api/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Everything else is standard OpenAI SDK calls, just with different model strings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining your model roster
&lt;/h2&gt;

&lt;p&gt;The key to orchestration is deciding upfront which models you'll use and what each one is for. A simple approach is to define a set of "personas" — named roles that map to specific models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;fast&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-flash-1.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;// quick tasks, low latency&lt;/span&gt;
  &lt;span class="na"&gt;balanced&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai/gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// everyday reasoning&lt;/span&gt;
  &lt;span class="na"&gt;deep&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic/claude-opus-4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// complex reasoning, long context&lt;/span&gt;
  &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic/claude-sonnet-4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;     &lt;span class="c1"&gt;// code generation and review&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Model strings in OpenRouter follow the pattern &lt;code&gt;provider/model-name&lt;/code&gt;. You can find the full list and pricing at &lt;a href="https://openrouter.ai/models" rel="noopener noreferrer"&gt;openrouter.ai/models&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;By mapping names to models rather than hardcoding model strings throughout your codebase, you can swap the underlying model without touching anything else. If a better cheap model comes out next month, you change one line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Routing by task type
&lt;/h2&gt;

&lt;p&gt;The simplest orchestration strategy is routing based on task type — you decide which model to use before making the call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;modelMap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fast&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;draft&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;balanced&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deep&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;modelMap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;balanced&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summarize&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Summarize this document: ...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;plan&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;reason&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Help me think through this architecture decision...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is explicit and predictable. You know exactly which model runs for each task type, which makes debugging straightforward and costs easy to reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Routing by estimated complexity
&lt;/h2&gt;

&lt;p&gt;A more dynamic approach is routing based on the size or complexity of the request itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;selectModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tokenEstimate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// rough chars-to-tokens estimate&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tokenEstimate&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fast&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tokenEstimate&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;balanced&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deep&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;selectModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can combine both approaches — route by task type first, then apply complexity thresholds within each category.&lt;/p&gt;
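
&lt;p&gt;A combined router is only a few lines more. This sketch reuses the &lt;code&gt;models&lt;/code&gt; map from earlier; the thresholds are arbitrary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Task type picks the tier first; complexity can escalate it.
async function routedChat(task, prompt) {
  const byTask = {
    summarize: models.fast,
    classify: models.fast,
    draft: models.balanced,
    reason: models.deep,
    code: models.code,
  };

  let model = byTask[task] ?? models.balanced;

  // Long prompts get bumped from the fast tier to the balanced one.
  const tokenEstimate = prompt.length / 4;
  if (model === models.fast &amp;amp;&amp;amp; tokenEstimate &amp;gt; 2000) {
    model = models.balanced;
  }

  const response = await client.chat.completions.create({
    model,
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0].message.content;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;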

&lt;h2&gt;
  
  
  Streaming responses
&lt;/h2&gt;

&lt;p&gt;For any user-facing interface, streaming makes responses feel faster even when they aren't:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;streamChat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;onChunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;onChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Usage&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;streamChat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;balanced&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// or push to your UI&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Fallback handling
&lt;/h2&gt;

&lt;p&gt;Models go down. Rate limits happen. Add a fallback layer so a failure from one provider doesn't take your whole app down:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chatWithFallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;preferredModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fallbackModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;preferredModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Model &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;preferredModel&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; failed, falling back to &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;fallbackModel&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fallbackModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tracking cost across models
&lt;/h2&gt;

&lt;p&gt;One of the underrated benefits of OpenRouter is that it returns token usage and cost metadata with each response. Log it and you'll know exactly what you're spending per task type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chatWithCostTracking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;selectModelForTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;inputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;outputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completion_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// OpenRouter includes cost in the response&lt;/span&gt;
    &lt;span class="na"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you have that data you can see which task types are eating your budget and tune your routing accordingly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it together
&lt;/h2&gt;

&lt;p&gt;The pattern here is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define named model slots tied to task roles, not specific model strings&lt;/li&gt;
&lt;li&gt;Route requests to the right slot based on task type, complexity, or both&lt;/li&gt;
&lt;li&gt;Stream responses for user-facing interfaces&lt;/li&gt;
&lt;li&gt;Add fallbacks so individual provider failures don't cascade&lt;/li&gt;
&lt;li&gt;Log usage so you can optimize over time&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;OpenRouter removes the vendor lock-in that makes this feel risky. You're not betting on one provider — you're building a routing layer that can point at any model, from any provider, updated as the landscape changes. Given how fast the model landscape moves, that flexibility is worth more than it might seem today.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is OpenRouter more expensive than calling providers directly?
&lt;/h3&gt;

&lt;p&gt;OpenRouter passes through provider pricing with a small markup baked in (typically a few percent), and in exchange you get a single account, single bill, automatic failover, and the ability to swap models without touching keys or SDKs. For most teams the convenience is worth it; for very high-volume workloads on a single model, going direct can be cheaper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does OpenRouter support streaming and tool/function calling?
&lt;/h3&gt;

&lt;p&gt;Yes. Streaming works exactly like the OpenAI SDK — set &lt;code&gt;stream: true&lt;/code&gt;. Tool/function calling is supported per-model: most modern models from Anthropic, OpenAI, and Google handle it; smaller open models vary. Check the model card on openrouter.ai/models for capability flags.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does this compare to LangChain or LiteLLM?
&lt;/h3&gt;

&lt;p&gt;LangChain is a much heavier framework with chains, agents, retrievers, and abstractions on top of providers. LiteLLM is the closest comparison — it's a unified provider proxy you self-host. OpenRouter is a hosted version of that idea: less control but zero ops, plus access to models you don't have direct accounts for.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens if a model gets deprecated or removed?
&lt;/h3&gt;

&lt;p&gt;OpenRouter announces deprecations in advance and usually keeps a redirect to a sensible successor. Because your code references a model string in one place (the named-slot map), updating to a new model is a one-line change. This is the main argument for the named-slot pattern over hardcoding model names throughout the codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I route by user, by feature, or by A/B test?
&lt;/h3&gt;

&lt;p&gt;Yes. The routing function is just code, so you can include any signal in the decision: user tier, feature flag, A/B bucket, time of day. A common pattern is routing premium users to the deeper model and free users to the fast one. Another is shadow-routing — sending a copy of each request to a candidate model and comparing outputs offline.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I track which model performed best for a task?
&lt;/h3&gt;

&lt;p&gt;Log the &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;task&lt;/code&gt;, latency, token usage, and a quality signal (user thumbs-up, downstream success, eval score) for every request. Once you have a few weeks of data, group by task and model and compare. This is how you justify routing decisions empirically instead of guessing.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How I Built an AI Document Ingestion Pipeline</title>
      <dc:creator>Ryan Carter</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:07:10 +0000</pubDate>
      <link>https://forem.com/sym/how-i-built-an-ai-document-ingestion-pipeline-1abf</link>
      <guid>https://forem.com/sym/how-i-built-an-ai-document-ingestion-pipeline-1abf</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/meownoirsoft/symport" rel="noopener noreferrer"&gt;Symport&lt;/a&gt; is an AI document ingestion pipeline that turns a phone photo of any paper document — receipt, EOB, prescription, utility bill — into structured JSON, then stores it in Postgres with embeddings for semantic search. The full flow is: image upload → Sharp preprocessing → GPT-4o vision extraction → normalized JSON → Postgres + pgvector. I built it because I hate paper and I also lose paper.&lt;/p&gt;

&lt;p&gt;This post walks through how the pipeline actually works, including the prompt engineering decisions that make extraction reliable enough to trust and the fallback layers that keep the app useful when extraction fails.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stack:&lt;/strong&gt; Sharp for image preprocessing, GPT-4o for vision extraction, Prisma + Postgres + pgvector for storage and semantic search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The extraction prompt does most of the work:&lt;/strong&gt; explicit date context to fight year hallucinations, constrained &lt;code&gt;type&lt;/code&gt;/&lt;code&gt;category&lt;/code&gt; enums for predictable downstream branching, and a strict "JSON only, no markdown" tail.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User correction loop:&lt;/strong&gt; Users can add freeform feedback ("the drug name is metformin, not metFORMIN") and re-run extraction; the feedback gets injected back into the system prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema choice:&lt;/strong&gt; A single &lt;code&gt;extractedData&lt;/code&gt; JSON column instead of per-type tables, with a denormalized &lt;code&gt;searchText&lt;/code&gt; field for fast keyword search and an &lt;code&gt;embedding&lt;/code&gt; column for semantic search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two fallback layers:&lt;/strong&gt; Document still saves if there's no API key, and still saves with an error summary if extraction throws — nothing is ever lost because AI had a bad day.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What the pipeline does
&lt;/h2&gt;

&lt;p&gt;The flow is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Image upload → sharpen + encode → GPT-4o vision → structured JSON → Postgres + embeddings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A user photographs a receipt, an insurance EOB, a prescription, a utility bill — anything on paper. The app returns a structured JSON object with the relevant fields extracted, tagged, and ready to query. No manual data entry.&lt;/p&gt;
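
&lt;p&gt;For a pharmacy receipt, the result looks something like this (a hand-written example of the shape, using the &lt;code&gt;type&lt;/code&gt; and &lt;code&gt;category&lt;/code&gt; enums described in Step 2, not actual app output):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "type": "rx_receipt",
  "category": "medical",
  "title": "Prescription receipt",
  "tags": ["pharmacy", "metformin", "copay", "2025"],
  "date": "2025-03-14",
  "amount": 12.50,
  "vendor": "Example Pharmacy"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;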

&lt;h2&gt;
  
  
  Step 1: Image preprocessing
&lt;/h2&gt;

&lt;p&gt;Raw phone photos are large and often noisy. Before sending to the vision model, every image gets sharpened and re-encoded using Sharp:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rawBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sharpenAndEncode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rawBuffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sharp handles resizing, sharpening, and JPEG re-encoding in one pass. This serves two purposes: it reduces the payload size for the API call, and sharpening improves OCR accuracy on text-heavy documents like receipts. A blurry photo of small print is genuinely harder for vision models — a little preprocessing pays off.&lt;/p&gt;
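
&lt;p&gt;A &lt;code&gt;sharpenAndEncode&lt;/code&gt; along these lines captures the one-pass idea (a sketch, not Symport's exact implementation; the width and quality values are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import sharp from "sharp";

// One pass through Sharp: fix orientation, cap dimensions, sharpen, re-encode.
async function sharpenAndEncode(input: Buffer): Promise&amp;lt;Buffer&amp;gt; {
  return sharp(input)
    .rotate() // auto-orient using EXIF data from phone cameras
    .resize({ width: 2000, withoutEnlargement: true })
    .sharpen()
    .jpeg({ quality: 85 })
    .toBuffer();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;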

&lt;p&gt;The processed image gets saved to disk as the source of truth, then the buffer goes to the extraction pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;randomBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;.jpg`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;writeFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fullPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;extracted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;extractFromImageBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The random hex filename prevents collisions and avoids leaking any metadata about the document in the path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: The extraction prompt
&lt;/h2&gt;

&lt;p&gt;This is where most of the real engineering lives. The system prompt does a lot of work to make the model's output consistent and parseable.&lt;/p&gt;

&lt;p&gt;The prompt has three parts assembled at startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;EXTRACTION_SYSTEM_HEAD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are a document extraction assistant. Analyze the image and extract structured data.

Current date context: We are in 2025. Use 2025 (not 2023 or other past years) for any ambiguous or partial dates when no stronger clue is present.

Use context clues from the document text to infer the correct year:
- "2025 taxes due in 2026" → tax year 2025
- "Plan year 2025", "Coverage year 2025" → use 2025
- "Due in 2026" on a tax-related doc often refers to tax year 2025

Respond with a single JSON object. Include "type", "category", "title", and "tags" in every response.

- "type": one of rx_receipt, eob, utility_bill, general
- "category": one of receipt, financial, medical, government, legal, identity, general
- "title" (required, 2-5 words max): Short label only. No sentences.
- "tags": array of 3–8 short labels. No spaces; use underscores if needed.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few decisions worth calling out here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explicit date context.&lt;/strong&gt; Vision models can hallucinate dates, especially on documents where the year is ambiguous. Anchoring the prompt with the current year and showing examples of how to reason about year context dramatically reduces date errors. Without this, a 2025 tax document might come back with 2023 dates because the model defaulted to its training data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constrained type and category values.&lt;/strong&gt; Giving the model an explicit enum for &lt;code&gt;type&lt;/code&gt; and &lt;code&gt;category&lt;/code&gt; means you get predictable values you can branch on in code. Open-ended classification produces inconsistent strings that are annoying to handle downstream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Short title constraint.&lt;/strong&gt; "2-5 words max, no sentences" prevents the model from writing a summary disguised as a title. You want "Prescription receipt" not "This document appears to be a receipt from Walgreens for a prescription medication."&lt;/p&gt;

&lt;p&gt;The tail of the prompt closes with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;EXTRACTION_SYSTEM_TAIL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
Use null for missing values. Amounts as numbers. Dates as YYYY-MM-DD; use context clues for year. Output only valid JSON, no markdown or explanation.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"Output only valid JSON, no markdown or explanation" is load-bearing. Without it, GPT-4o will frequently wrap the response in a markdown code block. The extraction code handles that case anyway, but telling the model not to do it reduces the cleanup work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Strip optional markdown code block&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;jsonStr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/``&lt;/span&gt;&lt;span class="err"&gt;`
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;(?:&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;)?&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="nx"&gt;S&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;?)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="s2"&gt;```/);
if (match) jsonStr = match[1].trim();
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: User feedback loop
&lt;/h2&gt;

&lt;p&gt;One of the more useful features is the ability to correct extractions. If the model gets something wrong — misreads a drug name, gets the date wrong, miscategorizes the document — the user can add a correction note and re-run extraction. That feedback gets injected directly into the system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;userFeedback&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;systemContent&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s2"&gt;`\n\nIMPORTANT - User feedback on this document (apply these corrections):
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userFeedback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means the model gets a second pass with explicit correction instructions. In practice it works well — "the drug name is metformin not metFORMIN" or "this is a 2025 EOB not 2024" gets applied reliably.&lt;/p&gt;

&lt;p&gt;The feedback also gets stored in the database as &lt;code&gt;extractionNotes&lt;/code&gt; on the document, so you have a record of what was corrected.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: The data model
&lt;/h2&gt;

&lt;p&gt;The Prisma schema keeps things straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="nx"&gt;Document&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;id&lt;/span&gt;            &lt;span class="nb"&gt;String&lt;/span&gt;   &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;id&lt;/span&gt; &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;cuid&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="nx"&gt;imagePath&lt;/span&gt;     &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;
  &lt;span class="nx"&gt;noteText&lt;/span&gt;      &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;
  &lt;span class="nx"&gt;status&lt;/span&gt;        &lt;span class="nb"&gt;String&lt;/span&gt;   &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="nx"&gt;extractedData&lt;/span&gt; &lt;span class="nx"&gt;Json&lt;/span&gt;
  &lt;span class="nx"&gt;searchText&lt;/span&gt;    &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;
  &lt;span class="nx"&gt;embedding&lt;/span&gt;     &lt;span class="nc"&gt;Unsupported&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vector(1536)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)?&lt;/span&gt;
  &lt;span class="nx"&gt;tags&lt;/span&gt;          &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="nx"&gt;extractionNotes&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;
  &lt;span class="nx"&gt;createdAt&lt;/span&gt;     &lt;span class="nx"&gt;DateTime&lt;/span&gt; &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
  &lt;span class="nx"&gt;updatedAt&lt;/span&gt;     &lt;span class="nx"&gt;DateTime&lt;/span&gt; &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;updatedAt&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few design choices here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;extractedData&lt;/code&gt; is a JSON blob.&lt;/strong&gt; Rather than creating separate tables for each document type (receipts, EOBs, utility bills), all extracted data lives in a single JSON column. This makes the schema flexible — different document types have different fields, and a rigid relational schema would be a constant maintenance burden as new types are added.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;searchText&lt;/code&gt; is denormalized.&lt;/strong&gt; After extraction, key fields get pulled out and concatenated into a single &lt;code&gt;searchText&lt;/code&gt; string for full-text search. This is faster to query than parsing JSON at search time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildSearchText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ExtractedDoc&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;effectiveTitle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;drug_name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;drug_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;drug_name&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;insurer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;insurer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;insurer&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tags&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
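
&lt;p&gt;For a sense of the output, a prescription receipt might produce something like this — values are made up, and I'm assuming &lt;code&gt;effectiveTitle&lt;/code&gt; just returns the stored title:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative input/output
buildSearchText({
  type: "rx_receipt",
  title: "Walgreens receipt",
  drug_name: "metformin",
  tags: ["pharmacy", "rx_receipt"],
} as ExtractedDoc);
// "Walgreens receipt metformin pharmacy rx_receipt"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;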



&lt;p&gt;&lt;strong&gt;&lt;code&gt;embedding&lt;/code&gt; for semantic search.&lt;/strong&gt; After the document is saved, an embedding gets generated from &lt;code&gt;searchText&lt;/code&gt; and stored in a pgvector column. This enables semantic search — finding "cholesterol medication" when the document says "lipitor" — without a separate vector database. Just pgvector as a Postgres extension.&lt;/p&gt;
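
&lt;p&gt;The embedding step itself is only a few lines. A minimal sketch with the OpenAI SDK — the model name and the raw-SQL write are my assumptions here, not necessarily what the repo does:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import OpenAI from "openai";
import { PrismaClient } from "@prisma/client";

const openai = new OpenAI();
const prisma = new PrismaClient();

// Sketch: embed searchText and store it in the pgvector column.
// text-embedding-3-small returns 1536 dimensions, matching vector(1536).
async function embedDocument(id: string, searchText: string) {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: searchText,
  });
  // Prisma can't type pgvector columns, so the write goes through raw SQL
  const vector = `[${res.data[0].embedding.join(",")}]`;
  await prisma.$executeRaw`
    UPDATE "Document" SET embedding = ${vector}::vector WHERE id = ${id}
  `;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;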

&lt;h2&gt;
  
  
  Step 5: Graceful degradation
&lt;/h2&gt;

&lt;p&gt;The pipeline has two fallback layers. First, if there's no API key configured, the document still gets saved — just without extraction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;prisma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;imagePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pending&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;extractedData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;general&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Document&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Extraction skipped (no OPENAI_API_KEY)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;searchText&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Extraction skipped&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Second, if extraction throws, the document still gets saved with an error summary rather than failing the whole request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;extracted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;extractFromImageBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;extractedData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;extracted&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;extractedData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;general&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Extraction failed: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Unknown error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The image is always saved. The extraction is best-effort. Users can re-trigger extraction manually, or add correction notes and re-run. Nothing gets lost because AI had a bad day.&lt;/p&gt;

&lt;h2&gt;
  
  
  The model
&lt;/h2&gt;

&lt;p&gt;The extraction model is configurable via environment variable with &lt;code&gt;gpt-4o&lt;/code&gt; as the default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_EXTRACTION_MODEL&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GPT-4o is the right choice here — it's genuinely better than smaller models at reading degraded document images, handwriting, and small print. For this specific task the quality difference is noticeable enough to justify the cost. Document extraction is a write-time operation (not a search-time one), so the latency and cost are acceptable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;A few things I'd change with hindsight:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a confidence score.&lt;/strong&gt; The model sometimes hedges on fields it's uncertain about — a low-confidence flag on individual fields would let the UI highlight things that need user review rather than silently storing potentially wrong data.&lt;/p&gt;
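
&lt;p&gt;One possible shape, purely hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical — not implemented in the repo
type Confidence = "high" | "medium" | "low";
type ExtractedField&amp;lt;T&amp;gt; = { value: T; confidence: Confidence };

// e.g. { value: "2025-04-12", confidence: "low" } gets flagged for review in the UI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;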

&lt;p&gt;&lt;strong&gt;Chunk large documents.&lt;/strong&gt; A single-page receipt is fine. A multi-page insurance EOB or medical record is harder — the model gets less accurate as documents get longer or more complex. Chunking multi-page documents and merging the extracted JSON would improve accuracy on longer content.&lt;/p&gt;
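
&lt;p&gt;A naive merge might look like this — a sketch, with the merge policy (first non-null value wins, arrays concatenate) as an assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Sketch: combine per-page extraction results into one record
function mergePages(pages: Record&amp;lt;string, unknown&amp;gt;[]): Record&amp;lt;string, unknown&amp;gt; {
  const merged: Record&amp;lt;string, unknown&amp;gt; = {};
  for (const page of pages) {
    for (const [key, value] of Object.entries(page)) {
      if (Array.isArray(merged[key]) &amp;amp;&amp;amp; Array.isArray(value)) {
        merged[key] = [...(merged[key] as unknown[]), ...value];
      } else if (merged[key] == null) {
        merged[key] = value;
      }
    }
  }
  return merged;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;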

&lt;p&gt;&lt;strong&gt;Store the raw extraction response.&lt;/strong&gt; Right now only the normalized result gets stored. Keeping the raw model output alongside it would make debugging extraction issues much easier.&lt;/p&gt;




&lt;p&gt;The full source is on GitHub at &lt;a href="https://github.com/meownoirsoft/symport" rel="noopener noreferrer"&gt;github.com/meownoirsoft/symport&lt;/a&gt;. The extraction logic lives in &lt;code&gt;lib/extract.ts&lt;/code&gt; and the ingestion endpoint is &lt;code&gt;app/api/documents/route.ts&lt;/code&gt; if you want to dig in.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why GPT-4o for vision instead of a cheaper model or open-source alternative?
&lt;/h3&gt;

&lt;p&gt;GPT-4o reads degraded phone photos, handwriting, and small print noticeably better than smaller or open-source vision models. For document extraction, getting the dates and amounts wrong is a much bigger problem than the per-call cost, so paying for the better model is worth it. Extraction runs once at write time, not on every read.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you stop the model from hallucinating dates or amounts?
&lt;/h3&gt;

&lt;p&gt;The biggest wins are anchoring "current year" context in the system prompt with explicit examples, asking the model to use context clues from the document itself ("plan year 2025", "due in 2026" → tax year 2025), and constraining types/categories to enums so the model can't drift. The user-feedback loop catches anything that still slips through.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why store extracted data as a single JSON column instead of typed tables?
&lt;/h3&gt;

&lt;p&gt;Different document types have different fields — a prescription receipt has &lt;code&gt;drug_name&lt;/code&gt; and &lt;code&gt;pharmacy&lt;/code&gt;, an EOB has &lt;code&gt;insurer&lt;/code&gt; and &lt;code&gt;claim_id&lt;/code&gt;, a utility bill has &lt;code&gt;account_number&lt;/code&gt; and &lt;code&gt;service_period&lt;/code&gt;. A relational schema for every variant would be a constant migration treadmill. JSON keeps the schema flexible, and the denormalized &lt;code&gt;searchText&lt;/code&gt; and &lt;code&gt;embedding&lt;/code&gt; columns make queries fast where it matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens if the model returns invalid JSON?
&lt;/h3&gt;

&lt;p&gt;The extraction code strips optional markdown code fences (&lt;code&gt;```json ... ```&lt;/code&gt;) and parses the rest. If parsing still fails, the document saves with an error summary in the &lt;code&gt;extractedData.summary&lt;/code&gt; field rather than throwing — the user can re-run extraction or add a correction note. The image and metadata are never lost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use this with Anthropic's Claude vision instead of GPT-4o?
&lt;/h3&gt;

&lt;p&gt;Yes. The extraction prompt is provider-agnostic and the model is configurable via &lt;code&gt;OPENAI_EXTRACTION_MODEL&lt;/code&gt;. Swap the SDK call for the Anthropic SDK (or route through OpenRouter to avoid a code change) and Claude's vision models work as a drop-in alternative. The "JSON only, no markdown" instruction is even more important on Claude — it likes to explain itself by default.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do you handle multi-page documents?
&lt;/h3&gt;

&lt;p&gt;Today the pipeline treats each photo as a single document, which is fine for single-page items (receipts, prescriptions). For multi-page EOBs or medical records, the right next step is to chunk the document into pages, run extraction per page, and merge the resulting JSON into a single record. Adding that is on the to-do list.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>showdev</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Git Branch Exists on Remote But Won't Show Locally</title>
      <dc:creator>Ryan Carter</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:07:06 +0000</pubDate>
      <link>https://forem.com/sym/git-branch-exists-on-remote-but-wont-show-locally-4idd</link>
      <guid>https://forem.com/sym/git-branch-exists-on-remote-but-wont-show-locally-4idd</guid>
      <description>&lt;p&gt;If a git branch shows up on the remote but &lt;code&gt;git branch -r&lt;/code&gt; doesn't list it locally, your fetch refspec is almost always scoped to a single branch instead of all branches. Fix it with one config change: &lt;code&gt;git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*"&lt;/code&gt; followed by &lt;code&gt;git fetch origin --prune&lt;/code&gt;. This commonly happens after shallow clones, certain CI checkouts, and clones run with &lt;code&gt;--single-branch&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The full diagnosis takes about two minutes — start by confirming the branch exists on the remote, then walk through the three fixes below in order.&lt;/p&gt;

&lt;h2&gt;
  
  
  Confirm the branch actually exists on the remote
&lt;/h2&gt;

&lt;p&gt;First, bypass your local cache entirely and ask the remote directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git ls-remote origin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your branch shows up here but not in &lt;code&gt;git branch -r&lt;/code&gt;, your local remote-tracking refs are stale or incorrectly scoped. That's the problem — and it's fixable.&lt;/p&gt;

&lt;p&gt;If it doesn't show up here either, the issue is permissions or a wrong remote URL. Check with &lt;code&gt;git remote -v&lt;/code&gt; and make sure &lt;code&gt;origin&lt;/code&gt; points where you think it does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 1: Fetch with prune
&lt;/h2&gt;

&lt;p&gt;The simplest thing to try first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git fetch origin &lt;span class="nt"&gt;--prune&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--prune&lt;/code&gt; flag removes stale remote-tracking refs and re-syncs. Sometimes that's all it takes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fix 2: Fetch the specific branch by name
&lt;/h2&gt;

&lt;p&gt;If a general fetch isn't picking it up, fetching by name often forces it — though if your refspec is scoped to a single branch (see Fix 3), this downloads the branch without creating the remote-tracking ref:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git fetch origin your-branch-name
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Fix 3: Check your fetch refspec
&lt;/h2&gt;

&lt;p&gt;This is the most common root cause when the above don't work. Check your git config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; .git/config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look at the &lt;code&gt;[remote "origin"]&lt;/code&gt; section. It should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[remote "origin"]
    url = git@github.com:you/your-repo.git
    fetch = +refs/heads/*:refs/remotes/origin/*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;fetch&lt;/code&gt; line is the refspec — it tells git which branches to track. The &lt;code&gt;*&lt;/code&gt; wildcard means "all branches."&lt;/p&gt;

&lt;p&gt;If yours looks like this instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fetch = +refs/heads/main:refs/remotes/origin/main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's your problem. The refspec is scoped to a single branch, so git is only tracking &lt;code&gt;main&lt;/code&gt; and ignoring everything else. This happens with shallow clones, some CI checkout configurations, and certain &lt;code&gt;git clone&lt;/code&gt; flags.&lt;/p&gt;
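
&lt;p&gt;You can reproduce the scoped refspec yourself — a single-branch clone writes exactly that line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone --single-branch --branch main git@github.com:you/your-repo.git
cd your-repo
git config --get remote.origin.fetch
# +refs/heads/main:refs/remotes/origin/main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;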

&lt;p&gt;Fix it by updating the refspec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git config remote.origin.fetch &lt;span class="s2"&gt;"+refs/heads/*:refs/remotes/origin/*"&lt;/span&gt;
git fetch origin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, &lt;code&gt;git branch -r&lt;/code&gt; should show all remote branches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Likely cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Branch in &lt;code&gt;ls-remote&lt;/code&gt; but not &lt;code&gt;branch -r&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Stale or scoped refspec&lt;/td&gt;
&lt;td&gt;Update refspec, re-fetch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Branch missing after &lt;code&gt;git fetch&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Stale tracking refs&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git fetch origin --prune&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Branch missing entirely from &lt;code&gt;ls-remote&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Wrong remote URL or permissions&lt;/td&gt;
&lt;td&gt;Check &lt;code&gt;git remote -v&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;ls-remote&lt;/code&gt; check is always the right first step — it tells you immediately whether the problem is on the remote side or local side, which cuts the diagnosis in half.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why does &lt;code&gt;git fetch&lt;/code&gt; not pick up the new branch?
&lt;/h3&gt;

&lt;p&gt;Either your remote-tracking refs are stale (fix with &lt;code&gt;--prune&lt;/code&gt;), or your fetch refspec is scoped to a single branch (the most common cause when &lt;code&gt;--prune&lt;/code&gt; doesn't help). The refspec lives in &lt;code&gt;.git/config&lt;/code&gt; under &lt;code&gt;[remote "origin"]&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is a fetch refspec and why does it matter?
&lt;/h3&gt;

&lt;p&gt;A refspec tells git which remote refs to download and where to store them locally. The default &lt;code&gt;+refs/heads/*:refs/remotes/origin/*&lt;/code&gt; means "fetch every branch on the remote into &lt;code&gt;origin/*&lt;/code&gt; locally." If yours is scoped to a specific branch (e.g. &lt;code&gt;refs/heads/main:refs/remotes/origin/main&lt;/code&gt;), git will only ever track that one branch.&lt;/p&gt;

&lt;h3&gt;
  
  
  How did my refspec get scoped to a single branch?
&lt;/h3&gt;

&lt;p&gt;Common causes: cloning with &lt;code&gt;--single-branch&lt;/code&gt;, cloning with &lt;code&gt;--branch &amp;lt;name&amp;gt;&lt;/code&gt; plus &lt;code&gt;--single-branch&lt;/code&gt;, GitHub Actions checkouts that use &lt;code&gt;fetch-depth: 1&lt;/code&gt; and a specific ref, and some Dependabot/CI tools that explicitly scope the refspec to save bandwidth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is &lt;code&gt;git fetch --all --prune&lt;/code&gt; the same as fixing the refspec?
&lt;/h3&gt;

&lt;p&gt;No. &lt;code&gt;--all&lt;/code&gt; fetches from every configured remote (relevant if you have multiple), and &lt;code&gt;--prune&lt;/code&gt; removes stale remote-tracking refs — but neither expands a refspec that's scoped to a single branch. You still have to fix the refspec itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will fixing the refspec break anything?
&lt;/h3&gt;

&lt;p&gt;No. It just tells git to track all branches instead of one. You won't lose history, refs, or local branches. The next &lt;code&gt;git fetch origin&lt;/code&gt; will pull down all the previously-ignored remote branches.&lt;/p&gt;

</description>
      <category>git</category>
      <category>tutorial</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Fixing Godot MCP in Cursor on WSL</title>
      <dc:creator>Ryan Carter</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:07:03 +0000</pubDate>
      <link>https://forem.com/sym/fixing-godot-mcp-in-cursor-on-wsl-3llc</link>
      <guid>https://forem.com/sym/fixing-godot-mcp-in-cursor-on-wsl-3llc</guid>
      <description>&lt;p&gt;If &lt;a href="https://github.com/Coding-Solo/godot-mcp" rel="noopener noreferrer"&gt;godot-mcp&lt;/a&gt; won't connect in Cursor on WSL, the real culprit is almost always that Cursor is a Windows app trying to launch a Linux Node binary it can't see. The fix is to set &lt;code&gt;wsl.exe&lt;/code&gt; as the command in &lt;code&gt;mcp.json&lt;/code&gt; and pass &lt;code&gt;node&lt;/code&gt; plus the absolute Linux path as arguments. Two smaller gotchas usually compound the problem along the way: tildes (&lt;code&gt;~&lt;/code&gt;) don't expand inside JSON, and JSON config files don't allow &lt;code&gt;//&lt;/code&gt; comments.&lt;/p&gt;

&lt;p&gt;This post walks through all three issues in the order I hit them, with the working &lt;code&gt;mcp.json&lt;/code&gt; config at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Symptom:&lt;/strong&gt; Cursor logs show &lt;code&gt;Server not yet created, returning empty offerings&lt;/code&gt; and the MCP server never connects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Root cause:&lt;/strong&gt; Cursor runs on Windows; your &lt;code&gt;node&lt;/code&gt; lives in WSL. Cursor can't see it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix:&lt;/strong&gt; Use &lt;code&gt;"command": "wsl.exe"&lt;/code&gt; and put &lt;code&gt;node&lt;/code&gt; plus the absolute path in &lt;code&gt;"args"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two side bugs:&lt;/strong&gt; &lt;code&gt;~&lt;/code&gt; doesn't expand in JSON values, and &lt;code&gt;//&lt;/code&gt; comments break JSON parsing silently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Final step:&lt;/strong&gt; Fully restart Cursor (not just reload), then open Godot before invoking &lt;code&gt;godot-mcp&lt;/code&gt; tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I wanted to use the &lt;code&gt;godot-mcp&lt;/code&gt; package to let Cursor's AI interact directly with Godot — launching the editor, querying project info, managing scenes, all that good stuff. I downloaded it, built it, added it to Cursor's &lt;code&gt;mcp.json&lt;/code&gt;, and got this in the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-03-07 10:55:11.578 [info] Server not yet created, returning empty offerings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not helpful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Things Were Wrong
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Tilde doesn't expand in JSON
&lt;/h3&gt;

&lt;p&gt;My first config looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"~/game_dev/godot-mcp/build/index.js"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cursor launches MCP servers directly without a shell, so &lt;code&gt;~&lt;/code&gt; never gets expanded. It's looking for a file literally named &lt;code&gt;~&lt;/code&gt;. Use the full absolute path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/home/yourname/game_dev/godot-mcp/build/index.js"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. JSON doesn't support comments
&lt;/h3&gt;

&lt;p&gt;I had copied the example config which included:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"DEBUG"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Optional:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Enable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;detailed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;logging&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;//&lt;/code&gt; comment is invalid JSON and will silently break parsing. Remove it.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Cursor is a Windows app — it can't see your WSL Node
&lt;/h3&gt;

&lt;p&gt;This was the real one. Even after fixing the path and the comment, the server still wouldn't start. The reason: &lt;strong&gt;Cursor runs on Windows&lt;/strong&gt;. When it tries to execute &lt;code&gt;node&lt;/code&gt;, it's looking for a Windows binary — not the one you installed inside WSL.&lt;/p&gt;

&lt;p&gt;My WSL Node worked fine in the terminal. Cursor had no idea it existed.&lt;/p&gt;

&lt;p&gt;Worth noting: if you're using nvm inside WSL, this compounds the problem. Cursor doesn't run your shell init files, so even if nvm is configured in your &lt;code&gt;.bashrc&lt;/code&gt; or &lt;code&gt;.zshrc&lt;/code&gt;, Cursor won't pick it up. You can't just point at &lt;code&gt;node&lt;/code&gt; and expect it to resolve.&lt;/p&gt;
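
&lt;p&gt;A quick way to check what Cursor will actually see is to run the same bridge yourself from PowerShell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run from PowerShell or cmd, not from inside WSL
wsl.exe node --version   # if this prints a version, Cursor's bridge can find node
wsl.exe which node       # if it prints a path, use that absolute path in mcp.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;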

&lt;h2&gt;
  
  
  The Fix
&lt;/h2&gt;

&lt;p&gt;Use &lt;code&gt;wsl.exe&lt;/code&gt; as the command, and pass your WSL path as an argument. Windows knows how to find &lt;code&gt;wsl.exe&lt;/code&gt;, and it bridges the call into your Linux environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"godot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wsl.exe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/home/yourname/game_dev/godot-mcp/build/index.js"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"DEBUG"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"disabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"autoApprove"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"launch_editor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"run_project"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"get_debug_output"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"stop_project"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"get_godot_version"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"list_projects"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"get_project_info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"create_scene"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"add_node"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"load_sprite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"export_mesh_library"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"save_scene"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"get_uid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"update_project_uids"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a full Cursor restart (not just reload), the MCP server showed as connected.&lt;/p&gt;

&lt;h2&gt;
  
  
  One More Thing
&lt;/h2&gt;

&lt;p&gt;Most of the useful &lt;code&gt;godot-mcp&lt;/code&gt; tools require Godot's editor to be open with your project loaded. The MCP connects to a running editor instance — it's not fully standalone. So once Cursor shows the server as connected, open Godot before you start using tools like &lt;code&gt;get_project_info&lt;/code&gt; or &lt;code&gt;launch_editor&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why does &lt;code&gt;wsl.exe&lt;/code&gt; work when &lt;code&gt;node&lt;/code&gt; doesn't?
&lt;/h3&gt;

&lt;p&gt;Windows knows where &lt;code&gt;wsl.exe&lt;/code&gt; is via &lt;code&gt;PATH&lt;/code&gt;, and &lt;code&gt;wsl.exe&lt;/code&gt; knows how to invoke programs inside your WSL distribution. So &lt;code&gt;wsl.exe node /home/.../index.js&lt;/code&gt; is really "Windows runs wsl.exe, which runs Linux node, which runs your script." The Linux Node binary stays inside WSL where it belongs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need to do anything special for nvm?
&lt;/h3&gt;

&lt;p&gt;If &lt;code&gt;node&lt;/code&gt; is managed by nvm inside WSL, the first command in &lt;code&gt;args&lt;/code&gt; should be &lt;code&gt;node&lt;/code&gt; — &lt;code&gt;wsl.exe&lt;/code&gt; will resolve it through your default WSL shell PATH for non-interactive invocations. If that fails, replace &lt;code&gt;"node"&lt;/code&gt; with the absolute path to the active nvm node binary (e.g. &lt;code&gt;/home/you/.nvm/versions/node/v20.11.0/bin/node&lt;/code&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a full Cursor restart instead of a reload?
&lt;/h3&gt;

&lt;p&gt;MCP servers are launched as child processes of Cursor at startup. A reload reuses the parent process and may keep stale state. A full quit + relaunch forces Cursor to reread &lt;code&gt;mcp.json&lt;/code&gt; and respawn the servers cleanly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this same approach work for other MCP servers on WSL?
&lt;/h3&gt;

&lt;p&gt;Yes. Any MCP server that's installed inside WSL and run via &lt;code&gt;node&lt;/code&gt; (or &lt;code&gt;python&lt;/code&gt;, etc.) hits the same problem and uses the same fix — &lt;code&gt;"command": "wsl.exe"&lt;/code&gt; with the interpreter and absolute Linux path as args. Servers installed as native Windows binaries don't need this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does my JSON look fine but Cursor still ignores it?
&lt;/h3&gt;

&lt;p&gt;The two silent killers are &lt;code&gt;//&lt;/code&gt; line comments (invalid JSON, parsers reject the whole file) and trailing commas (also invalid JSON in strict parsers). If Cursor isn't picking up your config at all, paste the file into a JSON validator first.&lt;/p&gt;

</description>
      <category>godot</category>
      <category>gamedev</category>
      <category>mcp</category>
      <category>wsl</category>
    </item>
    <item>
      <title>Building a Context-Aware AI Chat Without a Vector Database</title>
      <dc:creator>Ryan Carter</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:06:59 +0000</pubDate>
      <link>https://forem.com/sym/building-a-context-aware-ai-chat-without-a-vector-database-55c7</link>
      <guid>https://forem.com/sym/building-a-context-aware-ai-chat-without-a-vector-database-55c7</guid>
      <description>&lt;p&gt;You can ground an AI chat in your own data without a vector database by assembling the relevant documents directly into the system prompt before each request. No embedding pipeline, no similarity search, no separate infrastructure — just your structured data, formatted cleanly, injected as system context. It works well when your dataset is modest (hundreds of documents, not millions) and naturally segmented into logical groups.&lt;/p&gt;

&lt;p&gt;This is the pattern I used building Wiskr, a multi-model chat app that grounds conversations in documents from a connected document store. The rest of this post walks through how to implement it, where it breaks down, and how to upgrade to full RAG when you outgrow it.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The pattern:&lt;/strong&gt; Group documents into named contexts, load active contexts on each request, format them into a system prompt, prepend it to every API call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No vector DB needed:&lt;/strong&gt; For modest datasets, the model reads structured JSON directly — embeddings and similarity search are unnecessary overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token-limit guardrails:&lt;/strong&gt; Cap documents per context, summarize long ones, let users pin important ones, then add vector search only when those run out of room.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upgrade path:&lt;/strong&gt; When you need real RAG later, the context-assembly layer stays put — you just add smarter document selection in front of it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best fit:&lt;/strong&gt; Personal assistants, support tools, document Q&amp;amp;A, and any AI feature that needs to reason about a bounded, structured user-specific dataset.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The core idea
&lt;/h2&gt;

&lt;p&gt;A standard LLM chat call looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic/claude-sonnet-4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What's my copay for metformin?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model has no idea who you are or what documents you have. It can only work with what's in the messages array.&lt;/p&gt;

&lt;p&gt;The context assembly pattern adds a system message that packages your relevant data before the conversation begins:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic/claude-sonnet-4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;assembledContext&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;What's my copay for metformin?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the model has your data and can reason against it. The question is how to build &lt;code&gt;assembledContext&lt;/code&gt; well.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Organize data into contexts
&lt;/h2&gt;

&lt;p&gt;The first thing you need is a way to group related documents. In Wiskr these are called contexts — named buckets like "Medical," "Vehicle," "Insurance," or "House." Each conversation has a set of active contexts the user selects before chatting.&lt;/p&gt;

&lt;p&gt;In the database this is a simple structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;contexts&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;context_id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;contexts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="n"&gt;jsonb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Documents belong to contexts. Contexts belong to users. When a chat starts, the user picks which contexts are active — and only those get assembled into the prompt.&lt;/p&gt;
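
&lt;p&gt;The schema above doesn't show how a conversation remembers its active contexts; one plausible shape (hypothetical, not from Wiskr) is a link table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical: which contexts a given conversation has active
CREATE TABLE conversation_contexts (
  conversation_id uuid,
  context_id uuid REFERENCES contexts(id),
  PRIMARY KEY (conversation_id, context_id)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;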

&lt;h2&gt;
  
  
  Step 2: Load active context documents
&lt;/h2&gt;

&lt;p&gt;When a conversation starts, load the documents for each active context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;loadContextDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;contextIds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`SELECT c.name as context_name, d.title, d.content
     FROM documents d
     JOIN contexts c ON c.id = d.context_id
     WHERE d.context_id = ANY($1)
     ORDER BY c.name, d.created_at DESC`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;contextIds&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Assemble the system prompt
&lt;/h2&gt;

&lt;p&gt;With the documents loaded, format them into a readable system prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;assembleSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Group documents by context name&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;byContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;context_name&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="nx"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;context_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nx"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;context_name&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;contextBlocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;byContext&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(([&lt;/span&gt;&lt;span class="nx"&gt;contextName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;docBlocks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`
### &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;
    `&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`## &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;contextName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;docBlocks&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`You are a helpful assistant with access to the user's personal documents.
Use the information below to give accurate, personalized responses.
If the answer isn't in the documents, say so — don't guess.

&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;contextBlocks&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Raw JSON is fine for the document content. Current models read it well, and it preserves the structure of your data without you having to write custom serializers for every document type.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Inject into every request
&lt;/h2&gt;

&lt;p&gt;Pass the assembled context as the system message on every API call, alongside the full conversation history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Load conversation state&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;conversation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getConversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;loadContextDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;activeContextIds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getMessageHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Assemble context&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;assembleSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Build messages array&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="c1"&gt;// Call the model&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;assistantMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Save to history&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;conversationId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;assistantMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;assistantMessage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Handling token limits
&lt;/h2&gt;

&lt;p&gt;The obvious risk with this approach is bloated prompts. If a user has 50 documents in their active contexts, you'll hit token limits fast.&lt;/p&gt;

&lt;p&gt;A few practical strategies:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cap documents per context.&lt;/strong&gt; The simplest option — include only the N most recent documents per context. For most use cases, the newest 10-15 documents per context are the most relevant anyway.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`SELECT c.name as context_name, d.title, d.content
   FROM documents d
   JOIN contexts c ON c.id = d.context_id
   WHERE d.context_id = ANY($1)
   ORDER BY c.name, d.created_at DESC
   LIMIT 15`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// cap per context&lt;/span&gt;
  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;contextIds&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Summarize large documents.&lt;/strong&gt; If individual documents are long, run them through a cheap, fast model first to produce a condensed version before assembling the prompt.&lt;/p&gt;
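
&lt;p&gt;A rough sketch of what that could look like, reusing the same OpenAI-style &lt;code&gt;client&lt;/code&gt; from earlier (the model name and length threshold are assumptions, not recommendations):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hedged sketch: condense long documents before prompt assembly.
// The 4,000-character threshold and "gpt-4o-mini" are assumptions -- tune both.
async function condenseIfLong(doc, maxChars = 4000) {
  const text = JSON.stringify(doc.content);
  if (text.length &amp;lt;= maxChars) return doc;

  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Summarize this document. Keep every concrete fact, name, number, and date." },
      { role: "user", content: text },
    ],
  });

  return { ...doc, content: response.choices[0].message.content };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
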

&lt;p&gt;&lt;strong&gt;Let users pin documents.&lt;/strong&gt; Give users control — a pinned document always gets included, everything else is capped or summarized. This is often more useful than trying to guess relevance automatically.&lt;/p&gt;
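
&lt;p&gt;A minimal sketch of that selection logic, assuming a boolean &lt;code&gt;pinned&lt;/code&gt; column on documents (not part of the schema in this post; you'd add it yourself):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Assumes a hypothetical "pinned" boolean column, and that documents
// arrive newest-first per context, as in the ORDER BY from Step 2.
function selectDocuments(documents, capPerContext = 15) {
  const pinned = documents.filter(d =&amp;gt; d.pinned);
  const rest = documents.filter(d =&amp;gt; !d.pinned);

  // Pinned documents always make the cut; the rest are capped per context.
  const seen = {};
  const capped = rest.filter(d =&amp;gt; {
    seen[d.context_name] = (seen[d.context_name] || 0) + 1;
    return seen[d.context_name] &amp;lt;= capPerContext;
  });

  return [...pinned, ...capped];
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
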

&lt;p&gt;&lt;strong&gt;Add vector search later.&lt;/strong&gt; When your data grows large enough that capping and pinning don't cut it, vector search is the right next step. You add an embedding column, generate embeddings on save, and query by cosine similarity to find the most relevant documents for each conversation. The context assembly step stays the same — you just get smarter document selection feeding into it.&lt;/p&gt;
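
&lt;p&gt;For a taste of what that smarter selection looks like, here's a hedged sketch using pgvector syntax (it assumes an &lt;code&gt;embedding&lt;/code&gt; column already populated, and a &lt;code&gt;generateEmbedding&lt;/code&gt; helper you'd write against your embedding provider):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Same shape as loadContextDocuments, but ranked by relevance instead
// of recency. pgvector's &amp;lt;=&amp;gt; operator is cosine distance.
async function loadRelevantDocuments(db, contextIds, userMessage) {
  const queryEmbedding = await generateEmbedding(userMessage); // hypothetical helper
  const result = await db.query(
    `SELECT c.name as context_name, d.title, d.content
     FROM documents d
     JOIN contexts c ON c.id = d.context_id
     WHERE d.context_id = ANY($1)
     ORDER BY d.embedding &amp;lt;=&amp;gt; $2
     LIMIT 15`,
    [contextIds, JSON.stringify(queryEmbedding)]
  );
  return result.rows;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
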

&lt;h2&gt;
  
  
  When this is the right approach
&lt;/h2&gt;

&lt;p&gt;This pattern works well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your data is structured (JSON, not unstructured text blobs)&lt;/li&gt;
&lt;li&gt;Your dataset is modest (hundreds of documents, not millions)&lt;/li&gt;
&lt;li&gt;Users naturally segment their data into logical groups&lt;/li&gt;
&lt;li&gt;You want something working fast without infrastructure overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a good starting point for any AI feature that needs to reason about user-specific data — support tools, personal assistants, document Q&amp;amp;A, anything where the data set is bounded and the structure is known.&lt;/p&gt;

&lt;p&gt;When you outgrow it, the upgrade path to full RAG is incremental rather than a rewrite. The context assembly layer stays. You just add smarter selection in front of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When should I use context assembly instead of full RAG with a vector database?
&lt;/h3&gt;

&lt;p&gt;Use context assembly when your dataset is bounded (a few hundred documents per user, max), the documents are already structured (JSON, key-value, or short prose), and users have a natural way to scope which subset is relevant for a conversation. Switch to vector-database RAG when you can't fit the relevant slice in a system prompt, when relevance ranking actually matters, or when content is long-form unstructured text.&lt;/p&gt;

&lt;h3&gt;
  
  
  How big can the system prompt get before this falls apart?
&lt;/h3&gt;

&lt;p&gt;Modern frontier models accept 200K+ token context windows, but cost and latency both scale with prompt size. As a practical rule, keep the assembled context under ~20K tokens for most consumer use cases — beyond that you'll feel the latency in chat, and the per-request cost adds up fast.&lt;/p&gt;
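
&lt;p&gt;If you want a guardrail instead of a guess, a rough heuristic is enough (the 4-characters-per-token ratio below is an approximation for English text, not a real tokenizer):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// ~4 characters per token is a decent estimate for English text.
// Swap in a real tokenizer (e.g. tiktoken) if you need precision.
const estimateTokens = (text) =&amp;gt; Math.ceil(text.length / 4);

function fitsBudget(systemPrompt, budgetTokens = 20000) {
  return estimateTokens(systemPrompt) &amp;lt;= budgetTokens;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
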

&lt;h3&gt;
  
  
  Does this work with any LLM provider?
&lt;/h3&gt;

&lt;p&gt;Yes. The pattern is just a system message — every chat-completions API supports it. I've used the same code unchanged across OpenAI, Anthropic, and OpenRouter.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I migrate to full RAG later without rewriting everything?
&lt;/h3&gt;

&lt;p&gt;Keep the context-assembly function as-is. Add an embedding column to the documents table, generate embeddings on save, and replace the "load all documents in active contexts" query with "load top N documents in active contexts ranked by cosine similarity to the user's question." Everything downstream of that — the prompt formatting, the chat call — stays identical.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about prompt caching?
&lt;/h3&gt;

&lt;p&gt;This pattern composes well with prompt caching. The system prompt changes only when documents are added/edited, so providers that support prompt caching (Anthropic, OpenAI) can cache the assembled context across turns and dramatically cut input-token cost on follow-up messages.&lt;/p&gt;
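
&lt;p&gt;With Anthropic's SDK, for example, you opt in by marking the system block as cacheable. Roughly, at the time of writing (the model name is illustrative, and caching APIs evolve, so check the provider docs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch only -- verify against Anthropic's current docs.
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5", // illustrative model name
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: systemPrompt,
      cache_control: { type: "ephemeral" }, // cache the assembled context
    },
  ],
  messages: history,
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
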

&lt;h3&gt;
  
  
  Is it safe to dump raw user data into the system prompt?
&lt;/h3&gt;

&lt;p&gt;For a single-tenant app where the user owns the data, yes — that's the whole point. For multi-tenant apps, be strict about which user's contexts get loaded, and never assemble across users. A bug in context selection becomes a data-leak bug.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Adding Semantic Search to Your Postgres App with pgvector</title>
      <dc:creator>Ryan Carter</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:06:55 +0000</pubDate>
      <link>https://forem.com/sym/adding-semantic-search-to-your-postgres-app-with-pgvector-448e</link>
      <guid>https://forem.com/sym/adding-semantic-search-to-your-postgres-app-with-pgvector-448e</guid>
      <description>&lt;p&gt;pgvector is a Postgres extension that adds vector storage and similarity search to an existing database, so you can run semantic queries directly against your application data without standing up a separate vector store. If you're already on Postgres, you can enable it with one &lt;code&gt;CREATE EXTENSION&lt;/code&gt; statement, add a vector column to any table, and have semantic search returning results the same day.&lt;/p&gt;

&lt;p&gt;This post walks through adding it to an existing app — from installing the extension to running your first semantic query, with an HNSW index for performance at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; A Postgres extension that adds a &lt;code&gt;vector&lt;/code&gt; column type and similarity-search operators (&lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;#&amp;gt;&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Semantic search without a separate vector database, hybrid keyword-and-semantic queries in one SQL statement, and no new service to operate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Five steps to ship it:&lt;/strong&gt; Install the extension, add a &lt;code&gt;vector(N)&lt;/code&gt; column, embed at write time, query with cosine similarity, add an HNSW index for scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding cost:&lt;/strong&gt; ~$0.02 per million tokens with &lt;code&gt;text-embedding-3-small&lt;/code&gt;. Ollama runs embedding models locally for free if you'd rather not depend on a provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When to upgrade beyond pgvector:&lt;/strong&gt; Tens of millions of vectors with sub-50ms latency requirements. Below that, pgvector is plenty.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's the difference between keyword search and semantic search?
&lt;/h2&gt;

&lt;p&gt;Keyword search finds exact matches. If a user searches "cholesterol prescription" and your record says "lipid panel results," they get nothing.&lt;/p&gt;

&lt;p&gt;Semantic search finds meaning. It understands that "cholesterol prescription" and "lipid panel results" are related concepts, and surfaces the right record even without a word match.&lt;/p&gt;

&lt;p&gt;That's what vector embeddings buy you. Instead of storing text, you store a numerical representation of what that text &lt;em&gt;means&lt;/em&gt;. Search becomes a question of mathematical similarity rather than string matching.&lt;/p&gt;
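
&lt;p&gt;"Mathematical similarity" here usually means cosine similarity between embedding vectors. A toy illustration (real embeddings have hundreds or thousands of dimensions, not three):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Toy cosine similarity -- real embeddings have 1,536+ dimensions.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i &amp;lt; a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([0.9, 0.1, 0.3], [0.8, 0.2, 0.4]); // ~0.98 -- close in meaning
cosineSimilarity([0.9, 0.1, 0.3], [0.1, 0.9, 0.2]); // ~0.27 -- not related
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
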

&lt;h2&gt;
  
  
  Step 1: Enable the extension
&lt;/h2&gt;

&lt;p&gt;If you're running Postgres locally or in Docker, install pgvector first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Ubuntu / Debian&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;postgresql-16-pgvector

&lt;span class="c"&gt;# or via Docker — use the pgvector image instead of plain postgres&lt;/span&gt;
&lt;span class="c"&gt;# docker pull pgvector/pgvector:pg16&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then enable it in your database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No separate service, no new connection string.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Add an embedding column to your table
&lt;/h2&gt;

&lt;p&gt;Pick whichever table holds the content you want to make searchable. Add a vector column — the dimension count needs to match the embedding model you'll use.&lt;/p&gt;

&lt;p&gt;OpenAI's &lt;code&gt;text-embedding-3-small&lt;/code&gt; outputs 1536 dimensions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use a different model, check its output dimension and use that number instead. The dimension has to be consistent — you can't mix embeddings from different models in the same column.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Generate embeddings when content is saved
&lt;/h2&gt;

&lt;p&gt;Whenever a record is created or updated, generate an embedding from its text content and store it. Here's a Node.js example using the OpenAI SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;saveDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Build a text representation of what you want to be searchable&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;textToEmbed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;textToEmbed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`INSERT INTO documents (title, content, tags, embedding)
     VALUES ($1, $2, $3, $4)`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth noting here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What you embed matters.&lt;/strong&gt; Concatenating title, tags, and content into one string gives the model more signal than just the raw content. Experiment with what makes your search results feel right.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embed at write time, not search time.&lt;/strong&gt; Pre-computing embeddings keeps search fast. You don't want to embed on every query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you have existing records&lt;/strong&gt;, run a backfill script to generate embeddings for everything already in the database before you go live (a sketch follows this list).&lt;/li&gt;
&lt;/ul&gt;
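
&lt;p&gt;Here's a minimal sketch of that backfill (the batch size is an assumption; add rate limiting to taste):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// One-off backfill: embed every row that doesn't have an embedding yet.
async function backfillEmbeddings(db) {
  while (true) {
    const { rows } = await db.query(
      `SELECT id, title, tags, content FROM documents
       WHERE embedding IS NULL
       LIMIT 100`
    );
    if (rows.length === 0) break;

    for (const row of rows) {
      const text = `${row.title} ${row.tags.join(" ")} ${row.content}`;
      const embedding = await generateEmbedding(text);
      await db.query(`UPDATE documents SET embedding = $1 WHERE id = $2`, [
        JSON.stringify(embedding),
        row.id,
      ]);
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
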

&lt;h2&gt;
  
  
  Step 4: Search with cosine similarity
&lt;/h2&gt;

&lt;p&gt;When a user submits a search query, embed it the same way you embedded your content, then find the closest matches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;semanticSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryEmbedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`SELECT id, title, content,
            1 - (embedding &amp;lt;=&amp;gt; $1) AS similarity
     FROM documents
     ORDER BY embedding &amp;lt;=&amp;gt; $1
     LIMIT $2`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queryEmbedding&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; operator is cosine distance — lower means more similar. &lt;code&gt;1 - (embedding &amp;lt;=&amp;gt; $1)&lt;/code&gt; converts that back to cosine similarity. Strictly it ranges from -1 to 1, but text embeddings in practice land between 0 and 1, so it works fine as a score to display or filter by confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Add an index for performance
&lt;/h2&gt;

&lt;p&gt;Without an index, Postgres does an exact nearest-neighbor scan across every row — fine for small tables, slow for large ones. Add an HNSW index to keep queries fast at scale:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;HNSW (Hierarchical Navigable Small World) is an approximate nearest-neighbor algorithm. It trades a tiny amount of recall accuracy for a large speed gain. For most applications the tradeoff is well worth it.&lt;/p&gt;
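
&lt;p&gt;If the defaults don't cut it, HNSW exposes both build-time and query-time knobs (the values below are pgvector's defaults, shown only as a starting point):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// m = connections per graph node, ef_construction = build-time search width.
// Higher values mean better recall, slower builds, more memory.
await db.query(
  `CREATE INDEX ON documents
   USING hnsw (embedding vector_cosine_ops)
   WITH (m = 16, ef_construction = 64)`
);

// Query-time recall/speed tradeoff for the current session (default 40).
await db.query(`SET hnsw.ef_search = 40`);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
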

&lt;h2&gt;
  
  
  Putting it together
&lt;/h2&gt;

&lt;p&gt;Here's what the full flow looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User saves a document → you generate an embedding → store it in the &lt;code&gt;embedding&lt;/code&gt; column&lt;/li&gt;
&lt;li&gt;User searches → you embed the query → run cosine similarity against stored embeddings → return top matches&lt;/li&gt;
&lt;li&gt;Results feel like the app actually understands what the user is looking for&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  A few things to keep in mind
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Embedding cost is low but not zero.&lt;/strong&gt; OpenAI's &lt;code&gt;text-embedding-3-small&lt;/code&gt; is cheap — around $0.02 per million tokens — but it adds up at scale. If you're embedding large documents frequently, keep an eye on usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local embeddings are an option.&lt;/strong&gt; If you want to keep everything in-house, &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; can run embedding models locally. The quality varies by model, but for many use cases it's more than good enough and costs nothing per query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid search is often better.&lt;/strong&gt; Semantic search alone can miss exact matches that keyword search would catch. For production apps, consider combining both — run a keyword search with &lt;code&gt;tsvector&lt;/code&gt; and a vector search with pgvector, then merge and rank the results. This is sometimes called hybrid search or reciprocal rank fusion.&lt;/p&gt;
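
&lt;p&gt;A minimal sketch of reciprocal rank fusion over the two result lists (the constant 60 is the conventional value from the original RRF paper, but treat it as tunable):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// score(doc) = sum over result lists of 1 / (k + rank), with k = 60 by convention.
function reciprocalRankFusion(resultLists, k = 60) {
  const scores = new Map();
  for (const results of resultLists) {
    results.forEach((row, i) =&amp;gt; {
      const entry = scores.get(row.id) || { row, score: 0 };
      entry.score += 1 / (k + i + 1); // ranks start at 1
      scores.set(row.id, entry);
    });
  }
  return [...scores.values()]
    .sort((a, b) =&amp;gt; b.score - a.score)
    .map((entry) =&amp;gt; entry.row);
}

// Usage: fuse the keyword results and the semantic results you already have.
// const fused = reciprocalRankFusion([keywordRows, semanticRows]);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
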

&lt;p&gt;&lt;strong&gt;Chunking matters for long documents.&lt;/strong&gt; Embedding a 10,000-word document as a single vector loses a lot of nuance. For long content, chunk it into paragraphs or sections, embed each chunk separately, and link chunks back to the parent document.&lt;/p&gt;
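
&lt;p&gt;A rough sketch of the chunking side (the &lt;code&gt;document_chunks&lt;/code&gt; table and the paragraph-split strategy are assumptions; pick whatever boundaries fit your content):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Naive paragraph chunking -- good enough as a starting point.
function chunkByParagraph(text, maxChars = 1500) {
  const chunks = [];
  let current = "";
  for (const para of text.split(/\n\s*\n/)) {
    if ((current + para).length &amp;gt; maxChars) {
      if (current.trim()) chunks.push(current.trim());
      current = "";
    }
    current += para + "\n\n";
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// Each chunk gets its own row and embedding, linked back to the parent.
async function saveChunks(db, parentId, text) {
  for (const chunk of chunkByParagraph(text)) {
    const embedding = await generateEmbedding(chunk);
    await db.query(
      `INSERT INTO document_chunks (parent_id, content, embedding)
       VALUES ($1, $2, $3)`,
      [parentId, chunk, JSON.stringify(embedding)]
    );
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
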




&lt;p&gt;pgvector is one of those things that looks complicated from the outside but is surprisingly approachable once you start. If you're already on Postgres, there's no reason not to have it.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Do I need a separate vector database if I'm already using Postgres?
&lt;/h3&gt;

&lt;p&gt;For most apps, no. pgvector handles tens of millions of vectors comfortably with an HNSW index, and you keep the operational simplicity of one database. You'd reach for a dedicated vector store (Pinecone, Weaviate, Qdrant, Milvus) only when you need extreme scale, very low latency, or specialized features like hybrid sparse/dense indexing that pgvector doesn't cover.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which embedding model should I use with pgvector?
&lt;/h3&gt;

&lt;p&gt;For most production use cases, OpenAI's &lt;code&gt;text-embedding-3-small&lt;/code&gt; (1536 dims) is the default — cheap, fast, and high quality. Use &lt;code&gt;text-embedding-3-large&lt;/code&gt; (3072 dims) if you need more accuracy and can pay for it. For local/private deployments, Ollama running &lt;code&gt;nomic-embed-text&lt;/code&gt; or &lt;code&gt;mxbai-embed-large&lt;/code&gt; is a solid choice. The dimension number in your column type has to match the model.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between HNSW and IVFFlat indexes?
&lt;/h3&gt;

&lt;p&gt;HNSW is faster to query and gives better recall, but takes longer to build and uses more memory. IVFFlat is faster to build, lighter on memory, but slower to query and less accurate. For most production workloads, HNSW is the right default. IVFFlat is fine if you're indexing very large datasets infrequently and care about build time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I use cosine, L2, or inner product distance?
&lt;/h3&gt;

&lt;p&gt;Cosine distance (&lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt;) is the right default for text embeddings — it ignores vector magnitude and only compares direction, which matches how text embedding models are trained. Use L2 (&lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt;) for image embeddings or anything where magnitude carries meaning. Inner product (&lt;code&gt;&amp;lt;#&amp;gt;&lt;/code&gt;) is the fastest option when your vectors are normalized, and for normalized vectors it ranks results identically to cosine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need to re-embed when I update a record?
&lt;/h3&gt;

&lt;p&gt;Only if the text you embedded changed. The cleanest pattern is to embed a derived "search text" string (title + tags + content), and re-embed whenever any of those source fields change. A trigger or &lt;code&gt;BEFORE UPDATE&lt;/code&gt; hook keeps it in sync.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I combine semantic search with regular SQL filters?
&lt;/h3&gt;

&lt;p&gt;Yes — that's one of pgvector's biggest advantages. You can &lt;code&gt;WHERE user_id = $1 AND status = 'active' ORDER BY embedding &amp;lt;=&amp;gt; $2 LIMIT 10&lt;/code&gt; and get filtered semantic search in one query. With a separate vector store, you'd have to filter in two places and reconcile.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>ai</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>What Is MCP and Why Should Developers Care?</title>
      <dc:creator>Ryan Carter</dc:creator>
      <pubDate>Tue, 28 Apr 2026 22:06:36 +0000</pubDate>
      <link>https://forem.com/sym/what-is-mcp-and-why-should-developers-care-10b3</link>
      <guid>https://forem.com/sym/what-is-mcp-and-why-should-developers-care-10b3</guid>
      <description>&lt;p&gt;MCP (Model Context Protocol) is an open standard that lets AI models connect to external tools and data sources through a single, consistent interface. Anthropic introduced it in late 2024 to replace the bespoke per-tool integrations developers used to build by hand — one shared protocol that works across any MCP-compatible AI host like Claude Desktop or Cursor.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; An open standard for AI-to-tool integration, introduced by Anthropic in late 2024.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it exists:&lt;/strong&gt; Before MCP, every AI tool needed custom integrations. MCP makes them portable across hosts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How it works:&lt;/strong&gt; Hosts (Cursor, Claude Desktop) talk to Servers (filesystem, GitHub, database) via Clients using the MCP protocol.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What servers expose:&lt;/strong&gt; Tools (actions the AI can call), Resources (data the AI can read), Prompts (reusable templates).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why developers should care:&lt;/strong&gt; Build an integration once, plug it into any MCP-compatible host. Less duplicated work, more leverage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The problem MCP solves
&lt;/h2&gt;

&lt;p&gt;AI assistants like Claude, GPT, and Gemini are powerful inside a conversation. But by default they're isolated. They can reason about text you give them, but they can't see your codebase, query your database, check your calendar, or interact with the tools you actually use. Every integration has to be custom-built — you wire up an API call here, a function there, and it's all bespoke plumbing that doesn't transfer between tools.&lt;/p&gt;

&lt;p&gt;This is the problem Model Context Protocol (MCP) is designed to fix.&lt;/p&gt;

&lt;p&gt;MCP is an open standard, introduced by Anthropic in late 2024, that defines a consistent way for AI models to connect to external tools and data sources. Instead of every AI tool reinventing its own integration layer, MCP gives developers a shared protocol — one way to build a connection that works across any MCP-compatible AI host.&lt;/p&gt;

&lt;p&gt;Think of it like USB. Before USB, every device had its own connector. After USB, you plug anything into anything. MCP is trying to be that for AI tool integrations.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;MCP has three main pieces:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hosts&lt;/strong&gt; are the AI applications the user interacts with — Cursor, Claude Desktop, or any app that's built MCP support in. The host manages connections to servers and mediates between the AI model and the outside world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Servers&lt;/strong&gt; are the integrations. An MCP server exposes a set of tools — things like "read a file," "query a database," "run a terminal command," or "fetch a web page." Servers can be local processes or remote services. They're relatively small, focused, and purpose-built.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clients&lt;/strong&gt; live inside the host and handle the communication between the host and each server using the MCP protocol.&lt;/p&gt;

&lt;p&gt;When a user asks their AI assistant to do something that requires an external tool, the host checks what MCP servers are connected, picks the right tool, calls it, gets the result, and feeds it back to the model as context. The model never directly touches the external system — it just sees the results as part of its context window.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MCP servers can expose
&lt;/h2&gt;

&lt;p&gt;MCP servers can expose three types of things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt; are functions the AI can call — actions that do something. Run a shell command, create a file, send a message, query an API. These are the most common and most useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources&lt;/strong&gt; are data the AI can read — files, database records, documents, anything that can be fetched and fed into context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts&lt;/strong&gt; are reusable prompt templates the server makes available to the host — useful for standardizing how certain tasks get framed.&lt;/p&gt;

&lt;p&gt;Most real-world MCP servers focus on tools. That's where the practical value is.&lt;/p&gt;
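
&lt;p&gt;To make that concrete, here's roughly what a one-tool server looks like with the official TypeScript SDK (a hedged sketch, since the SDK surface is still evolving, and the tool itself is a made-up example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch against @modelcontextprotocol/sdk -- check the current docs.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "demo", version: "1.0.0" });

// A hypothetical tool the host's model can call.
server.tool("get_weather", { city: z.string() }, async ({ city }) =&amp;gt; ({
  content: [{ type: "text", text: `It's sunny in ${city}.` }],
}));

await server.connect(new StdioServerTransport());
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
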

&lt;h2&gt;
  
  
  A concrete example
&lt;/h2&gt;

&lt;p&gt;Say you're using Cursor to write code for a Godot game. Normally, Cursor can read files you paste in and suggest code, but it has no idea what's actually in your Godot project — what scenes exist, what nodes are in them, what the project structure looks like.&lt;/p&gt;

&lt;p&gt;With a Godot MCP server running, Cursor can call tools like &lt;code&gt;get_project_info&lt;/code&gt;, &lt;code&gt;list_scenes&lt;/code&gt;, or &lt;code&gt;get_node_tree&lt;/code&gt; and get real data back from your actual open project. The AI goes from working with whatever you manually paste in to working with live context from your development environment. That's a qualitatively different kind of assistance.&lt;/p&gt;

&lt;p&gt;The same pattern applies everywhere: a filesystem MCP server lets the AI read and write files. A database MCP server lets it query your schema and run queries. A GitHub MCP server lets it read issues, PRs, and code. The AI stays the same — what changes is how much of your actual world it can see.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the standard matters
&lt;/h2&gt;

&lt;p&gt;Before MCP, if you wanted Claude to talk to your database and Cursor to talk to the same database, you'd build two separate integrations. If a new AI tool came out that you wanted to try, you'd build a third.&lt;/p&gt;

&lt;p&gt;With MCP, you build the server once. Any MCP-compatible host can connect to it. That's the compounding value of a shared standard — the integration work accumulates instead of being repeated.&lt;/p&gt;

&lt;p&gt;It also means the ecosystem is growing fast. There are already MCP servers for filesystems, databases, web browsers, GitHub, Slack, Google Drive, and dozens of other tools. Most are open source. If a server exists for what you need, you configure it and connect it — no integration work required.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means if you're building with AI
&lt;/h2&gt;

&lt;p&gt;A few practical implications:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're building AI-powered apps&lt;/strong&gt;, MCP gives you a cleaner architecture for tool integrations. Instead of hardcoding API calls into your prompt pipeline, you can expose capabilities as MCP tools and let the model decide when and how to use them. It's more composable and easier to extend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're using AI coding assistants&lt;/strong&gt;, connecting MCP servers to your editor is one of the highest-leverage things you can do right now. Giving your AI assistant access to your actual project context — not just what you paste in — makes it meaningfully more useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you're evaluating AI tools for your stack&lt;/strong&gt;, MCP support is increasingly a signal worth paying attention to. Tools that support MCP plug into a growing ecosystem. Tools that don't are islands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;The best place to start is &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;Anthropic's MCP documentation&lt;/a&gt;. It covers the spec, has quickstart guides for building servers in Python and TypeScript, and links to the existing server ecosystem.&lt;/p&gt;

&lt;p&gt;If you want to see it in action quickly, Claude Desktop supports MCP out of the box. Install it, configure a filesystem or fetch server in &lt;code&gt;claude_desktop_config.json&lt;/code&gt;, and you'll have a working MCP setup in about ten minutes.&lt;/p&gt;
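
&lt;p&gt;The config shape looks roughly like this (the path is a placeholder, and field names can change between releases, so defer to the current docs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
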

&lt;p&gt;For Cursor users, MCP servers are configured in &lt;code&gt;mcp.json&lt;/code&gt; in your project or user config directory. The Cursor docs cover the setup, and there are community-maintained lists of available servers worth browsing.&lt;/p&gt;




&lt;p&gt;MCP is still early. The spec is evolving, the tooling is maturing, and not every AI host supports it yet. But the direction is clear — shared, composable tool integrations are better for everyone than bespoke one-off wiring. If you're building seriously with AI, it's worth understanding now.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between MCP hosts and servers?
&lt;/h3&gt;

&lt;p&gt;A host is the AI application a user interacts with directly — Cursor, Claude Desktop, or any IDE that's added MCP support. A server is an integration that exposes a specific capability, like filesystem access, database queries, or the GitHub API. Hosts connect to one or more servers (via clients) so the AI model can use those capabilities as part of a conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is MCP only for Anthropic's models?
&lt;/h3&gt;

&lt;p&gt;No. MCP is an open standard, and any AI host can implement support for it regardless of which underlying model it uses. Claude Desktop and Cursor were early adopters, but the protocol itself is model-agnostic and not tied to Claude.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need to build my own MCP server?
&lt;/h3&gt;

&lt;p&gt;Probably not. There are already open-source MCP servers for filesystems, GitHub, Slack, Google Drive, Postgres, web fetching, and dozens of other common tools. You only need to build a custom server when you have an internal system or workflow no existing server covers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are MCP servers safe to install?
&lt;/h3&gt;

&lt;p&gt;Treat them like any other dependency. MCP servers run as local processes or remote services with whatever permissions you give them, so vet the code or the maintainer before connecting one to your editor — especially if the server can read files, hit your network, or execute shell commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is MCP different from OpenAI function calling or tool use?
&lt;/h3&gt;

&lt;p&gt;Function calling is a model-level feature — a single model deciding to call a function inside one app. MCP sits a layer above that: it standardizes how &lt;em&gt;any&lt;/em&gt; host application discovers and connects to &lt;em&gt;any&lt;/em&gt; tool integration. The same MCP server works with multiple hosts and models without rewriting the integration each time.&lt;/p&gt;

&lt;h3&gt;
  
  
  What languages can I use to build an MCP server?
&lt;/h3&gt;

&lt;p&gt;The official SDKs cover Python and TypeScript today, and community SDKs exist for several other languages. Because MCP is just a protocol over standard transports, you can implement it in anything that can speak JSON-RPC over stdio or HTTP.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>anthropic</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Multi-Select in Visual Studio Code</title>
      <dc:creator>Ryan Carter</dc:creator>
      <pubDate>Fri, 13 Sep 2019 21:42:42 +0000</pubDate>
      <link>https://forem.com/sym/multi-select-in-visual-studio-code-19k2</link>
      <guid>https://forem.com/sym/multi-select-in-visual-studio-code-19k2</guid>
      <description>&lt;h2&gt;
  
  
  tl;dr
&lt;/h2&gt;

&lt;p&gt;I am suddenly using &lt;a href="https://code.visualstudio.com/" rel="noopener noreferrer"&gt;VS Code&lt;/a&gt; because of multi-select (they call it multi-cursor in VS Code). Never thought I would. How the mighty have fallen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mac: Multi-Cursor Shortcuts
&lt;/h2&gt;

&lt;p&gt;(These probably work on Windows with some experimentation.)&lt;/p&gt;

&lt;p&gt;Some shortcuts first, if that is all you're here for. Otherwise my rambling is below too, you know, if you're into that sort of thing. :)&lt;/p&gt;

&lt;h4&gt;
  
  
  NOTE: I use the "Selection =&amp;gt; Switch to Cmd + Click for Multi-Cursor" option.
&lt;/h4&gt;

&lt;h3&gt;
  
  
  Mac: Shift + Cmd + L
&lt;/h3&gt;

&lt;p&gt;Select a word and press &lt;strong&gt;Shift + Cmd + L&lt;/strong&gt; to select all instances of your selection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Shift + Alt/Option + I
&lt;/h3&gt;

&lt;p&gt;Select a bunch of lines, then Shift + Alt/Option + I will put a cursor at the end of every selected line.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cmd + Option + Shift + UP/DOWN (ARROW)
&lt;/h3&gt;

&lt;p&gt;Selects in a column directly up or down from the cursor's position.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alt/Option + Click
&lt;/h3&gt;

&lt;p&gt;Adds a new cursor at each place you click (Cmd + Click with the option noted above).&lt;/p&gt;

&lt;h6&gt;
  
  
  See the &lt;a href="https://code.visualstudio.com/docs/getstarted/keybindings" rel="noopener noreferrer"&gt;VS Code Key Bindings page&lt;/a&gt; for more info on OS specific shortcuts
&lt;/h6&gt;




&lt;h1&gt;
  
  
  Senseless Rambling:
&lt;/h1&gt;

&lt;p&gt;The best feature in &lt;a href="https://www.sublimetext.com/" rel="noopener noreferrer"&gt;Sublime Text&lt;/a&gt; 2/3 is hands down the multi-select feature. I've used it in many languages/stacks for years. It allows you to highlight a word, then automatically edit all instances of that word in your file. You can also select all lines in a column to edit many rows of data at the same time. It is basically the editing power of vim, but simpler and more graphical, for vim noob idiots like me. &lt;/p&gt;

&lt;p&gt;Multi-select is the one thing that stopped me from moving to another editor for a very long time. Several other editors have tried to replicate the feature, but none of them get it right enough to feel as smooth and effortless as Sublime does.&lt;/p&gt;

&lt;p&gt;That was until recently, when I gave VS Code another shot. I had initially stopped using it almost immediately because I was trying to write Vue code, and the Vue plugins really did not work correctly and messed up the spacing. On the second try I found that it does have multi-select, and it is delightfully easier to use than most. It isn't quite as good as the original Sublime implementation, but it's good enough to make me switch to VS Code for most things. &lt;/p&gt;

&lt;p&gt;To be fair, I am a bit surprised I like a Microsoft product for programming this much. Microsoft has been making strides for years in many areas and shed the old world view of proprietary nonsense to a large degree. They have truly embraced the open source world with decent offerings. Enough that I have switched. I have gone to the dark side. I don't know if they have cookies, but I'm diabetic so that's a no go anyway. I digress.&lt;/p&gt;

&lt;p&gt;There are many other things that make me like VS Code too, though I likely won't be writing Vue/React in it anytime soon; that depends on whether it can handle the JSX and other space-formatting issues I ran into. The built-in terminal is very nice, as are the easy extension support and the intelligent touches, like smart updates and generally knowing what I want before I need it. Very well done. I appreciate that actual developers make this IDE and made it good for the masses.&lt;/p&gt;

&lt;p&gt;Well Microsoft, you did it. I finally embrace our overlords. Coincidence that Steve Ballmer had to leave for me to get on board with your evil plan for world development domination? I think not.&lt;/p&gt;

&lt;h6&gt;
  
  
  NOTE: Cross-posted from my personal site.
&lt;/h6&gt;

</description>
    </item>
  </channel>
</rss>
