<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Chetan Gupta</title>
    <description>The latest articles on Forem by Chetan Gupta (@chaets).</description>
    <link>https://forem.com/chaets</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F327999%2F06de0556-8b6a-4d4e-ab33-2081b573c84c.png</url>
      <title>Forem: Chetan Gupta</title>
      <link>https://forem.com/chaets</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/chaets"/>
    <language>en</language>
    <item>
      <title>Enabling SSH &amp; RDP on Ubuntu 24.04 VM in Proxmox (Complete Guide)</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Sun, 22 Mar 2026 15:24:52 +0000</pubDate>
      <link>https://forem.com/chaets/enabling-ssh-rdp-on-ubuntu-2404-vm-in-proxmox-complete-guide-oll</link>
      <guid>https://forem.com/chaets/enabling-ssh-rdp-on-ubuntu-2404-vm-in-proxmox-complete-guide-oll</guid>
      <description>&lt;p&gt;Running Ubuntu inside &lt;strong&gt;Proxmox VE&lt;/strong&gt; is powerful for homelabs, but accessing it efficiently (SSH + Remote Desktop) is essential.&lt;/p&gt;

&lt;p&gt;This guide walks you through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Enabling SSH access&lt;/li&gt;
&lt;li&gt;✅ Enabling Remote Desktop (RDP)&lt;/li&gt;
&lt;li&gt;✅ Fixing common issues (like 0x204 error)&lt;/li&gt;
&lt;li&gt;✅ Understanding architecture with diagrams&lt;/li&gt;
&lt;li&gt;✅ Final checklist&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  🧠 Architecture Overview
&lt;/h1&gt;

&lt;h2&gt;
  
  
  🔷 Block Diagram: Access Flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------------------+
|   Your Laptop/PC     |
| (SSH / RDP Client)   |
+----------+-----------+
           |
           |  Network (LAN / WiFi)
           |
+----------v-----------+
|     Proxmox Host     |
|  (Hypervisor Layer)  |
+----------+-----------+
           |
           | Virtual Network Bridge (vmbr0)
           |
+----------v-----------+
|   Ubuntu 24.04 VM    |
|----------------------|
| SSH Server (port 22) |
| RDP Server (3389)    |
| UFW Firewall         |
| QEMU Guest Agent     |
+----------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  ⚙️ Part 1: Enable SSH on Ubuntu VM
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Step 1: Access VM Console (Proxmox GUI)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Login to Proxmox&lt;/li&gt;
&lt;li&gt;Select VM → &lt;strong&gt;Console&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 2: Install OpenSSH Server
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;openssh-server &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 3: Verify SSH Service
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl status ssh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If not running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--now&lt;/span&gt; ssh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 4: Configure Firewall (UFW)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;ufw allow ssh
&lt;span class="nb"&gt;sudo &lt;/span&gt;ufw &lt;span class="nb"&gt;enable
sudo &lt;/span&gt;ufw status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 If you see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;firewall not enabled (skipping reload)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;→ That’s normal; just enable it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 5: Get VM IP Address
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ip addr show
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 6: Connect from Your PC
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh username@&amp;lt;vm_ip&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⚠️ Proxmox Firewall Check
&lt;/h2&gt;

&lt;p&gt;If SSH doesn’t work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;strong&gt;VM → Firewall&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add rule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direction: IN&lt;/li&gt;
&lt;li&gt;Port: 22&lt;/li&gt;
&lt;li&gt;Protocol: TCP&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
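&lt;p&gt;If you prefer the CLI over the GUI, the same rule can live in the VM's firewall file on the Proxmox host. The file format below is standard Proxmox; the VMID &lt;code&gt;100&lt;/code&gt; is just an example — substitute your own:&lt;/p&gt;

```ini
# /etc/pve/firewall/100.fw  (replace 100 with your VMID)
[RULES]
IN ACCEPT -p tcp -dport 22
```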




&lt;h1&gt;
  
  
  🖥️ Part 2: Enable Remote Desktop (RDP)
&lt;/h1&gt;

&lt;p&gt;You have &lt;strong&gt;two options&lt;/strong&gt;:&lt;/p&gt;




&lt;h1&gt;
  
  
  🔹 Option 1: GNOME Remote Desktop (Recommended)
&lt;/h1&gt;

&lt;p&gt;Best for Ubuntu 24.04 Desktop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Steps:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   Settings → System → Remote Desktop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Enable:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;✅ Remote Desktop&lt;/li&gt;
&lt;li&gt;❌ Disable "Remote Login" (important!)&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Set:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Username &amp;amp; Password&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Open Firewall
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;ufw allow 3389/tcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  🔹 Option 2: xRDP (Alternative)
&lt;/h1&gt;

&lt;p&gt;Use if GNOME RDP fails.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;xrdp &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--now&lt;/span&gt; xrdp
&lt;span class="nb"&gt;sudo &lt;/span&gt;adduser xrdp ssl-cert
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⚠️ Important Rule
&lt;/h2&gt;

&lt;p&gt;👉 Never run GNOME Remote Desktop and xRDP at the same time: both try to bind port 3389.&lt;/p&gt;

&lt;p&gt;Check port usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;ss &lt;span class="nt"&gt;-tulpn&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; :3389
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  🔧 Part 3: Enable QEMU Guest Agent (VERY IMPORTANT)
&lt;/h1&gt;

&lt;p&gt;Installing the agent fixes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Missing IP address in the Proxmox dashboard&lt;/li&gt;
&lt;li&gt;Some RDP connectivity issues&lt;/li&gt;
&lt;li&gt;Proxmox host-to-guest communication (clean shutdowns, snapshots)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Install inside VM:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;qemu-guest-agent &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--now&lt;/span&gt; qemu-guest-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Enable in Proxmox:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;VM → &lt;strong&gt;Options&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Enable &lt;strong&gt;QEMU Guest Agent&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Do a &lt;strong&gt;full shutdown&lt;/strong&gt; (not a reboot), then start the VM again so the agent device is attached&lt;/li&gt;
&lt;/ul&gt;
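&lt;p&gt;You can then confirm the agent is talking to the host. These commands run on the &lt;strong&gt;Proxmox host&lt;/strong&gt; itself (VMID &lt;code&gt;100&lt;/code&gt; is an example):&lt;/p&gt;

```shell
# Should return without error if the agent is running inside the guest
qm agent 100 ping

# Lists the guest's interfaces and IPs, as reported by the agent
qm agent 100 network-get-interfaces
```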




&lt;h1&gt;
  
  
  🧪 Part 4: Verify Services
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Check SSH:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl status ssh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Check RDP:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ss &lt;span class="nt"&gt;-tulpn&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;3389
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gnome-remote-desktop OR xrdp listening
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  🛠️ Part 5: Fix Common Issues
&lt;/h1&gt;




&lt;h2&gt;
  
  
  ❌ Error: SSH not found
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh.service could not be found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✔ Fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;openssh-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ❌ Error: RDP 0x204
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Causes:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Firewall blocked&lt;/li&gt;
&lt;li&gt;Wrong service&lt;/li&gt;
&lt;li&gt;Wayland issue&lt;/li&gt;
&lt;li&gt;NLA mismatch&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ✅ Fix 1: Disable Wayland
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/gdm3/custom.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Uncomment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;WaylandEnable&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart gdm3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  ✅ Fix 2: Reconfigure GNOME RDP Credentials (User Mode)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;grdctl rdp &lt;span class="nb"&gt;enable
&lt;/span&gt;grdctl rdp set-credentials USERNAME PASSWORD
grdctl rdp disable-view-only
systemctl &lt;span class="nt"&gt;--user&lt;/span&gt; restart gnome-remote-desktop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  ✅ Fix 3: Client Settings
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Disable NLA&lt;/li&gt;
&lt;li&gt;Set Security Layer → RDP&lt;/li&gt;
&lt;li&gt;Allow insecure connection&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ✅ Fix 4: Test Connectivity
&lt;/h3&gt;

&lt;p&gt;From your PC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nc &lt;span class="nt"&gt;-zv&lt;/span&gt; &amp;lt;vm_ip&amp;gt; 3389
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or (Windows):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Test-NetConnection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;vm_ip&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;3389&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ❌ Issue: Port Conflict
&lt;/h2&gt;

&lt;p&gt;If both are installed, remove one so a single service owns port 3389 (here we drop xRDP):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt remove xrdp &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ❌ Issue: Proxmox Firewall Blocking
&lt;/h2&gt;

&lt;p&gt;Add rule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Port: 3389&lt;/li&gt;
&lt;li&gt;Protocol: TCP&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ❌ Issue: No GUI Session
&lt;/h2&gt;

&lt;p&gt;👉 For GNOME RDP:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A user must already be logged in to a desktop session on the VM console&lt;/li&gt;
&lt;/ul&gt;
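&lt;p&gt;One workaround is to enable automatic login so a desktop session always exists. The file is the standard GDM config; the username is an example — use your own:&lt;/p&gt;

```ini
# /etc/gdm3/custom.conf
[daemon]
AutomaticLoginEnable=true
AutomaticLogin=your_username
```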




&lt;h1&gt;
  
  
  🧾 Final Checklist
&lt;/h1&gt;

&lt;h2&gt;
  
  
  ✅ SSH Setup
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] OpenSSH installed&lt;/li&gt;
&lt;li&gt;[ ] SSH service running&lt;/li&gt;
&lt;li&gt;[ ] UFW allows port 22&lt;/li&gt;
&lt;li&gt;[ ] Proxmox firewall allows port 22&lt;/li&gt;
&lt;li&gt;[ ] SSH connection works&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✅ RDP Setup
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] GNOME Remote Desktop OR xRDP installed&lt;/li&gt;
&lt;li&gt;[ ] Port 3389 open&lt;/li&gt;
&lt;li&gt;[ ] No service conflict&lt;/li&gt;
&lt;li&gt;[ ] Wayland disabled (if needed)&lt;/li&gt;
&lt;li&gt;[ ] Credentials configured&lt;/li&gt;
&lt;li&gt;[ ] RDP connection works&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✅ Proxmox Integration
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] QEMU Guest Agent installed&lt;/li&gt;
&lt;li&gt;[ ] Enabled in Proxmox&lt;/li&gt;
&lt;li&gt;[ ] IP visible in dashboard&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✅ Network Validation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] VM reachable via ping&lt;/li&gt;
&lt;li&gt;[ ] Ports 22 &amp;amp; 3389 reachable&lt;/li&gt;
&lt;li&gt;[ ] Same subnet / no AP isolation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Pro Tip: Fix RDP Error Code 0x204 (Ubuntu 24.04 on Proxmox)
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Error code 0x204&lt;/strong&gt; usually means your RDP client &lt;strong&gt;cannot establish a session with the Ubuntu VM&lt;/strong&gt;, either because the network path is blocked or because the handshake fails.&lt;br&gt;
In Proxmox setups, this is commonly caused by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔐 Certificate mismatch (GNOME RDP bug in 24.04)&lt;/li&gt;
&lt;li&gt;🔥 Proxmox-level firewall blocking port 3389&lt;/li&gt;
&lt;li&gt;💤 VM power/suspend issues&lt;/li&gt;
&lt;li&gt;🔒 Strict Network Level Authentication (NLA)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🔧 1. Fix GNOME Certificate Bug (Most Overlooked Fix)
&lt;/h2&gt;

&lt;p&gt;Ubuntu 24.04 has a known issue with RDP certificates.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open &lt;strong&gt;Microsoft Remote Desktop&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Right-click your Ubuntu connection → &lt;strong&gt;Export&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Open the &lt;code&gt;.rdp&lt;/code&gt; file in a text editor&lt;/li&gt;
&lt;li&gt;Find:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   use redirection server name:i:0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Change to:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   use redirection server name:i:1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Save and &lt;strong&gt;re-import&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;✅ This bypasses certificate validation issues.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔥 2. Check Proxmox Firewall (Very Common Issue)
&lt;/h2&gt;

&lt;p&gt;Even if Ubuntu allows RDP, Proxmox may still block it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to: &lt;strong&gt;VM → Firewall → Options&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Check if firewall is &lt;strong&gt;Enabled&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Add rule:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direction&lt;/td&gt;
&lt;td&gt;IN&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Action&lt;/td&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protocol&lt;/td&gt;
&lt;td&gt;TCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Destination&lt;/td&gt;
&lt;td&gt;3389&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  💤 3. Disable Ubuntu Power Saving
&lt;/h2&gt;

&lt;p&gt;RDP can fail if the VM “sleeps”.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to: &lt;strong&gt;Settings → Power&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Set:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Screen Blank → &lt;strong&gt;Never&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Automatic Suspend → &lt;strong&gt;Off&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔒 4. Relax Network Level Authentication (NLA)
&lt;/h2&gt;

&lt;p&gt;Strict authentication can break the RDP connection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fix on Client (Windows/Mac):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Open RDP settings → &lt;strong&gt;Advanced&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Set:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“If server authentication fails” →
👉 &lt;strong&gt;Connect and don’t warn me&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  🌐 5. Verify Network Reachability
&lt;/h2&gt;

&lt;p&gt;Make sure your machine can actually reach the VM:&lt;/p&gt;

&lt;h3&gt;
  
  
  Windows:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Test-NetConnection&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;vm_ip&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;3389&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Mac/Linux:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nc &lt;span class="nt"&gt;-zv&lt;/span&gt; &amp;lt;vm_ip&amp;gt; 3389
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⚠️ Bonus Insight
&lt;/h2&gt;

&lt;p&gt;👉 If you're connecting to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;192.168.x.x&lt;/strong&gt; → Local network (should work easily)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;External IP → You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Port forwarding&lt;/li&gt;
&lt;li&gt;Router config&lt;/li&gt;
&lt;li&gt;Firewall rules&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 Quick Diagnosis Flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RDP Error 0x204
      |
      v
Can you ping VM?
      |
   No ---&amp;gt; Network issue / AP isolation
      |
     Yes
      |
Is port 3389 open?
      |
   No ---&amp;gt; Firewall (Proxmox/UFW)
      |
     Yes
      |
Certificate / NLA / Wayland issue
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;







&lt;h1&gt;
  
  
  🎯 Key Takeaways
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;SSH requires &lt;strong&gt;OpenSSH inside VM&lt;/strong&gt; (not Proxmox-level)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;RDP issues in Ubuntu 24.04 are mostly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wayland&lt;/li&gt;
&lt;li&gt;Service conflicts&lt;/li&gt;
&lt;li&gt;Firewall rules&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;QEMU Guest Agent is &lt;strong&gt;critical for stability&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Always validate in this order:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Network → Firewall → Service → Client
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ubuntu</category>
      <category>virtualmachine</category>
      <category>proxmox</category>
      <category>ssh</category>
    </item>
    <item>
      <title>Part 3: Testing, Deploying, and Lessons Learned</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Wed, 18 Mar 2026 02:38:58 +0000</pubDate>
      <link>https://forem.com/chaets/part-3-testing-deploying-and-lessons-learned-aa5</link>
      <guid>https://forem.com/chaets/part-3-testing-deploying-and-lessons-learned-aa5</guid>
      <description>&lt;p&gt;&lt;em&gt;The final part of a three-part series on building our first MCP server for healthcare interoperability.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Where We Left Off
&lt;/h2&gt;

&lt;p&gt;&lt;a href="//part-1-why-we-built-an-mcp-server-and-what-we-learned-before-writing-a-single-line-of-code.md"&gt;Part 1&lt;/a&gt; covered the &lt;em&gt;why&lt;/em&gt; — the problem space, the choice of MCP, and the architectural decisions. &lt;a href="//part-2-building-the-engine-tools-uris-and-the-art-of-indexing-fhir.md"&gt;Part 2&lt;/a&gt; covered the &lt;em&gt;how&lt;/em&gt; — the indexer, URI scheme, tool handlers, and transport layer. This final post covers the operational reality: how we test an MCP server, the developer workflow, deploying to real AI clients, and the honest retrospective on what worked and what we'd change.&lt;/p&gt;




&lt;h2&gt;
  
  
  Testing an MCP Server: It's Weirder Than You Think
&lt;/h2&gt;

&lt;p&gt;Testing a regular API is well-understood: spin up a server, send requests, assert on responses. Testing an MCP server adds a twist: &lt;strong&gt;your primary consumer is an AI, and you can't write assertions about AI behavior.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We developed a three-layer testing strategy:&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Unit Tests for Handlers
&lt;/h3&gt;

&lt;p&gt;Each handler is a pure function: it takes a Pydantic model and returns a dict. This makes unit testing straightforward.&lt;/p&gt;

&lt;p&gt;The trick is the database. Our handlers query SQLite, so we needed a test database. We chose &lt;strong&gt;temporary databases per test module&lt;/strong&gt; — each test file creates a fresh SQLite database in a temp directory, inserts known test data, and tears it down after.&lt;/p&gt;

&lt;p&gt;The pattern looks like this conceptually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────┐
│  Test Setup                                     │
│  1. Create temp SQLite file                     │
│  2. Create schema (same as production)          │
│  3. Insert known test data (Patient, etc.)      │
│  4. Rebuild FTS index                           │
│  5. Point FHIR_MCP_INDEX_PATH to temp file      │
│  6. Reload storage modules (pick up new path)   │
├─────────────────────────────────────────────────┤
│  Test Execution                                 │
│  - Import handler, create input model, call it  │
│  - Assert on returned metadata and payload      │
├─────────────────────────────────────────────────┤
│  Teardown                                       │
│  - Delete temp file                             │
│  - Restore environment                          │
└─────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A subtle issue we hit: &lt;strong&gt;module-level state.&lt;/strong&gt; Our SQLite store reads &lt;code&gt;DB_PATH&lt;/code&gt; from an environment variable &lt;em&gt;at module load time&lt;/em&gt;. In tests, we need to set the environment variable &lt;em&gt;before&lt;/em&gt; the module is imported, or reload the module after setting it. We solved this with &lt;code&gt;importlib.reload()&lt;/code&gt; — ugly but effective.&lt;/p&gt;

&lt;p&gt;If we were starting over, we'd inject the database path through the Settings object rather than reading environment variables at module scope. Lesson learned.&lt;/p&gt;
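&lt;p&gt;A sketch of what that injection would look like (names like &lt;code&gt;Settings&lt;/code&gt; and &lt;code&gt;open_store&lt;/code&gt; are illustrative, not our real API):&lt;/p&gt;

```python
import os
import sqlite3
from dataclasses import dataclass

# Anti-pattern (what we shipped): the path is frozen at import time,
# so tests must set the environment variable before the first import.
DB_PATH = os.environ.get("FHIR_MCP_INDEX_PATH", "index.db")

# What we'd do instead: resolve the path when a Settings object is
# built, so a test can construct one pointing at a temp database.
@dataclass(frozen=True)
class Settings:
    index_path: str

    @classmethod
    def from_env(cls) -> "Settings":
        return cls(index_path=os.environ.get("FHIR_MCP_INDEX_PATH", "index.db"))

def open_store(settings: Settings) -> sqlite3.Connection:
    return sqlite3.connect(settings.index_path)
```

&lt;p&gt;With this shape, a test builds &lt;code&gt;Settings(index_path=tmp_path)&lt;/code&gt; directly and never needs &lt;code&gt;importlib.reload()&lt;/code&gt;.&lt;/p&gt;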

&lt;p&gt;Here are the kinds of tests we found most valuable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Happy path tests:&lt;/strong&gt; "Give me Patient from R4 → returns metadata with name='Patient'." These catch regressions in the handler logic or the SQL queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not-found tests:&lt;/strong&gt; "Give me NonExistentResource from R4 → returns empty dict, not an exception." These are critical because the AI will inevitably ask for things that don't exist, and the server must handle that gracefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FTS tests:&lt;/strong&gt; "Search for 'Patient' → returns at least one result. Search for 'xyznonexistent' → returns empty list." These verify that the full-text search index is working and that our FTS queries are correct.&lt;/p&gt;
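&lt;p&gt;To make the three categories concrete, here is a self-contained sketch; the schema and handlers are simplified stand-ins for our real ones:&lt;/p&gt;

```python
import sqlite3

def make_test_db() -> sqlite3.Connection:
    """In-memory stand-in for the index: one table plus an FTS5 index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resources (name TEXT, version TEXT)")
    conn.execute("CREATE VIRTUAL TABLE resources_fts USING fts5(name)")
    conn.execute("INSERT INTO resources VALUES ('Patient', 'R4')")
    conn.execute("INSERT INTO resources_fts (name) VALUES ('Patient')")
    return conn

def get_resource(conn, name, version):
    """Happy path returns metadata; unknown names return an empty dict."""
    row = conn.execute(
        "SELECT name, version FROM resources WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    return {"name": row[0], "version": row[1]} if row else {}

def search(conn, query):
    """Full-text search over the FTS5 index."""
    rows = conn.execute(
        "SELECT name FROM resources_fts WHERE resources_fts MATCH ?", (query,)
    ).fetchall()
    return [r[0] for r in rows]

conn = make_test_db()
assert get_resource(conn, "Patient", "R4") == {"name": "Patient", "version": "R4"}
assert get_resource(conn, "NonExistentResource", "R4") == {}  # graceful not-found
assert search(conn, "Patient") == ["Patient"]                 # FTS hit
assert search(conn, "xyznonexistent") == []                   # FTS miss
```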

&lt;h3&gt;
  
  
  Layer 2: URI Scheme Tests
&lt;/h3&gt;

&lt;p&gt;The URI parser and formatter are pure functions with no dependencies. Testing them is simple and satisfying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Parse "fhir://R4/StructureDefinition/Patient"
  → { scheme: "fhir", version: "R4", name: "Patient" }  ✓

Parse "ig://hl7.fhir.us.core/StructureDefinition/us-core-patient"
  → { scheme: "ig", version: "hl7.fhir.us.core", name: "us-core-patient" }  ✓

Parse "not-a-valid-uri"
  → None  ✓

Format fhir_uri("R4", "Patient")
  → "fhir://R4/StructureDefinition/Patient"  ✓
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We tested the round-trip: format a URI, parse it, verify the components match. This caught a few edge cases with dots in IG names and hyphens in profile names.&lt;/p&gt;
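&lt;p&gt;A minimal standard-library version of the parser/formatter pair, simplified relative to our real implementation:&lt;/p&gt;

```python
from urllib.parse import urlparse

def parse_uri(uri: str):
    """Parse fhir:// and ig:// URIs; return None for anything else."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("fhir", "ig"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2 or parts[0] != "StructureDefinition":
        return None
    return {"scheme": parsed.scheme, "version": parsed.netloc, "name": parts[1]}

def fhir_uri(version: str, name: str) -> str:
    return f"fhir://{version}/StructureDefinition/{name}"

# Round-trip: format a URI, parse it, verify the components survive.
assert parse_uri(fhir_uri("R4", "Patient")) == {
    "scheme": "fhir", "version": "R4", "name": "Patient",
}
assert parse_uri("not-a-valid-uri") is None
```

&lt;p&gt;Because &lt;code&gt;urlparse&lt;/code&gt; keeps the authority component intact, dotted IG names like &lt;code&gt;hl7.fhir.us.core&lt;/code&gt; survive the round-trip unchanged.&lt;/p&gt;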

&lt;h3&gt;
  
  
  Layer 3: Smoke Tests
&lt;/h3&gt;

&lt;p&gt;The smoke test script is our "does the whole thing work?" check. It:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verifies the SQLite index file exists.&lt;/li&gt;
&lt;li&gt;Queries for a known resource (Patient) by exact match.&lt;/li&gt;
&lt;li&gt;Runs an FTS search and verifies results come back.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This runs against the &lt;em&gt;real&lt;/em&gt; index (not a test database) and is designed to catch "the build broke the index" or "the schema changed in a way that breaks queries."&lt;/p&gt;

&lt;p&gt;We run smoke tests as part of our local dev workflow — Tilt triggers them after building the index, and they fail-fast if anything is wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  What We Didn't Test (And Should Have)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Integration tests against the transport layer.&lt;/strong&gt; We tested handlers and storage independently but never tested the full flow: "send a JSON-RPC message on stdin → get a response on stdout." This meant that when we had the stdout buffering issue (mentioned in Part 2), we didn't catch it until manual testing with Claude Desktop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema evolution tests.&lt;/strong&gt; When we added PostgreSQL support, we had to ensure both backends returned the same shape of data. We should have written cross-backend tests from the start.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Developer Experience: Tilt, Docker, and the Inner Loop
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Tilt?
&lt;/h3&gt;

&lt;p&gt;If you haven't used &lt;a href="https://tilt.dev/" rel="noopener noreferrer"&gt;Tilt&lt;/a&gt;, it's a local development orchestrator. You define resources (build steps, services, health checks) in a &lt;code&gt;Tiltfile&lt;/code&gt;, and Tilt manages the lifecycle: watching for file changes, rebuilding what's needed, restarting services, and showing you a dashboard of what's running.&lt;/p&gt;

&lt;p&gt;For our project, Tilt orchestrates four steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────┐    ┌─────────────┐    ┌─────────────┐    ┌──────────────┐
│ uv sync  │───▶│   fetch     │───▶│   build     │───▶│  MCP server  │
│          │    │  packages   │    │   index     │    │  (HTTP mode) │
└──────────┘    └─────────────┘    └─────────────┘    └──────────────┘
  deps:            deps:              deps:              deps:
  pyproject.toml   fetch_packages.py  fixtures/          build-index
                   uv-sync            packages/          
                                      fetch-packages     readiness:
                                                         GET /health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each step declares its dependencies. If you change &lt;code&gt;pyproject.toml&lt;/code&gt;, everything rebuilds. If you only change a handler file, only the server restarts. Tilt tracks file changes and does the minimum work needed.&lt;/p&gt;
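&lt;p&gt;In Tiltfile terms, that chain looks roughly like this. The &lt;code&gt;local_resource&lt;/code&gt; API is standard Tilt, but the resource names, port, and flags here are a sketch of our setup, not a drop-in config:&lt;/p&gt;

```python
local_resource("uv-sync", "uv sync", deps=["pyproject.toml"])

local_resource("fetch-packages", "python scripts/fetch_packages.py",
               deps=["scripts/fetch_packages.py"], resource_deps=["uv-sync"])

local_resource("build-index", "python scripts/build_index.py",
               deps=["fixtures/", "packages/"], resource_deps=["fetch-packages"])

local_resource("mcp-server",
               serve_cmd="python -m apps.mcp_server.main --http",
               resource_deps=["build-index"],
               readiness_probe=probe(
                   http_get=http_get_action(port=8000, path="/health")))
```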

&lt;p&gt;&lt;strong&gt;Why not just a shell script?&lt;/strong&gt; We had one initially:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync
&lt;/span&gt;python scripts/fetch_packages.py
python scripts/build_index.py
python &lt;span class="nt"&gt;-m&lt;/span&gt; apps.mcp_server.main &lt;span class="nt"&gt;--http&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem: when you change a handler, you have to Ctrl+C and rerun the whole thing. Tilt watches files and restarts only the server, keeping the index intact. It also gives you a dashboard showing the status of each step, and readiness probes that tell you when the server is actually ready (not just started).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Tilt Configuration
&lt;/h3&gt;

&lt;p&gt;Two key decisions in our Tilt setup:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dual-backend support.&lt;/strong&gt; The Tiltfile reads &lt;code&gt;FHIR_MCP_STORAGE_BACKEND&lt;/code&gt; from the environment and configures either SQLite or PostgreSQL accordingly. For PostgreSQL, it uses &lt;code&gt;docker-compose&lt;/code&gt; to spin up a Postgres container. For SQLite, everything is local files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Health checks on the HTTP server.&lt;/strong&gt; The MCP server in HTTP mode exposes &lt;code&gt;GET /health&lt;/code&gt; which returns &lt;code&gt;{"status": "ok"}&lt;/code&gt;. Tilt polls this endpoint to know when the server is ready. This prevents you from sending requests to a server that's still starting up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker: The Deployment Story
&lt;/h3&gt;

&lt;p&gt;Our Dockerfile follows a simple pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.13-slim&lt;/span&gt;
    → Install dependencies with uv
    → Copy source code
    → Run fetch + build index at build time
    → CMD: start the MCP server (stdio mode)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Building the index at image build time&lt;/strong&gt; is deliberate. The Docker image ships with a pre-built index, so the container starts instantly at runtime. The tradeoff is that the image is larger (includes the SQLite database), but startup is fast and there are no runtime initialization steps.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;docker-compose.yml&lt;/code&gt; mounts the data directory as a volume. This means you can rebuild the index on the host and have the container pick it up without rebuilding the image.&lt;/p&gt;

&lt;p&gt;A subtlety: the container runs with &lt;code&gt;stdin_open: true&lt;/code&gt; and &lt;code&gt;tty: true&lt;/code&gt;. This is necessary for stdio transport — Docker needs to keep stdin open so the MCP client can communicate with the server.&lt;/p&gt;
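&lt;p&gt;In &lt;code&gt;docker-compose.yml&lt;/code&gt; terms, the relevant service definition looks roughly like this (service name and paths are illustrative, not our exact file):&lt;/p&gt;

```yaml
services:
  fhir-mcp:
    build: .
    stdin_open: true   # keep stdin open so the MCP client can write JSON-RPC requests
    tty: true          # without these two flags, stdio transport stalls
    volumes:
      - ./data:/app/data   # host-built index is picked up without rebuilding the image
```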




&lt;h2&gt;
  
  
  Deploying to Real AI Clients
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Claude Desktop
&lt;/h3&gt;

&lt;p&gt;Claude Desktop supports MCP servers natively. Configuration is a JSON file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fhir-mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"apps.mcp_server.main"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cwd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/fhir-mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude Desktop spawns the process, communicates over stdio, and presents the tools in its UI. The user can then ask questions like "What fields are in a FHIR R4 Patient resource?" and Claude will call &lt;code&gt;fhir.get_definition&lt;/code&gt; behind the scenes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Things we learned with Claude Desktop:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;cwd&lt;/code&gt; must be the project root (where &lt;code&gt;pyproject.toml&lt;/code&gt; lives), not the &lt;code&gt;apps/&lt;/code&gt; directory. Relative paths in settings (like &lt;code&gt;data/index/fhir_index.sqlite&lt;/code&gt;) resolve from &lt;code&gt;cwd&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If the server crashes, Claude Desktop may not show a clear error. Check stderr output to diagnose issues.&lt;/li&gt;
&lt;li&gt;Claude is remarkably good at choosing the right tool. With descriptive tool names and typed inputs, it correctly uses &lt;code&gt;fhir.search&lt;/code&gt; for exploration and &lt;code&gt;fhir.get_definition&lt;/code&gt; for exact lookups.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cursor
&lt;/h3&gt;

&lt;p&gt;Cursor's MCP configuration is nearly identical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"fhir-mcp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"apps.mcp_server.main"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cwd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/path/to/fhir-mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Differences we noticed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cursor tends to call tools in a coding context (while you're editing files), so the prompts and results are optimized for developer workflows.&lt;/li&gt;
&lt;li&gt;Response formatting matters more in Cursor because results appear inline with code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Takeaway on Client Support
&lt;/h3&gt;

&lt;p&gt;Because MCP standardizes the protocol, supporting multiple clients was trivial. We wrote zero client-specific code. The same server binary, the same tools, the same transport — just different JSON config files for each client.&lt;/p&gt;

&lt;p&gt;This was one of MCP's biggest wins for us. We didn't have to build a Claude plugin &lt;em&gt;and&lt;/em&gt; a Cursor extension &lt;em&gt;and&lt;/em&gt; a VS Code integration. We built one MCP server, and it works everywhere MCP is supported.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prompts: The Underappreciated Third Pillar
&lt;/h2&gt;

&lt;p&gt;MCP has three primitives: tools, resources, and prompts. We spent most of our effort on tools, some on resources (URI scheme), and almost none on prompts initially. That was a mistake.&lt;/p&gt;

&lt;p&gt;Our prompts are simple strings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="s2"&gt;"summarize_profile"&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Summarize a FHIR profile in plain language."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;"explain_constraint"&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Explain a constraint in a StructureDefinition."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="s2"&gt;"migration_notes"&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Describe migration notes between FHIR versions."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These seem trivial, but they serve an important purpose: &lt;strong&gt;they tell the AI how to use the tools' output.&lt;/strong&gt; Without prompts, the AI might return raw JSON metadata to the user. With a prompt like "summarize this profile in plain language," the AI knows to translate the technical output into something human-readable.&lt;/p&gt;
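&lt;p&gt;In code, a prompt registry this simple is just a dict with a lookup function. A sketch (the strings match the table above; the function name is ours):&lt;/p&gt;

```python
# Prompt registry: maps prompt names to guidance the AI applies to tool output.
PROMPTS = {
    "summarize_profile": "Summarize a FHIR profile in plain language.",
    "explain_constraint": "Explain a constraint in a StructureDefinition.",
    "migration_notes": "Describe migration notes between FHIR versions.",
}

def get_prompt(name: str) -> str:
    """Look up a prompt by name; fail loudly on unknown names."""
    try:
        return PROMPTS[name]
    except KeyError:
        raise ValueError(f"unknown prompt: {name!r}") from None

print(get_prompt("summarize_profile"))
```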

&lt;p&gt;If we were starting over, we'd invest more in prompts. Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parameterized prompts&lt;/strong&gt; that include the tool name and expected output format.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chain prompts&lt;/strong&gt; that guide the AI through multi-step workflows: "First call &lt;code&gt;ig.list&lt;/code&gt; to see available IGs, then call &lt;code&gt;fhir.search&lt;/code&gt; to find the relevant profile, then call &lt;code&gt;fhir.get_definition&lt;/code&gt; to get the full definition, then summarize it."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain-specific prompts&lt;/strong&gt; for common healthcare developer questions: "Compare this resource between R4 and R5 and list breaking changes."&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Honest Retrospective: What Worked, What Didn't, What We'd Change
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Worked
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. The layered architecture.&lt;/strong&gt; Transport → Registry → Handlers → Packages → Storage. Every layer has one job. Adding PostgreSQL support was a one-layer change. Adding HTTP transport was a one-layer change. Adding a new tool is a two-file change (handler + registry).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Pydantic everywhere.&lt;/strong&gt; Input validation, settings, data models — Pydantic caught bugs early and served as living documentation. The type system paid for itself in the first week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. SQLite + FTS5 for local use.&lt;/strong&gt; Zero-config, fast, reliable. For a single-user local tool, SQLite is hard to beat.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Explicit registries.&lt;/strong&gt; Being able to open one file and see every tool, resource, and prompt in the system is invaluable for onboarding and debugging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. The stub pattern.&lt;/strong&gt; Having &lt;code&gt;validate.instance&lt;/code&gt; as a stub from day one meant the interface contract was established early. When we eventually implement it, the tool name, input schema, and registry entry already exist.&lt;/p&gt;
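&lt;p&gt;A stub handler costs a few lines. A sketch of the pattern (the tool name matches the article; the &lt;code&gt;Tool&lt;/code&gt; shape and handler body are hypothetical):&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """The same three attributes every handler file declares."""
    name: str
    description: str
    handler: Callable[[dict], dict]

def validate_instance(params: dict) -> dict:
    """Stub: the interface contract exists now; the implementation comes later."""
    return {
        "status": "not_implemented",
        "detail": "validate.instance is a stub; instance validation is planned.",
    }

VALIDATE_INSTANCE = Tool(
    name="validate.instance",
    description="Validate a FHIR instance against a profile (stub).",
    handler=validate_instance,
)

result = VALIDATE_INSTANCE.handler({"resource": {"resourceType": "Patient"}})
print(result["status"])
```

&lt;p&gt;Clients see a real tool with a real schema; the only thing missing is the behavior.&lt;/p&gt;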

&lt;h3&gt;
  
  
  What Didn't Work
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Module-level state.&lt;/strong&gt; Reading environment variables at module load time (e.g., &lt;code&gt;DB_PATH = os.environ.get(...)&lt;/code&gt;) made testing painful. We had to reload modules to pick up test configuration. Dependency injection through the Settings object would have been cleaner.&lt;/p&gt;
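&lt;p&gt;The fix is mechanical: read the environment once into a settings object and pass it down. A sketch with a plain dataclass standing in for our Pydantic Settings (the &lt;code&gt;FHIR_MCP_DB_PATH&lt;/code&gt; variable name is hypothetical; &lt;code&gt;FHIR_MCP_STORAGE_BACKEND&lt;/code&gt; is the real one):&lt;/p&gt;

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Configuration read once at startup and passed down explicitly."""
    db_path: str = "data/index/fhir_index.sqlite"
    storage_backend: str = "sqlite"

    @classmethod
    def from_env(cls) -> "Settings":
        return cls(
            db_path=os.environ.get("FHIR_MCP_DB_PATH", cls.db_path),
            storage_backend=os.environ.get("FHIR_MCP_STORAGE_BACKEND", cls.storage_backend),
        )

def open_storage(settings: Settings) -> str:
    # Handlers receive settings as an argument instead of reading os.environ
    # at import time, so tests can inject throwaway configuration directly.
    return f"{settings.storage_backend}://{settings.db_path}"

# In tests: no module reloading, just construct the settings you want.
test_settings = Settings(db_path=":memory:")
print(open_storage(test_settings))
```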

&lt;p&gt;&lt;strong&gt;2. The Tool class is boilerplate-heavy.&lt;/strong&gt; Every handler file defines the same Tool class with the same three attributes. We should have defined it once in a shared module. We resisted DRY initially because we valued independence between handlers, but the duplication became annoying.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. No end-to-end transport tests.&lt;/strong&gt; We tested handlers and storage in isolation but never tested "JSON on stdin → JSON on stdout." The stdout buffering bug could have been caught by an automated test.&lt;/p&gt;
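&lt;p&gt;Such a test is cheap to write with &lt;code&gt;subprocess&lt;/code&gt;: spawn the server, write one JSON-RPC line to stdin, assert on stdout. A sketch using an inline stand-in server (a real test would spawn &lt;code&gt;apps.mcp_server.main&lt;/code&gt; instead):&lt;/p&gt;

```python
import json
import subprocess
import sys

# Hypothetical stand-in server: answers one JSON-RPC request over stdio.
SERVER = r'''
import json, sys
req = json.loads(sys.stdin.readline())
resp = {"jsonrpc": "2.0", "id": req["id"], "result": {"tools": ["fhir.search"]}}
sys.stdout.write(json.dumps(resp) + "\n")
sys.stdout.flush()  # the flush our stdout-buffering bug taught us to test for
'''

request = {"jsonrpc": "2.0", "id": 1, "method": "list_tools"}
proc = subprocess.run(
    [sys.executable, "-c", SERVER],
    input=json.dumps(request) + "\n",
    capture_output=True,
    text=True,
    timeout=30,
)
resp = json.loads(proc.stdout)
print(resp["result"]["tools"])
```

&lt;p&gt;The point of going through a real child process is that it exercises exactly the path that bit us: buffering, newline framing, and process startup.&lt;/p&gt;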

&lt;p&gt;&lt;strong&gt;4. Prompts were an afterthought.&lt;/strong&gt; We treated them as static strings rather than the powerful interaction guides they could be. They deserve the same rigor as tool definitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. No client-facing schema export.&lt;/strong&gt; MCP clients can request the tool schemas (input models) to understand what each tool expects. We return tool &lt;em&gt;names&lt;/em&gt; in &lt;code&gt;list_tools&lt;/code&gt; but don't include the full JSON schema for each tool's input model. Adding this would make it easier for clients (and AIs) to understand the tool interface without documentation.&lt;/p&gt;
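&lt;p&gt;Concretely, &lt;code&gt;list_tools&lt;/code&gt; would return each tool's input schema alongside its name. With Pydantic models the schema comes from &lt;code&gt;model_json_schema()&lt;/code&gt;; here's a dependency-free sketch with a hand-written schema (tool and field names are illustrative):&lt;/p&gt;

```python
# Hypothetical tool registry: name plus the JSON Schema of its input model.
# In the real server, Pydantic's model_json_schema() would generate these.
TOOLS = {
    "fhir.get_definition": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Resource name, e.g. Patient"},
            "fhir_version": {"type": "string", "enum": ["R4", "R5"]},
        },
        "required": ["name"],
    },
}

def list_tools() -> list:
    """Return names plus full input schemas, not just names."""
    return [
        {"name": name, "inputSchema": schema}
        for name, schema in sorted(TOOLS.items())
    ]

listing = list_tools()
print(listing[0]["name"])
```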

&lt;h3&gt;
  
  
  What We'd Change in v2
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Use a proper MCP SDK.&lt;/strong&gt; We built the transport layer by hand (reading JSON-RPC from stdin, writing responses). There are now Python MCP SDKs that handle the protocol details. We'd use one of those instead of rolling our own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Async handlers.&lt;/strong&gt; Our handlers are synchronous. For a local SQLite-based server, this is fine. But with PostgreSQL or potential network-based data sources, async would allow concurrent tool calls. The MCP protocol supports this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Streaming responses.&lt;/strong&gt; For large payloads (like a full StructureDefinition), streaming would be better than loading the entire JSON into memory and truncating. MCP supports progressive responses, and we should use them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Richer diff tool.&lt;/strong&gt; The &lt;code&gt;fhir.diff_versions&lt;/code&gt; tool currently only compares top-level metadata. A proper diff that compares element paths, cardinality changes, and type modifications would be dramatically more useful for migration work.&lt;/p&gt;
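&lt;p&gt;The element-level comparison we have in mind is not complicated. A sketch of the core of it (the function name is ours, and the sample data is illustrative, not a real R4/R5 diff):&lt;/p&gt;

```python
def diff_elements(old_sd: dict, new_sd: dict) -> dict:
    """Compare the differential elements of two StructureDefinitions by path."""
    old = {e["path"]: e for e in old_sd.get("differential", {}).get("element", [])}
    new = {e["path"]: e for e in new_sd.get("differential", {}).get("element", [])}
    shared = set(old).intersection(new)
    return {
        "added": sorted(set(new).difference(old)),
        "removed": sorted(set(old).difference(new)),
        "cardinality_changed": sorted(
            p for p in shared
            if (old[p].get("min"), old[p].get("max"))
            != (new[p].get("min"), new[p].get("max"))
        ),
    }

# Illustrative inputs only: two small differentials that disagree on one path each.
sd_a = {"differential": {"element": [
    {"path": "Patient", "min": 0, "max": "*"},
    {"path": "Patient.animal", "min": 0, "max": "1"},
]}}
sd_b = {"differential": {"element": [
    {"path": "Patient", "min": 0, "max": "*"},
    {"path": "Patient.link", "min": 0, "max": "*"},
]}}
print(diff_elements(sd_a, sd_b))
```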

&lt;p&gt;&lt;strong&gt;5. Package management in the server.&lt;/strong&gt; Currently, packages are fetched and indexed offline by running scripts. Ideally, the server (or a companion tool) could fetch FHIR packages from a registry, index them, and make them available — all through MCP tools that the AI could invoke.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture: What Building an MCP Server Taught Us
&lt;/h2&gt;

&lt;h3&gt;
  
  
  MCP changes how you think about AI integration
&lt;/h3&gt;

&lt;p&gt;Before MCP, we thought about AI integration as "give the AI context and hope for the best." After building an MCP server, we think about it as "give the AI typed, validated tools and let it be an agent."&lt;/p&gt;

&lt;p&gt;The difference is profound. With context stuffing, you're limited by the context window and the AI's ability to find the needle in the haystack. With MCP tools, the AI can make targeted, efficient queries — just like a developer would.&lt;/p&gt;

&lt;h3&gt;
  
  
  Healthcare needs more MCP servers
&lt;/h3&gt;

&lt;p&gt;FHIR is just one specification. Healthcare interoperability involves CDA, HL7v2, SMART on FHIR, Bulk Data, DaVinci IGs, and dozens of other standards. Each of these could benefit from an MCP server that lets AI assistants look up specifications accurately instead of hallucinating.&lt;/p&gt;

&lt;h3&gt;
  
  
  The bar for building an MCP server is low
&lt;/h3&gt;

&lt;p&gt;Our first working version was built in a few days. The core is ~500 lines of Python across the transport, registry, and handlers. The indexer is ~100 lines. The rest is data and configuration.&lt;/p&gt;

&lt;p&gt;If you have a domain-specific data source that AI assistants get wrong, building an MCP server for it is probably easier than you think. The protocol is simple, the pattern is clear, and the payoff — AI that gives accurate, grounded answers about your domain — is immediate.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick-Start Mental Model
&lt;/h2&gt;

&lt;p&gt;If you're thinking about building your own MCP server, here's the mental model we'd recommend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────────┐
│                    YOUR MCP SERVER                              │
│                                                                 │
│  1. DATA LAYER                                                  │
│     What data do you have?                                      │
│     How will you store/index it?                                │
│     → SQLite for local, Postgres for shared                    │
│                                                                 │
│  2. TOOLS                                                       │
│     What operations does the AI need?                           │
│     → One tool per distinct operation                          │
│     → Pydantic model for every input                           │
│     → Return structured data, not prose                        │
│                                                                 │
│  3. RESOURCES                                                   │
│     What data should be directly addressable by URI?            │
│     → Design URIs that are human-readable and parseable        │
│                                                                 │
│  4. PROMPTS                                                     │
│     How should the AI present results to users?                 │
│     → Guide the AI's interpretation of tool output             │
│                                                                 │
│  5. TRANSPORT                                                   │
│     stdio for AI clients, HTTP for dev/testing                  │
│     → Keep this layer as thin as possible                      │
│                                                                 │
│  6. TEST                                                        │
│     Unit test handlers with mock data                           │
│     Smoke test the full pipeline                                │
│     → Test the transport layer end-to-end                      │
│                                                                 │
│  7. DEPLOY                                                      │
│     JSON config for each AI client                              │
│     Docker for production                                       │
│     Tilt for local dev                                          │
└─────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building an MCP server was one of the most rewarding developer experience projects we've worked on. The feedback loop is immediate — you build a tool, restart the server, ask the AI a question, and watch it use your tool to give a better answer. It's like giving the AI a new superpower, one tool at a time.&lt;/p&gt;

&lt;p&gt;If you work in a domain with complex, versioned, structured data — healthcare, legal, finance, infrastructure — and you're tired of AI assistants getting the details wrong, consider building an MCP server. Start small. One tool. One data source. See what happens when the AI can actually look things up instead of guessing.&lt;/p&gt;

&lt;p&gt;You might be surprised how much better "AI-assisted" can be when the AI has access to ground truth.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 3 of a 3-part series.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://dev.to/chaets/mcp-the-missing-layer-between-ai-and-your-application-fdj"&gt;Part 0: MCP — The Missing Layer Between AI and Your Application →&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-1-why-we-built-an-mcp-server-and-what-we-learned-before-writing-a-single-line-of-code-4mao"&gt;Part 1: Why We Built an MCP Server — And What We Learned Before Writing a Single Line of Code&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-2-building-the-engine-tools-uris-and-the-art-of-indexing-fhir-fi1"&gt;Part 2: Building the Engine — Tools, URIs, and the Art of Indexing FHIR&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-3-testing-deploying-and-lessons-learned-aa5"&gt;Part 3: Testing, Deploying, and Lessons Learned&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’d like to connect, find me on &lt;a href="https://www.linkedin.com/in/chaets/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or drop me a message—I’d love to explore how I can help drive your data success!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>fhir</category>
      <category>interoperability</category>
    </item>
    <item>
      <title>Part 2: Building the Engine — Tools, URIs, and the Art of Indexing FHIR</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Tue, 10 Mar 2026 02:08:43 +0000</pubDate>
      <link>https://forem.com/chaets/part-2-building-the-engine-tools-uris-and-the-art-of-indexing-fhir-fi1</link>
      <guid>https://forem.com/chaets/part-2-building-the-engine-tools-uris-and-the-art-of-indexing-fhir-fi1</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 2 of a three-part series on building our first MCP server for healthcare interoperability.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Where We Left Off
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://dev.to/chaets/part-1-why-we-built-an-mcp-server-and-what-we-learned-before-writing-a-single-line-of-code-4mao"&gt;Part 1&lt;/a&gt;, we talked about &lt;em&gt;why&lt;/em&gt; we built an MCP server for FHIR and the architectural decisions we made before writing code. Now we're going to get into the &lt;em&gt;how&lt;/em&gt; — the implementation details, the patterns that emerged, and the places where the reality of FHIR made us rethink our approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 1: Turning Thousands of JSON Files Into a Searchable Index
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Data Problem
&lt;/h3&gt;

&lt;p&gt;FHIR packages are distributed as folders of JSON files. A single core FHIR package (say, &lt;code&gt;hl7.fhir.r4.core&lt;/code&gt;) contains thousands of files: one for each StructureDefinition, ValueSet, CodeSystem, SearchParameter, OperationDefinition, and so on.&lt;/p&gt;

&lt;p&gt;Each file looks something like this (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resourceType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"StructureDefinition"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://hl7.org/fhir/StructureDefinition/Patient"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient Resource"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fhirVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4.0.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"kind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"resource"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Demographics and other administrative information about an individual receiving care."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"differential"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"element"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"min"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"max"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient.identifier"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient.identifier"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"min"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"max"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient.name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient.name"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"min"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"max"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The challenge: we need to be able to (a) look up a specific resource by name and version, and (b) do full-text search across &lt;em&gt;all&lt;/em&gt; resources. Doing that by scanning the filesystem on every query would be far too slow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why SQLite + FTS5
&lt;/h3&gt;

&lt;p&gt;We chose SQLite with FTS5 (Full-Text Search 5) for the index. Here's the reasoning:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero infrastructure.&lt;/strong&gt; SQLite is a single file. No server process, no ports, no configuration. For a local-first tool, this is ideal — the entire database is just a file in &lt;code&gt;data/index/fhir_index.sqlite&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ships with Python.&lt;/strong&gt; The &lt;code&gt;sqlite3&lt;/code&gt; module is in Python's standard library. No pip install, no binary dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FTS5 is surprisingly powerful.&lt;/strong&gt; SQLite's FTS5 extension supports ranked full-text search with a single SQL query. You create a virtual table that mirrors your main table, and then you can &lt;code&gt;MATCH&lt;/code&gt; against it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fhir_version&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fhir_resources_fts&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;fhir_resources_fts&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="s1"&gt;'Patient'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you ranked results, and it's fast — milliseconds over thousands of resources.&lt;/p&gt;
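&lt;p&gt;The same flow works end to end from Python's stdlib. A self-contained sketch (it assumes an SQLite build compiled with FTS5, which CPython's bundled SQLite normally includes; the sample rows are illustrative):&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One virtual table mirroring the searchable columns of the main table.
con.execute("CREATE VIRTUAL TABLE fhir_resources_fts USING fts5(name, title, summary_text)")
con.executemany(
    "INSERT INTO fhir_resources_fts VALUES (?, ?, ?)",
    [
        ("Patient", "Patient Resource",
         "Demographics and other administrative information about an individual receiving care."),
        ("Observation", "Observation Resource",
         "Measurements and simple assertions made about a patient."),
    ],
)
# MATCH tokenizes the query; ORDER BY rank puts the best matches first.
names = [
    row[0]
    for row in con.execute(
        "SELECT name FROM fhir_resources_fts "
        "WHERE fhir_resources_fts MATCH ? ORDER BY rank",
        ("patient",),
    )
]
print(names)
```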

&lt;p&gt;&lt;strong&gt;Predictable performance.&lt;/strong&gt; SQLite's performance characteristics are well understood. For read-heavy workloads (which is all we do at runtime), it's excellent. No connection pooling, no query planning surprises.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Indexing Pipeline
&lt;/h3&gt;

&lt;p&gt;The indexer runs as a standalone script, separate from the server. It does three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discover packages.&lt;/strong&gt; Walk &lt;code&gt;data/packages/&lt;/code&gt; and &lt;code&gt;data/fixtures/&lt;/code&gt;, find every directory with a &lt;code&gt;package.json&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extract metadata.&lt;/strong&gt; For each JSON file in a package, read the resource, extract the key fields (canonical URL, name, title, type, version, description), and normalize them into a consistent shape.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write to SQLite.&lt;/strong&gt; Insert every resource into the main table, then rebuild the FTS5 index.&lt;/li&gt;
&lt;/ol&gt;
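&lt;p&gt;The skeleton of those three steps fits in a page. A compressed sketch (the table layout and helper names are simplified from the real indexer):&lt;/p&gt;

```python
import json
import sqlite3
import tempfile
from pathlib import Path

def discover_packages(root: Path):
    """Step 1: yield every directory under root that contains a package.json."""
    for manifest in root.rglob("package.json"):
        yield manifest.parent

def extract_metadata(resource: dict, package_name: str) -> dict:
    """Step 2: flatten the handful of fields we index; ignore the rest."""
    return {
        "canonical_url": resource.get("url", ""),
        "name": resource.get("name", ""),
        "title": resource.get("title", ""),
        "type": resource.get("resourceType", ""),
        "package_name": package_name,
        "summary_text": resource.get("description", ""),
        "json_payload": json.dumps(resource),
    }

def build_index(root: Path, db_path: str = ":memory:") -> sqlite3.Connection:
    """Step 3: write every normalized row to SQLite."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE fhir_resources
                   (canonical_url TEXT, name TEXT, title TEXT, type TEXT,
                    package_name TEXT, summary_text TEXT, json_payload TEXT)""")
    for pkg_dir in discover_packages(root):
        pkg_name = json.loads((pkg_dir / "package.json").read_text()).get("name", pkg_dir.name)
        for f in pkg_dir.glob("*.json"):
            if f.name == "package.json":
                continue
            meta = extract_metadata(json.loads(f.read_text()), pkg_name)
            con.execute(
                "INSERT INTO fhir_resources VALUES "
                "(:canonical_url, :name, :title, :type, "
                ":package_name, :summary_text, :json_payload)",
                meta,
            )
    con.commit()
    return con

# Demo with a throwaway package directory.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "hl7.fhir.r4.core"
    pkg.mkdir()
    (pkg / "package.json").write_text(json.dumps({"name": "hl7.fhir.r4.core"}))
    (pkg / "StructureDefinition-Patient.json").write_text(json.dumps(
        {"resourceType": "StructureDefinition", "name": "Patient",
         "url": "http://hl7.org/fhir/StructureDefinition/Patient"}))
    con = build_index(Path(tmp))
    names = [r[0] for r in con.execute("SELECT name FROM fhir_resources")]
print(names)
```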

&lt;p&gt;Here's the key insight about the extraction step: &lt;strong&gt;we don't index everything.&lt;/strong&gt; A StructureDefinition can have hundreds of elements, extensions, constraints, slicing rules. We extract only the metadata needed for lookup and search:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;canonical_url  →  "http://hl7.org/fhir/StructureDefinition/Patient"
name           →  "Patient"
title          →  "Patient Resource"
type           →  "StructureDefinition"   (the resourceType)
fhir_version   →  "R4"
package_name   →  "hl7.fhir.r4.core"
package_version → "4.0.1"
resource_type  →  "Patient"               (the FHIR type, like Patient, Observation)
summary_text   →  "Demographics and other administrative information..."
json_payload   →  (the full JSON, stored for retrieval)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;json_payload&lt;/code&gt; is stored but &lt;em&gt;not&lt;/em&gt; included in search by default. It's there so we can return the full resource when requested, but we don't want FTS5 indexing the entire JSON blob — that would bloat the index and produce noisy search results.&lt;/p&gt;
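&lt;p&gt;FTS5 has a feature built for exactly this: &lt;code&gt;UNINDEXED&lt;/code&gt; columns, which are stored and retrievable but never tokenized. A small demonstration (table and column names are illustrative):&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
# json_payload travels with each row but is excluded from the full-text index.
con.execute(
    "CREATE VIRTUAL TABLE resources_fts "
    "USING fts5(name, summary_text, json_payload UNINDEXED)"
)
con.execute(
    "INSERT INTO resources_fts VALUES (?, ?, ?)",
    ("Patient", "Demographics for an individual receiving care.",
     '{"resourceType": "StructureDefinition", "id": "Patient"}'),
)
# A term that appears only inside the payload is invisible to MATCH...
miss = con.execute(
    "SELECT name FROM resources_fts WHERE resources_fts MATCH 'StructureDefinition'"
).fetchall()
# ...but the full payload still comes back once a row matches on indexed columns.
hit = con.execute(
    "SELECT json_payload FROM resources_fts WHERE resources_fts MATCH 'Patient'"
).fetchone()
print(miss, hit[0])
```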

&lt;h3&gt;
  
  
  Normalization: Why It Matters
&lt;/h3&gt;

&lt;p&gt;FHIR resources aren't consistent in their metadata fields. A StructureDefinition has &lt;code&gt;type&lt;/code&gt; and &lt;code&gt;kind&lt;/code&gt;. A ValueSet has neither. A CodeSystem has a &lt;code&gt;content&lt;/code&gt; field that's irrelevant to us. Different FHIR versions may organize fields slightly differently.&lt;/p&gt;

&lt;p&gt;We wrote normalization functions for each resource type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;normalize_structure_definition&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;package_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;flat&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;
&lt;span class="nf"&gt;normalize_value_set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;package_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;             &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;flat&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;
&lt;span class="nf"&gt;normalize_code_system&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;package_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;           &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;flat&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each one extracts the same set of fields into the same shape, regardless of the resource type. This means the handlers don't need to know the difference between indexing a ValueSet and a StructureDefinition — they all look the same in the database.&lt;/p&gt;

&lt;p&gt;This was a lesson in &lt;strong&gt;write boring normalization code early, save debugging time later.&lt;/strong&gt; We initially skipped normalization and tried to query the raw JSON fields with SQLite JSON functions. It worked, but the queries were fragile, slow, and different for each resource type. Flat normalization was a much better investment.&lt;/p&gt;
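&lt;p&gt;A minimal sketch of what those normalizers might look like. The function names follow the article; the shared helper and the exact field set are illustrative, and the real functions extract more fields:&lt;/p&gt;

```python
# Hedged sketch of the per-type normalizers described above. The _base_fields
# helper and its keys are illustrative stand-ins for the real flat shape.
def _base_fields(resource, package_name, package_version):
    return {
        "canonical_url": resource.get("url"),
        "name": resource.get("name"),
        "title": resource.get("title") or resource.get("name"),
        "type": resource.get("resourceType"),
        "package_name": package_name,
        "package_version": package_version,
        "resource_type": None,
        "summary_text": resource.get("description") or "",
    }

def normalize_structure_definition(sd, package_name, package_version):
    flat = _base_fields(sd, package_name, package_version)
    flat["resource_type"] = sd.get("type")  # only StructureDefinitions carry this
    return flat

def normalize_value_set(vs, package_name, package_version):
    return _base_fields(vs, package_name, package_version)  # no "type"/"kind" field

def normalize_code_system(cs, package_name, package_version):
    return _base_fields(cs, package_name, package_version)  # "content" field ignored

flat = normalize_structure_definition(
    {"resourceType": "StructureDefinition",
     "url": "http://hl7.org/fhir/StructureDefinition/Patient",
     "name": "Patient", "type": "Patient"},
    "hl7.fhir.r4.core", "4.0.1",
)
```

&lt;p&gt;Because every normalizer returns the same keys, the handlers and SQL stay identical no matter which resource type produced a row.&lt;/p&gt;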

&lt;h3&gt;
  
  
  The "Rebuild the World" Pattern
&lt;/h3&gt;

&lt;p&gt;Our indexer always starts by deleting all existing data and re-indexing from scratch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DELETE FROM fhir_resources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ... re-index everything ...
&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO fhir_resources_fts(fhir_resources_fts) VALUES(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rebuild&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is intentional. We're indexing static, versioned packages — not a stream of live data. The total data volume is small enough (seconds to minutes to index) that incremental updates aren't worth the complexity. "Delete everything and rebuild" is simple, correct, and fast enough.&lt;/p&gt;

&lt;p&gt;The FTS5 &lt;code&gt;'rebuild'&lt;/code&gt; command is important — it tells SQLite to reconstruct the full-text index from the content table. Without it, the FTS index would be stale after a bulk delete/insert.&lt;/p&gt;




&lt;h2&gt;
  
  
  Phase 2: Designing the URI Scheme
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Custom URIs?
&lt;/h3&gt;

&lt;p&gt;MCP has a concept of &lt;strong&gt;resources&lt;/strong&gt; — read-only data items identified by URIs. The AI can "read" a resource by requesting its URI, similar to how a browser requests a URL.&lt;/p&gt;

&lt;p&gt;We needed URIs that were:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Human-readable&lt;/strong&gt; — a developer should be able to look at a URI and know what it refers to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parseable&lt;/strong&gt; — the server needs to extract version, resource type, and name from the URI to do a lookup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unambiguous&lt;/strong&gt; — the same name can exist in different contexts (the Patient StructureDefinition in R4 vs R5, or in US Core vs base FHIR).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We designed three URI schemes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fhir://R4/StructureDefinition/Patient
 │     │         │               │
 │     │         │               └── Resource name
 │     │         └── Resource kind
 │     └── FHIR version
 └── Scheme (core FHIR)

ig://hl7.fhir.us.core/5.0.1/StructureDefinition/us-core-patient
 │        │              │            │                │
 │        │              │            │                └── Profile name
 │        │              │            └── Resource kind
 │        │              └── IG version
 │        └── IG package name
 └── Scheme (Implementation Guide)

uscore://5.0.1/StructureDefinition/us-core-patient
  │       │           │                  │
  │       │           │                  └── Profile name
  │       │           └── Resource kind
  │       └── US Core version
  └── Scheme (convenience shorthand for US Core)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;uscore://&lt;/code&gt; scheme is a convenience alias. US Core is by far the most commonly referenced IG in the US healthcare ecosystem, so it gets a shorthand.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parsing and Formatting
&lt;/h3&gt;

&lt;p&gt;We built a small &lt;code&gt;uri_scheme&lt;/code&gt; package with two modules:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parsing&lt;/strong&gt; uses a regex to decompose a URI into its components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="s2"&gt;"fhir://R4/StructureDefinition/Patient"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;scheme:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fhir"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;version:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"R4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;name:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Patient"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Formatting&lt;/strong&gt; does the reverse — construct a URI from components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;format_fhir_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;R4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Patient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fhir://R4/StructureDefinition/Patient&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A design decision we made here: &lt;strong&gt;StructureDefinition is hardcoded in the URI path.&lt;/strong&gt; We debated making the resource type a variable, but in practice, 95%+ of the resources that AI assistants ask about are StructureDefinitions (or profiles, which are StructureDefinitions). ValueSets and CodeSystems are almost always accessed via search, not direct URI lookup. Hardcoding simplified the URI scheme and made the common case cleaner.&lt;/p&gt;

&lt;p&gt;If we ever need to support &lt;code&gt;fhir://R4/ValueSet/administrative-gender&lt;/code&gt;, we can extend the regex. But we haven't needed to yet, and premature generalization would have complicated the parser for no benefit.&lt;/p&gt;
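&lt;p&gt;A sketch of the two &lt;code&gt;uri_scheme&lt;/code&gt; helpers, assuming an illustrative regex (the real package's pattern and return shape may differ). Note how &lt;code&gt;StructureDefinition&lt;/code&gt; is hardcoded in the path, per the design decision above:&lt;/p&gt;

```python
import re

# Hedged sketch of the uri_scheme helpers; the regex and return shape here
# are illustrative. StructureDefinition is hardcoded in the path on purpose.
FHIR_URI_RE = re.compile(r"^fhir://(R4B?|R5)/StructureDefinition/([A-Za-z0-9-]+)$")

def parse_fhir_uri(uri):
    match = FHIR_URI_RE.match(uri)
    if match is None:
        raise ValueError(f"not a recognized fhir:// URI: {uri}")
    return {"scheme": "fhir", "version": match.group(1), "name": match.group(2)}

def format_fhir_uri(version, name):
    return f"fhir://{version}/StructureDefinition/{name}"

parsed = parse_fhir_uri("fhir://R4/StructureDefinition/Patient")
round_trip = format_fhir_uri(parsed["version"], parsed["name"])
```

&lt;p&gt;Parsing and formatting are exact inverses, so a URI can round-trip through the server without loss.&lt;/p&gt;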




&lt;h2&gt;
  
  
  Phase 3: Building the Tool Handlers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Handler Pattern
&lt;/h3&gt;

&lt;p&gt;Every tool in our server follows the exact same structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Define a Pydantic model for the input
2. Write a handler function that takes the model and returns a result
3. Wrap them in a Tool object
4. Register the Tool in the registry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't accidental — we arrived at it after trying a few alternatives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 1: Functions with &lt;code&gt;**kwargs&lt;/code&gt;.&lt;/strong&gt; We tried defining handlers as functions that accept keyword arguments directly. The problem: no validation, no type checking, no way for MCP to communicate the expected schema to the AI. The AI would send inputs in unexpected shapes and we'd get runtime KeyErrors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 2: Decorated functions.&lt;/strong&gt; We tried a decorator approach where you'd annotate a function and metadata would be extracted automatically. Clever, but opaque. When something went wrong, the stack trace pointed to decorator internals, not our code. And new team members couldn't understand how tools were registered without understanding the decorator machinery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attempt 3 (what we kept): Explicit Tool class.&lt;/strong&gt; A simple class with three attributes: &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;input_model&lt;/code&gt;, &lt;code&gt;handler&lt;/code&gt;. No magic. No metaclasses. The registration is a dictionary assignment. The cost is a few extra lines per tool. The benefit is total clarity.&lt;/p&gt;

&lt;p&gt;Here's the conceptual pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────┐
│  Handler File: fhir_search.py                        │
│                                                      │
│  1. Input Model (Pydantic)                           │
│     query: str                                       │
│     version: Optional[str]                           │
│     kind: Optional[str]                              │
│     top_n: int = 10                                  │
│                                                      │
│  2. Handler Function                                 │
│     Takes validated input → queries SQLite FTS        │
│     Returns list of matching resources               │
│                                                      │
│  3. Tool Object                                      │
│     name = "fhir.search"                             │
│     input_model = FhirSearchInput                    │
│     handler = fhir_search_handler                    │
└──────────────────────────────────────────────────────┘
          │
          ▼
┌──────────────────────────────────────────────────────┐
│  Registry: tools.py                                  │
│                                                      │
│  TOOL_REGISTRY = {                                   │
│      "fhir.search": fhir_search_tool,                │
│      "fhir.get_definition": fhir_get_definition_tool,│
│      ...                                             │
│  }                                                   │
└──────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
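&lt;p&gt;The pattern in the diagram can be sketched in a few lines. The real server validates inputs with Pydantic models; a dataclass stands in here to keep the sketch dependency-free, and the stub handler's return value is invented:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Stand-in for the real Pydantic input model (illustrative, dependency-free).
@dataclass
class FhirSearchInput:
    query: str
    version: Optional[str] = None
    kind: Optional[str] = None
    top_n: int = 10

# The explicit Tool class: three attributes, no magic, no metaclasses.
@dataclass
class Tool:
    name: str
    input_model: type
    handler: Callable[[Any], Any]

def fhir_search_handler(params):
    # The real handler queries the SQLite FTS index; stubbed for the sketch.
    return [{"name": "Patient", "matched": params.query}][: params.top_n]

fhir_search_tool = Tool("fhir.search", FhirSearchInput, fhir_search_handler)

# Registration is just a dictionary assignment.
TOOL_REGISTRY = {"fhir.search": fhir_search_tool}

def invoke(tool_name, raw_input):
    tool = TOOL_REGISTRY[tool_name]
    return tool.handler(tool.input_model(**raw_input))  # validation happens here

result = invoke("fhir.search", {"query": "allergy"})
```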



&lt;h3&gt;
  
  
  Tool-by-Tool: The Thinking Behind Each One
&lt;/h3&gt;

&lt;p&gt;Let's walk through each tool and the reasoning behind it.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;fhir.get_definition&lt;/code&gt; — The Surgical Lookup
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Given a FHIR version, resource kind, and name, returns the metadata (and optionally the full JSON) for that specific resource.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it exists:&lt;/strong&gt; This is the most fundamental operation. When an AI is discussing the Patient resource, it needs to be able to say "let me look that up" and get the authoritative definition. Not a search result. Not a "maybe." The exact definition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design choices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;include_json&lt;/code&gt; defaults to &lt;code&gt;false&lt;/code&gt;. Metadata (name, title, canonical URL, version, description) is usually enough for the AI to answer a question. The full JSON is huge and should only be retrieved when specifically needed.&lt;/li&gt;
&lt;li&gt;When &lt;code&gt;include_json&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;, the payload is &lt;strong&gt;truncated to 10,000 characters&lt;/strong&gt;. A full StructureDefinition can be 50KB+. Truncation keeps the response within reasonable context window limits while still providing useful structural information.&lt;/li&gt;
&lt;li&gt;Returns &lt;code&gt;(meta_dict, json_string)&lt;/code&gt; — separating metadata from the payload lets the AI decide what to use without parsing raw JSON.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;fhir.search&lt;/code&gt; — The Exploration Tool
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Full-text search across all indexed resources, with optional filters for version, kind, and IG.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it exists:&lt;/strong&gt; Sometimes the AI doesn't know the exact resource name. A user might ask "what FHIR resource handles allergies?" The AI needs to &lt;em&gt;search&lt;/em&gt;, not just look up. This tool lets it query the index the same way a human would search a specification.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design choices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;top_n&lt;/code&gt; defaults to 10. Returning too many results wastes context. 10 is enough for the AI to find what it needs.&lt;/li&gt;
&lt;li&gt;Filters are all optional. You can search across everything (&lt;code&gt;query: "allergy"&lt;/code&gt;), or narrow it down (&lt;code&gt;query: "allergy", version: "R4", kind: "StructureDefinition"&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Results include metadata only, not full JSON. If the AI finds what it's looking for, it can follow up with &lt;code&gt;fhir.get_definition&lt;/code&gt; for the full payload.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;ig.list&lt;/code&gt; — The Discovery Tool
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Returns a list of all Implementation Guides that have been indexed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it exists:&lt;/strong&gt; Before the AI can query an IG, it needs to know what IGs are available. This tool answers the question "what IGs does this server know about?" It's the starting point for IG-related conversations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design choices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes no input. It's purely a discovery mechanism.&lt;/li&gt;
&lt;li&gt;Returns package name, version, and FHIR version for each IG.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;uscore.get_profile&lt;/code&gt; — The Shortcut
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Fetches a US Core profile by version and name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it exists:&lt;/strong&gt; US Core is &lt;em&gt;the&lt;/em&gt; most commonly referenced IG in US healthcare development. Having a dedicated tool for it (instead of making the AI use &lt;code&gt;fhir.get_definition&lt;/code&gt; with the right package name) reduces the number of parameters the AI needs to get right and makes the common case faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design choices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separate from &lt;code&gt;fhir.get_definition&lt;/code&gt; even though it queries the same database. The semantic distinction matters to the AI — "get a US Core profile" is a different intent than "get a FHIR definition."&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;fhir.diff_versions&lt;/code&gt; — The Migration Helper
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Compares a StructureDefinition between two FHIR versions (e.g., R4 vs R5).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it exists:&lt;/strong&gt; One of the most common questions in FHIR development is "what changed between versions?" When migrating from R4 to R5, developers need to know which elements were added, removed, or renamed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design choices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Currently does a &lt;strong&gt;metadata-level diff only&lt;/strong&gt; — comparing the top-level fields. A full element-path diff (comparing every element in the differential/snapshot) is complex and was deferred.&lt;/li&gt;
&lt;li&gt;The tool exists with partial functionality rather than not existing at all. This is deliberate: the AI knows the capability exists and can provide partial answers ("the metadata changed in these ways, though a full element diff isn't available yet") rather than no answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;validate.instance&lt;/code&gt; — The Placeholder
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; Nothing, currently. Returns a "not implemented" response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it exists as a stub:&lt;/strong&gt; We wanted the tool in the registry from day one, even though validation is hard. Why?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It signals intent. Other developers (and the AI itself) can see that validation is a planned capability.&lt;/li&gt;
&lt;li&gt;It establishes the input contract early. The Pydantic model defines what validation will eventually accept.&lt;/li&gt;
&lt;li&gt;It fails gracefully. If the AI tries to use it, it gets a clear "not implemented" message rather than a confusing error.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Phase 4: The Transport Layer — Less Is More
&lt;/h2&gt;

&lt;h3&gt;
  
  
  stdio: The Primary Transport
&lt;/h3&gt;

&lt;p&gt;MCP's standard transport is JSON-RPC over stdio. The client (Claude Desktop, Cursor, etc.) spawns the server as a child process, sends JSON on stdin, and reads JSON from stdout. stderr is reserved for logging.&lt;/p&gt;

&lt;p&gt;Our stdio transport is surprisingly simple. The core loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Read a line from stdin
2. Parse it as JSON
3. Route to the right handler based on the "method" field
4. Serialize the response as JSON
5. Write it to stdout + newline
6. Flush
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things we learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Always flush stdout.&lt;/strong&gt; If you don't explicitly flush after writing, the response may sit in a buffer and the client will hang waiting for it. This bit us during early testing — everything worked in manual testing (where stdout is line-buffered to a terminal) but hung in Claude Desktop (where stdout is fully buffered to a pipe).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log to stderr, never stdout.&lt;/strong&gt; Stdout is the protocol channel. Any print statement that goes to stdout will be interpreted as a JSON-RPC message and break the protocol. We learned to use &lt;code&gt;print(..., file=sys.stderr)&lt;/code&gt; for all diagnostic output and configured Python's logging to write to stderr.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Catch and serialize all exceptions.&lt;/strong&gt; If the handler throws, the transport catches it and returns a structured error response. If the transport itself throws (e.g., malformed JSON), it still writes a valid JSON error to stdout. The client should never see a raw traceback on the protocol channel.&lt;/p&gt;
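&lt;p&gt;All three lessons fit in a loop of a dozen lines. This is an illustrative sketch, not the server's actual code; the &lt;code&gt;ping&lt;/code&gt; method and error shape are invented (real MCP traffic is JSON-RPC 2.0):&lt;/p&gt;

```python
import json
import sys

def handle_line(line):
    try:
        request = json.loads(line)
        method = request.get("method")
        if method == "ping":
            response = {"result": "pong"}
        else:
            response = {"error": {"message": f"unknown method: {method}"}}
    except Exception as exc:  # never let a raw traceback reach the protocol channel
        response = {"error": {"message": str(exc)}}
    return json.dumps(response)

def serve():
    for line in sys.stdin:                             # 1-2. read and parse
        sys.stdout.write(handle_line(line) + "\n")     # 4-5. serialize + newline
        sys.stdout.flush()                             # 6. flush, or the client hangs
        print("handled one request", file=sys.stderr)  # diagnostics: stderr only
```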

&lt;h3&gt;
  
  
  HTTP: The Development Convenience
&lt;/h3&gt;

&lt;p&gt;We added a simple HTTP transport for development and testing. It runs the same handlers but accepts requests via HTTP POST instead of stdin.&lt;/p&gt;

&lt;p&gt;Why? Because testing via stdin is painful. You have to pipe JSON into the process, read from stdout, and deal with buffering. With HTTP, you can use &lt;code&gt;curl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"method": "invoke_tool", "params": {"name": "fhir.search", "input": {"query": "Patient"}}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The HTTP server also exposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;GET /health&lt;/code&gt; — for readiness probes (important for Tilt, which we'll cover in Part 3)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /tools&lt;/code&gt; — quick way to see what tools are available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We built this using Python's built-in &lt;code&gt;http.server&lt;/code&gt; module — no Flask, no FastAPI, no additional dependencies. For a dev-only transport, stdlib is enough.&lt;/p&gt;
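&lt;p&gt;A stdlib-only sketch of that dev transport, including the &lt;code&gt;/health&lt;/code&gt; and &lt;code&gt;/tools&lt;/code&gt; routes. The payload shapes and tool list are illustrative:&lt;/p&gt;

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Dev-only HTTP transport sketch, stdlib only. Routes follow the article;
# the response payloads and tool list here are illustrative.
TOOLS = ["fhir.search", "fhir.get_definition", "ig.list"]

class DevHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload, status=200):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/health":
            self._send_json({"status": "ok"})  # readiness probe
        elif self.path == "/tools":
            self._send_json({"tools": TOOLS})
        else:
            self._send_json({"error": "not found"}, status=404)

    def log_message(self, *args):  # silence default stderr chatter in the sketch
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), DevHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/health") as resp:
    health = json.loads(resp.read())
server.shutdown()
```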




&lt;h2&gt;
  
  
  The Glue: How Settings Hold It All Together
&lt;/h2&gt;

&lt;p&gt;Configuration flows through a single &lt;code&gt;Settings&lt;/code&gt; class built with Pydantic Settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Settings&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;data_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;         &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data"&lt;/span&gt;                              &lt;span class="s"&gt;(base data directory)&lt;/span&gt;
  &lt;span class="na"&gt;index_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;       &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/index/fhir_index.sqlite"&lt;/span&gt;      &lt;span class="s"&gt;(SQLite index)&lt;/span&gt;
  &lt;span class="na"&gt;packages_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/packages"&lt;/span&gt;                     &lt;span class="s"&gt;(FHIR packages)&lt;/span&gt;
  &lt;span class="na"&gt;fixtures_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data/fixtures"&lt;/span&gt;                     &lt;span class="s"&gt;(demo data)&lt;/span&gt;
  &lt;span class="na"&gt;log_level&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;        &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INFO"&lt;/span&gt;
  &lt;span class="na"&gt;storage_backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite"&lt;/span&gt;                            &lt;span class="s"&gt;(or "postgres")&lt;/span&gt;
  &lt;span class="na"&gt;pg_host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;          &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost"&lt;/span&gt;                          &lt;span class="s"&gt;(PostgreSQL config)&lt;/span&gt;
  &lt;span class="na"&gt;pg_port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;          &lt;span class="m"&gt;5432&lt;/span&gt;
  &lt;span class="na"&gt;pg_database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;      &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fhir_mcp"&lt;/span&gt;
  &lt;span class="s"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is configurable via environment variables with the &lt;code&gt;FHIR_MCP_&lt;/code&gt; prefix. So &lt;code&gt;FHIR_MCP_INDEX_PATH=/custom/path.sqlite&lt;/code&gt; overrides the default index path.&lt;/p&gt;

&lt;p&gt;Why Pydantic Settings instead of just &lt;code&gt;os.environ.get()&lt;/code&gt;? Because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Type coercion.&lt;/strong&gt; &lt;code&gt;pg_port&lt;/code&gt; is declared as &lt;code&gt;int&lt;/code&gt;, so the string from the environment is automatically converted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defaults in one place.&lt;/strong&gt; You can read the Settings class and see every configuration option with its default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation at startup.&lt;/strong&gt; If you set &lt;code&gt;FHIR_MCP_PG_PORT=not_a_number&lt;/code&gt;, Pydantic catches it immediately rather than failing on first database connection.&lt;/li&gt;
&lt;/ul&gt;
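&lt;p&gt;To make the mechanics concrete, here is a stdlib approximation of what Pydantic Settings provides: typed defaults in one place, overridable via &lt;code&gt;FHIR_MCP_&lt;/code&gt;-prefixed environment variables, with eager type checking. The defaults mirror the fields above; this is not the project's actual implementation:&lt;/p&gt;

```python
import os

# Stdlib approximation of the Pydantic Settings behavior described above.
# Only a subset of fields is shown.
DEFAULTS = {
    "data_dir": "data",
    "index_path": "data/index/fhir_index.sqlite",
    "log_level": "INFO",
    "storage_backend": "sqlite",
    "pg_host": "localhost",
    "pg_port": 5432,
}

def load_settings(environ=os.environ):
    settings = {}
    for key, default in DEFAULTS.items():
        raw = environ.get(f"FHIR_MCP_{key.upper()}")
        if raw is None:
            settings[key] = default
        elif isinstance(default, int):
            settings[key] = int(raw)  # fails at startup, not at first DB connection
        else:
            settings[key] = raw
    return settings

settings = load_settings({"FHIR_MCP_PG_PORT": "5433"})
```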




&lt;h2&gt;
  
  
  What Surprised Us About Building Tools for AI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Surprise 1: The AI prefers narrow tools over flexible ones
&lt;/h3&gt;

&lt;p&gt;We initially tried to build a single "query" tool that could do lookups, search, and filtering all in one. The AI struggled with it — too many optional parameters, too many modes. When we split it into focused tools (&lt;code&gt;get_definition&lt;/code&gt; for exact lookup, &lt;code&gt;search&lt;/code&gt; for exploration, &lt;code&gt;ig.list&lt;/code&gt; for discovery), the AI's tool selection accuracy improved dramatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: Build many focused tools, not few flexible ones.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Surprise 2: Optional fields need good defaults
&lt;/h3&gt;

&lt;p&gt;When we had &lt;code&gt;top_n&lt;/code&gt; as a required field on the search tool, the AI would sometimes send &lt;code&gt;top_n: 100&lt;/code&gt; or &lt;code&gt;top_n: 1000&lt;/code&gt;. When we made it optional with a default of 10, the AI almost always omitted it (using the default) or sent a reasonable value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: Defaults guide AI behavior. Choose them carefully.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Surprise 3: Error messages are consumed by the AI, not humans
&lt;/h3&gt;

&lt;p&gt;When a tool returns an error, the AI reads it and decides what to do next. We initially returned generic errors like &lt;code&gt;{"error": "Not found"}&lt;/code&gt;. The AI would then tell the user "the resource wasn't found" without any helpful context. When we improved errors to include specifics — &lt;code&gt;{"error": "StructureDefinition 'Patientt' not found in R4. Did you mean 'Patient'?"}&lt;/code&gt; — the AI became much better at self-correcting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: Write error messages for your AI caller, not for a log file.&lt;/strong&gt;&lt;/p&gt;
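&lt;p&gt;One cheap way to produce that kind of self-correcting error is fuzzy matching against the names already in the index. A sketch using stdlib &lt;code&gt;difflib&lt;/code&gt; (the name list and message wording are illustrative):&lt;/p&gt;

```python
import difflib

# Illustrative slice of names from the index; the real list comes from SQLite.
KNOWN_NAMES = ["Patient", "Practitioner", "AllergyIntolerance", "Observation"]

def not_found_error(name, version):
    message = f"StructureDefinition '{name}' not found in {version}."
    close = difflib.get_close_matches(name, KNOWN_NAMES, n=1)
    if close:
        # Gives the AI caller something concrete to retry with.
        message += f" Did you mean '{close[0]}'?"
    return {"error": message}

err = not_found_error("Patientt", "R4")
```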

&lt;h3&gt;
  
  
  Surprise 4: The storage backend swap validated the architecture
&lt;/h3&gt;

&lt;p&gt;Halfway through development, we decided to add PostgreSQL as an alternative storage backend (for teams that wanted shared indexes or larger datasets). Because we'd built the storage layer as an interface — &lt;code&gt;get_definition_by_name()&lt;/code&gt;, &lt;code&gt;search_definitions()&lt;/code&gt;, &lt;code&gt;list_igs()&lt;/code&gt; — we could add a Postgres implementation without touching a single handler or transport file.&lt;/p&gt;

&lt;p&gt;The storage module uses a simple factory based on an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;FHIR_MCP_STORAGE_BACKEND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sqlite  →  uses sqlite_store
&lt;span class="nv"&gt;FHIR_MCP_STORAGE_BACKEND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres →  uses postgres_store
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PostgreSQL uses &lt;code&gt;tsvector&lt;/code&gt;/&lt;code&gt;tsquery&lt;/code&gt; for full-text search instead of FTS5. The query interface is the same. The handlers don't know or care which backend is active.&lt;/p&gt;
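&lt;p&gt;The factory itself is only a few lines. The store classes below are stand-ins that expose the same interface the handlers call; the real implementations wrap SQLite FTS5 and Postgres &lt;code&gt;tsvector&lt;/code&gt; queries:&lt;/p&gt;

```python
import os

# Illustrative stand-ins for the two storage backends.
class SqliteStore:
    def search_definitions(self, query):
        return f"sqlite fts5 search: {query}"

class PostgresStore:
    def search_definitions(self, query):
        return f"postgres tsvector search: {query}"

_BACKENDS = {"sqlite": SqliteStore, "postgres": PostgresStore}

def get_store(environ=os.environ):
    # Simple factory keyed on an environment variable, defaulting to SQLite.
    backend = environ.get("FHIR_MCP_STORAGE_BACKEND", "sqlite")
    if backend not in _BACKENDS:
        raise ValueError(f"unknown storage backend: {backend}")
    return _BACKENDS[backend]()

store = get_store({"FHIR_MCP_STORAGE_BACKEND": "postgres"})
```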

&lt;p&gt;&lt;strong&gt;Lesson: Layer your architecture. The decision to separate storage from handlers paid for itself within weeks.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Coming Up in Part 3
&lt;/h2&gt;

&lt;p&gt;In the final post, we'll cover the operational side: how we test an MCP server, the developer experience with Tilt and Docker, lessons learned about deploying to different clients (Claude Desktop vs Cursor), and what we'd do differently if we started over today.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 2 of a 3-part series.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://dev.to/chaets/mcp-the-missing-layer-between-ai-and-your-application-fdj"&gt;Part 0: MCP — The Missing Layer Between AI and Your Application →&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-1-why-we-built-an-mcp-server-and-what-we-learned-before-writing-a-single-line-of-code-4mao"&gt;Part 1: Why We Built an MCP Server — And What We Learned Before Writing a Single Line of Code&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-2-building-the-engine-tools-uris-and-the-art-of-indexing-fhir-fi1"&gt;Part 2: Building the Engine — Tools, URIs, and the Art of Indexing FHIR&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-3-testing-deploying-and-lessons-learned-aa5"&gt;Part 3: Testing, Deploying, and Lessons Learned -&amp;gt; coming soon&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you'd like to connect, find me on &lt;a href="https://www.linkedin.com/in/chaets/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or drop me a message. I'd love to explore how I can help drive your data success!&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>fhir</category>
      <category>healthtech</category>
    </item>
    <item>
      <title>Part 1: Why We Built an MCP Server — And What We Learned Before Writing a Single Line of Code</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Mon, 02 Mar 2026 04:39:47 +0000</pubDate>
      <link>https://forem.com/chaets/part-1-why-we-built-an-mcp-server-and-what-we-learned-before-writing-a-single-line-of-code-4mao</link>
      <guid>https://forem.com/chaets/part-1-why-we-built-an-mcp-server-and-what-we-learned-before-writing-a-single-line-of-code-4mao</guid>
      <description>&lt;p&gt;&lt;em&gt;A three-part series on building our first Model Context Protocol server for healthcare interoperability.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem That Wouldn't Go Away
&lt;/h2&gt;

&lt;p&gt;If you've ever worked in healthcare tech, you know the feeling: someone asks an AI assistant — Claude, ChatGPT, Copilot, whatever — a question about FHIR (Fast Healthcare Interoperability Resources), and the answer is &lt;em&gt;close&lt;/em&gt; but dangerously wrong. Maybe it hallucinates a field that doesn't exist in R4. Maybe it confuses a US Core profile with a base resource. Maybe it confidently describes an element that was removed two versions ago.&lt;/p&gt;

&lt;p&gt;This isn't the AI's fault. FHIR is a vast, versioned specification. The core spec alone has hundreds of StructureDefinitions, ValueSets, and CodeSystems. Layer on Implementation Guides (IGs) like US Core, and you're dealing with thousands of artifacts across multiple versions (R4, R4B, R5). No language model has all of that committed to memory with version-level precision.&lt;/p&gt;

&lt;p&gt;We kept running into this problem on our team. We'd be deep in implementation work — mapping clinical data, validating resources, reviewing profiles — and every time we turned to an AI for help, we had to mentally fact-check every response against the actual specification. It was exhausting.&lt;/p&gt;

&lt;p&gt;So we asked ourselves: &lt;strong&gt;what if the AI could just &lt;em&gt;look it up&lt;/em&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not from a web search. Not from its training data. From the actual, versioned, canonical FHIR packages sitting right on our machine.&lt;/p&gt;

&lt;p&gt;That's how &lt;code&gt;fhir-mcp&lt;/code&gt; was born.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why MCP? (And Why Not Just an API?)
&lt;/h2&gt;

&lt;p&gt;Before we chose the Model Context Protocol, we considered the obvious alternatives:&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Fine-tune a model on FHIR specs
&lt;/h3&gt;

&lt;p&gt;We dismissed this quickly. FHIR evolves. New IGs are published constantly. Fine-tuning is expensive, slow, and creates a snapshot in time. We needed something that could reflect the state of &lt;em&gt;your&lt;/em&gt; local packages — whatever you've got downloaded today.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: RAG (Retrieval-Augmented Generation) pipeline
&lt;/h3&gt;

&lt;p&gt;This was tempting. Embed all the JSON, throw it in a vector store, retrieve context at query time. But we realized two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FHIR resources are highly structured JSON, not prose. Embedding-based search over deeply nested JSON objects loses the structural relationships that matter most.&lt;/li&gt;
&lt;li&gt;We didn't just want "related text chunks." We wanted the AI to be able to call specific, typed operations: "get me the Patient StructureDefinition from R4," "search across all indexed resources for 'blood pressure,'" "diff the Observation resource between R4 and R5."&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Option 3: Build a REST API and tell the user to paste results
&lt;/h3&gt;

&lt;p&gt;This works, but it breaks the flow. The whole point was to let the AI &lt;em&gt;autonomously&lt;/em&gt; look things up during a conversation — not to make the human be the middleware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why MCP Won
&lt;/h3&gt;

&lt;p&gt;MCP is purpose-built for exactly this: giving AI models structured access to external data and tools. Instead of building a generic API and hoping the AI figures out how to use it, MCP lets you declare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt;: Functions the AI can call with typed inputs. "Here's a function called &lt;code&gt;fhir.search&lt;/code&gt; that takes a query string and optional filters and returns matching FHIR resources."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resources&lt;/strong&gt;: Read-only data the AI can access via URIs. "Here's &lt;code&gt;fhir://R4/StructureDefinition/Patient&lt;/code&gt; — read it to get the Patient definition."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompts&lt;/strong&gt;: Reusable prompt templates. "Here's a prompt called &lt;code&gt;summarize_profile&lt;/code&gt; that guides you to explain a FHIR profile in plain language."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI doesn't need to know &lt;em&gt;how&lt;/em&gt; we indexed the data, or where the SQLite database lives, or how the JSON was normalized. It just sees a clean interface of tools it can call.&lt;/p&gt;

&lt;p&gt;And critically: &lt;strong&gt;MCP is transport-agnostic&lt;/strong&gt;. The same server can talk to Claude Desktop over stdio, to Cursor over stdio, or to a web client over HTTP. We wouldn't have to rewrite anything when switching clients.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architectural Decisions We Made on Day One
&lt;/h2&gt;

&lt;p&gt;Before writing any code, we spent time on design decisions that would shape everything downstream. Here's what we chose and &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 1: Local-First, Read-Only
&lt;/h3&gt;

&lt;p&gt;We made a hard rule: &lt;strong&gt;this server will never write data, and it will never call external APIs at runtime.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why? Because this is a healthcare context. We're indexing StructureDefinitions, not patient data — but even so, the principle matters. If you're building developer tooling in health tech, you want to be able to say "this thing runs entirely on your machine with zero network calls" without an asterisk.&lt;/p&gt;

&lt;p&gt;This also made the architecture simpler. No auth, no API keys, no rate limits, no network error handling in the hot path. The server boots, reads from a local SQLite database, and responds. That's it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision 2: Index First, Serve Second
&lt;/h3&gt;

&lt;p&gt;We realized early that "just read the JSON files at query time" wouldn't scale. A full FHIR R4 package has thousands of JSON files. Searching them by scanning the filesystem on every query would be unacceptably slow.&lt;/p&gt;

&lt;p&gt;So we split the system into two phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Index phase&lt;/strong&gt; (offline): Read every FHIR package, extract metadata from each resource, and store it in a SQLite database with FTS5 (full-text search). This runs once, before the server starts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serve phase&lt;/strong&gt; (runtime): The MCP server only talks to the SQLite database. Fast, predictable, no filesystem scanning.&lt;/li&gt;
&lt;/ol&gt;
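
&lt;p&gt;The two phases can be sketched with the standard library alone. Everything below is illustrative (the table layout, column names, and sample rows are not the project's actual schema), but it shows why serving from an FTS5 index is fast:&lt;/p&gt;

```python
import sqlite3

# Two-phase sketch: index once into SQLite FTS5, then serve queries
# from the index. Table layout, column names, and sample rows are
# illustrative, not the project's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE resources USING fts5(name, kind, version, description)"
)

# Index phase (offline): insert metadata extracted from each package file.
rows = [
    ("Patient", "StructureDefinition", "R4", "Demographics and administrative info"),
    ("Observation", "StructureDefinition", "R4", "Measurements such as blood pressure"),
]
conn.executemany("INSERT INTO resources VALUES (?, ?, ?, ?)", rows)

# Serve phase (runtime): the server only ever runs queries like this one.
hits = conn.execute(
    "SELECT name FROM resources WHERE resources MATCH ? ORDER BY rank",
    ("blood pressure",),
).fetchall()
print(hits)  # [('Observation',)]
```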

&lt;p&gt;This was one of our best decisions. It meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The indexer could be ugly and slow — it only runs once.&lt;/li&gt;
&lt;li&gt;The server could be fast and simple — it only does SQL queries.&lt;/li&gt;
&lt;li&gt;We could later swap SQLite for PostgreSQL without touching the server code (and we did).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Decision 3: One Handler Per Tool, Pydantic for Everything
&lt;/h3&gt;

&lt;p&gt;We debated putting all tool logic in one big handler file. We're glad we didn't.&lt;/p&gt;

&lt;p&gt;Each MCP tool gets its own file. Each file defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Pydantic model&lt;/strong&gt; for the tool's input&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;handler function&lt;/strong&gt; that takes the validated input and returns a result&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Tool object&lt;/strong&gt; that bundles the name, input model, and handler together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's why this pattern matters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Validation happens before logic.&lt;/strong&gt; If an AI sends garbage input, Pydantic catches it and returns a structured error. The handler never sees invalid data. This is crucial when your caller is an AI — they &lt;em&gt;will&lt;/em&gt; send unexpected inputs, and you need to fail cleanly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Each tool is independently testable.&lt;/strong&gt; You can unit test the search handler without spinning up the transport layer. You can test the diff handler without having any other tools registered.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adding a new tool is mechanical.&lt;/strong&gt; Create a file, define a Pydantic model, write the handler, register it in the tool registry. No touching the transport layer, no modifying the main server loop.&lt;/p&gt;

&lt;p&gt;Here's a simplified example of what one handler looks like conceptually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌───────────────────────────────────────┐
│  FhirSearchInput (Pydantic Model)     │
│  ├── query: str                       │
│  ├── version: Optional[str]           │
│  ├── kind: Optional[str]              │
│  └── top_n: int = 10                  │
├───────────────────────────────────────┤
│  fhir_search_handler(input) -&amp;gt; list   │
│  └── Calls into SQLite FTS5 search    │
├───────────────────────────────────────┤
│  Tool("fhir.search", model, handler)  │
│  └── Registered in TOOL_REGISTRY      │
└───────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
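
&lt;p&gt;As a runnable sketch of that same shape, here is the pattern with stdlib dataclasses standing in for Pydantic; the names and the stub handler body are illustrative:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, Optional

# The same shape as the diagram above, with stdlib dataclasses standing
# in for Pydantic. Names and the stub handler body are illustrative.

@dataclass
class FhirSearchInput:
    query: str
    version: Optional[str] = None
    kind: Optional[str] = None
    top_n: int = 10

    def __post_init__(self):
        # Validation runs before any handler logic sees the input.
        if not self.query.strip():
            raise ValueError("query must be a non-empty string")

@dataclass
class Tool:
    name: str
    input_model: type
    handler: Callable

def fhir_search_handler(params: FhirSearchInput) -> list:
    # The real handler would call into the SQLite FTS5 search layer.
    return [{"name": "Observation", "version": params.version or "R4"}]

fhir_search_tool = Tool("fhir.search", FhirSearchInput, fhir_search_handler)

result = fhir_search_tool.handler(FhirSearchInput(query="blood pressure"))
print(result)
```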



&lt;h3&gt;
  
  
  Decision 4: Registry Pattern for Discovery
&lt;/h3&gt;

&lt;p&gt;MCP requires the server to respond to &lt;code&gt;list_tools&lt;/code&gt;, &lt;code&gt;list_resources&lt;/code&gt;, and &lt;code&gt;list_prompts&lt;/code&gt; requests. The client needs to know what's available before it can call anything.&lt;/p&gt;

&lt;p&gt;We used a simple dictionary registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;TOOL_REGISTRY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fhir.get_definition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fhir_get_definition_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fhir.search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fhir_search_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ig.list&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ig_list_tool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is deliberately low-tech. No decorators, no metaclasses, no auto-discovery. Just a dictionary. When the transport layer receives &lt;code&gt;list_tools&lt;/code&gt;, it returns the keys. When it receives &lt;code&gt;invoke_tool&lt;/code&gt;, it looks up the tool by name and calls it.&lt;/p&gt;

&lt;p&gt;Why not something fancier? Because &lt;strong&gt;we wanted to see the full list of tools in one place&lt;/strong&gt;. When you're building an MCP server, the tool inventory is your API surface. Making it explicit and visible in a single file means any developer can open that one file and understand the entire capability set of the server.&lt;/p&gt;
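
&lt;p&gt;The whole dispatch path fits in a few lines. In this sketch the lambda handlers are stand-ins for the real tool objects:&lt;/p&gt;

```python
# Dispatch sketch: list_tools returns the registry keys, invoke_tool looks
# the tool up by name. The lambda handlers are stand-ins for real tools.
TOOL_REGISTRY = {
    "fhir.search": lambda params: {"result": ["Patient", "Observation"]},
    "ig.list": lambda params: {"result": ["hl7.fhir.us.core#5.0.1"]},
}

def list_tools():
    return sorted(TOOL_REGISTRY)

def invoke_tool(name, params):
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    return TOOL_REGISTRY[name](params)

print(list_tools())  # ['fhir.search', 'ig.list']
print(invoke_tool("fhir.search", {}))  # {'result': ['Patient', 'Observation']}
```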

&lt;h3&gt;
  
  
  Decision 5: Transport as a Thin Layer
&lt;/h3&gt;

&lt;p&gt;The transport layer (stdio, HTTP) should do as little as possible. Its job is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read a JSON-RPC request from the wire (stdin or HTTP body).&lt;/li&gt;
&lt;li&gt;Route it to the right handler.&lt;/li&gt;
&lt;li&gt;Write the JSON-RPC response back.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All business logic lives in the handlers. All data access lives in the storage layer. The transport is just plumbing.&lt;/p&gt;

&lt;p&gt;This was validated when we added HTTP transport for development. The handler code didn't change at all. We just wrote a new way to receive requests and send responses. The HTTP server even reuses the same tool registry and the same routing logic.&lt;/p&gt;
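
&lt;p&gt;A minimal sketch of that plumbing, assuming line-delimited JSON-RPC framing (real MCP framing differs in details, and the method names here are illustrative):&lt;/p&gt;

```python
import json
import sys

# Thin-transport sketch: read a JSON-RPC message, route it, write the
# response back. Line-delimited framing and the method names are
# illustrative; real MCP framing differs in details.

def route(request, registry):
    method = request.get("method")
    if method == "list_tools":
        result = sorted(registry)
    elif method == "invoke_tool":
        p = request.get("params", {})
        result = registry[p["name"]](p.get("arguments", {}))
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}

def stdio_loop(registry):
    # stdio transport: one message per line. An HTTP transport would
    # reuse route() unchanged and only swap the read/write plumbing.
    for line in sys.stdin:
        response = route(json.loads(line), registry)
        sys.stdout.write(json.dumps(response) + "\n")
        sys.stdout.flush()

registry = {"fhir.search": lambda args: ["Patient"]}
print(route({"jsonrpc": "2.0", "id": 1, "method": "list_tools"}, registry))
```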

&lt;p&gt;The architecture looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────┐
│                  TRANSPORT LAYER                │
│  ┌───────────────┐    ┌──────────────────────┐  │
│  │  stdio (prod) │    │  HTTP (dev/testing)  │  │
│  └──────┬────────┘    └──────────┬───────────┘  │
│         │                        │              │
│         └───────────┬────────────┘              │
│                     ▼                           │
│           ┌─────────────────┐                   │
│           │  Request Router │                   │
│           └────────┬────────┘                   │
│                    │                            │
├────────────────────┼────────────────────────────┤
│              REGISTRY LAYER                     │
│  ┌──────────┬──────┴──────┬───────────┐         │
│  │  Tools   │  Resources  │  Prompts  │         │
│  └────┬─────┘             └───────────┘         │
│       │                                         │
├───────┼─────────────────────────────────────────┤
│       │          HANDLER LAYER                  │
│  ┌────┴─────────────────────────────────┐       │
│  │  fhir.get_definition                 │       │
│  │  fhir.search                         │       │
│  │  ig.list                             │       │
│  │  uscore.get_profile                  │       │
│  │  fhir.diff_versions                  │       │
│  │  validate.instance                   │       │
│  └────┬─────────────────────────────────┘       │
│       │                                         │
├───────┼─────────────────────────────────────────┤
│       │         PACKAGES LAYER                  │
│  ┌────┴──────────────────────────────────────┐  │
│  │  fhir_index (loaders, normalize, search,  │  │
│  │             storage)                      │  │
│  │  fhir_diff, fhir_validate, uri_scheme     │  │
│  │  shared (models, cache, schemas)          │  │
│  └────┬──────────────────────────────────────┘  │
│       │                                         │
│       ▼                                         │
│  ┌──────────────┐                               │
│  │  SQLite/PG   │                               │
│  │  (FTS index) │                               │
│  └──────────────┘                               │
└─────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Hardest Lesson: Designing for an AI Caller is Different
&lt;/h2&gt;

&lt;p&gt;Here's something that surprised us. When you build a traditional API, your caller is a human developer who reads documentation, understands your mental model, and crafts requests thoughtfully.&lt;/p&gt;

&lt;p&gt;When your caller is an AI, everything changes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool naming matters enormously.&lt;/strong&gt; We learned that names like &lt;code&gt;fhir.get_definition&lt;/code&gt; and &lt;code&gt;fhir.search&lt;/code&gt; aren't just organizational — they're what the AI uses to decide &lt;em&gt;which tool to call&lt;/em&gt;. A vague name like &lt;code&gt;lookup&lt;/code&gt; or &lt;code&gt;query&lt;/code&gt; would lead to the AI guessing wrong. Namespaced, descriptive names (&lt;code&gt;fhir.get_definition&lt;/code&gt;, &lt;code&gt;uscore.get_profile&lt;/code&gt;, &lt;code&gt;fhir.diff_versions&lt;/code&gt;) gave the AI clear signals about when to use each tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Input schemas are the AI's documentation.&lt;/strong&gt; The Pydantic model for each tool isn't just for validation — it's what the AI reads to understand what inputs are expected. Field names, types, and defaults all serve as implicit documentation. We named fields like &lt;code&gt;version&lt;/code&gt;, &lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;top_n&lt;/code&gt; rather than abbreviations like &lt;code&gt;v&lt;/code&gt;, &lt;code&gt;k&lt;/code&gt;, &lt;code&gt;n&lt;/code&gt;, &lt;code&gt;limit&lt;/code&gt; because the AI interprets these names to understand their meaning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Return shape consistency matters.&lt;/strong&gt; Every tool returns a dict with predictable keys. The AI learns patterns quickly — if one tool returns &lt;code&gt;{"meta": {...}}&lt;/code&gt; and another returns &lt;code&gt;{"result": [...]}&lt;/code&gt;, it adapts. But inconsistency within a single tool across different call patterns (sometimes returning a list, sometimes a dict, sometimes a string) confuses it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Truncation is a feature, not a bug.&lt;/strong&gt; FHIR StructureDefinitions can be enormous — tens of thousands of characters of nested JSON. Sending the full thing back would blow the AI's context window. We learned to truncate payloads by default and only include the full JSON when explicitly requested (&lt;code&gt;include_json: true&lt;/code&gt;), and even then, cap it at a reasonable size.&lt;/p&gt;
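
&lt;p&gt;A sketch of that idea (the cap size and field names are illustrative):&lt;/p&gt;

```python
import json

# Truncation sketch: summarize by default, include raw JSON only on
# request, and cap it even then. MAX_CHARS and field names are illustrative.
MAX_CHARS = 2000

def render_resource(resource, include_json=False):
    summary = {"name": resource.get("name"), "kind": resource.get("resourceType")}
    if include_json:
        raw = json.dumps(resource)
        if len(raw) > MAX_CHARS:
            raw = raw[:MAX_CHARS] + "... [truncated]"
        summary["json"] = raw
    return summary

big = {"resourceType": "StructureDefinition", "name": "Patient",
       "snapshot": ["element"] * 5000}
print(len(render_resource(big, include_json=True)["json"]))  # 2015
```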




&lt;h2&gt;
  
  
  What We Didn't Build (And Why)
&lt;/h2&gt;

&lt;p&gt;Equally important to what we built is what we deliberately left out of v0.1:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No authentication.&lt;/strong&gt; This is a local-first, single-user tool. Auth would add complexity for zero benefit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No write operations.&lt;/strong&gt; The AI can look things up, not modify them. This was a safety and simplicity choice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No network calls at runtime.&lt;/strong&gt; Packages are fetched and indexed offline. The running server is fully air-gapped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No custom FHIR SDK.&lt;/strong&gt; We considered using existing FHIR Python libraries but decided raw JSON + SQLite was simpler, faster, and gave us full control over what we indexed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No schema validation at the FHIR level.&lt;/strong&gt; We have a &lt;code&gt;validate.instance&lt;/code&gt; tool, but it's deliberately a stub. Proper FHIR validation is an enormous problem (profiles, extensions, invariants, terminology binding). We wanted the tool to exist in the interface — to signal future intent — without pretending we'd solved it.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Setting Up: The Toolchain Choices
&lt;/h2&gt;

&lt;p&gt;A few notes on tooling, because they shaped the developer experience:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Python 3.13+ with &lt;code&gt;uv&lt;/code&gt;&lt;/strong&gt;: We chose Python because FHIR is a data-heavy domain and Python's ecosystem for data manipulation is unmatched. We used &lt;code&gt;uv&lt;/code&gt; for dependency management — it's fast, it respects &lt;code&gt;pyproject.toml&lt;/code&gt;, and it doesn't fight you. No &lt;code&gt;requirements.txt&lt;/code&gt; files, no virtualenv scripts. Just &lt;code&gt;uv sync&lt;/code&gt; and go.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pydantic v2&lt;/strong&gt;: For input validation and data modeling. Pydantic v2 is significantly faster than v1 and integrates cleanly with &lt;code&gt;pydantic-settings&lt;/code&gt; for environment-based configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQLite with FTS5&lt;/strong&gt;: For the search index. SQLite is zero-config, ships with Python, and FTS5 gives us full-text search without standing up Elasticsearch. For a local-first tool, this is perfect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;orjson&lt;/code&gt;&lt;/strong&gt;: For JSON serialization/deserialization. FHIR resources are large JSON objects, and &lt;code&gt;orjson&lt;/code&gt; is measurably faster than the stdlib &lt;code&gt;json&lt;/code&gt; module. In a server that's mostly reading and writing JSON, this matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  Coming Up in Part 2
&lt;/h2&gt;

&lt;p&gt;In the next post, we'll get into the actual implementation: how we built the indexer, designed the URI scheme, implemented the tool handlers, and wired everything together through the transport layer. We'll share the specific patterns that worked (and the ones we had to throw away).&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 1 of a 3-part series.&lt;/em&gt;&lt;br&gt;
&lt;a href="https://dev.to/chaets/mcp-the-missing-layer-between-ai-and-your-application-fdj"&gt;Part 0: MCP — The Missing Layer Between AI and Your Application →&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-1-why-we-built-an-mcp-server-and-what-we-learned-before-writing-a-single-line-of-code-4mao"&gt;Part 1: Why We Built an MCP Server — And What We Learned Before Writing a Single Line of Code&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-2-building-the-engine-tools-uris-and-the-art-of-indexing-fhir-fi1"&gt;Part 2: Building the Engine — Tools, URIs, and the Art of Indexing FHIR&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-3-testing-deploying-and-lessons-learned-aa5"&gt;Part 3: Testing, Deploying, and Lessons Learned -&amp;gt; coming soon&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’d like to connect, find me on &lt;a href="https://www.linkedin.com/in/chaets/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or drop me a message; I’d love to explore how I can help drive your data success!&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>interoperability</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>Part 0: MCP — The Missing Layer Between AI and Your Application</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Sat, 21 Feb 2026 15:07:31 +0000</pubDate>
      <link>https://forem.com/chaets/mcp-the-missing-layer-between-ai-and-your-application-fdj</link>
      <guid>https://forem.com/chaets/mcp-the-missing-layer-between-ai-and-your-application-fdj</guid>
      <description>&lt;p&gt;&lt;em&gt;A prequel to my three-part series on building an MCP server. This post stands on its own — no code, no codebase required. Just the idea that changed how we think about AI integration.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  AI Has a Context Problem
&lt;/h2&gt;

&lt;p&gt;Let's start with an uncomfortable truth: the AI you're chatting with right now doesn't know your application.&lt;/p&gt;

&lt;p&gt;It doesn't know your database schema. It doesn't know which API version you're running in production. It doesn't know that your team renamed &lt;code&gt;user_id&lt;/code&gt; to &lt;code&gt;account_id&lt;/code&gt; six months ago, or that your FHIR implementation uses US Core 5.0.1, not 6.1.0, or that the &lt;code&gt;Observation&lt;/code&gt; resource in your system carries a custom extension for lab accession numbers.&lt;/p&gt;

&lt;p&gt;The AI knows &lt;em&gt;a lot about the world in general&lt;/em&gt;. But it knows &lt;em&gt;almost nothing about your world in particular&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;And this isn't a failure of AI. It's a failure of plumbing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Way We Integrate AI Today Is Backwards
&lt;/h2&gt;

&lt;p&gt;Think about how most teams add AI to their workflow today:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy some context from your app (a schema, a log snippet, an error message).&lt;/li&gt;
&lt;li&gt;Paste it into an AI chat window.&lt;/li&gt;
&lt;li&gt;Hope the AI interprets it correctly.&lt;/li&gt;
&lt;li&gt;Read the response and mentally cross-reference it against reality.&lt;/li&gt;
&lt;li&gt;Repeat.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is the human-as-middleware pattern. &lt;strong&gt;You&lt;/strong&gt; are the integration layer between the AI and your application. You ferry data back and forth, translate context, and validate every response because the AI has no independent way to check its own answers.&lt;/p&gt;

&lt;p&gt;It works. Kind of. But it doesn't scale. And in domains where precision matters — healthcare, finance, infrastructure, compliance — "kind of works" is a liability.&lt;/p&gt;

&lt;p&gt;Consider what happens when a developer asks an AI assistant:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What are the required fields in a FHIR R4 Patient resource?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI might answer from its training data. Maybe it's right. Maybe it's describing R3 fields. Maybe it's mixing in elements from a US Core profile without saying so. Maybe it hallucinated a field that never existed. The developer has no way to tell without opening the specification themselves — which defeats the purpose of asking the AI in the first place.&lt;/p&gt;

&lt;p&gt;Now imagine the AI could do this instead:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Let me look that up."&lt;/em&gt;&lt;br&gt;
&lt;em&gt;(calls &lt;code&gt;fhir.get_definition&lt;/code&gt; with version=R4, kind=StructureDefinition, name=Patient)&lt;/em&gt;&lt;br&gt;
&lt;em&gt;"Here's the Patient resource from the R4 specification. The required elements are..."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Same question. But now the answer is grounded in the actual specification, not a statistical approximation of it. The AI didn't guess — it looked it up. Just like you would.&lt;/p&gt;

&lt;p&gt;That's what MCP enables.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is MCP, Actually?
&lt;/h2&gt;

&lt;p&gt;MCP stands for &lt;strong&gt;Model Context Protocol&lt;/strong&gt;. It's an open protocol — originally developed by Anthropic and now an open standard — that defines how AI models communicate with external tools, data sources, and services.&lt;/p&gt;

&lt;p&gt;But that description buries the lede. Here's what MCP actually is in practice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP is a contract between an AI and the systems it can interact with.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That contract has three parts:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tools — "Here are functions you can call"
&lt;/h3&gt;

&lt;p&gt;A tool is a typed function that the AI can invoke. You define the name, the inputs (with types and descriptions), and what it returns. The AI sees this contract and decides when to call the tool during a conversation.&lt;/p&gt;

&lt;p&gt;Think of it like giving the AI an API client — but instead of REST endpoints with ambiguous documentation, each tool has a strict schema that the AI can reason about.&lt;/p&gt;

&lt;p&gt;Example of what a tool &lt;em&gt;means&lt;/em&gt; to the AI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"I have a tool called fhir.search.
 It takes a query string and optional filters.
 It returns a list of matching FHIR resources.
 I should use this when the user asks about FHIR resources
 and I'm not sure of the exact name or want to explore."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI isn't reading documentation to figure this out. The tool's name, its input field names, its types — all of that &lt;em&gt;is&lt;/em&gt; the documentation. The schema is the interface.&lt;/p&gt;
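
&lt;p&gt;Concretely, a tool declaration is just a name, a description, and a JSON Schema for the inputs. A hand-written example of the shape (field contents here are illustrative):&lt;/p&gt;

```python
import json

# What the AI actually sees when it lists tools: a name, a description,
# and a JSON Schema for the inputs. Field contents here are illustrative.
tool_declaration = {
    "name": "fhir.search",
    "description": "Full-text search over locally indexed FHIR resources.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "version": {"type": "string"},
            "top_n": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
}
print(json.dumps(tool_declaration, indent=2))
```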

&lt;h3&gt;
  
  
  2. Resources — "Here is data you can read"
&lt;/h3&gt;

&lt;p&gt;Resources are read-only data items identified by URIs. Unlike tools (which are actions), resources are data you can look at. The AI can request a resource by URI and get back structured content.&lt;/p&gt;

&lt;p&gt;Think of resources as a filesystem the AI can browse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fhir://R4/StructureDefinition/Patient     → the Patient definition
fhir://R5/StructureDefinition/Observation → the Observation definition
uscore://5.0.1/StructureDefinition/us-core-patient → the US Core Patient profile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI doesn't need to know where these live on disk or how they're stored. It just requests a URI and gets data back.&lt;/p&gt;
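
&lt;p&gt;On the server side, resolving such a URI is ordinary parsing. A sketch using &lt;code&gt;urllib&lt;/code&gt; (the scheme layout mirrors the examples above; the helper name is made up):&lt;/p&gt;

```python
from urllib.parse import urlsplit

# Server-side sketch of resolving the URIs above. The scheme layout
# mirrors the examples; the helper name is made up.
def parse_fhir_uri(uri):
    parts = urlsplit(uri)
    _, kind, name = parts.path.split("/")  # e.g. "/StructureDefinition/Patient"
    return {"scheme": parts.scheme, "version": parts.netloc,
            "kind": kind, "name": name}

print(parse_fhir_uri("fhir://R4/StructureDefinition/Patient"))
# {'scheme': 'fhir', 'version': 'R4', 'kind': 'StructureDefinition', 'name': 'Patient'}
```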

&lt;h3&gt;
  
  
  3. Prompts — "Here's how to approach a task"
&lt;/h3&gt;

&lt;p&gt;Prompts are reusable templates that guide the AI on how to use tools and present results. They're the "playbook" that says: "When someone asks you to summarize a FHIR profile, here's the approach..."&lt;/p&gt;

&lt;p&gt;Prompts are the least understood part of MCP, but they're important. They bridge the gap between raw tool output (structured data) and what the human actually needs (an explanation, a comparison, a recommendation).&lt;/p&gt;




&lt;h2&gt;
  
  
  Why MCP Matters for Application Development
&lt;/h2&gt;

&lt;p&gt;Here's the argument I want to make: &lt;strong&gt;every non-trivial application should eventually expose an MCP interface.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not because it's trendy. Because the alternative — expecting AI to understand your application from general knowledge — will increasingly become a bottleneck.&lt;/p&gt;

&lt;p&gt;Let me make the case through five observations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observation 1: AI is already in your team's workflow
&lt;/h3&gt;

&lt;p&gt;Whether you've officially "adopted AI" or not, your developers are using Claude, ChatGPT, Copilot, or Cursor every day. They're asking it about your codebase, your APIs, your domain. And the AI is answering from general knowledge — which means it's getting your specifics wrong a non-trivial percentage of the time.&lt;/p&gt;

&lt;p&gt;MCP lets you meet the AI where it already is. Instead of fighting the fact that developers use AI, you make the AI more useful by giving it access to your actual systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observation 2: Context stuffing doesn't scale
&lt;/h3&gt;

&lt;p&gt;The common workaround for AI's lack of context is to paste relevant information into the prompt. "Here's my schema. Here's the error log. Here's the config file." This is context stuffing, and it has hard limits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context window limits.&lt;/strong&gt; Even with 200K token models, you can't paste your entire codebase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relevance filtering.&lt;/strong&gt; The human has to decide what's relevant &lt;em&gt;before&lt;/em&gt; asking the question, which assumes they already know the answer's shape.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staleness.&lt;/strong&gt; The pasted context is a snapshot. If the schema changed yesterday and you pasted last week's version, the AI's answer is wrong.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MCP replaces context stuffing with &lt;strong&gt;context fetching&lt;/strong&gt;. The AI asks for what it needs, when it needs it, from the live source. No human in the loop. No stale snapshots.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observation 3: Structured tools beat unstructured context
&lt;/h3&gt;

&lt;p&gt;There's a fundamental difference between giving an AI a blob of text and giving it a typed tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unstructured context:&lt;/strong&gt; "Here's a JSON file with 3,000 lines of FHIR StructureDefinitions. Somewhere in there is the information about the Patient resource."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured tool:&lt;/strong&gt; "Call &lt;code&gt;fhir.get_definition(version='R4', kind='StructureDefinition', name='Patient')&lt;/code&gt; and you'll get exactly the Patient definition with metadata."&lt;/p&gt;

&lt;p&gt;The unstructured approach makes the AI do the work of parsing, searching, and disambiguating. The structured approach makes the &lt;em&gt;server&lt;/em&gt; do that work — where it can use proper indexing, query optimization, and validation — and gives the AI a clean result.&lt;/p&gt;

&lt;p&gt;This is the same lesson the industry learned with databases decades ago. You don't give users a flat file and tell them to grep for what they need. You give them a query interface. MCP is the query interface for AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observation 4: AI clients are converging on MCP
&lt;/h3&gt;

&lt;p&gt;Claude Desktop supports MCP natively. Cursor supports MCP. VS Code is adding MCP support. The ecosystem is converging on this protocol as the standard way for AI assistants to interact with external systems.&lt;/p&gt;

&lt;p&gt;This means building an MCP server isn't a bet on one AI provider. It's an investment that works across every MCP-compatible client. Write once, work everywhere — the same server handles Claude, Cursor, and whatever comes next.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observation 5: The best time to build an MCP server is before you need one
&lt;/h3&gt;

&lt;p&gt;Here's a pattern we see:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A team starts using AI for development.&lt;/li&gt;
&lt;li&gt;AI gives wrong answers about the team's specific domain.&lt;/li&gt;
&lt;li&gt;The team compensates with manual context stuffing and mental fact-checking.&lt;/li&gt;
&lt;li&gt;Months pass. The workarounds become exhausting.&lt;/li&gt;
&lt;li&gt;Someone says "we should build a tool for this."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The teams that build the MCP server at step 2 save months of accumulated friction. The ones that wait until step 5 have to retrofit it while already being frustrated.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Motivation Behind My Project
&lt;/h2&gt;

&lt;p&gt;I work in healthcare interoperability. My domain is FHIR — the standard that governs how health data is structured and exchanged between systems. It's a specification that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Has &lt;strong&gt;hundreds of resource types&lt;/strong&gt; (Patient, Observation, Condition, MedicationRequest, ...).&lt;/li&gt;
&lt;li&gt;Spans &lt;strong&gt;multiple versions&lt;/strong&gt; (R4, R4B, R5) with subtle but important differences between them.&lt;/li&gt;
&lt;li&gt;Is extended by &lt;strong&gt;Implementation Guides&lt;/strong&gt; (US Core, Da Vinci, mCODE, ...) that add constraints, profiles, and extensions.&lt;/li&gt;
&lt;li&gt;Is deeply &lt;strong&gt;structural&lt;/strong&gt; — a StructureDefinition has elements, types, cardinality constraints, slicing rules, invariants, and bindings to terminology.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the kind of domain where AI confidently gives &lt;em&gt;almost-right&lt;/em&gt; answers. And in healthcare, almost-right is dangerous. A developer who implements a resource mapping based on a hallucinated field name creates a real interoperability bug — one that might not surface until clinical data flows through the wrong path.&lt;/p&gt;

&lt;p&gt;We needed the AI to stop guessing and start looking things up.&lt;/p&gt;

&lt;p&gt;But we also wanted something broader than a single-purpose tool. We wanted to validate an approach: &lt;strong&gt;can you take a complex, versioned, deeply structured specification and make it available to AI in a way that's fast, local, and useful?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer is yes. And the approach generalizes.&lt;/p&gt;




&lt;h2&gt;
  
  
  MCP Is Not Just for FHIR
&lt;/h2&gt;

&lt;p&gt;Everything we built for FHIR could be applied to any domain with these characteristics:&lt;/p&gt;

&lt;h3&gt;
  
  
  Complex, versioned specifications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAPI/Swagger specs&lt;/strong&gt;: An MCP server that lets AI look up your API endpoints, request/response schemas, and versioning — from the actual spec file, not from memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database schemas&lt;/strong&gt;: An MCP server that queries your database metadata (tables, columns, types, relationships, indexes) so the AI can write correct SQL without you pasting the schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure-as-Code&lt;/strong&gt;: An MCP server that reads your Terraform state, CloudFormation templates, or Kubernetes manifests so the AI understands your actual infrastructure, not a generic tutorial version.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Regulatory or compliance frameworks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HIPAA, SOC2, GDPR&lt;/strong&gt;: An MCP server that lets AI look up specific regulatory requirements, controls, and your organization's compliance status.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clinical terminology&lt;/strong&gt;: SNOMED CT, LOINC, ICD-10 — enormous code systems that AI can't memorize but could search and retrieve through tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Internal knowledge
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Internal documentation&lt;/strong&gt;: An MCP server that indexes your team's runbooks, architecture decision records, and onboarding guides.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration management&lt;/strong&gt;: An MCP server that reads your application's feature flags, environment configs, and deployment status.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is always the same:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌───────────────────────────────────────────────────┐
│              Your Domain Knowledge                │
│                                                   │
│  Specifications, schemas, configs, terminology,   │
│  documentation, compliance requirements, ...      │
└───────────────────┬───────────────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────────────────┐
│              Indexer / Loader                     │
│                                                   │
│  Extract, normalize, store in a searchable index  │
└───────────────────┬───────────────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────────────────┐
│              MCP Server                           │
│                                                   │
│  Tools: lookup, search, compare, validate         │
│  Resources: addressable items via URIs            │
│  Prompts: guidance on how to use the output       │
└───────────────────┬───────────────────────────────┘
                    │
                    ▼
┌───────────────────────────────────────────────────┐
│              AI Client                            │
│                                                   │
│  Claude Desktop, Cursor, VS Code, custom apps...  │
│  Calls tools, reads resources, follows prompts    │
└───────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What Changes When AI Can Look Things Up
&lt;/h2&gt;

&lt;p&gt;When we shipped the first working version of our FHIR MCP server and plugged it into Claude Desktop, something shifted in how we worked.&lt;/p&gt;

&lt;h3&gt;
  
  
  Before MCP:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"Claude, what elements are in FHIR R4 Patient?" → &lt;em&gt;Read response, open spec to verify, correct two errors, paste corrections back&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;"What's different about Observation between R4 and R5?" → &lt;em&gt;Claude gives a plausible but unverifiable answer. Spend 20 minutes diffing specs manually.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;"Does US Core require Patient.identifier?" → &lt;em&gt;Claude says yes confidently. Is it right? Open the IG, find the profile, check the cardinality. Claude was right this time, but you had to check.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  After MCP:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;"Claude, what elements are in FHIR R4 Patient?" → &lt;em&gt;Claude calls &lt;code&gt;fhir.get_definition&lt;/code&gt;, returns the actual definition, summarizes it. No need to verify — it's from the spec.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;"What's different about Observation between R4 and R5?" → &lt;em&gt;Claude calls &lt;code&gt;fhir.diff_versions&lt;/code&gt;, gets the actual differences, explains them.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;"Does US Core require Patient.identifier?" → &lt;em&gt;Claude calls &lt;code&gt;uscore.get_profile&lt;/code&gt;, reads the constraint, answers with the actual cardinality and must-support flag.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mental overhead disappeared. Not partially — &lt;em&gt;entirely&lt;/em&gt;. We stopped being the middleware between the AI and the specification. The AI handled it.&lt;/p&gt;

&lt;p&gt;And here's the subtle thing: &lt;strong&gt;we started asking better questions.&lt;/strong&gt; When you trust that the AI's answers are grounded, you ask more ambitious questions. You ask follow-ups. You explore edge cases. The conversation becomes collaborative instead of adversarial.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Counterarguments (And Why We Disagree)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "Just use a bigger context window"
&lt;/h3&gt;

&lt;p&gt;Context windows are getting larger, and some people argue that you should just dump everything into the prompt. But this misses several points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bigger context ≠ better retrieval.&lt;/strong&gt; Studies consistently show that models struggle to find specific information in very long contexts ("lost in the middle" problem). A targeted tool call beats a 200K-token haystack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost scales with context.&lt;/strong&gt; Larger prompts cost more per request. A tool call that returns 500 tokens of targeted data is cheaper than pre-loading 50,000 tokens of "just in case" context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency scales with context.&lt;/strong&gt; Time-to-first-token increases with prompt length. Small, focused tool calls keep the conversation snappy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  "Just use RAG"
&lt;/h3&gt;

&lt;p&gt;RAG is great for unstructured documents. But when your data is structured — schemas, specifications, typed resources — RAG's embedding-and-chunk approach loses structural relationships. You can't meaningfully embed a 40KB JSON StructureDefinition and expect cosine similarity to find "the cardinality of Patient.identifier.system."&lt;/p&gt;

&lt;p&gt;MCP tools can do targeted, structured queries. RAG can't. They're complementary, but for structured domains, MCP is the right tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  "We'll wait for AI to get better"
&lt;/h3&gt;

&lt;p&gt;AI will get better. Models will memorize more. But the long tail of domain-specific, versioned, organization-specific knowledge will always exceed what's in training data. Your database schema isn't in GPT-5's training set. Your FHIR IG published last month isn't either. MCP bridges this gap regardless of how smart the model gets.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Building an MCP server is too much work"
&lt;/h3&gt;

&lt;p&gt;Our first working version was ~500 lines of Python across the server, handlers, and transport. The indexer was ~100 lines. We used SQLite (ships with Python), Pydantic (one pip install), and JSON-RPC (a trivial protocol). No infrastructure. No cloud services. No frameworks.&lt;/p&gt;

&lt;p&gt;If you can build a CLI tool, you can build an MCP server. The protocol is simpler than REST.&lt;/p&gt;
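&lt;p&gt;To make the "simpler than REST" claim concrete, here is a minimal sketch of JSON-RPC request handling in Python. It is not the full MCP protocol (a real server reads newline-delimited requests from stdin and also implements the initialize handshake and tool discovery); it shows only the core request/response shape.&lt;/p&gt;

```python
import json

# Minimal JSON-RPC 2.0 request handling, the message shape MCP uses.
def handle(method, params):
    if method == "ping":
        return "pong"
    raise ValueError("unknown method: " + method)

def serve(lines):
    """Process a stream of JSON-RPC request lines, return response dicts."""
    responses = []
    for line in lines:
        req = json.loads(line)
        try:
            responses.append({"jsonrpc": "2.0", "id": req["id"],
                              "result": handle(req["method"], req.get("params", {}))})
        except ValueError as exc:
            responses.append({"jsonrpc": "2.0", "id": req["id"],
                              "error": {"code": -32601, "message": str(exc)}})
    return responses

print(serve(['{"jsonrpc": "2.0", "id": 1, "method": "ping"}'])[0]["result"])   # pong
```

&lt;p&gt;That loop, plus a dictionary of tool handlers, is most of a working server.&lt;/p&gt;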




&lt;h2&gt;
  
  
  How to Think About Your First MCP Server
&lt;/h2&gt;

&lt;p&gt;If you're considering building an MCP server, here's the decision framework we'd recommend:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Identify the "fact-checking tax"
&lt;/h3&gt;

&lt;p&gt;Where does your team spend time verifying AI outputs against ground truth? Every time someone copies a schema into a prompt, checks an API response against documentation, or says "let me verify that" after reading an AI answer — that's the tax. The bigger the tax, the stronger the case for MCP.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Identify the data source
&lt;/h3&gt;

&lt;p&gt;What's the ground truth? A specification? A database? An API? A set of configuration files? This is what your MCP server will index or query.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Identify the operations
&lt;/h3&gt;

&lt;p&gt;What does the AI need to do with that data? Usually it's some combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lookup&lt;/strong&gt;: Get a specific item by identifier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search&lt;/strong&gt;: Find items matching a query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compare&lt;/strong&gt;: Diff two versions or configurations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate&lt;/strong&gt;: Check if something conforms to a specification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;List&lt;/strong&gt;: Enumerate available items.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these becomes an MCP tool.&lt;/p&gt;
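&lt;p&gt;Sketched as a tool registry, the five operations might look like this. The tool names and input schemas are hypothetical, intended only to show how each operation maps to a typed, discoverable tool.&lt;/p&gt;

```python
# Illustrative tool registry: each operation from the list above becomes a
# named tool with a typed input schema the AI client can inspect.
TOOLS = {
    "lookup": {"description": "Get a specific item by identifier.",
               "input": {"id": "string"}},
    "search": {"description": "Find items matching a query.",
               "input": {"query": "string", "limit": "integer"}},
    "compare": {"description": "Diff two versions of an item.",
                "input": {"id": "string", "from_version": "string",
                          "to_version": "string"}},
    "validate": {"description": "Check an item against the specification.",
                 "input": {"payload": "object"}},
    "list": {"description": "Enumerate available items.",
             "input": {"kind": "string"}},
}

def list_tools():
    """What an MCP client sees when it asks the server for its tools."""
    return sorted(TOOLS)

print(list_tools())
```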

&lt;h3&gt;
  
  
  Step 4: Start with one tool
&lt;/h3&gt;

&lt;p&gt;Don't build all six tools on day one. Build the lookup tool. Get it working in Claude Desktop or Cursor. Use it for a week. You'll immediately discover what the second tool should be.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Iterate based on what the AI gets wrong
&lt;/h3&gt;

&lt;p&gt;Watch how the AI uses your tools. When it calls the wrong tool, that's a signal that your tool names or schemas need clarification. When it sends bad inputs, that's a signal that your input model needs better field names or defaults. When it presents the output poorly, that's a signal that you need a prompt.&lt;/p&gt;

&lt;p&gt;MCP servers are living things. They improve through use.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Is Headed
&lt;/h2&gt;

&lt;p&gt;We believe MCP (or something like it) will become standard infrastructure for software teams. Not today, maybe not this year, but soon. The same way that APIs became standard for service-to-service communication, MCP will become standard for AI-to-application communication.&lt;/p&gt;

&lt;p&gt;The teams that build MCP servers early will have a head start. They'll have cleaner tool interfaces, better prompt patterns, and more experience with AI-as-caller design. They'll also have developers who trust their AI assistants because those assistants actually give correct, grounded answers.&lt;/p&gt;

&lt;p&gt;Our FHIR MCP server was a proof of concept. It works. It's useful. And it proved to us that the pattern generalizes. If your domain has complex, structured, versioned knowledge that AI gets wrong — and what domain doesn't? — building an MCP server is one of the highest-leverage investments you can make.&lt;/p&gt;

&lt;p&gt;Once you understand MCP deeply, integrating any new data source, application, or AI context becomes significantly easier.&lt;br&gt;
If you’d like to connect, reach out on &lt;a href="https://www.linkedin.com/in/chaets/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or drop me a message—I’d love to explore how I can help drive your data/AI success!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post is a prequel to our three-part implementation series:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-1-why-we-built-an-mcp-server-and-what-we-learned-before-writing-a-single-line-of-code-4mao"&gt;Part 1: Why We Built an MCP Server — And What We Learned Before Writing a Single Line of Code&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-2-building-the-engine-tools-uris-and-the-art-of-indexing-fhir-fi1"&gt;Part 2: Building the Engine — Tools, URIs, and the Art of Indexing FHIR&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/chaets/part-3-testing-deploying-and-lessons-learned-aa5"&gt;Part 3: Testing, Deploying, and Lessons Learned -&amp;gt; coming soon&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;





</description>
      <category>mcp</category>
      <category>fhir</category>
      <category>interoperability</category>
      <category>ai</category>
    </item>
    <item>
      <title>Anthropic Claude Opus 4.6</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Fri, 06 Feb 2026 03:06:12 +0000</pubDate>
      <link>https://forem.com/chaets/anthropic-claude-opus-46-4808</link>
      <guid>https://forem.com/chaets/anthropic-claude-opus-46-4808</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxm6h72qvretk2q3vq3u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwxm6h72qvretk2q3vq3u.png" alt=" " width="800" height="797"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anthropic has officially released Claude Opus 4.6 — and the benchmark numbers speak volumes. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Performance Highlights&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GDPval-AA Elo&lt;/strong&gt;: Opus 4.6 outperforms its predecessor (Opus 4.5) by ~190 Elo points and beats OpenAI’s GPT-5.2 by ~144 Elo points on economically valuable knowledge-work tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terminal-Bench 2.0 (agentic coding)&lt;/strong&gt;: Achieves a leading score of ~65.4%, placing it at the top of real-world coding and task-automation benchmarks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-context retention&lt;/strong&gt;: On an 8-needle, 1M-token variant of MRCR v2 (a needle-in-a-haystack benchmark), Opus 4.6 scores 76% vs. ~18.5% for Sonnet 4.5 — a massive uplift in long-context retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BigLaw Bench (legal reasoning)&lt;/strong&gt;: Achieves 90.2%, including perfect scores on 40% of tasks and scores above 0.8 on 84%.&lt;/li&gt;
&lt;li&gt;Across internal evaluations, Opus 4.6 leads on deep multi-step reasoning, search, and agentic workflows compared with other frontier models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What this means:&lt;br&gt;
This isn’t just an incremental update — it’s a meaningful leap in real-world task performance for coding, reasoning, multi-agent planning, and large-context work. Whether you’re building AI agents, automating workflows, or tackling enterprise knowledge work, these numbers signal greater reliability and capability on complex tasks.&lt;/p&gt;

&lt;p&gt;Opus 4.6 now sets a new benchmark bar for frontier LLM performance — especially where depth, persistence, and real-world reasoning matter most.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>If you are in the influencer market, whether it’s #tech, #health, #realestate, etc., it doesn’t matter what industry it is; what matters is that you should have an opinion about everything!!! Everything means everything!!!</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Sun, 01 Feb 2026 20:20:27 +0000</pubDate>
      <link>https://forem.com/chaets/if-you-are-in-the-influencer-market-whether-its-tech-health-realestate-etc-it-doesnt-5h8g</link>
      <guid>https://forem.com/chaets/if-you-are-in-the-influencer-market-whether-its-tech-health-realestate-etc-it-doesnt-5h8g</guid>
      <description></description>
    </item>
    <item>
      <title>How One Can Start Their Journey in Data Engineering</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Sat, 10 Jan 2026 19:30:44 +0000</pubDate>
      <link>https://forem.com/chaets/how-one-start-their-journey-in-data-engineering-a77</link>
      <guid>https://forem.com/chaets/how-one-start-their-journey-in-data-engineering-a77</guid>
      <description>&lt;p&gt;Data Engineering is everywhere today. Behind every dashboard, AI model, recommendation system, or business report, there is a data engineer making sure data flows correctly.&lt;/p&gt;

&lt;p&gt;If you’re a &lt;strong&gt;complete newbie&lt;/strong&gt;, the biggest challenge isn’t learning—it’s &lt;strong&gt;knowing where to start&lt;/strong&gt;. The internet is full of roadmaps, tools, and opinions, and it’s easy to feel lost before you even begin.&lt;/p&gt;

&lt;p&gt;This blog gives you a &lt;strong&gt;clear, simple, step-by-step starting point&lt;/strong&gt; for your Data Engineering journey.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. First, Understand What Data Engineering Is (In Simple Words)
&lt;/h2&gt;

&lt;p&gt;Before learning anything technical, understand the role.&lt;/p&gt;

&lt;p&gt;A data engineer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collects data from different sources&lt;/li&gt;
&lt;li&gt;Stores it in an organized way&lt;/li&gt;
&lt;li&gt;Cleans and transforms raw data&lt;/li&gt;
&lt;li&gt;Makes data available for analysis and applications&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Think of data engineers as plumbers of data&lt;/strong&gt;—they build pipelines so data flows smoothly and reliably.
&lt;/h3&gt;

&lt;p&gt;You don’t need to be great at math or AI to start. You need curiosity and consistency.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Don’t Start with Tools — Start with Basics
&lt;/h2&gt;

&lt;p&gt;Many beginners make the mistake of jumping directly into tools like Spark, Kafka, or Airflow. This leads to confusion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Learn Basic Computer &amp;amp; Data Concepts
&lt;/h3&gt;

&lt;p&gt;You should understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What files are (CSV, JSON)&lt;/li&gt;
&lt;li&gt;What databases are&lt;/li&gt;
&lt;li&gt;What rows and columns mean&lt;/li&gt;
&lt;li&gt;What “data” actually looks like&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This builds confidence before coding.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Learn SQL First (Your Best Friend)
&lt;/h2&gt;

&lt;p&gt;If you learn &lt;strong&gt;only one skill&lt;/strong&gt; to start data engineering, make it SQL.&lt;/p&gt;

&lt;p&gt;SQL helps you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read data&lt;/li&gt;
&lt;li&gt;Filter data&lt;/li&gt;
&lt;li&gt;Group and summarize data&lt;/li&gt;
&lt;li&gt;Join multiple tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SELECT&lt;/li&gt;
&lt;li&gt;WHERE&lt;/li&gt;
&lt;li&gt;ORDER BY&lt;/li&gt;
&lt;li&gt;GROUP BY&lt;/li&gt;
&lt;li&gt;JOIN&lt;/li&gt;
&lt;/ul&gt;
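&lt;p&gt;A quick way to practice all five keywords without installing anything is Python’s built-in &lt;code&gt;sqlite3&lt;/code&gt; module. The tables and data below are made up for illustration.&lt;/p&gt;

```python
import sqlite3

# Tiny practice database: every keyword from the list above in one query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, city TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Delhi'), (2, 'Mumbai');
INSERT INTO orders VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# SELECT + JOIN + WHERE + GROUP BY + ORDER BY, all together
rows = conn.execute("""
    SELECT c.city, SUM(o.amount) AS total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE o.amount BETWEEN 50 AND 200
    GROUP BY c.city
    ORDER BY total DESC
""").fetchall()
print(rows)   # [('Delhi', 150.0), ('Mumbai', 75.0)]
```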

&lt;h3&gt;
  
  
  You don’t need advanced SQL on day one. Simple queries are powerful.
&lt;/h3&gt;




&lt;h2&gt;
  
  
  4. Learn One Programming Language (Python is Best)
&lt;/h2&gt;

&lt;p&gt;You don’t need to be a hardcore programmer.&lt;/p&gt;

&lt;p&gt;With Python, focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Variables and loops&lt;/li&gt;
&lt;li&gt;Functions&lt;/li&gt;
&lt;li&gt;Reading and writing files&lt;/li&gt;
&lt;li&gt;Lists and dictionaries&lt;/li&gt;
&lt;li&gt;Basic error handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Python is used everywhere in data engineering, and it’s beginner-friendly.&lt;/p&gt;
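&lt;p&gt;Here is one tiny script that touches every item on that list; the file name and word-count task are just an example exercise.&lt;/p&gt;

```python
# One small script covering the basics above: variables, a loop,
# a function, file reading/writing, a dictionary, and error handling.

def count_words(path):
    """Read a text file and return a dict of word frequencies."""
    counts = {}                      # dictionary
    try:
        with open(path) as f:        # reading a file
            for line in f:           # loop
                for word in line.split():
                    counts[word] = counts.get(word, 0) + 1
    except FileNotFoundError:        # basic error handling
        print("No such file:", path)
    return counts

# writing a file, then reading it back
with open("sample.txt", "w") as f:
    f.write("data is fun and data is everywhere")

print(count_words("sample.txt"))
```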




&lt;h2&gt;
  
  
  5. Understand How Data Moves (Core Idea of Data Engineering)
&lt;/h2&gt;

&lt;p&gt;Once you know basic SQL and Python, learn &lt;strong&gt;how data flows&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Ask questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where does data come from?&lt;/li&gt;
&lt;li&gt;Where is it stored?&lt;/li&gt;
&lt;li&gt;How is it cleaned?&lt;/li&gt;
&lt;li&gt;Who uses it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Learn these concepts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Batch data (run once a day)&lt;/li&gt;
&lt;li&gt;Real-time data (streams)&lt;/li&gt;
&lt;li&gt;ETL (Extract, Transform, Load)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t need advanced tools yet—just the idea.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Learn About Data Storage (At a High Level)
&lt;/h2&gt;

&lt;p&gt;Understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What a database is&lt;/li&gt;
&lt;li&gt;What a data warehouse is&lt;/li&gt;
&lt;li&gt;What cloud storage means&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t need to master cloud immediately—just know that modern data lives in the cloud.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Build Small, Simple Projects (Very Important)
&lt;/h2&gt;

&lt;p&gt;Learning without building causes fear and confusion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Beginner Project Ideas:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Read a CSV file using Python&lt;/li&gt;
&lt;li&gt;Store data in a database&lt;/li&gt;
&lt;li&gt;Write SQL queries to analyze it&lt;/li&gt;
&lt;li&gt;Clean messy data&lt;/li&gt;
&lt;li&gt;Automate a simple script&lt;/li&gt;
&lt;/ul&gt;
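&lt;p&gt;The project ideas above fit together into one miniature pipeline. The sketch below (file names and data invented for illustration) reads a messy CSV, cleans it, loads it into SQLite, and runs a SQL query over it.&lt;/p&gt;

```python
import csv
import sqlite3

# A complete tiny pipeline: read a CSV, clean it, store it, query it.

# 1. Create a messy CSV to work with
with open("sales.csv", "w", newline="") as f:
    csv.writer(f).writerows([
        ["city", "amount"],
        ["Delhi", "100"],
        ["  Mumbai ", "75"],
        ["Delhi", ""],          # messy row: missing amount
    ])

# 2. Read and clean it
rows = []
with open("sales.csv", newline="") as f:
    for rec in csv.DictReader(f):
        if rec["amount"]:                       # drop rows with no amount
            rows.append((rec["city"].strip(), float(rec["amount"])))

# 3. Load into SQLite and analyze with SQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
totals = conn.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()
print(totals)   # [('Delhi', 100.0), ('Mumbai', 75.0)]
```

&lt;p&gt;Every stage here is something a real data engineer does daily, just at a much larger scale.&lt;/p&gt;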

&lt;h3&gt;
  
  
  Even tiny projects count. Progress &amp;gt; perfection.
&lt;/h3&gt;




&lt;h2&gt;
  
  
  8. Learn Git &amp;amp; Basic Engineering Habits
&lt;/h2&gt;

&lt;p&gt;Start thinking like an engineer early:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Git to save your code&lt;/li&gt;
&lt;li&gt;Write small, clean scripts&lt;/li&gt;
&lt;li&gt;Add comments&lt;/li&gt;
&lt;li&gt;Handle errors properly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These habits matter more than tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Ignore the Tool Hype (For Now)
&lt;/h2&gt;

&lt;p&gt;As a newbie, &lt;strong&gt;you do NOT need&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spark&lt;/li&gt;
&lt;li&gt;Kafka&lt;/li&gt;
&lt;li&gt;Kubernetes&lt;/li&gt;
&lt;li&gt;Complex cloud architectures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those come later.&lt;/p&gt;

&lt;p&gt;Focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL&lt;/li&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;li&gt;Data concepts&lt;/li&gt;
&lt;li&gt;Building confidence&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  10. Be Patient — Data Engineering Takes Time
&lt;/h2&gt;

&lt;p&gt;Data engineering is not learned in weeks. It’s built over months.&lt;/p&gt;

&lt;p&gt;You will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feel confused&lt;/li&gt;
&lt;li&gt;Break things&lt;/li&gt;
&lt;li&gt;Forget syntax&lt;/li&gt;
&lt;li&gt;Rethink your path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s normal.&lt;/p&gt;

&lt;p&gt;Consistency beats intelligence in this field.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pro Tip: Start Interviewing Early (Even If You Feel “Not Ready”)
&lt;/h2&gt;

&lt;p&gt;One of the &lt;strong&gt;most underrated learning strategies&lt;/strong&gt; for beginners in Data Engineering is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Start interviewing for data engineering roles early — even before you think you’re ready.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is not about getting the job immediately.&lt;br&gt;
This is about &lt;strong&gt;gaining real-world experience of what the market wants&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why Interviewing Early Is Powerful
&lt;/h3&gt;

&lt;p&gt;When you interview, you learn things no course or roadmap can teach you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What companies &lt;em&gt;actually&lt;/em&gt; ask for&lt;/li&gt;
&lt;li&gt;Which skills matter most right now&lt;/li&gt;
&lt;li&gt;How deep your knowledge needs to be&lt;/li&gt;
&lt;li&gt;Where your gaps are&lt;/li&gt;
&lt;li&gt;How to explain your thinking clearly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each interview becomes &lt;strong&gt;market research&lt;/strong&gt; for your learning journey.&lt;/p&gt;




&lt;h3&gt;
  
  
  Interviews Show You the Real Trends in Data Engineering
&lt;/h3&gt;

&lt;p&gt;By giving interviews, you’ll quickly notice patterns like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL is asked &lt;strong&gt;almost everywhere&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Python basics are expected, not advanced algorithms&lt;/li&gt;
&lt;li&gt;Questions focus on &lt;strong&gt;data pipelines&lt;/strong&gt;, not theory&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scenario-based questions are very common:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“How would you design a pipeline for this?”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“How would you handle late-arriving data?”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“How do you ensure data quality?”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This tells you &lt;strong&gt;what to prioritize&lt;/strong&gt; in your learning.&lt;/p&gt;




&lt;h3&gt;
  
  
  Interviews Are a Feedback Loop
&lt;/h3&gt;

&lt;p&gt;Think of interviews like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You interview&lt;/li&gt;
&lt;li&gt;You get stuck or rejected&lt;/li&gt;
&lt;li&gt;You note what you didn’t know&lt;/li&gt;
&lt;li&gt;You learn exactly that&lt;/li&gt;
&lt;li&gt;You interview again — stronger&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This loop is incredibly effective.&lt;/p&gt;

&lt;p&gt;Many successful data engineers failed &lt;strong&gt;multiple interviews&lt;/strong&gt; before landing their first role.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Interviewers Look for in Entry-Level Data Engineers
&lt;/h3&gt;

&lt;p&gt;For beginners, interviewers usually care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear understanding of data basics&lt;/li&gt;
&lt;li&gt;Strong SQL fundamentals&lt;/li&gt;
&lt;li&gt;Ability to explain your projects&lt;/li&gt;
&lt;li&gt;Logical thinking&lt;/li&gt;
&lt;li&gt;Willingness to learn&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They &lt;strong&gt;do not expect mastery&lt;/strong&gt; of every tool.&lt;/p&gt;




&lt;h3&gt;
  
  
  Don’t Wait for “Perfection”
&lt;/h3&gt;

&lt;p&gt;A common beginner mistake is thinking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I’ll start applying once I know everything.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That day never comes.&lt;/p&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply early&lt;/li&gt;
&lt;li&gt;Interview often&lt;/li&gt;
&lt;li&gt;Learn from rejection&lt;/li&gt;
&lt;li&gt;Improve intentionally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each interview adds &lt;strong&gt;experience&lt;/strong&gt;, confidence, and direction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought (Very Important)
&lt;/h2&gt;

&lt;p&gt;Learning data engineering in isolation is slow.&lt;br&gt;
Learning data engineering &lt;strong&gt;with market feedback&lt;/strong&gt; is fast.&lt;/p&gt;

&lt;p&gt;So while you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Learn SQL&lt;/li&gt;
&lt;li&gt;Practice Python&lt;/li&gt;
&lt;li&gt;Build small projects&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Also start interviewing.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It will shape your skills, sharpen your thinking, and prepare you for the real world of data engineering.&lt;/p&gt;

&lt;p&gt;If you remember only one thing, remember this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Start small. Learn slowly. Build continuously.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Data Engineering rewards people who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand fundamentals&lt;/li&gt;
&lt;li&gt;Think logically&lt;/li&gt;
&lt;li&gt;Care about data quality&lt;/li&gt;
&lt;li&gt;Keep learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you stay consistent, even as a newbie, you &lt;em&gt;can&lt;/em&gt; grow into a strong data engineer. If you’d like to connect, reach out on &lt;a href="https://www.linkedin.com/in/chaets/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or drop me a message—I’d love to explore how I can help drive your data success!&lt;/p&gt;

</description>
      <category>programming</category>
      <category>dataengineering</category>
      <category>sql</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Why Idempotence Is So Important in Data Engineering</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Sun, 14 Dec 2025 00:10:26 +0000</pubDate>
      <link>https://forem.com/chaets/why-idempotency-is-so-important-in-data-engineering-24mj</link>
      <guid>https://forem.com/chaets/why-idempotency-is-so-important-in-data-engineering-24mj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In data engineering, &lt;strong&gt;things fail all the time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Jobs crash halfway. Networks time out. Airflow retries tasks. Kafka replays messages. Backfills rerun months of data. And sometimes… someone just clicks “Run” again.&lt;/p&gt;

&lt;p&gt;In this messy, failure-prone world, &lt;strong&gt;idempotency&lt;/strong&gt; is what keeps your data correct, trustworthy, and sane.&lt;/p&gt;

&lt;p&gt;Let’s explore &lt;strong&gt;what idempotency is&lt;/strong&gt;, &lt;strong&gt;why it’s critical&lt;/strong&gt;, and &lt;strong&gt;how to design for it&lt;/strong&gt;, with practical do’s and don’ts.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Idempotency?
&lt;/h2&gt;

&lt;p&gt;A process is &lt;strong&gt;idempotent&lt;/strong&gt; if:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Running it once or running it multiple times produces &lt;strong&gt;the same final result&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Simple Example
&lt;/h3&gt;

&lt;p&gt;If a job processes data for &lt;code&gt;2025-01-01&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run it once → correct result&lt;/li&gt;
&lt;li&gt;Run it twice → same correct result&lt;/li&gt;
&lt;li&gt;Run it ten times → still the same result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No duplicates. No inflation. No corruption.&lt;/p&gt;
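&lt;p&gt;The rerun-safety above can be sketched in a few lines of Python. This toy example (the table structures and &lt;code&gt;process_date&lt;/code&gt; job are purely illustrative) contrasts a blind append with a keyed, partition-style write:&lt;/p&gt;

```python
# Toy illustration: append-style writes vs. keyed (idempotent) writes.
append_table = []   # non-idempotent sink: blind appends
keyed_table = {}    # idempotent sink: one slot per partition date

def process_date(run_date):
    rows = [("order-1", 100), ("order-2", 250)]  # pretend this is the day's data
    append_table.extend(rows)      # every rerun adds duplicates
    keyed_table[run_date] = rows   # every rerun replaces the same partition

for _ in range(3):                 # simulate retries / reruns
    process_date("2025-01-01")

print(len(append_table))                 # 6 -- grows with every rerun
print(len(keyed_table["2025-01-01"]))    # 2 -- stable no matter how often we run
```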




&lt;h2&gt;
  
  
  Why Idempotency Matters in Data Engineering
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Failures Are Normal, Not Exceptional
&lt;/h3&gt;

&lt;p&gt;Modern data systems are distributed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spark jobs fail due to executor loss&lt;/li&gt;
&lt;li&gt;Airflow tasks retry automatically&lt;/li&gt;
&lt;li&gt;Cloud storage has eventual consistency&lt;/li&gt;
&lt;li&gt;APIs timeout mid-request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without idempotency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A retry can &lt;strong&gt;double-count data&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Partial writes can corrupt tables&lt;/li&gt;
&lt;li&gt;“Fixing” failures creates new bugs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Idempotency turns retries from a &lt;strong&gt;risk&lt;/strong&gt; into a &lt;strong&gt;feature&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Schedulers and Orchestrators Rely on It
&lt;/h3&gt;

&lt;p&gt;Tools like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Airflow&lt;/li&gt;
&lt;li&gt;Dagster&lt;/li&gt;
&lt;li&gt;Prefect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;assume&lt;/strong&gt; tasks can be retried safely.&lt;/p&gt;

&lt;p&gt;If your task is not idempotent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retries silently introduce data errors&lt;/li&gt;
&lt;li&gt;“Green DAGs” produce bad data&lt;/li&gt;
&lt;li&gt;Debugging becomes nearly impossible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Idempotency is the &lt;strong&gt;contract&lt;/strong&gt; between your code and your scheduler.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Backfills and Reprocessing Become Safe
&lt;/h3&gt;

&lt;p&gt;Backfills are unavoidable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logic changes&lt;/li&gt;
&lt;li&gt;Bug fixes&lt;/li&gt;
&lt;li&gt;Late-arriving data&lt;/li&gt;
&lt;li&gt;Schema evolution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With idempotent pipelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can rerun historical data confidently&lt;/li&gt;
&lt;li&gt;You don’t need manual cleanup&lt;/li&gt;
&lt;li&gt;You avoid “special backfill code paths”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without idempotency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every backfill is a high-risk operation&lt;/li&gt;
&lt;li&gt;Engineers fear touching old data&lt;/li&gt;
&lt;li&gt;Technical debt piles up fast&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  4. Exactly-Once Semantics Are Rare (and Expensive)
&lt;/h3&gt;

&lt;p&gt;In theory, we want &lt;strong&gt;exactly-once processing&lt;/strong&gt;.&lt;br&gt;
In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed systems mostly provide &lt;strong&gt;at-least-once&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Exactly-once guarantees are complex and costly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Idempotency lets you &lt;strong&gt;embrace at-least-once delivery&lt;/strong&gt; safely.&lt;/p&gt;

&lt;p&gt;Instead of fighting the system, you design your logic to handle duplicates gracefully.&lt;/p&gt;


&lt;h3&gt;
  
  
  5. Data Trust Depends on It
&lt;/h3&gt;

&lt;p&gt;Nothing erodes trust faster than:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics that change every rerun&lt;/li&gt;
&lt;li&gt;Counts that slowly drift upward&lt;/li&gt;
&lt;li&gt;Dashboards that don’t match yesterday&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Idempotent pipelines ensure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deterministic outputs&lt;/li&gt;
&lt;li&gt;Reproducible results&lt;/li&gt;
&lt;li&gt;Confidence in downstream analytics&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Common Places Where Idempotency Breaks
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;INSERT INTO table VALUES (...)&lt;/code&gt; without constraints&lt;/li&gt;
&lt;li&gt;Appending files blindly to object storage&lt;/li&gt;
&lt;li&gt;Incremental loads without deduplication&lt;/li&gt;
&lt;li&gt;Updates without stable primary keys&lt;/li&gt;
&lt;li&gt;Side effects (emails, API calls) inside data jobs&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Design Patterns for Idempotency
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Partitioned Writes (Overwrite, Don’t Append)
&lt;/h3&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;staging_sales&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Prefer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="n"&gt;OVERWRITE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;staging_sales&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The partition is replaced, not duplicated&lt;/li&gt;
&lt;li&gt;Reruns are safe&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. Use Deterministic Keys
&lt;/h3&gt;

&lt;p&gt;Always have a &lt;strong&gt;stable primary key&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;order_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;user_id + event_time&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Hash of business attributes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deduplicate on read&lt;/li&gt;
&lt;li&gt;Merge on write&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;MERGE&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;
&lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;staging_users&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;MATCHED&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="n"&gt;MATCHED&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
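&lt;p&gt;When no single natural key exists, the "hash of business attributes" option above can be built with the standard library. A minimal sketch of deterministic keys plus dedup-on-read; the field names are illustrative assumptions:&lt;/p&gt;

```python
import hashlib

def business_key(record):
    # Hash a fixed, ordered set of business attributes into a stable key.
    raw = "|".join([record["user_id"], record["event_time"], record["action"]])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def dedupe(records):
    # Dedup-on-read: keep the first record seen for each deterministic key.
    seen = {}
    for rec in records:
        seen.setdefault(business_key(rec), rec)
    return list(seen.values())

batch = [
    {"user_id": "u1", "event_time": "2025-01-01T00:00:00", "action": "login"},
    {"user_id": "u1", "event_time": "2025-01-01T00:00:00", "action": "login"},  # replayed duplicate
]
print(len(dedupe(batch)))  # 1
```

Because the key depends only on business attributes, it stays stable across retries and backfills.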






&lt;h3&gt;
  
  
  3. Make Transformations Pure
&lt;/h3&gt;

&lt;p&gt;A pure transformation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Depends only on inputs&lt;/li&gt;
&lt;li&gt;Produces the same output every time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Avoid:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CURRENT_TIMESTAMP&lt;/code&gt; inside transforms&lt;/li&gt;
&lt;li&gt;Random UUID generation during processing&lt;/li&gt;
&lt;li&gt;External API calls during transformations&lt;/li&gt;
&lt;/ul&gt;
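&lt;p&gt;One common way to keep a transform pure is to pass the run date in as an explicit parameter instead of reading the clock. A minimal sketch, with hypothetical column names:&lt;/p&gt;

```python
import datetime

def impure_transform(row):
    # Non-deterministic: output changes on every run.
    return {**row, "loaded_at": datetime.datetime.now().isoformat()}

def pure_transform(row, run_date):
    # Deterministic: the run date is an explicit input, not wall-clock time.
    return {**row, "loaded_at": run_date}

row = {"order_id": 101}
a = pure_transform(row, "2025-01-01")
b = pure_transform(row, "2025-01-01")
print(a == b)  # True: same inputs, same output, rerun-safe
```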




&lt;h3&gt;
  
  
  4. Track Processing State Explicitly
&lt;/h3&gt;

&lt;p&gt;For streaming and incremental jobs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store offsets&lt;/li&gt;
&lt;li&gt;Store watermarks&lt;/li&gt;
&lt;li&gt;Store processed timestamps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But design them so:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reprocessing the same window does not change results&lt;/li&gt;
&lt;/ul&gt;
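&lt;p&gt;A toy sketch of explicit state: the watermark lives in its own store, and each window writes to a keyed slot, so replaying a window overwrites instead of appending. All names here are illustrative:&lt;/p&gt;

```python
# Incremental job with explicitly tracked state.
state = {"watermark": "2025-01-01"}
output = {}  # window start -> rows, so replays replace rather than duplicate

def run_window(window_start, rows):
    output[window_start] = rows  # idempotent write per window
    # Advance the watermark only forward (ISO dates sort lexicographically).
    state["watermark"] = max(state["watermark"], window_start)

run_window("2025-01-02", ["row-a", "row-b"])
run_window("2025-01-02", ["row-a", "row-b"])  # replayed: same final state

print(state["watermark"], len(output["2025-01-02"]))  # 2025-01-02 2
```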




&lt;h3&gt;
  
  
  5. Separate Side Effects from Data Processing
&lt;/h3&gt;

&lt;p&gt;Data writes should be idempotent.&lt;br&gt;
Side effects should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Downstream&lt;/li&gt;
&lt;li&gt;Explicit&lt;/li&gt;
&lt;li&gt;Carefully controlled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First write data safely&lt;/li&gt;
&lt;li&gt;Then trigger notifications based on final state&lt;/li&gt;
&lt;/ul&gt;
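&lt;p&gt;A minimal sketch of this ordering, with a hypothetical dedup key guarding the notification so a retried job cannot double-send:&lt;/p&gt;

```python
# Step 1 is an idempotent keyed write; step 2 is a side effect protected
# by a dedup key. All names are illustrative.
table = {}
sent_notifications = set()

def notify(dedup_key, message):
    if dedup_key in sent_notifications:
        return "skipped"
    sent_notifications.add(dedup_key)
    return "sent"

def job(run_date, rows):
    table[run_date] = rows  # step 1: safe, keyed data write
    return notify("daily-report-" + run_date, "report ready")  # step 2: side effect

print(job("2025-01-01", ["r1"]))  # sent
print(job("2025-01-01", ["r1"]))  # skipped: the retry does not re-notify
```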




&lt;h2&gt;
  
  
  Do’s and Don’ts of Idempotent Data Pipelines
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ Do’s
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Design every job assuming it &lt;strong&gt;will be retried&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Use overwrite or merge instead of blind appends&lt;/li&gt;
&lt;li&gt;✅ Make jobs deterministic and repeatable&lt;/li&gt;
&lt;li&gt;✅ Use primary keys and deduplication logic&lt;/li&gt;
&lt;li&gt;✅ Make backfills a first-class use case&lt;/li&gt;
&lt;li&gt;✅ Log inputs, outputs, and checkpoints&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  ❌ Don’ts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;❌ Assume “this job only runs once”&lt;/li&gt;
&lt;li&gt;❌ Append data without safeguards&lt;/li&gt;
&lt;li&gt;❌ Mix side effects with transformations&lt;/li&gt;
&lt;li&gt;❌ Depend on execution order for correctness&lt;/li&gt;
&lt;li&gt;❌ Use non-deterministic functions in core logic&lt;/li&gt;
&lt;li&gt;❌ Rely on humans to clean up duplicates&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Mental Model to Remember
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If rerunning your pipeline scares you, it’s not idempotent.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A truly idempotent pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can be rerun anytime&lt;/li&gt;
&lt;li&gt;Produces the same result&lt;/li&gt;
&lt;li&gt;Turns failure recovery into a non-event&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Idempotency is not just a technical detail.&lt;br&gt;
It’s a &lt;strong&gt;design philosophy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It makes systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More resilient&lt;/li&gt;
&lt;li&gt;Easier to operate&lt;/li&gt;
&lt;li&gt;Cheaper to maintain&lt;/li&gt;
&lt;li&gt;More trustworthy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In data engineering, where reprocessing is inevitable and failures are normal, &lt;strong&gt;idempotency is the difference between a fragile pipeline and a production-grade system&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Below is a &lt;strong&gt;practical, copy-pasteable checklist&lt;/strong&gt; teams can use during &lt;strong&gt;data pipeline design reviews, PR reviews, and post-incident audits&lt;/strong&gt;.&lt;br&gt;
It’s opinionated, short enough to be usable, but deep enough to catch real production issues.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bonus checklist: Idempotency Review Checklist for Data Pipelines
&lt;/h2&gt;

&lt;p&gt;Use this checklist to answer one core question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“If this pipeline runs twice, will the result still be correct?”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  1. Retry &amp;amp; Failure Safety
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; The pipeline must be safe under retries, partial failures, and restarts.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⬜ Can every task be retried without manual cleanup?&lt;/li&gt;
&lt;li&gt;⬜ What happens if the job fails halfway and reruns?&lt;/li&gt;
&lt;li&gt;⬜ Does the orchestrator (Airflow / Dagster / Prefect) retry tasks automatically?&lt;/li&gt;
&lt;li&gt;⬜ Are partial writes cleaned up or overwritten on retry?&lt;/li&gt;
&lt;li&gt;⬜ Is there a clear failure boundary (per partition, batch, or window)?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚩 &lt;strong&gt;Red flag:&lt;/strong&gt; “We never retry this job.”&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Input Determinism
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Same inputs → same outputs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⬜ Are inputs explicitly scoped (date, partition, offset, watermark)?&lt;/li&gt;
&lt;li&gt;⬜ Is the input source stable under reprocessing?&lt;/li&gt;
&lt;li&gt;⬜ Are late-arriving records handled deterministically?&lt;/li&gt;
&lt;li&gt;⬜ Is there protection against reading overlapping windows twice?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚩 &lt;strong&gt;Red flag:&lt;/strong&gt; Inputs depend on “now”, “latest”, or implicit state.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Output Write Strategy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Writing data should not create duplicates or drift.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⬜ Is the write strategy &lt;strong&gt;overwrite&lt;/strong&gt;, &lt;strong&gt;merge&lt;/strong&gt;, or &lt;strong&gt;upsert&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;⬜ Are appends protected by deduplication or constraints?&lt;/li&gt;
&lt;li&gt;⬜ Is the output partitioned by a deterministic key (date, hour, batch_id)?&lt;/li&gt;
&lt;li&gt;⬜ Can a single partition be safely rewritten?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚩 &lt;strong&gt;Red flag:&lt;/strong&gt; Blind &lt;code&gt;INSERT INTO&lt;/code&gt; or file appends with no safeguards.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. Primary Keys &amp;amp; Deduplication
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; The system knows how to identify “the same record”.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⬜ Does each dataset have a well-defined primary or natural key?&lt;/li&gt;
&lt;li&gt;⬜ Is deduplication logic explicit and documented?&lt;/li&gt;
&lt;li&gt;⬜ Are keys stable across retries and backfills?&lt;/li&gt;
&lt;li&gt;⬜ Is deduplication enforced at read time, write time, or both?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚩 &lt;strong&gt;Red flag:&lt;/strong&gt; “Duplicates shouldn’t happen.”&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Transformation Purity
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Transformations must be repeatable and predictable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⬜ Are transformations deterministic?&lt;/li&gt;
&lt;li&gt;⬜ Are &lt;code&gt;CURRENT_TIMESTAMP&lt;/code&gt;, random UUIDs, or non-deterministic functions avoided?&lt;/li&gt;
&lt;li&gt;⬜ Are external API calls excluded from core transformations?&lt;/li&gt;
&lt;li&gt;⬜ Is business logic independent of execution order?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚩 &lt;strong&gt;Red flag:&lt;/strong&gt; Output changes every time the job runs.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Incremental &amp;amp; Streaming Logic
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Incremental logic must tolerate reprocessing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⬜ Are offsets, checkpoints, or watermarks stored reliably?&lt;/li&gt;
&lt;li&gt;⬜ Is reprocessing the same range safe?&lt;/li&gt;
&lt;li&gt;⬜ Is “at-least-once” delivery handled correctly?&lt;/li&gt;
&lt;li&gt;⬜ Can the pipeline replay historical data without corruption?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚩 &lt;strong&gt;Red flag:&lt;/strong&gt; “We can’t replay this topic/table.”&lt;/p&gt;




&lt;h2&gt;
  
  
  7. Backfill Readiness
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Backfills should be boring, not terrifying.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⬜ Can the pipeline be run for arbitrary historical ranges?&lt;/li&gt;
&lt;li&gt;⬜ Is backfill logic identical to regular logic?&lt;/li&gt;
&lt;li&gt;⬜ Does rerunning old partitions overwrite or merge cleanly?&lt;/li&gt;
&lt;li&gt;⬜ Are downstream consumers protected during backfills?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚩 &lt;strong&gt;Red flag:&lt;/strong&gt; Special scripts or manual SQL for backfills.&lt;/p&gt;




&lt;h2&gt;
  
  
  8. Side Effects &amp;amp; External Actions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Data processing should not cause unintended external effects.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⬜ Are emails, webhooks, or API calls isolated from core data logic?&lt;/li&gt;
&lt;li&gt;⬜ Are side effects triggered only after successful completion?&lt;/li&gt;
&lt;li&gt;⬜ Are side effects idempotent themselves (dedup keys, request IDs)?&lt;/li&gt;
&lt;li&gt;⬜ Is there protection against double notifications?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚩 &lt;strong&gt;Red flag:&lt;/strong&gt; Side effects inside transformation steps.&lt;/p&gt;




&lt;h2&gt;
  
  
  9. Observability &amp;amp; Validation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Idempotency issues should be detectable early.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⬜ Are row counts consistent across reruns?&lt;/li&gt;
&lt;li&gt;⬜ Are data quality checks rerun-safe?&lt;/li&gt;
&lt;li&gt;⬜ Are duplicates, nulls, and drift monitored?&lt;/li&gt;
&lt;li&gt;⬜ Is lineage clear for reruns and backfills?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚩 &lt;strong&gt;Red flag:&lt;/strong&gt; No way to tell if data changed unexpectedly.&lt;/p&gt;




&lt;h2&gt;
  
  
  10. Human Factors &amp;amp; Documentation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Humans should not be part of correctness.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⬜ Is idempotency behavior documented?&lt;/li&gt;
&lt;li&gt;⬜ Can a new engineer safely rerun the pipeline?&lt;/li&gt;
&lt;li&gt;⬜ Are recovery steps automated, not manual?&lt;/li&gt;
&lt;li&gt;⬜ Is there a clear owner for data correctness?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🚩 &lt;strong&gt;Red flag:&lt;/strong&gt; “Ask Alice before rerunning.”&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Gate Question (Must Answer Yes)
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;⬜ &lt;strong&gt;Can we safely rerun this pipeline right now in production?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the answer is &lt;strong&gt;no&lt;/strong&gt;, the pipeline is &lt;strong&gt;not idempotent&lt;/strong&gt; and needs redesign.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Teams Should Use This Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📌 &lt;strong&gt;Design reviews:&lt;/strong&gt; Before building pipelines&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;PR reviews:&lt;/strong&gt; As a merge gate&lt;/li&gt;
&lt;li&gt;🚨 &lt;strong&gt;Post-incident reviews:&lt;/strong&gt; To prevent repeat failures&lt;/li&gt;
&lt;li&gt;🔁 &lt;strong&gt;Backfill planning:&lt;/strong&gt; Before rerunning historical data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’d like to connect, find me on &lt;a href="https://www.linkedin.com/in/chaets/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or drop me a message. I’d love to explore how I can help drive your data success!&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>dataengineering</category>
      <category>systemdesign</category>
      <category>etl</category>
    </item>
    <item>
      <title>REST API Calls for Data Engineers: A Practical Guide with Examples</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Sun, 14 Dec 2025 00:02:48 +0000</pubDate>
      <link>https://forem.com/chaets/rest-api-calls-for-data-engineers-a-practical-guide-with-examples-5gd9</link>
      <guid>https://forem.com/chaets/rest-api-calls-for-data-engineers-a-practical-guide-with-examples-5gd9</guid>
      <description>&lt;h1&gt;
  
  
  REST API Calls for Data Engineers
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As a &lt;strong&gt;Data Engineer&lt;/strong&gt;, you rarely work only with databases. Modern data pipelines frequently ingest data from &lt;strong&gt;REST APIs&lt;/strong&gt;—whether it’s pulling data from SaaS tools (Salesforce, Jira, Google Analytics), internal microservices, or third-party providers.&lt;/p&gt;

&lt;p&gt;Understanding how REST APIs work and how to interact with them efficiently is a &lt;strong&gt;core data engineering skill&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This blog covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What REST APIs are (briefly, practically)&lt;/li&gt;
&lt;li&gt;Common REST methods from a data engineering perspective&lt;/li&gt;
&lt;li&gt;Authentication patterns&lt;/li&gt;
&lt;li&gt;Pagination, filtering, and rate limiting&lt;/li&gt;
&lt;li&gt;Real-world examples using Python&lt;/li&gt;
&lt;li&gt;Best practices for production data pipelines&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What is a REST API (Data Engineer Perspective)
&lt;/h2&gt;

&lt;p&gt;REST (Representational State Transfer) APIs allow systems to communicate over &lt;strong&gt;HTTP&lt;/strong&gt; using standard methods.&lt;/p&gt;

&lt;p&gt;From a data engineer’s standpoint:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;REST APIs are &lt;strong&gt;data sources&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;JSON is the most common &lt;strong&gt;data format&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;APIs are often &lt;strong&gt;incremental, paginated, and rate-limited&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;APIs feed &lt;strong&gt;data lakes, warehouses, or streaming systems&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Core REST HTTP Methods You’ll Use
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Usage for Data Engineers&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;Fetch data (most common)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;Submit parameters, create resources, complex queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PUT&lt;/td&gt;
&lt;td&gt;Update existing resources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DELETE&lt;/td&gt;
&lt;td&gt;Rarely used in pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In data engineering, &lt;strong&gt;GET&lt;/strong&gt; and &lt;strong&gt;POST&lt;/strong&gt; are used 90% of the time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Anatomy of a REST API Request
&lt;/h2&gt;

&lt;p&gt;A typical REST API call consists of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://api.example.com/v1/orders?start_date=2025-01-01&amp;amp;limit=100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Components:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Base URL&lt;/strong&gt;: &lt;code&gt;https://api.example.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Endpoint&lt;/strong&gt;: &lt;code&gt;/v1/orders&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query Parameters&lt;/strong&gt;: &lt;code&gt;start_date&lt;/code&gt;, &lt;code&gt;limit&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headers&lt;/strong&gt;: Authentication, content type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP Method&lt;/strong&gt;: GET / POST&lt;/li&gt;
&lt;/ul&gt;
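&lt;p&gt;These components map directly onto code. A standard-library sketch that assembles the example URL above (the endpoint and parameters are the illustrative ones from the example):&lt;/p&gt;

```python
from urllib.parse import urlencode

base_url = "https://api.example.com"
endpoint = "/v1/orders"
params = {"start_date": "2025-01-01", "limit": 100}

# urlencode joins and percent-escapes the query parameters for us.
url = base_url + endpoint + "?" + urlencode(params)
print(url)
```

In practice, libraries like `requests` accept a `params` dict and build this query string internally.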




&lt;h2&gt;
  
  
  Example 1: Simple GET Request (Fetching Data)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case
&lt;/h3&gt;

&lt;p&gt;Fetch daily sales data from an external system.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Request
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET https://api.company.com/v1/sales
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python Example (requests library)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.company.com/v1/sales&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer YOUR_API_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accept&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Typical JSON Response
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sales"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"order_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;250.50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"order_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-01-10"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This JSON is later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flattened&lt;/li&gt;
&lt;li&gt;Transformed&lt;/li&gt;
&lt;li&gt;Stored in a data lake or warehouse&lt;/li&gt;
&lt;/ul&gt;
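&lt;p&gt;For instance, the sample response above can be flattened into warehouse-ready rows with a simple comprehension:&lt;/p&gt;

```python
# Flatten the nested sales list into flat tuples, ready for loading.
response_json = {
    "sales": [
        {"order_id": 101, "amount": 250.50, "currency": "USD", "order_date": "2025-01-10"}
    ]
}

rows = [
    (s["order_id"], s["amount"], s["currency"], s["order_date"])
    for s in response_json["sales"]
]
print(rows[0])  # (101, 250.5, 'USD', '2025-01-10')
```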




&lt;h2&gt;
  
  
  Example 2: Query Parameters (Filtering Data)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case
&lt;/h3&gt;

&lt;p&gt;Pull &lt;strong&gt;incremental data&lt;/strong&gt; to avoid reprocessing historical records.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /v1/sales?start_date=2025-01-01&amp;amp;end_date=2025-01-31
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-01-01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-01-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;sales_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ &lt;strong&gt;Best Practice:&lt;/strong&gt; Always design pipelines to be &lt;strong&gt;incremental&lt;/strong&gt;.&lt;/p&gt;
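&lt;p&gt;Incremental pulls usually go hand in hand with pagination. A self-contained sketch using a stand-in fetch function; the &lt;code&gt;page&lt;/code&gt; and &lt;code&gt;has_more&lt;/code&gt; fields are assumptions, since every API names these differently:&lt;/p&gt;

```python
# Stand-in for an API that serves data in pages of 3.
DATASET = list(range(7))

def fetch_page(page, page_size=3):
    start = page * page_size
    chunk = DATASET[start:start + page_size]
    # has_more is True while records remain past this page.
    return {"items": chunk, "has_more": bool(DATASET[start + page_size:])}

# The pagination loop a real pipeline would run against the API.
all_items = []
page = 0
while True:
    result = fetch_page(page)
    all_items.extend(result["items"])
    if not result["has_more"]:
        break
    page += 1

print(len(all_items))  # 7
```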




&lt;h2&gt;
  
  
  Example 3: POST Request (Complex Queries)
&lt;/h2&gt;

&lt;p&gt;Some APIs require &lt;strong&gt;POST&lt;/strong&gt; when filters are complex.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Call
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /v1/sales/search
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Payload
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"US"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"EU"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"min_amount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"date_range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-01-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-01-31"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Python Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;US&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EU&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;date_range&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-01-01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2025-01-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Authentication Methods (Very Important)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. API Key Authentication
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Authorization: ApiKey abc123
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Bearer Token (OAuth 2.0)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Authorization: Bearer eyJhbGciOi...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Basic Auth (Less Secure)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔐 &lt;strong&gt;Data Engineering Tip&lt;/strong&gt;&lt;br&gt;
Always store credentials in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Environment variables&lt;/li&gt;
&lt;li&gt;Secret managers (AWS Secrets Manager, Azure Key Vault)&lt;/li&gt;
&lt;/ul&gt;
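A hedged sketch of that tip: read the key from an environment variable and fail fast when it is missing. The variable name `API_KEY` is an assumption here; use whatever your deployment or secret manager defines.

```python
import os

def auth_headers(var_name: str = "API_KEY") -> dict:
    """Build the Authorization header from an environment variable.

    `var_name` is illustrative -- point it at whatever your secret
    manager injects. Failing fast beats sending unauthenticated calls.
    """
    key = os.environ.get(var_name)
    if key is None:
        raise RuntimeError(f"{var_name} is not set; configure it in your environment or secret manager")
    return {"Authorization": f"ApiKey {key}"}
```

Locally you would export the variable once (`export API_KEY=...`) rather than committing it to code.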


&lt;h2&gt;
  
  
  Example 4: Pagination (Very Common in APIs)
&lt;/h2&gt;

&lt;p&gt;Most APIs limit results per request.&lt;/p&gt;
&lt;h3&gt;
  
  
  API Response with Pagination
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"page"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"total_pages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Python Pagination Logic
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;all_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;all_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_pages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;✅ &lt;strong&gt;Always handle pagination&lt;/strong&gt;, or you’ll silently miss data.&lt;/p&gt;


&lt;h2&gt;
  
  
  Example 5: Handling Rate Limits
&lt;/h2&gt;

&lt;p&gt;APIs often limit requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;429 Too Many Requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Retry Logic Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📌 &lt;strong&gt;Production pipelines&lt;/strong&gt; should use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exponential backoff&lt;/li&gt;
&lt;li&gt;Retry limits&lt;/li&gt;
&lt;/ul&gt;
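A minimal sketch combining both ideas; `fetch` is any callable returning an object with a `status_code` attribute, for example `lambda: requests.get(url, headers=headers)`.

```python
import time

def get_with_backoff(fetch, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `fetch()` on HTTP 429, doubling the wait each time.

    Delays are base_delay, 2*base_delay, 4*base_delay, ... and the
    loop gives up after `max_retries` attempts instead of retrying forever.
    """
    for attempt in range(max_retries):
        response = fetch()
        if response.status_code != 429:
            return response
        time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts")
```

If the API returns a `Retry-After` header, honoring it is usually better than a fixed schedule.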




&lt;h2&gt;
  
  
  Example 6: Error Handling (Critical for Pipelines)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;API failed with status &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common HTTP Status Codes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;200&lt;/code&gt; – Success&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;400&lt;/code&gt; – Bad Request&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;401&lt;/code&gt; – Unauthorized&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;404&lt;/code&gt; – Not Found&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;500&lt;/code&gt; – Server Error&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  REST API Data Flow in a Data Pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;REST API
   ↓
Python / Spark Job
   ↓
Raw Zone (JSON)
   ↓
Transformation (Flattening, Cleaning)
   ↓
Data Warehouse (Snowflake / BigQuery / Redshift)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
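The flow above can be sketched with the standard library alone; the file path and field names are illustrative, and a real pipeline would land raw files in object storage rather than on local disk.

```python
import json
from pathlib import Path

def land_raw(payload: dict, path: str) -> None:
    # Raw zone: persist the API response untouched so failed
    # transformations can be replayed without re-calling the API.
    Path(path).write_text(json.dumps(payload))

def flatten(record: dict, parent: str = "") -> dict:
    # Transformation: turn nested JSON into flat, warehouse-friendly columns.
    flat = {}
    for key, value in record.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat
```

For example, `flatten({"id": 1, "customer": {"name": "A"}})` yields `{"id": 1, "customer_name": "A"}`, ready for a warehouse load.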






&lt;h2&gt;
  
  
  Best Practices for Data Engineers
&lt;/h2&gt;

&lt;p&gt;✔ Always design &lt;strong&gt;idempotent&lt;/strong&gt; pipelines&lt;br&gt;
✔ Log request/response metadata&lt;br&gt;
✔ Store raw API responses for reprocessing&lt;br&gt;
✔ Use incremental loads (timestamps, IDs)&lt;br&gt;
✔ Monitor failures and latency&lt;br&gt;
✔ Respect API rate limits&lt;/p&gt;
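One way to implement incremental loads is a stored watermark. This is a local-file sketch: in production the watermark would live in a database or orchestrator state store, and the `updated_after` request parameter mentioned below is an assumption to check against your API's docs.

```python
import json
from pathlib import Path

def read_watermark(path: str = "state.json") -> str:
    # Return the last successfully loaded timestamp, or an epoch default
    # so the very first run performs a full load.
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())["last_updated_at"]
    return "1970-01-01T00:00:00Z"

def write_watermark(ts: str, path: str = "state.json") -> None:
    # Persist only after the batch has landed successfully (idempotency).
    Path(path).write_text(json.dumps({"last_updated_at": ts}))
```

Each run then requests only new records, e.g. `params = {"updated_after": read_watermark()}`, and advances the watermark after a successful load.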




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;REST APIs are a &lt;strong&gt;primary data ingestion mechanism&lt;/strong&gt; for data engineers. Mastering REST calls—authentication, pagination, retries, and error handling—will make your pipelines &lt;strong&gt;reliable, scalable, and production-ready&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you understand REST APIs deeply, integrating any new data source becomes significantly easier.&lt;br&gt;
If you’d like to connect, find me on &lt;a href="https://www.linkedin.com/in/chaets/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or drop me a message—I’d love to explore how I can help drive your data success!&lt;/p&gt;

</description>
      <category>rest</category>
      <category>dataengineering</category>
      <category>json</category>
      <category>discuss</category>
    </item>
    <item>
      <title>From Policy to Code: How Leading Companies Operationalize Privacy</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Wed, 26 Nov 2025 08:39:11 +0000</pubDate>
      <link>https://forem.com/chaets/from-policy-to-code-how-leading-companies-operationalize-privacy-13cj</link>
      <guid>https://forem.com/chaets/from-policy-to-code-how-leading-companies-operationalize-privacy-13cj</guid>
      <description>&lt;p&gt;Most companies still treat privacy as a policy problem.&lt;br&gt;
The best treat it as a systems problem.&lt;/p&gt;

&lt;p&gt;That difference — between writing rules and enforcing them — is what separates organizations that talk about responsible data use from those that actually achieve it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Weekly Translation Failure
&lt;/h2&gt;

&lt;p&gt;Every week, legal, product, and engineering teams sit down to align on privacy and responsible data use. And every week, they run into the same challenge:&lt;br&gt;
no shared language.&lt;/p&gt;

&lt;p&gt;It’s not a communication problem.&lt;br&gt;
It’s a translation problem.&lt;/p&gt;

&lt;p&gt;A privacy policy that reads cleanly in a spec document becomes a maze of implementation questions the moment it meets code:&lt;br&gt;
    • How are user preferences modeled across systems?&lt;br&gt;
    • What’s a valid state change when consent is updated?&lt;br&gt;
    • What’s the source of truth when systems conflict?&lt;br&gt;
    • How do we avoid race conditions in enforcement?&lt;/p&gt;

&lt;p&gt;Policy teams speak in rights, obligations, and business rules.&lt;br&gt;
Engineers work in schemas, state machines, and system design.&lt;br&gt;
Product teams sit in the middle, trying to reconcile both worlds — often without the infrastructure to make alignment possible.&lt;/p&gt;

&lt;p&gt;The result?&lt;br&gt;
Requirements that feel legally sound but defy implementation.&lt;br&gt;
Code that compiles but misses the spirit or scope of compliance.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing Layer: A Shared Operational Foundation
&lt;/h2&gt;

&lt;p&gt;What’s missing isn’t collaboration — it’s a common operational foundation.&lt;br&gt;
A shared semantic layer that bridges policy intent and system behavior.&lt;/p&gt;

&lt;p&gt;This is why privacy must be treated as a systems problem.&lt;br&gt;
It can’t be solved in documents.&lt;br&gt;
It has to be enforced in code.&lt;/p&gt;

&lt;p&gt;That’s the core principle behind emerging privacy infrastructure — where legal definitions, business policies, and data models converge into a single executable framework. &lt;/p&gt;

&lt;p&gt;When obligations are expressed as code, they become:&lt;br&gt;
    • Reliable – enforced automatically, not manually interpreted.&lt;br&gt;
    • Scalable – applied consistently across systems and teams.&lt;br&gt;
    • Trustworthy – transparent, testable, and provable.&lt;/p&gt;
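As a toy illustration (the purpose names are invented for this example), an obligation expressed as code is a function the whole stack can call, not a paragraph open to interpretation:

```python
def may_process(user_consents: set, purpose: str) -> bool:
    # Obligation as code: processing is permitted only when the user has
    # granted consent for this exact purpose. Real systems would draw
    # purpose names from a vocabulary shared by legal and engineering.
    return purpose in user_consents

consents = {"analytics"}             # what the user actually agreed to
may_process(consents, "analytics")   # True: consent covers this purpose
may_process(consents, "marketing")   # False: blocked automatically
```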

&lt;h2&gt;
  
  
  When Policy Lives in Infrastructure
&lt;/h2&gt;

&lt;p&gt;When privacy is embedded directly in infrastructure, the dynamic between teams changes entirely:&lt;br&gt;
    • Legal can write once and enforce everywhere.&lt;br&gt;
    • Engineering ships faster with clarity and confidence.&lt;br&gt;
    • Product no longer has to choose between trust and velocity.&lt;/p&gt;

&lt;p&gt;That’s not just better governance — it’s a better growth model.&lt;/p&gt;

&lt;p&gt;Instead of being boxed in by complexity, teams gain the freedom to innovate safely with sensitive data — whether it’s for AI, analytics, personalization, or compliance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Privacy as a Competitive Advantage
&lt;/h2&gt;

&lt;p&gt;Enterprises that get this right stop playing defense with privacy.&lt;br&gt;
They build forward — turning trust into an operational advantage.&lt;/p&gt;

&lt;p&gt;Because when privacy becomes part of your stack, not just your policy binder, you don’t just comply.&lt;br&gt;
You scale responsibly.&lt;br&gt;
You innovate with confidence.&lt;br&gt;
And you turn privacy from a blocker into a feature of your growth model.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>privacy</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Templating Columns in dbt: A Helpful Path Forward</title>
      <dc:creator>Chetan Gupta</dc:creator>
      <pubDate>Sat, 18 Oct 2025 14:35:26 +0000</pubDate>
      <link>https://forem.com/chaets/templating-columns-in-dbt-helpful-path-to-forward-2g3l</link>
      <guid>https://forem.com/chaets/templating-columns-in-dbt-helpful-path-to-forward-2g3l</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Ship consistent, DRY models by generating column lists with Jinja, macros, and metadata.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;Analytics engineers and data engineers who want to avoid copy‑pasting column lists across models and instead &lt;strong&gt;generate them from templates&lt;/strong&gt; with dbt + Jinja.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you’ll build
&lt;/h2&gt;

&lt;p&gt;You’ll create a small dbt project that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralizes column definitions in &lt;strong&gt;macros&lt;/strong&gt; (with optional aliases/prefixes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generates SELECT lists&lt;/strong&gt; from YAML metadata and adapter‑discovered schemas&lt;/li&gt;
&lt;li&gt;Applies &lt;strong&gt;policy‑style transforms&lt;/strong&gt; (e.g., lowercase emails) consistently&lt;/li&gt;
&lt;li&gt;Supports &lt;strong&gt;environment‑specific&lt;/strong&gt; or &lt;strong&gt;source‑specific&lt;/strong&gt; column sets&lt;/li&gt;
&lt;li&gt;Automatically &lt;strong&gt;tests&lt;/strong&gt; and &lt;strong&gt;documents&lt;/strong&gt; the templated columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can copy/paste each step. By the end, you’ll have a reusable pattern you can drop into any dbt project.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;dbt (Core or Cloud) installed&lt;/li&gt;
&lt;li&gt;A warehouse connection configured (&lt;code&gt;profiles.yml&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Basic awareness of Jinja syntax&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Project structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.
├─ dbt_project.yml
├─ models/
│  ├─ marts/
│  │  └─ customers.sql
│  ├─ staging/
│  │  └─ stg_customers.sql
│  └─ schema.yml
└─ macros/
   ├─ columns/
   │  ├─ select_common_columns.sql
   │  ├─ policy_columns.sql
   │  ├─ yaml_columns.sql
   │  └─ discover_columns.sql
   └─ utilities.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the &lt;code&gt;macros/columns&lt;/code&gt; folder; we’ll populate it as we go.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1 — The simplest win: variables → columns
&lt;/h2&gt;

&lt;p&gt;Put repeatable column names in &lt;code&gt;dbt_project.yml&lt;/code&gt; as vars.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;dbt_project.yml&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my_project&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.0.0&lt;/span&gt;
&lt;span class="na"&gt;config-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;

&lt;span class="na"&gt;vars&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;common_columns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;id&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;created_at&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;models/staging/stg_customers.sql&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="s1"&gt;', '&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'common_columns'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;}},&lt;/span&gt;
  &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;last_name&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'customers'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; lightweight reuse without custom logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2 — Macro‑driven SELECT lists (with prefixes/aliases)
&lt;/h2&gt;

&lt;p&gt;Centralize the column list in a macro so you can pass a table alias and keep your SQL tidy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macros/columns/select_common_columns.sql&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;macro&lt;/span&gt; &lt;span class="n"&gt;select_common_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endmacro&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;models/marts/customers.sql&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stg_customers'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;select_common_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'b.'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}},&lt;/span&gt;
  &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why it’s great:&lt;/strong&gt; consistent, alias‑safe, easy to extend.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3 — Policy columns: apply consistent transforms
&lt;/h2&gt;

&lt;p&gt;Wrap transforms (PII handling, normalisation) in a macro and reuse them everywhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macros/columns/policy_columns.sql&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;macro&lt;/span&gt; &lt;span class="n"&gt;customer_policy_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;({{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endmacro&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;models/marts/customers.sql&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;customer_policy_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'b.'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stg_customers'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; treat a column template like a “policy” you can apply repeatedly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4 — YAML‑driven column generation
&lt;/h2&gt;

&lt;p&gt;Keep column specs as metadata in &lt;code&gt;schema.yml&lt;/code&gt; and generate the SELECT list from it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;models/schema.yml&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;stg_customers&lt;/span&gt;
    &lt;span class="na"&gt;columns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;id&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;email&lt;/span&gt;
        &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;not_null&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
        &lt;span class="na"&gt;meta&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;transform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lower({col})"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;first_name&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;last_name&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;created_at&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;updated_at&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;macros/columns/yaml_columns.sql&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;#&lt;/span&gt;
  &lt;span class="n"&gt;Generate&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="n"&gt;YAML&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;Reads&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="k"&gt;defined&lt;/span&gt; &lt;span class="k"&gt;under&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;upstream&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;provided&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;Supports&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;per&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;column&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="nv"&gt;`{col}`&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="n"&gt;replaced&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;macro&lt;/span&gt; &lt;span class="n"&gt;yaml_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Fail&lt;/span&gt; &lt;span class="n"&gt;early&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="n"&gt;wrong&lt;/span&gt; &lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raise_compiler_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'yaml_columns: model '&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="s1"&gt;' not found in graph'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endif&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;rendered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="k"&gt;transform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'transform'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;none&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;transform&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'{col}'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="s1"&gt;' as '&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endif&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="n"&gt;rendered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endfor&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;rendered&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;',&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;  '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endmacro&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;yaml_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'model.stg_customers'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'s.'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stg_customers'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;graph&lt;/code&gt; object is available at compile time and includes YAML column metadata.&lt;/li&gt;
&lt;li&gt;Use the fully qualified node name: &lt;code&gt;model.&amp;lt;project_name&amp;gt;.&amp;lt;model_name&amp;gt;&lt;/code&gt; for models and &lt;code&gt;source.&amp;lt;project_name&amp;gt;.&amp;lt;source_name&amp;gt;.&amp;lt;table_name&amp;gt;&lt;/code&gt; for sources. The &lt;code&gt;project_name&lt;/code&gt; context variable gives you the current project's name.&lt;/li&gt;
&lt;/ul&gt;
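
&lt;p&gt;For reference, with the &lt;code&gt;schema.yml&lt;/code&gt; above this usage compiles to roughly the following (&lt;code&gt;my_schema&lt;/code&gt; is a placeholder for your target schema; only &lt;code&gt;email&lt;/code&gt; has a transform, so only it gets an explicit alias):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select
  s.id,
  lower(s.email) as email,
  s.first_name,
  s.last_name,
  s.created_at,
  s.updated_at
from my_schema.stg_customers as s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;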




&lt;h2&gt;
  
  
  Step 5 — Discover columns dynamically from the warehouse
&lt;/h2&gt;

&lt;p&gt;Sometimes you need to reflect the actual schema at compile time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macros/columns/discover_columns.sql&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;#&lt;/span&gt;
  &lt;span class="n"&gt;Gets&lt;/span&gt; &lt;span class="k"&gt;column&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="n"&gt;relation&lt;/span&gt; &lt;span class="k"&gt;at&lt;/span&gt; &lt;span class="n"&gt;compile&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;optionally&lt;/span&gt; &lt;span class="n"&gt;maps&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt; &lt;span class="n"&gt;them&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;macro&lt;/span&gt; &lt;span class="n"&gt;discovered_columns_from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="n"&gt;exclude&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_columns_in_relation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cols&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;Apply&lt;/span&gt; &lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;provided&lt;/span&gt; &lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;include&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;include&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endif&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exclude&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endif&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'lower'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'trim'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'regex_replace'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'^'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;',&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;  '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endmacro&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;rel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stg_customers'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;discovered_columns_from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exclude&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'_load_dt'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'c.'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;rel&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; quickly mirror upstream schema, or build pass‑through layers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Caveat:&lt;/strong&gt; &lt;code&gt;adapter.get_columns_in_relation&lt;/code&gt; runs at compile time; if the relation doesn’t exist yet (e.g., first run), create it once or fall back to a known list.&lt;/p&gt;
&lt;/blockquote&gt;
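
&lt;p&gt;One way to guard against a missing relation is to check for it before introspecting. A sketch (the fallback column list here is a placeholder — substitute the columns you actually expect):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{% set rel = ref('stg_customers') %}
{% set existing = adapter.get_relation(
      database=rel.database, schema=rel.schema, identifier=rel.identifier) %}

select
{% if existing is not none %}
  {{ discovered_columns_from(existing, alias='c.') }}
{% else %}
  {# relation not built yet: fall back to a known list #}
  c.id, c.email, c.created_at
{% endif %}
from {{ rel }} as c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;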




&lt;h2&gt;
  
  
  Step 6 — Column mapping &amp;amp; renaming at scale
&lt;/h2&gt;

&lt;p&gt;Create a mapping to rename raw columns to canonical names, with optional transforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macros/utilities.sql&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;macro&lt;/span&gt; &lt;span class="n"&gt;render_mapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mapping&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="n"&gt;mapping&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="n"&gt;dicts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"EMAIL_ADDRESS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"lower({col})"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;#&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;mapping&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'transform'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="s1"&gt;'{col}'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'{col}'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'from'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="s1"&gt;' as '&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'to'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endfor&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;',&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;  '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endmacro&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;models/staging/stg_customers.sql&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;mapping&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"CUSTOMER_ID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"EMAIL_ADDRESS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"transform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"lower({col})"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"FIRST_NAME"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"first_name"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"LAST_NAME"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"last_name"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"CREATED_TS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"created_at"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;"from"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"UPDATED_TS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;"updated_at"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;render_mapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mapping&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'r.'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'app'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s1"&gt;'raw_customers'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefit:&lt;/strong&gt; the mapping doubles as an auditable record of how raw columns map to curated names.&lt;/p&gt;
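
&lt;p&gt;For reference, the model above compiles to roughly the following (the exact relation name in the &lt;code&gt;from&lt;/code&gt; clause depends on your source configuration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select
  r.CUSTOMER_ID as id,
  lower(r.EMAIL_ADDRESS) as email,
  r.FIRST_NAME as first_name,
  r.LAST_NAME as last_name,
  r.CREATED_TS as created_at,
  r.UPDATED_TS as updated_at
from raw_db.app.raw_customers as r
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;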




&lt;h2&gt;
  
  
  Step 7 — Environment / source specific templates
&lt;/h2&gt;

&lt;p&gt;Use vars or &lt;code&gt;target.name&lt;/code&gt; to toggle column sets across environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macros/columns/select_common_columns.sql&lt;/strong&gt; (extended)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;macro&lt;/span&gt; &lt;span class="n"&gt;select_common_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'prod'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'staging'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;ingested_at&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endif&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endmacro&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; For source‑specific logic, branch on &lt;code&gt;this.name&lt;/code&gt; / &lt;code&gt;this.schema&lt;/code&gt; or pass a flag into the macro.&lt;/p&gt;
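&lt;p&gt;To sanity‑check the branching, here is a small Python stand‑in (not dbt itself; &lt;code&gt;target_name&lt;/code&gt; mimics &lt;code&gt;target.name&lt;/code&gt; and the column lists are illustrative) that mirrors what the macro renders per environment:&lt;/p&gt;

```python
# Python stand-in for the select_common_columns macro: prod/staging
# targets get the extra ingested_at column, other targets the base set.
# target_name mimics dbt's target.name; all names are illustrative.
def select_common_columns(target_name, alias=""):
    base = ["id", "created_at", "updated_at"]
    if target_name in ("prod", "staging"):
        base = base + ["ingested_at"]
    return ",\n".join(alias + col for col in base)

print(select_common_columns("prod", alias="b."))
# b.id,
# b.created_at,
# b.updated_at,
# b.ingested_at
print(select_common_columns("dev", alias="b."))
```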




&lt;h2&gt;
  
  
  Step 8 — Tests &amp;amp; docs that follow your templates
&lt;/h2&gt;

&lt;p&gt;When you template columns, also template the &lt;strong&gt;tests&lt;/strong&gt; and &lt;strong&gt;descriptions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;models/schema.yml&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customers&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Curated customers with standardized email and timestamps.&lt;/span&gt;
    &lt;span class="na"&gt;columns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;id&lt;/span&gt;
        &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;unique&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;not_null&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;email&lt;/span&gt;
        &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Lower‑cased email address.&lt;/span&gt;
        &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;not_null&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;accepted_values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;@'&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# example; replace with proper regex test via package&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;created_at&lt;/span&gt;
        &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;not_null&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Docs blocks&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="n"&gt;email_col&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;Email&lt;/span&gt; &lt;span class="n"&gt;stored&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lowercase&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;deduplication&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;enddocs&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reference the doc block in YAML:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;email&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;doc('email_col')&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; &lt;code&gt;dbt docs generate&lt;/code&gt; mirrors your templated columns with accurate docs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 9 — Guardrails, debugging, and CI
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Render‑only checks:&lt;/strong&gt; &lt;code&gt;dbt compile&lt;/code&gt; to verify generated SQL before running.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preview macro output:&lt;/strong&gt; use &lt;code&gt;{% do log(your_macro(), info=True) %}&lt;/code&gt; temporarily.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unit‑like tests:&lt;/strong&gt; for complex macros, add models that snapshot the macro output and assert against expected fixtures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI:&lt;/strong&gt; run &lt;code&gt;dbt deps &amp;amp;&amp;amp; dbt compile &amp;amp;&amp;amp; dbt run --select state:modified+ &amp;amp;&amp;amp; dbt test&lt;/code&gt; on PRs.&lt;/li&gt;
&lt;/ul&gt;
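&lt;p&gt;The "unit‑like tests" idea can be sketched in plain Python: render the template logic and compare it against a frozen fixture, the same way you would diff compiled SQL (the macro body here is a Python stand‑in, not dbt itself; names are illustrative):&lt;/p&gt;

```python
# Fixture-style check: render the column template and assert it matches
# a frozen expected string, mimicking a snapshot test on compiled SQL.
# The macro body is a Python stand-in; names are illustrative.
def user_policy_columns(alias=""):
    return ",\n".join([
        alias + "user_id",
        "lower(" + alias + "email) as email",
        alias + "created_at",
        alias + "updated_at",
    ])

EXPECTED = (
    "b.user_id,\n"
    "lower(b.email) as email,\n"
    "b.created_at,\n"
    "b.updated_at"
)
assert user_policy_columns(alias="b.") == EXPECTED
print("fixture check passed")
```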




&lt;h2&gt;
  
  
  Step 10 — Production proofing &amp;amp; performance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Prefer static column lists for business‑critical marts; reserve runtime discovery for staging.&lt;/li&gt;
&lt;li&gt;Keep macros &lt;strong&gt;pure&lt;/strong&gt; (deterministic) where possible; avoid warehouse calls in hot paths.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;CTEs&lt;/strong&gt; to keep the final SELECT flat and easy to debug.&lt;/li&gt;
&lt;li&gt;Centralize transforms in a few macros to minimize surface area for change.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Reusable snippets (copy/paste)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A. Policy column template
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;macro&lt;/span&gt; &lt;span class="n"&gt;user_policy_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;({{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endmacro&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  B. YAML‑driven generator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;macro&lt;/span&gt; &lt;span class="n"&gt;yaml_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raise_compiler_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'unknown model: '&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endif&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;values&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'transform'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;meta&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;none&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="s1"&gt;'{col}'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="k"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'{col}'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="s1"&gt;' as '&lt;/span&gt; &lt;span class="o"&gt;~&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endfor&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;',&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;  '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endmacro&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
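&lt;p&gt;The &lt;code&gt;{col}&lt;/code&gt; substitution is easy to verify in isolation. This Python stand‑in mirrors the generator's loop (the column metadata dict is illustrative; in dbt it comes from &lt;code&gt;node.columns&lt;/code&gt;):&lt;/p&gt;

```python
# Mirrors the yaml_columns macro: each column may carry a meta
# 'transform' template with a {col} placeholder; default is passthrough.
# The dict stands in for node.columns; the data is illustrative.
def yaml_columns(columns, alias=""):
    items = []
    for name, meta in columns.items():
        template = meta.get("transform") or "{col}"
        expr = template.replace("{col}", alias + name)
        items.append(expr + " as " + name)
    return ",\n  ".join(items)

cols = {
    "id": {},
    "email": {"transform": "lower({col})"},
    "created_at": {},
}
print(yaml_columns(cols, alias="b."))
# b.id as id,
#   lower(b.email) as email,
#   b.created_at as created_at
```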



&lt;h3&gt;
  
  
  C. Discovery‑based generator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;macro&lt;/span&gt; &lt;span class="n"&gt;discovered_columns_from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="n"&gt;exclude&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_columns_in_relation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cols&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attribute&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;include&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endif&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;exclude&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;reject&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'in'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exclude&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;list&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endif&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;names&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'lower'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'trim'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'regex_replace'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'^'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;',&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;  '&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endmacro&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
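&lt;p&gt;The include/exclude/alias pipeline behaves like this Python stand‑in (the discovered names list replaces the &lt;code&gt;adapter.get_columns_in_relation&lt;/code&gt; call; the data is illustrative):&lt;/p&gt;

```python
# Mirrors discovered_columns_from: optionally filter the discovered
# column names, normalize them, then prefix each with an alias.
# The names list stands in for adapter.get_columns_in_relation.
def discovered_columns(names, include=None, exclude=None, alias=""):
    if include:
        names = [n for n in names if n in include]
    if exclude:
        names = [n for n in names if n not in exclude]
    return ",\n  ".join(alias + n.lower().strip() for n in names)

discovered = ["ID", "EMAIL", "SSN", "CREATED_AT"]
print(discovered_columns(discovered, exclude=["SSN"], alias="b."))
# b.id,
#   b.email,
#   b.created_at
```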






&lt;h2&gt;
  
  
  Putting it all together
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;models/marts/customers.sql&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stg_customers'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="k"&gt;final&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;user_policy_columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'b.'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="k"&gt;final&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This compiles to a clean, consistent SELECT with all policy rules applied.&lt;/p&gt;




&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Macro not found:&lt;/strong&gt; ensure the file is in &lt;code&gt;macros/&lt;/code&gt; and the macro name matches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KeyError on &lt;code&gt;graph.nodes[...]&lt;/code&gt;:&lt;/strong&gt; use the fully qualified node name (&lt;code&gt;model.&amp;lt;project&amp;gt;.&amp;lt;model&amp;gt;&lt;/code&gt; or &lt;code&gt;source.&amp;lt;project&amp;gt;.&amp;lt;source&amp;gt;.&amp;lt;table&amp;gt;&lt;/code&gt;), and confirm the YAML exists.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relation not found (discovery):&lt;/strong&gt; run once to create the relation, or guard with a fallback list:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;rel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'possibly_missing'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rel&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;none&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;discovered_columns_from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endif&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Checklist for your PR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Columns are defined via macro(s) or YAML metadata&lt;/li&gt;
&lt;li&gt;[ ] Transforms are centralized (policy macros)&lt;/li&gt;
&lt;li&gt;[ ] Tests &amp;amp; docs reference the canonical column names&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;dbt compile&lt;/code&gt; output looks correct and readable&lt;/li&gt;
&lt;li&gt;[ ] CI runs &lt;code&gt;deps&lt;/code&gt;, &lt;code&gt;compile&lt;/code&gt;, &lt;code&gt;run&lt;/code&gt;, and &lt;code&gt;test&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Where to go next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Extend the YAML driver to support &lt;strong&gt;type casting&lt;/strong&gt;, &lt;strong&gt;default values&lt;/strong&gt;, or &lt;strong&gt;PII tags&lt;/strong&gt; via &lt;code&gt;meta&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Build &lt;strong&gt;package‑style macros&lt;/strong&gt; so multiple projects can share your policies.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;exposures&lt;/strong&gt; to tie templated columns to downstream assets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy templating! 🧩&lt;/p&gt;

</description>
      <category>dbt</category>
      <category>datatransformation</category>
      <category>datamodelling</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
