<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alex Retana</title>
    <description>The latest articles on Forem by Alex Retana (@alexretana).</description>
    <link>https://forem.com/alexretana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3339127%2F961c69f0-2bb0-492f-8f51-657ae5b264a8.png</url>
      <title>Forem: Alex Retana</title>
      <link>https://forem.com/alexretana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/alexretana"/>
    <language>en</language>
    <item>
      <title>Cracking the Medical Coding Challenge: Fine-Tuning BioBERT for ICD-10 Classification (Part 1)</title>
      <dc:creator>Alex Retana</dc:creator>
      <pubDate>Tue, 18 Nov 2025 16:20:41 +0000</pubDate>
      <link>https://forem.com/alexretana/cracking-the-medical-coding-challenge-fine-tuning-biobert-for-icd-10-classification-part-1-7oc</link>
      <guid>https://forem.com/alexretana/cracking-the-medical-coding-challenge-fine-tuning-biobert-for-icd-10-classification-part-1-7oc</guid>
      <description>&lt;h2&gt;
  
  
  The Problem That Keeps Medical Coders Up at Night
&lt;/h2&gt;

&lt;p&gt;Imagine you're processing disability claims for veterans. Each claim contains dense medical documentation—thousands of characters describing symptoms, diagnoses, and treatment history. Your job? Extract the correct ICD-10 diagnostic codes from this narrative. Miss a code, and a veteran might not receive the benefits they've earned. Add an incorrect code, and you've created compliance issues.&lt;/p&gt;

&lt;p&gt;Now imagine doing this hundreds of times per day, under pressure, with 158+ possible diagnosis codes to remember.&lt;/p&gt;

&lt;p&gt;This is exactly the type of problem that makes medical coding both critically important and incredibly challenging. And it's the perfect use case for Natural Language Processing (NLP). But here's the catch: training an AI to do this isn't straightforward, especially when you're dealing with limited training data and severe class imbalance.&lt;/p&gt;

&lt;p&gt;In this two-part series, I'll walk you through building an automated medical coding system. &lt;strong&gt;Part 1&lt;/strong&gt; (this article) focuses on fine-tuning BioBERT with advanced techniques to handle real-world constraints. &lt;strong&gt;Part 2&lt;/strong&gt; will explore AWS Comprehend Medical as an alternative approach and compare the two solutions.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/alexretana/clinical-nlp-claims-processing" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Project Matters: Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;Before diving into code, let's talk about why automated medical coding matters:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Disability Claims Processing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Veterans Affairs (VA) processes millions of disability claims. Each claim requires accurate ICD-10 coding to determine eligibility and compensation levels. Manual coding creates bottlenecks and inconsistencies.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Healthcare Revenue Cycle Management&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Hospitals lose billions annually due to coding errors. Automated coding assistance can flag potential issues before claims are submitted to insurance companies.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Clinical Research&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Large-scale medical studies require consistent coding of patient records. Automated extraction enables researchers to identify patient cohorts more efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Compliance and Auditing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Healthcare organizations must ensure coding accuracy for regulatory compliance. AI systems can audit existing codes and identify discrepancies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje6bq1114ny6k4z9sjk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje6bq1114ny6k4z9sjk0.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dataset: MedCodER and Its Challenges
&lt;/h2&gt;

&lt;p&gt;For this project, we're using the &lt;strong&gt;MedCodER&lt;/strong&gt; (Medical Coding with Explanations and Retrievals) dataset, which contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;500+ clinical documents&lt;/strong&gt; with full SOAP notes (Subjective, Objective, Assessment, Plan)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;158 unique ICD-10-CM codes&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supporting evidence annotations&lt;/strong&gt; showing which text spans support each diagnosis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Severe class imbalance&lt;/strong&gt;: Most codes appear fewer than 10 times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what makes this dataset challenging (and realistic):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Class distribution snapshot
&lt;/span&gt;&lt;span class="n"&gt;Total&lt;/span&gt; &lt;span class="n"&gt;unique&lt;/span&gt; &lt;span class="n"&gt;codes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;158&lt;/span&gt;
&lt;span class="n"&gt;Codes&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="err"&gt;≥&lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;  &lt;span class="c1"&gt;# Only 11% have sufficient training data!
&lt;/span&gt;&lt;span class="n"&gt;Codes&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="err"&gt;≥&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt;
&lt;span class="n"&gt;Codes&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="n"&gt;samples&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;98&lt;/span&gt;  &lt;span class="c1"&gt;# 62% are extremely rare
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mirrors real-world medical data perfectly—common conditions like diabetes and hypertension appear frequently, while rare diseases have minimal examples.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5a6j7zmjwu6eetotiht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5a6j7zmjwu6eetotiht.png" alt=" " width="800" height="279"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Naive Approach (And Why It Fails Spectacularly)
&lt;/h2&gt;

&lt;p&gt;Let's talk about what &lt;em&gt;doesn't&lt;/em&gt; work. Your first instinct might be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take full 2000+ character clinical documents&lt;/li&gt;
&lt;li&gt;Feed them to BioBERT&lt;/li&gt;
&lt;li&gt;Train on all 158 classes&lt;/li&gt;
&lt;li&gt;Hope for the best&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: Macro F1 score of 0.023 (2.3%). Essentially random guessing.&lt;/p&gt;
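
&lt;p&gt;As a refresher, macro F1 averages per-class F1 scores with equal weight, so with 158 classes even decent accuracy on a handful of common codes barely moves the number. A minimal pure-Python sketch with illustrative labels (sklearn's &lt;code&gt;f1_score(average='macro')&lt;/code&gt; computes the same thing):&lt;/p&gt;

```python
def macro_f1(y_true, y_pred, labels):
    """Macro F1: unweighted mean of per-class F1 scores.

    With many classes and near-random predictions, most per-class
    F1 scores are ~0, dragging the macro average toward zero.
    """
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# A model that only ever predicts the majority code scores poorly on macro F1
y_true = ["I10", "E11.9", "I10", "M54.5"]
y_pred = ["I10", "I10", "I10", "I10"]
score = macro_f1(y_true, y_pred, ["I10", "E11.9", "M54.5"])  # roughly 0.222: two classes score 0
```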

&lt;h3&gt;
  
  
  Why does this fail?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem 1: Signal Dilution&lt;/strong&gt;&lt;br&gt;
A 2000-character document might contain only 50-100 characters actually describing a specific diagnosis. The rest is noise—patient demographics, vital signs, medication lists, etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 2: Insufficient Training Data&lt;/strong&gt;&lt;br&gt;
With only 500 documents and 158 classes, you have an average of ~3 examples per class. Deep learning models need orders of magnitude more data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Problem 3: Catastrophic Overfitting&lt;/strong&gt;&lt;br&gt;
BioBERT has 110 million parameters. Training all of them on tiny datasets causes the model to memorize training examples rather than learn generalizable patterns.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Solution: A Five-Pronged Strategy
&lt;/h2&gt;

&lt;p&gt;To achieve a &lt;strong&gt;94.4% Macro F1 score&lt;/strong&gt; (a 4,000% improvement!), we implement five key techniques:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Evidence-Focused Training
&lt;/h3&gt;
&lt;h3&gt;
  
  
  2. Label Space Optimization
&lt;/h3&gt;
&lt;h3&gt;
  
  
  3. Back-Translation Data Augmentation
&lt;/h3&gt;
&lt;h3&gt;
  
  
  4. LoRA Parameter-Efficient Fine-Tuning
&lt;/h3&gt;
&lt;h3&gt;
  
  
  5. Class-Weighted Loss Function
&lt;/h3&gt;

&lt;p&gt;Let's dive into each one.&lt;/p&gt;


&lt;h2&gt;
  
  
  Technique 1: Evidence-Focused Training
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: Training on 2000-character documents dilutes the diagnostic signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution&lt;/strong&gt;: Use the supporting evidence annotations to extract focused diagnostic spans (~150-200 characters) with context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_evidence_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract evidence span from full document text&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Start&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;End&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract with ±50 character context window
&lt;/span&gt;    &lt;span class="n"&gt;context_start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context_end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;medical_record_text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;medical_record_text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;context_start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;context_end&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works&lt;/strong&gt;: We're giving the model concentrated diagnostic information. Instead of finding a needle in a haystack, we're handing it the needle.&lt;/p&gt;
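
&lt;p&gt;To make the extraction concrete, here is the same logic applied to a toy record (the record text and its offsets are illustrative, not taken from the dataset):&lt;/p&gt;

```python
def extract_evidence_text(row, window=50):
    """Evidence span plus a ±50-character context window (as defined above)."""
    start, end = int(row['Start']), int(row['End'])
    text = row['medical_record_text']
    context_start = max(0, start - window)
    context_end = min(len(text), end + window)
    return text[context_start:context_end]

# Hypothetical record; 'Start'/'End' mark the annotated evidence span
record = {
    'medical_record_text': ('Vitals stable. ' * 10
                            + 'Diagnosis: Essential (primary) hypertension.'
                            + ' Plan: increase lisinopril.' * 5),
    'Start': 150,
    'End': 194,
}
span = extract_evidence_text(record)  # 144 focused chars instead of the full 329
```

&lt;p&gt;With a pandas DataFrame, the same function applies row-wise via &lt;code&gt;df.apply(extract_evidence_text, axis=1)&lt;/code&gt;.&lt;/p&gt;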

&lt;p&gt;&lt;strong&gt;Example transformation&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;❌ &lt;strong&gt;Full Document (2,347 chars)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Long patient history, demographics, vitals, multiple conditions mixed together...]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ &lt;strong&gt;Evidence Span (189 chars)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"...blood pressure remains elevated at 156/94 despite medication compliance. 
Diagnosis: Essential (primary) hypertension. Will increase lisinopril dose..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Consequence of skipping this step&lt;/strong&gt;:
&lt;/h3&gt;

&lt;p&gt;Without evidence extraction, the model struggles to differentiate signal from noise. You'd see F1 scores plateau around 20-30% even with other optimizations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technique 2: Label Space Optimization
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: 62% of codes have fewer than 10 training examples—impossible to learn from.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution&lt;/strong&gt;: Filter to codes with ≥80 examples, reducing from 158 codes to 18 viable classes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MIN_SAMPLES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
&lt;span class="n"&gt;code_freq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;evidence_focused&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ICD10&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;frequent_codes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;code_freq&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;code_freq&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;MIN_SAMPLES&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;evidence_filtered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;evidence_focused&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;evidence_focused&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ICD10&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frequent_codes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reduced to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frequent_codes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; codes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 18 codes
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retained &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evidence_filtered&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; examples&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# ~1,200 examples
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works&lt;/strong&gt;: Machine learning requires sufficient examples to learn patterns. By focusing on codes with adequate representation, we ensure the model can actually learn meaningful relationships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trade-off&lt;/strong&gt;: We sacrifice coverage (18 codes vs. 158) for accuracy. This is acceptable in a hybrid system where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom model handles frequent codes&lt;/strong&gt; (high accuracy)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commercial API handles rare codes&lt;/strong&gt; (broader coverage, lower accuracy)&lt;/li&gt;
&lt;/ul&gt;
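
&lt;p&gt;A minimal sketch of that routing logic (the model and API calls here are placeholder stubs, not the actual pipeline; &lt;code&gt;hybrid_predict&lt;/code&gt; and the confidence threshold are assumptions for illustration):&lt;/p&gt;

```python
FREQUENT_CODES = {'I10', 'E11.9', 'M54.5'}  # illustrative subset of the 18 retained codes

def predict_frequent(text):
    """Placeholder for the fine-tuned BioBERT model (returns code, confidence)."""
    return 'I10', 0.97

def commercial_api_lookup(text):
    """Placeholder for a broad-coverage commercial service."""
    return 'A15.0', 0.55

def hybrid_predict(text, threshold=0.80):
    code, conf = predict_frequent(text)
    # Trust the custom model only for codes it was trained on, at high confidence
    if code in FREQUENT_CODES and conf >= threshold:
        return code, 'custom-model'
    # Otherwise fall back to the commercial API for coverage
    code, conf = commercial_api_lookup(text)
    return code, 'commercial-api'
```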

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxb5proqc5ca6zy3ezxt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxb5proqc5ca6zy3ezxt.png" alt=" " width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Consequence of skipping this step&lt;/strong&gt;:
&lt;/h3&gt;

&lt;p&gt;Including rare codes creates extreme class imbalance. The model would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ignore rare classes entirely (predicting only common ones)&lt;/li&gt;
&lt;li&gt;Waste capacity trying to memorize insufficient examples&lt;/li&gt;
&lt;li&gt;Achieve poor performance across all classes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Technique 3: Back-Translation Data Augmentation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: Even after filtering, we only have ~1,200 training examples for 18 classes (~67 examples per class). Still limited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution&lt;/strong&gt;: Use back-translation to generate synthetic training data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;back_translate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pivot_lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;de&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Translate EN→DE→EN to create paraphrased version&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# EN → German
&lt;/span&gt;    &lt;span class="n"&gt;fwd_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MarianMTModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Helsinki-NLP/opus-mt-en-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pivot_lang&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fwd_tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MarianTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Helsinki-NLP/opus-mt-en-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pivot_lang&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;fwd_inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fwd_tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;truncation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fwd_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fwd_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;fwd_inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;german_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fwd_tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fwd_outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# German → EN
&lt;/span&gt;    &lt;span class="n"&gt;bwd_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MarianMTModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Helsinki-NLP/opus-mt-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pivot_lang&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-en&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bwd_tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MarianTokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Helsinki-NLP/opus-mt-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pivot_lang&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-en&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;bwd_inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bwd_tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;german_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;truncation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;bwd_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bwd_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;bwd_inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;back_translated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bwd_tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bwd_outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;back_translated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example transformation&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Original&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Patient reports persistent chest pain radiating to left arm with 
shortness of breath during physical exertion."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After EN→DE→EN&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Patient experiences continuous chest pain extending to the left arm 
with breathing difficulty during physical activity."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works&lt;/strong&gt;: The semantic meaning remains identical, but the phrasing varies. This teaches the model to recognize diagnoses regardless of how they're worded—critical for handling real-world clinical variation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best practice&lt;/strong&gt;: Use multiple pivot languages (German, French, Spanish) for 4x data expansion. In our demo, we use German for 1.2x expansion to save time.&lt;/p&gt;
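
&lt;p&gt;A runnable sketch of multi-pivot augmentation (the translator below is an identity stub standing in for the MarianMT round-trip above, so the sketch stays self-contained; each pivot language contributes one paraphrase per example, giving the 4x expansion):&lt;/p&gt;

```python
PIVOT_LANGS = ['de', 'fr', 'es']  # each pivot yields one paraphrase per example

def back_translate_stub(text, pivot_lang):
    """Stand-in for the MarianMT round-trip above; a real run would
    translate EN to pivot_lang and back to EN."""
    return text  # identity placeholder

def augment_examples(examples, pivot_langs=PIVOT_LANGS):
    """Return originals plus one back-translated copy per pivot language."""
    augmented = list(examples)
    for lang in pivot_langs:
        for ex in examples:
            augmented.append({'text': back_translate_stub(ex['text'], lang),
                              'ICD10': ex['ICD10']})  # label is preserved
    return augmented
```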

&lt;h3&gt;
  
  
  &lt;strong&gt;Critical requirement&lt;/strong&gt;: Keep the validation set 100% original data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Split BEFORE augmentation
&lt;/span&gt;&lt;span class="n"&gt;train_orig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;val_orig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Augment ONLY training data
&lt;/span&gt;&lt;span class="n"&gt;train_augmented&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;augment_with_back_translation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_orig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;train_final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;train_orig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;train_augmented&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Validation stays 100% original
&lt;/span&gt;&lt;span class="n"&gt;val_final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;val_orig&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters&lt;/strong&gt;: If augmented data leaks into validation, you'll get overly optimistic metrics. The model might learn artifacts of the translation process rather than true diagnostic patterns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4u27h7jzle34lmki9lh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4u27h7jzle34lmki9lh.png" alt=" " width="800" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Consequence of skipping this step&lt;/strong&gt;:
&lt;/h3&gt;

&lt;p&gt;Without augmentation, the model has limited exposure to linguistic variation. It might learn to recognize specific phrasings but fail on synonyms or alternative formulations—reducing real-world robustness by 10-15%.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technique 4: LoRA (Low-Rank Adaptation) Fine-Tuning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: BioBERT has 110 million parameters. Training all of them on 1,200 examples causes severe overfitting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution&lt;/strong&gt;: Use LoRA to train only 0.1% of parameters while keeping the rest frozen.&lt;/p&gt;

&lt;h3&gt;
  
  
  How LoRA Works
&lt;/h3&gt;

&lt;p&gt;Instead of updating all weights in the attention layers, LoRA injects trainable low-rank matrices:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional: W_new = W_old + ΔW  (update all 768×768 = 589,824 params)
LoRA: W_new = W_old + A×B  (update 768×8 + 8×768 = 12,288 params)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A&lt;/strong&gt; is a 768×8 matrix&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;B&lt;/strong&gt; is an 8×768 matrix
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;r=8&lt;/strong&gt; is the rank (a hyperparameter)
&lt;/li&gt;
&lt;/ul&gt;
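
&lt;p&gt;The parameter arithmetic above checks out directly:&lt;/p&gt;

```python
# Verify the counts above for one 768x768 attention projection
hidden, r = 768, 8

full_update = hidden * hidden          # dense delta-W: 589,824 params
lora_update = hidden * r + r * hidden  # A (768x8) plus B (8x768): 12,288 params

reduction = full_update / lora_update  # 48x fewer trainable params per matrix
```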

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;peft&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TaskType&lt;/span&gt;

&lt;span class="c1"&gt;# Load base BioBERT model
&lt;/span&gt;&lt;span class="n"&gt;base_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForSequenceClassification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dmis-lab/biobert-v1.1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_labels&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;problem_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;single_label_classification&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configure LoRA
&lt;/span&gt;&lt;span class="n"&gt;lora_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoraConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TaskType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SEQ_CLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Rank: controls capacity vs. overfitting trade-off
&lt;/span&gt;    &lt;span class="n"&gt;lora_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Scaling factor (typically 2×r)
&lt;/span&gt;    &lt;span class="n"&gt;lora_dropout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_modules&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# Apply to Q/V attention projections
&lt;/span&gt;    &lt;span class="n"&gt;inference_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Apply LoRA adapter
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_peft_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lora_config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Trainable params: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;numel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requires_grad&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total params: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;numel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trainable params: 148,488 (0.13%)
Total params: 109,629,456 (100%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pre-trained knowledge is preserved&lt;/strong&gt;: BioBERT's medical understanding stays intact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task-specific adaptation&lt;/strong&gt;: The small LoRA adapters learn to map BioBERT's features to ICD-10 codes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regularization effect&lt;/strong&gt;: Limited capacity prevents memorization&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Choosing the rank (r)&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;r=4&lt;/strong&gt;: Very lightweight, may underfit complex tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;r=8&lt;/strong&gt;: Sweet spot for most tasks (used here)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;r=16&lt;/strong&gt;: More capacity, risk of overfitting on small datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;r=32+&lt;/strong&gt;: Approaching full fine-tuning behavior&lt;/li&gt;
&lt;/ul&gt;
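Because each adapted matrix adds 2 × d × r parameters, the rank scales cost linearly; a quick sketch of the per-matrix cost at each rank listed above (assuming BERT-base's d=768):

```python
def lora_params_per_matrix(r, d=768):
    """Trainable parameters one LoRA adapter adds to a d×d weight: A (d×r) plus B (r×d)."""
    return 2 * d * r

for r in (4, 8, 16, 32):
    print(f"r={r:>2}: {lora_params_per_matrix(r):,} trainable params")
# r=4 gives 6,144; r=8 gives 12,288; r=16 gives 24,576; r=32 gives 49,152
```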

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fasf6muzm7tg0ja924fxs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fasf6muzm7tg0ja924fxs.png" alt=" " width="800" height="338"&gt;&lt;/a&gt;&lt;br&gt;
The image above is from the Hugging Face PEFT documentation: &lt;a href="https://huggingface.co/docs/peft/main/en/conceptual_guides/lora" rel="noopener noreferrer"&gt;https://huggingface.co/docs/peft/main/en/conceptual_guides/lora&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Consequence of skipping this step&lt;/strong&gt;:
&lt;/h3&gt;

&lt;p&gt;Full fine-tuning on this dataset produces F1 scores around 20-30%. The model memorizes training examples and fails to generalize. LoRA's regularization is the difference between failure and success.&lt;/p&gt;


&lt;h2&gt;
  
  
  Technique 5: Class-Weighted Loss Function
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: Even after filtering, the dataset remains imbalanced (some codes have 200 examples, others only 80).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution&lt;/strong&gt;: Use weighted cross-entropy loss that penalizes errors on rare classes more heavily.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.utils.class_weight&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;compute_class_weight&lt;/span&gt;

&lt;span class="c1"&gt;# Compute balanced class weights
&lt;/span&gt;&lt;span class="n"&gt;class_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compute_class_weight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;class_weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;balanced&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_labels&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;label_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;class_weights_tensor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;class_weights&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Custom Trainer with weighted loss
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WeightedTrainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Trainer&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_weights&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;class_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;class_weights&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_loss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_items_in_batch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;labels&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logits&lt;/span&gt;

        &lt;span class="c1"&gt;# Weighted cross-entropy loss
&lt;/span&gt;        &lt;span class="n"&gt;loss_fct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CrossEntropyLoss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;class_weights&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;loss_fct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;return_outputs&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How balanced weights work&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;weight[c] = n_samples / (n_classes × n_samples_in_class[c])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Class A: 200 examples → weight = 1,200/(18×200) = 0.33&lt;/li&gt;
&lt;li&gt;Class B: 80 examples → weight = 1,200/(18×80) = 0.83&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;During training, misclassifying Class B incurs 2.5× the penalty of Class A.&lt;/p&gt;
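The weights above can be reproduced with the same formula that `compute_class_weight('balanced')` applies; a self-contained check using the example counts (pure Python, no sklearn required):

```python
n_samples = 1200
n_classes = 18
counts = {"class_A": 200, "class_B": 80}

# weight[c] = n_samples / (n_classes × n_samples_in_class[c])
weights = {c: n_samples / (n_classes * n) for c, n in counts.items()}

print(weights)  # {'class_A': 0.333..., 'class_B': 0.833...}
print(round(weights["class_B"] / weights["class_A"], 2))  # 2.5
```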

&lt;h3&gt;
  
  
  &lt;strong&gt;Consequence of skipping this step&lt;/strong&gt;:
&lt;/h3&gt;

&lt;p&gt;Without weighting, the model optimizes for overall accuracy by focusing on frequent classes. Rare classes get ignored, reducing macro F1 by 5-10%.&lt;/p&gt;




&lt;h2&gt;
  
  
  Putting It All Together: Training Configuration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;training_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TrainingArguments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./models/biobert-lora-improved&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eval_strategy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;epoch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;2e-4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Higher LR for LoRA (10× standard fine-tuning)
&lt;/span&gt;    &lt;span class="n"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_train_epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;weight_decay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;load_best_model_at_end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metric_for_best_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;macro_f1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fp16&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Mixed precision for faster training
&lt;/span&gt;    &lt;span class="n"&gt;warmup_ratio&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WeightedTrainer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;class_weights&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;class_weights_tensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;training_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;train_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eval_dataset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;val_dataset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;compute_metrics&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;compute_metrics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key hyperparameters explained&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Learning rate (2e-4)&lt;/strong&gt;: Higher than typical fine-tuning (2e-5) because LoRA adapters can handle larger updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch size (16)&lt;/strong&gt;: Balanced between GPU memory and gradient quality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Epochs (15)&lt;/strong&gt;: Sufficient for convergence without overfitting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FP16&lt;/strong&gt;: Reduces memory usage and speeds up training by ~2×&lt;/li&gt;
&lt;/ul&gt;
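One piece the trainer setup references but doesn't show is the `compute_metrics` callback that feeds `metric_for_best_model='macro_f1'`. A plausible implementation, consistent with the metrics reported below (an assumption, not necessarily the exact original):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    """Hugging Face Trainer callback: eval_pred is a (logits, labels) pair."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "macro_f1": f1_score(labels, preds, average="macro"),
        "weighted_f1": f1_score(labels, preds, average="weighted"),
    }
```

The dictionary key must match `metric_for_best_model` (the Trainer prefixes it internally as `eval_macro_f1`).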




&lt;h2&gt;
  
  
  Results: From Failure to Success
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Performance Metrics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accuracy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;94.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Macro F1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.944&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Weighted F1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.945&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Macro Precision&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.944&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Macro Recall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.950&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Comparison to naive approach&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Macro F1&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Naive (full docs, all classes, full fine-tuning)&lt;/td&gt;
&lt;td&gt;0.023&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Improved (evidence + LoRA + augmentation)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.944&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+4,000%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
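The headline improvement figure follows directly from the two macro F1 scores; a one-line check:

```python
naive_f1, improved_f1 = 0.023, 0.944

relative_improvement = (improved_f1 - naive_f1) / naive_f1
print(f"{relative_improvement:.0%}")  # 4004%, i.e. roughly +4,000%
```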

&lt;h3&gt;
  
  
  Per-Class Performance
&lt;/h3&gt;

&lt;p&gt;The model achieves balanced performance across all 18 classes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    precision    recall  f1-score   support

         E11.9          0.95      0.95      0.95        20
         I10            0.93      0.97      0.95        15
         E78.5          0.94      0.94      0.94        18
         ...

    macro avg          0.94      0.95      0.94       240
 weighted avg          0.95      0.94      0.95       240
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No class falls below 90% F1—demonstrating that our techniques successfully handle the remaining imbalance.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We've Learned: Key Takeaways
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ &lt;strong&gt;Do This&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Extract focused context&lt;/strong&gt;: Don't train on full documents when evidence spans are available&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter aggressively&lt;/strong&gt;: Better to excel at 18 codes than fail at 158&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Augment intelligently&lt;/strong&gt;: Back-translation preserves semantics while adding variation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use parameter-efficient methods&lt;/strong&gt;: LoRA prevents overfitting on small datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weight your loss&lt;/strong&gt;: Account for remaining class imbalance&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ❌ &lt;strong&gt;Avoid This&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Training on full documents&lt;/strong&gt;: Dilutes diagnostic signals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Including rare classes&lt;/strong&gt;: Classes with &amp;lt;10 examples are effectively unlearnable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixing augmented data into validation&lt;/strong&gt;: Creates overly optimistic metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full fine-tuning&lt;/strong&gt;: Causes catastrophic overfitting on small datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ignoring class imbalance&lt;/strong&gt;: Model will focus only on frequent classes&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Limitations and Future Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Current Limitations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Limited Code Coverage&lt;/strong&gt;&lt;br&gt;
We only handle 18 out of 158 codes. For production use, you'd need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More training data for rare codes&lt;/li&gt;
&lt;li&gt;Hierarchical classification (predict ICD chapter first, then specific code)&lt;/li&gt;
&lt;li&gt;Hybrid approach with commercial APIs&lt;/li&gt;
&lt;/ul&gt;
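The hierarchical option is natural because ICD-10 encodes its hierarchy in the code string itself: the first three characters are the category (e.g. E11.9 sits under E11). A minimal sketch of grouping a label space by category, with hypothetical codes (not the article's implementation):

```python
def icd10_category(code: str) -> str:
    """Three-character ICD-10 category, e.g. 'E11.9' -> 'E11'."""
    return code.split(".")[0][:3]

# A two-stage classifier would first predict the category, then pick
# among only that category's codes, shrinking each decision's label space.
codes = ["E11.9", "E11.65", "I10", "E78.5"]
by_category = {}
for code in codes:
    by_category.setdefault(icd10_category(code), []).append(code)

print(by_category)  # {'E11': ['E11.9', 'E11.65'], 'I10': ['I10'], 'E78': ['E78.5']}
```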

&lt;p&gt;&lt;strong&gt;2. Evidence Dependency&lt;/strong&gt;&lt;br&gt;
Our approach requires supporting evidence annotations. For new data without annotations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use attention weights to identify key spans&lt;/li&gt;
&lt;li&gt;Employ named entity recognition (NER) to extract diagnoses&lt;/li&gt;
&lt;li&gt;Apply the trained model to full documents (with performance degradation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Multi-Label Simplification&lt;/strong&gt;&lt;br&gt;
We converted multi-label to single-label (one example per code). True multi-label classification would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predict all relevant codes simultaneously&lt;/li&gt;
&lt;li&gt;Model code co-occurrence patterns&lt;/li&gt;
&lt;li&gt;Better reflect real clinical scenarios&lt;/li&gt;
&lt;/ul&gt;
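For context, moving to true multi-label mostly changes the target encoding (one multi-hot vector per document) and the loss (sigmoid plus binary cross-entropy instead of softmax). A minimal sketch of the encoding, using hypothetical labels:

```python
label_list = ["E11.9", "I10", "E78.5", "M54.5"]  # hypothetical label space
label_to_id = {code: i for i, code in enumerate(label_list)}

def to_multi_hot(doc_codes):
    """Encode a document's full code set as one multi-hot target vector."""
    vec = [0.0] * len(label_list)
    for code in doc_codes:
        vec[label_to_id[code]] = 1.0
    return vec

print(to_multi_hot(["E11.9", "I10"]))  # [1.0, 1.0, 0.0, 0.0]
```

With Hugging Face models, this pairs with `problem_type='multi_label_classification'`, which swaps the classification head's loss to `BCEWithLogitsLoss`.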
&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hierarchical Classification&lt;/strong&gt;: Leverage ICD-10's tree structure (Chapter → Category → Code)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full Augmentation&lt;/strong&gt;: Implement FR and ES translations for 4× data expansion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ensemble Methods&lt;/strong&gt;: Combine multiple augmented models with different random seeds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Label Extension&lt;/strong&gt;: Train on documents with all codes simultaneously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transfer Learning&lt;/strong&gt;: Pre-train on medical entity recognition before ICD-10 classification&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  Coming Up in Part 2: AWS Comprehend Medical
&lt;/h2&gt;

&lt;p&gt;In the next article, we'll explore a completely different approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero-shot inference&lt;/strong&gt; using AWS's pre-trained medical NLP service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity trait filtering&lt;/strong&gt; to handle negations, hypotheticals, and family history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-label evaluation&lt;/strong&gt; at the document level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Head-to-head comparison&lt;/strong&gt; with our BioBERT model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid strategy&lt;/strong&gt; combining both approaches for optimal results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll discover that AWS Comprehend Medical achieves 27% macro F1 on all 158 codes (vs. our 94% on 18 codes)—a fascinating trade-off between coverage and accuracy.&lt;/p&gt;
&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;All code is available in the GitHub repository:&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;&lt;a href="https://github.com/alexretana/clinical-nlp-claims-processing" rel="noopener noreferrer"&gt;clinical-nlp-claims-processing&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To run this notebook&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repository&lt;/span&gt;
git clone https://github.com/alexretana/clinical-nlp-claims-processing.git
&lt;span class="nb"&gt;cd &lt;/span&gt;clinical-nlp-claims-processing

&lt;span class="c"&gt;# Install dependencies (using uv)&lt;/span&gt;
curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh
uv &lt;span class="nb"&gt;sync&lt;/span&gt;

&lt;span class="c"&gt;# Launch Jupyter&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate  &lt;span class="c"&gt;# On Windows: .venv\Scripts\activate&lt;/span&gt;
jupyter lab

&lt;span class="c"&gt;# Open notebooks/01_BioBERT_Fine-Tuning_NLP.ipynb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Hardware requirements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU with 8GB+ VRAM (RTX 3060, V100, A100) for reasonable training times&lt;/li&gt;
&lt;li&gt;16GB+ system RAM&lt;/li&gt;
&lt;li&gt;Training takes ~2-4 hours on GPU, much longer on CPU&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building production-quality medical NLP systems requires more than throwing data at a pre-trained model. By combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Evidence-focused training&lt;/li&gt;
&lt;li&gt;Strategic label filtering&lt;/li&gt;
&lt;li&gt;Back-translation augmentation&lt;/li&gt;
&lt;li&gt;LoRA parameter-efficient fine-tuning&lt;/li&gt;
&lt;li&gt;Class-weighted loss&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We transformed a failing system (2.3% F1) into one that performs at 94.4% F1—good enough for real-world deployment with human oversight.&lt;/p&gt;

&lt;p&gt;The techniques we've covered apply far beyond medical coding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Legal document analysis&lt;/strong&gt; (case law classification)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scientific literature mining&lt;/strong&gt; (research topic categorization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer support&lt;/strong&gt; (ticket routing and classification)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content moderation&lt;/strong&gt; (policy violation detection)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anywhere you face limited training data and class imbalance, this toolkit will serve you well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next time&lt;/strong&gt;, we'll see how AWS Comprehend Medical tackles the same problem without any training data at all—and explore when each approach makes sense.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What challenges have you faced when training NLP models on limited data? Share your experiences in the comments! And if you found this helpful, follow me for Part 2 where we dive into AWS Comprehend Medical.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📚 Further Reading&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/1901.08746" rel="noopener noreferrer"&gt;BioBERT Paper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2106.09685" rel="noopener noreferrer"&gt;LoRA Paper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://physionet.org/content/medcoder/1.0.0/" rel="noopener noreferrer"&gt;MedCodER Dataset&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/peft" rel="noopener noreferrer"&gt;Hugging Face PEFT Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Tags: #machinelearning #nlp #healthcare #python #biobert #transformers #medicalcoding #datascience&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>nlp</category>
      <category>healthcare</category>
      <category>python</category>
    </item>
    <item>
      <title>Building Reproducible n8n Environments with CLI-Based Configuration Management</title>
      <dc:creator>Alex Retana</dc:creator>
      <pubDate>Wed, 22 Oct 2025 21:01:03 +0000</pubDate>
      <link>https://forem.com/alexretana/building-reproducible-n8n-environments-with-cli-based-configuration-management-2hi</link>
      <guid>https://forem.com/alexretana/building-reproducible-n8n-environments-with-cli-based-configuration-management-2hi</guid>
      <description>&lt;h1&gt;
  
  
  Building Reproducible n8n Environments with CLI-Based Configuration Management
&lt;/h1&gt;

&lt;p&gt;When you're building applications with n8n as a core component—not just using it as a standalone automation tool—you need a way to provision n8n instances with pre-configured credentials, workflows, and integrated services. This article shows you a pattern for creating fully reproducible n8n environments using the n8n CLI and environment variable substitution.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem: n8n as an Application Component
&lt;/h2&gt;

&lt;p&gt;Most n8n tutorials focus on getting started quickly. But what if you're building an application where n8n is one piece of a larger system? You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reproducible environments&lt;/strong&gt; - Same setup across dev, staging, production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-configured credentials&lt;/strong&gt; - Database connections ready to use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrated services&lt;/strong&gt; - PostgreSQL for data storage, Redis for agent memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero manual setup&lt;/strong&gt; - No clicking through UIs to configure connections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version controlled configuration&lt;/strong&gt; - Infrastructure as code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't about convenience—it's about treating n8n as a first-class component in your application stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: n8n CLI + Environment Variables
&lt;/h2&gt;

&lt;p&gt;n8n ships with CLI commands that are easy to overlook but handle exactly this kind of configuration management:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Export all credentials to JSON&lt;/span&gt;
n8n &lt;span class="nb"&gt;export&lt;/span&gt;:credentials &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;creds.json

&lt;span class="c"&gt;# Import credentials from JSON&lt;/span&gt;
n8n import:credentials &lt;span class="nt"&gt;--input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;creds.json

&lt;span class="c"&gt;# Export workflows&lt;/span&gt;
n8n &lt;span class="nb"&gt;export&lt;/span&gt;:workflow &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;workflows.json

&lt;span class="c"&gt;# Import workflows&lt;/span&gt;
n8n import:workflow &lt;span class="nt"&gt;--input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;workflows.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These commands are the foundation of reproducible n8n deployments. But there's a problem: exported credentials contain hardcoded values. If you export a PostgreSQL credential, it has a specific password baked in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The envsubst Trick
&lt;/h2&gt;

&lt;p&gt;Here's the key insight: we can use &lt;code&gt;envsubst&lt;/code&gt; to transform n8n credential exports into templates with environment variable placeholders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Export a credential manually (one time)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;n8n &lt;span class="nb"&gt;export&lt;/span&gt;:credentials &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;creds.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Replace hardcoded values with environment variables&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Transform this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgres_local"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Local PostgreSQL Database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"password"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"some_hardcoded_password"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"n8n_user"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgres"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Into this template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${N8N_POSTGRES_CREDENTIAL_ID}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Local PostgreSQL Database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${POSTGRES_HOST}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"password"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${POSTGRES_PASSWORD}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${POSTGRES_USER}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${POSTGRES_DB}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;POSTGRES_PORT&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgres"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Use envsubst to substitute at runtime&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;envsubst &amp;lt; creds.json.template &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; creds.json
n8n import:credentials &lt;span class="nt"&gt;--input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;creds.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your credentials are environment-driven. Same template works in dev, staging, production—just different environment variables.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters: Building Applications with n8n
&lt;/h2&gt;

&lt;p&gt;This pattern unlocks powerful use cases:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;AI Agents with Redis Memory&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;n8n's AI Agent node has a "Simple Memory" option that stores conversation history in n8n's database. For production AI applications, you want Redis instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster access times&lt;/li&gt;
&lt;li&gt;Better scalability&lt;/li&gt;
&lt;li&gt;Shared memory across n8n instances&lt;/li&gt;
&lt;li&gt;TTL-based conversation expiry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With our pattern, you can provision n8n with Redis credentials already configured. Your workflows can immediately use Redis memory nodes without manual setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Multi-Tenant SaaS Applications&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Spin up isolated n8n instances per customer, each with credentials for their dedicated database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CUSTOMER_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;_db_password"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;REDIS_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CUSTOMER_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;_redis_password"&lt;/span&gt;
./provision-n8n.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. &lt;strong&gt;Development Environment Parity&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Every developer gets the same n8n setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone your-app
./init-credentials.sh  &lt;span class="c"&gt;# Generate dev credentials&lt;/span&gt;
./start.sh             &lt;span class="c"&gt;# Everything works&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Demo: Dockerized n8n with PostgreSQL and Redis
&lt;/h2&gt;

&lt;p&gt;Our example repository demonstrates this pattern with a complete Docker Compose setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────┐
│  Docker Compose Environment             │
│                                         │
│  ┌──────────┐  ┌──────────┐  ┌──────┐ │
│  │ n8n      │→ │ Postgres │  │Redis │ │
│  │          │  │          │  │      │ │
│  │ - Creds  │  │ (auto    │  │(auto │ │
│  │   auto-  │  │  config) │  │ cfg) │ │
│  │  imported│  │          │  │      │ │
│  └──────────┘  └──────────┘  └──────┘ │
│       ↑                                 │
│       │ envsubst at startup             │
│       │                                 │
│  ┌─────────────────────────────┐       │
│  │ .env (generated secrets)    │       │
│  │ POSTGRES_PASSWORD=&amp;lt;random&amp;gt;  │       │
│  │ REDIS_PASSWORD=&amp;lt;random&amp;gt;     │       │
│  └─────────────────────────────┘       │
└─────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Generate Environment Variables&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./init-credentials.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script creates a &lt;code&gt;.env&lt;/code&gt; file with generated secrets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-base64&lt;/span&gt; 48 | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"=+/"&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-c1-32&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;REDIS_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-base64&lt;/span&gt; 48 | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"=+/"&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-c1-32&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;N8N_ENCRYPTION_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-hex&lt;/span&gt; 32&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
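As a sanity check on that pipeline: 48 random bytes base64-encode to 64 characters, so even after stripping `=`, `+`, and `/` there are comfortably more than the 32 characters `cut` keeps, and the result never needs shell quoting:

```shell
# The generator pattern from init-credentials.sh: 64 base64 chars,
# minus any "=+/" occurrences, truncated to 32. Always alphanumeric.
pw=$(openssl rand -base64 48 | tr -d "=+/" | cut -c1-32)
printf '%s\n' "$pw"
```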



&lt;p&gt;&lt;strong&gt;2. Start with Exported Variables&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the critical part. Docker Compose can use &lt;code&gt;${VARIABLE}&lt;/code&gt; syntax, but only if those variables are &lt;strong&gt;exported&lt;/strong&gt; to the environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# start.sh&lt;/span&gt;

&lt;span class="c"&gt;# This is the key - export all variables&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt;
&lt;span class="nb"&gt;source&lt;/span&gt; .env
&lt;span class="nb"&gt;set&lt;/span&gt; +a

docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Gotcha:&lt;/strong&gt; If you just run &lt;code&gt;docker-compose up&lt;/code&gt; without exporting variables, your &lt;code&gt;docker-compose.yml&lt;/code&gt; file won't resolve &lt;code&gt;${POSTGRES_PASSWORD}&lt;/code&gt; and services will fail to authenticate.&lt;/p&gt;
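The behavior is easy to reproduce with a child shell standing in for Docker Compose (the variable name and file below are purely illustrative):

```shell
# Plain `source` creates shell-local variables; a child process (like
# docker-compose) inherits only *exported* ones. `set -a` marks every
# variable assigned while it is active for export.
cd "$(mktemp -d)"
printf 'DEMO_PASSWORD=s3cret\n' > .env

. ./.env
sh -c 'printf "%s" "$DEMO_PASSWORD"'   # prints nothing

set -a
. ./.env
set +a
sh -c 'printf "%s" "$DEMO_PASSWORD"'   # prints s3cret
```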

&lt;p&gt;&lt;strong&gt;3. Docker Compose Provisions Services&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:15-alpine&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${POSTGRES_PASSWORD}&lt;/span&gt;  &lt;span class="c1"&gt;# Resolved from exported env&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${POSTGRES_USER}&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${POSTGRES_DB}&lt;/span&gt;

  &lt;span class="na"&gt;redis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis:7-alpine&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis-server --requirepass ${REDIS_PASSWORD}&lt;/span&gt;  &lt;span class="c1"&gt;# Also resolved&lt;/span&gt;

  &lt;span class="na"&gt;n8n&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./n8n&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DB_POSTGRESDB_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${POSTGRES_PASSWORD}&lt;/span&gt;  &lt;span class="c1"&gt;# Same password&lt;/span&gt;
      &lt;span class="c1"&gt;# ... other config&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
      &lt;span class="na"&gt;redis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. n8n Container Auto-Imports Credentials&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On first startup, our custom entrypoint renders the credential template with &lt;code&gt;envsubst&lt;/code&gt; and imports the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# n8n/entrypoint.sh&lt;/span&gt;
envsubst &amp;lt; /data/workflow_creds/decrypt_creds.json.template &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /data/workflow_creds/decrypt_creds.json
n8n import:credentials &lt;span class="nt"&gt;--input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/data/workflow_creds/decrypt_creds.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now n8n has credentials for PostgreSQL and Redis, using the same passwords Docker Compose used to provision those services.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Credential Template
&lt;/h2&gt;

&lt;p&gt;Here's what the template looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${N8N_POSTGRES_CREDENTIAL_ID}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Local PostgreSQL Database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${POSTGRES_HOST}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${POSTGRES_DB}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${POSTGRES_USER}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"password"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${POSTGRES_PASSWORD}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;POSTGRES_PORT&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ssl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"disable"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"postgres"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${N8N_REDIS_CREDENTIAL_ID}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Local Redis Cache"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"password"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${REDIS_PASSWORD}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${REDIS_HOST}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;REDIS_PORT&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"redis"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This template was originally created by:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Manually creating credentials in n8n&lt;/li&gt;
&lt;li&gt;Exporting them with &lt;code&gt;n8n export:credentials&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Replacing values with &lt;code&gt;${VARIABLE}&lt;/code&gt; placeholders&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now it's reusable across all environments.&lt;/p&gt;
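Step 3 in that list can be scripted rather than hand-edited. A sed pass like this one (the dev values shown are illustrative stand-ins for whatever your export contains) produces the template:

```shell
# Replace known exported values with ${...} placeholders. Single-quoted
# sed expressions keep the placeholders literal for envsubst later.
cd "$(mktemp -d)"
printf '{"password": "some_hardcoded_password", "user": "n8n_user"}\n' > creds.json

sed -e 's/"some_hardcoded_password"/"${POSTGRES_PASSWORD}"/' \
    -e 's/"n8n_user"/"${POSTGRES_USER}"/' \
    creds.json > creds.json.template
```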

&lt;h2&gt;
  
  
  Practical Use Case: Redis for AI Agent Memory
&lt;/h2&gt;

&lt;p&gt;This setup shines when building AI agents with n8n. Instead of this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Agent Node → Simple Memory (in n8n database)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Agent Node → Redis Chat Memory Node → Redis (provisioned)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Persistence across restarts&lt;/strong&gt; - Conversation history survives n8n restarts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared state&lt;/strong&gt; - Multiple n8n instances share conversation history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt; - Redis is optimized for this access pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; - Redis can handle millions of conversation threads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the Redis credential is already configured—no manual setup required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extending the Pattern
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Add MongoDB
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Add to &lt;code&gt;.env.example&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MONGO_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;__GENERATED__
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Update credential template:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mongo_local"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Local MongoDB"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"host"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${MONGO_HOST}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"password"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${MONGO_PASSWORD}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${MONGO_USER}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${MONGO_DB}"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mongoDb"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Add to &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;mongo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mongo:7&lt;/span&gt;
  &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;MONGO_INITDB_ROOT_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${MONGO_PASSWORD}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Import Workflows Too
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In entrypoint.sh, after importing credentials:&lt;/span&gt;
n8n import:workflow &lt;span class="nt"&gt;--input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/data/workflows/default-workflows.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets you ship n8n with pre-built workflows for your application.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Repository
&lt;/h2&gt;

&lt;p&gt;The complete implementation is available at:&lt;br&gt;
&lt;a href="https://github.com/alexretana/n8n-docker-demo" rel="noopener noreferrer"&gt;github.com/alexretana/n8n-docker-demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use it as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A starting point&lt;/strong&gt; for your own n8n-based applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A reference&lt;/strong&gt; for the envsubst + CLI pattern&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A foundation&lt;/strong&gt; to build on&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/alexretana/n8n-docker-demo
&lt;span class="nb"&gt;cd &lt;/span&gt;n8n-docker-demo

./init-credentials.sh  &lt;span class="c"&gt;# Generate secrets&lt;/span&gt;
./start.sh            &lt;span class="c"&gt;# Start everything&lt;/span&gt;

&lt;span class="c"&gt;# Access n8n at http://localhost:5678&lt;/span&gt;
&lt;span class="c"&gt;# PostgreSQL and Redis credentials already configured&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Building Applications Around n8n
&lt;/h2&gt;

&lt;p&gt;This pattern enables you to build applications where n8n is a component, not the entire application. Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SaaS platforms&lt;/strong&gt; - n8n handles workflow orchestration, your app handles user management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI applications&lt;/strong&gt; - n8n orchestrates AI agents, Redis stores conversation history&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data pipelines&lt;/strong&gt; - n8n coordinates ETL, PostgreSQL stores results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal tools&lt;/strong&gt; - n8n automates business processes, your app provides the UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is treating n8n configuration as code. With the CLI + envsubst pattern, your n8n setup becomes reproducible, version-controlled, and automatable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Invitation
&lt;/h2&gt;

&lt;p&gt;The repository is open source and free to use. But more than that—I'd love to see what you build.&lt;/p&gt;

&lt;p&gt;If you're creating an application around n8n:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fork the repo&lt;/strong&gt; and adapt it to your needs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Share what you're building&lt;/strong&gt; - open an issue or discussion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contribute improvements&lt;/strong&gt; - PRs welcome&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ask questions&lt;/strong&gt; - I'm happy to help&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern shown here is just a starting point. The real value comes from what you build on top of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;n8n CLI is powerful&lt;/strong&gt; - Use &lt;code&gt;export&lt;/code&gt; and &lt;code&gt;import&lt;/code&gt; commands for configuration management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;envsubst bridges the gap&lt;/strong&gt; - Transform exported configs into reusable templates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export variables in start scripts&lt;/strong&gt; - Docker Compose needs them exported, not just in .env&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treat n8n as a component&lt;/strong&gt; - Build applications around it, not just use it standalone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configuration as code&lt;/strong&gt; - Make your n8n setup reproducible and version-controlled&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Happy building!&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.n8n.io/hosting/cli-commands/" rel="noopener noreferrer"&gt;n8n CLI Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/compose/environment-variables/" rel="noopener noreferrer"&gt;Docker Compose Variable Substitution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.gnu.org/software/gettext/manual/html_node/envsubst-Invocation.html" rel="noopener noreferrer"&gt;envsubst Manual&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; #n8n #automation #docker #devops #postgresql #redis #ai #reproducibility&lt;/p&gt;

</description>
      <category>n8n</category>
      <category>docker</category>
      <category>postgres</category>
      <category>redis</category>
    </item>
    <item>
      <title>Build an AI Research Archivist with n8n: Stop Researching the Same Topics Twice</title>
      <dc:creator>Alex Retana</dc:creator>
      <pubDate>Fri, 10 Oct 2025 19:58:02 +0000</pubDate>
      <link>https://forem.com/alexretana/build-an-ai-research-archivist-with-n8n-stop-researching-the-same-topics-twice-4k06</link>
      <guid>https://forem.com/alexretana/build-an-ai-research-archivist-with-n8n-stop-researching-the-same-topics-twice-4k06</guid>
      <description>&lt;h1&gt;
  
  
  Build an AI Research Archivist with n8n: Stop Researching the Same Topics Twice
&lt;/h1&gt;

&lt;h2&gt;
  
  
  The $15K Problem You Didn't Know You Had
&lt;/h2&gt;

&lt;p&gt;Picture this: It's Tuesday morning, and you're diving into researching authentication patterns for your new microservices architecture. You spend two hours reading articles, comparing approaches, and documenting your findings in a scattered collection of browser tabs and sticky notes.&lt;/p&gt;

&lt;p&gt;Fast forward three months. A colleague asks about authentication strategies. You vaguely remember researching this, but where did you save those findings? What were the key takeaways? You end up starting from scratch.&lt;/p&gt;

&lt;p&gt;Studies show that knowledge workers waste nearly &lt;strong&gt;6 hours per week&lt;/strong&gt; duplicating research efforts. For a developer making $80K annually, that's roughly &lt;strong&gt;$15,000 in wasted productivity every year&lt;/strong&gt;. Multiply that across a team, and the numbers become staggering.&lt;/p&gt;

&lt;p&gt;The solution isn't another note-taking app—it's an intelligent system that actively prevents duplicate research by checking what you've already investigated before conducting new searches.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We're Building
&lt;/h2&gt;

&lt;p&gt;In this tutorial, you'll build a &lt;strong&gt;Research Archivist Agent&lt;/strong&gt; using n8n that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checks your existing research archive before conducting new searches&lt;/li&gt;
&lt;li&gt;Uses Perplexity AI for high-quality research synthesis&lt;/li&gt;
&lt;li&gt;Automatically stores findings in Google Sheets with proper citations&lt;/li&gt;
&lt;li&gt;Maintains searchable keywords for easy retrieval&lt;/li&gt;
&lt;li&gt;Guides users through a structured research workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tech Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;n8n&lt;/strong&gt; (workflow automation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Claude Sonnet 4.5&lt;/strong&gt; (agent orchestration)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perplexity AI&lt;/strong&gt; (research tool)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Sheets&lt;/strong&gt; (knowledge archive)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;You'll need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;n8n instance (&lt;a href="https://n8n.io/" rel="noopener noreferrer"&gt;cloud&lt;/a&gt; or &lt;a href="https://docs.n8n.io/hosting/" rel="noopener noreferrer"&gt;self-hosted&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://console.anthropic.com/" rel="noopener noreferrer"&gt;Anthropic API key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.perplexity.ai/" rel="noopener noreferrer"&gt;Perplexity API key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost estimate:&lt;/strong&gt; ~$5-10/month for API usage with moderate research volume.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Set Up Your Knowledge Archive
&lt;/h2&gt;

&lt;p&gt;Create a new Google Sheet with these columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document Name | Document Content | Reference Link | Research Date | Keywords
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this structure?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Document Name&lt;/strong&gt;: Human-readable identifier for quick scanning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Content&lt;/strong&gt;: Summary of findings (not full articles)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reference Link&lt;/strong&gt;: Source URL for verification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research Date&lt;/strong&gt;: Helps identify outdated research&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keywords&lt;/strong&gt;: Enables semantic search across topics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Save the Sheet URL—you'll need it for the n8n workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Import the n8n Template
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Download the template from the &lt;a href="https://github.com/alexretana/n8n-simple-archivist-template" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;In n8n, go to &lt;strong&gt;Workflows&lt;/strong&gt; → &lt;strong&gt;Import from File&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;code&gt;Archivist Agent Template.json&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You'll see seven nodes connected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Chat Trigger → Archivist Agent → Claude Model
                        ↓
              [Simple Memory]
                        ↓
        ┌───────────────┴───────────────┐
        ↓                               ↓
  Perplexity Tool            Google Sheets Tools (x2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Configure Credentials
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Anthropic API
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;Anthropic Chat Model&lt;/strong&gt; node&lt;/li&gt;
&lt;li&gt;Create credential → Enter your API key&lt;/li&gt;
&lt;li&gt;Ensure model is &lt;code&gt;claude-sonnet-4-5-20250929&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Perplexity API
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;Message a model in Perplexity&lt;/strong&gt; node&lt;/li&gt;
&lt;li&gt;Create credential → Enter your API key&lt;/li&gt;
&lt;li&gt;Keep model as &lt;code&gt;sonar-pro&lt;/code&gt; for best research quality&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Google Sheets
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Click either Google Sheets node&lt;/li&gt;
&lt;li&gt;Create credential → Select &lt;strong&gt;OAuth2&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Follow Google's authorization flow&lt;/li&gt;
&lt;li&gt;Paste your Sheet URL in both nodes:

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Get row(s) in sheet&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Append or update row&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 4: Understanding the Agent System Prompt
&lt;/h2&gt;

&lt;p&gt;The core intelligence comes from the system prompt in the &lt;strong&gt;Archivist Agent&lt;/strong&gt; node. Here's what makes it work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Workflow Process&lt;/span&gt;

&lt;span class="gu"&gt;### Phase 1: Initial Check&lt;/span&gt;
When a user requests research:
&lt;span class="p"&gt;1.&lt;/span&gt; Search existing archive using "Get row(s) in sheet"
&lt;span class="p"&gt;2.&lt;/span&gt; If found, present existing research
&lt;span class="p"&gt;3.&lt;/span&gt; Confirm if user wants updated information

&lt;span class="gu"&gt;### Phase 2: New Research&lt;/span&gt;
If no existing research found:
&lt;span class="p"&gt;1.&lt;/span&gt; Conduct research using Perplexity AI
&lt;span class="p"&gt;2.&lt;/span&gt; Summarize findings
&lt;span class="p"&gt;3.&lt;/span&gt; Store in archive
&lt;span class="p"&gt;4.&lt;/span&gt; Provide summary to user

&lt;span class="gu"&gt;### Phase 3: Archive Management&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Search and retrieve specific topics
&lt;span class="p"&gt;-&lt;/span&gt; Update entries when needed
&lt;span class="p"&gt;-&lt;/span&gt; Organize content
&lt;span class="p"&gt;-&lt;/span&gt; Remove duplicates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This three-phase approach ensures you &lt;strong&gt;never research the same topic twice&lt;/strong&gt; unless you explicitly need updated information.&lt;/p&gt;
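&lt;p&gt;To make the check-before-research behavior concrete, here is a plain-JS sketch of the same decision (the real logic lives in the system prompt; the &lt;code&gt;archive&lt;/code&gt; and &lt;code&gt;research&lt;/code&gt; parameters are illustrative stand-ins for the Google Sheets and Perplexity tools):&lt;/p&gt;

```javascript
// Illustrative sketch only: in the actual workflow, the agent performs
// this logic by calling the Google Sheets and Perplexity tool nodes.
async function handleRequest(topic, archive, research) {
  // Phase 1: check the archive before doing any new research
  const existing = archive.find((row) => row["Document Name"] === topic);
  if (existing) return existing;

  // Phase 2: no hit, so research the topic, store it, and return the entry
  const findings = await research(topic);
  const entry = { "Document Name": topic, "Document Content": findings };
  archive.push(entry);
  return entry;
}
```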

&lt;h2&gt;
  
  
  Step 5: Test Your Agent
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;Save&lt;/strong&gt; and &lt;strong&gt;Activate&lt;/strong&gt; the workflow&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;Chat&lt;/strong&gt; button (webhook icon on the trigger node)&lt;/li&gt;
&lt;li&gt;Try these test queries:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;First research request:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Research the benefits of edge computing for web applications
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check the archive (empty for first run)&lt;/li&gt;
&lt;li&gt;Conduct Perplexity research&lt;/li&gt;
&lt;li&gt;Store findings in your Sheet&lt;/li&gt;
&lt;li&gt;Return a summary&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Duplicate check:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What do we have on edge computing?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find your previous research&lt;/li&gt;
&lt;li&gt;Present existing findings&lt;/li&gt;
&lt;li&gt;Ask if you want updated research&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 6: Advanced Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Adjust Memory Window
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Simple Memory&lt;/strong&gt; node stores conversation context. Default is 15 messages. Increase for longer research sessions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;contextWindowLength&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;  &lt;span class="c1"&gt;// stores last 30 messages&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Customize Research Depth
&lt;/h3&gt;

&lt;p&gt;In the Perplexity node, adjust for different research needs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Quick facts&lt;/span&gt;
&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sonar&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;// Deep research (recommended)&lt;/span&gt;
&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sonar-pro&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Add Search Filters
&lt;/h3&gt;

&lt;p&gt;Modify the Google Sheets search node to filter by date:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Only search research from last 6 months&lt;/span&gt;
&lt;span class="nx"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Research Date &amp;gt;= DATE(2024, 4, 1)&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Usage Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Daily Standup Research
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"What research do we have on our current sprint topics?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Technical Decision Making
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Compare our previous research on GraphQL vs REST APIs"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Onboarding New Developers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Find all research related to our authentication architecture"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Knowledge Transfer
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"What did we learn about database sharding last quarter?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Troubleshooting Common Issues
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; The agent researches instead of checking the archive first&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Verify your Google Sheets credentials, and confirm that the Sheet URL includes the sheet tab name&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Perplexity returns generic results&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Craft more specific queries. Bad: "web security." Good: "OWASP top 10 mitigation strategies for Node.js REST APIs."&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Duplicate entries appearing&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Use consistent naming conventions. Create a naming guide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ "JWT Authentication Best Practices"&lt;/li&gt;
&lt;li&gt;❌ "jwt auth", "JWT stuff", "authentication research"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Scaling Your Archive
&lt;/h2&gt;

&lt;p&gt;As your knowledge base grows, consider these enhancements:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Add Tagging System&lt;/strong&gt;&lt;br&gt;
Add a "Tags" column with comma-separated values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tags: authentication, security, nodejs, jwt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
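&lt;p&gt;A minimal sketch of how an agent (or an n8n Code node) could match a search term against that comma-separated column; the row shape here is assumed:&lt;/p&gt;

```javascript
// Split the Tags cell and check for a case-insensitive match.
function matchesTag(row, query) {
  const tags = (row.Tags || "")
    .split(",")
    .map((tag) => tag.trim().toLowerCase());
  return tags.includes(query.trim().toLowerCase());
}
```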



&lt;p&gt;&lt;strong&gt;2. Create Research Templates&lt;/strong&gt;&lt;br&gt;
Define standard research formats for common topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Technical Comparisons: Pros, Cons, Performance, Cost&lt;/li&gt;
&lt;li&gt;Tool Evaluations: Features, Integration, Community, Pricing&lt;/li&gt;
&lt;li&gt;Best Practices: Pattern, When to Use, Common Pitfalls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Implement Version Control&lt;/strong&gt;&lt;br&gt;
Track research updates by adding columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Version | Last Updated By | Change Summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Extension Challenge: Build a Weekly Digest
&lt;/h2&gt;

&lt;p&gt;Ready to level up? Here's your challenge: &lt;strong&gt;Create an automated weekly research digest&lt;/strong&gt; that emails you a summary of all research conducted in the past week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hints:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add a &lt;strong&gt;Schedule Trigger&lt;/strong&gt; node that runs weekly&lt;/li&gt;
&lt;li&gt;Query Google Sheets for entries from the last 7 days&lt;/li&gt;
&lt;li&gt;Use Claude to generate a formatted summary&lt;/li&gt;
&lt;li&gt;Send via &lt;strong&gt;Gmail&lt;/strong&gt; or &lt;strong&gt;SendGrid&lt;/strong&gt; node&lt;/li&gt;
&lt;/ol&gt;
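&lt;p&gt;Hint 2 could look like this in a Code node (a sketch under the assumption that rows arrive with the &lt;code&gt;Research Date&lt;/code&gt; column shown earlier):&lt;/p&gt;

```javascript
// Keep only archive rows whose Research Date falls in the last 7 days.
function rowsFromLastWeek(rows, now = new Date()) {
  const cutoff = new Date(now.getTime() - 7 * 24 * 60 * 60 * 1000);
  return rows.filter((row) => new Date(row["Research Date"]) >= cutoff);
}
```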

&lt;p&gt;&lt;strong&gt;Bonus points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Include most-searched keywords&lt;/li&gt;
&lt;li&gt;Highlight research gaps (topics with old data)&lt;/li&gt;
&lt;li&gt;Add "Related research suggestions" using Claude&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Share your solution!&lt;/strong&gt; Post your workflow to the &lt;a href="https://community.n8n.io/" rel="noopener noreferrer"&gt;n8n community&lt;/a&gt; or tweet it with #n8n and tag me—I'd love to see what you build.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Personal Knowledge Management isn't just productivity theater—it's a competitive advantage. When you can instantly recall research insights from six months ago, you make faster decisions. When your team shares a searchable knowledge archive, you eliminate duplicate work and accelerate onboarding.&lt;/p&gt;

&lt;p&gt;The Research Archivist Agent isn't just a tool—it's a mindset shift from "search and forget" to "research once, reference forever."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://github.com/alexretana/n8n-simple-archivist-template" rel="noopener noreferrer"&gt;Clone the repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Set up your workflow today&lt;/li&gt;
&lt;li&gt;Research your first topic&lt;/li&gt;
&lt;li&gt;Watch your knowledge compound&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three months from now, you'll have a valuable archive of research that would have otherwise been lost to browser history and forgotten bookmarks.&lt;/p&gt;

&lt;p&gt;What will you research first?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this helpful? Drop a ❤️ and share it with your team. Have questions or improvements? Drop them in the comments below—I read and respond to every one.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>agents</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>Optimizing Multi-Agent Workflows in n8n: A Context-Aware Approach to Agent Handoffs</title>
      <dc:creator>Alex Retana</dc:creator>
      <pubDate>Wed, 01 Oct 2025 19:22:53 +0000</pubDate>
      <link>https://forem.com/alexretana/optimizing-multi-agent-workflows-in-n8n-a-context-aware-approach-to-agent-handoffs-1hc4</link>
      <guid>https://forem.com/alexretana/optimizing-multi-agent-workflows-in-n8n-a-context-aware-approach-to-agent-handoffs-1hc4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqshko5zarwhi2wh839s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqshko5zarwhi2wh839s.png" alt="Sequential agent handoff workflow" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag4niw8i2067zht19r9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag4niw8i2067zht19r9t.png" alt="Master agent with switch routing" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Optimizing Multi-Agent Workflows in n8n: A Context-Aware Approach to Agent Handoffs
&lt;/h1&gt;

&lt;p&gt;When working with multi-agent systems like the BMAD (Big Model, Agent Design) pattern, context window management becomes critical for model performance and cost efficiency. While you could dump an entire agent bundle into a Claude Project and let it figure things out, you'll quickly burn through tokens on instruction sets that may never be relevant to the current task.&lt;/p&gt;

&lt;p&gt;This tutorial demonstrates how to build intelligent agent routing in n8n—the popular node-based automation tool—that maintains tight control over context and enables direct user-to-subagent communication without wasteful token processing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Traditional approaches to multi-agent orchestration often suffer from two key problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context bloat&lt;/strong&gt;: Loading all agent instructions upfront wastes tokens on irrelevant context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indirect communication&lt;/strong&gt;: Routing everything through a master agent doubles processing costs and adds latency&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While Claude Projects offers solutions like separating master instructions from agent definitions and using RAG for knowledge retrieval, building a custom workflow in n8n gives you explicit control over data flow and context management. This pattern extends beyond chatbots—use it anywhere you need task-specific agents with optimized context windows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Build: Two Demonstration Workflows
&lt;/h2&gt;

&lt;p&gt;I've created two n8n workflows that progressively demonstrate agent handoff patterns. Both use intentionally simple agent instructions to focus on the routing mechanics, but these patterns scale to complex production systems.&lt;/p&gt;

&lt;p&gt;You can copy the templates to import into your own n8n instance from my GitHub repo: &lt;a href="https://github.com/alexretana/n8n-multi-agent-handoff-templates" rel="noopener noreferrer"&gt;N8n Multi Agent Handoff Templates&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Demo 1: Sequential Agent Pass-Through
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqshko5zarwhi2wh839s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqshko5zarwhi2wh839s.png" alt="Sequential agent handoff workflow" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This workflow demonstrates the fundamental pattern: &lt;strong&gt;how to pass control from one agent to another&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flow breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Chat Trigger&lt;/strong&gt; receives the user message&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Agent 1&lt;/strong&gt; processes the input with access to:

&lt;ul&gt;
&lt;li&gt;OpenAI GPT-4.1-mini (shared language model)&lt;/li&gt;
&lt;li&gt;Simple Memory (conversation history)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Agent 1 outputs to &lt;strong&gt;two&lt;/strong&gt; destinations simultaneously:

&lt;ul&gt;
&lt;li&gt;"Respond to Chat" node (user feedback)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Agent 2&lt;/strong&gt; (next agent in chain)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kqs4e9p8uwaa8lk4sam.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kqs4e9p8uwaa8lk4sam.png" alt=" " width="800" height="909"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;AI Agent 2&lt;/strong&gt; receives Agent 1's output via the prompt template:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   {{ $json.output || $json.chatInput }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This expression handles both the initial user input and subsequent agent outputs.&lt;/p&gt;
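&lt;p&gt;Outside of n8n, the same fallback can be sketched in plain JS (the &lt;code&gt;output&lt;/code&gt; and &lt;code&gt;chatInput&lt;/code&gt; fields mirror what the agent and chat trigger nodes emit):&lt;/p&gt;

```javascript
// On the first turn only the chat trigger's chatInput exists; on a
// handoff, the upstream agent has set output, which takes precedence.
function resolvePrompt(json) {
  return json.output || json.chatInput;
}
```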

&lt;ol start="5"&gt;
&lt;li&gt;Agent 2 responds back to the user through "Respond to Chat1"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn4x6j2o8wpeizy3jbxc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmn4x6j2o8wpeizy3jbxc.png" alt=" " width="800" height="924"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="6"&gt;
&lt;li&gt;The loop continues: Agent 2's response feeds back into itself. As you can see below, when I ask which agent it is, it answers that it is Agent 2, without routing through Agent 1 (Agent 1 is never messaged again).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2ixuw0cv17zmtox1orr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2ixuw0cv17zmtox1orr.png" alt=" " width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key architectural decisions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Shared memory&lt;/strong&gt;: Both agents use the same Simple Memory node to maintain conversation continuity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared model&lt;/strong&gt;: Single OpenAI connection reduces configuration overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Branching output&lt;/strong&gt;: Agent 1 uses n8n's multiple output connections to respond AND handoff simultaneously&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Code reference&lt;/strong&gt; (from &lt;code&gt;Demonstrate Agent Pass Off.json&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"promptType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"define"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"={{ $json.output ||  $json.chatInput}}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"systemMessage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are Agent 2. If you're asked to respond to the chat with what agent you are, just say &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Yes, I'm Agent 2&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@n8n/n8n-nodes-langchain.agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AI Agent 2"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Demo 2: Dynamic Routing with Master Agent
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag4niw8i2067zht19r9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag4niw8i2067zht19r9t.png" alt="Master agent with switch routing" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This workflow adds &lt;strong&gt;intelligent routing&lt;/strong&gt;: a master agent decides which specialized agent should handle each request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flow breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Chat Trigger&lt;/strong&gt; → &lt;strong&gt;AI Master Agent&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Master Agent analyzes the request and outputs &lt;strong&gt;structured JSON&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"direct_response_to_user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I'm routing you to Agent 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"agent_to_route_to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Agent 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"forwarded_message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User asked about X. Routing because Y."&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Map Master Agent's Response&lt;/strong&gt; extracts these fields using n8n expressions:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;   &lt;span class="nx"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parseJson&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;agent_to_route_to&lt;/span&gt;
   &lt;span class="nx"&gt;$json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parseJson&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;forwarded_message&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Data splits into two paths:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Master Agent Responds To Chat&lt;/strong&gt;: Sends the routing explanation to the user immediately, without waiting for the subagent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch Node&lt;/strong&gt;: Routes to Agent 1, 2, or 3 based on &lt;code&gt;agent_to_route_to&lt;/code&gt; value&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt5uwezdove4726my928.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzt5uwezdove4726my928.png" alt=" " width="800" height="914"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Selected agent receives a &lt;strong&gt;contextualized prompt&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   User's Original Message:
   ${$('When chat message received').item.json.chatInput}

   Master Agent's message to you:
   ${$json.forwarded_message}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Agent responds through its dedicated chat node&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ndkfuw93lrymgcv9px4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ndkfuw93lrymgcv9px4.png" alt=" " width="800" height="929"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Critical differences from Demo 1:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolated memory&lt;/strong&gt;: Each agent (including Master) has separate memory nodes (Simple Memory1/2/3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context preservation&lt;/strong&gt;: The forwarded message includes both the original user input AND the master's routing rationale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel execution&lt;/strong&gt;: User gets immediate feedback while the selected agent processes in parallel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Master Agent system prompt&lt;/strong&gt; (edited for clarity):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are the Master Agent. You route user requests to the correct agent.

IMPORTANT: Output only valid JSON in this format:

{
  "direct_response_to_user": "I'm routing you to Agent 1",
  "agent_to_route_to": "Agent 1",
  "forwarded_message": "**Summary of user request and routing rationale**"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why structured output matters&lt;/strong&gt;: The JSON format enables programmatic routing via the Switch node. In production, you'd add validation to handle malformed responses.&lt;/p&gt;
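As a minimal sketch of that validation (Python used for illustration; the field names come from the system prompt above, while the fallback values and default route are my own assumptions, not part of the workflow):

```python
import json

REQUIRED_FIELDS = {"direct_response_to_user", "agent_to_route_to", "forwarded_message"}
VALID_AGENTS = {"Agent 1", "Agent 2", "Agent 3"}

def validate_routing(raw: str) -> dict:
    """Parse the Master Agent's output; fall back to a safe default
    if the JSON is malformed, incomplete, or names an unknown agent."""
    fallback = {
        "direct_response_to_user": "Sorry, I couldn't route that request.",
        "agent_to_route_to": "Agent 1",  # hypothetical default route
        "forwarded_message": "Routing failed; ask the user to rephrase.",
    }
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not REQUIRED_FIELDS.issubset(parsed) or parsed["agent_to_route_to"] not in VALID_AGENTS:
        return fallback
    return parsed
```

In n8n itself this logic would live in a Code node placed between the Master Agent and the Switch node.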

&lt;h2&gt;
  
  
  Implementation Details You Need to Know
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Context Window Optimization
&lt;/h3&gt;

&lt;p&gt;Each agent only loads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Its own system prompt (~100-500 tokens)&lt;/li&gt;
&lt;li&gt;Relevant conversation history (window-buffered)&lt;/li&gt;
&lt;li&gt;The forwarded context from the master agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compare this to loading all 3 agent instruction sets upfront—you'd waste thousands of tokens per request.&lt;/p&gt;
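To make the saving concrete, here is a back-of-the-envelope comparison (all token counts are illustrative assumptions, not measurements):

```python
# Illustrative per-request token budgets (assumed, not measured)
system_prompt_tokens = 400      # one agent's instruction set
history_tokens = 600            # window-buffered conversation history
forwarded_context_tokens = 100  # master agent's forwarded message
num_agents = 3

# Routing approach: only the selected agent's instructions are loaded
routed_cost = system_prompt_tokens + history_tokens + forwarded_context_tokens

# Naive approach: every agent's instructions loaded on every request
naive_cost = num_agents * system_prompt_tokens + history_tokens
```

Under these assumptions the routed request costs 1100 tokens versus 1800 for the naive approach, and the gap widens as agents (and their instruction sets) grow.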

&lt;h3&gt;
  
  
  The Switch Node Configuration
&lt;/h3&gt;

&lt;p&gt;The Switch node uses n8n's rule-based routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"conditions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"conditions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"leftValue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"={{ $json.agent_to_route_to }}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"rightValue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Agent 2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"operator"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"operation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"equals"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three rules match "Agent 1", "Agent 2", or "Agent 3" exactly. Unmatched requests fall through (you'd want error handling in production).&lt;/p&gt;
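The exact-match-plus-fallback behavior can be sketched like this (Python for illustration; in n8n this is the Switch node's rules plus its fallback output, and the branch names are hypothetical):

```python
# Map each exact agent name to its workflow branch (names are illustrative)
ROUTES = {
    "Agent 1": "agent1_branch",
    "Agent 2": "agent2_branch",
    "Agent 3": "agent3_branch",
}

def route(agent_to_route_to: str) -> str:
    """Exact-match routing mirroring the Switch node's three rules,
    with an explicit fallback branch for unmatched values."""
    return ROUTES.get(agent_to_route_to, "fallback_branch")
```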

&lt;h3&gt;
  
  
  Memory Architecture Trade-offs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Demo 1&lt;/strong&gt;: Shared memory allows agents to reference each other's outputs naturally, but blurs agent boundaries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo 2&lt;/strong&gt;: Isolated memory per agent creates cleaner separation but requires explicit context passing via &lt;code&gt;forwarded_message&lt;/code&gt;. This scales better for specialized agents with distinct conversation contexts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running These Workflows
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Import the JSON files into n8n (both included with this post)&lt;/li&gt;
&lt;li&gt;Configure your OpenAI API credentials in the "OpenAI: gpt-4.1-mini" node&lt;/li&gt;
&lt;li&gt;Activate the workflow&lt;/li&gt;
&lt;li&gt;Open the chat interface via the webhook URL&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Test prompts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Who are you?" (tests agent self-identification)&lt;/li&gt;
&lt;li&gt;"Pass me to Agent 2" (tests routing logic)&lt;/li&gt;
&lt;li&gt;"What did Agent 1 say?" (tests memory persistence)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;The natural evolution is &lt;strong&gt;bidirectional routing&lt;/strong&gt;: subagents should be able to return control to the master when they complete their task. This creates a true orchestration layer where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Master Agent delegates to specialists&lt;/li&gt;
&lt;li&gt;Specialists execute and report back&lt;/li&gt;
&lt;li&gt;Master Agent synthesizes results or delegates further&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Challenge for you&lt;/strong&gt;: Can you modify Demo 2 to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Let each subagent indicate completion in its output (maybe via JSON like the master)?&lt;/li&gt;
&lt;li&gt;Route completed tasks back to the Master Agent?&lt;/li&gt;
&lt;li&gt;Have the Master Agent decide whether to route again or provide a final response?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This pattern mirrors how tools like LangGraph handle cyclic agent flows, but with explicit control over every transition.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: Multi-agent systems in n8n benefit from explicit routing and context management. Use sequential pass-through (Demo 1) for simple pipelines; use master-agent routing with structured output (Demo 2) for dynamic task distribution. Both patterns dramatically reduce token waste compared to loading all agent instructions upfront. Next step: implement agent-to-master return logic for full orchestration loops.&lt;/p&gt;

&lt;p&gt;The workflows demonstrated here show that intelligent agent handoffs aren't magic—they're just careful data flow management. n8n's visual interface makes the logic transparent, which is invaluable when debugging complex agent interactions or optimizing for cost.&lt;/p&gt;

&lt;p&gt;Try implementing the return-to-master pattern yourself, and share your solution in the comments. What other agent routing patterns would be useful for your projects?&lt;/p&gt;

</description>
      <category>performance</category>
      <category>tutorial</category>
      <category>llm</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Streamlining MCP Management: Bundle Multiple Servers with FastMCP Proxies</title>
      <dc:creator>Alex Retana</dc:creator>
      <pubDate>Tue, 23 Sep 2025 18:14:36 +0000</pubDate>
      <link>https://forem.com/alexretana/streamlining-mcp-management-bundle-multiple-servers-with-fastmcp-proxies-n3i</link>
      <guid>https://forem.com/alexretana/streamlining-mcp-management-bundle-multiple-servers-with-fastmcp-proxies-n3i</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Model Context Protocol (MCP) servers have revolutionized how AI applications access external tools and data sources. From web browsing with Playwright to documentation search with Context7, MCPs provide a standardized way to extend AI capabilities beyond their training data.&lt;/p&gt;

&lt;p&gt;However, as the MCP ecosystem grows, managing multiple servers becomes increasingly complex. Each MCP server typically requires separate installation, configuration, and maintenance across different clients like Claude Desktop, Cursor, or Claude Code. This fragmentation creates several pain points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Configuration sprawl&lt;/strong&gt;: Each client needs individual server configurations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency conflicts&lt;/strong&gt;: Different servers may require conflicting Python versions or packages
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource overhead&lt;/strong&gt;: Multiple server processes consume unnecessary system resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance burden&lt;/strong&gt;: Updates and troubleshooting multiply across installations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FastMCP's proxy capabilities solve these challenges by allowing you to bundle multiple MCP servers behind a single endpoint. Combined with FastMCP's CLI tools, you can easily deploy this unified proxy to any MCP client with a single command.&lt;/p&gt;

&lt;p&gt;I created a small github repo with example code if you'd like to follow along with it. &lt;a href="https://github.com/alexretana/FastMCP-Simple-Proxy-Bundling" rel="noopener noreferrer"&gt;alexretana/FastMCP-Simple-Proxy-Bundling&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important Warning&lt;/strong&gt;: While bundling MCPs is convenient, be mindful of tool overload. Providing too many tools to an MCP client can overwhelm the AI model and degrade performance. Start with essential tools and add more selectively based on your specific workflow needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Install FastMCP
&lt;/h3&gt;

&lt;p&gt;The FastMCP docs recommend using &lt;a href="https://docs.astral.sh/uv/getting-started/installation/" rel="noopener noreferrer"&gt;uv&lt;/a&gt; to install and manage FastMCP. You can install it directly with &lt;code&gt;uv pip&lt;/code&gt; or &lt;code&gt;pip&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Using uv (recommended)&lt;/span&gt;
uv pip &lt;span class="nb"&gt;install &lt;/span&gt;fastmcp

&lt;span class="c"&gt;# Or using pip&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;fastmcp

&lt;span class="c"&gt;# Add as a tool through uv (My preference)&lt;/span&gt;
uv tool &lt;span class="nb"&gt;install &lt;/span&gt;fastmcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install uv (Required for MCP Client Integration)
&lt;/h3&gt;

&lt;p&gt;FastMCP's CLI tools require &lt;code&gt;uv&lt;/code&gt; for dependency management when installing to MCP clients. Install &lt;code&gt;uv&lt;/code&gt; for your platform:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windows:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Using PowerShell&lt;/span&gt;
powershell &lt;span class="nt"&gt;-ExecutionPolicy&lt;/span&gt; ByPass &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"irm https://astral.sh/uv/install.ps1 | iex"&lt;/span&gt;

&lt;span class="c"&gt;# Or using pip&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;uv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;macOS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Using Homebrew (recommended)&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;uv

&lt;span class="c"&gt;# Or using curl&lt;/span&gt;
curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Linux:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Using curl&lt;/span&gt;
curl &lt;span class="nt"&gt;-LsSf&lt;/span&gt; https://astral.sh/uv/install.sh | sh

&lt;span class="c"&gt;# Or using pip&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;uv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verify Installation
&lt;/h3&gt;

&lt;p&gt;To verify that FastMCP is installed correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fastmcp version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;fastmcp version
FastMCP version:                           2.11.3
MCP version:                               1.12.4
Python version:                            3.12.2
Platform:            macOS-15.3.1-arm64-arm-64bit
FastMCP root path:            ~/Developer/fastmcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmu1sy94jrug6zv0lqo7j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmu1sy94jrug6zv0lqo7j.png" alt="Screenshot: Terminal showing fastmcp version output with version details" width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Running with JSON Configuration
&lt;/h2&gt;

&lt;p&gt;FastMCP can run servers directly from JSON configuration files, making it easy to define and deploy multi-server setups. Let's create a configuration that bundles Context7 (documentation search) and Playwright (web automation) into a single endpoint.&lt;/p&gt;

&lt;p&gt;Create a file named &lt;code&gt;fastmcp.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://gofastmcp.com/public/schemas/fastmcp.json/v1.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"environment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;gt;=3.10"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="nl"&gt;"dependencies"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"fastmcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"deployment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"transport"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sse"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"log_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DEBUG"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"context7"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@upstash/context7-mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--api-key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"playwright"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"@playwright/mcp@latest"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuration breakdown&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;environment&lt;/strong&gt;: Specifies Python version and FastMCP dependency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;deployment&lt;/strong&gt;: Sets transport method and logging level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;mcpServers&lt;/strong&gt;: Defines the backend servers to proxy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Replace &lt;code&gt;YOUR_API_KEY&lt;/code&gt; with your actual Context7 API key from &lt;a href="https://upstash.com/" rel="noopener noreferrer"&gt;Upstash&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now run the server using FastMCP CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fastmcp run fastmcp.json &lt;span class="nt"&gt;--transport&lt;/span&gt; http &lt;span class="nt"&gt;--host&lt;/span&gt; localhost &lt;span class="nt"&gt;--port&lt;/span&gt; 53456
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: the values passed on the CLI override the options set in the &lt;code&gt;fastmcp.json&lt;/code&gt; file. I intentionally made them conflict to demonstrate this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7iijxtr2zku5qmtkbgl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7iijxtr2zku5qmtkbgl.png" alt="Screenshot: Terminal showing FastMCP server starting up with log messages about loading both servers" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The server will start on &lt;code&gt;http://localhost:53456&lt;/code&gt; and automatically proxy requests to both Context7 and Playwright servers. You can test it by accessing the server endpoint directly or integrating it with MCP clients.&lt;/p&gt;

&lt;p&gt;Although I didn't demonstrate it here, the GitHub repo also includes a Dockerfile example if you need help getting started with dockerizing FastMCP.&lt;/p&gt;

&lt;h2&gt;
  
  
  Define a Python File and CLI Install Feature
&lt;/h2&gt;

&lt;p&gt;While JSON configuration works well for direct server execution, you might prefer a Python-based approach for more complex scenarios or better IDE support. Let's create a simple proxy server file.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;mcp-proxy.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;

&lt;span class="c1"&gt;# Your MCP servers configuration
&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcpServers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@upstash/context7-mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;playwright&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@playwright/mcp@latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Create the proxy
&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_proxy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Multi-MCP-Proxy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Python file defines the same proxy configuration as our JSON, but in a more programmatic format that allows for easier customization and extension.&lt;/p&gt;
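For example, one benefit of the Python format is that secrets don't have to be hardcoded. A minimal sketch reading the Context7 key from an environment variable (the variable name &lt;code&gt;CONTEXT7_API_KEY&lt;/code&gt; is my own choice for this example, not a FastMCP convention):

```python
import os

# Read the Context7 key from the environment instead of hardcoding it.
# CONTEXT7_API_KEY is a name chosen for this example.
api_key = os.environ.get("CONTEXT7_API_KEY", "YOUR_API_KEY")

config = {
    "mcpServers": {
        "context7": {
            "command": "npx",
            "args": ["-y", "@upstash/context7-mcp", "--api-key", api_key],
        },
        "playwright": {
            "command": "npx",
            "args": ["@playwright/mcp@latest"],
        },
    }
}
```

The resulting `config` dict can be passed to `FastMCP.as_proxy()` exactly as in `mcp-proxy.py` above.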

&lt;h3&gt;
  
  
  Installing in Claude Code
&lt;/h3&gt;

&lt;p&gt;FastMCP's CLI makes installation trivial. For Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fastmcp &lt;span class="nb"&gt;install &lt;/span&gt;claude-code mcp-proxy.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesh8c1zowo4a6a322wpq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fesh8c1zowo4a6a322wpq.png" alt="Screenshot: Terminal showing successful installation message for Claude Code" width="800" height="39"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This command automatically configures Claude Code to run your proxy server with all necessary dependencies managed by &lt;code&gt;uv&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing in Claude Desktop
&lt;/h3&gt;

&lt;p&gt;For Claude Desktop installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fastmcp &lt;span class="nb"&gt;install &lt;/span&gt;claude-desktop mcp-proxy.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswbi5r2b0f03233ctuak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fswbi5r2b0f03233ctuak.png" alt="Screenshot: Terminal showing successful installation message and configuration file path" width="800" height="114"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On a freshly installed Claude Desktop, this command failed for me because it couldn't find &lt;code&gt;claude_desktop_config.json&lt;/code&gt;. To fix this, go to the Settings &amp;gt; Developer tab and click 'Edit Config', which creates the file automatically. Then run the &lt;code&gt;fastmcp install&lt;/code&gt; command again, and it should succeed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84cem1cto6a1kbauq1ff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84cem1cto6a1kbauq1ff.png" alt="Claude Desktop's Settings Developer page prior to installing config file" width="800" height="599"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The install command automatically updates Claude Desktop's configuration file with the proper server entry, including dependency management through &lt;code&gt;uv&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing the Installation
&lt;/h3&gt;

&lt;p&gt;Open Claude Desktop (note: you will need to restart Claude Desktop after installing the FastMCP server) and verify the installation by asking it to search for FastMCP documentation. In my testing, this worked fine in Claude Code, but in Claude Desktop only Playwright worked; I couldn't get context7 to respond. Claude Code is generally better at using tools anyway, so don't lean too heavily on Claude Desktop for tool use. That said, context7 is quite useful during the planning phase of agentic development, so you'll probably still want it installed. Also, as of this writing, MCP tool support in Claude Desktop is a beta feature (so expect it to become more reliable over time), and my guess is that the team is focusing more on extensions, since Claude Desktop serves a more general audience than Claude Code's programmer base. That's just my speculation.&lt;/p&gt;

&lt;p&gt;Just to finish the demonstration, I temporarily removed context7 from &lt;code&gt;mcp-proxy.py&lt;/code&gt; to show what a working MCP tool looks like in Claude Desktop.&lt;/p&gt;

&lt;p&gt;From a new chat, you can click the plus button to see which MCP servers are available and what tools they expose. You can even enable or disable each server or tool individually. You should definitely leverage this feature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43t1xnfwv1p3kfpeplzz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43t1xnfwv1p3kfpeplzz.png" alt="Screen shot of Claude Desktop showing the confirmation that tools are available and enabled from the installed FastMCP proxy server" width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, to try out the tool, make a simple request like:&lt;br&gt;
"Using the playwright mcp tool, can you go to gofastmcp.com?"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmc0bulksdzhjkhw2ibbd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmc0bulksdzhjkhw2ibbd.png" alt="Screenshot: Claude Desktop interface showing a successful search result from Playwright with FastMCP documentation" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The response should show that Claude successfully used the Playwright tool to go to the FastMCP homepage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: Create JSON or Python proxy configs, run with &lt;code&gt;fastmcp run&lt;/code&gt;, install to clients with &lt;code&gt;fastmcp install&lt;/code&gt;. Automatic dependency management via &lt;code&gt;uv&lt;/code&gt; handles the complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;FastMCP's proxy capabilities transform MCP server management from a fragmented, per-client configuration nightmare into a streamlined, centralized approach. By bundling multiple servers behind a single endpoint, you gain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simplified deployment&lt;/strong&gt;: One proxy serves all your MCP tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent configuration&lt;/strong&gt;: Single source of truth across all clients
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource efficiency&lt;/strong&gt;: Fewer running processes and managed dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy maintenance&lt;/strong&gt;: Update proxy configuration once, benefit everywhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The CLI tools make integration seamless—whether you prefer JSON configurations for simplicity or Python files for programmability, FastMCP handles the complexity of dependency management and client integration automatically.&lt;/p&gt;

&lt;p&gt;As the MCP ecosystem continues growing, this proxy pattern will become increasingly valuable for developers who want to harness multiple specialized tools without the operational overhead. Start with your most essential MCPs, test the performance impact, and gradually expand your toolkit as needed.&lt;/p&gt;

&lt;p&gt;Remember: the goal isn't to bundle every available MCP, but to create a curated, efficient collection that enhances your AI workflows without overwhelming the underlying models.&lt;/p&gt;

</description>
      <category>fastmcp</category>
      <category>claude</category>
      <category>mcp</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Job Pilot Chronicles: 94 Commits, 27 Days, and the Brutal Reality of AI-Assisted Development</title>
      <dc:creator>Alex Retana</dc:creator>
      <pubDate>Tue, 16 Sep 2025 22:54:11 +0000</pubDate>
      <link>https://forem.com/alexretana/the-job-pilot-chronicles-94-commits-27-days-and-the-brutal-reality-of-ai-assisted-development-2cek</link>
      <guid>https://forem.com/alexretana/the-job-pilot-chronicles-94-commits-27-days-and-the-brutal-reality-of-ai-assisted-development-2cek</guid>
      <description>&lt;p&gt;&lt;em&gt;A brutally honest story of building a full-stack app in the AI age - where every "firts commit" typo and late-night debugging session reveals what we're really signing up for&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hook That Started Everything
&lt;/h2&gt;

&lt;p&gt;It was August 20th, 2025, 11:47 PM. I typed &lt;code&gt;git commit -m "firts commit"&lt;/code&gt; and hit enter.&lt;/p&gt;

&lt;p&gt;Yes, "firts." With a typo. Because apparently, even in the age of AI coding assistants that can write entire applications, I still can't spell "first" correctly when I'm excited about a new project.&lt;/p&gt;

&lt;p&gt;That typo-laden commit would become the first of 94 commits across 27 days - a journey that perfectly captures the paradox every developer faces in 2025: &lt;strong&gt;AI tools promise to make us faster and smarter, but somehow we're still debugging our own mistakes at 2 AM, wondering if we're more productive or just more confused.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl935q0clnd9a6fx8i1bg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl935q0clnd9a6fx8i1bg.jpg" alt="Ai Generated Image of a Desktop with contrasting warm and cool colors" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers Don't Lie (But They Don't Tell the Whole Story Either)
&lt;/h2&gt;

&lt;p&gt;Let me hit you with the raw data from my git archaeology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;94 commits&lt;/strong&gt; in 27 days (that's 3.5 commits per day, for those keeping score)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100% solo development&lt;/strong&gt; (just me, my coffee, and an army of AI assistants)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Peak intensity&lt;/strong&gt;: 78 commits in August alone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top commit message keywords&lt;/strong&gt;: "api" (23 times), "implement" (21 times), "tests" (13 times)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's what those numbers &lt;em&gt;don't&lt;/em&gt; show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hours spent arguing with AI about why its "perfect" code wouldn't compile&lt;/li&gt;
&lt;li&gt;The number of times I copy-pasted AI suggestions that looked brilliant but were subtly, devastatingly wrong&lt;/li&gt;
&lt;li&gt;How many "implement frontend" commits were actually "please God just make this work" in disguise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sound familiar? You're not alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Great AI Coding Reality Check
&lt;/h2&gt;

&lt;p&gt;While I was grinding through Job Pilot, researchers were documenting what every developer using AI tools secretly knows but rarely admits: &lt;strong&gt;we're not actually as productive as we think we are.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Recent studies show that 84% of developers now use AI coding assistants, but here's the kicker - experienced developers actually take &lt;strong&gt;19% longer&lt;/strong&gt; to complete tasks when using AI tools. We expected a 24% speedup. Instead, we got a productivity drag.&lt;/p&gt;

&lt;p&gt;Why? Because AI solutions are "almost right, but not quite" 45% of the time. And debugging almost-right code is somehow more painful than writing it from scratch.&lt;/p&gt;

&lt;p&gt;My commit history tells this exact story.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act I: The Foundation Fantasy (August 20-25)
&lt;/h2&gt;

&lt;p&gt;After that infamous "firts commit," I did what any developer does when starting fresh - I immediately restructured everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;August 23rd&lt;/strong&gt;: &lt;em&gt;"Complete project restructure: app → backend"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This wasn't procrastination. This was wisdom. I was setting up proper separation of concerns before they became technical debt. The AI tools were great at generating boilerplate, but they had zero opinions about project architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;August 25th&lt;/strong&gt;: The API explosion began.&lt;/p&gt;

&lt;p&gt;In a single day, I built:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication &amp;amp; Authorization endpoints&lt;/li&gt;
&lt;li&gt;Job listings CRUD&lt;/li&gt;
&lt;li&gt;User profiles API&lt;/li&gt;
&lt;li&gt;Companies API&lt;/li&gt;
&lt;li&gt;Job Applications API&lt;/li&gt;
&lt;li&gt;Resumes API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My commit messages from that day read like a developer's fever dream:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"feat: Implement FastAPI backend structure with TDD approach"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Add comprehensive job listings endpoints with validation"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Implement user authentication with JWT tokens"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I felt unstoppable. The AI was cranking out endpoint after endpoint. I was a full-stack architect, building the future of job searching.&lt;/p&gt;

&lt;p&gt;Then I tried to connect it all together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act II: The API Renaissance Meets Reality (August 25-26)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;August 26th&lt;/strong&gt; was when I learned the first brutal lesson about AI-assisted development: &lt;strong&gt;AI is excellent at writing isolated components, terrible at making them work together.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While GitHub Copilot was suggesting perfect-looking API routes, and Claude was generating comprehensive test suites, none of them understood how my authentication middleware should interact with my database models, or why my job deduplication logic was creating infinite loops.&lt;/p&gt;

&lt;p&gt;The commit messages tell the story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Fix authentication middleware integration"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Debug job deduplication infinite loop"&lt;/em&gt; &lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Implement proper error handling across all endpoints"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgziwjwj11v1zjycesdc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmgziwjwj11v1zjycesdc.jpg" alt="AI Generated Cartoon Developer in denial of surrounding flames" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This mirrors what 90% of developers report: AI tools struggle with large codebase context. They don't understand your existing patterns, your architectural decisions, or the subtle dependencies between your modules.&lt;/p&gt;

&lt;p&gt;I was becoming a translator between different AI suggestions, spending more time debugging AI-generated integration bugs than I would have spent just writing the damn thing myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act III: The Frontend Struggle is Real (August 27-30)
&lt;/h2&gt;

&lt;p&gt;And then came the frontend. Oh, the frontend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;August 27th&lt;/strong&gt;: &lt;em&gt;"Write out plan for migrating jsx component to new reworked frontend service"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Translation: "The AI generated a bunch of React components that look perfect in isolation but form a Frankenstein's monster when assembled."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;August 28th&lt;/strong&gt;: &lt;em&gt;"Restart on making the tsx rendering layer; using playwright mcp this time"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Translation: "Nothing works. Starting over. Again."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;August 30th&lt;/strong&gt;: The commits from this day perfectly capture the AI development experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"Ai wrote a bunch for some reason"&lt;/em&gt; (When you let Claude take the wheel and it generated 200 lines of code you didn't ask for)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Done; no human validation, hope this went well"&lt;/em&gt; (Every developer's prayer when using AI)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Finish reworking frontend; still bugs"&lt;/em&gt; (Brutal honesty)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the reality nobody talks about: &lt;strong&gt;AI tools excel at the easy stuff but struggle with the integration layer where real applications live.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Act IV: The Testing Enlightenment (Late August)
&lt;/h2&gt;

&lt;p&gt;Here's where my story diverges from the typical "AI made me super productive" narrative. When everything was falling apart, I doubled down on testing.&lt;/p&gt;

&lt;p&gt;Not because I'm some testing evangelist, but because &lt;strong&gt;testing was the only way to verify that AI suggestions actually worked.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My commits became:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Big effort to implement playwright testing"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Implement full-stack test suite, and first integration tests"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Create foundation for Playwright Testing"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Testing became my AI reality check. When Claude confidently assured me that its authentication flow was "production-ready," my tests caught the security holes. When Copilot generated database queries that looked elegant, my integration tests revealed they'd break under load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is the pattern successful AI-assisted developers follow&lt;/strong&gt;: Use AI for generation, humans for validation, and tests for truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act V: The September Reset (The Plot Twist)
&lt;/h2&gt;

&lt;p&gt;After August's 78-commit marathon, something interesting happened. I burned out. Hard.&lt;/p&gt;

&lt;p&gt;But instead of abandoning the project, I did something that separates experienced developers from beginners: &lt;strong&gt;I made a strategic reset.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;September 15th&lt;/strong&gt;: &lt;em&gt;"Branch reset: Reset main branch to ff06fcd"&lt;/em&gt; followed by &lt;em&gt;"Restart Frontend Components"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This wasn't failure. This was wisdom. Sometimes the best progress is admitting your current approach isn't working and starting fresh with better knowledge.&lt;/p&gt;

&lt;p&gt;The September commits show a more measured approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Remove a bunch of the bloat"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Begin frontend rework"&lt;/em&gt; (yes, another typo - some things never change)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Create first draft of frontend rewrite"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Real Lessons (What They Don't Tell You About AI Development)
&lt;/h2&gt;

&lt;p&gt;After 94 commits and 27 days, here's what I learned about developing with AI in 2025:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. AI is a Powerful Intern, Not a Senior Developer
&lt;/h3&gt;

&lt;p&gt;AI tools are incredible at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generating boilerplate code&lt;/li&gt;
&lt;li&gt;Writing isolated functions&lt;/li&gt;
&lt;li&gt;Creating comprehensive test cases&lt;/li&gt;
&lt;li&gt;Suggesting patterns you forgot existed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But they're terrible at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understanding your existing codebase context&lt;/li&gt;
&lt;li&gt;Making architectural decisions&lt;/li&gt;
&lt;li&gt;Debugging integration issues&lt;/li&gt;
&lt;li&gt;Knowing when to stop generating code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. The "Almost Right" Problem is Real
&lt;/h3&gt;

&lt;p&gt;That 45% statistic about AI being "almost right, but not quite"? It's devastatingly accurate. &lt;/p&gt;

&lt;p&gt;Almost-right code is worse than obviously broken code because it looks correct until it fails in production. You spend more time debugging subtle AI mistakes than you would writing correct code from scratch.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Testing Becomes Non-Negotiable
&lt;/h3&gt;

&lt;p&gt;In the pre-AI era, you could sometimes get away with light testing if you wrote careful code. With AI assistance, comprehensive testing isn't optional - it's the only way to verify that generated code actually works.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Humans Excel at Integration and Architecture
&lt;/h3&gt;

&lt;p&gt;AI generates components. Humans integrate systems. The real skill in AI-assisted development isn't prompting the AI to write perfect code - it's knowing how to combine AI-generated pieces into a coherent, maintainable system.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The Restart is a Feature, Not a Bug
&lt;/h3&gt;

&lt;p&gt;Traditional development wisdom says "never rewrite." AI-assisted development changes this. Sometimes, starting fresh with better prompts and clearer architecture is faster than debugging a messy AI-generated codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Paradox We're All Living
&lt;/h2&gt;

&lt;p&gt;Here's the thing that nobody wants to admit: &lt;strong&gt;AI coding tools are simultaneously the best and most frustrating thing to happen to development.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They're the best because they can generate complex, sophisticated code in seconds. They handle boilerplate, suggest patterns, and can kickstart projects that would take days to set up manually.&lt;/p&gt;

&lt;p&gt;They're the most frustrating because they create a false sense of progress. You feel incredibly productive generating hundreds of lines of code, until you realize none of it works together and you're debugging problems you don't understand.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for You
&lt;/h2&gt;

&lt;p&gt;If you're using AI coding tools (and statistically, you probably are), here's my advice based on 94 commits of real experience:&lt;/p&gt;

&lt;h3&gt;
  
  
  Start with Architecture, Not Code
&lt;/h3&gt;

&lt;p&gt;Don't let AI drive your technical decisions. Plan your system architecture first, then use AI to implement the pieces.&lt;/p&gt;

&lt;h3&gt;
  
  
  Embrace the Human-AI Workflow
&lt;/h3&gt;

&lt;p&gt;Use AI for generation, yourself for validation, and tests for verification. This trinity is your safeguard against the "almost right" problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Budget for Integration Time
&lt;/h3&gt;

&lt;p&gt;AI can generate components in minutes, but integrating them takes human time. Plan accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test Everything
&lt;/h3&gt;

&lt;p&gt;If AI wrote it, test it. If you modified AI code, test it again. If it looks too good to be true, definitely test it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Know When to Reset
&lt;/h3&gt;

&lt;p&gt;Sometimes starting fresh with better prompts is faster than debugging a tangled AI-generated mess.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Job Pilot Epilogue
&lt;/h2&gt;

&lt;p&gt;As I write this, Job Pilot is alive and actively being developed. It's not the perfect app I envisioned on August 20th, but it's something better - a real application built through the messy, iterative process of human-AI collaboration.&lt;/p&gt;

&lt;p&gt;The final commit message as of September 16th reads: &lt;em&gt;"Document frontend-backend connection"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Not "Revolutionary AI-Generated App Launches" or "Perfect Code Generated by AI." Just the humble work of documenting how pieces fit together - the most human part of development.&lt;/p&gt;

&lt;p&gt;That's the real story of coding with AI in 2025. It's not about AI replacing developers or making us obsolete. It's about learning to collaborate with incredibly powerful but fundamentally limited tools.&lt;/p&gt;

&lt;p&gt;We're not just writing code anymore. We're conducting an orchestra where half the musicians are brilliant but can't read music, and the other half (us) need to make sure everyone plays in harmony.&lt;/p&gt;

&lt;p&gt;And sometimes, just sometimes, when all the pieces align and the tests pass and the user clicks through your app without encountering a single bug, you remember why you fell in love with building software in the first place.&lt;/p&gt;

&lt;p&gt;Even if your first commit had a typo.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Your AI Development Story?
&lt;/h2&gt;

&lt;p&gt;I've shared my 94-commit journey into the reality of AI-assisted development. Now I want to hear yours. &lt;/p&gt;

&lt;p&gt;Have you experienced the "almost right, but not quite" problem? How do you balance AI assistance with human judgment? What's your biggest AI development win or failure?&lt;/p&gt;

&lt;p&gt;Share your story in the comments - let's build a real picture of what development looks like in the AI age, beyond the hype and the headlines.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Want to see the complete commit history that inspired this post? Check out the Job Pilot repository (coming soon) or follow my development journey for more real stories from the trenches of AI-assisted coding.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>qwen</category>
    </item>
    <item>
      <title>Crafting a Monster Hunter Wilds AI Assistant: Scrapy, Vector Search &amp; Prompt Engineering</title>
      <dc:creator>Alex Retana</dc:creator>
      <pubDate>Tue, 12 Aug 2025 21:54:59 +0000</pubDate>
      <link>https://forem.com/alexretana/crafting-a-monster-hunter-wilds-ai-assistant-scrapy-vector-search-prompt-engineering-5253</link>
      <guid>https://forem.com/alexretana/crafting-a-monster-hunter-wilds-ai-assistant-scrapy-vector-search-prompt-engineering-5253</guid>
      <description>&lt;h1&gt;
  
  
  Building a Local Monster Hunter Wilds RAG System: From Web Scraping to Prompt Engineering
&lt;/h1&gt;

&lt;p&gt;Gaming wikis are treasure troves of detailed information, but finding the right answer to specific questions can be like hunting a Rathalos in a thunderstorm. What if you could have a personal Monster Hunter expert that knows every weapon combo, monster weakness, and crafting recipe? That's exactly what I built with my Monster Hunter Wilds RAG (Retrieval-Augmented Generation) system.&lt;/p&gt;

&lt;p&gt;In this article, I'll walk you through building a complete RAG pipeline that scrapes gaming wiki content, vectorizes it for fast retrieval, and serves intelligent answers through a local web interface. Along the way, we'll explore why certain architectural decisions were made and how prompt engineering can dramatically improve system performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  🏗️ System Architecture: Two-Part Approach
&lt;/h2&gt;

&lt;p&gt;The system consists of two main components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Intelligent Web Scraper&lt;/strong&gt;: Harvests and structures wiki content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG Pipeline&lt;/strong&gt;: Retrieves relevant content and generates contextual answers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's dive into each part.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Building the Web Scraper with Scrapy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Scrapy Over Custom Solutions?
&lt;/h3&gt;

&lt;p&gt;When building a web scraper, you have several options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write a custom scraper with &lt;code&gt;requests&lt;/code&gt; and &lt;code&gt;BeautifulSoup&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Use browser automation tools like Selenium&lt;/li&gt;
&lt;li&gt;Leverage a professional scraping framework like Scrapy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I chose &lt;strong&gt;Scrapy&lt;/strong&gt; for several compelling reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Built-in Politeness&lt;/strong&gt;: Scrapy respects &lt;code&gt;robots.txt&lt;/code&gt; files and implements automatic delays between requests, making it respectful to target servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Robust Crawling Features&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pause/resume functionality through &lt;code&gt;JOBDIR&lt;/code&gt; settings&lt;/li&gt;
&lt;li&gt;Automatic duplicate detection and filtering&lt;/li&gt;
&lt;li&gt;Depth limiting to prevent infinite crawling&lt;/li&gt;
&lt;li&gt;Built-in retry mechanisms for failed requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Scalability&lt;/strong&gt;: Scrapy handles concurrent requests efficiently and can scale from small wikis to massive sites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Extensibility&lt;/strong&gt;: The pipeline architecture allows for easy data processing and storage customization.&lt;/p&gt;
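&lt;p&gt;To make those politeness and robustness guarantees concrete, here is a minimal sketch of the settings involved. The setting names are real Scrapy settings; the specific values are illustrative, not the exact ones from my project.&lt;/p&gt;

```python
# Illustrative Scrapy politeness/robustness settings (values are examples).
POLITE_SETTINGS = {
    "ROBOTSTXT_OBEY": True,               # respect the site's robots.txt
    "DOWNLOAD_DELAY": 1.0,                # seconds between requests to the same site
    "AUTOTHROTTLE_ENABLED": True,         # back off automatically when the server slows down
    "CONCURRENT_REQUESTS_PER_DOMAIN": 4,  # cap parallel requests per domain
    "RETRY_ENABLED": True,                # retry failed requests
    "RETRY_TIMES": 2,                     # up to two retries per request
    "DEPTH_LIMIT": 6,                     # stop following links past this depth
}
```

&lt;p&gt;Dropping settings like these into &lt;code&gt;settings.py&lt;/code&gt; (or a spider's &lt;code&gt;custom_settings&lt;/code&gt;, as shown later) is all it takes; with a hand-rolled &lt;code&gt;requests&lt;/code&gt; scraper you would be implementing each of these behaviors yourself.&lt;/p&gt;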

&lt;h3&gt;
  
  
  Spider Implementation
&lt;/h3&gt;

&lt;p&gt;Here's the core of my Fextralife spider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyFextralifeSpider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scrapy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Spider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;myfextralifespider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;allowed_domains&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;monsterhunterwilds.wiki.fextralife.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;start_urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://monsterhunterwilds.wiki.fextralife.com/Monster+Hunter+Wilds+Wiki&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;custom_settings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOBDIR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;jobs/daily-fextralife-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEPTH_LIMIT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLOSESPIDER_TIMEOUT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ITEM_PIPELINES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wikiproject.pipelines.WikiprojectPipeline&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The spider automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Follows internal links within the domain&lt;/li&gt;
&lt;li&gt;Skips static assets (images, CSS, JS files)&lt;/li&gt;
&lt;li&gt;Limits crawl depth to prevent infinite loops&lt;/li&gt;
&lt;li&gt;Saves progress for pause/resume functionality&lt;/li&gt;
&lt;/ul&gt;
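&lt;p&gt;The domain and asset filtering boils down to a simple URL check before following each link. Here is a self-contained sketch; the helper name and extension list are my illustration, not the exact code from the spider.&lt;/p&gt;

```python
from urllib.parse import urlparse

# File extensions treated as static assets and skipped (illustrative list).
SKIP_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".svg", ".ico")

def should_follow(url: str, allowed_domain: str) -> bool:
    """Return True if the spider should crawl this URL:
    it must stay on the wiki's domain and not point at a static asset."""
    parsed = urlparse(url)
    if parsed.netloc and parsed.netloc != allowed_domain:
        return False  # off-domain link: stay inside the wiki
    return not parsed.path.lower().endswith(SKIP_EXTENSIONS)
```

&lt;p&gt;Inside a Scrapy callback, each candidate href would pass through a filter like this before being handed to &lt;code&gt;response.follow(href, callback=self.parse)&lt;/code&gt;; depth limiting and duplicate detection are then handled by the framework itself.&lt;/p&gt;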

&lt;h3&gt;
  
  
  Intelligent Content Extraction
&lt;/h3&gt;

&lt;p&gt;The magic happens in the content parsing. Wiki pages contain both structured (tables) and unstructured (text) content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_wiki_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract clean text content from the main wiki content block
&lt;/span&gt;    &lt;span class="n"&gt;wikicontent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;//div[@id=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wiki-content-block&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;]//text()&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;getall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;])).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\xa0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wikicontent&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_wiki_tables&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Convert HTML tables to structured JSON
&lt;/span&gt;    &lt;span class="c1"&gt;# Handles nested tables, images with alt text, and complex structures
&lt;/span&gt;    &lt;span class="c1"&gt;# Returns normalized data ready for vectorization
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system extracts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Breadcrumb navigation&lt;/strong&gt; for content categorization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean text content&lt;/strong&gt; from the main wiki areas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured table data&lt;/strong&gt; converted to JSON format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;URL references&lt;/strong&gt; for source attribution&lt;/li&gt;
&lt;/ul&gt;
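&lt;p&gt;The clean-text step boils down to joining stripped text nodes and normalizing non-breaking spaces; a standalone sketch of that logic (the input fragments here are illustrative):&lt;/p&gt;

```python
def normalize_wiki_text(fragments):
    """Join raw text nodes from a wiki content block into one clean string.

    Same idea as the spider's one-liner: strip each fragment, join with
    spaces, and swap non-breaking spaces for regular ones. Empty fragments
    are dropped to avoid doubled spaces.
    """
    joined = " ".join(x.strip() for x in fragments if x.strip())
    return joined.replace("\xa0", " ")

fragments = ["  Great Sword  ", "\xa0", "A heavy blade weapon.", ""]
print(normalize_wiki_text(fragments))  # Great Sword A heavy blade weapon.
```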

&lt;p&gt;Each page is transformed into a structured document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;If the user's answer is answered by information in this file, please direct them to {url}
URL: {url}
####################
Page Title: {title}
####################
Breadcrumb: {breadcrumb}
####################
Page Content:
{clean_text_content}
####################
Page Tables Stored as JSON:
{structured_table_data}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
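&lt;p&gt;Producing that format is plain string templating; a minimal sketch (the helper name and example values are mine, not the project's):&lt;/p&gt;

```python
def build_document(url, title, breadcrumb, content, tables_json):
    """Assemble one scraped page into the prompt-ready document format.

    Helper name and example values are illustrative; the real pipeline
    fills these fields from the spider's parsed output.
    """
    sep = "#" * 20
    return (
        f"If the user's answer is answered by information in this file, "
        f"please direct them to {url}\n"
        f"URL: {url}\n{sep}\n"
        f"Page Title: {title}\n{sep}\n"
        f"Breadcrumb: {breadcrumb}\n{sep}\n"
        f"Page Content:\n{content}\n{sep}\n"
        f"Page Tables Stored as JSON:\n{tables_json}"
    )

doc = build_document(
    "https://example.com/wiki/Great-Sword",
    "Great Sword",
    "Home / Weapons / Great Sword",
    "A heavy blade weapon.",
    "[]",
)
print(doc.splitlines()[1])  # URL: https://example.com/wiki/Great-Sword
```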



&lt;h3&gt;
  
  
  The Critical close_spider() Function
  The Critical close_spider() Function
&lt;/h3&gt;

&lt;p&gt;Here's where the scraped data gets vectorized and stored. In Scrapy's pipeline system, the &lt;code&gt;close_spider&lt;/code&gt; method in &lt;code&gt;pipelines.py&lt;/code&gt; is called when crawling finishes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;close_spider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;spider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Deduplicate scraped content and build breadcrumb map
&lt;/span&gt;    &lt;span class="n"&gt;breadcrumb_map&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_page_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dedupe_and_build_breadcrumb_map&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total Pages Scraped: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_page_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data has been ingested into Chroma vector store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After deduplication, the &lt;code&gt;upsert_into_chroma()&lt;/code&gt; function handles the final data processing, vectorizing the content and writing it to Chroma:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upsert_into_chroma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Upserts DataFrame content into Chroma vector store.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Starting Chroma ingestion...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Initialize embedding model
&lt;/span&gt;    &lt;span class="n"&gt;embed_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FixedOllamaEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create persistent Chroma client
&lt;/span&gt;    &lt;span class="n"&gt;chroma_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PersistentClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;../chroma_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;chroma_collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chroma_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_or_create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;monsterhunter_fextralife_wiki&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vector_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChromaVectorStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chroma_collection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chroma_collection&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Convert to LlamaIndex Documents and create vector index
&lt;/span&gt;    &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wiki_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt; 
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;

    &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VectorStoreIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;storage_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;storage_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embed_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embed_model&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Successfully ingested &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach ensures all scraped content is automatically vectorized and ready for semantic search.&lt;/p&gt;
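&lt;p&gt;Here, "ready for semantic search" means nearest-neighbor lookup over embedding vectors. A toy illustration with 2-D vectors standing in for real nomic-embed-text embeddings:&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents closest to the query embedding."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

# 2-D stand-ins for real embedding vectors
docs = {
    "great-sword": [0.9, 0.1],
    "rathalos": [0.1, 0.9],
    "crafting": [0.6, 0.4],
}
print(top_k([1.0, 0.0], docs))  # ['great-sword', 'crafting']
```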

&lt;h2&gt;
  
  
  Part 2: The RAG System with OpenWebUI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  OpenWebUI + Pipelines Architecture
&lt;/h3&gt;

&lt;p&gt;I chose OpenWebUI as the frontend because it provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Familiar Chat Interface&lt;/strong&gt;: ChatGPT-like experience for users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipeline System&lt;/strong&gt;: Custom processing between user input and LLM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local Hosting&lt;/strong&gt;: Complete control over data and privacy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple Model Support&lt;/strong&gt;: Works with Ollama's local models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pipeline architecture works like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Query → OpenWebUI → Custom Pipeline → Chroma Search → Context + Query → LLM → Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Early Implementation: Simple Interception
&lt;/h3&gt;

&lt;p&gt;Initially, the pipeline was quite basic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;__user__&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Simply intercept the message
&lt;/span&gt;    &lt;span class="n"&gt;user_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Search Chroma for relevant content  
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Lazily combine results with query
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;enhanced_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Context: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Pass to LLM
&lt;/span&gt;    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;enhanced_query&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This worked, but responses were generic and often missed domain-specific nuances.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3: Evaluation Framework
&lt;/h2&gt;

&lt;p&gt;Before diving into improvements, I built a comprehensive evaluation system to measure performance objectively.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evaluation Metrics
&lt;/h3&gt;

&lt;p&gt;Following LlamaIndex best practices, I implemented both end-to-end and component-wise evaluation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;End-to-End Metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faithfulness&lt;/strong&gt; (0-1): Are responses faithful to retrieved context? (No hallucinations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relevancy&lt;/strong&gt; (0-1): Are responses relevant to the query?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correctness&lt;/strong&gt; (0-1): Are responses factually correct?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Similarity&lt;/strong&gt; (0-1): How similar are responses to expected answers?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Component-Wise Metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hit Rate&lt;/strong&gt;: Percentage of queries where relevant documents are retrieved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mean Reciprocal Rank (MRR)&lt;/strong&gt;: Quality of retrieval ranking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response Time&lt;/strong&gt;: Performance measurement&lt;/li&gt;
&lt;/ul&gt;
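&lt;p&gt;Hit Rate and MRR are simple to compute once you know, per query, the rank of the first relevant retrieved document; a sketch (the input data is invented):&lt;/p&gt;

```python
def hit_rate_and_mrr(first_relevant_rank):
    """Compute Hit Rate and MRR from per-query retrieval results.

    Input maps each query to the 1-based rank of the first relevant
    document retrieved, or None when nothing relevant came back.
    """
    n = len(first_relevant_rank)
    hits = [rank for rank in first_relevant_rank.values() if rank is not None]
    hit_rate = len(hits) / n
    mrr = sum(1.0 / rank for rank in hits) / n
    return hit_rate, mrr

ranks = {"q1": 1, "q2": 3, "q3": None, "q4": 2}
hit_rate, mrr = hit_rate_and_mrr(ranks)
print(hit_rate)  # 0.75
```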

&lt;h3&gt;
  
  
  Dataset Generation
&lt;/h3&gt;

&lt;p&gt;I created two approaches for generating evaluation data:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Automated Question Generation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_questions_from_vectorstore&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Sample random documents from Chroma
&lt;/span&gt;    &lt;span class="c1"&gt;# Use LLM to generate realistic questions
&lt;/span&gt;    &lt;span class="c1"&gt;# Create diverse query types (factual, procedural, comparative)
&lt;/span&gt;    &lt;span class="c1"&gt;# Categorize by content type (weapons, monsters, crafting, etc.)
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
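&lt;p&gt;The elided generator samples pages and asks an LLM to write questions about them; the sampling and prompt-building half can be sketched without the model call (prompt wording and query types here are illustrative):&lt;/p&gt;

```python
import random

QUESTION_PROMPT = (
    "You are generating evaluation questions for a Monster Hunter wiki.\n"
    "Write one {qtype} question answerable from this page:\n\n{page}"
)

def build_question_prompts(documents, n=2, seed=42):
    """Sample pages and build one question-generation prompt per page.

    The prompt wording and query types are illustrative; the real generator
    also categorizes output by content type (weapons, monsters, crafting).
    """
    rng = random.Random(seed)
    qtypes = ["factual", "procedural", "comparative"]
    return [
        QUESTION_PROMPT.format(qtype=rng.choice(qtypes), page=page)
        for page in rng.sample(documents, n)
    ]

docs = ["Great Sword page text", "Rathalos page text", "Crafting page text"]
prompts = build_question_prompts(docs)
print(len(prompts))  # 2
```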



&lt;p&gt;&lt;strong&gt;2. Manual Answer Annotation:&lt;/strong&gt;&lt;br&gt;
I built a separate annotation program that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Takes generated questions&lt;/li&gt;
&lt;li&gt;Retrieves potential answers from the RAG system&lt;/li&gt;
&lt;li&gt;Presents them to human reviewers for validation&lt;/li&gt;
&lt;li&gt;Builds high-quality ground truth datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This hybrid approach ensured both scale (144 auto-generated questions) and quality (15 carefully curated sample queries).&lt;/p&gt;
&lt;h2&gt;
  
  
  Part 4: The Power of Prompt Engineering
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Pre-Prompt Engineering Results
&lt;/h3&gt;

&lt;p&gt;Running evaluation on the basic system revealed significant issues:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;Faithfulness&lt;/th&gt;
&lt;th&gt;Relevancy&lt;/th&gt;
&lt;th&gt;Correctness&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sample Queries (15)&lt;/td&gt;
&lt;td&gt;80.0%&lt;/td&gt;
&lt;td&gt;86.67%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;26.67%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generated Questions (144)&lt;/td&gt;
&lt;td&gt;77.08%&lt;/td&gt;
&lt;td&gt;90.97%&lt;/td&gt;
&lt;td&gt;83.33%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The correctness scores revealed a major problem: while the system could find relevant information, it struggled to provide accurate, domain-specific answers.&lt;/p&gt;
&lt;h3&gt;
  
  
  What is Prompt Engineering?
&lt;/h3&gt;

&lt;p&gt;Prompt engineering is the practice of designing, optimizing, and refining the instructions given to language models to achieve better performance on specific tasks. It involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Role Definition&lt;/strong&gt;: Establishing the AI's persona and expertise&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Guidelines&lt;/strong&gt;: Specifying how to use provided information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Formatting&lt;/strong&gt;: Defining response structure and style&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Handling&lt;/strong&gt;: Instructions for edge cases and missing information&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Custom Monster Hunter Prompts
&lt;/h3&gt;

&lt;p&gt;I implemented domain-specific prompts that transformed the system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mh_qa_template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an expert Monster Hunter guide and wiki assistant with deep knowledge &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;of Monster Hunter: Wilds. Your role is to provide accurate, helpful information &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;about weapons, monsters, gameplay mechanics, and strategies.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IMPORTANT GUIDELINES:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Use ONLY the information provided in the context below&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Use correct Monster Hunter terminology (e.g., &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Great Sword&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; not &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Greatsword&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- If information is insufficient, clearly state what you cannot answer&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Include relevant URLs when directing users to specific pages&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- Structure responses clearly with sections when appropriate&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Context Information:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{context_str}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User Question: {query_str}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Provide a comprehensive answer based on the context above:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key improvements included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Expert Persona&lt;/strong&gt;: "You are an expert Monster Hunter guide"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terminology Enforcement&lt;/strong&gt;: Specific language requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Boundaries&lt;/strong&gt;: "Use ONLY the information provided"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response Structure&lt;/strong&gt;: Clear formatting guidelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source Attribution&lt;/strong&gt;: Including URLs for references&lt;/li&gt;
&lt;/ul&gt;
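&lt;p&gt;In LlamaIndex this template is typically attached to a query engine via &lt;code&gt;update_prompts&lt;/code&gt;; the substitution itself is ordinary string formatting, sketched here with the template abbreviated:&lt;/p&gt;

```python
# Abbreviated copy of the prompt; LlamaIndex's PromptTemplate fills
# {context_str} with the retrieved documents and {query_str} with the
# user's question.
MH_QA_TEMPLATE = (
    "You are an expert Monster Hunter guide...\n\n"
    "Context Information:\n{context_str}\n\n"
    "User Question: {query_str}\n\n"
    "Provide a comprehensive answer based on the context above:"
)

def format_prompt(context_docs, query):
    """The substitution step, written out as plain string formatting."""
    context_str = "\n\n".join(context_docs)
    return MH_QA_TEMPLATE.format(context_str=context_str, query_str=query)

prompt = format_prompt(
    ["Great Sword: a heavy blade weapon."],
    "What is the Great Sword?",
)
print(prompt.splitlines()[-1])  # Provide a comprehensive answer based on the context above:
```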

&lt;h3&gt;
  
  
  Results After Prompt Engineering
&lt;/h3&gt;

&lt;p&gt;The impact was dramatic:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sample Queries&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Correctness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;26.67%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;93.33%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+250%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sample Queries&lt;/td&gt;
&lt;td&gt;Faithfulness&lt;/td&gt;
&lt;td&gt;80.0%&lt;/td&gt;
&lt;td&gt;80.0%&lt;/td&gt;
&lt;td&gt;Maintained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generated Questions&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Correctness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;83.33%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;91.67%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+10%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generated Questions&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Faithfulness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;77.08%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;86.11%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+12%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Key Performance Highlights
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Exceptional Correctness Improvement:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sample dataset correctness jumped from 26.67% to 93.33% - a 250% improvement&lt;/li&gt;
&lt;li&gt;Large dataset correctness increased from 83.33% to 91.67%&lt;/li&gt;
&lt;li&gt;Users now receive significantly more accurate responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Enhanced Faithfulness:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;12% improvement on large dataset (reduced hallucinations)&lt;/li&gt;
&lt;li&gt;Better adherence to source material&lt;/li&gt;
&lt;li&gt;Increased system reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Domain Expertise Integration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proper Monster Hunter terminology usage&lt;/li&gt;
&lt;li&gt;Contextually appropriate responses&lt;/li&gt;
&lt;li&gt;Category-specific performance improvements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system now provides accurate answers 9 out of 10 times, with responses that stay true to the source material while being highly relevant to user queries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 5: Local Hosting and Hardware Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Local Over Cloud?
&lt;/h3&gt;

&lt;p&gt;I made the conscious decision to keep this system local rather than hosting it online for several reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU Requirements&lt;/strong&gt;: The system performs best with GPU acceleration for embeddings and LLM inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High Memory Usage&lt;/strong&gt;: Running multiple large language models (embedding + chat model) requires significant RAM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Needs&lt;/strong&gt;: Vector databases and model files consume substantial disk space&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute Costs&lt;/strong&gt;: Cloud GPU instances are expensive for continuous operation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Privacy Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complete control over data&lt;/li&gt;
&lt;li&gt;No external API dependencies&lt;/li&gt;
&lt;li&gt;Gaming queries remain private&lt;/li&gt;
&lt;li&gt;Can customize without service restrictions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Hardware Requirements
&lt;/h3&gt;

&lt;p&gt;The system runs smoothly on my RTX 3090 setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU&lt;/strong&gt;: RTX 3090 (24GB VRAM) - handles both embedding and LLM inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAM&lt;/strong&gt;: 32GB system RAM for vector operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: SSD storage for fast vector database access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Performance with RTX 3090:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedding Generation&lt;/strong&gt;: ~2-3 seconds for query embedding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Search&lt;/strong&gt;: Sub-second retrieval from Chroma&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Inference&lt;/strong&gt;: 8-15 seconds for complete responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Response Time&lt;/strong&gt;: 10-20 seconds end-to-end&lt;/li&gt;
&lt;/ul&gt;
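&lt;p&gt;To attribute the end-to-end budget across stages like these, a simple wall-clock wrapper is enough; a sketch with a stubbed search stage in place of the real Chroma call:&lt;/p&gt;

```python
import time

def timed(fn, *args, **kwargs):
    """Run one pipeline stage and return (result, seconds elapsed)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def fake_vector_search(query):
    # Stand-in for a sub-second Chroma lookup
    time.sleep(0.01)
    return ["doc-1", "doc-2"]

results, elapsed = timed(fake_vector_search, "great sword combos")
print(len(results))  # 2
```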

&lt;h3&gt;
  
  
  Automated Setup Scripts
&lt;/h3&gt;

&lt;p&gt;To make the system accessible, I created comprehensive build and startup scripts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontend Build Process:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Windows&lt;/span&gt;
build.bat

&lt;span class="c"&gt;# Linux/macOS  &lt;/span&gt;
./build.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;System Startup:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Windows&lt;/span&gt;
start_windows.bat

&lt;span class="c"&gt;# Linux/macOS&lt;/span&gt;
./start.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scripts automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install and configure Ollama server&lt;/li&gt;
&lt;li&gt;Download required AI models (llama3:8b, nomic-embed-text)&lt;/li&gt;
&lt;li&gt;Set up conda environments for different components&lt;/li&gt;
&lt;li&gt;Build the OpenWebUI frontend&lt;/li&gt;
&lt;li&gt;Launch all services in separate terminal windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This automation transforms a complex multi-component system into a simple double-click experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Scrapy's Professional Features Matter
&lt;/h3&gt;

&lt;p&gt;The built-in politeness, retry mechanisms, and pause/resume capabilities saved countless hours compared to custom solutions.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Data Quality Trumps Quantity
&lt;/h3&gt;

&lt;p&gt;150 well-processed, structured documents outperformed thousands of poorly parsed pages.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Prompt Engineering is Critical
&lt;/h3&gt;

&lt;p&gt;Generic prompts led to 26.67% correctness; domain-specific prompts achieved 93.33% - a game-changing difference.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Evaluation Drives Improvement
&lt;/h3&gt;

&lt;p&gt;Without quantitative metrics, I would have never discovered the correctness issues or measured the dramatic improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Local Hosting is Viable for Personal Projects
&lt;/h3&gt;

&lt;p&gt;Modern consumer GPUs like the RTX 3090 make sophisticated AI systems accessible for personal use without ongoing cloud costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Enhancements
&lt;/h2&gt;

&lt;p&gt;Several improvements could further enhance the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-game Support&lt;/strong&gt;: Extend to other gaming wikis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Context&lt;/strong&gt;: Conversation history and user preferences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Optimization&lt;/strong&gt;: Reduce response times while maintaining quality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile Interface&lt;/strong&gt;: Responsive design for gaming on-the-go&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community Features&lt;/strong&gt;: Shared question libraries and answer validation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building this Monster Hunter RAG system taught me that modern AI tools can transform how we interact with domain-specific knowledge. The combination of intelligent web scraping, vector search, and carefully engineered prompts creates an experience far superior to traditional wiki browsing.&lt;/p&gt;

&lt;p&gt;The system went from providing correct answers 1 in 4 times to 9 in 10 times through prompt engineering alone. This demonstrates the critical importance of domain-specific customization in RAG systems.&lt;/p&gt;

&lt;p&gt;For gaming enthusiasts, researchers, or anyone working with specialized knowledge domains, this architecture provides a blueprint for building your own intelligent information systems. The complete codebase, evaluation framework, and setup scripts make it accessible even for those new to RAG systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want to build your own gaming RAG system?&lt;/strong&gt; The complete project is open source and includes automated setup scripts, comprehensive evaluation tools, and detailed documentation to get you started.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Happy hunting! 🏹&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tech Stack Used:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Web Scraping&lt;/strong&gt;: Scrapy, BeautifulSoup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Database&lt;/strong&gt;: ChromaDB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Framework&lt;/strong&gt;: LlamaIndex
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: Ollama (Llama 3, Nomic Embed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: OpenWebUI (SvelteKit)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation&lt;/strong&gt;: Custom framework with automated metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Languages&lt;/strong&gt;: Python, JavaScript, Shell scripting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;This project showcases the power of combining modern AI tools with careful engineering to create practical, high-performance systems for specialized domains.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>rag</category>
      <category>webscraping</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>From SageMaker to Static Site: Hosting a Deep Learning Model on the Frontend</title>
      <dc:creator>Alex Retana</dc:creator>
      <pubDate>Mon, 28 Jul 2025 20:13:16 +0000</pubDate>
      <link>https://forem.com/alexretana/from-sagemaker-to-static-site-hosting-a-deep-learning-model-on-the-frontend-26kf</link>
      <guid>https://forem.com/alexretana/from-sagemaker-to-static-site-hosting-a-deep-learning-model-on-the-frontend-26kf</guid>
      <description>&lt;p&gt;A couple of weeks ago, I revisited an old project: a face mask classifier I originally built in Keras.&lt;br&gt;
In my last article, I retrained it three different ways:&lt;/p&gt;

&lt;p&gt;✅ Classic deep learning (TensorFlow inside SageMaker Studio)&lt;/p&gt;

&lt;p&gt;⚙️ Low-code SageMaker Canvas&lt;/p&gt;

&lt;p&gt;🧠 Fully managed Rekognition Custom Labels&lt;/p&gt;

&lt;p&gt;This time, I wanted to see if I could make the model run entirely in the browser. Not just for fun, but because it felt like a win on multiple fronts: it would remove backend inference costs since there’d be no server running 24/7; keep the webcam feed local to the user’s machine, improving privacy; and create a live demo anyone could try instantly, without waiting on an API call or spinning up infrastructure.&lt;/p&gt;

&lt;p&gt;Here’s how that turned out, and why it was trickier than I first thought.&lt;/p&gt;
&lt;h2&gt;
  
  
  ⚙️ Step 1: Converting my Keras model for TensorFlow.js
&lt;/h2&gt;

&lt;p&gt;My original classifier was trained in SageMaker Studio, saved in the latest Keras v3 format.&lt;br&gt;
Problem: TensorFlow.js only supports converting the old Keras v2 .h5 format.&lt;/p&gt;

&lt;p&gt;So the first thing I had to do:&lt;/p&gt;

&lt;p&gt;Retrain the model (same code, but explicitly save it to .h5)&lt;/p&gt;

&lt;p&gt;Use the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;tensorflowjs_converter &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--input_format&lt;/span&gt; keras &lt;span class="se"&gt;\&lt;/span&gt;
  my_model.h5 &lt;span class="se"&gt;\&lt;/span&gt;
  ./model_web/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That produced a browser-ready model.json + weight files. Loading it in JS was simple:&lt;br&gt;
&lt;/p&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loadLayersModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/model_web/model.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This felt like a small win — but the model only takes cropped face images.&lt;br&gt;
Next problem: how do I detect faces?&lt;/p&gt;
&lt;h2&gt;
  
  
  📦 Step 2: Adding face detection
&lt;/h2&gt;

&lt;p&gt;In the original pipeline, I used a YOLOv3 model to detect faces, then classified them.&lt;br&gt;
But YOLOv3 is heavy for browser use.&lt;/p&gt;

&lt;p&gt;I needed something smaller that worked in TensorFlow.js.&lt;/p&gt;

&lt;p&gt;Luckily, TensorFlow.js has some pretrained lightweight face detectors.&lt;br&gt;
I picked one, tested it on the webcam stream, and it worked surprisingly well:&lt;/p&gt;

&lt;p&gt;Detect faces → crop → run classifier → draw predictions&lt;/p&gt;

&lt;p&gt;All in real time.&lt;/p&gt;

&lt;p&gt;Suddenly, I had a browser app that could see your face and tell if you were wearing a mask — without sending anything to a server.&lt;/p&gt;
&lt;h2&gt;
  
  
  🚫 Honorable Mention: Why Canvas &amp;amp; Rekognition models didn’t make it
&lt;/h2&gt;

&lt;p&gt;At this point, I was hoping I could also bring over the models I built in SageMaker Canvas and Rekognition to run directly in the browser. But pretty quickly, I ran into hard limits: SageMaker Canvas only lets you export a model meant for Python or TensorFlow Serving, with no option to get a .h5 or SavedModel that I could convert for TensorFlow.js; and Rekognition Custom Labels doesn’t let you download the trained model at all — it’s locked behind AWS’s API. Since the whole goal was to keep everything frontend-only and client-side, these two paths just didn’t fit. It was a good reminder that the more managed and abstracted a tool is, the less portable your model ends up being.&lt;/p&gt;
&lt;h2&gt;
  
  
  🧰 Step 3: Building the demo &amp;amp; making it repeatable
&lt;/h2&gt;

&lt;p&gt;With the model running locally in the browser, I wanted to take the next step: actually host it online so anyone could try it, and make deployments effortless. To do that, I built a small React frontend that grabs the webcam feed, detects faces, runs the mask classifier, and draws the predictions on screen in real time. Then I wrote some Terraform to handle the infrastructure: provisioning a public S3 bucket for static hosting, a CloudFront distribution for global CDN, and IAM roles to support CI/CD. Finally, I set up GitHub Actions so that every time I push to the repository, it automatically builds the site and deploys it to S3.&lt;/p&gt;

&lt;p&gt;Now it’s fully repeatable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it’s live.&lt;/p&gt;

&lt;h2&gt;
  
  
  ✅ Wrapping it up
&lt;/h2&gt;

&lt;p&gt;In the end, what started as an old Keras side project turned into a modern, privacy‑friendly browser demo — running real‑time face detection and mask classification entirely on the client. To clean up the frontend, I rebuilt it using Solid.js for fast reactivity, styled it with Tailwind CSS and daisyUI, and added subtle animations with auto‑animate and solid‑transition‑group to make the UI feel more alive.&lt;/p&gt;

&lt;p&gt;I even tried to get it working on mobile devices, but ran into a familiar wall: the model was just too big to run smoothly in the browser on most phones. At that point, training a new, smaller model felt like it deserved to be its own project — and I decided to leave it for another day.&lt;/p&gt;

&lt;p&gt;Still, I’m happy with how it turned out: a repeatable, low‑cost, fully frontend ML demo that anyone can try without sending a single frame to a backend. And while it’s not production‑ready, it’s proof that with the right tools and some cloud glue, you can bring even an old deep learning project back to life — and make it feel brand new.&lt;/p&gt;

&lt;p&gt;If you’ve tried something similar, run into the same Keras/TensorFlow.js headaches, or have ideas on building lighter models for mobile, I’d love to hear about it in the comments!&lt;/p&gt;

&lt;p&gt;You can &lt;strong&gt;try the live demo here → &lt;a href="https://face-mask-classifier-demo.retanatech.com" rel="noopener noreferrer"&gt;Face Mask Classifier Demo&lt;/a&gt;&lt;/strong&gt;, and if you’re curious about my other projects, check out my portfolio at &lt;strong&gt;&lt;a href="https://retanatech.com" rel="noopener noreferrer"&gt;retanatech.com&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>tensorflow</category>
      <category>terraform</category>
      <category>webdev</category>
      <category>computervision</category>
    </item>
    <item>
      <title>Comparing 3 ways to Train a Face Mask Classifier: Tensorflow, AWS Canvas, and Rekognition</title>
      <dc:creator>Alex Retana</dc:creator>
      <pubDate>Thu, 10 Jul 2025 21:48:27 +0000</pubDate>
      <link>https://forem.com/alexretana/comparing-3-ways-to-deploy-a-face-mask-classifier-tensorflow-aws-canvas-and-rekognition-266d</link>
      <guid>https://forem.com/alexretana/comparing-3-ways-to-deploy-a-face-mask-classifier-tensorflow-aws-canvas-and-rekognition-266d</guid>
      <description>&lt;h2&gt;
  
  
  🛠️ Introduction
&lt;/h2&gt;

&lt;p&gt;A few years ago, I built a simple face mask image classifier using Keras and TensorFlow, trained locally on my own hardware. Recently, I decided to revisit this project for a few reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To see how easy (or hard) it would be to rerun my old Jupyter notebook from 4–5 years ago.&lt;/li&gt;
&lt;li&gt;To try running custom training jobs inside Amazon SageMaker Studio, instead of relying on my own machine.&lt;/li&gt;
&lt;li&gt;And while I was at it, I wanted to compare my custom-trained model against other ways of building and deploying models on AWS, including low-code/no-code tools and out-of-the-box computer vision APIs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are the three approaches I tested:&lt;/p&gt;

&lt;p&gt;✅ Classic deep learning: Running my original Jupyter notebook inside a SageMaker Studio JupyterLab instance, retraining the model with TensorFlow, then hosting it for a front-end demo using TensorFlow.js + S3.&lt;/p&gt;

&lt;p&gt;⚙️ Low-code/no-code: Using AWS SageMaker Canvas, which lets you upload images and build models through a point-and-click UI, without writing code.&lt;/p&gt;

&lt;p&gt;🧠 Fully managed pre-trained service: Using AWS Rekognition’s facial analysis API to see if it can detect masks directly — no training required.&lt;/p&gt;

&lt;p&gt;For each method, I wanted to evaluate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ease of training/setup&lt;/li&gt;
&lt;li&gt;Options for deployment (can it run in the frontend? backend only? real-time or batch?)&lt;/li&gt;
&lt;li&gt;AWS pricing cost&lt;/li&gt;
&lt;li&gt;Computational cost &amp;amp; latency (how fast can it return predictions?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the rest of this article, I’ll walk through each method, compare their results, and share what I learned along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  📦 Method 1: Classic Deep Learning (TensorFlow + Jupyter)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  📜 Revisiting the Old Project
&lt;/h3&gt;

&lt;p&gt;The starting point for this method is my older project:&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://github.com/alexretana/facemaskclassifier" rel="noopener noreferrer"&gt;alexretana/facemaskclassifier on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This was a small computer vision project I created a few years ago to explore transfer learning and pretrained models. The goal was to build a pipeline that could detect faces in an image and classify whether each face was wearing a mask. To do this, I combined a YOLOv3 model (pretrained to detect faces) with a custom classifier trained to recognize masks.&lt;/p&gt;

&lt;p&gt;The workflow was straightforward: given an input image, the YOLOv3 model would identify and draw bounding boxes around the faces. Each detected face would then be cropped and passed to the mask classifier, which predicted “mask” or “no mask” along with a confidence score. Finally, the pipeline overlaid labels on the image to show the results.&lt;/p&gt;

&lt;p&gt;I learned a lot during this process, especially about loading and fine-tuning pretrained models, feature extraction, and how to stitch multiple models together into a single pipeline.&lt;/p&gt;

&lt;p&gt;Special thanks to &lt;a href="https://pyimagesearch.com/" rel="noopener noreferrer"&gt;PyImageSearch&lt;/a&gt; by Adrian Rosebrock. Many tutorials there helped me build this!&lt;/p&gt;

&lt;p&gt;If you’re curious, the repo contains several notebooks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PlayAroundWithPretrainModels.ipynb&lt;/code&gt; – experimenting with pretrained models&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TransferLearning-FeatureExtraction.ipynb&lt;/code&gt; – logistic regression on extracted features&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TransferLearning-FineTurning.ipynb&lt;/code&gt; – fine-tuning pretrained model layers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;predict.ipynb&lt;/code&gt; – final pipeline: detection → cropping → classification → annotated output&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;Next, I'll describe how I retrained and ran this project &lt;strong&gt;inside SageMaker Studio&lt;/strong&gt; instead of on my local machine.&lt;/p&gt;
&lt;h3&gt;
  
  
  ⚙️ Running in SageMaker Studio
&lt;/h3&gt;

&lt;p&gt;With my old notebooks ready, I wanted to see how easy it would be to train the same model using AWS SageMaker Studio, instead of my local machine.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🛠 If you haven’t set up SageMaker Studio yet, here’s &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html" rel="noopener noreferrer"&gt;AWS’s quick start guide&lt;/a&gt; — it walks through creating the Studio environment in a few clicks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once my SageMaker Studio was provisioned, the workflow was surprisingly smooth. From the Studio home dashboard, it’s straightforward to launch new compute instances to run Jupyter notebooks or other tools. I started by spinning up an &lt;code&gt;ml.t3.medium&lt;/code&gt; instance, the cheapest option at the time of writing, just to get started.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu32aximvaj9gun83w25y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu32aximvaj9gun83w25y.png" alt=" " width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The UI makes it easy to open a terminal or create a new notebook. I opened the terminal to clone my old project repo from GitHub. One thing I quickly realized: my original project didn’t include a &lt;code&gt;requirements.txt&lt;/code&gt; file (lesson learned for the future!). Thankfully, SageMaker’s default environments already come with many common libraries pre-installed, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pandas&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;numpy&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tensorflow&lt;/code&gt; / &lt;code&gt;keras&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;scikit-learn&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The only extra dependencies I had to install were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;imutils&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;opencv-python&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For OpenCV to work properly, it also needed an additional system package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; libgl1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The biggest hiccup I ran into was around dataset preparation: my old notebooks didn’t include clear instructions or scripts to recreate the train/validation/test splits. I had to figure that part out again before training could actually run. The dataset itself has over 10,000 images (but is thankfully only around 20 MB). At first, I tried simply dragging and dropping the dataset into the JupyterLab web interface, but this turned out to be unreliable: not every file transferred, and it took a long time.&lt;/p&gt;
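&lt;p&gt;As a hedged sketch, that missing split step can be recreated with a few lines of standard-library Python. The 80/10/10 ratios, folder names, and &lt;code&gt;.jpg&lt;/code&gt; extension here are my assumptions, not the original notebook’s:&lt;/p&gt;

```python
import random
import shutil
from pathlib import Path

# Hedged reconstruction of the missing split step. The 80/10/10 ratios,
# folder names, and .jpg extension are assumptions, not the original's.

def split_dataset(src_dir, out_dir, ratios=(0.8, 0.1, 0.1), seed=42):
    files = sorted(Path(src_dir).glob("*.jpg"))
    random.Random(seed).shuffle(files)  # seeded, so the split is reproducible
    n_train = int(len(files) * ratios[0])
    n_val = int(len(files) * ratios[1])
    splits = {
        "train": files[:n_train],
        "validation": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],
    }
    for name, members in splits.items():
        dest = Path(out_dir) / name
        dest.mkdir(parents=True, exist_ok=True)
        for f in members:
            shutil.copy2(f, dest / f.name)
    return {name: len(members) for name, members in splits.items()}
```

&lt;p&gt;Seeding the shuffle keeps the split reproducible across reruns, which matters if you later want to compare training curves apples to apples.&lt;/p&gt;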

&lt;p&gt;From reading the docs and best practices, a better solution (and a common pattern for larger file transfers) was to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload the dataset to an S3 bucket&lt;/li&gt;
&lt;li&gt;Download it from S3 to the notebook instance using the terminal&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Uploading to S3 took about 20 minutes, but copying it down to the notebook instance was much faster, probably under a minute. This workflow felt much cleaner and avoided partial transfers.&lt;/p&gt;

&lt;p&gt;Aside from that, the first notebook &lt;code&gt;TransferLearning-FeatureExtraction.ipynb&lt;/code&gt; ran without any code changes. But I did run into another practical issue: the ml.t3.medium instance didn’t have enough RAM, and the process kept running out of memory, which would crash the kernel and restart the instance.&lt;/p&gt;

&lt;p&gt;The fix was simple: I shut down the notebook instance and upgraded it to an ml.m5d.2xlarge (about 32GB of RAM, roughly what my local dev machine had). After restarting, everything picked up right where it left off: no need to re-clone the repo or redownload the images, though the packages did have to be reinstalled.&lt;/p&gt;

&lt;p&gt;After training my model in the new SageMaker environment, I wanted to compare the training curves to those from my earlier runs a few years ago.&lt;/p&gt;

&lt;p&gt;In the charts below, there are two graphs for each year. That’s because transfer learning involves two rounds of training: first training only the network head, and then fine-tuning the entire model after unfreezing more layers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgym1ar55a446nganuom.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgym1ar55a446nganuom.png" alt=" " width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwnzvjuysvs2my4wtqbo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzwnzvjuysvs2my4wtqbo.png" alt=" " width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While the overall accuracy results are similar, I noticed that the training loss and training accuracy curves are much noisier and more sporadic in the recent run.&lt;/p&gt;

&lt;p&gt;From what I’ve read, improvements in data augmentation, optimizer updates, and weight initialization defaults in frameworks like Keras and TensorFlow over the last few years can produce this kind of noisier but potentially more robust training process. If anyone has experience or thoughts on why this might happen, I’d love to hear your perspective in the comments!&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚙️ Method 2: Low-Code / No-Code with SageMaker Canvas
&lt;/h2&gt;

&lt;p&gt;For the second approach, I wanted to try AWS SageMaker Canvas — a no-code tool that lets you build machine learning models through a web UI, without writing a single line of code.&lt;/p&gt;

&lt;p&gt;The first step was to prepare my dataset in a format Canvas could use. To do this, I reorganized the images into labeled folders (e.g., &lt;code&gt;mask/&lt;/code&gt; and &lt;code&gt;no_mask/&lt;/code&gt;). When you import data in Canvas, it can automatically use the folder names as class labels. I then uploaded this new dataset structure into an S3 bucket.&lt;/p&gt;
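&lt;p&gt;The reorganization step is simple enough to sketch. This is a hypothetical version that assumes the label can be read from the filename prefix (e.g. files named &lt;code&gt;no_mask_0001.jpg&lt;/code&gt;); adapt the rule to however your own dataset encodes labels:&lt;/p&gt;

```python
import shutil
from pathlib import Path

# Hypothetical sketch of the folder reorganization for Canvas. It assumes
# the label is encoded in the filename prefix ("no_mask_..." vs "mask_...");
# adjust the rule to match how your own dataset encodes labels.

def organize_by_label(src_dir, dst_dir):
    for img in Path(src_dir).glob("*.jpg"):
        label = "no_mask" if img.name.startswith("no_mask") else "mask"
        target = Path(dst_dir) / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(img, target / img.name)  # copy, leaving the flat dump intact
```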

&lt;p&gt;In Canvas, creating the dataset is straightforward: you create a new dataset and point it at your S3 bucket location. Once imported, you can see the list of images and labels Canvas detected.&lt;/p&gt;




&lt;h3&gt;
  
  
  🏗 Training the model
&lt;/h3&gt;

&lt;p&gt;I kicked off a &lt;strong&gt;standard training job&lt;/strong&gt; (since the quick mode couldn't handle the size of my dataset). Canvas estimated it might take 3–5 hours, but in reality it completed in under 2 hours — maybe even less than one.&lt;/p&gt;

&lt;p&gt;The best part? It was truly one-click training: Canvas doesn’t ask you to choose architectures or tune hyperparameters. Instead, it quietly evaluates multiple candidate models behind the scenes, though it doesn’t disclose exactly which models it tried or what metrics guided the selection.&lt;/p&gt;




&lt;h3&gt;
  
  
  📊 Model evaluation &amp;amp; explainability
&lt;/h3&gt;

&lt;p&gt;For evaluation, Canvas automatically showed me per-label accuracy so I could see which class performed better, along with actual examples of images it got right or wrong. It also generated heatmaps (using Class Activation Maps) that highlighted where the model focused when making decisions, and included a confusion matrix to visualize where it confused “masked” vs “unmasked.” All of this appeared right after training finished, without needing to write any visualization code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F670nnwgrpbocecfkqhh5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F670nnwgrpbocecfkqhh5.png" alt=" " width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftohw89vwg10ezk2vu4l6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftohw89vwg10ezk2vu4l6.png" alt=" " width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cij2ob996yrj8htxpg4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5cij2ob996yrj8htxpg4.png" alt=" " width="800" height="314"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  ⚡ Making predictions
&lt;/h3&gt;

&lt;p&gt;When it came time to test the model, Canvas offered two options: upload a single image to get an instant prediction, or run a batch prediction over multiple images at once. I tried both, but unfortunately the outputs either came back empty or had “FAILED” values in the CSV results, so I decided to skip ahead and deploy the model as an inference endpoint instead.&lt;/p&gt;

&lt;p&gt;With just a few clicks, Canvas can deploy your trained model to an endpoint you can call via API, and I did that so I could finish my evaluation outside of the Canvas UI.&lt;/p&gt;

&lt;p&gt;Starting from the evaluation code in my fine-tuning notebook, I adapted a similar function to measure the accuracy of this model’s predictions.&lt;/p&gt;
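&lt;p&gt;The shape of that evaluation loop looks roughly like this. Here &lt;code&gt;predict_fn&lt;/code&gt; is a hypothetical stand-in for whatever invokes the deployed Canvas endpoint (e.g. a boto3 &lt;code&gt;sagemaker-runtime&lt;/code&gt; call); this is a sketch, not the code I actually ran:&lt;/p&gt;

```python
from collections import Counter

# Sketch of the adapted evaluation loop. `predict_fn` is a hypothetical
# stand-in for the call to the deployed Canvas endpoint; a real version
# would invoke it via boto3's sagemaker-runtime client.

def evaluate(samples, predict_fn):
    """samples: iterable of (image_path, true_label) pairs.
    Returns (overall accuracy, per-class accuracy dict)."""
    total = correct = 0
    per_class_total = Counter()
    per_class_correct = Counter()
    for path, truth in samples:
        total += 1
        per_class_total[truth] += 1
        if predict_fn(path) == truth:
            correct += 1
            per_class_correct[truth] += 1
    per_class = {
        label: per_class_correct[label] / per_class_total[label]
        for label in per_class_total
    }
    return correct / total, per_class
```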

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmm6nduhrioe56n64xbe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmm6nduhrioe56n64xbe.png" alt=" " width="585" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results were surprisingly good: Canvas’s model ended up with slightly better accuracy than my manually trained TensorFlow model, though batch processing took a bit longer overall. It’s worth noting that both models ran inference on the same instance type, &lt;code&gt;ml.m5d.2xlarge&lt;/code&gt;, so the comparison is fair in terms of hardware. The classification report above shows the final accuracy and per-class metrics.&lt;/p&gt;

&lt;p&gt;In the end, SageMaker Canvas impressed me: it handled training, visualization, and deployment with almost no code. While I did run into some quirks with the batch prediction UI, the overall experience was very beginner-friendly — and the final model quality was competitive with a hand-crafted TensorFlow pipeline (granted my model is 5 years old).&lt;/p&gt;

&lt;h2&gt;
  
  
  🧠 Method 3: Fully Managed Pre-Trained Service (Rekognition Custom Labels)
&lt;/h2&gt;

&lt;p&gt;For the last approach, I wanted to explore Amazon Rekognition’s &lt;strong&gt;Custom Labels&lt;/strong&gt; feature, which lets you train your own image classifier on a custom dataset — still without writing code, but built directly into Rekognition’s console rather than SageMaker. The interface makes each step of developing your model straightforward and streamlined.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfko9h8pwa2l3prwdgif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfko9h8pwa2l3prwdgif.png" alt=" " width="800" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The setup was familiar: I uploaded my dataset to an S3 bucket, using labeled folders (&lt;code&gt;masked/&lt;/code&gt; and &lt;code&gt;unmasked/&lt;/code&gt;) so Rekognition could automatically detect the classes. After confirming the dataset, training was supposed to be as simple as clicking a button and waiting for it to finish.&lt;/p&gt;

&lt;p&gt;However, the training failed on my first attempt. After digging into the documentation, I realized Rekognition requires all images in the training and test datasets to meet a minimum resolution. My original dataset included images smaller than that threshold. To fix this, I wrote a quick script to resize all images to an acceptable resolution, re-uploaded the updated dataset to S3, and restarted the training job.&lt;/p&gt;
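&lt;p&gt;My resize script looked roughly like the following. The 64-pixel floor is an assumption on my part (check Rekognition’s current documented minimums), and Pillow is just one convenient way to do the resizing:&lt;/p&gt;

```python
from pathlib import Path

from PIL import Image  # Pillow is an assumption; any image library works

MIN_SIDE = 64  # assumed floor; verify against Rekognition's current limits

def upscale_small_images(src_dir, dst_dir):
    """Copy every image, upscaling any whose shorter side is under MIN_SIDE."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for img_path in Path(src_dir).glob("*.jpg"):
        with Image.open(img_path) as im:
            w, h = im.size
            scale = max(1.0, MIN_SIDE / min(w, h))  # 1.0 means already big enough
            if scale > 1.0:
                im = im.resize((round(w * scale), round(h * scale)))
            im.save(dst / img_path.name)
```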

&lt;p&gt;In hindsight, this might explain why the prediction feature in Canvas also struggled with the same dataset, although it’s interesting that the inference endpoint created by Canvas worked fine with those smaller images.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zmddx6js0rlxiv931hs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4zmddx6js0rlxiv931hs.png" alt=" " width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the training completed (which took about an hour), the results ended up being pretty comparable to what I got with SageMaker Canvas, and noticeably better than my old YOLOv3-based code.  &lt;/p&gt;

&lt;p&gt;One important limitation, though: unlike Canvas, Rekognition Custom Labels doesn’t let you register and download the raw model artifact. Instead, you’re fully dependent on calling Rekognition’s API for inference. That makes the solution less portable if you ever want to run the model outside AWS. On the plus side, this also means it’s incredibly quick to get started: after training finishes, you can deploy and start making predictions right away. Overall, this makes Rekognition Custom Labels a strong option for &lt;strong&gt;proof-of-concept projects&lt;/strong&gt; or when you need to get something running with minimal setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  💰 Cost Analysis
&lt;/h2&gt;

&lt;p&gt;While testing each method, I kept track of the costs I saw in my AWS billing dashboard. Running everything manually through the Jupyter notebook (inside SageMaker Studio) ended up costing me less than &lt;strong&gt;$12&lt;/strong&gt; total — even after upgrading to a more expensive instance for training.&lt;/p&gt;

&lt;p&gt;In contrast, SageMaker Canvas cost quite a bit more: about &lt;strong&gt;$49&lt;/strong&gt;. To be fair, a lot of that cost probably came from my repeated attempts to run batch predictions, which ultimately didn’t work but still counted as billed time. If things had run smoothly, I’d estimate the cost at &lt;strong&gt;$10-$20&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Rekognition Custom Labels was by far the cheapest in my experiment: I was only charged &lt;strong&gt;$7.90&lt;/strong&gt;. It’s worth noting, though, that this only covers training costs — &lt;strong&gt;not&lt;/strong&gt; the cost of hosting the model or running real-time inference in production. I’m also curious how well Rekognition pricing scales over time as usage increases.&lt;/p&gt;




&lt;h2&gt;
  
  
  ✅ Final Review &amp;amp; Comparison
&lt;/h2&gt;

&lt;p&gt;Here’s how the three approaches stack up:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Control &amp;amp; Flexibility&lt;/th&gt;
&lt;th&gt;Ease of Use&lt;/th&gt;
&lt;th&gt;Cost in Test&lt;/th&gt;
&lt;th&gt;Portability&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Classic Jupyter + TensorFlow&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐&lt;/td&gt;
&lt;td&gt;~$12&lt;/td&gt;
&lt;td&gt;Can export / host anywhere&lt;/td&gt;
&lt;td&gt;Most setup &amp;amp; coding required; fully customizable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker Canvas&lt;/td&gt;
&lt;td&gt;⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;~$49 (probably ~$10-$20 without the failed retries)&lt;/td&gt;
&lt;td&gt;Can export model artifact&lt;/td&gt;
&lt;td&gt;Great built-in visualizations; had issues with batch predictions; higher cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rekognition Custom Labels&lt;/td&gt;
&lt;td&gt;⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;~$8&lt;/td&gt;
&lt;td&gt;Must use Rekognition API&lt;/td&gt;
&lt;td&gt;Fastest setup; lowest upfront cost; can't download model; great for proof of concept&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;In the end, each option had its place:&lt;br&gt;&lt;br&gt;
If you want &lt;strong&gt;full control and portability&lt;/strong&gt;, running your own TensorFlow notebooks (even inside SageMaker Studio) still feels best.&lt;br&gt;&lt;br&gt;
If you prefer &lt;strong&gt;no-code training and easy visualization tools&lt;/strong&gt;, Canvas makes it remarkably simple to build, analyze, and deploy models — though at a higher cost and occasional quirks.&lt;br&gt;&lt;br&gt;
And if you just need to &lt;strong&gt;get something working fast&lt;/strong&gt;, Rekognition Custom Labels is incredibly quick to set up and cheap to run — as long as you’re okay relying on AWS’s API for hosting.&lt;/p&gt;

&lt;p&gt;Overall, revisiting this project showed me that today’s cloud tools can save a huge amount of time — but there are still trade-offs in cost, control, and portability. In the next article, I’ll look at deploying these models and providing a usable live demo so you can see them in action.&lt;br&gt;
I’d love to hear if you’ve tried similar experiments, or what your experience has been — drop a comment below!&lt;/p&gt;




</description>
      <category>computervision</category>
      <category>aws</category>
      <category>tensorflow</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
