<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: angu10</title>
    <description>The latest articles on Forem by angu10 (@angu10).</description>
    <link>https://forem.com/angu10</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1044663%2F6f0294a9-2404-41a6-b770-c2062d20a809.png</url>
      <title>Forem: angu10</title>
      <link>https://forem.com/angu10</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/angu10"/>
    <language>en</language>
    <item>
      <title>I fine-tuned Gemma 4 E2B on a Mac to extract 22 fields from Indian GST invoices — privately, at $0 per call.</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Fri, 08 May 2026 04:46:55 +0000</pubDate>
      <link>https://forem.com/angu10/i-fine-tuned-gemma-4-e2b-on-a-mac-to-extract-22-fields-from-indian-gst-invoices-privately-at-0-4pie</link>
      <guid>https://forem.com/angu10/i-fine-tuned-gemma-4-e2b-on-a-mac-to-extract-22-fields-from-indian-gst-invoices-privately-at-0-4pie</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/angu10/the-model-isnt-the-hard-part-the-data-pipeline-i-built-to-teach-gemma-4-e2b-to-read-indian-gst-1bh9"&gt;The model isn’t the hard part: the data pipeline I built to teach Gemma 4 E2B to read Indian GST invoices.&lt;/a&gt;&lt;/p&gt;</description>
      <category>ai</category>
      <category>dataengineering</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The model isn’t the hard part: the data pipeline I built to teach Gemma 4 E2B to read Indian GST invoices.</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Fri, 08 May 2026 04:35:18 +0000</pubDate>
      <link>https://forem.com/angu10/the-model-isnt-the-hard-part-the-data-pipeline-i-built-to-teach-gemma-4-e2b-to-read-indian-gst-1bh9</link>
      <guid>https://forem.com/angu10/the-model-isnt-the-hard-part-the-data-pipeline-i-built-to-teach-gemma-4-e2b-to-read-indian-gst-1bh9</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How I fine-tuned a small Gemma 4 model on a Mac to extract 22 invoice fields privately, and why the data strategy mattered more than the prompt.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I needed to read Indian GST invoices without sending them to an external API every time.&lt;/p&gt;

&lt;p&gt;Gemma 4 E2B is an open multimodal model designed for local and edge deployment, with a 128K context window, native system prompt support, and an instruction-tuned variant that is usable without a giant serving stack. Google positions the small Gemma 4 models as practical for on-device and local workflows, not just as miniatures of the larger models. That made it a good fit for a problem I care about: structured invoice extraction where privacy, cost, and control matter as much as raw quality.&lt;/p&gt;

&lt;p&gt;At my document volume, a hosted model would have been simple to prototype but expensive to normalize around. Roughly speaking, a model like GPT-4o lands around a cent per invoice at this prompt and output length. A local Gemma 4 setup costs time up front, but effectively &lt;code&gt;$0&lt;/code&gt; per call after that.&lt;/p&gt;

&lt;p&gt;My goal was simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take OCR text from Indian GST invoices&lt;/li&gt;
&lt;li&gt;extract 22 fields into strict JSON&lt;/li&gt;
&lt;li&gt;fine-tune locally instead of paying per document to a hosted model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was not a benchmark project. It was a practical test of what a small local Gemma 4 model can actually learn.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Gemma 4 E2B was the right model to try
&lt;/h2&gt;

&lt;p&gt;I did not need a general-purpose assistant. I needed a model that could learn a narrow, structured task and run locally.&lt;/p&gt;

&lt;p&gt;That made Gemma 4 E2B interesting for three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It is small enough to experiment with on local hardware.&lt;/li&gt;
&lt;li&gt;It is capable enough to handle long, messy invoice OCR.&lt;/li&gt;
&lt;li&gt;It is open enough to fine-tune and evaluate honestly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The instruction-tuned &lt;code&gt;google/gemma-4-E2B-it&lt;/code&gt; model gave me a real starting point, not just a base model that needed a large GPU cluster to become useful.&lt;/p&gt;

&lt;p&gt;I ran LoRA fine-tuning with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model: &lt;code&gt;google/gemma-4-E2B-it&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;framework: &lt;code&gt;MLX-LM&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;trainable params: &lt;code&gt;7.291M / 4647.450M&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;trainable fraction: &lt;code&gt;0.157%&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;hardware: Mac&lt;/li&gt;
&lt;li&gt;peak memory during stable runs: roughly &lt;code&gt;12.4 GB&lt;/code&gt; to &lt;code&gt;13.0 GB&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
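&lt;p&gt;The trainable fraction reported above follows directly from those two parameter counts, which is an easy sanity check to run:&lt;/p&gt;

```python
# Sanity-check the LoRA trainable fraction from the run above.
trainable_params = 7.291e6     # LoRA adapter parameters
total_params = 4647.450e6     # full Gemma 4 E2B parameter count

fraction = trainable_params / total_params
print(f"{fraction:.3%}")  # 0.157%
```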

&lt;p&gt;That was the first encouraging sign. This was not theory. A small Gemma 4 model could be trained locally on a real business-shaped extraction task.&lt;/p&gt;

&lt;h2&gt;
  
  
  The task
&lt;/h2&gt;

&lt;p&gt;The extraction target was a strict 22-field JSON schema for Indian GST invoices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;supplier identity&lt;/li&gt;
&lt;li&gt;buyer identity&lt;/li&gt;
&lt;li&gt;invoice number and dates&lt;/li&gt;
&lt;li&gt;place of supply&lt;/li&gt;
&lt;li&gt;HSN or SAC&lt;/li&gt;
&lt;li&gt;description&lt;/li&gt;
&lt;li&gt;taxable value&lt;/li&gt;
&lt;li&gt;tax rates and amounts&lt;/li&gt;
&lt;li&gt;total invoice&lt;/li&gt;
&lt;li&gt;reverse charge&lt;/li&gt;
&lt;li&gt;e-invoice IRN&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The downstream requirement was not "answer roughly correctly." It was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;valid JSON&lt;/li&gt;
&lt;li&gt;stable field typing&lt;/li&gt;
&lt;li&gt;exact field mapping&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a much harder and more useful task than general summarization.&lt;/p&gt;
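&lt;p&gt;That contract can be enforced mechanically. A minimal validator sketch, using a subset of the field names that appear in this post (the real schema has 22 fields):&lt;/p&gt;

```python
import json

# A subset of the 22-field contract: field name -> expected JSON type.
# Field names are taken from the examples in this post; the full schema is larger.
SCHEMA = {
    "supplier_name": str,
    "supplier_gstin": str,
    "invoice_no": str,
    "taxable_value": (int, float),
    "igst_rate": (int, float),
    "reverse_charge": str,   # "Yes" / "No", never a number
}

def validate(raw: str) -> list:
    """Return a list of contract violations for one model output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    errors = []
    for field, expected in SCHEMA.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            errors.append(f"wrong type for {field}: {type(data[field]).__name__}")
    return errors

print(validate('{"supplier_name": "X", "reverse_charge": 0}'))
```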

&lt;h2&gt;
  
  
  What fine-tuning actually changed
&lt;/h2&gt;

&lt;p&gt;The fastest way to see the difference between a "generic capable model" and a "task-adapted model" is to look at one invoice.&lt;/p&gt;

&lt;p&gt;Before fine-tuning, the baseline model was capable of understanding the document but not disciplined enough to behave like an extractor. In some runs, it produced malformed JSON, mixed reasoning-style text into the answer, or mapped totals into the wrong fields.&lt;/p&gt;

&lt;p&gt;After fine-tuning, the same invoice started producing compact, structured outputs with the fields it had learned reliably:&lt;/p&gt;

&lt;p&gt;"&lt;code&gt;json&lt;br&gt;
{&lt;br&gt;
  "supplier_name": "Sample Supplier Pvt Ltd",&lt;br&gt;
  “supplier_gstin”: “27XXXXXXXXXX1ZX”,&lt;br&gt;
  “invoice_no”: “INV-001”,&lt;br&gt;
  “invoice_date”: “16-02-2026”,&lt;br&gt;
  “cgst_rate”: 0.09,&lt;br&gt;
  "cgst_amt": 285.3,&lt;br&gt;
  "sgst_amt": 285.3,&lt;br&gt;
  "total_invoice": 3741,&lt;br&gt;
  “igst_rate”: 0.0,&lt;br&gt;
  "igst_amt": 0,&lt;br&gt;
  "reverse_charge": "No"&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;``&lt;/p&gt;

&lt;p&gt;That did not mean the model was finished. Fields like &lt;code&gt;taxable_value&lt;/code&gt; still need more real training examples to get right, which is where the project is heading next. But it had crossed the line from "general model guessing at documents" to "specialized extractor that can be improved with data."&lt;/p&gt;

&lt;h2&gt;
  
  
  The first version: synthetic data was enough to build the pipeline
&lt;/h2&gt;

&lt;p&gt;I started with synthetic data because I did not have a large labeled corpus of invoices.&lt;/p&gt;

&lt;p&gt;That synthetic pipeline gave me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OCR-like invoice text&lt;/li&gt;
&lt;li&gt;paired 22-field targets&lt;/li&gt;
&lt;li&gt;tax arithmetic coverage&lt;/li&gt;
&lt;li&gt;repeatable training exports&lt;/li&gt;
&lt;li&gt;a way to debug LoRA and evaluation locally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first clean synthetic-only run looked excellent.&lt;/p&gt;

&lt;p&gt;Validation loss on the synthetic holdout improved from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;0.552&lt;/code&gt; at iteration 1&lt;/li&gt;
&lt;li&gt;to &lt;code&gt;0.024&lt;/code&gt; at iteration 300&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On paper, that looked close to done.&lt;/p&gt;

&lt;p&gt;But synthetic validation was measuring whether the model understood the synthetic world I had created, not whether it understood real invoices from real suppliers.&lt;/p&gt;

&lt;p&gt;That distinction ended up shaping the whole project.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real work was data engineering
&lt;/h2&gt;

&lt;p&gt;The model was not the hard part.&lt;/p&gt;

&lt;p&gt;The hard part was teaching the model what real invoice variance looks like.&lt;/p&gt;

&lt;p&gt;I eventually built the dataset in layers:&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: generic synthetic invoices
&lt;/h3&gt;

&lt;p&gt;These were useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;schema coverage&lt;/li&gt;
&lt;li&gt;GST arithmetic patterns&lt;/li&gt;
&lt;li&gt;JSON output discipline&lt;/li&gt;
&lt;li&gt;basic extraction behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 2: real annotated invoices
&lt;/h3&gt;

&lt;p&gt;I merged and cleaned real invoice annotations into a single CSV:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;28&lt;/code&gt; real invoices&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;22&lt;/code&gt; unique suppliers&lt;/li&gt;
&lt;li&gt;a mix of PDF and image invoices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before retraining, I split them into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;20&lt;/code&gt; real train invoices&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;8&lt;/code&gt; real holdout invoices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was the first time I had a real evaluation set that could tell me something meaningful.&lt;/p&gt;
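&lt;p&gt;One detail worth getting right early: the split should be deterministic so the holdout never drifts between runs. A minimal sketch (the seed and invoice IDs are illustrative, not from the project):&lt;/p&gt;

```python
import random

def split_holdout(invoice_ids, n_holdout=8, seed=42):
    """Deterministic train/holdout split so the holdout set never drifts."""
    ids = sorted(invoice_ids)       # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[n_holdout:], ids[:n_holdout]

train, holdout = split_holdout([f"inv_{i:03d}" for i in range(28)])
print(len(train), len(holdout))  # 20 8
```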

&lt;h3&gt;
  
  
  Layer 3: Archive-derived layout variants
&lt;/h3&gt;

&lt;p&gt;This was the most important change to the dataset.&lt;/p&gt;

&lt;p&gt;Instead of generating more generic synthetic invoices, I reused the structure of real invoice layouts from an &lt;code&gt;Archive/&lt;/code&gt; folder:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;alternate labels like &lt;code&gt;No.&lt;/code&gt;, &lt;code&gt;Bill No.&lt;/code&gt;, &lt;code&gt;GST No&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;weakly labeled subtotal rows&lt;/li&gt;
&lt;li&gt;dense table layouts&lt;/li&gt;
&lt;li&gt;multiline descriptions&lt;/li&gt;
&lt;li&gt;inconsistent spacing&lt;/li&gt;
&lt;li&gt;subtotal-only item blocks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From those real layouts, I generated synthetic OCR variants that preserved layout difficulty while changing values and identities.&lt;/p&gt;
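&lt;p&gt;A minimal sketch of that idea: keep a real layout's labels and spacing verbatim, and only swap the value slots. The template and value pools below are hypothetical stand-ins for the Archive layouts:&lt;/p&gt;

```python
import random

# A real layout kept verbatim except for the value slots (hypothetical template).
LAYOUT = (
    "Bill No. {invoice_no}    Dt: {invoice_date}\n"
    "GST No {gstin}\n"
    "  {description}\n"
    "          Sub Total   {subtotal:.2f}"
)

def make_variant(rng):
    """Generate one OCR-like variant: same layout difficulty, fresh values."""
    fields = {
        "invoice_no": f"{rng.randint(1, 999)}",
        "invoice_date": f"{rng.randint(1, 28):02d}-02-2026",
        "gstin": "27" + "".join(rng.choice("ABCDEFGHJK0123456789") for _ in range(13)),
        "description": rng.choice(["3D Printed Prototype", "Printing Charges"]),
        "subtotal": round(rng.uniform(500, 50000), 2),
    }
    return LAYOUT.format(**fields), fields  # OCR-like text plus its target labels

text, target = make_variant(random.Random(0))
print(text)
```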

&lt;p&gt;The final hybrid training mix was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;250&lt;/code&gt; generic synthetic examples&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;360&lt;/code&gt; Archive-layout variants&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;8&lt;/code&gt; exact real OCR train examples matched back to source documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Validation used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;8&lt;/code&gt; held-out real invoices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That was the first dataset composition that looked like a real fine-tuning strategy rather than a synthetic demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Gemma 4 E2B actually learned
&lt;/h2&gt;

&lt;p&gt;This is the part I think matters most for anyone considering Gemma 4 for domain adaptation.&lt;/p&gt;

&lt;p&gt;Gemma 4 E2B clearly learned the task.&lt;/p&gt;

&lt;p&gt;Not in the vague sense of "it sounded good," but in the operational sense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JSON stayed structurally stable&lt;/li&gt;
&lt;li&gt;The model learned invoice field boundaries&lt;/li&gt;
&lt;li&gt;It handled many supplier layouts&lt;/li&gt;
&lt;li&gt;It converged reliably on a Mac with a tiny trainable parameter budget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most meaningful run was the first hybrid real-holdout run.&lt;/p&gt;

&lt;p&gt;Its validation loss on the real holdout set improved from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;0.786&lt;/code&gt; at iteration 1&lt;/li&gt;
&lt;li&gt;to &lt;code&gt;0.132&lt;/code&gt; at iteration 250&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then I fixed issues in the Archive-variant generator:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;removed annotation-format leakage like &lt;code&gt;@ ₹xxx = ₹xxx&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;parsed real per-line amounts instead of splitting totals evenly&lt;/li&gt;
&lt;li&gt;forced &lt;code&gt;igst = 0&lt;/code&gt; for intra-state invoices in the target JSON&lt;/li&gt;
&lt;/ul&gt;
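&lt;p&gt;The IGST fix is a one-line rule once you know that the first two digits of a GSTIN are the state code. A sketch of the target-rewriting step (&lt;code&gt;place_of_supply_code&lt;/code&gt; is a hypothetical field name for illustration):&lt;/p&gt;

```python
def fix_igst(target: dict) -> dict:
    """Zero out IGST in a training target when the invoice is intra-state.

    The first two digits of a GSTIN are the state code; when it matches the
    place-of-supply state code, tax is CGST plus SGST and IGST must be 0.
    """
    supplier_state = target["supplier_gstin"][:2]
    if supplier_state == target["place_of_supply_code"]:
        target = dict(target, igst_rate=0.0, igst_amt=0.0)
    return target

fixed = fix_igst({
    "supplier_gstin": "27XXXXXXXXXX1ZX",
    "place_of_supply_code": "27",
    "igst_rate": 0.09,
    "igst_amt": 285.3,
})
print(fixed["igst_rate"], fixed["igst_amt"])  # 0.0 0.0
```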

&lt;p&gt;That produced a slightly better run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;final real-holdout validation loss: &lt;code&gt;0.130&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also tried a prompt change specifically aimed at multiline extraction. That got worse, not better:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;final validation loss: &lt;code&gt;0.147&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I would not overstate that result. On an 8-invoice real holdout, that difference is not strong enough to claim that prompt engineering is harmful in general.&lt;/p&gt;

&lt;p&gt;What it did show me is something narrower and more useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The prompt tweak did not clearly beat the best data-only run&lt;/li&gt;
&lt;li&gt;The biggest gains in this project came from dataset composition, not instruction wording&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What this taught me about Gemma 4
&lt;/h2&gt;

&lt;p&gt;The headline lesson was not that Gemma 4 needed heroic prompt engineering.&lt;/p&gt;

&lt;p&gt;It was that Gemma 4 E2B was already capable enough that the next bottleneck was dataset quality.&lt;/p&gt;

&lt;p&gt;That is a good sign for the model.&lt;/p&gt;

&lt;p&gt;Small models become interesting when they are strong enough that your time moves from "can this model learn the task at all?" to "what data do I need to make it trustworthy?"&lt;/p&gt;

&lt;p&gt;That is where this project ended up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What went wrong, and why it was useful
&lt;/h2&gt;

&lt;p&gt;I do not think a useful Gemma 4 write-up should pretend everything worked on the first run.&lt;/p&gt;

&lt;p&gt;Two failures were especially instructive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure 1: overfitting after a great checkpoint
&lt;/h3&gt;

&lt;p&gt;In an earlier run, validation loss got very low and then degraded badly later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;0.552&lt;/code&gt; at iteration 1&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;0.022&lt;/code&gt; at iteration 200&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;1.397&lt;/code&gt; at iteration 400&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That taught me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The best checkpoint is not necessarily the last checkpoint&lt;/li&gt;
&lt;li&gt;Dense evaluation checkpoints matter&lt;/li&gt;
&lt;li&gt;Small local runs can still overtrain quickly&lt;/li&gt;
&lt;/ul&gt;
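&lt;p&gt;The fix is mechanical: evaluate often, record every checkpoint's validation loss, and keep the argmin rather than the last. Using the numbers from the run above:&lt;/p&gt;

```python
# Keep the best checkpoint by validation loss, not the last one.
# (Iteration, val_loss) pairs mirror the overfitting run described above.
history = [(1, 0.552), (200, 0.022), (400, 1.397)]

best_iter, best_loss = min(history, key=lambda t: t[1])
print(best_iter, best_loss)  # 200 0.022
```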

&lt;h3&gt;
  
  
  Failure 2: NaN training from sequence problems
&lt;/h3&gt;

&lt;p&gt;Another run looked healthy until long examples were truncated. After that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;train loss became &lt;code&gt;nan&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;validation loss became &lt;code&gt;nan&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The rest of the run was unusable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That forced me to treat dataset export and sequence control as first-class parts of the training pipeline, not cleanup tasks.&lt;/p&gt;
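&lt;p&gt;The guard is simple once sequence control is part of the export step: count tokens before export and drop or flag anything that would be truncated. A sketch, with whitespace splitting as a crude stand-in for the real tokenizer:&lt;/p&gt;

```python
def filter_long_examples(examples, tokenizer_len, max_seq_length=1536):
    """Drop examples that would be silently truncated at training time.

    tokenizer_len is a callable returning a token count; a real run would
    use the model's tokenizer rather than whitespace splitting.
    """
    kept, dropped = [], []
    for ex in examples:
        n = tokenizer_len(ex["prompt"] + ex["completion"])
        if n > max_seq_length:
            dropped.append(ex)
        else:
            kept.append(ex)
    return kept, dropped

kept, dropped = filter_long_examples(
    [{"prompt": "a " * 10, "completion": "b"},
     {"prompt": "a " * 2000, "completion": "b"}],
    tokenizer_len=lambda s: len(s.split()),
)
print(len(kept), len(dropped))  # 1 1
```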

&lt;p&gt;Both failures improved the eventual Gemma 4 setup more than another round of prompt edits would have.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical result
&lt;/h2&gt;

&lt;p&gt;I would summarize the outcome this way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemma 4 E2B was strong enough to learn a real structured extraction task locally&lt;/li&gt;
&lt;li&gt;LoRA fine-tuning worked within a very small trainable parameter budget&lt;/li&gt;
&lt;li&gt;The model benefited more from better data composition than from prompt tweaking&lt;/li&gt;
&lt;li&gt;Synthetic data was useful for bootstrapping, but real layout variance determined what I could trust&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a very good place for a small, open model.&lt;/p&gt;

&lt;h2&gt;
  
  
  If I were starting again
&lt;/h2&gt;

&lt;p&gt;I would do three things earlier:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a real holdout set before the first "good" run.&lt;/li&gt;
&lt;li&gt;Build layout-derived synthetic variants before scaling generic synthetic data.&lt;/li&gt;
&lt;li&gt;Evaluate field-level errors on real invoices sooner instead of trusting synthetic validation curves.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Those are not just invoice lessons. They are reusable lessons for anyone fine-tuning Gemma 4 on domain-specific extraction tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this made me more optimistic about Gemma 4, not less
&lt;/h2&gt;

&lt;p&gt;The most interesting thing about this project was not that Gemma 4 E2B solved everything immediately.&lt;/p&gt;

&lt;p&gt;It was that a local, open, small model got far enough that the real work shifted to data design, evaluation discipline, and layout coverage.&lt;/p&gt;

&lt;p&gt;That is exactly the kind of capability I want from an open model.&lt;/p&gt;

&lt;p&gt;Not a toy.&lt;br&gt;
Not a benchmark artifact.&lt;br&gt;
A model that is small enough to run locally, but capable enough to deserve serious data engineering.&lt;/p&gt;

&lt;p&gt;For this task, Gemma 4 E2B crossed that line.&lt;/p&gt;

</description>
      <category>gemmachallenge</category>
      <category>devchallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>The model looked great on validation until one real invoice broke four assumptions</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Thu, 09 Apr 2026 02:28:43 +0000</pubDate>
      <link>https://forem.com/angu10/the-model-looked-great-on-validation-until-one-real-invoice-broke-four-assumptions-5g2l</link>
      <guid>https://forem.com/angu10/the-model-looked-great-on-validation-until-one-real-invoice-broke-four-assumptions-5g2l</guid>
      <description>&lt;p&gt;&lt;em&gt;An empirical note on what synthetic invoice data taught a Gemma fine-tune, what it hid, and how one real document exposed the gap.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I fine-tuned a small Gemma model to parse Indian invoices because I wanted a path that was cheaper, more private, and easier to deploy than calling a hosted API for every document.&lt;/p&gt;

&lt;p&gt;The training metrics looked excellent.&lt;/p&gt;

&lt;p&gt;Then I ran the model on one real invoice.&lt;/p&gt;

&lt;p&gt;It got the total right, the supplier right, the address right, and still failed in four ways that would make the output unusable in a real finance workflow.&lt;/p&gt;

&lt;p&gt;That invoice was more useful than another few hundred synthetic examples.&lt;/p&gt;

&lt;p&gt;None of the headline conclusions here are new to anyone with ML experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;synthetic data has domain gap&lt;/li&gt;
&lt;li&gt;synthetic validation can be overly optimistic&lt;/li&gt;
&lt;li&gt;real data changes what you trust&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What felt worth documenting was the concrete shape of the failure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which fields broke first&lt;/li&gt;
&lt;li&gt;which assumptions in the synthetic distribution caused it&lt;/li&gt;
&lt;li&gt;what the training curves looked like before and after instability&lt;/li&gt;
&lt;li&gt;and which lessons were actually about data, not models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;I did not have a large labeled invoice corpus, so I started with synthetic data.&lt;/p&gt;

&lt;p&gt;The extraction target was a strict 22-field JSON schema, and the synthetic dataset was large enough to build a real training pipeline. It was not large enough to tell me whether the model understood real invoices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why validation looked so good
&lt;/h2&gt;

&lt;p&gt;The final stable training run used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model: &lt;code&gt;google/gemma-4-E2B-it&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;framework: &lt;code&gt;MLX-LM 0.31.2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;trainable params: &lt;code&gt;7.291M / 4647.450M&lt;/code&gt; (&lt;code&gt;0.157%&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;iterations: &lt;code&gt;300&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;learning rate: &lt;code&gt;5e-5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;num_layers: 8&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;batch_size: 1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;grad_accumulation_steps: 8&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_seq_length: 1536&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It trained cleanly on a Mac with peak memory of about &lt;code&gt;13.677 GB&lt;/code&gt;.&lt;/p&gt;
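&lt;p&gt;One reason the memory stays modest: &lt;code&gt;batch_size: 1&lt;/code&gt; keeps only one micro-batch resident, while &lt;code&gt;grad_accumulation_steps: 8&lt;/code&gt; accumulates gradients across micro-batches before each optimizer step:&lt;/p&gt;

```python
# Hyperparameters from the run above.
batch_size = 1
grad_accumulation_steps = 8

# Gradients are summed across micro-batches before each optimizer step, so the
# effective batch is the product, at the memory cost of a single micro-batch.
effective_batch = batch_size * grad_accumulation_steps
print(effective_batch)  # 8
```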

&lt;p&gt;Validation loss improved almost monotonically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Iter 1&lt;/code&gt;: &lt;code&gt;0.552&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Iter 50&lt;/code&gt;: &lt;code&gt;0.084&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Iter 100&lt;/code&gt;: &lt;code&gt;0.056&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Iter 150&lt;/code&gt;: &lt;code&gt;0.046&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Iter 200&lt;/code&gt;: &lt;code&gt;0.044&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Iter 250&lt;/code&gt;: &lt;code&gt;0.029&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Iter 300&lt;/code&gt;: &lt;code&gt;0.024&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If all I had looked at was the validation curve, I would have said the model was basically ready.&lt;/p&gt;

&lt;p&gt;That would have been wrong.&lt;/p&gt;

&lt;p&gt;That sentence is obvious in the abstract. It only becomes useful when you can point to the exact fields and failure modes that made it wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  One real invoice broke four assumptions
&lt;/h2&gt;

&lt;p&gt;The invoice came from &lt;code&gt;Jon Doe Print&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The model output looked plausible enough to pass a quick skim:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;supplier name: &lt;code&gt;Jon Doe Print&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;supplier GSTIN: correct format and state code&lt;/li&gt;
&lt;li&gt;supplier address: mostly correct&lt;/li&gt;
&lt;li&gt;invoice number: a plausible-looking integer value&lt;/li&gt;
&lt;li&gt;invoice date: correctly extracted&lt;/li&gt;
&lt;li&gt;total invoice: captured correctly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the failure table tells the real story:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Model output&lt;/th&gt;
&lt;th&gt;Correct&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;description&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;3D Printed Prototype&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;3D Printed Prototype (Pre filter)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wrong item identity in downstream categorization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;taxable_value&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;line-item amount&lt;/td&gt;
&lt;td&gt;invoice subtotal&lt;/td&gt;
&lt;td&gt;Wrong amount booked to accounts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;igst_rate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.09&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wrong tax treatment and downstream GST logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;reverse_charge&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;No&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Type mismatch that can break strict downstream parsers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The model also captured some things correctly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the total invoice&lt;/li&gt;
&lt;li&gt;the tax amounts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is what made the failure interesting.&lt;/p&gt;

&lt;p&gt;The model was not random. It had learned enough invoice structure to look useful. It just had not learned enough real invoice variance to be trustworthy.&lt;/p&gt;

&lt;p&gt;That distinction is the center of the project.&lt;/p&gt;

&lt;p&gt;The problem was not that the model failed to learn invoice extraction at all.&lt;/p&gt;

&lt;p&gt;The problem was that it learned the synthetic version of invoice extraction more faithfully than the real one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The four assumptions that invoice broke
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. I assumed subtotal rows would be easy to identify
&lt;/h3&gt;

&lt;p&gt;The invoice had multiple line items.&lt;/p&gt;

&lt;p&gt;The model extracted a line-level amount as &lt;code&gt;taxable_value&lt;/code&gt; instead of the invoice subtotal row.&lt;/p&gt;

&lt;p&gt;In a synthetic dataset, subtotal rows are easy to standardize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;same position&lt;/li&gt;
&lt;li&gt;same label family&lt;/li&gt;
&lt;li&gt;same spacing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In real invoices, subtotal rows compete with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unit prices&lt;/li&gt;
&lt;li&gt;per-line totals&lt;/li&gt;
&lt;li&gt;tax-inclusive values&lt;/li&gt;
&lt;li&gt;noisy formatting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model had learned “there is a number near the items.” It had not learned “this is the subtotal row that should override the line-level values.”&lt;/p&gt;
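&lt;p&gt;A label-driven rule makes the distinction concrete: prefer an explicitly labeled subtotal row over any number near the items. A sketch with an illustrative, non-exhaustive label list:&lt;/p&gt;

```python
import re

# Prefer explicitly labeled subtotal rows over "a number near the items".
SUBTOTAL_LABELS = re.compile(r"(sub\s*total|taxable\s*(value|amount))", re.IGNORECASE)

def pick_taxable_value(ocr_lines):
    """Return the amount on the last explicitly labeled subtotal row, if any."""
    candidates = []
    for line in ocr_lines:
        if SUBTOTAL_LABELS.search(line):
            amounts = re.findall(r"\d[\d,]*\.?\d*", line)
            if amounts:
                candidates.append(float(amounts[-1].replace(",", "")))
    return candidates[-1] if candidates else None

lines = [
    "3D Printed Prototype   2  800.00  1600.00",
    "Pre filter             1  1570.00 1570.00",
    "Sub Total                        3170.00",
]
print(pick_taxable_value(lines))  # 3170.0
```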

&lt;h3&gt;
  
  
  2. I assumed the model would map visible tax rates to the right field
&lt;/h3&gt;

&lt;p&gt;The supplier and place of supply were both in the same state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the supplier GSTIN state code matched the place-of-supply state code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So this was an intra-state invoice.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CGST &amp;gt; 0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SGST &amp;gt; 0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;IGST = 0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model still output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;igst_rate = 0.09&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a subtle but important failure.&lt;/p&gt;

&lt;p&gt;It saw the printed &lt;code&gt;18%&lt;/code&gt; tax context on the invoice (intra-state, so split as 9% CGST + 9% SGST) and mapped a 9% component rate into the wrong slot.&lt;/p&gt;

&lt;p&gt;That is not an arithmetic problem. It is a field-to-concept mapping problem.&lt;/p&gt;

&lt;p&gt;Synthetic data had taught the model what tax fields exist. It had not sufficiently taught it how to disambiguate them when the invoice layout was less explicit.&lt;/p&gt;
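&lt;p&gt;The disambiguation itself is deterministic, so it does not have to be left to the model. The first two digits of a GSTIN are the state code; intra-state supplies use CGST + SGST, inter-state supplies use IGST. A sketch of a post-extraction check, assuming this project's field names:&lt;/p&gt;

```python
def check_tax_split(supplier_gstin, place_of_supply_code, fields):
    """Cross-check CGST/SGST/IGST against supplier state vs place of supply.

    Intra-state (matching state codes) means IGST must be zero; inter-state
    means CGST and SGST must be zero. Field names are assumptions from
    this project's schema.
    """
    intra_state = supplier_gstin[:2] == place_of_supply_code
    if intra_state:
        return fields.get("igst_rate", 0) == 0
    return fields.get("cgst_rate", 0) == 0 and fields.get("sgst_rate", 0) == 0

# The failure from the real invoice: intra-state, but igst_rate = 0.09
valid = check_tax_split(
    "33AAAAA0000A1Z5",  # hypothetical GSTIN; "33" is the state code
    "33",
    {"cgst_rate": 0.09, "sgst_rate": 0.09, "igst_rate": 0.09},
)
```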

&lt;h3&gt;
  
  
  3. I assumed missing fields would default safely
&lt;/h3&gt;

&lt;p&gt;The model returned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;reverse_charge = 0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The correct value was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;reverse_charge = "No"&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This looks small until you think about how these systems get deployed.&lt;/p&gt;

&lt;p&gt;If the downstream consumer expects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a strict string enum&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and gets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a number&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;you now have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;broken JSON contracts&lt;/li&gt;
&lt;li&gt;parser failures&lt;/li&gt;
&lt;li&gt;brittle rule-engine behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model did not just guess the wrong value. It guessed the wrong type.&lt;/p&gt;

&lt;p&gt;That is a very different category of failure.&lt;/p&gt;
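&lt;p&gt;Type and enum failures are exactly what a strict output validator should reject before the JSON reaches a downstream consumer. A minimal sketch (the schema shown is a hypothetical fragment, not the full 22-field contract):&lt;/p&gt;

```python
# Hypothetical schema fragment: reverse_charge must be the string "Yes" or "No".
SCHEMA = {"reverse_charge": {"type": str, "allowed": {"Yes", "No"}}}

def validate_types(output, schema=SCHEMA):
    """Return a list of type/enum violations instead of silently coercing."""
    errors = []
    for field, rule in schema.items():
        value = output.get(field)
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}, got {type(value).__name__}")
        elif "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: {value!r} not in {sorted(rule['allowed'])}")
    return errors

errors = validate_types({"reverse_charge": 0})  # the model's actual output
```

&lt;p&gt;With this in place, &lt;code&gt;reverse_charge = 0&lt;/code&gt; becomes a rejected extraction rather than a corrupted record.&lt;/p&gt;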

&lt;h3&gt;
  
  
  4. I assumed synthetic layout diversity was enough
&lt;/h3&gt;

&lt;p&gt;The invoice format differed from the synthetic training distribution in small ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;weaker or alternate labels&lt;/li&gt;
&lt;li&gt;less structured spacing&lt;/li&gt;
&lt;li&gt;no clean field presentation for some values&lt;/li&gt;
&lt;li&gt;multi-line item complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of those differences are dramatic in isolation.&lt;/p&gt;

&lt;p&gt;Together, they were enough to push the model into the wrong extraction path.&lt;/p&gt;

&lt;p&gt;That is the real problem with synthetic validation:&lt;/p&gt;

&lt;p&gt;You can cover many business scenarios while still under-covering format variance.&lt;/p&gt;

&lt;p&gt;The model learns the contract of the synthetic world very well.&lt;/p&gt;

&lt;p&gt;Then one real document shows you which parts of the world your contract forgot to mention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Synthetic data did help. Just not in the way validation suggested.
&lt;/h2&gt;

&lt;p&gt;This is the part that matters most.&lt;/p&gt;

&lt;p&gt;The synthetic data was not a waste.&lt;/p&gt;

&lt;p&gt;It gave me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a working training loop&lt;/li&gt;
&lt;li&gt;a rendered dataset&lt;/li&gt;
&lt;li&gt;stable checkpointing&lt;/li&gt;
&lt;li&gt;a measurable extraction task&lt;/li&gt;
&lt;li&gt;a way to iterate cheaply&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it gave the model enough structure to learn the task.&lt;/p&gt;

&lt;p&gt;The training curve from the final stable run proves that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Val loss 0.552 -&amp;gt; 0.024&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is real learning.&lt;/p&gt;

&lt;p&gt;But the real invoice test showed what that learning actually meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the model learned the schema&lt;/li&gt;
&lt;li&gt;it did not yet learn the full shape of real-world invoices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That distinction is the whole article.&lt;/p&gt;

&lt;p&gt;Synthetic data was useful because it taught the model the contract.&lt;/p&gt;

&lt;p&gt;The real invoice exposed the parts of the contract that were underspecified.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two failed runs were part of the lesson too
&lt;/h2&gt;

&lt;p&gt;Before the stable run, I had two failed runs that made the later result more believable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run 1: the overfit run
&lt;/h3&gt;

&lt;p&gt;The first run reached a strong early checkpoint and then degraded badly.&lt;/p&gt;

&lt;p&gt;Validation loss:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Iter 1&lt;/code&gt;: &lt;code&gt;0.552&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Iter 200&lt;/code&gt;: &lt;code&gt;0.022&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Iter 400&lt;/code&gt;: &lt;code&gt;1.397&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Iter 500&lt;/code&gt;: &lt;code&gt;0.122&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model got to a very good point by &lt;code&gt;iter 200&lt;/code&gt;, then drifted away from it.&lt;/p&gt;

&lt;p&gt;That run taught me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the best checkpoint is not necessarily the last checkpoint&lt;/li&gt;
&lt;li&gt;a constant aggressive learning rate on a small synthetic dataset can destroy a good run after it already succeeded&lt;/li&gt;
&lt;/ul&gt;
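&lt;p&gt;The fix for the first lesson is mechanical: track validation loss per checkpoint and select the minimum, not the final iteration. A sketch using the losses from run 1:&lt;/p&gt;

```python
def best_checkpoint(val_losses):
    """Pick the iteration with the lowest validation loss, not the last one.

    `val_losses` maps iteration number to validation loss, as logged
    during training.
    """
    return min(val_losses, key=val_losses.get)

# Run 1's validation losses: the best checkpoint is iter 200, not iter 500.
run1 = {1: 0.552, 200: 0.022, 400: 1.397, 500: 0.122}
best = best_checkpoint(run1)
```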

&lt;h3&gt;
  
  
  Run 2: the NaN run
&lt;/h3&gt;

&lt;p&gt;The second run looked healthier until sequence-length issues showed up.&lt;/p&gt;

&lt;p&gt;At &lt;code&gt;iter 150&lt;/code&gt;, the log warned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[WARNING] Some sequences are longer than 1536 tokens.
The longest sentence 1973 will be truncated to 1536.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Immediately afterward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Train loss nan&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;then &lt;code&gt;Val loss nan&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;then the rest of the run stayed corrupted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last clean checkpoint in that run was &lt;code&gt;iter 100&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That run taught me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;token limits are not just throughput constraints&lt;/li&gt;
&lt;li&gt;one bad sample can invalidate the rest of a training run&lt;/li&gt;
&lt;li&gt;“the run finished” is not the same as “the run is usable”&lt;/li&gt;
&lt;/ul&gt;
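&lt;p&gt;The practical response is to filter overlong samples before training rather than let truncation cut labels off from their context mid-run. A sketch (the &lt;code&gt;tokenize&lt;/code&gt; callable stands in for whatever tokenizer your training stack uses; the helper is hypothetical):&lt;/p&gt;

```python
def filter_overlong(samples, tokenize, max_tokens=1536):
    """Drop samples whose token count exceeds the context limit.

    Truncating an invoice mid-document can separate labels from their
    context, which is one plausible route to the NaN losses seen in run 2.
    """
    kept, dropped = [], []
    for sample in samples:
        if len(tokenize(sample)) > max_tokens:
            dropped.append(sample)
        else:
            kept.append(sample)
    return kept, dropped

# With a toy whitespace "tokenizer" for illustration:
kept, dropped = filter_overlong(["a b", "a " * 2000], str.split, max_tokens=1536)
```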

&lt;p&gt;Those failures are worth mentioning because they stop the final result from sounding cleaner than it really was.&lt;/p&gt;

&lt;p&gt;They also explain why the stable v3 run is more believable than it would be in isolation. The earlier runs failed in concrete, diagnosable ways.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the project actually proved
&lt;/h2&gt;

&lt;p&gt;It did not prove that synthetic data is enough.&lt;/p&gt;

&lt;p&gt;It proved three narrower things:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Synthetic data is excellent for bootstrapping a structured extraction task
&lt;/h3&gt;

&lt;p&gt;It gave me scale, perfect labels, and scenario coverage fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Validation on synthetic data can dramatically overstate readiness
&lt;/h3&gt;

&lt;p&gt;The model’s synthetic metrics looked excellent before the real-invoice test exposed field-mapping failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. A small real corpus is disproportionately valuable
&lt;/h3&gt;

&lt;p&gt;The single real invoice I tested taught me more about generalization than another hundred synthetic invoices would have.&lt;/p&gt;

&lt;p&gt;That is not because synthetic data is bad.&lt;/p&gt;

&lt;p&gt;It is because synthetic data and real data teach different things:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Synthetic data teaches&lt;/th&gt;
&lt;th&gt;Real data teaches&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;schema&lt;/td&gt;
&lt;td&gt;variance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;business scenarios&lt;/td&gt;
&lt;td&gt;layout ambiguity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;output format discipline&lt;/td&gt;
&lt;td&gt;how documents actually break&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;scale&lt;/td&gt;
&lt;td&gt;trust&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What I would change next
&lt;/h2&gt;

&lt;p&gt;Only three things matter now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build a small real-invoice gold set and make it part of evaluation immediately.
The main gap here was format variance, not business-rule coverage.&lt;/li&gt;
&lt;li&gt;Add real invoices into training earlier instead of trying to synthesize my way out of layout variance.
The real-invoice failure was a distribution problem, not a parameter-count problem.&lt;/li&gt;
&lt;li&gt;Strengthen JSON and type constraints so missing fields fail safely instead of defaulting to &lt;code&gt;0&lt;/code&gt;.
The &lt;code&gt;reverse_charge = 0&lt;/code&gt; output is the kind of bug that looks cosmetic in a notebook and expensive in a real pipeline.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else is secondary.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lesson I am keeping
&lt;/h2&gt;

&lt;p&gt;Synthetic data got the model to the point where it knew what an invoice parser is supposed to do.&lt;/p&gt;

&lt;p&gt;One real invoice showed me what it still did not understand.&lt;/p&gt;

&lt;p&gt;That is the difference.&lt;/p&gt;

&lt;p&gt;Synthetic data teaches the task.&lt;/p&gt;

&lt;p&gt;Real data teaches the world.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>finetuning</category>
      <category>python</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building CloudGuide: A Real-Time AI Assistant for Navigating Google Cloud</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Fri, 13 Mar 2026 05:34:40 +0000</pubDate>
      <link>https://forem.com/angu10/building-cloudguide-a-real-time-ai-assistant-for-navigating-google-cloud-m8b</link>
      <guid>https://forem.com/angu10/building-cloudguide-a-real-time-ai-assistant-for-navigating-google-cloud-m8b</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Cloud platforms are incredibly powerful — but navigating them can be confusing even for experienced developers.&lt;/p&gt;

&lt;p&gt;Recently, while setting up a static website on Google Cloud Storage, I realized how easy it is to make small mistakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A permission checkbox hidden deep in the UI&lt;/li&gt;
&lt;li&gt;A configuration buried under another menu&lt;/li&gt;
&lt;li&gt;A setting that appears correct but fails silently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In many cases, developers spend more time &lt;strong&gt;searching the console UI&lt;/strong&gt; than actually building their application.&lt;/p&gt;

&lt;p&gt;That led to a question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if an AI assistant could watch your screen, listen to your question, and guide you step‑by‑step through cloud configuration?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That idea became &lt;strong&gt;CloudGuide&lt;/strong&gt;, a multimodal AI agent built with &lt;strong&gt;Google AI models and Google Cloud&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This project was developed specifically for the &lt;strong&gt;Google AI Hackathon&lt;/strong&gt;, and this post explains how it works under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;CloudGuide is a &lt;strong&gt;voice-enabled AI assistant that helps users configure cloud resources in real time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of reading documentation or watching tutorials, users can simply say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Help me deploy a website on Google Cloud.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI agent then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Watches the user’s screen&lt;/li&gt;
&lt;li&gt;Listens to voice input&lt;/li&gt;
&lt;li&gt;Speaks instructions&lt;/li&gt;
&lt;li&gt;Highlights UI elements to click&lt;/li&gt;
&lt;li&gt;Verifies steps using real Google Cloud APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to turn complex cloud configuration into a &lt;strong&gt;guided interactive experience&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Key Capabilities
&lt;/h2&gt;

&lt;p&gt;CloudGuide combines several multimodal capabilities:&lt;/p&gt;
&lt;h3&gt;
  
  
  Screen Understanding
&lt;/h3&gt;

&lt;p&gt;The system captures periodic screenshots of the user's screen and sends them to the AI model for interpretation.&lt;/p&gt;
&lt;h3&gt;
  
  
  Voice Interaction
&lt;/h3&gt;

&lt;p&gt;Users communicate naturally through a microphone, asking questions or requesting help.&lt;/p&gt;
&lt;h3&gt;
  
  
  Real-Time Voice Responses
&lt;/h3&gt;

&lt;p&gt;The AI responds with &lt;strong&gt;native audio output&lt;/strong&gt; using Gemini’s audio model.&lt;/p&gt;
&lt;h3&gt;
  
  
  API Grounding
&lt;/h3&gt;

&lt;p&gt;Instead of trusting screenshots alone, the system verifies actions using the &lt;strong&gt;Google Cloud Storage API&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Visual Click Guidance
&lt;/h3&gt;

&lt;p&gt;The system highlights the exact UI element users need to click.&lt;/p&gt;

&lt;p&gt;This dramatically reduces confusion when navigating complex cloud interfaces.&lt;/p&gt;


&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;The system consists of three primary layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Client&lt;/li&gt;
&lt;li&gt;Backend&lt;/li&gt;
&lt;li&gt;AI Model&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Client (User Machine)
&lt;/h3&gt;

&lt;p&gt;The client runs locally and handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Screen capture&lt;/li&gt;
&lt;li&gt;Microphone input&lt;/li&gt;
&lt;li&gt;Speaker output&lt;/li&gt;
&lt;li&gt;Browser highlighting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Technologies used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;mss&lt;/strong&gt; – screen capture&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pyaudio&lt;/strong&gt; – audio streaming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright + Chrome DevTools Protocol&lt;/strong&gt; – UI highlighting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The client streams screenshots and audio to the backend via WebSocket.&lt;/p&gt;
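&lt;p&gt;The streaming boils down to packaging each capture as a message before it goes over the WebSocket. A sketch of that packaging step; the message shape (&lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;ts&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt;) is illustrative, not CloudGuide's actual wire format:&lt;/p&gt;

```python
import base64
import json
import time

def make_frame(kind, payload_bytes):
    """Package a screenshot or audio chunk as a JSON WebSocket message.

    Binary payloads are base64-encoded so the frame stays valid JSON.
    """
    return json.dumps({
        "kind": kind,                      # e.g. "screenshot" or "audio"
        "ts": time.time(),                 # client-side capture timestamp
        "data": base64.b64encode(payload_bytes).decode("ascii"),
    })

frame = make_frame("screenshot", b"\x89PNG...")  # e.g. raw bytes from mss
decoded = json.loads(frame)
```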


&lt;h3&gt;
  
  
  Backend (Google Cloud Run)
&lt;/h3&gt;

&lt;p&gt;The backend is deployed on &lt;strong&gt;Google Cloud Run&lt;/strong&gt; and built with &lt;strong&gt;FastAPI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Responsibilities include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managing WebSocket connections&lt;/li&gt;
&lt;li&gt;Streaming data to the AI model&lt;/li&gt;
&lt;li&gt;Executing tool calls&lt;/li&gt;
&lt;li&gt;Verifying cloud configuration through APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All multimodal input flows through a request queue before being sent to the AI model.&lt;/p&gt;


&lt;h3&gt;
  
  
  AI Model (Gemini Live)
&lt;/h3&gt;

&lt;p&gt;CloudGuide uses the &lt;strong&gt;Gemini Live API&lt;/strong&gt; with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gemini-2.5-flash-native-audio-latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This model supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time audio streaming&lt;/li&gt;
&lt;li&gt;Image understanding&lt;/li&gt;
&lt;li&gt;Tool calling&lt;/li&gt;
&lt;li&gt;Natural voice output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model processes &lt;strong&gt;audio and screenshots together&lt;/strong&gt; within a single streaming session.&lt;/p&gt;

&lt;p&gt;This creates a natural conversational experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tool Calling and API Grounding
&lt;/h2&gt;

&lt;p&gt;One challenge with vision-based AI systems is that screenshots can be misleading.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;A UI might show a resource as created even though the underlying API operation failed.&lt;/p&gt;

&lt;p&gt;To solve this, CloudGuide uses &lt;strong&gt;tool calling&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Example tools include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;check_bucket&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;list_bucket_files&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;check_bucket_permissions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;diagnose_bucket_issues&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These tools query the &lt;strong&gt;Google Cloud Storage API&lt;/strong&gt; directly.&lt;/p&gt;

&lt;p&gt;This allows the AI agent to verify that each step actually worked.&lt;/p&gt;
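&lt;p&gt;A grounding tool can be sketched as a small function over a Storage client. The real &lt;code&gt;google-cloud-storage&lt;/code&gt; &lt;code&gt;Client&lt;/code&gt; exposes &lt;code&gt;lookup_bucket()&lt;/code&gt;, which returns &lt;code&gt;None&lt;/code&gt; when the bucket does not exist; a stub client is used here so the sketch stays self-contained:&lt;/p&gt;

```python
def check_bucket(client, bucket_name):
    """Ground a 'bucket created' claim in the Storage API, not the screenshot.

    `client` is any object with a `lookup_bucket(name)` method that returns
    None when the bucket is missing (as google-cloud-storage's Client does).
    """
    bucket = client.lookup_bucket(bucket_name)
    if bucket is None:
        return {"exists": False, "message": f"Bucket {bucket_name!r} not found"}
    return {"exists": True, "message": f"Bucket {bucket_name!r} exists"}

class StubClient:
    """Stand-in for a Storage client, for illustration only."""
    def __init__(self, buckets):
        self._buckets = set(buckets)
    def lookup_bucket(self, name):
        return name if name in self._buckets else None

result = check_bucket(StubClient({"my-site"}), "my-site")
```

&lt;p&gt;The agent can then report what the API says rather than what the console appears to show.&lt;/p&gt;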




&lt;h2&gt;
  
  
  Visual UI Guidance
&lt;/h2&gt;

&lt;p&gt;Another major feature is &lt;strong&gt;element highlighting&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Before asking the user to click something, the system highlights the UI element.&lt;/p&gt;

&lt;p&gt;A pulsing rectangle appears around the button or input field.&lt;/p&gt;

&lt;p&gt;This is implemented using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Playwright&lt;/li&gt;
&lt;li&gt;Chrome DevTools Protocol&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The backend sends highlight instructions, and the client injects an overlay into the browser.&lt;/p&gt;




&lt;h2&gt;
  
  
  Challenges Encountered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Audio Feedback Loops
&lt;/h3&gt;

&lt;p&gt;When the AI speaks through speakers, the microphone can pick up the sound and send it back to the model.&lt;/p&gt;

&lt;p&gt;This creates a feedback loop where the model responds to itself.&lt;/p&gt;

&lt;p&gt;Using headphones mitigates this issue, but future improvements could include built‑in echo cancellation.&lt;/p&gt;




&lt;h3&gt;
  
  
  Voice Activity Detection
&lt;/h3&gt;

&lt;p&gt;The Live API sometimes interprets background noise as speech.&lt;/p&gt;

&lt;p&gt;Adding voice activity detection would improve reliability.&lt;/p&gt;




&lt;h3&gt;
  
  
  UI Changes
&lt;/h3&gt;

&lt;p&gt;Cloud interfaces evolve frequently.&lt;/p&gt;

&lt;p&gt;During development, certain UI paths moved or appeared differently than documented, which required adjustments to the workflow detection logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment
&lt;/h2&gt;

&lt;p&gt;The backend runs on &lt;strong&gt;Google Cloud Run&lt;/strong&gt; and is deployed through a CI/CD pipeline using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud Build&lt;/li&gt;
&lt;li&gt;Container Registry&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deployment is automated through a simple script that builds and deploys the backend service.&lt;/p&gt;

&lt;p&gt;This makes the system scalable and accessible from anywhere.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Building this project revealed several key insights:&lt;/p&gt;

&lt;h3&gt;
  
  
  Multimodal AI is powerful
&lt;/h3&gt;

&lt;p&gt;Combining screen understanding with voice interaction creates a much more natural interface.&lt;/p&gt;

&lt;h3&gt;
  
  
  API grounding improves reliability
&lt;/h3&gt;

&lt;p&gt;Vision alone is not enough. Verifying system state using APIs is essential.&lt;/p&gt;

&lt;h3&gt;
  
  
  Voice interfaces still need infrastructure improvements
&lt;/h3&gt;

&lt;p&gt;Capabilities like echo cancellation and voice activity detection will make voice agents significantly more robust.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Cloud platforms are incredibly capable, but their complexity often slows developers down.&lt;/p&gt;

&lt;p&gt;Projects like CloudGuide explore a new paradigm:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI agents that guide users through complex systems in real time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google AI models&lt;/li&gt;
&lt;li&gt;Gemini Live streaming&lt;/li&gt;
&lt;li&gt;Google Cloud Run&lt;/li&gt;
&lt;li&gt;Real API grounding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;we can create assistants that truly understand what users are doing.&lt;/p&gt;

&lt;p&gt;This project was built using &lt;strong&gt;Google AI models and Google Cloud services&lt;/strong&gt; and the accompanying content was created specifically for entering the &lt;strong&gt;Google AI Hackathon&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>geminiliveagentchallenge</category>
      <category>python</category>
      <category>gcp</category>
    </item>
    <item>
      <title>Stop Print Debugging Your AI Agents: A Deep Dive into Agent Observability</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Sun, 04 Jan 2026 05:15:46 +0000</pubDate>
      <link>https://forem.com/angu10/stop-print-debugging-your-ai-agents-a-deep-dive-into-agent-observability-29eo</link>
      <guid>https://forem.com/angu10/stop-print-debugging-your-ai-agents-a-deep-dive-into-agent-observability-29eo</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The Invisible Agent Problem&lt;/li&gt;
&lt;li&gt;Why Current Solutions Fall Short&lt;/li&gt;
&lt;li&gt;Introducing Agent Recorder&lt;/li&gt;
&lt;li&gt;How It Works: Technical Deep Dive&lt;/li&gt;
&lt;li&gt;Real-World Use Cases&lt;/li&gt;
&lt;li&gt;Comparing Approaches&lt;/li&gt;
&lt;li&gt;Building Production-Ready Agents&lt;/li&gt;
&lt;li&gt;The Road Ahead&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Invisible Agent Problem
&lt;/h2&gt;

&lt;p&gt;It's 2 AM. Your AI agent just went into an infinite loop consuming API credits. Again.&lt;/p&gt;

&lt;p&gt;You've built what should be a simple customer service agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parse user question&lt;/li&gt;
&lt;li&gt;Search knowledge base&lt;/li&gt;
&lt;li&gt;Query database if needed&lt;/li&gt;
&lt;li&gt;Format response&lt;/li&gt;
&lt;li&gt;Maybe escalate to human support&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Simple, right? Except somewhere in those 5 steps, your agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Called the same database query 15 times&lt;/li&gt;
&lt;li&gt;Got stuck in a loop asking the LLM to "try again"&lt;/li&gt;
&lt;li&gt;Hallucinated data that doesn't exist&lt;/li&gt;
&lt;li&gt;Crashed with a cryptic error in step 4&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And you have &lt;strong&gt;no idea which one&lt;/strong&gt; until you start debugging.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Print Statement Spiral
&lt;/h3&gt;

&lt;p&gt;So you do what every developer does. You add logging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Calling LLM with: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] LLM took &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Got response: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Searching DB: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_customer_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Getting customer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DEBUG] Customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An hour later, your terminal looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[DEBUG] Calling LLM with: Find all orders for customer John Smith...
[DEBUG] LLM took 1.23s
[DEBUG] Got response: I'll search for that customer...
[DEBUG] Searching DB: customer_name=John Smith
[DEBUG] Found 2 results
[DEBUG] Getting customer 123
[DEBUG] Customer: John Smith
[DEBUG] Calling LLM with: Here are the customer details: {'id': 123...
[DEBUG] LLM took 0.87s
[DEBUG] Got response: Let me get their orders...
[DEBUG] Searching DB: orders WHERE customer_id=123
[DEBUG] Found 3 results
[DEBUG] Calling LLM with: Here are the orders: [{'id': 1001, 'to...
[DEBUG] LLM took 1.45s
[DEBUG] Got response: The customer has 3 orders...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You're staring at hundreds of lines of logs trying to answer basic questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many times did we call the LLM?&lt;/li&gt;
&lt;li&gt;What was the total execution time?&lt;/li&gt;
&lt;li&gt;Which step failed?&lt;/li&gt;
&lt;li&gt;What were the actual arguments passed to each function?&lt;/li&gt;
&lt;li&gt;When did it start looping?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This is not sustainable.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Cost of Poor Observability
&lt;/h3&gt;

&lt;p&gt;Let me share some real numbers from my experience building AI agents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time Spent Debugging:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Print debugging: 2-4 hours per bug&lt;/li&gt;
&lt;li&gt;Adding proper logging: 30 minutes per function&lt;/li&gt;
&lt;li&gt;Actually finding the bug: 15 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total:&lt;/strong&gt; 3-5 hours for issues that should take 15 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Developer Frustration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Losing context between debugging sessions&lt;/li&gt;
&lt;li&gt;Unable to reproduce issues&lt;/li&gt;
&lt;li&gt;No way to compare "working" vs "broken" runs&lt;/li&gt;
&lt;li&gt;Every new team member asks: "How do I debug this?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;API Inefficiency:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agents making 3x more API calls than necessary&lt;/li&gt;
&lt;li&gt;Inefficient prompts using excessive tokens&lt;/li&gt;
&lt;li&gt;Unable to identify performance bottlenecks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We've spent decades building amazing developer tools for web apps, mobile apps, and backend services. But for AI agents? We're back to &lt;code&gt;print()&lt;/code&gt; statements like it's 1995.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Current Solutions Fall Short
&lt;/h2&gt;

&lt;p&gt;Before building Agent Recorder, I tried everything:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Standard Logging Libraries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calling LLM with prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Got response: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Still just text logs in a file&lt;/li&gt;
&lt;li&gt;No structure, no visualization&lt;/li&gt;
&lt;li&gt;Manual instrumentation everywhere&lt;/li&gt;
&lt;li&gt;Hard to correlate across async calls&lt;/li&gt;
&lt;li&gt;No timing information without extra code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Cloud Observability Tools (DataDog, New Relic, etc.)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expensive for small teams and individuals&lt;/li&gt;
&lt;li&gt;Send your prompts and responses to third-party servers (a security and privacy risk)&lt;/li&gt;
&lt;li&gt;Heavy SDKs that bloat your dependencies&lt;/li&gt;
&lt;li&gt;Designed for traditional apps, not agent workflows&lt;/li&gt;
&lt;li&gt;Over-engineered for "just see what my agent did"&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. LLM Provider Dashboards (OpenAI, Anthropic)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only see LLM calls, not your tool calls&lt;/li&gt;
&lt;li&gt;No local context (what led to this call?)&lt;/li&gt;
&lt;li&gt;Delayed (not real-time)&lt;/li&gt;
&lt;li&gt;Can't see your custom logic&lt;/li&gt;
&lt;li&gt;Vendor lock-in&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Framework-Specific Tools (LangSmith for LangChain)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problems:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only works with that framework&lt;/li&gt;
&lt;li&gt;Requires rewriting code to use their patterns&lt;/li&gt;
&lt;li&gt;Still cloud-based with subscription fees&lt;/li&gt;
&lt;li&gt;What if you use raw APIs or multiple frameworks?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What I needed was simple:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;See every LLM call and tool call&lt;/li&gt;
&lt;li&gt;Local storage (my data, my machine)&lt;/li&gt;
&lt;li&gt;Framework-agnostic (works with anything)&lt;/li&gt;
&lt;li&gt;Minimal code changes&lt;/li&gt;
&lt;li&gt;Beautiful visualization&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Free and open source&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That tool didn't exist. So I built it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Agent Recorder
&lt;/h2&gt;

&lt;p&gt;Agent Recorder is &lt;strong&gt;Redux DevTools for AI agents&lt;/strong&gt;. If you've ever used Redux DevTools for React development, you know the power of seeing every action, every state change, with the ability to inspect, time-travel, and understand your application flow.&lt;/p&gt;

&lt;p&gt;Now imagine that, but for your AI agent's execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Two-Decorator Solution
&lt;/h3&gt;

&lt;p&gt;Here's all you need to add to your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_recorder&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;llm_call&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;

&lt;span class="nd"&gt;@llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-service-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-service-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer-service-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_customer_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM orders WHERE customer_id = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No context managers, no complex setup, no configuration files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;run_name&lt;/code&gt; parameter groups related calls together.&lt;/strong&gt; All functions decorated with &lt;code&gt;run_name="customer-service-agent"&lt;/code&gt; will be recorded in the same timeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Gets Captured Automatically
&lt;/h3&gt;

&lt;p&gt;Every decorated function automatically logs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Function name&lt;/strong&gt; - What was called&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Arguments&lt;/strong&gt; - All input parameters with their values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Return value&lt;/strong&gt; - Complete output from the function&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duration&lt;/strong&gt; - Execution time in milliseconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timestamp&lt;/strong&gt; - Exact time of invocation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Errors&lt;/strong&gt; - Full exception details if it failed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parent tracking&lt;/strong&gt; - For nested function calls&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;No manual annotation needed.&lt;/strong&gt; Just add the decorator.&lt;/p&gt;
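&lt;p&gt;The capture logic can be sketched with a plain Python decorator. This is an illustrative approximation, not the library's internals; &lt;code&gt;EVENTS&lt;/code&gt; and &lt;code&gt;record_event&lt;/code&gt; are stand-ins for the real JSONL writer:&lt;br&gt;
&lt;/p&gt;

```python
import functools
import time
import traceback
import uuid
from datetime import datetime

# Stand-in event sink: the real recorder appends to a JSONL file instead.
EVENTS = []

def record_event(event):
    EVENTS.append(event)

def recorded(event_type):
    """Capture name, args, result, duration, timestamp, and errors of a call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            event = {
                "event_id": str(uuid.uuid4()),
                "timestamp": datetime.now().isoformat(),
                "type": event_type,
                "data": {
                    "function_name": func.__name__,
                    "args": {"args": args, "kwargs": kwargs},
                },
            }
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                event["data"]["result"] = result
                return result
            except Exception as exc:
                event["data"]["error"] = {
                    "error_type": type(exc).__name__,
                    "message": str(exc),
                    "traceback": traceback.format_exc(),
                }
                raise
            finally:
                # Runs on success and failure, so every call leaves an event.
                event["data"]["duration_ms"] = int((time.perf_counter() - start) * 1000)
                record_event(event)
        return wrapper
    return decorator

@recorded("tool_call")
def add(a, b):
    return a + b

add(2, 3)
```

&lt;p&gt;The &lt;code&gt;finally&lt;/code&gt; block is the important part: the event is written whether the call succeeded or raised, which is what makes crashed runs inspectable.&lt;/p&gt;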

&lt;h3&gt;
  
  
  Running Your Agent
&lt;/h3&gt;

&lt;p&gt;Use your functions exactly as before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is your agent logic - unchanged!
&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find all orders for customer John Smith&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Ask LLM to understand the query
&lt;/span&gt;&lt;span class="n"&gt;intent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User asks: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Search for the customer
&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_name=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;John Smith&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: Get their orders
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;customer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_customer_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Summarize results
&lt;/span&gt;    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Summarize these orders: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is being recorded in the background.&lt;/p&gt;

&lt;h3&gt;
  
  
  Viewing the Timeline
&lt;/h3&gt;

&lt;p&gt;When your agent finishes (or crashes), run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agent-recorder view latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your browser opens to a beautiful web-based timeline showing the complete execution flow.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works: Technical Deep Dive
&lt;/h2&gt;

&lt;p&gt;Let me walk you through the architecture and implementation details.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Decorator-Based Instrumentation
&lt;/h3&gt;

&lt;p&gt;When you write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what happens under the hood:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Registry Lookup:&lt;/strong&gt; Agent Recorder checks if a &lt;code&gt;Recorder&lt;/code&gt; instance exists for &lt;code&gt;"my-agent"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-Creation:&lt;/strong&gt; If none exists, it creates one with a unique run ID (timestamp + UUID)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function Wrapping:&lt;/strong&gt; Your function gets wrapped with timing and logging logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution:&lt;/strong&gt; When called, it captures args, executes the function, captures the result&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event Writing:&lt;/strong&gt; Writes a structured event to a JSONL file immediately&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The actual implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
             &lt;span class="n"&gt;capture_args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Get or create a Recorder instance for this run_name
&lt;/span&gt;    &lt;span class="n"&gt;recorder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_or_create_recorder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Return the actual decorator that wraps your function
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;recorder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;capture_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;capture_result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;capture_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Event Storage Format
&lt;/h3&gt;

&lt;p&gt;All events are stored as &lt;strong&gt;JSONL&lt;/strong&gt; (JSON Lines) - one JSON object per line. This format is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streamable:&lt;/strong&gt; Can write events as they happen&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parseable:&lt;/strong&gt; Easy to read line-by-line&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crash-resistant:&lt;/strong&gt; If your program crashes, all events up to that point are saved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tooling-friendly:&lt;/strong&gt; Standard format used by many data tools&lt;/li&gt;
&lt;/ul&gt;
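&lt;p&gt;Appending in this format is a one-liner per event. A sketch of the write path (the &lt;code&gt;append_event&lt;/code&gt; helper is illustrative; the real files live under &lt;code&gt;~/.agent-recorder/runs/&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path

def append_event(path, event):
    """Append one JSON object per line and flush immediately:
    if the process crashes, every event already written survives."""
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")
        f.flush()

# Demo against a temp file.
run_file = Path(tempfile.mkdtemp()) / "demo.jsonl"
append_event(run_file, {"type": "run_start", "data": {"name": "demo"}})
append_event(run_file, {"type": "llm_call", "data": {"duration_ms": 760}})

events = [json.loads(line) for line in run_file.read_text().splitlines()]
print([e["type"] for e in events])  # ['run_start', 'llm_call']
```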

&lt;p&gt;Example event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"20260103_192705_c2207bde"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4f85a880-2ab7-45bf-a0ba-9c776581a5de"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-03T19:27:06.097562"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llm_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parent_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"function_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"call_llm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"User asks: Find all orders for customer John Smith"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;760&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I'll help you find customer information. Let me search the database."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Storage location:&lt;/strong&gt; &lt;code&gt;~/.agent-recorder/runs/&amp;lt;run_id&amp;gt;.jsonl&lt;/code&gt;&lt;/p&gt;
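&lt;p&gt;Because every line is a complete JSON object, the questions from the start of this post (how many LLM calls? what was the total time? which step failed?) reduce to a few lines of parsing. A hedged sketch (&lt;code&gt;summarize_run&lt;/code&gt; is my own helper, not part of the library):&lt;br&gt;
&lt;/p&gt;

```python
import json

def summarize_run(lines):
    """Answer the basic debugging questions from a run's JSONL events."""
    events = [json.loads(line) for line in lines]
    llm_calls = [e for e in events if e["type"] == "llm_call"]
    tool_calls = [e for e in events if e["type"] == "tool_call"]
    total_ms = sum(e["data"].get("duration_ms", 0) for e in llm_calls + tool_calls)
    errors = [e for e in events if e["type"] == "error"]
    return {
        "llm_calls": len(llm_calls),
        "tool_calls": len(tool_calls),
        "total_ms": total_ms,
        "first_error": errors[0]["data"]["message"] if errors else None,
    }

sample = [
    '{"type": "run_start", "data": {"name": "demo"}}',
    '{"type": "llm_call", "data": {"duration_ms": 1230}}',
    '{"type": "tool_call", "data": {"duration_ms": 340}}',
    '{"type": "llm_call", "data": {"duration_ms": 870}}',
]
print(summarize_run(sample))
# {'llm_calls': 2, 'tool_calls': 1, 'total_ms': 2440, 'first_error': None}
```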

&lt;h3&gt;
  
  
  3. Event Types
&lt;/h3&gt;

&lt;p&gt;Agent Recorder tracks 5 event types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;run_start&lt;/strong&gt; - Marks the beginning of a run
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_start"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer-service-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"20260103_192705_c2207bde"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-03T19:27:05.337192"&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;llm_call&lt;/strong&gt; - LLM function execution
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llm_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"function_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"call_llm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1234&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;tool_call&lt;/strong&gt; - Tool function execution
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"function_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"search_database"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;340&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;error&lt;/strong&gt; - Exception that occurred
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"error_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ValueError"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Customer not found"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"traceback"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;run_end&lt;/strong&gt; - Marks completion (optional in v0.1.1)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  4. Async Support
&lt;/h3&gt;

&lt;p&gt;The same decorators work seamlessly with async functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;async-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;openai_async&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;async-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.weather.com/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Use with asyncio
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call_llm_async&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in SF?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;weather&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;San Francisco&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent Recorder detects whether your function is a coroutine and handles it accordingly, so you never need a separate async API.&lt;/p&gt;
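Under the hood, this kind of dual sync/async support is usually implemented by checking the wrapped function with `asyncio.iscoroutinefunction` and returning a matching wrapper. Here's a minimal sketch of the pattern (not Agent Recorder's actual source; the timing print stands in for event recording):

```python
import asyncio
import functools
import time

def traced(fn):
    """Wrap sync and async functions alike, timing each call."""
    if asyncio.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            start = time.monotonic()
            result = await fn(*args, **kwargs)  # await the coroutine
            print(f"{fn.__name__} took {time.monotonic() - start:.3f}s")
            return result
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        start = time.monotonic()
        result = fn(*args, **kwargs)  # plain call for sync functions
        print(f"{fn.__name__} took {time.monotonic() - start:.3f}s")
        return result
    return sync_wrapper

@traced
def add(a, b):
    return a + b

@traced
async def add_async(a, b):
    return a + b

print(add(1, 2))                     # sync path
print(asyncio.run(add_async(1, 2)))  # async path
```

Because the async branch returns an `async def` wrapper, the decorated function is still awaitable and still works with `asyncio.run`, `gather`, and friends.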

&lt;h3&gt;
  
  
  5. Web Viewer Architecture
&lt;/h3&gt;

&lt;p&gt;The viewer is a &lt;strong&gt;self-contained HTML file&lt;/strong&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No external dependencies (no CDN calls)&lt;/li&gt;
&lt;li&gt;Vanilla JavaScript for parsing JSONL&lt;/li&gt;
&lt;li&gt;CSS for the timeline UI&lt;/li&gt;
&lt;li&gt;Syntax highlighting for JSON data&lt;/li&gt;
&lt;li&gt;Collapsible event cards&lt;/li&gt;
&lt;li&gt;Search and filter capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you run &lt;code&gt;agent-recorder view latest&lt;/code&gt;, it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Finds the latest run in &lt;code&gt;~/.agent-recorder/runs/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Starts a local HTTP server (default port 8765)&lt;/li&gt;
&lt;li&gt;Serves the HTML viewer + JSONL data&lt;/li&gt;
&lt;li&gt;Opens your browser to &lt;code&gt;http://localhost:8765/runs/&amp;lt;run_id&amp;gt;.html&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
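The "latest" lookup in step 1 is cheap because run IDs start with a timestamp, so lexicographic order matches chronological order. A rough sketch of the idea (the one-`.jsonl`-file-per-run layout is an assumption, not Agent Recorder's documented format):

```python
import os
import tempfile

def latest_run_id(runs_dir):
    """Return the newest run ID in runs_dir, or None if there are no runs.

    Assumes one .jsonl file per run, named by a timestamp-prefixed run ID
    such as 20260103_192705_c2207b.
    """
    run_files = [name for name in os.listdir(runs_dir) if name.endswith(".jsonl")]
    if not run_files:
        return None
    # Timestamp prefixes mean the lexicographic max is the most recent run.
    return max(run_files).removesuffix(".jsonl")

# Demo against a throwaway directory standing in for ~/.agent-recorder/runs/
demo = tempfile.mkdtemp()
for run in ("20260102_143022_abc123.jsonl", "20260103_192705_c2207b.jsonl"):
    open(os.path.join(demo, run), "w").close()
print(latest_run_id(demo))  # → 20260103_192705_c2207b
```

Steps 2-4 then only need Python's built-in `http.server` and `webbrowser` modules, which is part of why the viewer can stay dependency-free.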

&lt;p&gt;&lt;strong&gt;Everything stays local.&lt;/strong&gt; No data leaves your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;Let me show you how Agent Recorder solves actual problems I've encountered.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case 1: Debugging Infinite Loops
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Agent keeps calling the same tool over and over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without Agent Recorder:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[DEBUG] Calling search_database with query: customer_name='John'
[DEBUG] Got 0 results
[DEBUG] Calling LLM...
[DEBUG] LLM says: Let me search again
[DEBUG] Calling search_database with query: customer_name='John'
[DEBUG] Got 0 results
[DEBUG] Calling LLM...
[DEBUG] LLM says: Let me search again
... (500 more lines)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have to scroll through hundreds of log lines, counting repetitions by hand, before you realize it's looping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Recorder:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open the timeline and immediately see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. llm_call - "Find customer John"
2. tool_call - search_database(query="customer_name='John'") → []
3. llm_call - "I got no results, let me try again"
4. tool_call - search_database(query="customer_name='John'") → []
5. llm_call - "I got no results, let me try again"
6. tool_call - search_database(query="customer_name='John'") → []
... (pattern visible immediately)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; The database query is wrong (it should be &lt;code&gt;customer_name='John Smith'&lt;/code&gt;), and the LLM needs an explicit instruction to stop retrying after a failed attempt.&lt;/p&gt;
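Beyond the prompt instruction, a cheap structural safeguard is to track recent tool calls and bail out when the same call repeats. A minimal sketch of such a guard (a hypothetical helper, not part of Agent Recorder):

```python
class LoopGuard:
    """Abort an agent loop when the same tool call repeats too often."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.counts = {}

    def check(self, tool_name, **kwargs):
        # Identical (tool, arguments) pairs count as repeats.
        key = (tool_name, tuple(sorted(kwargs.items())))
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] > self.max_repeats:
            raise RuntimeError(
                f"Loop detected: {tool_name} called "
                f"{self.counts[key]} times with the same arguments"
            )

guard = LoopGuard(max_repeats=2)
guard.check("search_database", query="customer_name='John'")  # 1st call: fine
guard.check("search_database", query="customer_name='John'")  # 2nd call: fine
# A 3rd identical call would raise RuntimeError.
```

The timeline makes the loop visible after the fact; a guard like this stops it from burning tokens in the first place.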

&lt;p&gt;&lt;strong&gt;Time saved:&lt;/strong&gt; 2 hours → 5 minutes&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case 2: Performance Optimization
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Agent is slow but you don't know which part.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Recorder:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Look at the timeline durations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. llm_call - 1.2s ⚡ (acceptable)
2. tool_call - search_database - 3.8s 🐌 (SLOW!)
3. tool_call - get_orders - 0.4s ⚡
4. llm_call - 0.9s ⚡
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Add a database index on &lt;code&gt;customer_name&lt;/code&gt;. Duration drops to 0.2s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Total execution time: 6.3s → 2.7s (57% faster)&lt;/p&gt;
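For reference, the index fix itself is a one-liner in most SQL databases. Here it is with SQLite (the schema is invented to mirror the example; your database and column names will differ):

```python
import sqlite3

# In-memory stand-in for the example schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_name TEXT)")

# The fix: index the column the slow lookup filters on.
conn.execute("CREATE INDEX idx_customers_name ON customers (customer_name)")

# Verify the planner now uses an index search instead of a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE customer_name = ?",
    ("John Smith",),
).fetchall()
print(plan)
```

The `EXPLAIN QUERY PLAN` output should mention the index, confirming the query no longer scans the whole table.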

&lt;h3&gt;
  
  
  Use Case 3: Token Usage Optimization
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; High API usage, unclear why.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Recorder:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Export the run to JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agent-recorder &lt;span class="nb"&gt;export&lt;/span&gt; &amp;lt;run_id&amp;gt; &lt;span class="nt"&gt;-o&lt;/span&gt; run.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write a quick script to analyze:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;total_prompt_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;total_calls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;run.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;events&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_call&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;total_calls&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;total_prompt_length&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total LLM calls: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_calls&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Average prompt length: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_prompt_length&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_calls&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Discovery:&lt;/strong&gt; One LLM call had a 5000-character prompt that included the entire knowledge base unnecessarily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Pass only relevant excerpts to the LLM. Token usage drops significantly.&lt;/p&gt;
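One simple way to "pass only relevant excerpts" is a keyword-overlap filter over knowledge-base chunks; anything fancier (embeddings, a vector store) follows the same shape. A naive sketch with made-up chunks:

```python
import re

def tokens(text):
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_excerpts(question, chunks, k=2):
    """Rank knowledge-base chunks by word overlap with the question."""
    return sorted(
        chunks,
        key=lambda chunk: len(tokens(question) & tokens(chunk)),
        reverse=True,
    )[:k]

kb = [
    "Refund policy: refunds are issued within 14 days of purchase.",
    "Shipping: orders ship within 2 business days.",
    "Warranty: hardware is covered for one year.",
]
print(top_excerpts("How do I get a refund for my purchase?", kb, k=1))
# → ['Refund policy: refunds are issued within 14 days of purchase.']
```

Feeding the LLM only the top-k excerpts instead of the whole knowledge base is what shrinks that 5000-character prompt.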

&lt;h3&gt;
  
  
  Use Case 4: Comparing Runs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; "It worked yesterday, now it's broken."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Recorder:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List all runs&lt;/span&gt;
agent-recorder list

&lt;span class="c"&gt;# Output:&lt;/span&gt;
&lt;span class="c"&gt;# 20260102_143022_abc123  customer-agent  2026-01-02 14:30:22 (working)&lt;/span&gt;
&lt;span class="c"&gt;# 20260103_192705_c2207b  customer-agent  2026-01-03 19:27:05 (broken)&lt;/span&gt;

&lt;span class="c"&gt;# Export both&lt;/span&gt;
agent-recorder &lt;span class="nb"&gt;export &lt;/span&gt;20260102_143022_abc123 &lt;span class="nt"&gt;-o&lt;/span&gt; working.json
agent-recorder &lt;span class="nb"&gt;export &lt;/span&gt;20260103_192705_c2207b &lt;span class="nt"&gt;-o&lt;/span&gt; broken.json

&lt;span class="c"&gt;# Compare with diff tool or custom script&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
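The "custom script" route can be as simple as reducing each export to its ordered event signature and finding the first divergence. A sketch (the `events`, `type`, and `data.function` field names are assumptions about the export format):

```python
import json

def event_signature(run):
    """Reduce a parsed run export to ordered (event type, function) pairs."""
    return [(e["type"], e.get("data", {}).get("function", "?")) for e in run["events"]]

def first_divergence(a, b):
    """Index and values of the first differing event, or None if prefixes match."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    return None

# In practice: json.load(open("working.json")) and json.load(open("broken.json"))
working = {"events": [{"type": "llm_call", "data": {"function": "plan"}},
                      {"type": "tool_call", "data": {"function": "search_customers"}}]}
broken = {"events": [{"type": "llm_call", "data": {"function": "plan"}},
                     {"type": "tool_call", "data": {"function": "validate_input"}}]}

print(first_divergence(event_signature(working), event_signature(broken)))
# → (1, ('tool_call', 'search_customers'), ('tool_call', 'validate_input'))
```

A divergence like this one would point straight at a newly introduced step, such as the validation call described below.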



&lt;p&gt;&lt;strong&gt;Discovery:&lt;/strong&gt; In the broken version, a new validation step was added that always returns empty results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time saved:&lt;/strong&gt; 4 hours → 15 minutes&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Case 5: Onboarding New Team Members
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; "How does this agent work?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Agent Recorder:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run a sample execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python examples/customer_service_agent.py
agent-recorder view latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Show them the timeline. They instantly understand:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent asks LLM to parse the query&lt;/li&gt;
&lt;li&gt;LLM decides which tools to call&lt;/li&gt;
&lt;li&gt;Agent executes tools (database, API calls)&lt;/li&gt;
&lt;li&gt;LLM synthesizes the response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;No documentation needed.&lt;/strong&gt; The timeline is living documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing Approaches
&lt;/h2&gt;

&lt;p&gt;Let me compare different debugging approaches with a real scenario:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; Debug why customer order lookup fails for "John Smith"&lt;/p&gt;

&lt;h3&gt;
  
  
  Approach 1: Print Statements
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: Looking for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: No customers found, returning empty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: Getting orders for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEBUG: Got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Time to find bug:&lt;/strong&gt; 30-60 minutes&lt;br&gt;
&lt;strong&gt;Lines of debug code:&lt;/strong&gt; 15-20&lt;br&gt;
&lt;strong&gt;After fixing:&lt;/strong&gt; Remove all print statements&lt;br&gt;
&lt;strong&gt;If it breaks again:&lt;/strong&gt; Add them all back&lt;/p&gt;
&lt;h3&gt;
  
  
  Approach 2: Logging Framework
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Looking for customer: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No customers found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Getting orders for customer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieved &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Time to find bug:&lt;/strong&gt; 20-30 minutes&lt;br&gt;
&lt;strong&gt;Lines of debug code:&lt;/strong&gt; 20-25 (permanent overhead)&lt;br&gt;
&lt;strong&gt;After fixing:&lt;/strong&gt; Logs stay (clutter over time)&lt;br&gt;
&lt;strong&gt;Visualization:&lt;/strong&gt; Still just text in a file&lt;/p&gt;
&lt;h3&gt;
  
  
  Approach 3: Cloud Observability (e.g., DataDog)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ddtrace&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;

&lt;span class="nd"&gt;@tracer.wrap&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_customers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Time to find bug:&lt;/strong&gt; 10-15 minutes&lt;br&gt;
&lt;strong&gt;Setup time:&lt;/strong&gt; 2-3 hours (SDK, config, account)&lt;br&gt;
&lt;strong&gt;Ongoing:&lt;/strong&gt; Monthly subscription&lt;br&gt;
&lt;strong&gt;Security:&lt;/strong&gt; Data sent to a third party&lt;br&gt;
&lt;strong&gt;Lines of instrumentation:&lt;/strong&gt; 15-20&lt;/p&gt;
&lt;h3&gt;
  
  
  Approach 4: Agent Recorder
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_recorder&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order-lookup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;find_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order-lookup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_customers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM customers WHERE name = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order-lookup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM orders WHERE customer_id = &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Time to find bug:&lt;/strong&gt; 5-10 minutes&lt;br&gt;
&lt;strong&gt;Setup time:&lt;/strong&gt; 30 seconds (&lt;code&gt;pip install&lt;/code&gt;)&lt;br&gt;
&lt;strong&gt;Ongoing:&lt;/strong&gt; Free&lt;br&gt;
&lt;strong&gt;Security:&lt;/strong&gt; All data local&lt;br&gt;
&lt;strong&gt;Lines of instrumentation:&lt;/strong&gt; 3 decorators&lt;br&gt;
&lt;strong&gt;After fixing:&lt;/strong&gt; Decorators stay (useful for future debugging)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner:&lt;/strong&gt; Agent Recorder provides the best balance of simplicity, effectiveness, and privacy.&lt;/p&gt;
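&lt;p&gt;For readers curious what a decorator like &lt;code&gt;@tool_call&lt;/code&gt; does under the hood, here is a minimal sketch. This is an illustration only: Agent Recorder's real implementation (event schema, JSONL storage, CLI integration) will differ, and the &lt;code&gt;EVENTS&lt;/code&gt; list stands in for its local store.&lt;/p&gt;

```python
# Illustrative sketch of a recording decorator; not Agent Recorder's actual code.
import functools
import time

EVENTS = []  # stand-in for a local event store (the real tool persists to disk)

def tool_call(run_name):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                result = fn(*args, **kwargs)
                status = "ok"
                return result
            except Exception:
                status = "error"
                raise
            finally:
                # Record the call whether it succeeded or raised
                EVENTS.append({
                    "run": run_name,
                    "tool": fn.__name__,
                    "args": [repr(a) for a in args],
                    "duration_ms": round((time.time() - start) * 1000, 2),
                    "status": status,
                })
        return wrapper
    return decorator
```

&lt;p&gt;The key design point is that the decorator wraps the function without changing its behavior: the return value and any exception pass through untouched, and the recording happens in a &lt;code&gt;finally&lt;/code&gt; block so failed calls are captured too.&lt;/p&gt;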
&lt;h2&gt;
  
  
  Building Production-Ready Agents
&lt;/h2&gt;

&lt;p&gt;Agent Recorder isn't just for debugging - it's essential for production agents.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Handling Sensitive Data
&lt;/h3&gt;

&lt;p&gt;Don't log API keys or personal information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secure-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm_with_key&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# api_key won't be logged
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="nd"&gt;@tool_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secure-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_user_pii&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Result won't be logged (but function call and args will)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_user_sensitive_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Custom Storage Location
&lt;/h3&gt;

&lt;p&gt;For production deployments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@llm_call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;run_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prod-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;storage_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/var/log/agent-recorder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Cleanup Old Runs
&lt;/h3&gt;

&lt;p&gt;Keep disk usage under control:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete runs older than 7 days&lt;/span&gt;
agent-recorder cleanup &lt;span class="nt"&gt;--older-than&lt;/span&gt; 7d

&lt;span class="c"&gt;# Dry run to see what would be deleted&lt;/span&gt;
agent-recorder cleanup &lt;span class="nt"&gt;--older-than&lt;/span&gt; 7d &lt;span class="nt"&gt;--dry-run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Automated Analysis
&lt;/h3&gt;

&lt;p&gt;Export and analyze runs programmatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Export to JSON
&lt;/span&gt;    &lt;span class="n"&gt;export_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/tmp/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;system&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-recorder export &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; -o &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;export_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Load and analyze
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;export_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_llm_calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_tool_calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_duration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;errors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;events&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;llm_call&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_llm_calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_duration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;duration_ms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_tool_calls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_duration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;duration_ms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;errors&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Road Ahead
&lt;/h2&gt;

&lt;p&gt;Agent Recorder v0.1.1 is just the beginning. Here's what's coming:&lt;/p&gt;

&lt;h3&gt;
  
  
  v0.2.0 - Enhanced Visualization (Planned)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tree/Graph View:&lt;/strong&gt; See nested calls as a visual tree&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Counting:&lt;/strong&gt; Automatic token counting for OpenAI/Anthropic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Estimation:&lt;/strong&gt; Calculate API costs for each run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Metrics:&lt;/strong&gt; Identify bottlenecks automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export Formats:&lt;/strong&gt; PDF, HTML, CSV for reports&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  v0.3.0 - Framework Integrations (Planned)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LangChain Adapter:&lt;/strong&gt; Auto-instrument LangChain agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LlamaIndex Adapter:&lt;/strong&gt; Seamless integration with LlamaIndex&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AutoGen Support:&lt;/strong&gt; Track multi-agent conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CrewAI Integration:&lt;/strong&gt; Monitor crew workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  v0.4.0 - Advanced Features (Planned)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Streaming:&lt;/strong&gt; Watch agent execution live&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent Support:&lt;/strong&gt; Track multiple agents interacting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff View:&lt;/strong&gt; Compare two runs side-by-side&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Events:&lt;/strong&gt; Log your own event types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin System:&lt;/strong&gt; Extend with custom visualizations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  v0.5.0 - Language Ports (Community Welcome!)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TypeScript/Node.js SDK:&lt;/strong&gt; For JavaScript agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Go SDK:&lt;/strong&gt; For Go-based agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rust SDK:&lt;/strong&gt; For high-performance agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Want to contribute?&lt;/strong&gt; Check out the GitHub repo for good first issues!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Observability Is Not Optional
&lt;/h2&gt;

&lt;p&gt;As AI agents move from prototypes to production, observability isn't a nice-to-have - &lt;strong&gt;it's essential.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can't optimize what you can't measure.&lt;br&gt;
You can't debug what you can't see.&lt;br&gt;
You can't trust what you can't verify.&lt;/p&gt;

&lt;p&gt;Agent Recorder gives you that visibility with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Two simple decorators&lt;/li&gt;
&lt;li&gt;✅ Zero configuration&lt;/li&gt;
&lt;li&gt;✅ Local-first architecture&lt;/li&gt;
&lt;li&gt;✅ Framework-agnostic design&lt;/li&gt;
&lt;li&gt;✅ Beautiful visualization&lt;/li&gt;
&lt;li&gt;✅ Free and open source&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get Started Today
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone and install&lt;/span&gt;
git clone https://github.com/yourusername/agent-recorder.git
&lt;span class="nb"&gt;cd &lt;/span&gt;agent-recorder
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Try the example&lt;/span&gt;
python examples/simple_agent.py

&lt;span class="c"&gt;# View the recording&lt;/span&gt;
agent-recorder view latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/yourusername/agent-recorder" rel="noopener noreferrer"&gt;https://github.com/yourusername/agent-recorder&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;br&gt;
&lt;strong&gt;Docs:&lt;/strong&gt; See README.md for full documentation&lt;/p&gt;

&lt;h2&gt;
  
  
  Join the Community
&lt;/h2&gt;

&lt;p&gt;Agent Recorder is open source and built for the community. However you'd like to get involved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Found a bug → Open an issue&lt;/li&gt;
&lt;li&gt;Have a feature idea → Start a discussion&lt;/li&gt;
&lt;li&gt;Want to contribute → Submit a PR&lt;/li&gt;
&lt;li&gt;Built something cool → Share your story&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're building the future of agent observability together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Star the repo&lt;/strong&gt; if you find it useful - it helps others discover the project!&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Building Self-Correcting Database Agents with Meta's Llama-4-Scout: From Natural Language to SQL Insights</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Thu, 09 Oct 2025 15:09:03 +0000</pubDate>
      <link>https://forem.com/angu10/building-self-correcting-database-agents-with-metas-llama-4-scout-from-natural-language-to-sql-2k0a</link>
      <guid>https://forem.com/angu10/building-self-correcting-database-agents-with-metas-llama-4-scout-from-natural-language-to-sql-2k0a</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The ability to query databases using natural language represents one of the most practical applications of large language models in enterprise settings. However, most text-to-SQL implementations suffer from a critical flaw: they generate queries once and hope for the best. When a query fails—due to incorrect table names, misunderstood schema relationships, or logical errors—the system simply returns an error message, leaving users frustrated.&lt;/p&gt;

&lt;p&gt;In this technical guide, we'll build a sophisticated database agent using Meta's Llama-4-Scout that doesn't just generate SQL queries—it thinks through the problem, validates its approach, and most importantly, learns from its mistakes to self-correct. This agent implements a five-phase cognitive framework: &lt;strong&gt;Understand → Plan → Generate → Validate → Execute&lt;/strong&gt;, with intelligent retry logic when queries fail.&lt;/p&gt;
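&lt;p&gt;The five-phase loop with retry can be sketched as follows. The phase names mirror the functions defined later in this article, but the orchestration wiring (the dict of callables, &lt;code&gt;MAX_RETRIES&lt;/code&gt;, the validator's return shape) is an illustrative assumption, not the exact implementation built below.&lt;/p&gt;

```python
# Illustrative sketch of the Understand -> Plan -> Generate -> Validate -> Execute
# loop with self-correction; wiring details are assumptions.
MAX_RETRIES = 3  # assumed setting

def run_agent(question, schema, phases):
    """phases: dict of callables keyed understand/plan/generate/validate/execute."""
    error = None
    for attempt in range(MAX_RETRIES):
        understanding = phases["understand"](question, schema)
        # The previous error (if any) is fed back so planning can self-correct
        plan = phases["plan"](question, understanding, schema, previous_error=error)
        sql = phases["generate"](plan)
        ok, issue = phases["validate"](sql)
        if not ok:
            error = issue
            continue
        try:
            return phases["execute"](sql)
        except Exception as e:
            error = str(e)  # carry the failure into the next attempt
    raise RuntimeError(f"Gave up after {MAX_RETRIES} attempts: {error}")
```

&lt;p&gt;The essential idea is that execution errors are not terminal: they become context for the next planning pass, which is what lets the agent avoid repeating the same mistake.&lt;/p&gt;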

&lt;h2&gt;
  
  
  Why Database Agents Matter
&lt;/h2&gt;

&lt;p&gt;Business analysts, product managers, and non-technical stakeholders frequently need data insights but lack SQL expertise. Traditional BI tools require pre-built dashboards, creating bottlenecks when ad-hoc questions arise. A well-designed database agent democratizes data access while maintaining query safety and accuracy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Llama-4-Scout for Structured Data Tasks
&lt;/h2&gt;

&lt;p&gt;Llama-4-Scout excels at structured reasoning tasks for several reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema comprehension&lt;/strong&gt;: Understands relational database concepts including foreign keys, joins, and aggregations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error diagnosis&lt;/strong&gt;: Can parse SQL error messages and identify root causes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterative refinement&lt;/strong&gt;: Maintains context across retry attempts to avoid repeating mistakes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety awareness&lt;/strong&gt;: Can be prompted to avoid destructive operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation Architecture
&lt;/h2&gt;

&lt;p&gt;Our implementation consists of four layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Database Layer&lt;/strong&gt;: SQLite database with realistic employee/department data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Cognitive Layer&lt;/strong&gt;: Five-phase reasoning pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety &amp;amp; Validation Layer&lt;/strong&gt;: SQL injection prevention and query validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Interface Layer&lt;/strong&gt;: Streamlit application for interaction&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's build this step by step.&lt;/p&gt;
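&lt;p&gt;As a taste of the Safety &amp;amp; Validation layer, here is a minimal read-only query check. The blocked-keyword list and single-statement rule are illustrative assumptions about what such a layer enforces, not the article's exact implementation.&lt;/p&gt;

```python
# Minimal sketch of a read-only SQL safety check; details are assumptions.
import re

# Keywords that indicate a destructive or mutating statement
BLOCKED = re.compile(r"\b(drop|delete|update|insert|alter|truncate)\b", re.IGNORECASE)

def is_safe_query(sql: str):
    """Allow only a single SELECT statement; return (ok, reason)."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:
        return False, "multiple statements are not allowed"
    if not stripped.lower().startswith("select"):
        return False, "only SELECT queries are permitted"
    if BLOCKED.search(stripped):
        return False, "query contains a blocked keyword"
    return True, None
```

&lt;p&gt;A check like this runs in the Validate phase, before any SQL reaches the database, so even a badly hallucinated query cannot mutate data.&lt;/p&gt;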




&lt;h2&gt;
  
  
  Step 1: Environment Setup and Dependencies
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Required packages:
# pip install streamlit together pandas
# (sqlite3, re, json, and datetime are in the Python standard library)

import streamlit as st
import sqlite3
import pandas as pd
from together import Together
import re
from datetime import datetime
import json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Step 2: Database Schema and Sample Data Creation
&lt;/h2&gt;

&lt;p&gt;First, we'll create a realistic database with three interconnected tables representing a company's organizational structure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_sample_database():
    """Create a sample SQLite database with employee, department, and project tables"""
    conn = sqlite3.connect('company.db')
    cursor = conn.cursor()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Drop existing tables if they exist
cursor.execute('DROP TABLE IF EXISTS projects')
cursor.execute('DROP TABLE IF EXISTS employees')
cursor.execute('DROP TABLE IF EXISTS departments')

# Create departments table
cursor.execute('''
CREATE TABLE departments (
    department_id INTEGER PRIMARY KEY,
    department_name TEXT NOT NULL,
    budget DECIMAL(12, 2),
    location TEXT
)
''')

# Create employees table with self-referential manager relationship
cursor.execute('''
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT UNIQUE,
    hire_date DATE,
    salary DECIMAL(10, 2),
    department_id INTEGER,
    manager_id INTEGER,
    FOREIGN KEY (department_id) REFERENCES departments(department_id),
    FOREIGN KEY (manager_id) REFERENCES employees(employee_id)
)
''')

# Create projects table
cursor.execute('''
CREATE TABLE projects (
    project_id INTEGER PRIMARY KEY,
    project_name TEXT NOT NULL,
    department_id INTEGER,
    start_date DATE,
    end_date DATE,
    status TEXT,
    budget DECIMAL(10, 2),
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
)
''')

# Insert sample departments
departments = [
    (1, 'Engineering', 2500000.00, 'San Francisco'),
    (2, 'Sales', 1500000.00, 'New York'),
    (3, 'Marketing', 800000.00, 'Austin'),
    (4, 'Human Resources', 600000.00, 'San Francisco'),
    (5, 'Finance', 700000.00, 'New York')
]
cursor.executemany('INSERT INTO departments VALUES (?, ?, ?, ?)', departments)

# Insert sample employees (managers first, then their reports)
employees = [
    # Managers (no manager_id)
    (1, 'Sarah', 'Johnson', 'sarah.j@company.com', '2018-01-15', 150000, 1, None),
    (2, 'Michael', 'Chen', 'michael.c@company.com', '2017-03-20', 145000, 2, None),
    (3, 'Emily', 'Rodriguez', 'emily.r@company.com', '2019-06-10', 125000, 3, None),
    (4, 'David', 'Kim', 'david.k@company.com', '2018-09-05', 130000, 4, None),
    (5, 'Lisa', 'Patel', 'lisa.p@company.com', '2017-11-12', 140000, 5, None),

    # Engineering team
    (6, 'James', 'Wilson', 'james.w@company.com', '2020-02-14', 115000, 1, 1),
    (7, 'Anna', 'Martinez', 'anna.m@company.com', '2021-05-18', 95000, 1, 1),
    (8, 'Robert', 'Taylor', 'robert.t@company.com', '2019-08-22', 120000, 1, 1),
    (9, 'Jennifer', 'Anderson', 'jennifer.a@company.com', '2022-01-10', 85000, 1, 6),
    (10, 'William', 'Thomas', 'william.t@company.com', '2021-11-30', 90000, 1, 6),

    # Sales team
    (11, 'Jessica', 'Moore', 'jessica.m@company.com', '2020-04-05', 95000, 2, 2),
    (12, 'Christopher', 'Jackson', 'chris.j@company.com', '2021-07-19', 88000, 2, 2),
    (13, 'Amanda', 'White', 'amanda.w@company.com', '2019-12-01', 105000, 2, 2),
    (14, 'Daniel', 'Harris', 'daniel.h@company.com', '2022-03-15', 75000, 2, 11),

    # Marketing team
    (15, 'Michelle', 'Martin', 'michelle.m@company.com', '2020-09-08', 82000, 3, 3),
    (16, 'Kevin', 'Thompson', 'kevin.t@company.com', '2021-10-20', 78000, 3, 3),
    (17, 'Rachel', 'Garcia', 'rachel.g@company.com', '2022-02-28', 72000, 3, 15),

    # HR team
    (18, 'Brian', 'Martinez', 'brian.m@company.com', '2019-05-14', 85000, 4, 4),
    (19, 'Nicole', 'Robinson', 'nicole.r@company.com', '2021-08-03', 75000, 4, 4),

    # Finance team
    (20, 'Andrew', 'Clark', 'andrew.c@company.com', '2018-07-22', 95000, 5, 5),
    (21, 'Stephanie', 'Lewis', 'stephanie.l@company.com', '2020-11-11', 88000, 5, 5),
    (22, 'Joshua', 'Lee', 'joshua.l@company.com', '2022-04-06', 80000, 5, 20)
]
cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?, ?, ?)', employees)

# Insert sample projects
projects = [
    (1, 'Cloud Migration', 1, '2024-01-01', '2024-12-31', 'In Progress', 500000),
    (2, 'Mobile App Redesign', 1, '2024-03-15', '2024-09-30', 'In Progress', 300000),
    (3, 'Q4 Sales Campaign', 2, '2024-10-01', '2024-12-31', 'Planning', 200000),
    (4, 'Enterprise Sales Initiative', 2, '2024-02-01', '2024-11-30', 'In Progress', 150000),
    (5, 'Brand Refresh', 3, '2024-05-01', '2024-08-31', 'Completed', 180000),
    (6, 'Social Media Strategy', 3, '2024-07-01', '2025-06-30', 'In Progress', 120000),
    (7, 'HR System Upgrade', 4, '2024-04-01', '2024-10-31', 'In Progress', 100000),
    (8, 'Financial Audit 2024', 5, '2024-01-01', '2024-03-31', 'Completed', 80000)
]
cursor.executemany('INSERT INTO projects VALUES (?, ?, ?, ?, ?, ?, ?)', projects)

conn.commit()
conn.close()

return "Database created successfully!"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def get_database_schema():
    """Extract and format database schema for the agent"""
    conn = sqlite3.connect('company.db')
    cursor = conn.cursor()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;schema_info = []

# Get all tables
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()

for table in tables:
    table_name = table[0]
    schema_info.append(f"\n--- Table: {table_name} ---")

    # Get column information
    cursor.execute(f"PRAGMA table_info({table_name})")
    columns = cursor.fetchall()

    for col in columns:
        col_name = col[1]
        col_type = col[2]
        is_pk = " (PRIMARY KEY)" if col[5] else ""
        schema_info.append(f"  - {col_name}: {col_type}{is_pk}")

    # Get foreign key information
    cursor.execute(f"PRAGMA foreign_key_list({table_name})")
    fks = cursor.fetchall()

    if fks:
        schema_info.append("  Foreign Keys:")
        for fk in fks:
            schema_info.append(f"    - {fk[3]} -&amp;gt; {fk[2]}({fk[4]})")

conn.close()
return "\n".join(schema_info)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Step 3: The Five-Phase Agent Cognitive Framework
&lt;/h2&gt;

&lt;p&gt;Now we implement the core intelligence of our agent. Each phase has a specific responsibility in the query generation pipeline:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Initialize Together AI client
together_client = Together(api_key="your_api_key_here")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def phase_1_understand(question, schema):
    """
    PHASE 1: UNDERSTAND
    Analyze the natural language question and understand what data is needed
    """
    instruction = f"""
You are a database expert analyzing a natural language question.

DATABASE SCHEMA:
{schema}

USER QUESTION: "{question}"

TASK: Analyze this question to understand:
1. What specific information is the user requesting?
2. Which tables contain this information?
3. What columns are relevant?
4. Are any calculations, aggregations, or groupings needed?
5. Are there any ambiguities in the question?

Provide a clear analysis of what data needs to be retrieved.
"""
    try:
        response = together_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": instruction}],
            max_tokens=400,
            temperature=0.2
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error in understanding phase: {str(e)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;def phase_2_plan(question, understanding, schema, previous_error=None):&lt;br&gt;
    """&lt;br&gt;
    PHASE 2: PLAN&lt;br&gt;
    Create a detailed plan for constructing the SQL query&lt;br&gt;
    """&lt;br&gt;
    error_context = ""&lt;br&gt;
    if previous_error:&lt;br&gt;
        error_context = f"\n\nPREVIOUS ATTEMPT FAILED WITH ERROR:\n{previous_error}\n\nAdjust your plan to avoid this error."&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;instruction = f"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You are a database expert creating a query plan.&lt;/p&gt;

&lt;p&gt;DATABASE SCHEMA:&lt;br&gt;
{schema}&lt;/p&gt;

&lt;p&gt;USER QUESTION: "{question}"&lt;/p&gt;

&lt;p&gt;YOUR UNDERSTANDING: {understanding}&lt;br&gt;
{error_context}&lt;/p&gt;

&lt;p&gt;TASK: Create a detailed SQL query plan:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Which tables need to be queried?&lt;/li&gt;
&lt;li&gt;What JOIN operations are required (if any)?&lt;/li&gt;
&lt;li&gt;What WHERE conditions are needed?&lt;/li&gt;
&lt;li&gt;Are GROUP BY or aggregate functions needed?&lt;/li&gt;
&lt;li&gt;Should results be sorted or limited?&lt;/li&gt;
&lt;li&gt;What specific columns should be selected?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Provide a step-by-step plan in numbered format.&lt;br&gt;
"""&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=500,
        temperature=0.2
    )
    return response.choices[0].message.content
except Exception as e:
    return f"Error in planning phase: {str(e)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;def phase_3_generate(question, understanding, plan, schema, previous_error=None, previous_sql=None):&lt;br&gt;
    """&lt;br&gt;
    PHASE 3: GENERATE&lt;br&gt;
    Generate the actual SQL query based on the plan&lt;br&gt;
    """&lt;br&gt;
    error_context = ""&lt;br&gt;
    if previous_error and previous_sql:&lt;br&gt;
        error_context = f"""&lt;br&gt;
\n\nPREVIOUS SQL ATTEMPT:&lt;br&gt;
{previous_sql}&lt;/p&gt;

&lt;p&gt;ERROR RECEIVED:&lt;br&gt;
{previous_error}&lt;/p&gt;

&lt;p&gt;IMPORTANT: Fix the specific error above. Common issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Incorrect table or column names (check schema carefully)&lt;/li&gt;
&lt;li&gt;Missing JOIN conditions&lt;/li&gt;
&lt;li&gt;Syntax errors&lt;/li&gt;
&lt;li&gt;Incorrect aggregate functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;"""&lt;/p&gt;

&lt;p&gt;instruction = f"""&lt;br&gt;
You are a SQL expert generating a query.&lt;/p&gt;

&lt;p&gt;DATABASE SCHEMA:&lt;br&gt;
{schema}&lt;/p&gt;

&lt;p&gt;USER QUESTION: "{question}"&lt;/p&gt;

&lt;p&gt;YOUR UNDERSTANDING: {understanding}&lt;/p&gt;

&lt;p&gt;YOUR PLAN: {plan}&lt;br&gt;
{error_context}&lt;/p&gt;

&lt;p&gt;TASK: Generate a valid SQLite SQL query that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Follows the plan exactly&lt;/li&gt;
&lt;li&gt;Uses correct table and column names from the schema&lt;/li&gt;
&lt;li&gt;Includes proper JOIN syntax if needed&lt;/li&gt;
&lt;li&gt;Has correct WHERE, GROUP BY, and ORDER BY clauses&lt;/li&gt;
&lt;li&gt;Is properly formatted and includes comments&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;CRITICAL RULES:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only use tables and columns that exist in the schema above&lt;/li&gt;
&lt;li&gt;For SQLite, use proper syntax (e.g., no TOP keyword, use LIMIT instead)&lt;/li&gt;
&lt;li&gt;Always use table aliases for clarity in JOINs&lt;/li&gt;
&lt;li&gt;Double-check column names match the schema exactly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Respond with ONLY the SQL query, nothing else. Do not include markdown code blocks or explanations.&lt;br&gt;
"""&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=600,
        temperature=0.1
    )
    sql_query = response.choices[0].message.content.strip()

    # Clean up the SQL (remove markdown code blocks if present)
    sql_query = re.sub(r'^```sql\s*', '', sql_query)
    sql_query = re.sub(r'^```\s*', '', sql_query)
    sql_query = re.sub(r'\s*```$', '', sql_query)

    return sql_query.strip()
except Exception as e:
    return f"Error in generation phase: {str(e)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;def phase_4_validate(sql_query, schema):&lt;br&gt;
    """&lt;br&gt;
    PHASE 4: VALIDATE&lt;br&gt;
    Validate the SQL query for safety and correctness before execution&lt;br&gt;
    """&lt;br&gt;
    validation_results = {&lt;br&gt;
        "is_safe": True,&lt;br&gt;
        "is_valid": True,&lt;br&gt;
        "warnings": [],&lt;br&gt;
        "errors": []&lt;br&gt;
    }&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Check for dangerous operations
dangerous_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE', 'GRANT', 'REVOKE']
sql_upper = sql_query.upper()

for keyword in dangerous_keywords:
    if keyword in sql_upper:
        validation_results["is_safe"] = False
        validation_results["errors"].append(f"Dangerous operation detected: {keyword}")

# Check for SELECT statement
if not sql_upper.strip().startswith('SELECT'):
    validation_results["is_safe"] = False
    validation_results["errors"].append("Only SELECT queries are allowed")

# Check for semicolon (potential SQL injection)
if sql_query.count(';') &amp;gt; 1:
    validation_results["warnings"].append("Multiple statements detected - potential security risk")

# Basic syntax check
if 'FROM' not in sql_upper:
    validation_results["is_valid"] = False
    validation_results["errors"].append("Invalid SQL: Missing FROM clause")

return validation_results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;def phase_5_execute(sql_query):&lt;br&gt;
    """&lt;br&gt;
    PHASE 5: EXECUTE&lt;br&gt;
    Execute the SQL query and return results&lt;br&gt;
    """&lt;br&gt;
    try:&lt;br&gt;
        conn = sqlite3.connect('company.db')&lt;br&gt;
        df = pd.read_sql_query(sql_query, conn)&lt;br&gt;
        conn.close()&lt;br&gt;
        return df, None&lt;br&gt;
    except Exception as e:&lt;br&gt;
        return None, str(e)&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Self-Correction and Retry Logic
&lt;/h2&gt;

&lt;p&gt;This is where our agent becomes truly intelligent. When a query fails, the agent analyzes the error and attempts to fix it:&lt;/p&gt;

&lt;p&gt;
def diagnose_error(sql_query, error_message, schema):&lt;br&gt;
    """&lt;br&gt;
    Use the LLM to diagnose what went wrong and provide guidance&lt;br&gt;
    """&lt;br&gt;
    instruction = f"""&lt;br&gt;
You are a SQL debugging expert.&lt;/p&gt;

&lt;p&gt;DATABASE SCHEMA:&lt;br&gt;
{schema}&lt;/p&gt;

&lt;p&gt;FAILED SQL QUERY:&lt;br&gt;
{sql_query}&lt;/p&gt;

&lt;p&gt;ERROR MESSAGE:&lt;br&gt;
{error_message}&lt;/p&gt;

&lt;p&gt;TASK: Diagnose the specific problem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What caused this error?&lt;/li&gt;
&lt;li&gt;Which part of the query is incorrect?&lt;/li&gt;
&lt;li&gt;What should be changed to fix it?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Be specific and concise. Focus on the exact fix needed.&lt;br&gt;
"""&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=300,
        temperature=0.2
    )
    return response.choices[0].message.content
except Exception as e:
    return f"Error in diagnosis: {str(e)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;def execute_query_with_retry(question, schema, max_attempts=3):&lt;br&gt;
    """&lt;br&gt;
    Complete pipeline with retry logic&lt;br&gt;
    """&lt;br&gt;
    attempt_history = []&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for attempt in range(max_attempts):
    attempt_info = {
        "attempt_number": attempt + 1,
        "understanding": "",
        "plan": "",
        "sql": "",
        "validation": {},
        "result": None,
        "error": None,
        "diagnosis": ""
    }

    # Get previous error context if this is a retry
    previous_error = attempt_history[-1]["error"] if attempt_history else None
    previous_sql = attempt_history[-1]["sql"] if attempt_history else None

    # Phase 1: Understand
    understanding = phase_1_understand(question, schema)
    attempt_info["understanding"] = understanding

    # Phase 2: Plan (with error context if retrying)
    plan = phase_2_plan(question, understanding, schema, previous_error)
    attempt_info["plan"] = plan

    # Phase 3: Generate SQL (with error context if retrying)
    sql_query = phase_3_generate(question, understanding, plan, schema, previous_error, previous_sql)
    attempt_info["sql"] = sql_query

    # Phase 4: Validate
    validation = phase_4_validate(sql_query, schema)
    attempt_info["validation"] = validation

    if not validation["is_safe"] or not validation["is_valid"]:
        attempt_info["error"] = "Query failed validation: " + ", ".join(validation["errors"])
        attempt_history.append(attempt_info)
        continue

    # Phase 5: Execute
    result, error = phase_5_execute(sql_query)
    attempt_info["result"] = result
    attempt_info["error"] = error

    if error:
        # Diagnose the error
        diagnosis = diagnose_error(sql_query, error, schema)
        attempt_info["diagnosis"] = diagnosis
        attempt_history.append(attempt_info)
        continue

    # Success!
    attempt_history.append(attempt_info)
    return result, sql_query, attempt_history

# All attempts failed
return None, None, attempt_history
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;def format_natural_language_response(question, df, sql_query):&lt;br&gt;
    """&lt;br&gt;
    Generate a natural language explanation of the results&lt;br&gt;
    """&lt;br&gt;
    if df is None or df.empty:&lt;br&gt;
        return "No results found for your query."&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;result_preview = df.head(10).to_string()
row_count = len(df)

instruction = f"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You are a data analyst explaining query results to a business user.&lt;/p&gt;

&lt;p&gt;USER QUESTION: "{question}"&lt;/p&gt;

&lt;p&gt;SQL QUERY EXECUTED:&lt;br&gt;
{sql_query}&lt;/p&gt;

&lt;p&gt;RESULTS ({row_count} rows total):&lt;br&gt;
{result_preview}&lt;/p&gt;

&lt;p&gt;TASK: Provide a clear, business-friendly explanation of these results:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Directly answer the user's question&lt;/li&gt;
&lt;li&gt;Highlight key insights from the data&lt;/li&gt;
&lt;li&gt;Mention the total number of results if relevant&lt;/li&gt;
&lt;li&gt;Keep it concise but informative&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Do not show the SQL query or technical details. Focus on the business meaning.&lt;br&gt;
"""&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;try:
    response = together_client.chat.completions.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        messages=[{"role": "user", "content": instruction}],
        max_tokens=400,
        temperature=0.3
    )
    return response.choices[0].message.content
except Exception as e:
    return f"Results retrieved successfully. {row_count} rows returned."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;h2&gt;
  
  
  Step 5: Building the Streamlit User Interface
&lt;/h2&gt;

&lt;p&gt;Now let's create an interactive interface that showcases the agent's cognitive process:&lt;/p&gt;

&lt;p&gt;
def main():&lt;br&gt;
    st.set_page_config(page_title="SQL Agent with Llama-4-Scout", layout="wide")&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;st.title("🤖 Self-Correcting Database Agent")
st.markdown("*Powered by Meta's Llama-4-Scout*")

# Initialize session state
if 'db_initialized' not in st.session_state:
    st.session_state.db_initialized = False
if 'query_history' not in st.session_state:
    st.session_state.query_history = []

# Sidebar for database setup
with st.sidebar:
    st.header("⚙️ Database Setup")

    if not st.session_state.db_initialized:
        if st.button("Initialize Sample Database", type="primary"):
            with st.spinner("Creating database..."):
                result = create_sample_database()
                st.session_state.db_initialized = True
                st.success(result)
                st.rerun()
    else:
        st.success("✅ Database ready")

        if st.button("Reset Database"):
            create_sample_database()
            st.session_state.query_history = []
            st.success("Database reset!")
            st.rerun()

    st.markdown("---")
    st.header("📊 Database Schema")

    if st.session_state.db_initialized:
        with st.expander("View Schema", expanded=False):
            schema = get_database_schema()
            st.code(schema, language="text")

    st.markdown("---")
    st.header("💡 Example Questions")
    st.markdown("""
    **Simple:**
    - How many employees do we have?
    - Show all departments

    **Medium:**
    - What's the average salary by department?
    - Who are the managers?

    **Complex:**
    - Which employees earn more than their manager?
    - Show departments with highest average salaries

    **Advanced:**
    - Compare project budgets to department budgets
    - Find managers with most direct reports
    """)

# Main content area
if not st.session_state.db_initialized:
    st.info("👈 Please initialize the database using the sidebar to get started.")
    return

st.header("💬 Ask Questions About Your Data")

# Query input
col1, col2 = st.columns([3, 1])

with col1:
    user_question = st.text_input(
        "Your Question:",
        placeholder="e.g., Who are the top 5 highest paid employees?",
        key="question_input"
    )

with col2:
    st.write("")  # Spacing
    st.write("")  # Spacing
    show_thinking = st.checkbox("Show reasoning", value=True)

if st.button("🔍 Get Answer", type="primary", use_container_width=True):
    if not user_question:
        st.warning("Please enter a question.")
        return

    # Get schema
    schema = get_database_schema()

    # Execute query with retry logic
    with st.spinner("🧠 Agent is thinking..."):
        result_df, final_sql, attempt_history = execute_query_with_retry(
            user_question, 
            schema, 
            max_attempts=3
        )

    # Display results
    if result_df is not None:
        st.success("✅ Query successful!")

        # Show the agent's thinking process
        if show_thinking:
            with st.expander("🧠 Agent's Cognitive Process", expanded=True):
                for i, attempt in enumerate(attempt_history):
                    if len(attempt_history) &amp;gt; 1:
                        if i &amp;lt; len(attempt_history) - 1:
                            st.error(f"❌ Attempt {attempt['attempt_number']} (Failed)")
                        else:
                            st.success(f"✅ Attempt {attempt['attempt_number']} (Success)")

                    col1, col2 = st.columns(2)

                    with col1:
                        st.markdown("**1️⃣ UNDERSTAND**")
                        st.info(attempt['understanding'])

                    with col2:
                        st.markdown("**2️⃣ PLAN**")
                        st.info(attempt['plan'])

                    st.markdown("**3️⃣ GENERATE SQL**")
                    st.code(attempt['sql'], language="sql")

                    if attempt['validation']['warnings']:
                        st.warning("Validation Warnings: " + ", ".join(attempt['validation']['warnings']))

                    if attempt['error']:
                        st.error(f"**Error:** {attempt['error']}")

                        if attempt['diagnosis']:
                            st.markdown("**🔍 Diagnosis:**")
                            st.warning(attempt['diagnosis'])

                    if i &amp;lt; len(attempt_history) - 1:
                        st.markdown("---")
                        st.markdown("↻ **Retrying with corrections...**")
                        st.markdown("---")

        # Display natural language response
        st.markdown("### 📝 Answer")
        with st.spinner("Generating explanation..."):
            explanation = format_natural_language_response(user_question, result_df, final_sql)
        st.markdown(explanation)

        # Display results table
        st.markdown("### 📊 Data")
        st.dataframe(result_df, use_container_width=True)

        # Show final SQL
        with st.expander("View SQL Query"):
            st.code(final_sql, language="sql")

        # Add to history
        st.session_state.query_history.append({
            "question": user_question,
            "sql": final_sql,
            "rows": len(result_df),
            "attempts": len(attempt_history),
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        })

    else:
        st.error("❌ Query failed after all retry attempts")

        if show_thinking:
            with st.expander("🧠 All Attempts", expanded=True):
                for i, attempt in enumerate(attempt_history):
                    st.error(f"**Attempt {attempt['attempt_number']}**")
                    st.code(attempt['sql'], language="sql")
                    st.error(f"Error: {attempt['error']}")
                    if attempt['diagnosis']:
                        st.warning(f"Diagnosis: {attempt['diagnosis']}")
                    st.markdown("---")

# Query history
if st.session_state.query_history:
    st.markdown("---")
    st.header("📜 Query History")

    history_df = pd.DataFrame(st.session_state.query_history)
    st.dataframe(history_df, use_container_width=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;if __name__ == "__main__":&lt;br&gt;
    main()&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Implementation Insights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Power of Multi-Phase Reasoning
&lt;/h3&gt;

&lt;p&gt;The five-phase approach significantly improves query accuracy compared to single-shot generation. Here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase Separation Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Understand&lt;/strong&gt;: Forces the model to explicitly identify required data before coding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan&lt;/strong&gt;: Creates a logical roadmap, reducing the chance of structural errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate&lt;/strong&gt;: Focuses solely on syntax, with context from previous phases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate&lt;/strong&gt;: Catches dangerous operations before execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute&lt;/strong&gt;: Provides real-world feedback for learning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation of concerns mirrors how human SQL experts approach complex queries—they don't immediately write code; they think through the problem first.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Self-Correction Through Error Analysis
&lt;/h3&gt;

&lt;p&gt;The retry mechanism works because each new attempt receives full context about the previous failure:&lt;/p&gt;

&lt;p&gt;
error_context = f"""&lt;br&gt;
PREVIOUS SQL ATTEMPT:&lt;br&gt;
{previous_sql}&lt;/p&gt;

&lt;p&gt;ERROR RECEIVED:&lt;br&gt;
{previous_error}&lt;/p&gt;

&lt;p&gt;IMPORTANT: Fix the specific error above.&lt;br&gt;
"""&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;This context allows the model to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;See exactly what it tried before&lt;/li&gt;
&lt;li&gt;Understand why it failed&lt;/li&gt;
&lt;li&gt;Adjust its approach specifically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most text-to-SQL systems fail silently. Our agent learns and adapts.&lt;/p&gt;
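&lt;p&gt;The retry loop that consumes this error context can be reduced to a few generic lines. Below is a minimal, model-free sketch; the &lt;code&gt;generate&lt;/code&gt; callable stands in for the LLM call, and the toy functions are hypothetical stand-ins for demonstration only:&lt;/p&gt;

```python
def retry_with_feedback(generate, execute, max_attempts=3):
    """Generic retry loop: each new attempt sees the previous SQL and error."""
    previous_sql, previous_error = None, None
    for _ in range(max_attempts):
        sql = generate(previous_sql, previous_error)  # LLM call in the real agent
        result, error = execute(sql)
        if error is None:
            return result, sql  # success
        previous_sql, previous_error = sql, error  # feed the failure back
    return None, previous_sql  # all attempts failed

# Toy stand-ins: the "model" fixes a misspelled column once it sees the error.
def toy_generate(prev_sql, prev_error):
    if prev_error and "no such column" in prev_error:
        return "SELECT name FROM employees"
    return "SELECT nmae FROM employees"

def toy_execute(sql):
    if "nmae" in sql:
        return None, "no such column: nmae"
    return [("Alice",)], None
```

&lt;p&gt;The agent in this article follows the same shape, with the five phases expanded inside the loop.&lt;/p&gt;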

&lt;h3&gt;
  
  
  3. Safety Through Validation
&lt;/h3&gt;

&lt;p&gt;The validation phase prevents dangerous operations:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;dangerous_keywords = ['DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER', 'TRUNCATE']&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This is critical for production systems where a malicious or misunderstood prompt could corrupt data. The agent is constrained to read-only operations, making it safe for business users.&lt;/p&gt;
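&lt;p&gt;A keyword blocklist is a useful first line of defense, but string matching can be bypassed (for example, via comments or creative formatting). If SQLite is the backend, one way to get a hard guarantee is to open the connection read-only, so writes fail at the engine level no matter what slips past the filter. A small sketch (the database path is illustrative):&lt;/p&gt;

```python
import sqlite3

def open_read_only(db_path):
    """Open a SQLite database in read-only mode via a URI.
    Any INSERT/UPDATE/DELETE/DROP then raises sqlite3.OperationalError."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
```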

&lt;h3&gt;
  
  
  4. Schema-Aware Query Generation
&lt;/h3&gt;

&lt;p&gt;By providing the complete schema to the agent at each phase, we eliminate a common failure mode: queries that reference non-existent tables or columns. The schema serves as ground truth that the agent must respect.&lt;/p&gt;
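&lt;p&gt;This idea can also be enforced programmatically before execution: cross-check every identifier in the generated SQL against the schema and reject the query if anything is unknown. The sketch below is a crude token-based heuristic, not a real SQL parser, and its keyword list is deliberately partial:&lt;/p&gt;

```python
import re

# Deliberately partial list of SQL keywords to ignore during the check.
SQL_KEYWORDS = {
    "SELECT", "FROM", "WHERE", "JOIN", "ON", "AS", "AND", "OR", "GROUP",
    "BY", "ORDER", "LIMIT", "INNER", "LEFT", "COUNT", "AVG", "SUM",
    "NOT", "NULL", "IN", "DISTINCT", "HAVING", "DESC", "ASC", "MAX", "MIN",
}

def extract_identifiers(sql):
    """Pull word-like tokens out of the SQL, skipping known keywords."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql)
    return {t for t in tokens if t.upper() not in SQL_KEYWORDS}

def unknown_identifiers(sql, schema_names):
    """Identifiers used in the SQL that do not exist in the schema.
    schema_names: set of table names, column names, and allowed aliases."""
    return extract_identifiers(sql) - set(schema_names)
```

&lt;p&gt;An empty result means every identifier was found in the schema; anything else can be fed back to the generation phase, just like an execution error.&lt;/p&gt;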




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building a self-correcting database agent with Llama-4-Scout demonstrates the power of structured agentic reasoning. Unlike simple prompt-and-hope approaches, our five-phase cognitive framework—&lt;strong&gt;Understand → Plan → Generate → Validate → Execute&lt;/strong&gt;—creates a system that thinks through problems methodically and learns from mistakes.&lt;/p&gt;

&lt;p&gt;The key innovations in this implementation are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multi-phase reasoning&lt;/strong&gt;: Each phase has a specific cognitive responsibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligent retry logic&lt;/strong&gt;: The agent learns from failures and adapts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety-first design&lt;/strong&gt;: Validation prevents destructive operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural language interface&lt;/strong&gt;: Business users can access data without SQL knowledge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparent reasoning&lt;/strong&gt;: Users can inspect the agent's thought process&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This approach can be extended beyond SQL to other structured tasks: API query generation, configuration file creation, or any domain where precision matters and mistakes have consequences. The think-plan-execute-validate-retry pattern is broadly applicable to agentic systems.&lt;/p&gt;

&lt;p&gt;As LLMs continue to improve, we can expect even more sophisticated error recovery, better schema understanding, and more nuanced query optimization. The combination of powerful base models like Llama-4-Scout with well-designed cognitive frameworks opens up new possibilities for making complex technical systems accessible to everyone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to build your own database agent?&lt;/strong&gt; The complete code is provided in this article. Start with the sample database, test with the example questions, and then adapt it to your own data infrastructure. The future of data access is conversational, safe, and intelligent.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>agenticai</category>
      <category>python</category>
    </item>
    <item>
      <title>Student-Teacher Distillation: A Complete Guide for Model Compression</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Tue, 26 Aug 2025 18:48:58 +0000</pubDate>
      <link>https://forem.com/angu10/student-teacher-distillation-a-complete-guide-for-model-compression-37ed</link>
      <guid>https://forem.com/angu10/student-teacher-distillation-a-complete-guide-for-model-compression-37ed</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 1 of our Deep Learning Model Optimization Series&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the rapidly evolving world of machine learning, deploying large, powerful models in production environments often presents significant challenges. Enter student-teacher distillation—a powerful technique that allows us to compress the knowledge of complex models into smaller, more efficient ones without sacrificing too much accuracy. This comprehensive guide will walk you through everything you need to know about this fascinating approach to model optimization.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Student-Teacher Distillation?
&lt;/h2&gt;

&lt;p&gt;Student-teacher distillation, also known as knowledge distillation, is a model compression technique where a smaller "student" model learns to mimic the behavior of a larger, more complex "teacher" model. Think of it as an experienced professor (teacher) passing on their knowledge and wisdom to a bright student who can then apply that knowledge more efficiently.&lt;/p&gt;

&lt;p&gt;The core idea is beautifully simple: instead of training the student model from scratch on the original data, we train it to match the teacher's outputs. The teacher model provides "soft targets" or probability distributions that contain richer information than the hard labels in the original dataset. These soft targets capture the teacher's uncertainty and the relationships between different classes, which helps the student learn more nuanced decision boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Magic Behind Soft Targets
&lt;/h3&gt;

&lt;p&gt;When a teacher model predicts a class, it doesn't just output a binary decision—it provides a probability distribution across all possible classes. For example, when classifying an image of a dog, the teacher might output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dog: 0.8&lt;/li&gt;
&lt;li&gt;Wolf: 0.15&lt;/li&gt;
&lt;li&gt;Cat: 0.03&lt;/li&gt;
&lt;li&gt;Other: 0.02&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distribution tells us that while the model is confident it's a dog, it also sees some wolf-like features. This nuanced understanding, captured in the soft targets, helps the student model learn more effectively than just knowing "this is a dog."&lt;/p&gt;
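&lt;p&gt;The standard way to expose this extra signal during training is to soften both models' logits with a temperature T and match the resulting distributions, as in Hinton et al.'s formulation. A minimal pure-Python sketch (the logit values are illustrative):&lt;/p&gt;

```python
import math

def softmax_t(logits, temperature=1.0):
    """Temperature-scaled softmax: higher T spreads probability mass
    across classes, exposing the teacher's 'dark knowledge'."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 so gradient magnitudes stay comparable across temperatures."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

&lt;p&gt;In practice this soft-target loss is combined with the ordinary cross-entropy on hard labels, weighted by a mixing coefficient.&lt;/p&gt;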

&lt;h2&gt;
  
  
  Student-Teacher Distillation vs. Fine-Tuning: Understanding the Key Differences
&lt;/h2&gt;

&lt;p&gt;While both techniques involve training smaller models, they serve fundamentally different purposes and follow distinct approaches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fine-Tuning: Adapting Pre-trained Knowledge
&lt;/h3&gt;

&lt;p&gt;Fine-tuning takes a pre-trained model (often trained on a large, general dataset) and adapts it to a specific task or domain. You start with a model that already understands general patterns and then specialize it for your particular use case. It's like taking a general physician and having them specialize in cardiology.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key characteristics of fine-tuning:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Starts with a pre-trained model of similar or identical size&lt;/li&gt;
&lt;li&gt;Adapts existing knowledge to new domains or tasks&lt;/li&gt;
&lt;li&gt;Typically involves training on task-specific data&lt;/li&gt;
&lt;li&gt;The final model size remains roughly the same&lt;/li&gt;
&lt;li&gt;Primary goal is task adaptation, not compression&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Student-Teacher Distillation: Knowledge Compression
&lt;/h3&gt;

&lt;p&gt;Distillation, on the other hand, is primarily about compression and efficiency. You're taking a large, complex model and teaching a smaller model to replicate its behavior. The student model learns to approximate the teacher's decision-making process within a more constrained architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key characteristics of distillation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates a smaller model from a larger one&lt;/li&gt;
&lt;li&gt;Focuses on knowledge transfer and compression&lt;/li&gt;
&lt;li&gt;The student learns from the teacher's outputs, not just original data&lt;/li&gt;
&lt;li&gt;Significant reduction in model size and computational requirements&lt;/li&gt;
&lt;li&gt;Primary goal is efficiency while maintaining accuracy&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Use Which Approach
&lt;/h3&gt;

&lt;p&gt;Choose &lt;strong&gt;fine-tuning&lt;/strong&gt; when you have a model that's already the right size for your deployment constraints, but you need to adapt it to a specific domain or task. Choose &lt;strong&gt;distillation&lt;/strong&gt; when you have a high-performing model that's too large or slow for your production requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Choose the Right Teacher Model
&lt;/h2&gt;

&lt;p&gt;Selecting an appropriate teacher model is crucial for successful distillation. The teacher sets the accuracy ceiling for your student, so this decision significantly impacts your final results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Accuracy is King
&lt;/h3&gt;

&lt;p&gt;Your teacher model should excel at the target task. There's no point in distilling from a mediocre teacher—the student can only learn what the teacher knows. Look for models with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High accuracy on your target dataset&lt;/li&gt;
&lt;li&gt;Strong generalization capabilities&lt;/li&gt;
&lt;li&gt;Robust accuracy across different data distributions&lt;/li&gt;
&lt;li&gt;Well-calibrated confidence scores&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architecture Considerations
&lt;/h3&gt;

&lt;p&gt;While the teacher doesn't need to share the same architecture as the student, some considerations matter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complexity advantage&lt;/strong&gt;: The teacher should be significantly more complex than the student to justify the distillation process&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task alignment&lt;/strong&gt;: Models designed for similar tasks often make better teachers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output compatibility&lt;/strong&gt;: Ensure the teacher's output format aligns with your distillation setup&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Practical Factors
&lt;/h3&gt;

&lt;p&gt;Don't overlook practical constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Computational resources&lt;/strong&gt;: You need to be able to run inference on the teacher model during training&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Licensing and availability&lt;/strong&gt;: Ensure you have access to the teacher model and can use it for your purposes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data compatibility&lt;/strong&gt;: The teacher should work well with your training data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Multi-Teacher Approaches
&lt;/h3&gt;

&lt;p&gt;Consider using multiple teacher models when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Different teachers excel at different aspects of the task&lt;/li&gt;
&lt;li&gt;You want to ensemble knowledge from various sources&lt;/li&gt;
&lt;li&gt;You're working with complex, multi-modal tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Selecting the Perfect Student Model
&lt;/h2&gt;

&lt;p&gt;Choosing the student model involves balancing accuracy goals with deployment constraints. This is where the art of distillation really shines.&lt;/p&gt;

&lt;h3&gt;
  
  
  Size and Efficiency Targets
&lt;/h3&gt;

&lt;p&gt;Start by defining your deployment requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency constraints&lt;/strong&gt;: How fast must inference be?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory limitations&lt;/strong&gt;: What's your RAM/storage budget?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power consumption&lt;/strong&gt;: Are you deploying on mobile or edge devices?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throughput requirements&lt;/strong&gt;: How many predictions per second do you need?&lt;/li&gt;
&lt;/ul&gt;
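&lt;p&gt;The memory item in this checklist is easy to sanity-check up front. As a rough illustration (the parameter count and precisions below are made up for the example), weight memory is just parameter count times bytes per parameter:&lt;/p&gt;

```python
def weight_memory_mb(num_params, bytes_per_param=4):
    """Approximate size of model weights in MiB (fp32 by default);
    activations and runtime overhead are not included."""
    return num_params * bytes_per_param / (1024 ** 2)

# Illustrative 110M-parameter student: fp32 vs. int8-quantized weights
fp32_mb = weight_memory_mb(110_000_000)       # roughly 420 MiB
int8_mb = weight_memory_mb(110_000_000, 1)    # roughly 105 MiB
```

&lt;p&gt;If the fp32 footprint blows your RAM budget, that argues for a smaller student, quantized weights, or both.&lt;/p&gt;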

&lt;h3&gt;
  
  
  Architectural Choices
&lt;/h3&gt;

&lt;p&gt;The student architecture should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Appropriately sized&lt;/strong&gt;: Small enough to meet deployment constraints, large enough to capture essential patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Well-suited to the task&lt;/strong&gt;: Some architectures naturally excel at certain types of problems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficiently designed&lt;/strong&gt;: Modern efficient architectures like MobileNets, EfficientNets, or DistilBERT are often good starting points&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Goldilocks Principle
&lt;/h3&gt;

&lt;p&gt;Your student model size should be "just right":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Too small&lt;/strong&gt;: The model lacks the capacity to learn the teacher's knowledge effectively&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Too large&lt;/strong&gt;: You lose the efficiency benefits and might as well use a larger model directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Just right&lt;/strong&gt;: Provides the best trade-off between accuracy and efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Advanced Strategy: Progressive Distillation
&lt;/h3&gt;

&lt;p&gt;For very large compression ratios, consider progressive distillation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with a large teacher&lt;/li&gt;
&lt;li&gt;Distill to a medium-sized intermediate model&lt;/li&gt;
&lt;li&gt;Use the intermediate model as a teacher for an even smaller student&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This stepped approach often yields better results than trying to compress directly from very large to very small models.&lt;/p&gt;
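&lt;p&gt;The three steps above can be sketched as a simple chain. The &lt;code&gt;distill&lt;/code&gt; stub here is hypothetical: real training against the teacher's soft targets is omitted, and only the size chain is tracked:&lt;/p&gt;

```python
def distill(teacher, student_params):
    """Hypothetical distillation step: returns a student of the requested
    size trained against `teacher` (training itself is stubbed out)."""
    assert teacher["params"] > student_params, "student must be smaller"
    return {"params": student_params, "taught_by": teacher["params"]}

# Progressive distillation: large teacher -> medium intermediate -> small student
teacher = {"params": 1_000_000_000}   # step 1: start with a large teacher
medium = distill(teacher, 100_000_000)  # step 2: roughly 10x smaller
small = distill(medium, 10_000_000)     # step 3: another roughly 10x
```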

&lt;h2&gt;
  
  
  Pros and Cons of Student-Teacher Distillation
&lt;/h2&gt;

&lt;p&gt;Like any technique, distillation comes with its own set of advantages and limitations. Understanding these will help you make informed decisions about when and how to apply this approach.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Compelling Advantages
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Significant Model Compression&lt;/strong&gt;&lt;br&gt;
The most obvious benefit is the dramatic reduction in model size. You can often achieve 5-10x compression while retaining 90-95% of the original accuracy. This makes deployment feasible in resource-constrained environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Faster Inference&lt;/strong&gt;&lt;br&gt;
Smaller models mean faster predictions. This translates to better user experience, lower latency, and the ability to serve more requests with the same hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lower Computational Costs&lt;/strong&gt;&lt;br&gt;
A smaller model means lower memory usage, lower power consumption, and cheaper inference, which matters most when serving millions of requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Preserved Knowledge Quality&lt;/strong&gt;&lt;br&gt;
Unlike simple pruning or quantization, distillation preserves the nuanced decision-making patterns of the teacher model. The student learns not just what to predict, but how to think about the problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enhanced Generalization&lt;/strong&gt;&lt;br&gt;
Soft targets from the teacher model often help students generalize better than training on hard labels alone. The teacher's uncertainty provides valuable regularization.&lt;/p&gt;
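&lt;p&gt;To make "soft targets" concrete, here is a minimal sketch of the classic temperature-softened distillation loss (the temperature and mixing weight below are illustrative defaults, not recommendations):&lt;/p&gt;

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """Hinton-style loss: KL(teacher_T, student_T) scaled by T^2, mixed
    with ordinary cross-entropy on the hard label."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = sum(pt * (math.log(pt) - math.log(ps))
               for pt, ps in zip(p_t, p_s)) * T * T
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard
```

&lt;p&gt;Because the teacher spreads probability mass across wrong-but-plausible classes, the soft term regularizes the student in a way hard labels alone cannot.&lt;/p&gt;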

&lt;p&gt;&lt;strong&gt;Flexibility in Architecture&lt;/strong&gt;&lt;br&gt;
You can distill knowledge across different architectures, allowing you to optimize for specific deployment requirements while retaining accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Notable Limitations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Accuracy Ceiling&lt;/strong&gt;&lt;br&gt;
The student can rarely exceed the teacher's accuracy. You're fundamentally limited by the teacher's knowledge and capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training Complexity&lt;/strong&gt;&lt;br&gt;
Distillation requires careful hyperparameter tuning, temperature selection, and loss function balancing. It's more complex than standard training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Computational Overhead During Training&lt;/strong&gt;&lt;br&gt;
You need to run both teacher and student models during training, which can be computationally expensive and time-consuming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Teacher Dependency&lt;/strong&gt;&lt;br&gt;
The quality of your distillation is fundamentally limited by your teacher model. A biased or inaccurate teacher will pass those flaws on to the student.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diminishing Returns&lt;/strong&gt;&lt;br&gt;
Very aggressive compression (e.g., 100x smaller) often leads to significant accuracy degradation. There are practical limits to how much you can compress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Task-Specific Effectiveness&lt;/strong&gt;&lt;br&gt;
Distillation works better for some tasks than others. Classification tasks often see better results than generation tasks, for instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  When Distillation Shines
&lt;/h3&gt;

&lt;p&gt;Student-teacher distillation is particularly effective when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have a high-accuracy large model that's too slow for production&lt;/li&gt;
&lt;li&gt;Deployment constraints (mobile, edge devices) require smaller models&lt;/li&gt;
&lt;li&gt;You need to serve high-volume requests efficiently&lt;/li&gt;
&lt;li&gt;The task has clear input-output relationships&lt;/li&gt;
&lt;li&gt;You have sufficient computational resources for training&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Consider Alternatives
&lt;/h3&gt;

&lt;p&gt;Consider other approaches when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your teacher model isn't significantly better than smaller alternatives&lt;/li&gt;
&lt;li&gt;Training time and computational costs outweigh deployment benefits&lt;/li&gt;
&lt;li&gt;You need the absolute best accuracy regardless of size&lt;/li&gt;
&lt;li&gt;Your deployment environment can accommodate larger models&lt;/li&gt;
&lt;li&gt;The task requires capabilities that are hard to distill (like complex reasoning)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Looking Ahead
&lt;/h2&gt;

&lt;p&gt;Student-teacher distillation represents a powerful tool in the modern ML practitioner's toolkit. As models continue to grow larger and more capable, the ability to efficiently compress and deploy them becomes increasingly valuable.&lt;/p&gt;

&lt;p&gt;In our next article, we'll dive into the practical implementation details, including code examples, loss function design, and training strategies that will help you implement your own distillation pipeline. We'll explore different distillation variants, advanced techniques like attention transfer, and share best practices learned from real-world deployments.&lt;/p&gt;

&lt;p&gt;The journey from understanding the theory to implementing effective distillation systems is both challenging and rewarding. With the foundation we've built here, you're well-equipped to start exploring this fascinating area of machine learning optimization.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Stay tuned for Part 2, where we'll get our hands dirty with implementation details and practical code examples that will bring these concepts to life.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>Beyond the Competition: How Claude Sonnet 4, GPT-4o, and Gemini 2.5 Can Work Together in Agent Harmony</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Sun, 22 Jun 2025 17:51:30 +0000</pubDate>
      <link>https://forem.com/angu10/beyond-the-competition-how-claude-sonnet-4-gpt-4o-and-gemini-25-can-work-together-in-agent-3o6b</link>
      <guid>https://forem.com/angu10/beyond-the-competition-how-claude-sonnet-4-gpt-4o-and-gemini-25-can-work-together-in-agent-3o6b</guid>
      <description>&lt;p&gt;The AI landscape is often portrayed as a zero-sum game where models compete for dominance. But what if we shifted our perspective? Instead of choosing one model to rule them all, what if we leveraged the unique strengths of each model to create a more powerful, complementary system?&lt;/p&gt;

&lt;p&gt;In this article, we'll explore how Claude Sonnet-4, OpenAI's GPT-4o, and Google's Gemini 2.5 can work together in an agentic architecture, creating a symphony of AI capabilities that's greater than the sum of its parts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Each Model's Unique Strengths
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Claude Sonnet 4: The Thoughtful Analyst
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exceptional reasoning and analysis capabilities&lt;/li&gt;
&lt;li&gt;Strong ethical reasoning and safety considerations&lt;/li&gt;
&lt;li&gt;Excellent at breaking down complex problems methodically&lt;/li&gt;
&lt;li&gt;Superior performance in structured thinking and logical reasoning&lt;/li&gt;
&lt;li&gt;Excellent at handling nuanced conversations and context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ideal Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code review and analysis&lt;/li&gt;
&lt;li&gt;Complex problem decomposition&lt;/li&gt;
&lt;li&gt;Ethical decision-making processes&lt;/li&gt;
&lt;li&gt;Research and analysis tasks&lt;/li&gt;
&lt;li&gt;Long-form content creation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GPT-4o: The Versatile Performer
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excellent multimodal capabilities (text, vision, audio)&lt;/li&gt;
&lt;li&gt;Strong creative writing and content generation&lt;/li&gt;
&lt;li&gt;Robust API ecosystem and integration options&lt;/li&gt;
&lt;li&gt;Consistent performance across diverse tasks&lt;/li&gt;
&lt;li&gt;Great at following specific formatting instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ideal Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Content generation and creative writing&lt;/li&gt;
&lt;li&gt;Multimodal processing tasks&lt;/li&gt;
&lt;li&gt;API integrations and automation&lt;/li&gt;
&lt;li&gt;Quick prototyping and ideation&lt;/li&gt;
&lt;li&gt;Image analysis and description&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Gemini 2.5: The Technical Powerhouse
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exceptional mathematical and scientific reasoning&lt;/li&gt;
&lt;li&gt;Strong coding capabilities and technical documentation&lt;/li&gt;
&lt;li&gt;Excellent at handling large contexts and complex data&lt;/li&gt;
&lt;li&gt;Superior performance in research and technical analysis&lt;/li&gt;
&lt;li&gt;Great integration with Google's ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Ideal Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scientific research and analysis&lt;/li&gt;
&lt;li&gt;Complex mathematical computations&lt;/li&gt;
&lt;li&gt;Technical documentation&lt;/li&gt;
&lt;li&gt;Data analysis and processing&lt;/li&gt;
&lt;li&gt;Integration with Google services&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Complementary Architecture: Building a Multi-Agent System
&lt;/h2&gt;

&lt;p&gt;Instead of choosing one model, let's design a system where each model handles what they do best. Here's how we can create a complementary agentic architecture:&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation: Python-Based Multi-Agent System
&lt;/h2&gt;

&lt;p&gt;Let's build a practical example that demonstrates how these models can work together. We'll create a research assistant that leverages all three models.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;enum&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Enum&lt;/span&gt;

&lt;span class="c1"&gt;# Mock API clients - Here we're going to have only Mock API 
# but reader can replace with actual API implementations
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;CLAUDE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;GPT4O&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;GEMINI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;
    &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MultiAgentResearchAssistant&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_claude_client&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_gpt4o_client&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_gemini_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_claude_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Initialize Claude client
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Claude Sonnet 4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyst&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_gpt4o_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Initialize GPT-4o client
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GPT-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;creator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_gemini_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Initialize Gemini client
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gemini 2.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technical_expert&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;research_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Orchestrates a comprehensive research process using all three models
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔍 Starting research on: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Phase 1: Claude analyzes and breaks down the topic
&lt;/span&gt;        &lt;span class="n"&gt;analysis_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_claude_analyze_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Phase 2: Gemini conducts technical research
&lt;/span&gt;        &lt;span class="n"&gt;technical_research&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_gemini_technical_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Phase 3: GPT-4o creates comprehensive content
&lt;/span&gt;        &lt;span class="n"&gt;final_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_gpt4o_synthesize_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis_task&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;technical_research&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Phase 4: Claude reviews and provides final insights
&lt;/span&gt;        &lt;span class="n"&gt;final_review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_claude_review_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;analysis_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technical_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;technical_research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;final_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;final_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_summary&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="n"&gt;analysis_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;technical_research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;final_review&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_claude_analyze_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Claude&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s role: Thoughtful analysis and problem decomposition&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Simulate Claude's analytical approach
&lt;/span&gt;        &lt;span class="n"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Analysis of &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:

        1. Core Components:
           - Primary research areas to explore
           - Key stakeholders and perspectives
           - Potential challenges and considerations

        2. Research Strategy:
           - Technical aspects requiring deep expertise
           - Creative elements for engaging presentation
           - Ethical considerations and implications

        3. Success Metrics:
           - Accuracy and depth of information
           - Clarity of presentation
           - Practical applicability
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning_steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;considerations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_gemini_technical_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Gemini&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s role: Deep technical research and data analysis&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Simulate Gemini's technical research capabilities
&lt;/span&gt;        &lt;span class="n"&gt;research&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Technical Research for &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:

        📊 Data Analysis:
        - Latest statistical trends and patterns
        - Mathematical models and algorithms
        - Scientific papers and research findings

        🔬 Technical Implementation:
        - Code examples and technical specifications
        - Performance benchmarks and comparisons
        - Integration possibilities and frameworks

        📈 Quantitative Insights:
        - Market data and growth projections
        - Technical performance metrics
        - Scalability considerations
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technical_research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_points&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sources&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_gpt4o_synthesize_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                       &lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;GPT-4o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s role: Creative synthesis and content generation&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Simulate GPT-4o's content creation capabilities
&lt;/span&gt;        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        # Comprehensive Guide to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        ## Executive Summary
        Based on our multi-faceted analysis, &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; represents a significant 
        opportunity with both technical and strategic implications.

        ## Key Findings
        - Strategic insights from analytical review
        - Technical breakthroughs from research data
        - Implementation roadmap for practical application

        ## Creative Applications
        - Innovative use cases and scenarios
        - Engaging examples and case studies
        - Visual concepts and presentation ideas

        ## Actionable Recommendations
        1. Immediate next steps
        2. Long-term strategic planning
        3. Risk mitigation strategies
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.89&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sections&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recommendations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_claude_review_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Claude&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s role: Final review and quality assurance&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Quality Review:

        ✅ Strengths:
        - Comprehensive coverage of key topics
        - Well-structured and logical flow
        - Balanced technical and strategic perspectives

        🔧 Recommendations:
        - Consider adding more specific examples
        - Strengthen the conclusion with actionable insights
        - Ensure accessibility for diverse audiences

        📋 Final Assessment:
        Content meets high standards for accuracy, clarity, and usefulness.
        Ready for publication with minor enhancements.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quality_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.94&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review_criteria&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;passed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_create_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create a summary of the collaborative process&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        🤝 Collaborative Research Summary:

        Models Involved: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
        Total Tasks: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
        Average Confidence: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

        Process Flow:
        1. Claude provided analytical framework and strategic thinking
        2. Gemini delivered technical depth and data-driven insights
        3. GPT-4o synthesized information into engaging, actionable content
        4. Claude conducted final quality review and validation

        This complementary approach leveraged each model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s unique strengths
        to produce a more comprehensive and valuable outcome.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# Advanced Use Case: Code Review Pipeline
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CodeReviewPipeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assistant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MultiAgentResearchAssistant&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Multi-model code review process
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Claude: Logical analysis and architecture review
&lt;/span&gt;        &lt;span class="n"&gt;claude_review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_claude_code_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Gemini: Technical optimization and performance analysis
&lt;/span&gt;        &lt;span class="n"&gt;gemini_review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_gemini_performance_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# GPT-4o: Documentation and improvement suggestions
&lt;/span&gt;        &lt;span class="n"&gt;gpt4o_review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_gpt4o_documentation_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logical_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;claude_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;performance_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gemini_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documentation_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gpt4o_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;combined_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_calculate_combined_score&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="n"&gt;claude_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gemini_review&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gpt4o_review&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_claude_code_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Claude analyzes code logic and architecture&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_logic_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Logical structure is sound with clear separation of concerns...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.91&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;issues_found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggestions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_gemini_performance_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Gemini analyzes performance and optimization opportunities&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;performance_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Performance bottlenecks identified in data processing loops...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optimizations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;complexity_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;7.2&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_gpt4o_documentation_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;GPT-4o reviews documentation and suggests improvements&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documentation_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Documentation coverage is 73% with opportunities for improvement...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc_coverage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.73&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;improvement_areas&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_calculate_combined_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Calculate a weighted combined score&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_logic_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;performance_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documentation_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;total_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.33&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;total_score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total_score&lt;/span&gt;

&lt;span class="c1"&gt;# Usage Example
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# Initialize the multi-agent system
&lt;/span&gt;    &lt;span class="n"&gt;research_assistant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MultiAgentResearchAssistant&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;code_reviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodeReviewPipeline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Example 1: Research a complex topic
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=== Research Assistant Example ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;research_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;research_assistant&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;research_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implementing Microservices Architecture with Event-Driven Design&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research completed with &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;research_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Example 2: Code review process
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;=== Code Review Example ===&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sample_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    def process_data(data_list):
        result = []
        for item in data_list:
            if item &amp;gt; 0:
                result.append(item * 2)
        return result
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;review_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;code_reviewer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;review_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Code review completed with combined score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;review_result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;combined_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Applications and Benefits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Content Creation Pipeline
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude&lt;/strong&gt;: Analyzes the audience and creates a content strategy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini&lt;/strong&gt;: Researches technical accuracy and data validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-4o&lt;/strong&gt;: Generates engaging, well-formatted content&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Software Development
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude&lt;/strong&gt;: Architectural decisions and code logic review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini&lt;/strong&gt;: Performance optimization and technical implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-4o&lt;/strong&gt;: Documentation, testing strategies, and user interface design&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Research and Analysis
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude&lt;/strong&gt;: Problem decomposition and critical thinking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini&lt;/strong&gt;: Data analysis and scientific methodology&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-4o&lt;/strong&gt;: Report writing and presentation creation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Implementation Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Task Orchestration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TaskOrchestrator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;task_queue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_capabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mathematical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;creative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;synthesis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formatting&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;assign_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Intelligently assign tasks based on model strengths&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capabilities&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_capabilities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;task_type&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;  &lt;span class="c1"&gt;# Default fallback
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
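&lt;p&gt;As a sanity check, the routing logic above can be exercised on its own. The sketch below re-declares a minimal ModelType enum (the article defines its own earlier), so treat it as illustrative rather than the exact definitions used here:&lt;/p&gt;

```python
from enum import Enum

# Stand-in for the ModelType enum defined earlier in the article.
class ModelType(Enum):
    CLAUDE = "claude"
    GEMINI = "gemini"
    GPT4O = "gpt4o"

class TaskOrchestrator:
    def __init__(self):
        self.model_capabilities = {
            ModelType.CLAUDE: ["analysis", "reasoning", "review"],
            ModelType.GEMINI: ["technical", "mathematical", "research"],
            ModelType.GPT4O: ["creative", "synthesis", "formatting"],
        }

    def assign_task(self, task_type: str, content: str) -> ModelType:
        """Route a task to the first model listing it as a capability."""
        for model, capabilities in self.model_capabilities.items():
            if task_type in capabilities:
                return model
        return ModelType.GPT4O  # default fallback

orchestrator = TaskOrchestrator()
assert orchestrator.assign_task("reasoning", "Why does X follow?") is ModelType.CLAUDE
assert orchestrator.assign_task("unknown-type", "...") is ModelType.GPT4O  # fallback path
```

&lt;p&gt;An unmatched task type falls through to the GPT-4o default, which is why the final return line matters.&lt;/p&gt;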



&lt;h3&gt;
  
  
  2. Quality Assurance
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;QualityAssurance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_results&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Validate results across multiple models&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;avg_confidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;avg_confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;consensus_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;TaskResult&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Check if models agree on key points&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="c1"&gt;# Implementation would compare semantic similarity
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# Simplified for example
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
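&lt;p&gt;To see the two gating conditions in action, here is a minimal runnable sketch; the TaskResult dataclass below is a stand-in carrying only the confidence field the check needs:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import List

# Stand-in for the article's TaskResult, reduced to the field used here.
@dataclass
class TaskResult:
    confidence: float

class QualityAssurance:
    @staticmethod
    def validate_results(results: List[TaskResult]) -> bool:
        """Pass only if confidence is high AND at least two models agree."""
        avg_confidence = sum(r.confidence for r in results) / len(results)
        return avg_confidence > 0.8 and len(results) >= 2

# Two high-confidence results pass; a single result fails the count gate.
assert QualityAssurance.validate_results([TaskResult(0.9), TaskResult(0.85)]) is True
assert QualityAssurance.validate_results([TaskResult(0.95)]) is False
```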



&lt;h3&gt;
  
  
  3. Cost Optimization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CostOptimizer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_costs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.015&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# per 1k tokens
&lt;/span&gt;            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.012&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GPT4O&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.018&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;optimize_task_assignment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ModelType&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Assign tasks to minimize cost while maximizing quality&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;assignments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Logic to assign based on cost-effectiveness
&lt;/span&gt;            &lt;span class="n"&gt;assignments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_best_model_for_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;assignments&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
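&lt;p&gt;The per-1k-token rates above (illustrative figures, not live provider pricing) translate into a simple per-call estimate:&lt;/p&gt;

```python
from enum import Enum

# Stand-in ModelType; the rates mirror the illustrative numbers above.
class ModelType(Enum):
    CLAUDE = "claude"
    GEMINI = "gemini"
    GPT4O = "gpt4o"

MODEL_COSTS = {  # dollars per 1k tokens (illustrative)
    ModelType.CLAUDE: 0.015,
    ModelType.GEMINI: 0.012,
    ModelType.GPT4O: 0.018,
}

def estimate_cost(model: ModelType, tokens: int) -> float:
    """Estimated dollar cost of one call: tokens / 1000 * per-1k rate."""
    return tokens / 1000 * MODEL_COSTS[model]

# 2,500 tokens on Gemini: 2.5 * 0.012 = $0.03
assert abs(estimate_cost(ModelType.GEMINI, 2500) - 0.03) < 1e-9
```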



&lt;h2&gt;
  
  
  The Future of Complementary AI
&lt;/h2&gt;

&lt;p&gt;As AI models continue to evolve, the concept of complementary architectures becomes even more powerful. We're moving toward a future where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Specialized Models&lt;/strong&gt;: Each model excels in specific domains&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligent Orchestration&lt;/strong&gt;: Systems automatically choose the best model for each task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Learning&lt;/strong&gt;: Models learn from each other's outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration&lt;/strong&gt;: Users don't need to know which model is handling their request&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The future of AI isn't about one model dominating all others — it's about creating intelligent systems that leverage the unique strengths of each model. By building complementary architectures with Claude Sonnet 4, GPT-4o, and Gemini 2.5, we can create more robust, accurate, and efficient AI solutions.&lt;/p&gt;

&lt;p&gt;The examples and code provided in this article demonstrate practical approaches to implementing these complementary systems. As you build your own multi-agent architectures, remember that the goal isn't to replace human intelligence but to augment it with the best that each AI model has to offer.&lt;/p&gt;

&lt;p&gt;Start small, experiment with different task assignments, and gradually build more sophisticated orchestration systems. The complementary approach not only provides better results but also creates more resilient and adaptable AI solutions for the future.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>Code Less, Prompt Better: Unlocking Python's Built-in LLM Enhancers</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Fri, 16 May 2025 22:07:23 +0000</pubDate>
      <link>https://forem.com/angu10/code-less-prompt-better-unlocking-pythons-built-in-llm-enhancers-3hdd</link>
      <guid>https://forem.com/angu10/code-less-prompt-better-unlocking-pythons-built-in-llm-enhancers-3hdd</guid>
      <description>&lt;p&gt;In the rapidly evolving landscape of Large Language Models (LLMs), effective prompt engineering has become a crucial skill. While much attention is given to the art of crafting effective prompts, less focus has been placed on how to efficiently manage these prompts programmatically. Python, with its rich set of built-in features, offers powerful tools to dynamically construct, optimize, and manage LLM prompts.&lt;br&gt;
This article explores how Python's built-in features can transform your approach to LLM prompt engineering, making your code more efficient, maintainable, and powerful.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;1. Using locals() for Dynamic Context Injection&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
When working with LLMs, we often need to inject contextual information into our prompts. The traditional approach involves manual string formatting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_response(user_name, user_query, previous_context):
    prompt = f"""
    User name: {user_name}
    User query: {user_query}
    Previous context: {previous_context}

    Please respond to the user's query considering the context above.
    """

    return call_llm_api(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well for simple cases, but becomes unwieldy as the number of variables increases. It's also error-prone – you might forget to include a variable or update a variable name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution with locals()&lt;/strong&gt;&lt;br&gt;
Python's locals() function returns a dictionary containing all local variables in the current scope. We can leverage this to automatically include all relevant context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_response(user_name, user_query, previous_context, user_preferences=None, user_history=None):
    # All local variables are now accessible
    context_dict = locals()

    # Build a dynamic prompt section with all available context
    context_sections = []
    for key, value in context_dict.items():
        if value is not None:  # Only include non-None values
            context_sections.append(f"{key}: {value}")

    context_text = "\n".join(context_sections)

    prompt = f"""
    Context information:
    {context_text}

    Please respond to the user's query considering the context above.
    """

    return call_llm_api(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Automatic variable inclusion: If you add a new parameter to your function, it's automatically included in the context.&lt;br&gt;
Reduced errors: No need to manually update string formatting when variables change.&lt;br&gt;
Cleaner code: Separates the mechanism of context injection from the specific variables.&lt;/p&gt;
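&lt;p&gt;The pattern is easy to verify with a stub standing in for call_llm_api (the stub simply echoes the prompt back so the assembled context can be inspected):&lt;/p&gt;

```python
# Stub in place of a real LLM call: echoes the prompt so we can inspect it.
def call_llm_api(prompt):
    return prompt

def generate_response(user_name, user_query, previous_context=None):
    context_dict = locals()  # snapshot of the parameters at this point
    context_sections = [f"{key}: {value}" for key, value in context_dict.items()
                        if value is not None]  # only include non-None values
    prompt = "Context information:\n" + "\n".join(context_sections)
    return call_llm_api(prompt)

out = generate_response("Ada", "What is a dict?")
assert "user_name: Ada" in out
assert "previous_context" not in out  # None values are omitted
```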
&lt;h2&gt;
  
  
  2. Using inspect for Function Documentation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
When creating LLM prompts that involve function execution or code generation, providing accurate function documentation is crucial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_function_prompt(func_name, params):
    prompt = f"""
    Create a Python function named '{func_name}' with the following parameters:
    {params}
    """
    return prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach requires manually specifying function details, which can be tedious and error-prone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution with inspect&lt;/strong&gt;&lt;br&gt;
Python's inspect module allows us to extract rich metadata from functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import inspect

def create_function_prompt(func_reference):
    # Get the function signature
    signature = inspect.signature(func_reference)

    # Get the function docstring
    doc = inspect.getdoc(func_reference) or "No documentation available"

    # Get source code if available
    try:
        source = inspect.getsource(func_reference)
    except (OSError, TypeError):  # getsource fails for builtins and REPL-defined functions
        source = "Source code not available"

    prompt = f"""
    Function name: {func_reference.__name__}

    Signature: {signature}

    Documentation:
    {doc}

    Original source code:
    {source}

    Please create an improved version of this function.
    """

    return prompt

# Example usage
def example_func(a, b=10):
    """This function adds two numbers together."""
    return a + b

improved_function_prompt = create_function_prompt(example_func)
# Send to LLM for improvement
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dynamically extracts all relevant information about the function, making the prompt much more informative.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Context Management with Class Attributes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
Managing conversation history and context with LLMs often leads to repetitive code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conversation_history = []

def chat_with_llm(user_input):
    # Manually build the prompt with history
    prompt = "Previous conversation:\n"
    for entry in conversation_history:
        prompt += f"{entry['role']}: {entry['content']}\n"

    prompt += f"User: {user_input}\n"
    prompt += "Assistant: "

    response = call_llm_api(prompt)

    # Update history
    conversation_history.append({"role": "User", "content": user_input})
    conversation_history.append({"role": "Assistant", "content": response})

    return response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Solution with Class Attributes and __dict__&lt;/strong&gt;&lt;br&gt;
We can create a conversation manager class that uses Python's object attributes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class ConversationManager:
    def __init__(self, system_prompt=None, max_history=10):
        self.history = []
        self.system_prompt = system_prompt
        self.max_history = max_history
        self.user_info = {}
        self.conversation_attributes = {
            "tone": "helpful",
            "style": "concise",
            "knowledge_level": "expert"
        }

    def add_user_info(self, **kwargs):
        """Add user-specific information to the conversation context."""
        self.user_info.update(kwargs)

    def set_attribute(self, key, value):
        """Set a conversation attribute."""
        self.conversation_attributes[key] = value

    def build_prompt(self, user_input):
        """Build a complete prompt using object attributes."""
        prompt_parts = []

        # Add system prompt if available
        if self.system_prompt:
            prompt_parts.append(f"System: {self.system_prompt}")

        # Add conversation attributes
        prompt_parts.append("Conversation attributes:")
        for key, value in self.conversation_attributes.items():
            prompt_parts.append(f"- {key}: {value}")

        # Add user info if available
        if self.user_info:
            prompt_parts.append("\nUser information:")
            for key, value in self.user_info.items():
                prompt_parts.append(f"- {key}: {value}")

        # Add conversation history
        if self.history:
            prompt_parts.append("\nConversation history:")
            for entry in self.history[-self.max_history:]:
                prompt_parts.append(f"{entry['role']}: {entry['content']}")

        # Add current user input
        prompt_parts.append(f"\nUser: {user_input}")
        prompt_parts.append("Assistant:")

        return "\n".join(prompt_parts)

    def chat(self, user_input):
        """Process a user message and get response from LLM."""
        prompt = self.build_prompt(user_input)

        response = call_llm_api(prompt)

        # Update history
        self.history.append({"role": "User", "content": user_input})
        self.history.append({"role": "Assistant", "content": response})

        return response

    def get_state_as_dict(self):
        """Return a dictionary of the conversation state using __dict__."""
        return self.__dict__

    def save_state(self, filename):
        """Save the conversation state to a file."""
        import json
        with open(filename, 'w') as f:
            json.dump(self.get_state_as_dict(), f)

    def load_state(self, filename):
        """Load the conversation state from a file."""
        import json
        with open(filename, 'r') as f:
            state = json.load(f)
            self.__dict__.update(state)


# Using this approach:

# Create a conversation manager
convo = ConversationManager(system_prompt="You are a helpful assistant.")

# Add user information
convo.add_user_info(name="John", expertise="beginner", interests=["Python", "AI"])

# Set conversation attributes
convo.set_attribute("tone", "friendly")

# Chat with the LLM
response = convo.chat("Can you help me understand how Python dictionaries work?")
print(response)

# Later, save the conversation state
convo.save_state("conversation_backup.json")

# And load it back
new_convo = ConversationManager()
new_convo.load_state("conversation_backup.json")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;4. Using dir() for Object Exploration&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
When working with complex objects or APIs, it can be challenging to know what data is available to include in prompts:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

def generate_data_analysis_prompt(dataset):
    # Manually specifying what we think is available
    prompt = f"""
    Dataset name: {dataset.name}
    Number of rows: {len(dataset)}

    Please analyze this dataset.
    """
    return prompt

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The Solution with dir()&lt;/strong&gt;&lt;br&gt;
Python's dir() function lets us dynamically discover object attributes and methods:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
def generate_data_analysis_prompt(dataset):
    # Discover available attributes
    attributes = dir(dataset)

    # Filter out private attributes (those starting with _)
    public_attrs = [attr for attr in attributes if not attr.startswith('_')]

    # Build metadata section
    metadata = []
    for attr in public_attrs:
        try:
            value = getattr(dataset, attr)
            # Only include non-method attributes with simple values
            if not callable(value) and not hasattr(value, '__dict__'):
                metadata.append(f"{attr}: {value}")
        except Exception:
            pass  # Skip attributes that can't be accessed

    metadata_text = "\n".join(metadata)

    prompt = f"""
    Dataset metadata:
    {metadata_text}

    Please analyze this dataset based on the metadata above.
    """

    return prompt


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This approach automatically discovers and includes relevant metadata without requiring us to know the exact structure of the dataset object in advance.&lt;/p&gt;
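&lt;p&gt;A quick, self-contained check of the discovery logic, using a toy dataset class in place of a real data object:&lt;/p&gt;

```python
# Toy stand-in for a dataset object, used to exercise the dir() filtering.
class Dataset:
    def __init__(self):
        self.name = "sales_q1"
        self.rows = 1200
        self._cache = {}          # private: filtered out by the underscore check

    def summary(self):            # method: filtered out by the callable check
        return "..."

def collect_metadata(obj):
    """Collect simple, public, non-method attributes as 'name: value' lines."""
    lines = []
    for attr in dir(obj):         # dir() returns attribute names sorted alphabetically
        if attr.startswith('_'):
            continue
        value = getattr(obj, attr)
        if not callable(value) and not hasattr(value, '__dict__'):
            lines.append(f"{attr}: {value}")
    return "\n".join(lines)

assert collect_metadata(Dataset()) == "name: sales_q1\nrows: 1200"
```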

&lt;h2&gt;
  
  
  &lt;strong&gt;5. String Manipulation for Prompt Cleaning&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;&lt;br&gt;
User inputs and other text data often contain formatting issues that can affect LLM performance:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

def process_document(document_text):
    prompt = f"""
    Document:
    {document_text}

    Please summarize the key points from this document.
    """
    return call_llm_api(prompt)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The Solution with String Methods&lt;/strong&gt;&lt;br&gt;
Python's rich set of string manipulation methods can clean and normalize text:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

def process_document(document_text):
    # Normalize line breaks first: collapsing all whitespace before this
    # point would remove every newline and make this step a no-op
    cleaned_text = document_text.replace('\r\n', '\n').replace('\r', '\n')

    # Collapse runs of spaces and tabs within each line
    cleaned_text = '\n'.join(' '.join(line.split()) for line in cleaned_text.split('\n'))

    # Limit length (many LLMs have token limits)
    max_chars = 5000
    if len(cleaned_text) &amp;gt; max_chars:
        cleaned_text = cleaned_text[:max_chars] + "... [truncated]"

    # Replace problematic characters
    for char, replacement in [('\u2018', "'"), ('\u2019', "'"), ('\u201c', '"'), ('\u201d', '"')]:
        cleaned_text = cleaned_text.replace(char, replacement)

    prompt = f"""
    Document:
    {cleaned_text}

    Please summarize the key points from this document.
    """

    return call_llm_api(prompt)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
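&lt;p&gt;The character-replacement step is worth a spot check, since smart quotes are the most common offenders in pasted text:&lt;/p&gt;

```python
def normalize_quotes(text):
    """Replace curly quotes with their plain ASCII equivalents."""
    for char, replacement in [('\u2018', "'"), ('\u2019', "'"),
                              ('\u201c', '"'), ('\u201d', '"')]:
        text = text.replace(char, replacement)
    return text

# Curly-quoted input comes back as plain ASCII
assert normalize_quotes('\u201cdon\u2019t\u201d') == '"don\'t"'
```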



&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Python's built-in features offer powerful capabilities for enhancing LLM prompts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Context:&lt;/strong&gt; Using locals() and __dict__ to automatically include relevant variables&lt;br&gt;
&lt;strong&gt;Introspection:&lt;/strong&gt; Using inspect and dir() to extract rich metadata from objects and functions&lt;br&gt;
&lt;strong&gt;String Manipulation:&lt;/strong&gt; Using Python's string methods to clean and normalize text&lt;/p&gt;

&lt;p&gt;By leveraging these built-in features, you can create more robust, maintainable, and dynamic LLM interactions. The techniques in this article can help you move beyond static prompt templates to create truly adaptive and context-aware LLM applications.&lt;br&gt;
Most importantly, these approaches scale well as your LLM applications become more complex, allowing you to maintain clean, readable code while supporting sophisticated prompt engineering techniques.&lt;br&gt;
Whether you're building a simple chatbot or a complex AI assistant, Python's built-in features can help you create more effective LLM interactions with less code and fewer errors.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>AI in the Clinical Arena: Llama 4 Scout vs Claude 3.7 Statistical Showdown</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Fri, 11 Apr 2025 06:04:22 +0000</pubDate>
      <link>https://forem.com/angu10/ai-in-the-clinical-arena-llama-4-scout-vs-claude-37-statistical-showdown-2m43</link>
      <guid>https://forem.com/angu10/ai-in-the-clinical-arena-llama-4-scout-vs-claude-37-statistical-showdown-2m43</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;As artificial intelligence advances, there is growing interest in evaluating how different AI models perform in specialized domains like clinical trial statistics. This article compares two state-of-the-art large language models — Llama 4 Scout Reasoning and Claude 3.7 — on their ability to solve common statistical problems in clinical trials. It’s important to emphasize that this study examines only a limited set of three clinical trial problems and should not be interpreted as a comprehensive assessment of these models’ overall capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Llama 4 Scout Instruct Model
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce1akyftlyw2brixi709.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fce1akyftlyw2brixi709.png" alt="Image description" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73vwb2fqhx2ryuehf2wz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73vwb2fqhx2ryuehf2wz.png" alt="Image description" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmi5le9dn9vslq0l3hov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbmi5le9dn9vslq0l3hov.png" alt="Image description" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude 3.7
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgr8ystkgqldyvd68f0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flgr8ystkgqldyvd68f0k.png" alt="Image description" width="800" height="1065"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08rsnioau33t3dmg3a1b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08rsnioau33t3dmg3a1b.png" alt="Image description" width="786" height="828"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiac30d0svr17bjqmpr0i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiac30d0svr17bjqmpr0i.png" alt="Image description" width="796" height="980"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem Selection
&lt;/h2&gt;

&lt;p&gt;Three foundational clinical trial statistical problems were selected to evaluate the models:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Treatment Effect Analysis&lt;/strong&gt;: Calculating response rates, absolute risk reduction (ARR), and number needed to treat (NNT) in a cancer treatment study comparing experimental and control arms&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-inferiority Trial Design&lt;/strong&gt;: Determining the minimum cure rate required for a new antibiotic to be considered non-inferior to the standard of care&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interim Analysis Decision-Making&lt;/strong&gt;: Applying O’Brien-Fleming boundaries to decide whether to stop a trial early based on interim results&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluation Criteria
&lt;/h2&gt;

&lt;p&gt;The outputs from both models were compared across several dimensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mathematical accuracy&lt;/li&gt;
&lt;li&gt;Statistical reasoning approach&lt;/li&gt;
&lt;li&gt;Clarity of explanation&lt;/li&gt;
&lt;li&gt;Contextual understanding&lt;/li&gt;
&lt;li&gt;Presentation format&lt;/li&gt;
&lt;li&gt;Result interpretation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Detailed Findings
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Mathematical Precision
&lt;/h2&gt;

&lt;p&gt;Both models demonstrated excellent mathematical precision, arriving at identical numerical answers for all three problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In Problem 1, both correctly calculated the response rates (55.6% vs 44.4%), ARR (11.2%), and NNT (9)&lt;/li&gt;
&lt;li&gt;In Problem 2, both determined the minimum acceptable cure rate to be 70%&lt;/li&gt;
&lt;li&gt;In Problem 3, both correctly concluded that the trial should not be stopped based on the interim analysis&lt;/li&gt;
&lt;/ul&gt;
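&lt;p&gt;These three results can be reproduced with a few lines of arithmetic. The sketch below uses the rates and decision values reported in this article; the non-inferiority inputs (an 80% standard-of-care cure rate and a 10-percentage-point margin) are assumptions chosen to illustrate the calculation, since the article reports only the 70% answer.&lt;/p&gt;

```python
import math

# Problem 1: treatment effect (rates as reported in the article)
p_exp, p_ctrl = 0.556, 0.444
arr = p_exp - p_ctrl              # absolute risk reduction = 11.2%
nnt = math.ceil(1 / arr)          # number needed to treat, rounded up = 9

# Problem 2: non-inferiority threshold (inputs assumed for illustration;
# the article reports only the 70% result)
soc_cure_rate, ni_margin = 0.80, 0.10
min_cure_rate = soc_cure_rate - ni_margin   # = 0.70

# Problem 3: O'Brien-Fleming interim decision (values from the article)
p_value, boundary = 0.01, 0.0001
stop_early = boundary >= p_value  # stop only if p-value is at or below boundary

print(round(arr, 3), nnt, round(min_cure_rate, 2), stop_early)
```

&lt;p&gt;Running this confirms the figures both models produced: an ARR of 0.112, an NNT of 9, a 70% minimum cure rate, and a decision not to stop the trial at interim.&lt;/p&gt;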

&lt;h2&gt;
  
  
  Approach to Statistical Reasoning
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Llama 4 Scout Reasoning
&lt;/h2&gt;

&lt;p&gt;Llama 4 Scout approached problems with a highly structured methodology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Began by clearly organizing given information in bulleted lists&lt;/li&gt;
&lt;li&gt;Used explicit section headings to demarcate reasoning steps&lt;/li&gt;
&lt;li&gt;Provided direct formula applications with clear variable substitutions&lt;/li&gt;
&lt;li&gt;Included practical interpretations of the final statistical outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Claude 3.7
&lt;/h2&gt;

&lt;p&gt;Claude 3.7 demonstrated a more narrative reasoning style:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Used numbered steps with detailed explanations before formula application&lt;/li&gt;
&lt;li&gt;Provided more extensive context about the statistical principles being applied&lt;/li&gt;
&lt;li&gt;Explained the reasoning behind formula selection&lt;/li&gt;
&lt;li&gt;Included additional interpretation of why certain approaches were appropriate&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Explanation Quality
&lt;/h2&gt;

&lt;p&gt;The models differed somewhat in their explanatory approaches:&lt;/p&gt;

&lt;h2&gt;
  
  
  Llama 4 Scout Reasoning
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Provided concise but complete explanations&lt;/li&gt;
&lt;li&gt;Excellent at clarifying practical implications (e.g., “This means we would need to treat 9 patients with the experimental treatment instead of the control treatment to achieve one additional response”)&lt;/li&gt;
&lt;li&gt;Included additional context about threshold interpretations&lt;/li&gt;
&lt;li&gt;Was explicit about Type I error control in the interim analysis problem&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Claude 3.7
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Offered more detailed contextual explanations of statistical concepts&lt;/li&gt;
&lt;li&gt;Provided more extensive rationale for calculation approaches&lt;/li&gt;
&lt;li&gt;Explained why certain statistical formulas were appropriate in each context&lt;/li&gt;
&lt;li&gt;Included more discussion of underlying statistical principles (e.g., family-wise error rate)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Visual Presentation
&lt;/h2&gt;

&lt;p&gt;The models showed distinct presentation styles:&lt;/p&gt;

&lt;h2&gt;
  
  
  Llama 4 Scout Reasoning
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;More visually structured with effective use of bulleted lists&lt;/li&gt;
&lt;li&gt;Clear section labeling and organization&lt;/li&gt;
&lt;li&gt;Highlighted key information more prominently&lt;/li&gt;
&lt;li&gt;More consistent formatting throughout responses&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Claude 3.7
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;More narrative in style with detailed steps&lt;/li&gt;
&lt;li&gt;Used numbered steps with explanatory headings&lt;/li&gt;
&lt;li&gt;Integrated formulas within explanatory text&lt;/li&gt;
&lt;li&gt;Maintained professional presentation but with less visual structuring&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complementary Strengths&lt;/strong&gt;&lt;br&gt;
This comparative analysis revealed that both models possess strong capabilities in clinical trial statistical reasoning, but with slightly different emphases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Llama 4 Scout Reasoning&lt;/strong&gt; excelled in structural organization, practical interpretation, and concise presentation of statistical concepts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude 3.7&lt;/strong&gt; demonstrated particular strength in contextual explanation, statistical rationale, and discussion of underlying principles.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Practical Implications
&lt;/h2&gt;

&lt;p&gt;For professionals working in clinical trial statistics, the choice between these models might depend on specific needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For quick statistical calculations with a clear structure, Llama 4 Scout Reasoning’s concise and well-organized approach might be preferred.&lt;/li&gt;
&lt;li&gt;For a deeper understanding of statistical principles, Claude 3.7’s more detailed explanations might offer greater educational value.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, the mathematical precision demonstrated by both models suggests that either would be reliable for calculating vital clinical trial statistics.&lt;/p&gt;


&lt;h1&gt;
  
  
  Comparing Llama 4 Scout Reasoning vs Claude 3.7 in Clinical Trial Problem Solving
&lt;/h1&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Llama 4 Scout Reasoning&lt;/th&gt;
&lt;th&gt;Claude 3.7&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Information Organization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Used bulleted lists to highlight given information&lt;/td&gt;
&lt;td&gt;Presented information in numbered steps with headings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Structure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clear section headings with step numbers&lt;/td&gt;
&lt;td&gt;Numbered steps with detailed headings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Problem 1: Response Rates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Experimental arm rate&lt;/td&gt;
&lt;td&gt;55.6%&lt;/td&gt;
&lt;td&gt;55.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control arm rate&lt;/td&gt;
&lt;td&gt;44.4%&lt;/td&gt;
&lt;td&gt;44.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ARR calculation&lt;/td&gt;
&lt;td&gt;0.556 - 0.444 = 0.112 = 11.2%&lt;/td&gt;
&lt;td&gt;55.6% - 44.4% = 11.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NNT calculation&lt;/td&gt;
&lt;td&gt;1/0.112 = 8.93 → 9&lt;/td&gt;
&lt;td&gt;1/0.112 = 9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Additional context&lt;/td&gt;
&lt;td&gt;Explained meaning of NNT (9 patients for one additional response)&lt;/td&gt;
&lt;td&gt;Provided explanation of why ARR used experimental minus control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Problem 2: Non-inferiority Trial&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concept explanation&lt;/td&gt;
&lt;td&gt;Concise explanation of non-inferiority concept&lt;/td&gt;
&lt;td&gt;More detailed explanation of non-inferiority margin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calculation approach&lt;/td&gt;
&lt;td&gt;Direct formula application&lt;/td&gt;
&lt;td&gt;Step-by-step explanation with formula justification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Final answer&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Additional explanation&lt;/td&gt;
&lt;td&gt;Added what happens if cure rate is below/above threshold&lt;/td&gt;
&lt;td&gt;Included context about the meaning of non-inferiority margin&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Problem 3: O'Brien-Fleming Boundaries&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decision framework&lt;/td&gt;
&lt;td&gt;Clear comparison of p-value to boundary&lt;/td&gt;
&lt;td&gt;Detailed explanation of boundary concept&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decision logic&lt;/td&gt;
&lt;td&gt;p-value (0.01) &amp;gt; boundary (0.0001) → don't stop&lt;/td&gt;
&lt;td&gt;Same conclusion with more contextual explanation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Additional explanation&lt;/td&gt;
&lt;td&gt;Included explanation of Type I error control&lt;/td&gt;
&lt;td&gt;Discussed family-wise error rate control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall Characteristics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Formatting style&lt;/td&gt;
&lt;td&gt;More visually structured with bulleted lists&lt;/td&gt;
&lt;td&gt;More narrative with detailed steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mathematical accuracy&lt;/td&gt;
&lt;td&gt;Identical answers across all problems&lt;/td&gt;
&lt;td&gt;Identical answers across all problems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Result interpretation&lt;/td&gt;
&lt;td&gt;More explicit interpretation of final results&lt;/td&gt;
&lt;td&gt;More context on the statistical principles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Explanation depth&lt;/td&gt;
&lt;td&gt;Concise but complete&lt;/td&gt;
&lt;td&gt;More detailed statistical context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This limited comparison suggests that both Llama 4 Scout Reasoning and Claude 3.7 demonstrate strong capabilities in solving clinical trial statistical problems. Llama 4 Scout, however, is open-weights, so it can also be fine-tuned on your own data, which may make it more powerful for domain-specific work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It’s worth emphasizing that this analysis is based on only three specific problems and should not be extrapolated to represent overall model capabilities across the broad and complex domain of clinical trial statistics. A more comprehensive evaluation would require testing across a broader range of problem types, complexity levels, and specialized statistical methods used in clinical trials.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>llama</category>
      <category>ai</category>
      <category>clinical</category>
      <category>llm</category>
    </item>
    <item>
      <title>Document Whisperer: Llama-4-Scout and the Future of Intelligent Content Extraction</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Sun, 06 Apr 2025 05:28:55 +0000</pubDate>
      <link>https://forem.com/angu10/document-whisperer-llama-4-scout-and-the-future-of-intelligent-content-extraction-1g0k</link>
      <guid>https://forem.com/angu10/document-whisperer-llama-4-scout-and-the-future-of-intelligent-content-extraction-1g0k</guid>
      <description>&lt;p&gt;In today's data-driven world, the ability to quickly extract insights from documents is becoming increasingly valuable. Whether you're a researcher sifting through academic papers, a legal professional analyzing contracts, or a business analyst reviewing reports, the challenge of efficiently navigating through lengthy documents remains universal.&lt;/p&gt;

&lt;p&gt;Enter Llama-4-Scout, Meta's latest language model that's showing remarkable capabilities in document question-answering tasks. Released on April 5, 2025, Scout is a groundbreaking addition to the new Llama 4 family, specifically designed to be more reliable, precise, and effective at finding and extracting information from documents.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes Llama-4-Scout Special?
&lt;/h2&gt;

&lt;p&gt;Llama-4-Scout utilizes a Mixture-of-Experts (MoE) architecture with 17 billion active parameters (out of 109 billion total parameters across 16 experts) that strikes an optimal balance between performance and efficiency. What truly sets Scout apart is its unprecedented 10-million token context window, making it best in class for long-context tasks like summarizing huge codebases and analyzing extensive documents.&lt;/p&gt;

&lt;p&gt;Scout was pretrained on approximately 40 trillion tokens of multimodal data and excels at:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Document comprehension&lt;/strong&gt;: Understanding context across extremely long documents&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Information retrieval&lt;/strong&gt;: Precisely locating specific details in complex content&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source attribution&lt;/strong&gt;: Providing clear references to where information was found&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multimodal reasoning&lt;/strong&gt;: Visual recognition, image reasoning, captioning, and answering questions about document visuals&lt;/p&gt;

&lt;p&gt;This focused design makes Scout particularly well-suited for professional document analysis tasks that require both accuracy and reasonable computational resources. Remarkably, the model fits on a single H100 GPU when quantized to Int4, making it accessible for organizations without massive computing infrastructure.&lt;/p&gt;
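&lt;p&gt;The single-GPU claim is easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes 4 bits per parameter for Int4 weights and an 80 GB H100; it counts weights only, ignoring KV cache and activations, so the real headroom is smaller.&lt;/p&gt;

```python
# Rough memory estimate for Llama-4-Scout quantized to Int4 (weights only).
total_params = 109e9            # 109B total parameters across 16 experts
bits_per_param = 4              # Int4 quantization
weight_gb = total_params * bits_per_param / 8 / 1e9   # bytes -> GB

h100_memory_gb = 80             # 80 GB H100 card (assumed SXM/PCIe variant)
fits = h100_memory_gb > weight_gb

print(round(weight_gb, 1), fits)   # ~54.5 GB of weights, fits on one card
```

&lt;p&gt;Roughly 54.5 GB of Int4 weights against 80 GB of HBM leaves room for activations and a modest KV cache, which is consistent with the "fits on a single H100" claim.&lt;/p&gt;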

&lt;h2&gt;
  
  
  Real-World Example: Document Q&amp;amp;A on the AGENTS Framework
&lt;/h2&gt;

&lt;p&gt;I recently had the opportunity to test Llama-4-Scout's capabilities by building a simple PDF Q&amp;amp;A application with a research paper on the AGENTS framework (a system for autonomous language agents). The results demonstrated Scout's remarkable ability to extract, understand, and contextualize information from technical documents. Here's what I observed:&lt;/p&gt;

&lt;h2&gt;
  
  
  Source Paper: &lt;a href="https://arxiv.org/pdf/2309.07870" rel="noopener noreferrer"&gt;https://arxiv.org/pdf/2309.07870&lt;/a&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Example 1: Identifying Core Framework Components
&lt;/h2&gt;

&lt;p&gt;When asked about the main features of the framework described in the document, Scout successfully extracted and summarized the key components:&lt;/p&gt;

&lt;p&gt;Query: "What are the main features of the AGENTS framework?"&lt;/p&gt;

&lt;p&gt;Scout's Response: "The main features of the AGENTS framework include planning, memory (specifically long-short term memory implementation), tool usage capabilities, multi-agent communication protocols, and fine-grained symbolic control through SOPs (Standard Operating Procedures). Source: Page 4, Framework Architecture section"&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 2: Understanding Technical Implementation Details
&lt;/h2&gt;

&lt;p&gt;Scout excelled at extracting specific technical implementation details related to memory systems:&lt;/p&gt;

&lt;p&gt;Query: "How does long-short term memory work in AGENTS?"&lt;/p&gt;

&lt;p&gt;Scout's Response: "The AGENTS framework implements a dual-memory system where long-term memories are constructed as action histories embedded by sentence-transformers and stored in VectorDB for semantic search retrieval. In contrast, short-term memories (working memories) are maintained in natural language form and dynamically updated by an LLM through a carefully tuned prompt system. Source: Pages 7-8, Memory Architecture section"&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 3: Explaining Human-Agent Interaction Mechanisms
&lt;/h2&gt;

&lt;p&gt;The model demonstrated its ability to identify specific configuration details that enable key functionalities:&lt;/p&gt;

&lt;p&gt;Query: "How does the AGENTS framework support human-agent interaction?"&lt;/p&gt;

&lt;p&gt;Scout's Response: "The AGENTS framework supports human-agent interaction through a simple but effective configuration mechanism. Users can change an agent's 'is_human' field to 'True' in the config file, which enables humans to take the role of an agent and interact with other language agents in both single-agent and multi-agent scenarios. Source: Page 12, Human-Agent Interaction section"&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 4: Clarifying Domain-Specific Terminology
&lt;/h2&gt;

&lt;p&gt;When asked about specialized terminology in the document, Scout provided clear and accurate explanations:&lt;/p&gt;

&lt;p&gt;Query: "What is SOP in the AGENTS framework?"&lt;/p&gt;

&lt;p&gt;Scout's Response: "In the AGENTS framework, SOP (Standard Operating Procedures) is a symbolic plan that provides fine-grained control of agents' behaviors. Technically, it's implemented as a graph of multiple states that define different situations an agent may encounter, with explicit transition rules between these states. This architecture makes agents' behavior more stable and predictable across various scenarios. Source: Page 9, Agent Control Mechanisms section"&lt;/p&gt;

&lt;h2&gt;
  
  
  Example 5: Identifying Ecosystem Components
&lt;/h2&gt;

&lt;p&gt;Scout successfully extracted information about the broader ecosystem supporting the framework:&lt;/p&gt;

&lt;p&gt;Query: "What is the Agent Hub in the AGENTS framework?"&lt;/p&gt;

&lt;p&gt;Scout's Response: "The Agent Hub is a platform component of the AGENTS framework that enables users to share their fine-tuned language agents and search/download useful language agents created by others. This significantly reduces the effort required for designing, testing, and tuning language agents from scratch, creating a collaborative ecosystem for agent development. Source: Page 15, Ecosystem and Tools section"&lt;/p&gt;

&lt;h2&gt;
  
  
  Building Your Own Document Intelligence Tool
&lt;/h2&gt;

&lt;p&gt;The ability to build custom applications around Llama-4-Scout is straightforward through providers like Together.ai. I created a simple Streamlit application that allows users to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload any PDF document&lt;/li&gt;
&lt;li&gt;Process both the text and visual content&lt;/li&gt;
&lt;li&gt;Ask questions in natural language&lt;/li&gt;
&lt;li&gt;Receive answers with source references&lt;/li&gt;
&lt;li&gt;Evaluate answer quality with multiple criteria&lt;/li&gt;
&lt;/ul&gt;
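&lt;p&gt;The core of such an app is a short retrieval-and-ask loop. The sketch below is illustrative, not the code from my app: the &lt;code&gt;build_prompt&lt;/code&gt; helper is my own, and the model id and Together SDK usage reflect how I called the API and should be verified against Together's current documentation. The SDK import is deferred so the sketch loads without the package installed.&lt;/p&gt;

```python
MODEL = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed Together model id

def build_prompt(pages, question):
    """Concatenate extracted PDF pages and ask for an answer with a page citation."""
    numbered = "\n\n".join(
        f"[Page {i + 1}]\n{text}" for i, text in enumerate(pages)
    )
    return (
        "Answer the question using only the document below. "
        "Cite the page the answer came from.\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )

def ask(pages, question):
    # Requires `pip install together` and TOGETHER_API_KEY in the environment.
    from together import Together  # deferred so the sketch imports without the SDK
    client = Together()
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": build_prompt(pages, question)}],
    )
    return resp.choices[0].message.content
```

&lt;p&gt;A Streamlit front end then only needs a file uploader, a PDF text extractor to produce &lt;code&gt;pages&lt;/code&gt;, and a text input wired to &lt;code&gt;ask&lt;/code&gt;.&lt;/p&gt;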

&lt;p&gt;This type of application can transform how teams work with documents, significantly reducing the time spent searching for information and improving the accuracy of extracted insights. Because Scout can process images in documents, it can also work with charts, tables, diagrams, and other visual elements that are often critical for full document understanding; the model is "optimized for visual recognition, image reasoning, captioning, and answering general questions about an image."&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Capabilities and Performance
&lt;/h2&gt;

&lt;p&gt;Llama-4-Scout demonstrates impressive performance relative to competing models. In comparative evaluations, Scout has shown "superior performance relative to contemporary models such as Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across recognized benchmark datasets."&lt;/p&gt;

&lt;p&gt;What makes Scout particularly practical is its efficiency. Scout "fits on a single H100 GPU when quantized to Int4" while still delivering high-quality results. This efficiency means organizations can implement advanced document intelligence without requiring massive computational resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looking Ahead: The Future of Document Intelligence
&lt;/h2&gt;

&lt;p&gt;As models like Llama-4-Scout continue to evolve, we can expect even more sophisticated document intelligence capabilities. Future developments will likely include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deeper reasoning across multiple documents&lt;/li&gt;
&lt;li&gt;More nuanced understanding of domain-specific content&lt;/li&gt;
&lt;li&gt;Better handling of ambiguity and uncertain information&lt;/li&gt;
&lt;li&gt;Enhanced multimodal capabilities for complex visual content&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Llama-4-Scout represents a significant step forward in making advanced document intelligence accessible. Its balanced approach to performance and efficiency makes it particularly valuable for professional applications where accuracy and attribution matter.&lt;/p&gt;

&lt;p&gt;For organizations dealing with large volumes of documents, investing in tools built around models like Scout could yield substantial returns through improved information accessibility and insight generation. The model's ability to "process and work with extremely lengthy documents" makes it ideal for enterprises with extensive documentation needs.&lt;/p&gt;

&lt;p&gt;Have you experimented with Llama-4-Scout or similar models for document analysis? I'd love to hear about your experiences and applications in the comments below.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: The examples provided are based on actual testing of Llama-4-Scout through Together.ai's API integration. Results may vary depending on document complexity and specific implementation details.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
    </item>
    <item>
      <title>OpenAI - Gibili Portrait Assistance: AI-Powered Image Generation Made Simple</title>
      <dc:creator>angu10</dc:creator>
      <pubDate>Mon, 31 Mar 2025 17:50:22 +0000</pubDate>
      <link>https://forem.com/angu10/openai-gibili-portrait-assistance-ai-powered-image-generation-made-simple-285e</link>
      <guid>https://forem.com/angu10/openai-gibili-portrait-assistance-ai-powered-image-generation-made-simple-285e</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Ever wished you could create stunning portraits with just a few clicks? Meet Gibili Portrait Assistance, an AI-powered tool that makes generating high-quality portraits effortless. Whether you’re an artist, designer, or simply someone who loves experimenting with AI, Gibili can help bring your ideas to life.&lt;/p&gt;

&lt;p&gt;In this post, we’ll walk you through how to use Gibili Portrait Assistance and explore the OpenAI architecture behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use Gibili Portrait Assistance
&lt;/h2&gt;

&lt;p&gt;Using Gibili is straightforward and requires no prior technical knowledge. Here’s a simple step-by-step guide:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Enter Your Description or Upload an Image&lt;/strong&gt;&lt;br&gt;
You can either type a text description of the portrait you want or upload an existing image to be enhanced or transformed by AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Text Prompt Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“A realistic portrait of a woman with curly brown hair, wearing a red scarf, in a cinematic lighting style.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Image Upload:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you have an image you want to modify or enhance, simply upload it, and Gibili will apply AI-powered enhancements or transformations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Customize Your Preferences&lt;/strong&gt;&lt;br&gt;
You can fine-tune details such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Art Style&lt;/strong&gt;: Realistic, digital painting, anime, etc.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background&lt;/strong&gt;: Solid color, blurred, natural scenery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Facial Expressions&lt;/strong&gt;: Smiling, neutral, surprised.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Additional Features&lt;/strong&gt;: Glasses, hats, jewelry, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Generate the Image&lt;/strong&gt;&lt;br&gt;
Press Enter, and within seconds, Gibili will produce a high-resolution portrait based on your input or uploaded image.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Refine and Download&lt;/strong&gt;&lt;br&gt;
If you want adjustments, you can tweak your input and regenerate until you’re satisfied. Once ready, download your portrait in high-quality format.&lt;/p&gt;

&lt;h2&gt;
  
  
  The OpenAI Architecture Behind Gibili
&lt;/h2&gt;

&lt;p&gt;Gibili Portrait Assistance is powered by OpenAI’s advanced image generation models, leveraging diffusion models to create highly detailed and realistic portraits. Here’s a simplified breakdown:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Text-to-Image &amp;amp; Image-to-Image Generation&lt;/strong&gt;&lt;br&gt;
When you provide a text prompt, the AI model translates it into a visual representation using deep learning techniques. If you upload an image, the model can enhance, transform, or stylize it while maintaining its core structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Fine-Tuned on Portrait Data&lt;/strong&gt;&lt;br&gt;
The model has been trained on a vast dataset of portraits across different styles, ensuring high accuracy and creativity in generated images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Iterative Refinement&lt;/strong&gt;&lt;br&gt;
Instead of creating the final image instantly, the AI gradually refines it through multiple steps, ensuring greater precision and quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. User-Guided Adjustments&lt;/strong&gt;&lt;br&gt;
Users can modify parameters like style and background, and the model will intelligently adjust the portrait while maintaining coherence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Use Gibili Portrait Assistance?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ Easy to Use
&lt;/h3&gt;

&lt;p&gt;No need for advanced design skills — just describe what you want or upload an image, and AI does the rest.&lt;/p&gt;

&lt;h3&gt;
  
  
  🎨 Customizable Output
&lt;/h3&gt;

&lt;p&gt;From photorealistic portraits to artistic illustrations, you can tailor the results to your liking.&lt;/p&gt;

&lt;h3&gt;
  
  
  🚀 Fast &amp;amp; High-Quality
&lt;/h3&gt;

&lt;p&gt;Generate high-resolution images within seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  🖌️ Creative Freedom
&lt;/h3&gt;

&lt;p&gt;Perfect for artists, marketers, and content creators looking for unique visuals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started with Gibili Today!
&lt;/h2&gt;

&lt;p&gt;Ready to create amazing AI-generated portraits? Try Gibili Portrait Assistance now and explore the limitless possibilities of AI-powered creativity!&lt;/p&gt;

</description>
      <category>openai</category>
      <category>llm</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
