<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Nishant prakash</title>
    <description>The latest articles on Forem by Nishant prakash (@nishant_prakash_780f5d541).</description>
    <link>https://forem.com/nishant_prakash_780f5d541</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3596122%2Fbf82630b-dedc-4453-8918-a3cc87df539f.jpg</url>
      <title>Forem: Nishant prakash</title>
      <link>https://forem.com/nishant_prakash_780f5d541</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/nishant_prakash_780f5d541"/>
    <language>en</language>
    <item>
      <title>Stop stuffing tools into your agent 😤</title>
      <dc:creator>Nishant prakash</dc:creator>
      <pubDate>Wed, 08 Apr 2026 08:25:21 +0000</pubDate>
      <link>https://forem.com/nishant_prakash_780f5d541/stop-stuffing-tools-into-your-agent-j8h</link>
      <guid>https://forem.com/nishant_prakash_780f5d541/stop-stuffing-tools-into-your-agent-j8h</guid>
      <description>&lt;p&gt;There is a point in almost every agent project where the excitement starts to fade.&lt;/p&gt;

&lt;p&gt;At first, it feels magical. You wire an LLM to a few Python functions, wrap them as tools, and suddenly your assistant can calculate, search, transform, and automate. But then the project grows. A few tools become ten. Ten become thirty. Business logic starts mixing with agent logic. Logging becomes messy. Reuse becomes painful. One agent needs the same tools as another, so you copy code. Then you copy it again. And somewhere in that process, your “smart system” quietly turns into a pile of tightly coupled Python.&lt;/p&gt;

&lt;p&gt;That is exactly where &lt;strong&gt;MCP&lt;/strong&gt; starts to make sense.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77rtkhfilui9n05ufknf.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77rtkhfilui9n05ufknf.webp" alt="MCP Meme" width="620" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; is an open standard for exposing tools, resources, and prompts to LLM applications in a structured way. The official docs describe it as a standardized way for AI apps to connect to external systems, and even compare it to a “USB-C port for AI applications.” (&lt;a href="https://modelcontextprotocol.io/specification/2025-11-25?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;And once that clicks, a very important design shift becomes obvious:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your tools do not have to live inside your agent code anymore.&lt;/strong&gt;&lt;br&gt;
You can build them once, run them as an MCP server, and let agents consume them cleanly from the outside. (&lt;a href="https://docs.langchain.com/oss/python/langchain/mcp?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;LangChain Docs&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;That is the idea this post is about.&lt;/p&gt;

&lt;p&gt;I’ll show you how I built a tiny MCP server with math and dice tools, added logging to observe tool calls, and then plugged that server into a LangChain agent exposed via FastAPI. Along the way, the architecture changed from “my agent has tools” to something much cleaner:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;my tools live in their own server, and my agent just uses them.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Why this matters more than it looks
&lt;/h2&gt;

&lt;p&gt;Imagine a normal backend team.&lt;/p&gt;

&lt;p&gt;One person owns business logic. Another owns APIs. Another owns observability. Now imagine if every API consumer copied that business logic into their own codebase. That would quickly turn into chaos.&lt;/p&gt;

&lt;p&gt;That’s essentially what happens when we embed tools directly inside agent code.&lt;/p&gt;

&lt;p&gt;The issue isn’t that it breaks; it’s that it doesn’t scale.&lt;/p&gt;

&lt;p&gt;With MCP, things get cleaner:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the agent focuses on reasoning&lt;/li&gt;
&lt;li&gt;the tool server handles execution&lt;/li&gt;
&lt;li&gt;logging and latency stay observable&lt;/li&gt;
&lt;li&gt;tools can be reused across agents&lt;/li&gt;
&lt;li&gt;you can evolve tools without touching the agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation is exactly what MCP brings to the table, and libraries like &lt;code&gt;langchain-mcp-adapters&lt;/code&gt; make this integration seamless (&lt;a href="https://docs.langchain.com/oss/python/langchain/mcp?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;LangChain Docs&lt;/a&gt;).&lt;/p&gt;


&lt;h2&gt;
  
  
  So what is MCP, in plain English?
&lt;/h2&gt;

&lt;p&gt;Let’s strip away the buzzwords.&lt;/p&gt;

&lt;p&gt;When an LLM needs to do something real, like querying a database, calling an API, or running a workflow, we usually define those as tools inside the agent code.&lt;/p&gt;

&lt;p&gt;MCP changes that idea:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if tools were exposed through a standard protocol instead?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now the same tool server can be used by multiple agents, clients, or even frameworks (&lt;a href="https://gofastmcp.com/getting-started/welcome" rel="noopener noreferrer"&gt;FastMCP&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;A simple way to think about it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP server&lt;/strong&gt; → where tools live&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP client&lt;/strong&gt; → connects to the server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent&lt;/strong&gt; → decides when to use tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It sounds like a small shift, but it’s the difference between a quick demo and a system you can actually scale.&lt;/p&gt;


&lt;h2&gt;
  
  
  Building a tiny MCP server (and making it observable)
&lt;/h2&gt;

&lt;p&gt;To understand MCP, I didn’t start with anything complex.&lt;/p&gt;

&lt;p&gt;I built a small server with just two kinds of tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;basic math operations&lt;/li&gt;
&lt;li&gt;a dice roll&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of pasting the full code here, you can check the clean version directly:&lt;/p&gt;

&lt;p&gt;Simple MCP server: &lt;a href="https://github.com/trickste/mcp_test/blob/main/mcp_server/server_simple.py" rel="noopener noreferrer"&gt;server_simple.py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At its core, it looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;math-server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s all it takes to expose a function as a tool.&lt;/p&gt;

&lt;p&gt;Now here’s the important part:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These tools are &lt;strong&gt;not inside your agent&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;They are running in a &lt;strong&gt;separate server&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Running the MCP server
&lt;/h3&gt;

&lt;p&gt;You can start the server using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fastmcp run server.py &lt;span class="nt"&gt;--transport&lt;/span&gt; http &lt;span class="nt"&gt;--port&lt;/span&gt; 9090
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This exposes your tools over HTTP at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://127.0.0.1:9090/mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now any agent (or client) can connect to it.&lt;/p&gt;




&lt;h3&gt;
  
  
  The moment things start to feel real
&lt;/h3&gt;

&lt;p&gt;At this point, everything worked.&lt;br&gt;
But something was missing.&lt;/p&gt;

&lt;p&gt;I could call tools…&lt;br&gt;
but I couldn’t &lt;strong&gt;see what was happening inside them&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And that’s when logging becomes non-negotiable.&lt;/p&gt;


&lt;h3&gt;
  
  
  Adding logging (turning it into a real service)
&lt;/h3&gt;

&lt;p&gt;Instead of rewriting everything, I added a simple decorator to log:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;when a tool starts&lt;/li&gt;
&lt;li&gt;what inputs it received&lt;/li&gt;
&lt;li&gt;what it returned&lt;/li&gt;
&lt;li&gt;how long it took&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can check the full version here:&lt;/p&gt;

&lt;p&gt;Logged MCP server: &lt;a href="https://github.com/trickste/mcp_test/blob/main/mcp_server/server.py" rel="noopener noreferrer"&gt;server.py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The core idea looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;START &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;END &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; -&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nd"&gt;@log_tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;roll_dice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;faces&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
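&lt;p&gt;The &lt;code&gt;roll_dice&lt;/code&gt; body is elided above. Purely as an illustration (my own sketch, not necessarily what the repo’s &lt;code&gt;server.py&lt;/code&gt; does), a stdlib implementation could be as small as:&lt;/p&gt;

```python
import random

def roll_dice(faces: int = 6) -> int:
    # Hypothetical body: one uniform roll of an n-faced die
    return random.randint(1, faces)

print(roll_dice(faces=20))
```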






&lt;h3&gt;
  
  
  What this gives you
&lt;/h3&gt;

&lt;p&gt;Now when your agent calls a tool, you don’t just get a result,&lt;br&gt;
you get visibility:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOOL START | roll_dice | faces=20
TOOL END   | roll_dice | result=15
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And this is where MCP starts to click.&lt;/p&gt;

&lt;p&gt;Your “tool layer” is no longer hidden inside your agent.&lt;br&gt;
It’s running as a &lt;strong&gt;separate, observable service&lt;/strong&gt;.&lt;/p&gt;


&lt;h3&gt;
  
  
  Why this step matters
&lt;/h3&gt;

&lt;p&gt;This small setup already gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tools defined independently&lt;/li&gt;
&lt;li&gt;a server that can be reused&lt;/li&gt;
&lt;li&gt;logging and latency visibility&lt;/li&gt;
&lt;li&gt;a clean boundary between reasoning and execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And we haven’t even touched the agent yet.&lt;/p&gt;

&lt;p&gt;That’s where things get interesting next.&lt;/p&gt;


&lt;h2&gt;
  
  
  Now, can I plug this server into an agent?
&lt;/h2&gt;

&lt;p&gt;And the answer is yes.&lt;/p&gt;

&lt;p&gt;LangChain now provides an MCP adapter library, &lt;code&gt;langchain-mcp-adapters&lt;/code&gt;, which lets agents consume tools directly from MCP servers. Its &lt;code&gt;MultiServerMCPClient&lt;/code&gt; can connect to one or more MCP servers, and by default it is stateless: each tool invocation opens a fresh MCP session, executes, and cleans up. (&lt;a href="https://docs.langchain.com/oss/python/langchain/mcp?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;LangChain Docs&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;That stateless behavior turned out to match my use case nicely.&lt;/p&gt;


&lt;h3&gt;
  
  
  The agent side: FastAPI + LangChain + MCP
&lt;/h3&gt;

&lt;p&gt;Now comes the satisfying part.&lt;/p&gt;

&lt;p&gt;Instead of embedding tools inside the agent, I made the agent &lt;strong&gt;connect to the MCP server over HTTP&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can check the full working code here:&lt;/p&gt;

&lt;p&gt;Agent (FastAPI + MCP): &lt;a href="https://github.com/trickste/mcp_test/blob/main/agent/agent.py" rel="noopener noreferrer"&gt;agent.py&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  What the agent really does
&lt;/h3&gt;

&lt;p&gt;At a high level, the agent is surprisingly simple.&lt;/p&gt;

&lt;p&gt;It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;connects to the MCP server&lt;/li&gt;
&lt;li&gt;fetches available tools&lt;/li&gt;
&lt;li&gt;initializes an LLM&lt;/li&gt;
&lt;li&gt;lets the agent use those tools when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key piece looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MultiServerMCPClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;math&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;transport&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:9090/mcp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;MCP tools automatically become usable by the agent.&lt;/p&gt;




&lt;h3&gt;
  
  
  Adding the agent on top
&lt;/h3&gt;

&lt;p&gt;Then we plug those tools into a LangChain agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant that can use tools.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And expose it via FastAPI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ainvoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  And here’s the shift
&lt;/h3&gt;

&lt;p&gt;Notice what’s missing.&lt;/p&gt;

&lt;p&gt;There is &lt;strong&gt;no math logic here&lt;/strong&gt;.&lt;br&gt;
No &lt;code&gt;add&lt;/code&gt;, no &lt;code&gt;divide&lt;/code&gt;, no &lt;code&gt;roll_dice&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The FastAPI app simply says:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;here is my LLM&lt;/li&gt;
&lt;li&gt;here is my MCP client&lt;/li&gt;
&lt;li&gt;give me the tools&lt;/li&gt;
&lt;li&gt;let the agent use them&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  What the full flow looks like
&lt;/h2&gt;

&lt;p&gt;Let’s say the user sends:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Roll a 20 sided dice and then add 5 to it&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The request travels like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;user hits the FastAPI &lt;code&gt;/chat&lt;/code&gt; endpoint&lt;/li&gt;
&lt;li&gt;the agent receives the query&lt;/li&gt;
&lt;li&gt;the agent decides it needs a tool&lt;/li&gt;
&lt;li&gt;LangChain calls the MCP adapter&lt;/li&gt;
&lt;li&gt;the MCP adapter calls the MCP server over HTTP&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;roll_dice(faces=20)&lt;/code&gt; executes&lt;/li&gt;
&lt;li&gt;result comes back&lt;/li&gt;
&lt;li&gt;the agent uses that numeric output in the next tool call&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;add(a=15, b=5)&lt;/code&gt; executes&lt;/li&gt;
&lt;li&gt;final answer is returned to the user&lt;/li&gt;
&lt;/ol&gt;
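&lt;p&gt;Stripped of the LLM and the network, that flow boils down to two chained tool calls. Here is a purely illustrative, self-contained simulation; in the real system both calls go through the MCP server, not local functions:&lt;/p&gt;

```python
import random

# Local stand-ins for the MCP tools (illustration only)
def roll_dice(faces: int = 6) -> int:
    return random.randint(1, faces)

def add(a: float, b: float) -> float:
    return a + b

# The agent's plan for "Roll a 20 sided dice and then add 5 to it":
roll = roll_dice(faces=20)   # first tool call
total = add(a=roll, b=5)     # second call, fed the first result
print(f"rolled {roll}, final answer {total}")
```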

&lt;p&gt;And because the server has logging, you can actually watch this happen.&lt;/p&gt;

&lt;p&gt;Here is the kind of trace I saw:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-04-08 12:34:34,822 | INFO | [7cb8650b] TOOL START | roll_dice | args=() kwargs={'faces': 20}
2026-04-08 12:34:34,822 | INFO | [7cb8650b] TOOL END   | roll_dice | result=15 | 0.03ms

2026-04-08 12:34:36,231 | INFO | [149d5f59] TOOL START | add | args=() kwargs={'a': 15.0, 'b': 5.0}
2026-04-08 12:34:36,231 | INFO | [149d5f59] TOOL END   | add | result=20.0 | 0.03ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is one of those moments where the architecture suddenly feels real.&lt;/p&gt;

&lt;p&gt;You are not just “calling functions from an LLM.”&lt;br&gt;
You are watching an agent orchestrate a tool server.&lt;/p&gt;




&lt;h2&gt;
  
  
  The part that stayed with me
&lt;/h2&gt;

&lt;p&gt;What stood out wasn’t the dice roll or the API.&lt;/p&gt;

&lt;p&gt;It was the separation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the &lt;strong&gt;MCP server owns the tools&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;the &lt;strong&gt;agent handles reasoning&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;the &lt;strong&gt;API manages interaction&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each piece has a clear role and can evolve independently.&lt;/p&gt;

&lt;p&gt;MCP challenges the habit of packing everything into one place and instead gives you a cleaner way to separate intelligence from execution.&lt;/p&gt;

&lt;p&gt;And once you see that, it’s hard not to think: &lt;br&gt;
&lt;strong&gt;&lt;em&gt;Why was all of this in one file to begin with?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  What’s next
&lt;/h3&gt;

&lt;p&gt;This is just the starting point.&lt;/p&gt;

&lt;p&gt;In upcoming blogs, I’ll go deeper into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;making MCP tools more robust&lt;/li&gt;
&lt;li&gt;improving reliability and performance&lt;/li&gt;
&lt;li&gt;adding security and access control&lt;/li&gt;
&lt;li&gt;and turning this into something closer to production-ready&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because building tools is one thing,&lt;br&gt;
building &lt;strong&gt;safe, scalable, and reliable tool systems&lt;/strong&gt; is where things get really interesting.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>rag</category>
      <category>python</category>
    </item>
    <item>
      <title>I Thought TOON Was Hype. Then I Tested It…</title>
      <dc:creator>Nishant prakash</dc:creator>
      <pubDate>Sat, 22 Nov 2025 19:03:09 +0000</pubDate>
      <link>https://forem.com/nishant_prakash_780f5d541/i-thought-toon-was-hype-then-i-tested-it-3bad</link>
      <guid>https://forem.com/nishant_prakash_780f5d541/i-thought-toon-was-hype-then-i-tested-it-3bad</guid>
      <description>&lt;p&gt;If you’re even slightly deep into the world of agentic AI, you’ve probably seen it: blog after blog mentioning something called &lt;strong&gt;TOON&lt;/strong&gt;. The posts were short. Some offered impressive claims like “cuts JSON token usage in half.” Others just mentioned it as a rising serialization format for LLMs, especially in agent-based workflows.&lt;/p&gt;

&lt;p&gt;Curious but unsatisfied, I decided to take a different route. I didn’t just want to read about TOON, I wanted to build with it. I wanted to &lt;strong&gt;see what made it special&lt;/strong&gt;, how it compares to formats like JSON and CSV, and where it actually shines. And of course, I wanted to share what I found in a way that makes it easier for you to understand too.&lt;/p&gt;

&lt;p&gt;This blog is that story.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6wy86wjzz81qt2ebkl4o.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6wy86wjzz81qt2ebkl4o.gif" alt="Why waste token" width="480" height="400"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Agentic AI in One Paragraph
&lt;/h2&gt;

&lt;p&gt;If you’re working with agentic AI frameworks like &lt;strong&gt;CrewAI&lt;/strong&gt;, &lt;strong&gt;LangGraph&lt;/strong&gt;, or &lt;strong&gt;AutoGen&lt;/strong&gt;, you already know the basics: agents powered by LLMs collaborate, passing structured information between each other to complete tasks. Think of each agent like a focused worker, one might gather data, another might summarize it, and another might generate recommendations.&lt;/p&gt;

&lt;p&gt;But here’s the catch: &lt;strong&gt;every time they exchange data&lt;/strong&gt;, it costs tokens. And in LLMs, tokens mean time, money, and performance limits.&lt;/p&gt;

&lt;p&gt;That’s where TOON comes in.&lt;/p&gt;




&lt;h2&gt;
  
  
  So... What Is TOON, Really?
&lt;/h2&gt;

&lt;p&gt;TOON stands for &lt;strong&gt;Token-Oriented Object Notation&lt;/strong&gt;. It’s a new serialization format designed specifically for LLM communication. Like JSON, it can represent nested, structured data. But unlike JSON, it avoids the extra syntax: no curly braces, no repeated field names, and no quotes around every string.&lt;/p&gt;

&lt;p&gt;Here’s a side-by-side comparison to make that clearer:&lt;/p&gt;

&lt;h3&gt;
  
  
  A Simple JSON:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Same in TOON:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data[2]{id,name,score}:
  1,Alice,85
  2,Bob,92
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TOON is compact, readable, and tailor-made for LLMs. But I wanted to know: &lt;strong&gt;how much better is it, really?&lt;/strong&gt;&lt;/p&gt;
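&lt;p&gt;One crude way to get a feel for the difference, before involving a real tokenizer, is to compare raw character counts of the same records. Character count only roughly tracks token count, so treat this as a sketch rather than a benchmark:&lt;/p&gt;

```python
import json

records = [
    {"id": 1, "name": "Alice", "score": 85},
    {"id": 2, "name": "Bob", "score": 92},
]

pretty = json.dumps(records, indent=2)                 # standard JSON
compact = json.dumps(records, separators=(",", ":"))   # compact JSON
toon = "data[2]{id,name,score}:\n  1,Alice,85\n  2,Bob,92"

for label, text in [("json", pretty), ("json-compact", compact), ("toon", toon)]:
    print(f"{label:13s} {len(text)} chars")
```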




&lt;h2&gt;
  
  
  The POC I Built
&lt;/h2&gt;

&lt;p&gt;Before diving into results, here are the two exact scripts used for benchmarking. They’re publicly available, so you can clone and run them yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Test 1 (Complex Nested JSON)&lt;/strong&gt;: &lt;a href="https://github.com/trickste/toontoon/blob/main/toon_test_1.py" rel="noopener noreferrer"&gt;https://github.com/trickste/toontoon/blob/main/toon_test_1.py&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test 2 (Flat JSON)&lt;/strong&gt;: &lt;a href="https://github.com/trickste/toontoon/blob/main/toon_test_2.py" rel="noopener noreferrer"&gt;https://github.com/trickste/toontoon/blob/main/toon_test_2.py&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These scripts serialize the same dataset into different formats and measure how many tokens each representation consumes when passed through an LLM.&lt;/p&gt;

&lt;p&gt;To really understand TOON, I built a &lt;strong&gt;two-agent proof of concept&lt;/strong&gt; powered by &lt;strong&gt;Ollama (LLaMA 3.1)&lt;/strong&gt; running locally. The idea was simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent 1&lt;/strong&gt; (Data Generator): produces structured data in Python&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent 2&lt;/strong&gt; (LLM Analyzer): consumes that data in various formats and explains it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here’s the twist: I fed the data into Agent 2 using &lt;strong&gt;six different serialization formats&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JSON&lt;/li&gt;
&lt;li&gt;JSON (Compact)&lt;/li&gt;
&lt;li&gt;YAML&lt;/li&gt;
&lt;li&gt;XML&lt;/li&gt;
&lt;li&gt;CSV&lt;/li&gt;
&lt;li&gt;TOON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And for each one, I measured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How many &lt;strong&gt;tokens&lt;/strong&gt; the format used&lt;/li&gt;
&lt;li&gt;Whether the LLM could interpret it clearly&lt;/li&gt;
&lt;/ul&gt;
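
&lt;p&gt;The measurement loop can be sketched in a few lines of stdlib Python. This is an illustration, not the benchmark scripts themselves: it uses character counts as a rough stand-in for tokens (the real scripts count tokens with &lt;strong&gt;tiktoken&lt;/strong&gt;), and it covers only the stdlib formats.&lt;/p&gt;

```python
import csv
import io
import json

rows = [
    {"id": 1, "name": "Alice", "score": 85},
    {"id": 2, "name": "Bob", "score": 92},
]

# Pretty JSON repeats every key and adds indentation.
pretty = json.dumps(rows, indent=2)

# Compact JSON drops all optional whitespace.
compact = json.dumps(rows, separators=(",", ":"))

# CSV states the header once, then bare values.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "score"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# Character counts here stand in for token counts.
sizes = {"json": len(pretty), "json_compact": len(compact), "csv": len(csv_text)}
print(sizes)
```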




&lt;h2&gt;
  
  
  How the Data Looks Across Formats
&lt;/h2&gt;

&lt;p&gt;Below are simplified examples to help you visualize how &lt;strong&gt;flat&lt;/strong&gt; and &lt;strong&gt;nested&lt;/strong&gt; JSON convert into TOON.&lt;/p&gt;




&lt;h3&gt;
  
  
  Flat JSON Example
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charlie"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TOON&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data[3]{id,name,score}:
  1,Alice,85
  2,Bob,92
  3,Charlie,78
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
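
&lt;p&gt;For intuition, the flat case above can be produced by a tiny encoder. This is a hypothetical sketch, not the &lt;strong&gt;python-toon&lt;/strong&gt; library: it handles only uniform flat rows and does no quoting, escaping, or type handling.&lt;/p&gt;

```python
def to_toon(name, rows):
    # Header line: collection name, row count, and field names once.
    fields = list(rows[0])
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    # One comma-joined line per record, indented under the header.
    lines = ["  " + ",".join(str(r[f]) for f in fields) for r in rows]
    return "\n".join([header] + lines)

rows = [
    {"id": 1, "name": "Alice", "score": 85},
    {"id": 2, "name": "Bob", "score": 92},
    {"id": 3, "name": "Charlie", "score": 78},
]
print(to_toon("data", rows))
```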






&lt;h3&gt;
  
  
  Complex Nested JSON Example
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;85&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"contact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"alice@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"phone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123-456"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"math"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"science"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"projects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AI Lab"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2022&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Robotics"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2023&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"contact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bob@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"phone"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"789-012"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"physics"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"projects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Quantum Lab"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2021&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TOON&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data[2]{id,name,score,contact,tags,projects}:
  1,Alice,85,{'email': 'alice@example.com', 'phone': '123-456'},['math','science'],[{'title':'AI Lab','year':2022},{'title':'Robotics','year':2023}]
  2,Bob,92,{'email': 'bob@example.com', 'phone': '789-012'},['physics'],[{'title':'Quantum Lab','year':2021}]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where things get interesting. You can &lt;em&gt;see&lt;/em&gt; immediately that for simple rows, TOON is incredibly compact... but as soon as nested structures appear, TOON begins embedding Python-like lists and dictionaries inside its rows.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Results: Token Counts by Format
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test 1: Complex Nested Data (contacts, tags, projects)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Token Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CSV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;58&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON Compact&lt;/td&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;118&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YAML&lt;/td&gt;
&lt;td&gt;135&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XML&lt;/td&gt;
&lt;td&gt;178&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;203&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Test 2: Simple Flat Data (id, name, score)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Token Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CSV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON Compact&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YAML&lt;/td&gt;
&lt;td&gt;51&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XML&lt;/td&gt;
&lt;td&gt;71&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🧠 Key Takeaway: CSV Isn’t Always Enough
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Wait... CSV is cheaper than TOON in both cases?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Yes, but there’s a catch.&lt;/p&gt;

&lt;p&gt;CSV is great for flat data. But as soon as your data includes &lt;strong&gt;nested objects&lt;/strong&gt;, &lt;strong&gt;lists&lt;/strong&gt;, or &lt;strong&gt;hierarchies&lt;/strong&gt;, CSV becomes... lossy. You end up flattening objects into strings. You lose type safety. You lose clarity.&lt;/p&gt;

&lt;p&gt;In contrast, &lt;strong&gt;TOON maintains full structure&lt;/strong&gt;, while being much leaner than JSON.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;JSON&lt;/strong&gt; repeats every key&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XML&lt;/strong&gt; doubles them with open and close tags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YAML&lt;/strong&gt; saves space but isn’t built for tabular rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOON&lt;/strong&gt; gives you schema + data, with nearly CSV-level efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s why TOON is rising fast in agentic tools. It’s &lt;strong&gt;not about beating CSV&lt;/strong&gt;, but about &lt;strong&gt;replacing JSON&lt;/strong&gt; in places where token count actually matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned (And What You Can Try)
&lt;/h2&gt;

&lt;p&gt;This POC taught me far more than just how different serialization formats compare. It revealed how LLMs &lt;em&gt;actually behave&lt;/em&gt; when they are fed structured data, and why some formats may be popular even when they don’t always win the token race.&lt;/p&gt;

&lt;p&gt;Here are the real lessons that stood out.&lt;/p&gt;

&lt;h3&gt;
  
  
  TOON shines when structure matters
&lt;/h3&gt;

&lt;p&gt;If your data has any level of hierarchy, TOON keeps the meaning intact while still being much more compact than JSON. It lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;preserve nested objects&lt;/li&gt;
&lt;li&gt;keep column-like readability&lt;/li&gt;
&lt;li&gt;maintain type hints&lt;/li&gt;
&lt;li&gt;avoid repeated keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This balance explains why TOON is becoming a popular choice in agentic systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  CSV gives the fewest tokens, but at a real cost
&lt;/h3&gt;

&lt;p&gt;CSV had the lowest token counts in both tests. That part is true.&lt;/p&gt;

&lt;p&gt;But CSV comes with limitations that make it hard to use in multi-agent AI workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it cannot represent nested structures&lt;/li&gt;
&lt;li&gt;lists and objects get flattened into strings&lt;/li&gt;
&lt;li&gt;type information is lost&lt;/li&gt;
&lt;li&gt;you cannot reliably reconstruct the original JSON&lt;/li&gt;
&lt;li&gt;LLMs sometimes misinterpret flattened mixed content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So CSV wins in raw token count, but loses the moment your agents need anything more than simple rows.&lt;/p&gt;
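
&lt;p&gt;The round-trip problem is easy to demonstrate with the stdlib &lt;code&gt;csv&lt;/code&gt; module: once a nested object is flattened into a cell, reading it back gives you a string, not a dict, and even integers come back as strings.&lt;/p&gt;

```python
import csv
import io

record = {"id": 1, "name": "Alice", "contact": {"email": "alice@example.com"}}

# Writing coerces the nested dict into its string repr.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "contact"])
writer.writeheader()
writer.writerow(record)

# Reading back: every cell is now a plain string.
row = next(csv.DictReader(io.StringIO(buf.getvalue())))
print(type(row["contact"]).__name__)  # str, not dict
print(row["id"])                      # "1" as a string; type info is gone
```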

&lt;h3&gt;
  
  
  JSON Compact is a practical middle ground
&lt;/h3&gt;

&lt;p&gt;Compact JSON performed far better than formatted JSON. It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keeps full structure&lt;/li&gt;
&lt;li&gt;removes all whitespace&lt;/li&gt;
&lt;li&gt;tokenizes efficiently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is still more verbose than TOON for flat data, but more predictable for nested data.&lt;/p&gt;
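
&lt;p&gt;In Python, compact JSON is one argument away: the &lt;code&gt;separators&lt;/code&gt; parameter of &lt;code&gt;json.dumps&lt;/code&gt; removes the default spaces after commas and colons.&lt;/p&gt;

```python
import json

data = [{"id": 1, "name": "Alice", "score": 85}]

pretty = json.dumps(data, indent=2)
compact = json.dumps(data, separators=(",", ":"))

print(compact)  # [{"id":1,"name":"Alice","score":85}]
```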

&lt;h3&gt;
  
  
  TOON is optimized for LLM clarity, not just token savings
&lt;/h3&gt;

&lt;p&gt;When LLMs read TOON, the schema line gives them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;number of items&lt;/li&gt;
&lt;li&gt;field names&lt;/li&gt;
&lt;li&gt;expected structure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This helps models interpret the data more reliably than CSV or XML.&lt;/p&gt;

&lt;p&gt;In real agent workflows, clarity often beats a small token savings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nested data changes everything
&lt;/h3&gt;

&lt;p&gt;In the nested test, TOON did not produce the smallest token count because the inline lists and dicts added overhead. This showed a valuable insight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TOON works best for uniform row-like structures. For deep nesting, compact JSON may be more efficient.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is important because it means serialization choices should match your data shape, not simply rely on trends.&lt;/p&gt;

&lt;h3&gt;
  
  
  The real takeaway
&lt;/h3&gt;

&lt;p&gt;Each format has strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CSV&lt;/strong&gt;: smallest tokens, weakest structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOON&lt;/strong&gt;: great structure-to-token ratio, very LLM friendly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON Compact&lt;/strong&gt;: predictable, solid for nested data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;YAML&lt;/strong&gt;: readable, moderate token usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XML&lt;/strong&gt;: expressive but token-heavy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the biggest thing we learned:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TOON is becoming popular not because it always produces the smallest token count, but because it gives you structure, compactness, and LLM-friendly clarity in one place.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If your agents need to share structured data and reason about it reliably, TOON is a strong contender.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools I Used
&lt;/h2&gt;

&lt;p&gt;Here’s the stack I used for this POC:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python 3.10+&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; running &lt;strong&gt;LLaMA3:latest&lt;/strong&gt; locally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tiktoken&lt;/strong&gt; for token counting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;python-toon&lt;/strong&gt; (fallback inline encoder included)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSV / JSON / YAML / XML&lt;/strong&gt; modules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agents are simple but modular: you can replace Agent 1 with an API call, or Agent 2 with a more sophisticated analyzer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This POC showed that serialization shapes how agents think. TOON isn’t always the smallest, but it delivers a strong balance of structure and clarity for agentic workflows. Test different formats with your own data and see what your LLM responds to best.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Was Done Getting Answers - So I Built RAG That Asks Questions Too</title>
      <dc:creator>Nishant prakash</dc:creator>
      <pubDate>Wed, 12 Nov 2025 18:12:44 +0000</pubDate>
      <link>https://forem.com/nishant_prakash_780f5d541/i-was-done-getting-answers-so-i-built-rag-that-asks-questions-too-1c2e</link>
      <guid>https://forem.com/nishant_prakash_780f5d541/i-was-done-getting-answers-so-i-built-rag-that-asks-questions-too-1c2e</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“What if a RAG system could not only fetch information but also reason about it, critique itself, and write a report, all autonomously?”&lt;/em&gt;&lt;br&gt;
That question sent me down a rabbit hole that ended with &lt;strong&gt;Data-Inspector&lt;/strong&gt;, a proof-of-concept &lt;strong&gt;Agentic RAG&lt;/strong&gt; pipeline built with Ollama, LangChain, Tavily, and Streamlit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cdjseraqpstwz83wplc.jpeg" alt="Rag was not enough" width="800" height="578"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  The Spark — From RAG to &lt;em&gt;Agentic&lt;/em&gt; RAG
&lt;/h2&gt;

&lt;p&gt;Traditional RAG systems are brilliant at &lt;em&gt;retrieval and response&lt;/em&gt;, but not at &lt;em&gt;reasoning or reflection.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;They typically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieve documents relevant to a query.&lt;/li&gt;
&lt;li&gt;Feed them to a large language model (LLM).&lt;/li&gt;
&lt;li&gt;Generate an answer that sounds confident, even when it’s wrong.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model reads, but it doesn’t &lt;em&gt;think.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So I began wondering: what if we could &lt;strong&gt;assign roles&lt;/strong&gt; inside the RAG flow?&lt;br&gt;
One agent fetches data, another summarizes, another synthesizes, another critiques, like a research team working in harmony.&lt;/p&gt;

&lt;p&gt;That’s how &lt;strong&gt;Data-Inspector&lt;/strong&gt; was born, a system that doesn’t just “search and answer,” but “reads, reasons, and reviews.”&lt;/p&gt;


&lt;h2&gt;
  
  
  What Exactly &lt;em&gt;Is&lt;/em&gt; Agentic RAG?
&lt;/h2&gt;

&lt;p&gt;Before diving into code, let’s unpack what this buzzword really means.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agentic RAG (Retrieval-Augmented Generation)&lt;/strong&gt; is an evolution of the classic RAG pipeline.&lt;br&gt;
While traditional RAG enhances an LLM with external knowledge, &lt;strong&gt;Agentic RAG gives that process a mind of its own.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  From Static Pipelines to Autonomous Reasoners
&lt;/h3&gt;

&lt;p&gt;In standard RAG, you have a single, linear pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Retrieve → Generate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s powerful but static: there’s no reflection, no iteration, and no specialization.&lt;/p&gt;

&lt;p&gt;Agentic RAG transforms this static chain into a &lt;strong&gt;network of intelligent roles&lt;/strong&gt;, each responsible for one cognitive task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Retrieve → Understand → Synthesize → Critique → Generate → (Loop back if needed)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every role acts as an &lt;em&gt;agent&lt;/em&gt;, capable of reasoning over its inputs, producing structured outputs, and handing them off to the next stage.&lt;/p&gt;
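
&lt;p&gt;The hand-off can be sketched as a chain of small functions, each consuming the previous stage’s structured output. The stage bodies below are placeholders, not the actual Data-Inspector agents:&lt;/p&gt;

```python
# Each agent is a function that takes structured state and returns structured state.
def retrieve(query):
    return {"query": query, "docs": ["doc-a", "doc-b"]}

def summarize(state):
    state["summaries"] = [f"summary of {d}" for d in state["docs"]]
    return state

def critique(state):
    # A real critic would score the draft and could loop back for a retry.
    state["approved"] = len(state["summaries"]) > 0
    return state

state = critique(summarize(retrieve("RAG vs fine-tuning")))
print(state["approved"])  # True
```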




&lt;h3&gt;
  
  
  The Key Principles Behind Agentic RAG
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Role-based Autonomy&lt;/strong&gt;
Each agent (retriever, summarizer, critic, etc.) has a clearly defined job and communicates via structured data (JSON, markdown).&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This modularity allows independent improvement of each skill, like retraining just your summarizer agent for better factual grounding.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Reflection Loops&lt;/strong&gt;
Agentic systems don’t stop at the first output. They evaluate and refine.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This is what turns a “talkative assistant” into a “thoughtful collaborator.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dynamic Knowledge Access&lt;/strong&gt;&lt;br&gt;
Instead of relying only on a static vector database, agentic systems can trigger live searches, query APIs, or even plan multi-step reasoning chains.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transparency &amp;amp; Explainability&lt;/strong&gt;&lt;br&gt;
Each stage produces interpretable intermediate artifacts, summaries, reviews, critique logs, making the system auditable and debuggable.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Common Architectures of Agentic RAG
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Architecture Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Planner–Executor Loop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A planning agent decomposes a task, executors handle retrieval and summarization.&lt;/td&gt;
&lt;td&gt;Workflow orchestration in research assistants.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Critic–Refiner Loop&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The system critiques its own output and regenerates it.&lt;/td&gt;
&lt;td&gt;Self-RAG, Self-Refine, Reflexion.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-Agent Collaboration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multiple specialized agents work in a pipeline, passing structured outputs downstream.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Data-Inspector&lt;/strong&gt;😛&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The approach I took, &lt;strong&gt;multi-agent collaboration&lt;/strong&gt;, felt the most natural.&lt;br&gt;
Each Python class became a self-contained professional: retriever, summarizer, synthesizer, and critic, all orchestrated by a pipeline.&lt;/p&gt;


&lt;h2&gt;
  
  
  Architecture Overview — A RAG System with Personality
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Data-Inspector/
├── agents/
│   ├── retriever.py       # Retrieval
│   ├── summarizer.py      # Summarization
│   ├── synthesizer.py     # Knowledge fusion
│   └── critic.py          # Review / Reflection
├── rag/
│   ├── chunker.py         # Document processing
│   └── vectorstore.py     # Vector memory (optional)
├── pipeline.py            # Agentic orchestration
└── ui_streamlit.py        # Interactive interface
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each component acts like a neuron in a cognitive system, independent yet collaborative.&lt;/p&gt;


&lt;h2&gt;
  
  
  Retrieval — Learning to Find Relevant Knowledge
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Retriever&lt;/strong&gt; is powered by the Tavily API. It’s the system’s scout, locating relevant information for the query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WebRetriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_urls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_sources&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;results&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])[:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_sources&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unlike traditional RAG’s static embeddings, this retrieves &lt;strong&gt;live knowledge&lt;/strong&gt;, keeping the system temporally aware and factually updated.&lt;/p&gt;




&lt;h2&gt;
  
  
  Chunking — Learning to Read Like a Human
&lt;/h2&gt;

&lt;p&gt;HTML pages are noisy. The &lt;code&gt;chunker.py&lt;/code&gt; module cleans and splits them into coherent text segments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;prepare_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_html&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clean_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_html&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Breaking long text into overlapping chunks lets the summarizer think locally while preserving context globally, just like a human scanning through paragraphs.&lt;/p&gt;
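
&lt;p&gt;A hypothetical version of &lt;code&gt;chunk_text&lt;/code&gt; makes the idea concrete: fixed-size windows that share an overlap, so each chunk carries a slice of its neighbor’s context.&lt;/p&gt;

```python
def chunk_text(text, size=100, overlap=20):
    # Slide a window of `size` characters, stepping by size - overlap,
    # so consecutive chunks share `overlap` characters of context.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("a" * 250, size=100, overlap=20)
print(len(chunks))  # 3 windows: [0:100], [80:180], [160:250]
```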




&lt;h2&gt;
  
  
  Summarization — Turning Reading Into Understanding
&lt;/h2&gt;

&lt;p&gt;Each chunk passes through the &lt;strong&gt;SummarizerAgent&lt;/strong&gt;, guided by a structured system prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SYSTEM_SUMMARIZER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a precise technical summarizer...
Return JSON with: key_points[], methods, evidence[], limitations[]
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sample output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RAG vs Fine-Tuning"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"key_points"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"RAG adapts faster"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fine-tuning offers deeper control"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"limitations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Depends on retrieval quality"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All agents speak in JSON, a shared language that prevents context drift and ensures machine-readable collaboration.&lt;/p&gt;
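
&lt;p&gt;A hypothetical guard for that hand-off: every inter-agent message must parse as JSON and carry the fields the next stage expects, so a drifting model fails loudly instead of silently corrupting the pipeline.&lt;/p&gt;

```python
import json

REQUIRED = {"title", "key_points", "limitations"}

def parse_summary(raw):
    data = json.loads(raw)  # raises if the model drifted off-format
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"summary missing fields: {missing}")
    return data

msg = '{"title": "RAG vs Fine-Tuning", "key_points": ["RAG adapts faster"], "limitations": []}'
print(parse_summary(msg)["title"])  # RAG vs Fine-Tuning
```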




&lt;h2&gt;
  
  
  Synthesis — Connecting the Dots
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;SynthesisAgent&lt;/strong&gt; merges multiple summaries into a unified comparative analysis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;synthesize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;bulletized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;key_points&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;System:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: Query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bulletized&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the model evolves from “reader” to “analyst,” forming relationships between insights and organizing them logically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Critique — Giving the System a Conscience
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;CriticAgent&lt;/strong&gt; inspects the synthesized narrative and calls out weak logic or missing perspectives.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;System:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: Query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;SYNTHESIS:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"missing_perspectives"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Data bias"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"weak_arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Unsupported claims about fine-tuning benefits"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"overall_risk"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reflective loop transforms a basic RAG pipeline into a &lt;strong&gt;self-aware reasoning system.&lt;/strong&gt;&lt;/p&gt;
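&lt;p&gt;One way to close that loop is to feed the critique back into synthesis. The control flow below is an illustrative sketch, not the repository's actual code: it assumes &lt;code&gt;synthesize&lt;/code&gt; and &lt;code&gt;review&lt;/code&gt; callables shaped like the snippets above, and the retry budget is an arbitrary choice.&lt;/p&gt;

```python
# Illustrative self-critique loop (assumed shapes, hypothetical max_rounds):
# re-run synthesis until the critic's risk rating drops below "high".
def refine(query, summaries, synthesize, review, max_rounds=2):
    synthesis = synthesize(query, summaries)
    for _ in range(max_rounds):
        verdict = review(query, synthesis, summaries)
        if verdict.get("overall_risk") != "high":
            break
        # Fold the critic's gaps back into the request for the next pass
        feedback = "; ".join(verdict.get("missing_perspectives", []))
        synthesis = synthesize(query + f" (address: {feedback})", summaries)
    return synthesis
```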




&lt;h2&gt;
  
  
  Report Generation — From Thought to Thesis
&lt;/h2&gt;

&lt;p&gt;Finally, all insights are compiled into a Markdown report via &lt;code&gt;pipeline.py&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;report_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
System:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_REPORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

User: Query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
SYNTHESIS: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;synthesis&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
CRITIC REVIEW: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Write final report in Markdown.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="n"&gt;report_md&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report_llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result reads like an academic mini-paper:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Executive summary&lt;/li&gt;
&lt;li&gt;Comparative analysis&lt;/li&gt;
&lt;li&gt;Decision framework&lt;/li&gt;
&lt;li&gt;Risks and gaps&lt;/li&gt;
&lt;li&gt;References&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system doesn’t just compute; it &lt;em&gt;articulates&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Agentic RAG Outperforms Traditional RAG
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Traditional RAG&lt;/th&gt;
&lt;th&gt;Agentic RAG (Data-Inspector)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single linear chain&lt;/td&gt;
&lt;td&gt;Multi-agent collaboration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Learning Behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retrieval + generation only&lt;/td&gt;
&lt;td&gt;Retrieval + reasoning + reflection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error Handling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None — one-shot generation&lt;/td&gt;
&lt;td&gt;Built-in self-critique loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Explainability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Opaque output&lt;/td&gt;
&lt;td&gt;Transparent intermediate JSONs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Adaptability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Static embeddings&lt;/td&gt;
&lt;td&gt;Dynamic web retrieval + modular agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output Depth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fluent but shallow&lt;/td&gt;
&lt;td&gt;Analytical, reference-backed synthesis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Agentic RAG = Traditional RAG + Cognition.&lt;/strong&gt;&lt;br&gt;
It elevates retrieval-augmented generation into &lt;em&gt;reason-augmented generation.&lt;/em&gt;&lt;/p&gt;
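&lt;p&gt;The stages compared above compose into one pipeline. This is a minimal orchestration sketch, assuming agent objects with the method names used in the snippets earlier; the project's real &lt;code&gt;pipeline.py&lt;/code&gt; may wire things differently.&lt;/p&gt;

```python
# Orchestration sketch (assumed interfaces, not the project's actual code):
# summarize per document, synthesize across documents, critique, then report.
def run_pipeline(query, documents, summarizer, synthesizer, critic, reporter):
    summaries = [summarizer.summarize(doc) for doc in documents]  # per-doc JSON
    synthesis = synthesizer.synthesize(query, summaries)          # comparison
    verdict = critic.review(query, synthesis, summaries)          # reflection
    return reporter.write(query, synthesis, verdict)              # Markdown out
```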


&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompts are contracts.&lt;/strong&gt;
Each agent must have a clear, bounded responsibility; otherwise, outputs collapse into noise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Autonomy is discipline disguised as freedom.&lt;/strong&gt;
Structured interaction enables creativity without chaos.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critique breeds truth.&lt;/strong&gt;
The CriticAgent was the breakthrough: the moment the system began questioning itself, quality skyrocketed.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Looking Ahead
&lt;/h2&gt;

&lt;p&gt;Agentic RAG hints at a future where models won’t just &lt;em&gt;generate answers&lt;/em&gt; but will &lt;em&gt;collaborate intelligently.&lt;/em&gt;&lt;br&gt;
When Data-Inspector finished its first report, it didn’t feel like I’d run code; it felt like I’d led a discussion with a team of invisible colleagues.&lt;/p&gt;


&lt;h2&gt;
  
  
  Explore the Project
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/trickste/data-inspector" rel="noopener noreferrer"&gt;Data-Inspector — Agentic RAG Demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run it locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
streamlit run app/ui_streamlit.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Final Reflection
&lt;/h2&gt;

&lt;p&gt;What began as a question, &lt;em&gt;“Can RAG think critically?”&lt;/em&gt;, evolved into an experiment in &lt;strong&gt;digital reasoning&lt;/strong&gt;.&lt;br&gt;
And maybe that’s the trajectory AI will take next:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;from systems that &lt;em&gt;answer questions&lt;/em&gt; to systems that &lt;em&gt;question their own answers.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>No OpenAI API? No Problem. Build RAG Locally with Ollama and FastAPI.</title>
      <dc:creator>Nishant prakash</dc:creator>
      <pubDate>Thu, 06 Nov 2025 14:09:22 +0000</pubDate>
      <link>https://forem.com/nishant_prakash_780f5d541/no-openai-api-no-problem-build-rag-locally-with-ollama-and-fastapi-5g3c</link>
      <guid>https://forem.com/nishant_prakash_780f5d541/no-openai-api-no-problem-build-rag-locally-with-ollama-and-fastapi-5g3c</guid>
      <description>&lt;p&gt;I built a &lt;strong&gt;fully local Retrieval-Augmented Generation (RAG)&lt;/strong&gt; system that lets a &lt;strong&gt;Llama 3 model&lt;/strong&gt; answer questions about my own PDFs and Markdown files, no cloud APIs, no external servers, all running on my machine.&lt;/p&gt;

&lt;p&gt;It’s powered by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streamlit&lt;/strong&gt; for the frontend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI&lt;/strong&gt; for the backend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChromaDB&lt;/strong&gt; for vector storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; to run Llama 3 locally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system ingests documents, chunks and embeds them, retrieves relevant parts for a query, and feeds them into Llama 3 to generate grounded answers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz1umdeesz0cgu9jcvsr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqz1umdeesz0cgu9jcvsr.jpg" alt="You might need some help" width="800" height="1000"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 Introduction – Why Go Local?
&lt;/h2&gt;

&lt;p&gt;It started with a simple frustration: I had a bunch of private PDFs and notes I wanted to query like ChatGPT, but without sending anything to the cloud.&lt;br&gt;
LLMs are powerful, but they &lt;strong&gt;don’t know your documents&lt;/strong&gt;. RAG changes that: it gives the model a “working memory” by feeding it relevant chunks of your data each time you ask a question.&lt;/p&gt;

&lt;p&gt;So I decided to build my own mini-ChatGPT for personal docs: everything self-hosted, modular, and transparent.&lt;br&gt;
The goals were simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload docs → Ask questions → Get answers &lt;strong&gt;with citations&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Stay &lt;strong&gt;offline&lt;/strong&gt; and &lt;strong&gt;private&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Learn the &lt;strong&gt;moving parts&lt;/strong&gt; of a RAG pipeline by hand, not just through frameworks like LangChain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Project Repository&lt;/strong&gt;: &lt;a href="https://github.com/trickste/raga" rel="noopener noreferrer"&gt;github.com/trickste/raga&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  🏗 Architecture Overview
&lt;/h2&gt;

&lt;p&gt;The setup runs in two clear phases: &lt;strong&gt;ingestion&lt;/strong&gt; and &lt;strong&gt;querying&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf9sek1j7itbue2j7g2b.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf9sek1j7itbue2j7g2b.jpg" alt="RAG Flow Diagram" width="721" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Streamlit UI&lt;/strong&gt; lets users upload documents and ask questions interactively.&lt;br&gt;
The &lt;strong&gt;FastAPI backend&lt;/strong&gt; handles everything else: text extraction, embedding, search, and invoking the LLM.&lt;/p&gt;

&lt;p&gt;This separation makes it easy to debug, extend, and even swap components later (e.g., replace Chroma with Pinecone, or Llama 3 with Mistral).&lt;/p&gt;
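&lt;p&gt;Swapping is easiest when components share an interface. As a hypothetical sketch (the repo does not necessarily define this; the &lt;code&gt;Protocol&lt;/code&gt; and the toy store below are illustrations), any vector store that satisfies the same two methods can be dropped in:&lt;/p&gt;

```python
from typing import Protocol

# Hypothetical interface: Chroma, Pinecone, or anything else can satisfy it.
class VectorStore(Protocol):
    def add(self, texts: list, source: str) -> int: ...
    def search(self, query: str, k: int) -> list: ...

class InMemoryStore:
    """Toy stand-in using keyword match instead of embeddings; a Chroma-backed
    class would expose the same two methods."""
    def __init__(self):
        self.docs = []

    def add(self, texts, source):
        self.docs.extend(texts)
        return len(texts)

    def search(self, query, k):
        hits = [d for d in self.docs if query.lower() in d.lower()]
        return hits[:k]
```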


&lt;h2&gt;
  
  
  Code Walkthrough
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Ingestion – Reading and Chunking Documents
&lt;/h3&gt;

&lt;p&gt;The ingestion pipeline starts by reading PDFs or Markdown files and turning them into clean text chunks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# via PyMuPDF
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ignore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="c1"&gt;# split into ~500-char chunks
&lt;/span&gt;    &lt;span class="n"&gt;num_added&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;num_added&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Chunking was trickier than expected.&lt;br&gt;
Too long, and the model forgets details. Too short, and it loses context.&lt;br&gt;
After experimenting, I settled around &lt;strong&gt;300–500 words per chunk&lt;/strong&gt; with slight overlaps to maintain continuity.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;CHUNK_SIZE&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;OVERLAP&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That overlap (~10–15%) turned out to be a small tweak that made a big difference in retrieval quality.&lt;/p&gt;
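&lt;p&gt;Putting the loop above into a complete function, with the chunk size and overlap from my experiments (the exact constants in the repo may differ):&lt;/p&gt;

```python
CHUNK_SIZE = 400   # words per chunk; I settled around 300-500
OVERLAP = 50       # roughly 10-15% overlap between consecutive chunks

def chunk_text(text: str) -> list:
    """Split text into overlapping word-based chunks (the loop shown above)."""
    words = text.split()
    chunks = []
    step = CHUNK_SIZE - OVERLAP
    for i in range(0, len(words), step):
        chunk = " ".join(words[i : i + CHUNK_SIZE])
        if chunk:
            chunks.append(chunk)
    return chunks
```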




&lt;h3&gt;
  
  
  2. Vector Storage – Semantic Search with ChromaDB
&lt;/h3&gt;

&lt;p&gt;Each chunk is embedded using &lt;strong&gt;SentenceTransformers (all-MiniLM-L6-v2)&lt;/strong&gt; and stored in a local ChromaDB collection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PersistentClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./chroma_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_or_create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;langrag_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_texts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;normalize_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metadatas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each document is now represented as a &lt;strong&gt;vector&lt;/strong&gt; in a 384-dimensional space.&lt;br&gt;
When a query comes in, we embed the question and retrieve the &lt;strong&gt;top-k most similar chunks&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;retrieved_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where the “retrieval” in Retrieval-Augmented Generation happens.&lt;/p&gt;
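&lt;p&gt;One subtlety worth guarding against: documents were added with explicitly computed, normalized MiniLM embeddings, so the query should be embedded the same way rather than left to the collection's default embedder. A minimal sketch, reusing the names from the snippets above:&lt;/p&gt;

```python
# Embed the query with the SAME model and settings used at ingest time, so
# query and document vectors live in the same space, then search by vector.
def retrieve(user_query, embedding_model, collection, k=3):
    q_emb = embedding_model.encode([user_query], normalize_embeddings=True)
    results = collection.query(
        query_embeddings=[list(q_emb[0])],  # one query vector
        n_results=k,
    )
    return results["documents"][0]
```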




&lt;h3&gt;
  
  
  3. Querying – Feeding the Context to the LLM
&lt;/h3&gt;

&lt;p&gt;Once we have the most relevant chunks, we build a &lt;strong&gt;grounded prompt&lt;/strong&gt; and send it to &lt;strong&gt;Llama 3&lt;/strong&gt; via &lt;strong&gt;Ollama&lt;/strong&gt;’s local API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a document QA assistant. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer strictly using only the CONTEXT below.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONTEXT:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;QUESTION: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3:latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prompt discipline was &lt;strong&gt;critical&lt;/strong&gt;.&lt;br&gt;
If you don’t tell the model to stick to the context, it will happily hallucinate.&lt;br&gt;
By enforcing “If it’s not in the context, say you don’t know,” we drastically improved trustworthiness.&lt;/p&gt;


&lt;h3&gt;
  
  
  4. The FastAPI Backend – Tying It All Together
&lt;/h3&gt;

&lt;p&gt;FastAPI glues it all together: document ingestion, querying, and LLM invocation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/add_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UploadFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(...)):&lt;/span&gt;
    &lt;span class="n"&gt;file_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;num_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Added &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_chunks&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; chunks from &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;QueryRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This clean separation of endpoints made debugging painless.&lt;br&gt;
Every step logs useful info: embeddings added, chunks retrieved, retrieval distances, and LLM latency.&lt;/p&gt;


&lt;h3&gt;
  
  
  5. Streamlit Frontend – A Minimal, Interactive UI
&lt;/h3&gt;

&lt;p&gt;The frontend makes it fun: drag and drop a file, type a question, and watch it respond.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;file_uploader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Upload PDF or Markdown&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text_input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ask a question about your documents:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get Answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;**Answer:**&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It’s only ~30 lines of Streamlit, but it transforms the project into a usable app.&lt;/p&gt;
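&lt;p&gt;One piece not shown above is the upload side: the frontend just forwards the file to the &lt;code&gt;/add_docs&lt;/code&gt; endpoint as multipart form data. A minimal sketch (the &lt;code&gt;API_URL&lt;/code&gt; value and helper names here are assumptions, not taken from the repo):&lt;/p&gt;

```python
import requests

API_URL = "http://localhost:8000"  # assumed FastAPI address

def build_upload(name: str, data: bytes) -> dict:
    # Shape the multipart payload that FastAPI's UploadFile parameter expects.
    return {"file": (name, data)}

def upload(name: str, data: bytes) -> str:
    # POST the file to the ingestion endpoint and return its status message.
    res = requests.post(f"{API_URL}/add_docs", files=build_upload(name, data))
    res.raise_for_status()
    return res.json()["message"]
```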




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Chunking Is an Art
&lt;/h3&gt;

&lt;p&gt;Small, overlapping chunks worked better than large ones.&lt;br&gt;
Breaking text at &lt;strong&gt;semantic boundaries&lt;/strong&gt; (paragraphs or sections) gave cleaner retrievals.&lt;/p&gt;
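&lt;p&gt;As an illustration, a paragraph-aware chunker with a small overlap might look like this (the sizes are illustrative, and the project’s actual implementation may differ):&lt;/p&gt;

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Split on paragraph boundaries, packing paragraphs into chunks of
    # roughly chunk_size characters and carrying a short tail forward
    # so adjacent chunks share a little context.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if len(current) + len(para) + 2 <= chunk_size:
            current = f"{current}\n\n{para}" if current else para
        else:
            if current:
                chunks.append(current)
            # seed the next chunk with the tail of the previous one
            current = (current[-overlap:] + "\n\n" if current else "") + para
    if current:
        chunks.append(current)
    return chunks
```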

&lt;h3&gt;
  
  
  2. Quality of Retrieval Beats Quantity
&lt;/h3&gt;

&lt;p&gt;Feeding the model too many chunks diluted its answers.&lt;br&gt;
3 relevant chunks &amp;gt; 10 vague ones.&lt;/p&gt;
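&lt;p&gt;Chroma returns distances alongside the documents, which makes a simple quality gate easy: drop anything past a distance threshold instead of padding the context. A sketch (the threshold value is illustrative):&lt;/p&gt;

```python
def filter_by_distance(results: dict, max_distance: float = 0.8) -> list[str]:
    # Keep only the chunks whose embedding distance to the query is
    # below the threshold; fewer, closer chunks beat many vague ones.
    docs = results["documents"][0]
    dists = results["distances"][0]
    return [doc for doc, dist in zip(docs, dists) if dist <= max_distance]
```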

&lt;h3&gt;
  
  
  3. Prompt Grounding Changes Everything
&lt;/h3&gt;

&lt;p&gt;Explicitly instructing the LLM not to make things up was the single most effective fix for hallucination.&lt;/p&gt;
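&lt;p&gt;Concretely, the grounding rule lives in the system message, and the retrieved context is spliced into the user message. A minimal sketch of that message construction (the exact wording here is illustrative):&lt;/p&gt;

```python
GROUNDING_RULES = (
    "Answer ONLY from the provided context. "
    "If the answer is not in the context, say you don't know."
)

def build_messages(query: str, context: str) -> list[dict]:
    # System message carries the grounding rule; user message carries
    # the retrieved context plus the actual question.
    return [
        {"role": "system", "content": GROUNDING_RULES},
        {"role": "user", "content": f"CONTEXT:\n{context}\n\nQUESTION: {query}"},
    ]
```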

&lt;h3&gt;
  
  
  4. Local Models Are Ready for Prime Time
&lt;/h3&gt;

&lt;p&gt;Running &lt;strong&gt;Llama 3 via Ollama&lt;/strong&gt; felt just as smooth as using an API, but faster, cheaper, and private.&lt;br&gt;
And yes: no API keys, no rate limits. WOOHOO!&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Observe Everything
&lt;/h3&gt;

&lt;p&gt;Logging every stage (chunk sizes, retrieval scores, final prompts) made debugging feel scientific rather than like guesswork.&lt;/p&gt;
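&lt;p&gt;In practice that observability can be as simple as wrapping the retrieval call with a timer and a structured log line. A sketch (the logger name and logged fields are assumptions):&lt;/p&gt;

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("rag")

def timed_query(collection, query: str, n_results: int = 3) -> dict:
    # Run the vector search and log chunk count, distances, and latency.
    start = time.perf_counter()
    results = collection.query(query_texts=[query], n_results=n_results)
    log.info("retrieved %d chunks, distances=%s, took %.3fs",
             len(results["documents"][0]),
             results["distances"][0],
             time.perf_counter() - start)
    return results
```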




&lt;h2&gt;
  
  
  Conclusion – My Own “ChatGPT for PDFs”
&lt;/h2&gt;

&lt;p&gt;At the end of this build, I had a working &lt;strong&gt;Local RAG Assistant&lt;/strong&gt;, a tiny offline system that could read, index, and reason about my documents.&lt;br&gt;
It runs entirely on my laptop, keeps my data private, and helped me deeply understand &lt;strong&gt;how modern LLM pipelines actually work&lt;/strong&gt; under the hood.&lt;/p&gt;

&lt;p&gt;There’s plenty of room to grow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add &lt;strong&gt;source citations&lt;/strong&gt; in answers.&lt;/li&gt;
&lt;li&gt;Support more file types (Word, HTML, etc.).&lt;/li&gt;
&lt;li&gt;Experiment with &lt;strong&gt;different embedding models&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Add caching or user authentication for a real-world app.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But most importantly, it taught me how retrieval, embeddings, and prompt engineering combine to make language models truly useful.&lt;/p&gt;

&lt;p&gt;If you’ve been thinking about building something similar, &lt;strong&gt;just start&lt;/strong&gt;.&lt;br&gt;
It’s incredibly rewarding to see your own files talk back intelligently.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Happy building and may your vectors always find their nearest neighbors.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>openai</category>
      <category>ai</category>
      <category>mlops</category>
      <category>python</category>
    </item>
    <item>
      <title>Teaching AI to Take Initiative – Building a Self-Thinking App with LangGraph and Ollama</title>
      <dc:creator>Nishant prakash</dc:creator>
      <pubDate>Tue, 04 Nov 2025 16:38:38 +0000</pubDate>
      <link>https://forem.com/nishant_prakash_780f5d541/teaching-ai-to-take-initiative-building-a-self-thinking-app-with-langgraph-and-ollama-2h9j</link>
      <guid>https://forem.com/nishant_prakash_780f5d541/teaching-ai-to-take-initiative-building-a-self-thinking-app-with-langgraph-and-ollama-2h9j</guid>
      <description>&lt;h2&gt;
  
  
  Introduction – When AI Becomes a Recruiter
&lt;/h2&gt;

&lt;p&gt;Imagine uploading a job description and a candidate’s resume, and an AI instantly evaluates the match, gives ATS scores, writes tailored feedback, and even generates interview questions.&lt;br&gt;
No spreadsheets. No manual parsing. Just structured, explainable intelligence.&lt;/p&gt;

&lt;p&gt;That’s what I set out to build: an &lt;strong&gt;Agentic AI Resume Evaluator&lt;/strong&gt; powered by &lt;strong&gt;LangGraph&lt;/strong&gt;, &lt;strong&gt;Ollama&lt;/strong&gt;, and &lt;strong&gt;FastAPI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Repository:&lt;/strong&gt; &lt;a href="https://github.com/trickste/ntrvur" rel="noopener noreferrer"&gt;github.com/trickste/ntrvur&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this blog, I’ll walk through how I stitched together an end-to-end agentic system that autonomously reasons, uses tools, and collaborates with sub-agents to produce actionable hiring insights.&lt;/p&gt;

&lt;p&gt;By the end, you’ll see how agentic architectures, local LLMs, and good old Python orchestration come together to make AI do work, not just talk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpeh9cjkx4ydjj3bm0ina.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpeh9cjkx4ydjj3bm0ina.jpg" alt="Agent?Really?" width="800" height="882"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What Is Agentic AI, Really?
&lt;/h2&gt;

&lt;p&gt;Agentic AI means giving AI the ability to think, decide, and act toward a goal, much like a human agent.&lt;br&gt;
Instead of “just predicting text,” the AI plans its steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“Extract candidate details → Compute ATS score → Evaluate → Review → Merge → Return final report.”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt; – for specific actions (e.g., ATS computation, text extraction).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory &amp;amp; State&lt;/strong&gt; – to carry information between steps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reasoning Loop&lt;/strong&gt; – to know when to invoke which tool.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where frameworks like LangChain and its orchestration layer LangGraph shine: they let you turn an LLM into a multi-step problem solver.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tech Stack – Tools Behind the Magic
&lt;/h2&gt;

&lt;p&gt;Before diving into code, let's introduce the key players in this adventure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Builds and executes the agent workflow as a directed graph. Each node performs a task (ATS, experience extraction, etc.).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangChain (Community)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provides integrations and structured tool definitions (like &lt;code&gt;StructuredTool.from_function&lt;/code&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ollama&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runs local LLMs (like &lt;code&gt;llama3&lt;/code&gt; or &lt;code&gt;phi3&lt;/code&gt;) offline, via &lt;code&gt;ollama.Client&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FastAPI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Wraps everything in a clean REST API (&lt;code&gt;/api/evaluate&lt;/code&gt;) for users to upload JDs &amp;amp; resumes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;scikit-learn&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Handles ATS scoring via TF-IDF cosine similarity.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PyMuPDF (fitz)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extracts text from PDF resumes reliably.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With this stack, I had the ingredients for an AI that could think (LLM via Ollama), remember/learn (via LangChain’s memory or retrieval components), act (use tools or retrieve info, potentially via MCP or LangChain tools), and be deployed (FastAPI serving a LangGraph-managed workflow). Now it was time to design the brain of the operation – the agent’s architecture.&lt;/p&gt;
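&lt;p&gt;The ATS score in the table above is plain scikit-learn: vectorize the JD and resume with TF-IDF and take the cosine similarity between the two vectors. A sketch of that computation (the 0–100 scaling is an assumption):&lt;/p&gt;

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def ats_score(jd_text: str, resume_text: str) -> float:
    # TF-IDF both texts over a shared vocabulary, then score the match
    # as the cosine of the angle between the two vectors.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform([jd_text, resume_text])
    return round(float(cosine_similarity(tfidf[0], tfidf[1])[0][0]) * 100, 2)
```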

&lt;h2&gt;
  
  
  Designing an Agentic Workflow (Architecture)
&lt;/h2&gt;

&lt;p&gt;The system runs like a small team of AI specialists working in sync. Once a JD and resume are uploaded, FastAPI triggers a LangGraph workflow, a series of smart nodes that extract the candidate’s name, compute the ATS score, and infer years of experience. A local Ollama LLM then wraps these findings into structured JSON.&lt;/p&gt;

&lt;p&gt;From there, the Evaluator Agent analyzes the match and drafts interview questions, while the Reviewer Agent critiques and improves them. A final Synthesizer merges both perspectives into one coherent report, a clean orchestration of reasoning, review, and refinement.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2xyg61n1gcgcemxthcd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2xyg61n1gcgcemxthcd.jpg" alt="Flow Architecture" width="701" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each part operates independently but shares structured state through LangGraph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s unpack the pieces.
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Agentic Evaluation with LangGraph
&lt;/h3&gt;

&lt;p&gt;LangGraph was the backbone of my orchestration.&lt;br&gt;
I defined a &lt;code&gt;StateGraph&lt;/code&gt; with typed state fields using &lt;strong&gt;Pydantic&lt;/strong&gt;, and created sequential nodes that act like tools in a workflow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agentic_evaluator.py
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EvaluationState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;ATS_SCORE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;YEARS_OF_EXPERIENCE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;CANDIDATE_NAME&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Nodes as Actions
&lt;/h4&gt;

&lt;p&gt;Each node modifies and returns the updated state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ats_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EvaluationState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ats_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[ATS] → &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;experience_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EvaluationState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;experience_extractor_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Experience] → &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, a small &lt;strong&gt;LLM finalizer&lt;/strong&gt; via Ollama neatly packages all extracted values as JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;llm_finalize_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;EvaluationState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOllama&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ATS_SCORE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ATS_SCORE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YEARS_OF_EXPERIENCE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;YEARS_OF_EXPERIENCE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CANDIDATE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CANDIDATE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Given this data, return valid JSON:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LangGraph made it effortless to visualize this reasoning loop as a flowchart.&lt;br&gt;
Each node is deterministic, while the last one invokes LLM reasoning: the true “thinking” step.&lt;/p&gt;
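&lt;p&gt;Stripped of the library, the pattern each node follows is simply “take the state, return an updated copy,” folded through a fixed sequence. A conceptual sketch without LangGraph (the names and score value are illustrative):&lt;/p&gt;

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    jd_text: str
    ats_score: float = 0.0

def ats_node(state: State) -> State:
    # Stand-in for the real TF-IDF scorer node.
    return replace(state, ats_score=42.0)

def run_pipeline(state: State, nodes) -> State:
    # LangGraph's compiled StateGraph performs this fold for you,
    # adding branching, persistence, and visualization on top.
    for node in nodes:
        state = node(state)
    return state
```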
&lt;h3&gt;
  
  
  Step 2: Evaluator Agent – LLM as the Hiring Analyst
&lt;/h3&gt;

&lt;p&gt;Once the structured MCP (Model Context Protocol)-like output was ready, I passed it into an &lt;strong&gt;Evaluator Agent&lt;/strong&gt;, an Ollama-powered LLM acting as a hiring analyst.&lt;/p&gt;

&lt;p&gt;It uses templated system and user prompts stored as Markdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app/prompts/evaluator_system.md
app/prompts/evaluator_user.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system prompt strictly enforces a JSON schema with ATS score, resume feedback, and 10–15 interview questions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OllamaLLM&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This agent reads the JD and resume, scores the match, and auto-generates a structured evaluation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Reviewer Agent – The Senior Interviewer
&lt;/h3&gt;

&lt;p&gt;Next, the &lt;strong&gt;Reviewer Agent&lt;/strong&gt; steps in, another Ollama instance prompted with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The JD and resume text&lt;/li&gt;
&lt;li&gt;Candidate experience level&lt;/li&gt;
&lt;li&gt;The evaluator’s generated questions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its role: review the quality of those questions, detect missing areas, and produce a refined list.&lt;br&gt;
This agent automates the &lt;strong&gt;human-in-the-loop&lt;/strong&gt; role: built-in quality control.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;review_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_reviewer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;questions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;years_of_experience&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;years_of_experience&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output includes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"QUESTION_REVIEW"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"missing_topics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"CI/CD"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Cloud Costing"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"improvement_suggestions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Add a behavioral question"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"UPDATED_QUESTIONS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Explain blue/green deployments?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"How do you manage secrets in CI/CD?"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
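&lt;p&gt;Since the synthesizer indexes &lt;code&gt;reviewer_json["QUESTION_REVIEW"]&lt;/code&gt; and &lt;code&gt;reviewer_json["UPDATED_QUESTIONS"]&lt;/code&gt; directly, it pays to fail fast when the model drops a key. A small guard like this works (a hypothetical helper, not part of the original repo):&lt;/p&gt;

```python
# Hypothetical guard; the original project does not necessarily include it.
REQUIRED_REVIEWER_KEYS = {"QUESTION_REVIEW", "UPDATED_QUESTIONS"}

def validate_reviewer_output(review_json: dict) -> dict:
    """Raise early if the reviewer LLM omitted a required top-level key."""
    missing = REQUIRED_REVIEWER_KEYS - review_json.keys()
    if missing:
        raise ValueError(f"Reviewer output missing keys: {sorted(missing)}")
    if not isinstance(review_json["UPDATED_QUESTIONS"], list):
        raise TypeError("UPDATED_QUESTIONS must be a list of strings")
    return review_json
```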



&lt;h3&gt;
  
  
  Step 4: Synthesizer – The Final Merge
&lt;/h3&gt;

&lt;p&gt;Finally, I wrote a &lt;code&gt;merge_evaluator_and_reviewer&lt;/code&gt; function to combine both agents’ results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;merge_evaluator_and_reviewer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evaluator_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reviewer_json&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;candidate_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evaluator_json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;evaluator_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;candidate_name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;updated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;payload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;candidate_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ATS_SCORE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ATS_SCORE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RESUME_FEEDBACK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RESUME_FEEDBACK&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FINAL_QUESTIONS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromkeys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INTERVIEW_QUESTIONS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;reviewer_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPDATED_QUESTIONS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;)),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QUESTIONS_REVIEW_SUMMARY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reviewer_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QUESTION_REVIEW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;updated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a &lt;strong&gt;rich, structured payload&lt;/strong&gt; ready for dashboards or downstream workflows.&lt;/p&gt;
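&lt;p&gt;The &lt;code&gt;FINAL_QUESTIONS&lt;/code&gt; merge relies on &lt;code&gt;list(dict.fromkeys(...))&lt;/code&gt;: dict keys are unique and preserve insertion order (Python 3.7+), so duplicates are dropped while the evaluator's ordering wins. For example:&lt;/p&gt;

```python
# dict.fromkeys keeps first-seen order and drops repeats.
evaluator_qs = ["Explain container orchestration?", "How do you manage secrets in CI/CD?"]
reviewer_qs = ["How do you manage secrets in CI/CD?", "Explain blue/green deployments?"]

merged = list(dict.fromkeys(evaluator_qs + reviewer_qs))
# -> ['Explain container orchestration?',
#     'How do you manage secrets in CI/CD?',
#     'Explain blue/green deployments?']
```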

&lt;h3&gt;
  
  
  Step 5: FastAPI – Making It Real
&lt;/h3&gt;

&lt;p&gt;To expose the system, I used FastAPI for a clean HTTP interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/evaluate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;EvaluateResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jd_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UploadFile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resume_pdf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UploadFile&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;jd_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;jd_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;resume_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;resume_pdf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;resume_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_text_from_pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resume_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;mcp_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_agentic_evaluation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;eval_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_evaluator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mcp_output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CANDIDATE_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;review_json&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_reviewer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jd_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resume_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;eval_json&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INTERVIEW_QUESTIONS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;mcp_output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YEARS_OF_EXPERIENCE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;merge_evaluator_and_reviewer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eval_json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;review_json&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty? Upload a JD and a resume, and within seconds you get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"John Doe"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ATS_SCORE"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;84.6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"RESUME_FEEDBACK"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Python"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Strong"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Docker"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Moderate"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"FINAL_QUESTIONS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Explain container orchestration?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Describe CI/CD best practices?"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What I Learned
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;Agentic AI is orchestration, not magic.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Breaking down tasks into nodes and state transitions makes the system explainable and debuggable. LangGraph turns “AI reasoning” into visual, modular code.&lt;/p&gt;
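&lt;p&gt;The "nodes and state transitions" idea doesn't require LangGraph to grasp: each node reads the shared state, updates it, and names the next node. This toy runner captures the pattern (node names mirror this project's agents, but the runner itself is a simplified sketch; LangGraph adds typed state, branching, and persistence on top):&lt;/p&gt;

```python
# Toy node/state-transition runner; a simplified stand-in for LangGraph.
def extract(state):
    state["name"] = "John Doe"       # e.g. MCP-style extraction step
    return "evaluate"

def evaluate(state):
    state["questions"] = ["Q1"]      # evaluator agent proposes questions
    return "review"

def review(state):
    state["questions"].append("Q2")  # reviewer agent refines them
    return None                      # terminal node

NODES = {"extract": extract, "evaluate": evaluate, "review": review}

def run_graph(start: str) -> dict:
    """Run nodes until one returns None, threading state through each."""
    state, node = {}, start
    while node is not None:
        node = NODES[node](state)
    return state
```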

&lt;h4&gt;
  
  
  2. &lt;strong&gt;Local LLMs are surprisingly capable.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Running &lt;code&gt;llama3&lt;/code&gt; and &lt;code&gt;phi3&lt;/code&gt; locally through Ollama gave me full control: no latency, no API keys. And switching models was a one-line config change.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. &lt;strong&gt;Strict JSON prompts save hours.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Enforcing schema at the prompt level eliminated parsing headaches. (Still, I built &lt;code&gt;coerce_json()&lt;/code&gt; to auto-clean malformed outputs.)&lt;/p&gt;
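&lt;p&gt;The original &lt;code&gt;coerce_json()&lt;/code&gt; isn't shown here; a plausible minimal version strips Markdown fences and slices out the first balanced JSON object before parsing. This is an assumption about the idea, not the project's actual implementation:&lt;/p&gt;

```python
import json
import re

def coerce_json(raw: str) -> dict:
    """Best-effort cleanup of an LLM reply into a JSON object.

    Sketch of the idea behind coerce_json(); the real helper may differ.
    """
    # Drop Markdown code fences like ```json ... ```
    cleaned = re.sub(r"```(?:json)?", "", raw).strip()
    # Slice from the first '{' to the last '}' to drop chatty preambles
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in model output")
    return json.loads(cleaned[start:end + 1])
```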

&lt;h4&gt;
  
  
  4. &lt;strong&gt;Agents need reviewers too.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Having a second LLM review the first’s work brought real reliability: a taste of collaborative multi-agent systems.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. &lt;strong&gt;FastAPI makes deployment feel native.&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Integrating async routes and file uploads made serving AI models feel as easy as any microservice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion – Agentic AI for Real-World Automation
&lt;/h3&gt;

&lt;p&gt;This project taught me that &lt;strong&gt;Agentic AI isn’t just theory&lt;/strong&gt;; it’s a practical blueprint for intelligent systems.&lt;br&gt;
By combining &lt;strong&gt;LangGraph&lt;/strong&gt; (structured reasoning), &lt;strong&gt;Ollama&lt;/strong&gt; (local intelligence), and &lt;strong&gt;FastAPI&lt;/strong&gt; (deployment glue), I built a hiring-assistant AI that’s fast, explainable, and private.&lt;/p&gt;

&lt;p&gt;But more than that, it changed how I think about AI apps: not as “chatbots,” but as &lt;strong&gt;collaborative systems of agents&lt;/strong&gt;, each with a role, a goal, and a reasoning process.&lt;/p&gt;

&lt;p&gt;If you’re looking to dive deeper, here are the docs that shaped my journey:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.langchain.com/langgraph" rel="noopener noreferrer"&gt;LangGraph Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ollama.ai" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; – Run local LLMs easily&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs" rel="noopener noreferrer"&gt;LangChain Tools and Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://fastapi.tiangolo.com" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt; – Lightning-fast web APIs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; – Emerging standard for LLM-tool integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Repo Snapshot
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app/
├── core/config.py
├── routers/evaluate.py
├── services/
│   ├── agentic_evaluator.py
│   ├── evaluator_agent.py
│   ├── reviewer_agent.py
│   ├── synthesizer_agent.py
│   └── ollama_client.py
├── prompts/
│   ├── evaluator_system.md
│   ├── evaluator_user.md
│   ├── reviewer_system.md
│   └── reviewer_user.md
└── utils/json_safety.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Final Thought:&lt;/strong&gt;&lt;br&gt;
Agentic AI is not about creating a single genius model; it’s about creating a &lt;em&gt;team&lt;/em&gt; of smart, cooperative components.&lt;br&gt;
And once they start working together, you realize that automation can actually &lt;em&gt;think&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Happy building, fellow tinkerers!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>python</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
