<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: RajeevaChandra</title>
    <description>The latest articles on Forem by RajeevaChandra (@rajeev_3ce9f280cbae73b234).</description>
    <link>https://forem.com/rajeev_3ce9f280cbae73b234</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2827287%2F06012321-dd2b-4192-b498-c6209a71f876.jpg</url>
      <title>Forem: RajeevaChandra</title>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rajeev_3ce9f280cbae73b234"/>
    <language>en</language>
    <item>
      <title>🚀 𝐈𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐢𝐧𝐠 𝐎𝐩𝐞𝐧𝐀𝐈’𝐬 𝐂𝐡𝐚𝐭𝐊𝐢𝐭 𝐰𝐢𝐭𝐡 𝐅𝐚𝐬𝐭𝐀𝐏𝐈: 𝐀 𝐏𝐫𝐚𝐜𝐭𝐢𝐜𝐚𝐥 𝐆𝐮𝐢𝐝𝐞 𝐭𝐨 𝐁𝐮𝐢𝐥𝐝𝐢𝐧𝐠 𝐌𝐨𝐝𝐞𝐫𝐧 𝐂𝐡𝐚𝐭 𝐀𝐠𝐞𝐧𝐭𝐬</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Wed, 08 Oct 2025 02:21:40 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/--3hhn</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/--3hhn</guid>
      <description>&lt;p&gt;OpenAI unveiled a major update during Dev Day yesterday (Oct 6), introducing a new suite of tools to make building and deploying AI agents much easier.&lt;/p&gt;

&lt;p&gt;✨ What’s New:&lt;/p&gt;

&lt;p&gt;🧠 The launch includes AgentKit, which gives developers and business users the ability to build, deploy, and optimize agentic AI systems, and ChatKit, a framework for creating rich chat experiences without reinventing the UI layer.&lt;/p&gt;

&lt;p&gt;💬 ChatKit lets you embed a production-ready chat interface into your app or website with support for file uploads, tool invocation, and chain-of-thought visualization — all within minutes. &lt;/p&gt;

&lt;p&gt;Together, AgentKit and ChatKit bridge the gap between agent logic and user interaction, making it simpler to bring real AI agents into production products.&lt;/p&gt;

&lt;p&gt;💡 𝐓𝐰𝐨 𝐖𝐚𝐲𝐬 𝐭𝐨 𝐔𝐬𝐞 𝐂𝐡𝐚𝐭𝐊𝐢𝐭:&lt;/p&gt;

&lt;p&gt;According to OpenAI’s documentation, ChatKit can be integrated in two ways:&lt;br&gt;
1️⃣ 𝐑𝐞𝐜𝐨𝐦𝐦𝐞𝐧𝐝𝐞𝐝 𝐈𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐢𝐨𝐧:&lt;br&gt;
 Let OpenAI host and scale everything — you embed the ChatKit widget in your frontend and connect it to an OpenAI Agent Builder backend.&lt;br&gt;
2️⃣ 𝐀𝐝𝐯𝐚𝐧𝐜𝐞𝐝 𝐈𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐢𝐨𝐧:&lt;br&gt;
 Host ChatKit on your own infrastructure, use the ChatKit SDK, and connect it to any backend or model endpoint.&lt;/p&gt;

&lt;p&gt;🧱 What I Built&lt;br&gt;
 I implemented the Advanced Integration version — a self-hosted Chat Framework combining:&lt;br&gt;
● 𝐂𝐡𝐚𝐭𝐊𝐢𝐭 𝐔𝐈 (𝐅𝐫𝐨𝐧𝐭𝐞𝐧𝐝): A modern React + Next.js interface built with ChatKit components. Supports message history, placeholders, and full customization.&lt;br&gt;
● 𝐅𝐚𝐬𝐭𝐀𝐏𝐈 (𝐁𝐚𝐜𝐤𝐞𝐧𝐝): A lightweight layer exposing /api/chat and /health endpoints. Handles message serialization, temperature control, and integrates with any OpenAI-compatible API.&lt;br&gt;
● 𝐂𝐨𝐧𝐟𝐢𝐠𝐮𝐫𝐚𝐛𝐥𝐞 𝐌𝐨𝐝𝐞𝐥 𝐄𝐧𝐝𝐩𝐨𝐢𝐧𝐭: Flexible backend integration for connecting to any model API or agent orchestration layer.&lt;/p&gt;

&lt;p&gt;This setup delivers the same smooth ChatKit experience — but entirely under developer control. It’s modular, lightweight, and can easily connect to private APIs, enterprise systems, or custom agent tools.&lt;/p&gt;

&lt;p&gt;⚙️ How It Works&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjh25xehw3tx1cj5eajk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjh25xehw3tx1cj5eajk.png" alt="Navigation Flow" width="617" height="677"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ The ChatKit UI sends messages to my FastAPI backend.&lt;br&gt;
✅ The backend processes, formats, and forwards them to the model API.&lt;br&gt;
✅ The response is returned to the UI and rendered instantly in the chat view.&lt;br&gt;
✅ All data flows follow the OpenAI Chat Completions schema, so it’s plug-and-play with any model or agent backend.&lt;/p&gt;
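
&lt;p&gt;As a rough sketch of what that backend serialization step looks like (the helper name and defaults below are my own, not part of ChatKit’s API):&lt;/p&gt;

```python
# Minimal sketch of the FastAPI layer's job: wrap ChatKit UI messages in an
# OpenAI Chat Completions-style request body. Names/defaults are illustrative.
def build_chat_payload(messages, model="gpt-4o-mini", temperature=0.7):
    """Serialize UI messages into a Chat Completions request body."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": m["role"], "content": m["content"]} for m in messages
        ],
    }

payload = build_chat_payload([{"role": "user", "content": "Hello"}])
```

&lt;p&gt;Because the payload matches the Chat Completions shape, the same backend can forward it to any OpenAI-compatible endpoint.&lt;/p&gt;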

&lt;p&gt;For those who’d like to explore the setup, I’ve published the full implementation here:&lt;br&gt;
&lt;a href="https://lnkd.in/eCvN8mjh" rel="noopener noreferrer"&gt;https://lnkd.in/eCvN8mjh&lt;/a&gt;&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>openai</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>𝐇𝐨𝐰 𝐭𝐨 𝐄𝐱𝐩𝐨𝐬𝐞 𝐀𝐖𝐒 𝐋𝐚𝐦𝐛𝐝𝐚 𝐚𝐬 𝐚𝐧 𝐌𝐂𝐏 𝐓𝐨𝐨𝐥 𝐰𝐢𝐭𝐡 Bedrock 𝐀𝐠𝐞𝐧𝐭𝐂𝐨𝐫𝐞 𝐆𝐚𝐭𝐞𝐰𝐚𝐲</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Thu, 28 Aug 2025 02:43:15 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/bedrock-3ami</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/bedrock-3ami</guid>
      <description>&lt;p&gt;Most enterprises already have dozens (if not hundreds) of AWS Lambda functions powering business logic.&lt;br&gt;
But here’s the problem:&lt;br&gt;
 👉 How do you make those functions easily consumable by AI agents in a secure, standardized way?&lt;br&gt;
That’s where Amazon Bedrock AgentCore Gateway + Model Context Protocol (MCP) come in.&lt;br&gt;
 Think of the Gateway as a universal adapter that lets your Lambda “speak MCP,” so any agent can discover and call it as a tool.&lt;/p&gt;

&lt;p&gt;𝐀𝐦𝐚𝐳𝐨𝐧 𝐁𝐞𝐝𝐫𝐨𝐜𝐤 𝐀𝐠𝐞𝐧𝐭𝐂𝐨𝐫𝐞 𝐆𝐚𝐭𝐞𝐰𝐚𝐲&lt;br&gt;
1️⃣ One-stop bridge for agents → Turn APIs, Lambda functions, or existing services into MCP-compatible tools with just a few lines of config.&lt;br&gt;
 2️⃣ Scale with security → Built-in ingress auth (who can call) + egress auth (how Gateway connects to backends).&lt;br&gt;
 3️⃣ Developer speed → No weeks of glue code or infra provisioning — Gateway handles it.&lt;br&gt;
 4️⃣ Broad support → Works with OpenAPI, Smithy, and Lambda as input types.&lt;/p&gt;

&lt;p&gt;What I did in this example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog57jvieax1j9sdgdn86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog57jvieax1j9sdgdn86.png" alt="arch" width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;● Created a simple Lambda function (get_order, update_order) to simulate order lookups and updates.&lt;br&gt;
● Used AgentCore Gateway to expose that Lambda as MCP tools.&lt;br&gt;
● Configured Cognito for inbound OAuth2 authentication and an IAM role for outbound authorization.&lt;br&gt;
● Connected with an MCP client to listTools and callTool, and got back real Lambda responses.&lt;/p&gt;
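
&lt;p&gt;A hypothetical shape for the Lambda behind those two tools (the real Gateway delivers tool metadata via the invocation context; dispatching on a plain &lt;code&gt;toolName&lt;/code&gt; field here is a simplification):&lt;/p&gt;

```python
# Illustrative Lambda serving both MCP tools, dispatched on the tool name.
# ORDERS stands in for a real datastore; field names are assumptions.
ORDERS = {"123": "SHIPPED"}

def lambda_handler(event, context):
    tool = event.get("toolName", "")
    order_id = event.get("orderId", "")
    if tool == "get_order_tool":
        return {"orderId": order_id, "status": ORDERS.get(order_id, "UNKNOWN")}
    if tool == "update_order_tool":
        ORDERS[order_id] = "UPDATED"
        return {"orderId": order_id, "status": "UPDATED"}
    return {"error": "unknown tool " + tool}
```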

&lt;p&gt;After wiring everything up, my MCP client returned:&lt;br&gt;
🔧 get_order_tool&lt;br&gt;
 📝 Action: Fetch order status&lt;br&gt;
 📦 Result: { "orderId": "123", "status": "SHIPPED" }&lt;/p&gt;

&lt;p&gt;🔧 update_order_tool&lt;br&gt;
 📝 Action: Update order status&lt;br&gt;
 📦 Result: { "orderId": "123", "status": "UPDATED" }&lt;/p&gt;

&lt;p&gt;💻 Want to try this example yourself?&lt;br&gt;
 I’ve published the full working code here:&lt;br&gt;
 👉 &lt;a href="https://lnkd.in/gdrhHFDs" rel="noopener noreferrer"&gt;https://lnkd.in/gdrhHFDs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is just the beginning — next, I’ll explore exposing APIs (OpenAPI/Smithy) and chaining multiple tools together for richer agent workflows.&lt;/p&gt;

&lt;p&gt;Reference: Learn More About AgentCore Gateway&lt;br&gt;
To get a deeper understanding of how Amazon Bedrock AgentCore Gateway simplifies tool integration for AI agents, check out the official docs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lnkd.in/gGTB3EJD" rel="noopener noreferrer"&gt;https://lnkd.in/gGTB3EJD&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>mcp</category>
      <category>lambda</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>𝐁𝐞𝐲𝐨𝐧𝐝 𝐅𝐫𝐞𝐬𝐡𝐧𝐞𝐬𝐬: 𝐇𝐨𝐰 𝐭𝐨 𝐮𝐬𝐞 𝐒𝐞𝐚𝐫𝐜𝐡 𝐌𝐨𝐝𝐞𝐬 𝐢𝐧 𝐕𝐞𝐜𝐭𝐨𝐫 𝐃𝐚𝐭𝐚𝐛𝐚𝐬𝐞𝐬 (𝐰𝐢𝐭𝐡 𝐋𝐚𝐧𝐠𝐜𝐡𝐚𝐢𝐧)</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Sun, 24 Aug 2025 03:53:42 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/--52i5</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/--52i5</guid>
      <description>&lt;p&gt;In my last post, I talked about how dynamic embeddings keep your knowledge base fresh as documents evolve. But freshness is only half the story. &lt;br&gt;
When a user asks your assistant a question, 𝐡𝐨𝐰 𝐲𝐨𝐮 𝐬𝐞𝐚𝐫𝐜𝐡 𝐭𝐡𝐞 𝐯𝐞𝐜𝐭𝐨𝐫 𝐝𝐚𝐭𝐚𝐛𝐚𝐬𝐞 determines whether they get: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the single most relevant snippet, &lt;/li&gt;
&lt;li&gt;a broader set of context, or &lt;/li&gt;
&lt;li&gt;results filtered by metadata like timestamps or document type. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are the five main search strategies—explained simply. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs31718emrimq2r0e67fc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs31718emrimq2r0e67fc.png" alt="search types" width="800" height="556"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;1️⃣ Similarity Search (k-NN)&lt;/p&gt;

&lt;p&gt;When you type a query, the system converts it into a vector. Then it looks around the vector space for the “neighbors” that sit closest. Those become your top results.&lt;/p&gt;

&lt;p&gt;👉 Example:&lt;br&gt;
Query: “What is the required capital reserve?”&lt;br&gt;
Result: “Banks must maintain 12% capital reserves.”&lt;/p&gt;
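
&lt;p&gt;Stripped of any library, k-NN is just “score every vector by similarity, keep the top k.” A toy sketch (vectors here are made up for illustration):&lt;/p&gt;

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def knn(query_vec, docs, k=2):
    """docs: list of (text, vector) pairs. Returns the k closest texts."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

&lt;p&gt;Real vector databases do the same ranking, just with approximate-nearest-neighbor indexes so it stays fast at millions of vectors.&lt;/p&gt;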

&lt;p&gt;2️⃣ Max Marginal Relevance (MMR)&lt;/p&gt;

&lt;p&gt;MMR makes sure you don’t get the same answer five times in a row.&lt;br&gt;
Here’s how it works: after finding the most relevant snippet, it deliberately looks for other results that are still relevant but not redundant, balancing relevance with diversity.&lt;/p&gt;

&lt;p&gt;👉 Example:&lt;br&gt;
Query: “Explain capital reserve requirements.”&lt;br&gt;
Results: “Banks must maintain 12% capital reserves.”&lt;br&gt;
“These reserves are adjusted annually based on regulations.”&lt;/p&gt;

&lt;p&gt;Notice how the second snippet doesn’t just repeat the first—it brings in a new angle. That’s MMR at work.&lt;/p&gt;
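
&lt;p&gt;The greedy MMR loop can be sketched in a few lines: each pick maximizes “relevance to the query minus redundancy with what’s already selected” (the &lt;code&gt;lam&lt;/code&gt; weight and toy vectors are illustrative):&lt;/p&gt;

```python
import math

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query, docs, k=2, lam=0.7):
    """Greedy MMR over (text, vector) docs: lam trades relevance vs. diversity."""
    selected, remaining = [], list(docs)
    while remaining and k > len(selected):
        def score(d):
            # Redundancy = worst-case similarity to anything already picked.
            redundancy = max((cos(d[1], s[1]) for s in selected), default=0.0)
            return lam * cos(d[1], query) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [text for text, _ in selected]
```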

&lt;p&gt;3️⃣ Filtered / Metadata Search&lt;br&gt;
Sometimes “closest meaning” isn’t the whole story—you also care about context and constraints. That’s where metadata filtering comes in.&lt;br&gt;
Think of it as adding a funnel on top of similarity search. You still find the closest matches, but only those that meet extra rules like date, document type, source, or author.&lt;/p&gt;

&lt;p&gt;👉 Example:&lt;br&gt;
Query: “What’s the latest capital reserve requirement?”&lt;br&gt;
Filter: updated_at &amp;gt; 2025-01-01&lt;br&gt;
Result: The system ignores older documents and only shows the most recent rule—even if the older ones are technically “closer” in meaning.&lt;/p&gt;

&lt;p&gt;4️⃣ Hybrid Search (Keyword + Vector)&lt;/p&gt;

&lt;p&gt;Sometimes, meaning alone isn’t enough. What if your query includes an exact code, acronym, or ID? A pure semantic search might blur it, but a keyword search nails it.&lt;/p&gt;

&lt;p&gt;Hybrid search combines the two:&lt;/p&gt;

&lt;p&gt;Vector search captures the context and meaning.&lt;br&gt;
Keyword search makes sure specific terms (like “CRR-2025”) get the priority they deserve.&lt;/p&gt;

&lt;p&gt;👉 Example:&lt;/p&gt;

&lt;p&gt;Query: “Capital Reserve Rule CRR-2025”&lt;br&gt;
Vector search → understands it’s about capital reserves.&lt;br&gt;
Keyword search → ensures documents mentioning CRR-2025 are ranked higher.&lt;/p&gt;
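
&lt;p&gt;One common way to blend the two signals is a weighted sum of the semantic score and an exact-keyword hit rate (the weighting scheme below is a simplification; production systems often use reciprocal rank fusion instead):&lt;/p&gt;

```python
def hybrid_score(query_terms, doc_text, vec_score, keyword_weight=0.5):
    """Blend a semantic similarity score with exact keyword hits
    so specific tokens like 'CRR-2025' are never blurred away."""
    words = doc_text.lower().split()
    hits = sum(1 for t in query_terms if t.lower() in words)
    kw = hits / max(len(query_terms), 1)
    return (1 - keyword_weight) * vec_score + keyword_weight * kw
```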

&lt;p&gt;5️⃣ Cross-Encoder Reranking&lt;/p&gt;

&lt;p&gt;Starts with a fast similarity search, then uses a deeper model (like BERT) to re-score the top candidates for accuracy.&lt;/p&gt;

&lt;p&gt;👉 Query: “What are the capital reserve rules for 2025?”&lt;br&gt;
Step 1: Initial retrieval → 10 candidates&lt;br&gt;
Step 2: Reranker → re-scores and picks the single best snippet&lt;/p&gt;
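
&lt;p&gt;The two-stage pattern is independent of which models you plug in. A sketch with scorer functions passed in as parameters (a real pipeline would use an embedding similarity for stage 1 and a cross-encoder like BERT for stage 2):&lt;/p&gt;

```python
def retrieve_then_rerank(query, docs, fast_score, rerank_score, n_fast=10, k=1):
    """Stage 1: cheap score over all docs, keep n_fast candidates.
    Stage 2: expensive re-score of just those candidates, return top k."""
    candidates = sorted(docs, key=lambda d: fast_score(query, d), reverse=True)[:n_fast]
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:k]
```

&lt;p&gt;The expensive model only ever sees &lt;code&gt;n_fast&lt;/code&gt; documents, which is what makes reranking affordable at scale.&lt;/p&gt;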

&lt;p&gt;Want to explore the full code base?&lt;br&gt;
&lt;a href="https://lnkd.in/eec9AiHy" rel="noopener noreferrer"&gt;https://lnkd.in/eec9AiHy&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  📊 Search Strategies at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;How it Works&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Similarity Search (k-NN)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Finds nearest neighbors in vector space&lt;/td&gt;
&lt;td&gt;Fast &amp;amp; simple&lt;/td&gt;
&lt;td&gt;Can return repetitive or narrow results&lt;/td&gt;
&lt;td&gt;Quick lookups, FAQs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max Marginal Relevance (MMR)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balances relevance + diversity&lt;/td&gt;
&lt;td&gt;Avoids duplicates, adds variety&lt;/td&gt;
&lt;td&gt;Slightly slower&lt;/td&gt;
&lt;td&gt;Explanations, multi-fact answers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Filtered / Metadata Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Adds constraints (date, type, source) on top of similarity&lt;/td&gt;
&lt;td&gt;Ensures results match business rules&lt;/td&gt;
&lt;td&gt;Needs clean, consistent metadata&lt;/td&gt;
&lt;td&gt;Compliance, regulations, versioned docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hybrid Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Combines keyword search with vector similarity&lt;/td&gt;
&lt;td&gt;Best of both worlds (context + exact match)&lt;/td&gt;
&lt;td&gt;Requires extra infra (ElasticSearch, OpenSearch)&lt;/td&gt;
&lt;td&gt;IDs, codes, acronyms, technical docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Encoder Reranking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Re-scores initial candidates with a deeper model (e.g., BERT)&lt;/td&gt;
&lt;td&gt;Highest precision&lt;/td&gt;
&lt;td&gt;Computationally heavy&lt;/td&gt;
&lt;td&gt;Mission-critical answers, high-accuracy apps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;🔑 Key Takeaway&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Static embeddings = stale snapshots&lt;/li&gt;
&lt;li&gt;Dynamic embeddings = living knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pipeline keeps context fresh and supports multiple retrieval modes so you can choose the right strategy for your production needs.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>machinelearning</category>
      <category>vectordatabase</category>
      <category>python</category>
    </item>
    <item>
      <title>𝐁𝐮𝐢𝐥𝐝𝐢𝐧𝐠 𝐚 𝐃𝐲𝐧𝐚𝐦𝐢𝐜 𝐑𝐀𝐆 𝐏𝐢𝐩𝐞𝐥𝐢𝐧𝐞 𝐰𝐢𝐭𝐡 𝐋𝐚𝐧𝐠𝐂𝐡𝐚𝐢𝐧 (𝐓𝐡𝐚𝐭 𝐒𝐭𝐚𝐲𝐬 𝐅𝐫𝐞𝐬𝐡)</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Sat, 23 Aug 2025 04:26:52 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/--24od</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/--24od</guid>
      <description>&lt;p&gt;Most RAG (Retrieval-Augmented Generation) systems work fine for static knowledge bases—but the moment your documents start changing (new policies, updated financials, revised product specs), they quickly go stale.&lt;/p&gt;

&lt;p&gt;We solved that with a dynamic RAG pipeline that keeps embeddings and context fresh without doing heavy full rebuilds. Here’s how it works:&lt;/p&gt;

&lt;p&gt;🧩 High-Level Flow&lt;/p&gt;

&lt;p&gt;1️⃣ 𝐖𝐚𝐭𝐜𝐡𝐞𝐫 (𝐅𝐢𝐥𝐞/𝐒3 𝐜𝐡𝐚𝐧𝐠𝐞𝐬)&lt;br&gt;
▪ Continuously listens for file changes (local folder or S3 bucket).&lt;br&gt;
▪ Detects when a document is new, updated, or deleted.&lt;br&gt;
2️⃣ 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠 (𝐨𝐧𝐥𝐲 𝐮𝐩𝐝𝐚𝐭𝐞𝐬)&lt;br&gt;
▪ Instead of re-embedding everything, it re-embeds only the changed chunks.&lt;br&gt;
▪ Saves time and compute costs while keeping the knowledge base fresh.&lt;br&gt;
3️⃣ 𝐕𝐞𝐜𝐭𝐨𝐫 𝐃𝐁 (𝐂𝐡𝐫𝐨𝐦𝐚)&lt;br&gt;
▪ Stores embeddings with metadata like updated_at.&lt;br&gt;
▪ When conflicts arise (e.g., same document with old + new facts), retrieval logic can guide the LLM to trust the freshest snippet.&lt;br&gt;
4️⃣ 𝐋𝐋𝐌 (𝐎𝐥𝐥𝐚𝐦𝐚/𝐎𝐩𝐞𝐧𝐀𝐈)&lt;br&gt;
▪ Takes the top-k retrieved chunks and augments the query.&lt;br&gt;
▪ Produces a contextualized answer with citations.&lt;br&gt;
5️⃣ 𝐒𝐭𝐫𝐞𝐚𝐦𝐥𝐢𝐭 𝐔𝐈&lt;br&gt;
▪ Users simply ask questions.&lt;br&gt;
▪ The UI calls the FastAPI backend, retrieves from Chroma, and passes to the LLM.&lt;br&gt;
▪ Responses include answers + sources, so users know why the model said what it did.&lt;/p&gt;
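
&lt;p&gt;The watcher’s “only re-embed what changed” step boils down to comparing content hashes against what’s already indexed. A minimal sketch (function and field names are my own, not from the repo):&lt;/p&gt;

```python
import hashlib

def detect_changes(current_files, index):
    """current_files: {path: text}; index: {path: sha256 of last-embedded text}.
    Returns (changed_paths, deleted_paths) so only deltas get re-embedded."""
    changed, deleted = [], []
    for path, text in current_files.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if index.get(path) != digest:
            changed.append(path)
    for path in index:
        if path not in current_files:
            deleted.append(path)
    return changed, deleted
```

&lt;p&gt;Unchanged documents hash to the same digest and are skipped entirely, which is where the compute savings come from.&lt;/p&gt;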

&lt;p&gt;🚧 The Challenge (Simple Example)&lt;br&gt;
One file said:&lt;br&gt;
 ➡️ “All banks must maintain capital reserves of 10%.”&lt;br&gt;
Later, an update stated:&lt;br&gt;
 ➡️ “All banks must maintain capital reserves of 12%.”&lt;br&gt;
When I asked: “What is the required capital reserve?”&lt;/p&gt;

&lt;p&gt;Static RAG: “I don’t know.” (confused by conflicting facts)&lt;br&gt;
 Dynamic RAG: “12%” (trusts the most recent doc)&lt;/p&gt;

&lt;p&gt;𝐓𝐡𝐞 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧 — 𝐃𝐲𝐧𝐚𝐦𝐢𝐜 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠𝐬&lt;br&gt;
🔄 Watches for new/updated docs in real time&lt;br&gt;
 ⚡ Re-embeds only what changes (no full rebuilds)&lt;br&gt;
 🏷️ Tracks updated_at so the LLM knows the freshest fact&lt;br&gt;
 🧠 Guides the model to resolve conflicts by trusting the most recent snippet&lt;br&gt;
Now, when a file is updated, the system re-embeds instantly and gives the right answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cphps0ryvio70lb6nd8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cphps0ryvio70lb6nd8.png" alt="High Level Architecture" width="800" height="115"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the full working codebase, check my GitHub repo&lt;br&gt;
&lt;a href="https://github.com/rajeevchandra/dynamic_embeddings" rel="noopener noreferrer"&gt;https://github.com/rajeevchandra/dynamic_embeddings&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the end of the day, AI systems are only as useful as the freshness of the knowledge they rely on. Building dynamic pipelines isn’t just about better tech — it’s about building assistants that can actually keep up with how fast the world changes.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>vectordatabase</category>
      <category>langchain</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How 𝐋𝐚𝐧𝐠𝐒𝐦𝐢𝐭𝐡, 𝐋𝐚𝐧𝐠𝐆𝐫𝐚𝐩𝐡, 𝐎𝐥𝐥𝐚𝐦𝐚 &amp; 𝐅𝐀𝐈𝐒𝐒 Gave Me End-to-End Observability in a Local AI Chatbot</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Fri, 16 May 2025 03:18:20 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/how-gave-me-end-to-end-observability-in-a-local-ai-chatbot-4ch8</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/how-gave-me-end-to-end-observability-in-a-local-ai-chatbot-4ch8</guid>
      <description>&lt;p&gt;I just built a self-contained AI chatbot—no cloud dependencies, no API keys, just pure local power with 𝐋𝐚𝐧𝐠𝐆𝐫𝐚𝐩𝐡, 𝐎𝐥𝐥𝐚𝐦𝐚, 𝐅𝐀𝐈𝐒𝐒, 𝐚𝐧𝐝 𝐋𝐚𝐧𝐠𝐒𝐦𝐢𝐭𝐡.&lt;/p&gt;

&lt;p&gt;Tech Stack:&lt;/p&gt;

&lt;p&gt;🤖 Build a fully local AI chatbot that answers questions from uploaded documents, with zero cloud dependencies.&lt;br&gt;
🔧 LangGraph: Orchestrate the chatbot logic using modular, state-based workflows (e.g., retrieve → generate → feedback).&lt;br&gt;
📚 Ollama + FAISS + TF-IDF: Run a local llama3 model for response generation, and use TF-IDF + FAISS for fast, document-based context retrieval.&lt;br&gt;
🖥 Streamlit: Provide an interactive web interface where users can upload files and chat with the bot in real time.&lt;br&gt;
📊 LangSmith: Enable full observability — trace queries, inspect prompts, monitor latency, and analyze errors or retrieval issues end-to-end.&lt;/p&gt;

&lt;p&gt;At first, it answered questions well enough. &lt;br&gt;
But the real game-changer? &lt;br&gt;
I could trace every step of its reasoning.&lt;/p&gt;

&lt;p&gt;𝐋𝐚𝐧𝐠𝐒𝐦𝐢𝐭𝐡 gave me the transparency I never knew I needed, revealing the exact document chunks retrieved, the prompts fed to the model, execution times, and even where things went off track.&lt;/p&gt;

&lt;p&gt;🚧 The Problem: “It Works” Isn’t Enough&lt;/p&gt;

&lt;p&gt;At first, my chatbot seemed to be doing well — it returned reasonable answers to most questions. But then… weird things started to happen:&lt;br&gt;
● Was the wrong chunk retrieved?&lt;br&gt;
● Was the prompt malformed?&lt;br&gt;
● Did the model hallucinate?&lt;/p&gt;

&lt;p&gt;Without insight into what was happening step by step, debugging was pure guesswork.&lt;/p&gt;

&lt;p&gt;𝐇𝐨𝐰 𝐋𝐚𝐧𝐠𝐒𝐦𝐢𝐭𝐡 𝐇𝐞𝐥𝐩𝐞𝐝 𝐌𝐞 𝐃𝐞𝐛𝐮𝐠 𝐚𝐧𝐝 𝐈𝐦𝐩𝐫𝐨𝐯𝐞&lt;/p&gt;

&lt;p&gt;⇒ Tracing&lt;br&gt;
● View each query, chunk retrieval, and LLM response in real-time&lt;br&gt;
● Confirm the right part of the document was being used&lt;br&gt;
● Inspect the exact prompt given to llama3 via Ollama&lt;/p&gt;
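
&lt;p&gt;Getting those traces flowing takes almost no code — LangSmith picks up a few environment variables (the project name below is illustrative and the key is a placeholder):&lt;/p&gt;

```python
import os

# Enable LangSmith tracing for a LangChain/LangGraph app.
# Set these before any chains/graphs are constructed.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "local-rag-chatbot"   # illustrative name
os.environ["LANGCHAIN_API_KEY"] = "YOUR-LANGSMITH-KEY"  # placeholder
# From here on, every invocation is traced to the named project.
```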

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fakaxqema1mwvhaeuvt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fakaxqema1mwvhaeuvt.png" alt="Tracing" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⇒ Error Analysis&lt;br&gt;
● Trace misfires back to irrelevant or empty document chunks&lt;br&gt;
● Compare expected vs. actual outputs&lt;br&gt;
● Catch malformed inputs or slow model responses&lt;/p&gt;

&lt;p&gt;⇒ Performance Metrics&lt;br&gt;
● Track latency for each step (retriever, LLM)&lt;br&gt;
● Identify slowdowns during Ollama inference&lt;br&gt;
● Start tagging “slow” or “retrieval_miss” runs for dashboards&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qn1a27hcnouub8aqqry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qn1a27hcnouub8aqqry.png" alt="Metrics" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📊 Scaling Visibility with LangSmith Dashboards&lt;/p&gt;

&lt;p&gt;LangSmith doesn’t just log traces — it helps you monitor trends over time.&lt;br&gt;
Using their dashboard tools, I now track:&lt;br&gt;
🧠 Number of LLM calls&lt;br&gt;
🕒 Average latency per query&lt;br&gt;
📉 Retrieval failures&lt;br&gt;
💸 Token usage (if using APIs like OpenAI or Anthropic)&lt;br&gt;
❌ Error Rates: Identify failed runs, exceptions, or empty prompts&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci4x1s5cmfgde3vkij85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci4x1s5cmfgde3vkij85.png" alt="Dashboard" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’ve published the full working project on GitHub — complete with TF-IDF + FAISS retrieval, Ollama model integration, LangSmith observability, and a Streamlit interface.&lt;br&gt;
&lt;a href="https://lnkd.in/etpCMPiS" rel="noopener noreferrer"&gt;https://lnkd.in/etpCMPiS&lt;/a&gt;&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>faiss</category>
      <category>langsmith</category>
      <category>streamlit</category>
    </item>
    <item>
      <title>Talk to Your Kubernetes Cluster Using AI</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Tue, 13 May 2025 21:47:14 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/talk-to-your-kubernetes-cluster-using-ai-2al5</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/talk-to-your-kubernetes-cluster-using-ai-2al5</guid>
      <description>&lt;p&gt;Today, I explored kubectl-ai, a powerful CLI from Google Cloud that lets you interact with your Kubernetes cluster using natural language, powered by local LLMs like Mistral (via Ollama) or cloud models like Gemini.&lt;/p&gt;

&lt;p&gt;Imagine saying things like:&lt;/p&gt;

&lt;p&gt;“List all pods in default namespace”&lt;br&gt;
“generate a deployment with 3 nginx replicas”&lt;br&gt;
“debug a pod stuck in CrashLoopBackOff”&lt;br&gt;
“Generate a YAML for a CronJob that runs every 5 minutes”&lt;/p&gt;

&lt;p&gt;And your terminal does the work — no YAML guessing, no docs tab-hopping.&lt;/p&gt;

&lt;p&gt;How does Kubectl-ai work? &lt;/p&gt;

&lt;p&gt;1) You type a natural language prompt like “List all pods in kube-system”.&lt;br&gt;
 2) The kubectl-ai CLI sends your prompt to a connected LLM (like Ollama or Gemini).&lt;br&gt;
3) The LLM interprets the request and returns either a plain explanation, a suggested kubectl command, or a tool-call instruction.&lt;br&gt;
4) kubectl-ai processes the response:&lt;br&gt;
 If --dry-run is enabled, it just prints the command.&lt;br&gt;
 If --enable-tool-use-shim is used, it extracts and runs the command.&lt;/p&gt;

&lt;p&gt;5) The actual kubectl command is executed on your active Kubernetes cluster.&lt;br&gt;
6) The cluster returns the result (like pod lists or deployment status).&lt;br&gt;
7) The output is shown in your terminal — just like you ran the command manually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvalq8nj57xpb25lvp4n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvalq8nj57xpb25lvp4n.png" alt="how the tool works" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;kubectl-ai supports the Model Context Protocol (MCP), the emerging open protocol for AI tool interoperability.&lt;/p&gt;

&lt;p&gt;💡 This means you can:&lt;br&gt;
1) Build structured, agentic workflows&lt;br&gt;
2) Pipe Kubernetes operations into broader AI systems&lt;br&gt;
3) Connect kubectl-ai to MCP clients (e.g., Claude, Amazon Q)&lt;/p&gt;

&lt;p&gt;If you’re already building with MCP, this is a killer entry point into AI-assisted DevOps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrog4ukeyt2mfpgp6jq6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrog4ukeyt2mfpgp6jq6.png" alt="creating a pod" width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try It Yourself&lt;br&gt;
🔗 GitHub: &lt;a href="https://lnkd.in/e57aFwC6" rel="noopener noreferrer"&gt;https://lnkd.in/e57aFwC6&lt;/a&gt;&lt;br&gt;
Want to control Kubernetes with natural language?&lt;br&gt;
 This is the cleanest, most extensible way to do it — and it works entirely in your terminal.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>kubernetes</category>
      <category>mcp</category>
      <category>kubectl</category>
    </item>
    <item>
      <title>𝐀 𝐅𝐮𝐥𝐥𝐲 𝐋𝐨𝐜𝐚𝐥 𝐀𝐈 𝐂𝐡𝐚𝐭𝐛𝐨𝐭 𝐔𝐬𝐢𝐧𝐠 𝐎𝐥𝐥𝐚𝐦𝐚, 𝐋𝐚𝐧𝐠𝐂𝐡𝐚𝐢𝐧 &amp; 𝐂𝐡𝐫𝐨𝐦𝐚𝐃𝐁</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Tue, 13 May 2025 03:12:04 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/--2koi</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/--2koi</guid>
      <description>&lt;p&gt;🚀 Today, I got hands-on with a Retrieval-Augmented Generation (RAG) setup that runs entirely offline. I built a private AI assistant that can answer questions from Markdown and PDF documentation — no cloud, no API keys.&lt;/p&gt;

&lt;p&gt;🧱 Ollama for local LLM &amp;amp; embedding&lt;br&gt;
🔍 LangChain for RAG orchestration + memory&lt;br&gt;
📦 ChromaDB for vector storage&lt;br&gt;
💬 Streamlit for the chatbot UI&lt;/p&gt;

&lt;p&gt;Key features:&lt;br&gt;
 ● Upload .md or .pdf Files&lt;br&gt;
 ● Auto-re-index and embed with nomic-embed-text&lt;br&gt;
 ● Ask natural questions to mistral (or other local LLMs)&lt;br&gt;
 ● Multi-turn chat with memory&lt;br&gt;
 ● Source highlighting for every answer&lt;/p&gt;

&lt;p&gt;🧠 How This Local RAG Chatbot Works (Summary)&lt;/p&gt;

&lt;p&gt;1) Upload Your Docs&lt;br&gt;
 Drag and drop .md and .pdf files into the Streamlit app. The system supports both structured and unstructured formats — no manual formatting needed.&lt;/p&gt;

&lt;p&gt;2) Chunking + Embedding&lt;br&gt;
 Each document is split into small, context-aware text chunks and embedded locally using the nomic-embed-text model via Ollama.&lt;/p&gt;

&lt;p&gt;3) Store in Chroma Vector DB&lt;br&gt;
 The resulting embeddings are stored in ChromaDB, enabling fast and accurate similarity search when queries are made.&lt;/p&gt;

&lt;p&gt;4) Ask Natural Questions&lt;br&gt;
 You type a question like “What are DevOps best practices?”, and the app retrieves the most relevant chunks using semantic search.&lt;/p&gt;

&lt;p&gt;5) Answer with LLM + Memory&lt;br&gt;
 Retrieved context is passed to mistral (or any Ollama-compatible LLM). LangChain manages session memory for multi-turn Q&amp;amp;A.&lt;/p&gt;

&lt;p&gt;6) Sources Included&lt;br&gt;
 Each answer shows where it came from — including the filename and content snippet — so you can trust and trace every response.&lt;/p&gt;
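
&lt;p&gt;The chunking in step 2 is conceptually simple: slide a window over the text with some overlap so each chunk keeps surrounding context. A stdlib-only sketch (the sizes are illustrative defaults, not the app’s actual settings):&lt;/p&gt;

```python
def chunk_text(text, size=500, overlap=100):
    """Split a document into overlapping character windows before embedding."""
    chunks, start = [], 0
    step = max(size - overlap, 1)  # advance less than `size` to create overlap
    while len(text) > start:
        chunks.append(text[start:start + size])
        start += step
    return chunks
```

&lt;p&gt;In the real app this happens per uploaded file, and each chunk is embedded with nomic-embed-text and written to ChromaDB along with its source metadata.&lt;/p&gt;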

&lt;p&gt;Display answer + source documents in Streamlit&lt;/p&gt;

&lt;p&gt;💬 Example Prompts&lt;/p&gt;

&lt;p&gt;"What is a microservice?"&lt;br&gt;
"How does Kubernetes manage pod lifecycle?"&lt;br&gt;
"Give me an example Docker Compose file."&lt;br&gt;
"What are DevOps best practices?"&lt;/p&gt;

&lt;p&gt;Honestly, this was one of those projects that reminded me how far local AI tools have come. No cloud APIs, no fancy GPU rig — just a regular laptop, and I was able to build a fully working RAG chatbot that reads my docs and gives solid, contextual answers.&lt;/p&gt;

&lt;p&gt;If you’ve ever wanted to interact with your own knowledge base — internal docs, PDFs, notes — in a more natural way, this setup is 100% worth trying. It's private, surprisingly fast, and honestly, kind of fun to put together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64obr2ydwzk85h0ewh4g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64obr2ydwzk85h0ewh4g.png" alt="how this works" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra1y3kwvbjk3bnp2et72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra1y3kwvbjk3bnp2et72.png" alt="how it runs in streamlit" width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo28tf3y4xn8vuri7xrla.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo28tf3y4xn8vuri7xrla.png" alt="how it runs in streamlit" width="800" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>streamlit</category>
      <category>chromadb</category>
    </item>
    <item>
      <title>🏦 Automating Loan Underwriting with Agentic AI: LangGraph, MCP &amp; Amazon SageMaker in Action</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Fri, 09 May 2025 21:17:34 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/automating-loan-underwriting-with-agentic-ai-langgraph-mcp-amazon-sagemaker-in-action-3ah5</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/automating-loan-underwriting-with-agentic-ai-langgraph-mcp-amazon-sagemaker-in-action-3ah5</guid>
      <description>&lt;p&gt;To demonstrate the power of Model Context Protocol (MCP) in real-world enterprise AI, I recently ran a loan underwriting pipeline that combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP for tool-style interaction between LLMs and services &lt;/li&gt;
&lt;li&gt;LangGraph to orchestrate multi-step workflows &lt;/li&gt;
&lt;li&gt;Amazon SageMaker to securely host the LLM &lt;/li&gt;
&lt;li&gt;FastAPI to serve agents with modular endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What Is LangGraph?&lt;/p&gt;

&lt;p&gt;LangGraph is a framework for orchestrating multi-step, stateful workflows across LLM-powered agents.&lt;/p&gt;

&lt;p&gt;🔄 Graph-based execution engine: It lets you define agent workflows as nodes in a graph, enabling branching, retries, and memory — perfect for multi-agent AI systems.&lt;/p&gt;

&lt;p&gt;🔗 Seamless tool and state handling: It maintains structured state across steps, making it easy to pass outputs between agents like Loan Officer → Credit Analyst → Risk Manager.&lt;/p&gt;

&lt;p&gt;Each agent doesn’t run in isolation — LangGraph stitches them together, letting you:&lt;/p&gt;

&lt;p&gt;● Define multi-agent workflows&lt;br&gt;
● Handle flow control, retries, state transitions&lt;br&gt;
● Pass structured data from one agent to the next&lt;/p&gt;

&lt;p&gt;Here’s how it works — and why it’s a powerful architectural pattern for decision automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧾 The Use Case: AI-Driven Loan Underwriting
&lt;/h2&gt;

&lt;p&gt;Loan underwriting typically involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reviewing applicant details&lt;/li&gt;
&lt;li&gt;Evaluating creditworthiness&lt;/li&gt;
&lt;li&gt;Making a final approval or denial decision&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this architecture, each role is performed by a &lt;strong&gt;dedicated AI agent&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loan Officer – Summarizes application details&lt;/li&gt;
&lt;li&gt;Credit Analyst – Assesses financial risk&lt;/li&gt;
&lt;li&gt;Risk Manager – Makes the final decision&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🧱 Architecture Overview
&lt;/h2&gt;

&lt;p&gt;This workflow is powered by a centralized LLM hosted on Amazon SageMaker, with each agent deployed as an &lt;strong&gt;MCP server&lt;/strong&gt; on EC2 and orchestrated via LangGraph:&lt;/p&gt;

&lt;p&gt;Workflow Steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User submits loan details (e.g., name, income, credit score)&lt;/li&gt;
&lt;li&gt;MCP client routes the request to the Loan Officer MCP server&lt;/li&gt;
&lt;li&gt;Output is forwarded to the Credit Analyst MCP server&lt;/li&gt;
&lt;li&gt;Result is passed to the Risk Manager MCP server&lt;/li&gt;
&lt;li&gt;A final prompt is generated, processed by the LLM on SageMaker, and sent back to the user&lt;/li&gt;
&lt;/ol&gt;
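&lt;p&gt;Stripped of LangGraph specifics, the pattern behind these steps is a shared state dictionary flowing through agent nodes in order. A minimal sketch with hypothetical agent stubs (the real agents are MCP servers backed by the SageMaker-hosted LLM):&lt;/p&gt;

```python
# Each "agent" is a node: it reads the shared state and returns an update.
# These stubs stand in for the MCP servers calling the SageMaker-hosted LLM.
def loan_officer(state):
    return {"summary": f"{state['name']}: income {state['income']}, score {state['credit_score']}"}

def credit_analyst(state):
    return {"risk": "low" if state["credit_score"] >= 700 else "high"}

def risk_manager(state):
    return {"decision": "approved" if state["risk"] == "low" else "denied"}

def run_pipeline(application):
    state = dict(application)
    # LangGraph expresses this as graph edges with retries and branching;
    # here the flow is a fixed Loan Officer -> Credit Analyst -> Risk Manager chain.
    for node in (loan_officer, credit_analyst, risk_manager):
        state.update(node(state))
    return state

result = run_pipeline({"name": "Jane", "income": 85000, "credit_score": 720})
print(result["decision"])  # approved
```

LangGraph’s value over this plain chain is the structured state handling, retries, and branching once the workflow stops being strictly linear.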

&lt;p&gt;Image Credit: AWS&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fer0z9g1omojinc9prpre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fer0z9g1omojinc9prpre.png" alt="AWS" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used the following model for this run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: &lt;code&gt;Qwen/Qwen2.5-1.5B-Instruct&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;Source: Hugging Face &lt;/li&gt;
&lt;li&gt;Hosted on: Amazon SageMaker (Hugging Face LLM Inference Container)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjr39so2fdi1367nal1a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjr39so2fdi1367nal1a.png" alt="execution flow" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image credit: AWS&lt;/p&gt;

&lt;p&gt;🔗 Want to Try It?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;👉 &lt;a href="https://lnkd.in/ebueztnv" rel="noopener noreferrer"&gt;Official AWS Blog&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>sagemaker</category>
      <category>langgraph</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Kubernetes 1.32: Real-World Use Cases &amp; Examples</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Fri, 02 May 2025 13:45:29 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/kubernetes-132-real-world-use-cases-examples-4227</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/kubernetes-132-real-world-use-cases-examples-4227</guid>
      <description>&lt;p&gt;Kubernetes 1.32: Real-World Use Cases &amp;amp; Examples&lt;/p&gt;

&lt;p&gt;The Kubernetes 1.32 release, codenamed &lt;strong&gt;"Penelope"&lt;/strong&gt;, introduces thoughtful features aimed at making workloads more efficient, observable, and manageable.&lt;/p&gt;

&lt;p&gt;In this post, I’ve compiled practical examples for each major feature, making it easier to see how they fit into your everyday Kubernetes workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 1. Dynamic Resource Allocation (DRA) Enhancements
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;A financial services company needs to train ML models that require GPUs with at least 16GB of memory. Instead of hardcoding node selection, &lt;strong&gt;DRA dynamically allocates GPU resources at runtime.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What it does:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Uses a &lt;code&gt;ResourceClaimTemplate&lt;/code&gt; to define GPU access.&lt;/li&gt;
&lt;li&gt;Pods request GPUs without being tied to specific nodes.&lt;/li&gt;
&lt;li&gt;Runs a container that uses an NVIDIA GPU to train a model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why it matters:
&lt;/h3&gt;

&lt;p&gt;Template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  metadata:
    labels:
      resource: nvidia-gpu
  spec:
    resourceClassName: nvidia.com/gpu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  resourceClaims:
    - name: gpu
      source:
        resourceClaimTemplateName: gpu-claim-template
  containers:
    - name: ml-trainer
      image: your-ml-image
      command: ["python", "train.py"]
      resources:
        limits:
          nvidia.com/gpu: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;✅ Dynamically provisions GPU at runtime
&lt;/li&gt;
&lt;li&gt;✅ Avoids node pre-binding
&lt;/li&gt;
&lt;li&gt;✅ Ideal for ML training, AI workloads, and GPU-heavy applications&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧹 2. Auto-Removal of PVCs in StatefulSets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;Your team deploys short-lived stateful workloads (like test environments). Without cleanup, leftover PVCs accumulate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: data-processor
spec:
  serviceName: "data-service"
  replicas: 3
  selector:
    matchLabels:
      app: data-processor
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete
    whenScaled: Delete
  template:
    metadata:
      labels:
        app: data-processor
    spec:
      containers:
        - name: processor
          image: your-data-processor-image
          volumeMounts:
            - name: data-storage
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why it’s useful:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Automatically deletes PVCs when a pod is removed or scaled down
&lt;/li&gt;
&lt;li&gt;✅ Prevents orphaned volumes
&lt;/li&gt;
&lt;li&gt;✅ Great for ephemeral data processing jobs and simulations&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🪟 3. Graceful Shutdown for Windows Nodes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;You run Windows-based apps in your cluster. During node shutdown, you need those apps to clean up gracefully instead of abruptly terminating.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s new:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes 1.32 adds &lt;strong&gt;graceful shutdown&lt;/strong&gt; support for Windows pods.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: windows-app
spec:
  nodeSelector:
    kubernetes.io/os: windows
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      image: your-windows-app-image
      command: ["powershell", "-Command", "Start-Sleep -Seconds 300"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why it’s helpful:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Preserves app data integrity
&lt;/li&gt;
&lt;li&gt;✅ Simple to test with &lt;code&gt;kubectl delete pod&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ Essential for apps with shutdown routines&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💾 4. Change Block Tracking (CBT) – Alpha
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;You maintain large databases or file systems. Full-volume snapshots are too slow and consume unnecessary storage.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cbt-pvc
  annotations:
    snapshot.storage.kubernetes.io/change-block-tracking: "true"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: csi-cbt-enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Add a special annotation to your PVC to enable CBT.&lt;/li&gt;
&lt;li&gt;Ensure your CSI driver supports CBT (e.g., &lt;code&gt;csi-cbt-enabled&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Snapshots capture only changed blocks, not the whole volume.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why it matters:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Faster incremental backups
&lt;/li&gt;
&lt;li&gt;✅ Reduced snapshot size
&lt;/li&gt;
&lt;li&gt;✅ Improves disaster recovery speed&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚙️ 5. Pod-Level Resource Limits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;You’re running multiple containers inside a single pod (e.g., app + sidecar in a CI pipeline). Individual container limits are too rigid.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s new:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: resource-shared-pod
spec:
  containers:
    - name: container-a
      image: your-app-image
    - name: container-b
      image: your-app-image
  resources:
    limits:
      cpu: "2"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "2Gi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Set &lt;strong&gt;resource limits at the pod level&lt;/strong&gt;, not just per container.&lt;/li&gt;
&lt;li&gt;Containers can &lt;strong&gt;share total CPU/memory quotas&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why it’s great:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ More efficient resource sharing
&lt;/li&gt;
&lt;li&gt;✅ Great for CI/CD, proxies, and log sidecars
&lt;/li&gt;
&lt;li&gt;✅ Reduces over-provisioning and increases node density&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔍 6. Enhanced Observability with &lt;code&gt;/statusz&lt;/code&gt; and &lt;code&gt;/flagz&lt;/code&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;DevOps and SRE teams can now monitor component health and configuration more efficiently. These endpoints make it easier to audit settings, detect misconfigurations, and ensure runtime consistency during upgrades or debugging.&lt;/p&gt;

&lt;p&gt;🔍 /statusz&lt;br&gt;
Reports the health status of the component.&lt;br&gt;
Example output: ok if the component is functioning properly.&lt;/p&gt;

&lt;p&gt;⚙️ /flagz&lt;br&gt;
Lists runtime flags and configuration values for the component.&lt;br&gt;
Helps verify the active settings on running nodes or control-plane components.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enable the &lt;code&gt;ComponentStatusz&lt;/code&gt; and &lt;code&gt;ComponentFlagz&lt;/code&gt; feature gates.&lt;/li&gt;
&lt;li&gt;Query the built-in endpoints on each component’s HTTPS serving port, e.g. &lt;code&gt;kubectl get --raw /statusz&lt;/code&gt; against the API server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Final Thoughts&lt;/p&gt;

&lt;p&gt;Kubernetes 1.32 isn’t just a list of features—it’s a set of solutions to common challenges faced by teams managing complex workloads.&lt;br&gt;
Whether you’re focused on AI/ML efficiency, storage hygiene, Windows reliability, or control-plane observability, this release has something valuable for you.&lt;/p&gt;

&lt;p&gt;👉 I’ve created a GitHub repo with all YAML examples for these use cases:&lt;br&gt;
 🔗 &lt;a href="https://lnkd.in/emkKCxuY" rel="noopener noreferrer"&gt;https://lnkd.in/emkKCxuY&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me know which feature you're most excited to try—or if you’re already using it in production!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>k8</category>
      <category>sre</category>
    </item>
    <item>
      <title>Building Smarter Local AI Agents with MCP: A Simple Client-Server Example</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Tue, 29 Apr 2025 13:35:47 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/building-smarter-local-ai-agents-with-mcp-a-simple-client-server-example-4lfm</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/building-smarter-local-ai-agents-with-mcp-a-simple-client-server-example-4lfm</guid>
      <description>&lt;p&gt;In today's AI landscape, enabling a &lt;strong&gt;Local LLM&lt;/strong&gt; (like &lt;strong&gt;Llama3 via Ollama&lt;/strong&gt;) to understand user intent and &lt;strong&gt;dynamically call Python functions&lt;/strong&gt; is a critical capability.&lt;/p&gt;

&lt;p&gt;The foundation of this interaction is &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this blog, I'll show you a &lt;strong&gt;simple working example&lt;/strong&gt; of an &lt;strong&gt;MCP Client&lt;/strong&gt; and &lt;strong&gt;MCP Server&lt;/strong&gt; communicating locally using &lt;strong&gt;pure &lt;code&gt;stdio&lt;/code&gt;&lt;/strong&gt; — no networking needed!&lt;/p&gt;




&lt;h2&gt;
  
  
  🔹 How It Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ MCP Server
&lt;/h3&gt;

&lt;p&gt;The MCP Server acts as a &lt;strong&gt;toolbox&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exposes Python functions (&lt;code&gt;add&lt;/code&gt;, &lt;code&gt;multiply&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;Waits silently for tool-execution requests&lt;/li&gt;
&lt;li&gt;Executes the requested function and returns the result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calculator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stdio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Server &lt;strong&gt;registers tools&lt;/strong&gt; and &lt;strong&gt;waits&lt;/strong&gt; for client requests!&lt;/p&gt;




&lt;h3&gt;
  
  
  ✅ MCP Client
&lt;/h3&gt;

&lt;p&gt;The MCP Client is the &lt;strong&gt;messenger&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lists available tools&lt;/li&gt;
&lt;li&gt;Sends tools to the LLM&lt;/li&gt;
&lt;li&gt;Forwards tool call requests to the Server&lt;/li&gt;
&lt;li&gt;Collects and returns results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Client is like a translator and dispatcher — handling everything between the model and the tools.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ClientSession&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.client.stdio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stdio_client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.client.server_parameters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;server_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;math_server.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;stdio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;stdio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Available tools:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Result of add(5,8):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Client manages the conversation and tool execution.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔹 Communication: stdio
&lt;/h2&gt;

&lt;p&gt;Instead of HTTP or network APIs, the Client and Server communicate &lt;strong&gt;directly over &lt;code&gt;stdin&lt;/code&gt;/&lt;code&gt;stdout&lt;/code&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client ➡️ Server: Requests like &lt;code&gt;list_tools&lt;/code&gt; and &lt;code&gt;call_tool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Server ➡️ Client: Replies with tools and results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ Fast, lightweight, private communication&lt;br&gt;
✅ Perfect for local LLM setups&lt;/p&gt;




&lt;h2&gt;
  
  
  📈 Visual Flow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felrj904jg9ydlofywwdl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felrj904jg9ydlofywwdl.png" alt="Diagram of stdio protocol flow in AI model interactions showing input/output streams" width="800" height="498"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Pattern Matters
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;No API servers, no networking complexity&lt;/li&gt;
&lt;li&gt;Fast, local, secure communication&lt;/li&gt;
&lt;li&gt;Easily extendable: add new tools, no need to rebuild the architecture&lt;/li&gt;
&lt;li&gt;Foundation for building smart autonomous agents with local LLMs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can easily extend this pattern to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add more Python tools&lt;/li&gt;
&lt;li&gt;Connect Streamlit or FastAPI frontends&lt;/li&gt;
&lt;li&gt;Dockerize the full stack&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📚 Full GitHub Project
&lt;/h2&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/rajeevchandra/mcp-client-server-example" rel="noopener noreferrer"&gt;https://github.com/rajeevchandra/mcp-client-server-example&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ MCP Server &amp;amp; MCP Client Code&lt;br&gt;
✅ Local LLM setup with Ollama&lt;br&gt;
✅ Full README + Diagrams&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Final Thought
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Smarter agents don’t know everything — they know how to use the right tool at the right time."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Would love to hear your thoughts if you check it out or build something on top of it! 🚀&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>llm</category>
      <category>ai</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Introducing Postgres MCP Server: Query Your Database in Plain English with AI</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Wed, 23 Apr 2025 02:58:15 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/introducing-postgres-mcp-server-query-your-database-in-plain-english-with-ai-l0o</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/introducing-postgres-mcp-server-query-your-database-in-plain-english-with-ai-l0o</guid>
      <description>&lt;p&gt;Have you ever wished you could just ask your database a question, without writing SQL?&lt;/p&gt;

&lt;p&gt;"Show me the average salary by department."&lt;br&gt;
"List employees in New York earning over $80K."&lt;br&gt;
"Plot monthly sales trends."&lt;/p&gt;

&lt;p&gt;What if you could get these answers instantly, without writing a single SQL query?&lt;/p&gt;

&lt;p&gt;That’s exactly why I built Postgres MCP Server—an open-source AI SQL dashboard that translates natural language into safe, optimized PostgreSQL queries.&lt;/p&gt;

&lt;p&gt;✅ What Postgres MCP Server Can Do&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧠 Natural Language to SQL: Converts human questions into valid SQL queries using LLaMA 3 via Ollama.&lt;/li&gt;
&lt;li&gt;📊 Statistical Data Analysis: Computes summary stats, correlation matrices, and aggregates on your data automatically.&lt;/li&gt;
&lt;li&gt;📅 Time Series &amp;amp; Charts: Detects date fields and visualizes trends using line/bar charts.&lt;/li&gt;
&lt;li&gt;💬 Prompt-Based Filtering: Understands queries like “employees in NY earning over 80K” and applies them as SQL filters.&lt;/li&gt;
&lt;li&gt;📎 MCP-Compliant API Server: Exposes &lt;code&gt;sql://query&lt;/code&gt; and &lt;code&gt;table://list&lt;/code&gt; tools via the Model Context Protocol for LLM and agent compatibility.&lt;/li&gt;
&lt;li&gt;📦 Streamlit Dashboard: Clean, reactive UI to browse data, input prompts, see SQL, and export CSV.&lt;/li&gt;
&lt;li&gt;🔐 Safe Read-Only Queries: Executes only non-destructive SQL with validation; protects your source database.&lt;/li&gt;
&lt;li&gt;🧱 Dockerized Setup: Entire app runs locally using Docker Compose — PostgreSQL, Streamlit, MCP server, Ollama.&lt;/li&gt;
&lt;li&gt;💬 LLM Agent-Ready: Compatible with Claude, GPT, LangChain, or AutoGen frameworks via MCP schema.&lt;/li&gt;
&lt;/ul&gt;
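&lt;p&gt;The read-only guarantee can start with validating generated SQL before it ever reaches PostgreSQL. A minimal sketch of such a check (hypothetical validator, not the repo’s actual implementation):&lt;/p&gt;

```python
import re

# Reject any statement containing a destructive keyword.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b",
    re.IGNORECASE,
)

def is_safe_query(sql: str) -> bool:
    """Allow only a single SELECT statement; reject anything destructive."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # no statement stacking
        return False
    if not stripped.lower().startswith("select"):
        return False
    return not FORBIDDEN.search(stripped)

print(is_safe_query("SELECT avg(salary) FROM employees GROUP BY department"))  # True
print(is_safe_query("DROP TABLE employees"))                                   # False
print(is_safe_query("SELECT 1; DELETE FROM employees"))                        # False
```

A keyword filter like this is deliberately conservative (it would also reject a column literally named &lt;code&gt;update&lt;/code&gt;); pairing it with a read-only database role gives defense in depth.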

&lt;p&gt;Why MCP? (Model Context Protocol)&lt;br&gt;
Most AI agents rely on hardcoded APIs or brittle prompts—but MCP changes that. It’s an open protocol that lets LLMs discover and use tools dynamically.&lt;/p&gt;

&lt;p&gt;MCP enables:&lt;br&gt;
✅ Self-documenting APIs (LLMs understand what your server can do)&lt;br&gt;
✅ Agent-friendly tool discovery (no rigid integrations)&lt;br&gt;
✅ Flexible schema definitions (describe tables, queries, and operations in a model-readable way)&lt;/p&gt;

&lt;p&gt;Instead of writing custom prompts for every agent, MCP lets your LLM automatically understand how to query your database.&lt;/p&gt;

&lt;p&gt;🧪 Sample Prompts&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Show total number of employees"&lt;/li&gt;
&lt;li&gt;"List departments with avg salary &amp;gt; 80K"&lt;/li&gt;
&lt;li&gt;"Number of employees in each location"&lt;/li&gt;
&lt;li&gt;"Plot salary trends over time"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The server translates these into SQL, executes them securely, and returns results in the UI.&lt;/p&gt;
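&lt;p&gt;A minimal sketch of such a read-only guard (illustrative only; the repo's actual validation may be stricter):&lt;/p&gt;

```python
import re

# Illustrative read-only guard; the project's real validation may differ.
READ_ONLY_PREFIXES = ("select", "with", "explain")
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b", re.IGNORECASE
)

def is_safe_query(sql: str) -> bool:
    """Accept only a single, non-destructive SQL statement."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:  # reject stacked statements like "SELECT 1; DROP ..."
        return False
    if not stmt.lower().startswith(READ_ONLY_PREFIXES):
        return False
    return FORBIDDEN.search(stmt) is None

print(is_safe_query("SELECT name FROM employees WHERE salary > 80000"))  # True
print(is_safe_query("DROP TABLE employees"))                             # False
```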

&lt;p&gt;🚀 Run It Locally in 3 Steps&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/rajeevchandra/mcp-ollama-postgres  
cd mcp-ollama-postgres  
docker-compose up --build  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Streamlit UI → &lt;a href="http://localhost:8501" rel="noopener noreferrer"&gt;http://localhost:8501&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Server → &lt;a href="http://localhost:3333" rel="noopener noreferrer"&gt;http://localhost:3333&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note: Requires Ollama with &lt;code&gt;llama3&lt;/code&gt; pulled (&lt;code&gt;ollama pull llama3&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Try it out, star the repo, and let me know what you think!&lt;br&gt;
GitHub - &lt;a href="https://github.com/rajeevchandra/mcp-ollama-postgres" rel="noopener noreferrer"&gt;https://github.com/rajeevchandra/mcp-ollama-postgres&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love your feedback—what features would make this even more useful for you? 🚀&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>docker</category>
      <category>machinelearning</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building a Local AI Agent with Ollama + MCP + LangChain + Docker</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Mon, 21 Apr 2025 03:55:02 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/building-a-local-ai-agent-with-ollama-mcp-docker-37a</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/building-a-local-ai-agent-with-ollama-mcp-docker-37a</guid>
      <description>&lt;h2&gt;
  
  
  🧠 Empowering Local AI with Tools: Ollama + MCP + Docker
&lt;/h2&gt;

&lt;p&gt;Have you ever wanted to run a local AI agent that does more than just chat? What if it could list and summarize files on your machine — using just natural language?&lt;/p&gt;

&lt;p&gt;In this post, you'll build a &lt;strong&gt;fully offline AI agent&lt;/strong&gt; using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; to run local LLMs like &lt;code&gt;qwen2:7b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://smith.langchain.com/mcp" rel="noopener noreferrer"&gt;LangChain MCP&lt;/a&gt; for tool usage&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://fastapi.tiangolo.com/" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt; to build a tool server&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; for an optional frontend&lt;/li&gt;
&lt;li&gt;Docker + Docker Compose to glue it all together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why Use MCP (Model Context Protocol)?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MCP allows models (like those running in LangChain) to dynamically discover tools at runtime via a RESTful API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unlike traditional hardcoded tool integrations, MCP makes it declarative and modular.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can plug and unplug tools without changing model logic.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;“Think of MCP as a universal remote for your LLM tools.”&lt;/p&gt;
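&lt;p&gt;The plug-and-unplug idea can be sketched with a tiny tool registry (hypothetical names, not the actual LangChain MCP adapter API):&lt;/p&gt;

```python
# Minimal sketch of runtime tool discovery; the registry and names here
# are hypothetical, not the actual LangChain MCP adapter API.
import os
from typing import Callable

TOOLS: dict[str, dict] = {}

def register_tool(name: str, description: str):
    """Decorator that adds a function to the discoverable registry."""
    def wrap(fn: Callable) -> Callable:
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return wrap

@register_tool("list_files", "List text files in a directory")
def list_files(path: str) -> list[str]:
    return sorted(f for f in os.listdir(path) if f.endswith(".txt"))

# An agent enumerates TOOLS at runtime; tools can be added or removed
# without touching the model logic.
print(list(TOOLS))  # ['list_files']
```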

&lt;p&gt;🧠 Why Ollama?&lt;br&gt;
Ollama makes it dead-simple to run LLMs locally with one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama run qwen2:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get full data privacy, no usage limits, and fully offline AI.&lt;/p&gt;

&lt;p&gt;🤖 Why qwen2:7b?&lt;/p&gt;

&lt;p&gt;Qwen2 is a strong open-source model from Alibaba that excels at reasoning and tool usage.&lt;/p&gt;

&lt;p&gt;It works well for agents, summaries, and structured-thinking tasks.&lt;/p&gt;

&lt;p&gt;You could also swap in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mistral:7b (lighter weight)&lt;/li&gt;
&lt;li&gt;llama3:8b (strong general-purpose)&lt;/li&gt;
&lt;li&gt;phi3 (fast and capable in low-RAM setups)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚀 What You'll Build
&lt;/h2&gt;

&lt;p&gt;A tool-using AI agent that:&lt;/p&gt;

&lt;p&gt;✅ Lists text files in a local folder&lt;br&gt;
✅ Summarizes any selected file&lt;br&gt;
✅ Runs 100% locally — no API keys, no cloud&lt;/p&gt;

&lt;p&gt;This is perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Private AI assistants&lt;/li&gt;
&lt;li&gt;Offline development&lt;/li&gt;
&lt;li&gt;Custom workflows with local data&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  🧩 Architecture
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[🧠 Ollama (LLM)]
        ↓
[🔗 LangChain Agent w/ MCP Tool Access]
        ↓
[🛠️ FastAPI MCP Server]
        ↓
[📁 Local Filesystem]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;🛠️ 1) Prerequisites&lt;/p&gt;

&lt;p&gt;Install Ollama and run a model like qwen2:7b:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama run qwen2:7b

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📦 2) Clone and Run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/rajeevchandra/mcp-ollama-file-agent
cd mcp-ollama-file-agent

docker-compose up --build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ This starts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mcp-server: FastAPI tool server &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9bwhn88o8by5qxln1hz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9bwhn88o8by5qxln1hz.png" alt="Screenshot" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;streamlit: UI &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6ikes3v63xaurvmb63t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6ikes3v63xaurvmb63t.png" alt="Screenshot" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agent-runner: LangChain agent using Ollama + MCP tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔄 Example Interaction&lt;br&gt;
Prompt:&lt;/p&gt;

&lt;p&gt;“List files in ./docs and summarize the first one”&lt;/p&gt;

&lt;p&gt;Flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent selects the list_files tool&lt;/li&gt;
&lt;li&gt;Gets the list of files&lt;/li&gt;
&lt;li&gt;Picks the first file and calls read_and_summarize&lt;/li&gt;
&lt;li&gt;Uses the model to generate a summary&lt;/li&gt;
&lt;/ul&gt;
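&lt;p&gt;The same flow can be mimicked with stub tools (a minimal sketch; in the real project the LangChain agent and the Ollama model handle tool selection and summarization):&lt;/p&gt;

```python
# Stubbed end-to-end flow; the real agent uses LangChain with Ollama for
# both tool selection and summarization, so the logic here is illustrative.
import os
import tempfile

def list_files(path: str) -> list[str]:
    return sorted(f for f in os.listdir(path) if f.endswith(".txt"))

def read_and_summarize(path: str) -> str:
    with open(path) as fh:
        text = fh.read()
    # Stand-in for an LLM summary: truncate instead of summarizing.
    return text if len(text) <= 60 else text[:60] + "..."

def run_agent(prompt: str, docs_dir: str) -> str:
    # 1) The agent selects list_files for a "list ... summarize" request.
    files = list_files(docs_dir)
    # 2) It takes the first file and calls read_and_summarize on it.
    return read_and_summarize(os.path.join(docs_dir, files[0]))

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "notes.txt"), "w") as fh:
        fh.write("MCP lets local agents use tools safely.")
    print(run_agent("List files in ./docs and summarize the first one", d))
```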

&lt;h2&gt;
  
  
  🧠 Conclusion
&lt;/h2&gt;

&lt;p&gt;With just a few tools — &lt;strong&gt;Ollama&lt;/strong&gt;, &lt;strong&gt;MCP&lt;/strong&gt;, and &lt;strong&gt;LangChain&lt;/strong&gt; — you’ve built a local AI agent that goes beyond chatting: it actually &lt;strong&gt;uses tools&lt;/strong&gt;, interacts with your &lt;strong&gt;filesystem&lt;/strong&gt;, and provides real utility — &lt;strong&gt;all offline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This project demonstrates how easy it is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Combine &lt;strong&gt;LLM reasoning&lt;/strong&gt; with &lt;strong&gt;real-world actions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Keep your data &lt;strong&gt;private and local&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Extend AI capabilities with &lt;strong&gt;modular tools&lt;/strong&gt; via MCP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As the AI landscape shifts toward more customizable, self-hosted, and privacy-first solutions, this architecture offers a &lt;strong&gt;powerful and flexible blueprint&lt;/strong&gt; for future agents — whether you're automating internal workflows, building developer assistants, or experimenting with multi-agent systems.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;💬 &lt;strong&gt;If you found this useful or inspiring&lt;/strong&gt;, ⭐️ star the repo, fork it with your own tools, or share what you build in the comments. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/rajeevchandra/mcp-ollama-file-agent" rel="noopener noreferrer"&gt;GitHub Repo →&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
