<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: RajeevaChandra</title>
    <description>The latest articles on Forem by RajeevaChandra (@rajeev_3ce9f280cbae73b234).</description>
    <link>https://forem.com/rajeev_3ce9f280cbae73b234</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2827287%2F06012321-dd2b-4192-b498-c6209a71f876.jpg</url>
      <title>Forem: RajeevaChandra</title>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rajeev_3ce9f280cbae73b234"/>
    <language>en</language>
    <item>
      <title>🚀 𝐈𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐢𝐧𝐠 𝐎𝐩𝐞𝐧𝐀𝐈’𝐬 𝐂𝐡𝐚𝐭𝐊𝐢𝐭 𝐰𝐢𝐭𝐡 𝐅𝐚𝐬𝐭𝐀𝐏𝐈: 𝐀 𝐏𝐫𝐚𝐜𝐭𝐢𝐜𝐚𝐥 𝐆𝐮𝐢𝐝𝐞 𝐭𝐨 𝐁𝐮𝐢𝐥𝐝𝐢𝐧𝐠 𝐌𝐨𝐝𝐞𝐫𝐧 𝐂𝐡𝐚𝐭 𝐀𝐠𝐞𝐧𝐭𝐬</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Wed, 08 Oct 2025 02:21:40 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/--3hhn</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/--3hhn</guid>
      <description>&lt;p&gt;OpenAI unveiled a major update during Dev Day yesterday (Oct 6), introducing a new suite of tools to make building and deploying AI agents much easier.&lt;/p&gt;

&lt;p&gt;✨ What’s New:&lt;/p&gt;

&lt;p&gt;🧠 The launch includes AgentKit, which gives developers and business users the ability to build, deploy, and optimize agentic AI systems, and ChatKit, a framework for creating rich chat experiences without reinventing the UI layer.&lt;/p&gt;

&lt;p&gt;💬 ChatKit lets you embed a production-ready chat interface into your app or website with support for file uploads, tool invocation, and chain-of-thought visualization — all within minutes. &lt;/p&gt;

&lt;p&gt;Together, AgentKit and ChatKit bridge the gap between agent logic and user interaction, making it simpler to bring real AI agents into production products.&lt;/p&gt;

&lt;p&gt;💡 𝐓𝐰𝐨 𝐖𝐚𝐲𝐬 𝐭𝐨 𝐔𝐬𝐞 𝐂𝐡𝐚𝐭𝐊𝐢𝐭:&lt;/p&gt;

&lt;p&gt;According to OpenAI’s documentation, ChatKit can be integrated in two ways:&lt;br&gt;
1️⃣ 𝐑𝐞𝐜𝐨𝐦𝐦𝐞𝐧𝐝𝐞𝐝 𝐈𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐢𝐨𝐧:&lt;br&gt;
 Let OpenAI host and scale everything — you embed the ChatKit widget in your frontend and connect it to an OpenAI Agent Builder backend.&lt;br&gt;
2️⃣ 𝐀𝐝𝐯𝐚𝐧𝐜𝐞𝐝 𝐈𝐧𝐭𝐞𝐠𝐫𝐚𝐭𝐢𝐨𝐧:&lt;br&gt;
 Host ChatKit on your own infrastructure, use the ChatKit SDK, and connect it to any backend or model endpoint.&lt;/p&gt;

&lt;p&gt;🧱 What I Built&lt;br&gt;
 I implemented the Advanced Integration version — a self-hosted Chat Framework combining:&lt;br&gt;
● 𝐂𝐡𝐚𝐭𝐊𝐢𝐭 𝐔𝐈 (𝐅𝐫𝐨𝐧𝐭𝐞𝐧𝐝): A modern React + Next.js interface built with ChatKit components. Supports message history, placeholders, and full customization.&lt;br&gt;
● 𝐅𝐚𝐬𝐭𝐀𝐏𝐈 (𝐁𝐚𝐜𝐤𝐞𝐧𝐝): A lightweight layer exposing /api/chat and /health endpoints. Handles message serialization, temperature control, and integrates with any OpenAI-compatible API.&lt;br&gt;
● 𝐂𝐨𝐧𝐟𝐢𝐠𝐮𝐫𝐚𝐛𝐥𝐞 𝐌𝐨𝐝𝐞𝐥 𝐄𝐧𝐝𝐩𝐨𝐢𝐧𝐭: Flexible backend integration for connecting to any model API or agent orchestration layer.&lt;/p&gt;

&lt;p&gt;This setup delivers the same smooth ChatKit experience — but entirely under developer control. It’s modular, lightweight, and can easily connect to private APIs, enterprise systems, or custom agent tools.&lt;/p&gt;

&lt;p&gt;⚙️ How It Works&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjh25xehw3tx1cj5eajk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjh25xehw3tx1cj5eajk.png" alt="Navigation Flow" width="617" height="677"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ The ChatKit UI sends messages to my FastAPI backend.&lt;br&gt;
✅ The backend processes, formats, and forwards them to the model API.&lt;br&gt;
✅ The response is returned to the UI and rendered instantly in the chat view.&lt;br&gt;
✅ All data flows follow the OpenAI Chat Completions schema, so it’s plug-and-play with any model or agent backend.&lt;/p&gt;
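
&lt;p&gt;As a rough sketch of what that backend serialization step looks like (the helper name and defaults below are my own, not part of ChatKit’s API):&lt;/p&gt;

```python
# Minimal sketch of the FastAPI layer's job: wrap ChatKit UI messages in an
# OpenAI Chat Completions-style request body. Names/defaults are illustrative.
def build_chat_payload(messages, model="gpt-4o-mini", temperature=0.7):
    """Serialize UI messages into a Chat Completions request body."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": m["role"], "content": m["content"]} for m in messages
        ],
    }

payload = build_chat_payload([{"role": "user", "content": "Hello"}])
```

&lt;p&gt;Because the payload matches the Chat Completions shape, the same backend can forward it to any OpenAI-compatible endpoint.&lt;/p&gt;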

&lt;p&gt;For those who’d like to explore the setup, I’ve published the full implementation here:&lt;br&gt;
&lt;a href="https://lnkd.in/eCvN8mjh" rel="noopener noreferrer"&gt;https://lnkd.in/eCvN8mjh&lt;/a&gt;&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>openai</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>𝐇𝐨𝐰 𝐭𝐨 𝐄𝐱𝐩𝐨𝐬𝐞 𝐀𝐖𝐒 𝐋𝐚𝐦𝐛𝐝𝐚 𝐚𝐬 𝐚𝐧 𝐌𝐂𝐏 𝐓𝐨𝐨𝐥 𝐰𝐢𝐭𝐡 Bedrock 𝐀𝐠𝐞𝐧𝐭𝐂𝐨𝐫𝐞 𝐆𝐚𝐭𝐞𝐰𝐚𝐲</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Thu, 28 Aug 2025 02:43:15 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/bedrock-3ami</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/bedrock-3ami</guid>
      <description>&lt;p&gt;Most enterprises already have dozens (if not hundreds) of AWS Lambda functions powering business logic.&lt;br&gt;
But here’s the problem:&lt;br&gt;
 👉 How do you make those functions easily consumable by AI agents in a secure, standardized way?&lt;br&gt;
That’s where Amazon Bedrock AgentCore Gateway + Model Context Protocol (MCP) come in.&lt;br&gt;
 Think of the Gateway as a universal adapter that lets your Lambda “speak MCP,” so any agent can discover and call it as a tool.&lt;/p&gt;

&lt;p&gt;𝐀𝐦𝐚𝐳𝐨𝐧 𝐁𝐞𝐝𝐫𝐨𝐜𝐤 𝐀𝐠𝐞𝐧𝐭𝐂𝐨𝐫𝐞 𝐆𝐚𝐭𝐞𝐰𝐚𝐲&lt;br&gt;
1️⃣ One-stop bridge for agents → Turn APIs, Lambda functions, or existing services into MCP-compatible tools with just a few lines of config.&lt;br&gt;
 2️⃣ Scale with security → Built-in ingress auth (who can call) + egress auth (how Gateway connects to backends).&lt;br&gt;
 3️⃣ Developer speed → No weeks of glue code or infra provisioning — Gateway handles it.&lt;br&gt;
 4️⃣ Broad support → Works with OpenAPI, Smithy, and Lambda as input types.&lt;/p&gt;

&lt;p&gt;What I did in this example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog57jvieax1j9sdgdn86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fog57jvieax1j9sdgdn86.png" alt="arch" width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;● Created a simple Lambda function (get_order, update_order) to simulate order lookups and updates.&lt;br&gt;
● Used AgentCore Gateway to expose that Lambda as MCP tools.&lt;br&gt;
● Configured Cognito for inbound OAuth2 authentication and an IAM role for outbound authorization.&lt;br&gt;
● Connected with an MCP client to listTools and callTool, and got back real Lambda responses.&lt;/p&gt;
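
&lt;p&gt;A hypothetical shape for the Lambda behind those two tools (the real Gateway delivers tool metadata via the invocation context; dispatching on a plain &lt;code&gt;toolName&lt;/code&gt; field here is a simplification):&lt;/p&gt;

```python
# Illustrative Lambda serving both MCP tools, dispatched on the tool name.
# ORDERS stands in for a real datastore; field names are assumptions.
ORDERS = {"123": "SHIPPED"}

def lambda_handler(event, context):
    tool = event.get("toolName", "")
    order_id = event.get("orderId", "")
    if tool == "get_order_tool":
        return {"orderId": order_id, "status": ORDERS.get(order_id, "UNKNOWN")}
    if tool == "update_order_tool":
        ORDERS[order_id] = "UPDATED"
        return {"orderId": order_id, "status": "UPDATED"}
    return {"error": "unknown tool " + tool}
```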

&lt;p&gt;After wiring everything up, my MCP client returned:&lt;br&gt;
🔧 get_order_tool&lt;br&gt;
 📝 Action: Fetch order status&lt;br&gt;
 📦 Result: { "orderId": "123", "status": "SHIPPED" }&lt;/p&gt;

&lt;p&gt;🔧 update_order_tool&lt;br&gt;
 📝 Action: Update order status&lt;br&gt;
 📦 Result: { "orderId": "123", "status": "UPDATED" }&lt;/p&gt;

&lt;p&gt;💻 Want to try this example yourself?&lt;br&gt;
 I’ve published the full working code here:&lt;br&gt;
 👉 &lt;a href="https://lnkd.in/gdrhHFDs" rel="noopener noreferrer"&gt;https://lnkd.in/gdrhHFDs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is just the beginning — next, I’ll explore exposing APIs (OpenAPI/Smithy) and chaining multiple tools together for richer agent workflows.&lt;/p&gt;

&lt;p&gt;Reference: Learn More About AgentCore Gateway&lt;br&gt;
To get a deeper understanding of how Amazon Bedrock AgentCore Gateway simplifies tool integration for AI agents, check out the official docs:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://lnkd.in/gGTB3EJD" rel="noopener noreferrer"&gt;https://lnkd.in/gGTB3EJD&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>mcp</category>
      <category>lambda</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>𝐁𝐞𝐲𝐨𝐧𝐝 𝐅𝐫𝐞𝐬𝐡𝐧𝐞𝐬𝐬: 𝐇𝐨𝐰 𝐭𝐨 𝐮𝐬𝐞 𝐒𝐞𝐚𝐫𝐜𝐡 𝐌𝐨𝐝𝐞𝐬 𝐢𝐧 𝐕𝐞𝐜𝐭𝐨𝐫 𝐃𝐚𝐭𝐚𝐛𝐚𝐬𝐞𝐬 (𝐰𝐢𝐭𝐡 𝐋𝐚𝐧𝐠𝐜𝐡𝐚𝐢𝐧)</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Sun, 24 Aug 2025 03:53:42 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/--52i5</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/--52i5</guid>
      <description>&lt;p&gt;In my last post, I talked about how dynamic embeddings keep your knowledge base fresh as documents evolve. But freshness is only half the story. &lt;br&gt;
When a user asks your assistant a question, 𝐡𝐨𝐰 𝐲𝐨𝐮 𝐬𝐞𝐚𝐫𝐜𝐡 𝐭𝐡𝐞 𝐯𝐞𝐜𝐭𝐨𝐫 𝐝𝐚𝐭𝐚𝐛𝐚𝐬𝐞 determines whether they get: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the single most relevant snippet, &lt;/li&gt;
&lt;li&gt;a broader set of context, or &lt;/li&gt;
&lt;li&gt;results filtered by metadata like timestamps or document type. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are the five main search strategies—explained simply. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs31718emrimq2r0e67fc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs31718emrimq2r0e67fc.png" alt="search types" width="800" height="556"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;1️⃣ Similarity Search (k-NN)&lt;/p&gt;

&lt;p&gt;When you type a query, the system converts it into a vector. Then it looks around the vector space for the “neighbors” that sit closest. Those become your top results.&lt;/p&gt;

&lt;p&gt;👉 Example:&lt;br&gt;
Query: “What is the required capital reserve?”&lt;br&gt;
Result: “Banks must maintain 12% capital reserves.”&lt;/p&gt;
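
&lt;p&gt;Stripped of any library, k-NN is just “score every vector by similarity, keep the top k.” A toy sketch (vectors here are made up for illustration):&lt;/p&gt;

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def knn(query_vec, docs, k=2):
    """docs: list of (text, vector) pairs. Returns the k closest texts."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

&lt;p&gt;Real vector databases do the same ranking, just with approximate-nearest-neighbor indexes so it stays fast at millions of vectors.&lt;/p&gt;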

&lt;p&gt;2️⃣ Max Marginal Relevance (MMR)&lt;/p&gt;

&lt;p&gt;MMR makes sure you don’t get the same answer five times in a row.&lt;br&gt;
Here’s how it works: after finding the most relevant snippet, it deliberately looks for other results that are still relevant but not redundant, balancing relevance with diversity.&lt;/p&gt;

&lt;p&gt;👉 Example:&lt;br&gt;
Query: “Explain capital reserve requirements.”&lt;br&gt;
Results: “Banks must maintain 12% capital reserves.”&lt;br&gt;
“These reserves are adjusted annually based on regulations.”&lt;/p&gt;

&lt;p&gt;Notice how the second snippet doesn’t just repeat the first—it brings in a new angle. That’s MMR at work.&lt;/p&gt;
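
&lt;p&gt;The greedy MMR loop can be sketched in a few lines: each pick maximizes “relevance to the query minus redundancy with what’s already selected” (the &lt;code&gt;lam&lt;/code&gt; weight and toy vectors are illustrative):&lt;/p&gt;

```python
import math

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query, docs, k=2, lam=0.7):
    """Greedy MMR over (text, vector) docs: lam trades relevance vs. diversity."""
    selected, remaining = [], list(docs)
    while remaining and k > len(selected):
        def score(d):
            # Redundancy = worst-case similarity to anything already picked.
            redundancy = max((cos(d[1], s[1]) for s in selected), default=0.0)
            return lam * cos(d[1], query) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [text for text, _ in selected]
```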

&lt;p&gt;3️⃣ Filtered / Metadata Search&lt;br&gt;
Sometimes “closest meaning” isn’t the whole story—you also care about context and constraints. That’s where metadata filtering comes in.&lt;br&gt;
Think of it as adding a funnel on top of similarity search. You still find the closest matches, but only those that meet extra rules like date, document type, source, or author.&lt;/p&gt;

&lt;p&gt;👉 Example:&lt;br&gt;
Query: “What’s the latest capital reserve requirement?”&lt;br&gt;
Filter: updated_at &amp;gt; 2025-01-01&lt;br&gt;
Result: The system ignores older documents and only shows the most recent rule—even if the older ones are technically “closer” in meaning.&lt;/p&gt;

&lt;p&gt;4️⃣ Hybrid Search (Keyword + Vector)&lt;/p&gt;

&lt;p&gt;Sometimes, meaning alone isn’t enough. What if your query includes an exact code, acronym, or ID? A pure semantic search might blur it, but a keyword search nails it.&lt;/p&gt;

&lt;p&gt;Hybrid search combines the two:&lt;/p&gt;

&lt;p&gt;Vector search captures the context and meaning.&lt;br&gt;
Keyword search makes sure specific terms (like “CRR-2025”) get the priority they deserve.&lt;/p&gt;

&lt;p&gt;👉 Example:&lt;/p&gt;

&lt;p&gt;Query: “Capital Reserve Rule CRR-2025”&lt;br&gt;
Vector search → understands it’s about capital reserves.&lt;br&gt;
Keyword search → ensures documents mentioning CRR-2025 are ranked higher.&lt;/p&gt;
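
&lt;p&gt;One common way to blend the two signals is a weighted sum of the semantic score and an exact-keyword hit rate (the weighting scheme below is a simplification; production systems often use reciprocal rank fusion instead):&lt;/p&gt;

```python
def hybrid_score(query_terms, doc_text, vec_score, keyword_weight=0.5):
    """Blend a semantic similarity score with exact keyword hits
    so specific tokens like 'CRR-2025' are never blurred away."""
    words = doc_text.lower().split()
    hits = sum(1 for t in query_terms if t.lower() in words)
    kw = hits / max(len(query_terms), 1)
    return (1 - keyword_weight) * vec_score + keyword_weight * kw
```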

&lt;p&gt;5️⃣ Cross-Encoder Reranking&lt;/p&gt;

&lt;p&gt;Starts with a fast similarity search, then uses a deeper model (like BERT) to re-score the top candidates for accuracy.&lt;/p&gt;

&lt;p&gt;👉 Query: “What are the capital reserve rules for 2025?”&lt;br&gt;
Step 1: Initial retrieval → 10 candidates&lt;br&gt;
Step 2: Reranker → re-scores and picks the single best snippet&lt;/p&gt;
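
&lt;p&gt;The two-stage pattern is independent of which models you plug in. A sketch with scorer functions passed in as parameters (a real pipeline would use an embedding similarity for stage 1 and a cross-encoder like BERT for stage 2):&lt;/p&gt;

```python
def retrieve_then_rerank(query, docs, fast_score, rerank_score, n_fast=10, k=1):
    """Stage 1: cheap score over all docs, keep n_fast candidates.
    Stage 2: expensive re-score of just those candidates, return top k."""
    candidates = sorted(docs, key=lambda d: fast_score(query, d), reverse=True)[:n_fast]
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:k]
```

&lt;p&gt;The expensive model only ever sees &lt;code&gt;n_fast&lt;/code&gt; documents, which is what makes reranking affordable at scale.&lt;/p&gt;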

&lt;p&gt;Want to explore the full code base?&lt;br&gt;
&lt;a href="https://lnkd.in/eec9AiHy" rel="noopener noreferrer"&gt;https://lnkd.in/eec9AiHy&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  📊 Search Strategies at a Glance
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;How it Works&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Similarity Search (k-NN)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Finds nearest neighbors in vector space&lt;/td&gt;
&lt;td&gt;Fast &amp;amp; simple&lt;/td&gt;
&lt;td&gt;Can return repetitive or narrow results&lt;/td&gt;
&lt;td&gt;Quick lookups, FAQs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max Marginal Relevance (MMR)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balances relevance + diversity&lt;/td&gt;
&lt;td&gt;Avoids duplicates, adds variety&lt;/td&gt;
&lt;td&gt;Slightly slower&lt;/td&gt;
&lt;td&gt;Explanations, multi-fact answers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Filtered / Metadata Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Adds constraints (date, type, source) on top of similarity&lt;/td&gt;
&lt;td&gt;Ensures results match business rules&lt;/td&gt;
&lt;td&gt;Needs clean, consistent metadata&lt;/td&gt;
&lt;td&gt;Compliance, regulations, versioned docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hybrid Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Combines keyword search with vector similarity&lt;/td&gt;
&lt;td&gt;Best of both worlds (context + exact match)&lt;/td&gt;
&lt;td&gt;Requires extra infra (ElasticSearch, OpenSearch)&lt;/td&gt;
&lt;td&gt;IDs, codes, acronyms, technical docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Encoder Reranking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Re-scores initial candidates with a deeper model (e.g., BERT)&lt;/td&gt;
&lt;td&gt;Highest precision&lt;/td&gt;
&lt;td&gt;Computationally heavy&lt;/td&gt;
&lt;td&gt;Mission-critical answers, high-accuracy apps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;🔑 Key Takeaway&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Static embeddings = stale snapshots&lt;/li&gt;
&lt;li&gt;Dynamic embeddings = living knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pipeline keeps context fresh and supports multiple retrieval modes so you can choose the right strategy for your production needs.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>machinelearning</category>
      <category>vectordatabase</category>
      <category>python</category>
    </item>
    <item>
      <title>𝐁𝐮𝐢𝐥𝐝𝐢𝐧𝐠 𝐚 𝐃𝐲𝐧𝐚𝐦𝐢𝐜 𝐑𝐀𝐆 𝐏𝐢𝐩𝐞𝐥𝐢𝐧𝐞 𝐰𝐢𝐭𝐡 𝐋𝐚𝐧𝐠𝐂𝐡𝐚𝐢𝐧 (𝐓𝐡𝐚𝐭 𝐒𝐭𝐚𝐲𝐬 𝐅𝐫𝐞𝐬𝐡)</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Sat, 23 Aug 2025 04:26:52 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/--24od</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/--24od</guid>
      <description>&lt;p&gt;Most RAG (Retrieval-Augmented Generation) systems work fine for static knowledge bases—but the moment your documents start changing (new policies, updated financials, revised product specs), they quickly go stale.&lt;/p&gt;

&lt;p&gt;We solved that with a dynamic RAG pipeline that keeps embeddings and context fresh without doing heavy full rebuilds. Here’s how it works:&lt;/p&gt;

&lt;p&gt;🧩 High-Level Flow&lt;/p&gt;

&lt;p&gt;1️⃣ 𝐖𝐚𝐭𝐜𝐡𝐞𝐫 (𝐅𝐢𝐥𝐞/𝐒3 𝐜𝐡𝐚𝐧𝐠𝐞𝐬)&lt;br&gt;
▪ Continuously listens for file changes (local folder or S3 bucket).&lt;br&gt;
▪ Detects when a document is new, updated, or deleted.&lt;br&gt;
2️⃣ 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠 (𝐨𝐧𝐥𝐲 𝐮𝐩𝐝𝐚𝐭𝐞𝐬)&lt;br&gt;
▪ Instead of re-embedding everything, it re-embeds only the changed chunks.&lt;br&gt;
▪ Saves time and compute costs while keeping the knowledge base fresh.&lt;br&gt;
3️⃣ 𝐕𝐞𝐜𝐭𝐨𝐫 𝐃𝐁 (𝐂𝐡𝐫𝐨𝐦𝐚)&lt;br&gt;
▪ Stores embeddings with metadata like updated_at.&lt;br&gt;
▪ When conflicts arise (e.g., same document with old + new facts), retrieval logic can guide the LLM to trust the freshest snippet.&lt;br&gt;
4️⃣ 𝐋𝐋𝐌 (𝐎𝐥𝐥𝐚𝐦𝐚/𝐎𝐩𝐞𝐧𝐀𝐈)&lt;br&gt;
▪ Takes the top-k retrieved chunks and augments the query.&lt;br&gt;
▪ Produces a contextualized answer with citations.&lt;br&gt;
5️⃣ 𝐒𝐭𝐫𝐞𝐚𝐦𝐥𝐢𝐭 𝐔𝐈&lt;br&gt;
▪ Users simply ask questions.&lt;br&gt;
▪ The UI calls the FastAPI backend, retrieves from Chroma, and passes to the LLM.&lt;br&gt;
▪ Responses include answers + sources, so users know why the model said what it did.&lt;/p&gt;
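
&lt;p&gt;The watcher’s “only re-embed what changed” step boils down to comparing content hashes against what’s already indexed. A minimal sketch (function and field names are my own, not from the repo):&lt;/p&gt;

```python
import hashlib

def detect_changes(current_files, index):
    """current_files: {path: text}; index: {path: sha256 of last-embedded text}.
    Returns (changed_paths, deleted_paths) so only deltas get re-embedded."""
    changed, deleted = [], []
    for path, text in current_files.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if index.get(path) != digest:
            changed.append(path)
    for path in index:
        if path not in current_files:
            deleted.append(path)
    return changed, deleted
```

&lt;p&gt;Unchanged documents hash to the same digest and are skipped entirely, which is where the compute savings come from.&lt;/p&gt;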

&lt;p&gt;🚧 The Challenge (Simple Example)&lt;br&gt;
One file said:&lt;br&gt;
 ➡️ “All banks must maintain capital reserves of 10%.”&lt;br&gt;
Later, an update stated:&lt;br&gt;
 ➡️ “All banks must maintain capital reserves of 12%.”&lt;br&gt;
When I asked: “What is the required capital reserve?”&lt;/p&gt;

&lt;p&gt;Static RAG: “I don’t know.” (confused by conflicting facts)&lt;br&gt;
 Dynamic RAG: “12%” (trusts the most recent doc)&lt;/p&gt;

&lt;p&gt;𝐓𝐡𝐞 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧 — 𝐃𝐲𝐧𝐚𝐦𝐢𝐜 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠𝐬&lt;br&gt;
🔄 Watches for new/updated docs in real time&lt;br&gt;
 ⚡ Re-embeds only what changes (no full rebuilds)&lt;br&gt;
 🏷️ Tracks updated_at so the LLM knows the freshest fact&lt;br&gt;
 🧠 Guides the model to resolve conflicts by trusting the most recent snippet&lt;br&gt;
Now, when a file is updated, the system re-embeds instantly and gives the right answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cphps0ryvio70lb6nd8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cphps0ryvio70lb6nd8.png" alt="High Level Architecture" width="800" height="115"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the full working codebase, check my GitHub repo&lt;br&gt;
&lt;a href="https://github.com/rajeevchandra/dynamic_embeddings" rel="noopener noreferrer"&gt;https://github.com/rajeevchandra/dynamic_embeddings&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the end of the day, AI systems are only as useful as the freshness of the knowledge they rely on. Building dynamic pipelines isn’t just about better tech — it’s about building assistants that can actually keep up with how fast the world changes.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>vectordatabase</category>
      <category>langchain</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>How 𝐋𝐚𝐧𝐠𝐒𝐦𝐢𝐭𝐡, 𝐋𝐚𝐧𝐠𝐆𝐫𝐚𝐩𝐡, 𝐎𝐥𝐥𝐚𝐦𝐚 &amp; 𝐅𝐀𝐈𝐒𝐒 Gave Me End-to-End Observability in a Local AI Chatbot</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Fri, 16 May 2025 03:18:20 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/how-gave-me-end-to-end-observability-in-a-local-ai-chatbot-4ch8</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/how-gave-me-end-to-end-observability-in-a-local-ai-chatbot-4ch8</guid>
      <description>&lt;p&gt;I just built a self-contained AI chatbot—no cloud dependencies, no API keys, just pure local power with 𝐋𝐚𝐧𝐠𝐆𝐫𝐚𝐩𝐡, 𝐎𝐥𝐥𝐚𝐦𝐚, 𝐅𝐀𝐈𝐒𝐒, 𝐚𝐧𝐝 𝐋𝐚𝐧𝐠𝐒𝐦𝐢𝐭𝐡.&lt;/p&gt;

&lt;p&gt;Tech Stack:&lt;/p&gt;

&lt;p&gt;🤖 Build a fully local AI chatbot that answers questions from uploaded documents, with zero cloud dependencies.&lt;br&gt;
🔧 LangGraph: Orchestrate the chatbot logic using modular, state-based workflows (e.g., retrieve → generate → feedback).&lt;br&gt;
📚 Ollama + FAISS + TF-IDF: Run a local llama3 model for response generation, and use TF-IDF + FAISS for fast, document-based context retrieval.&lt;br&gt;
🖥 Streamlit: Provide an interactive web interface where users can upload files and chat with the bot in real time.&lt;br&gt;
📊 LangSmith: Enable full observability — trace queries, inspect prompts, monitor latency, and analyze errors or retrieval issues end-to-end.&lt;/p&gt;

&lt;p&gt;At first, it answered questions well enough. &lt;br&gt;
But the real game-changer? &lt;br&gt;
I could trace every step of its reasoning.&lt;/p&gt;

&lt;p&gt;𝐋𝐚𝐧𝐠𝐒𝐦𝐢𝐭𝐡 gave me the transparency I never knew I needed, revealing the exact document chunks retrieved, the prompts fed to the model, execution times, and even where things went off track.&lt;/p&gt;

&lt;p&gt;🚧 The Problem: “It Works” Isn’t Enough&lt;/p&gt;

&lt;p&gt;At first, my chatbot seemed to be doing well — it returned reasonable answers to most questions. But then… weird things started to happen:&lt;br&gt;
● Was the wrong chunk retrieved?&lt;br&gt;
● Was the prompt malformed?&lt;br&gt;
● Did the model hallucinate?&lt;/p&gt;

&lt;p&gt;Without insight into what was happening step by step, debugging was pure guesswork.&lt;/p&gt;

&lt;p&gt;𝐇𝐨𝐰 𝐋𝐚𝐧𝐠𝐒𝐦𝐢𝐭𝐡 𝐇𝐞𝐥𝐩𝐞𝐝 𝐌𝐞 𝐃𝐞𝐛𝐮𝐠 𝐚𝐧𝐝 𝐈𝐦𝐩𝐫𝐨𝐯𝐞&lt;/p&gt;

&lt;p&gt;⇒ Tracing&lt;br&gt;
● View each query, chunk retrieval, and LLM response in real-time&lt;br&gt;
● Confirm the right part of the document was being used&lt;br&gt;
● Inspect the exact prompt given to llama3 via Ollama&lt;/p&gt;
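
&lt;p&gt;Getting those traces flowing takes almost no code — LangSmith picks up a few environment variables (the project name below is illustrative and the key is a placeholder):&lt;/p&gt;

```python
import os

# Enable LangSmith tracing for a LangChain/LangGraph app.
# Set these before any chains/graphs are constructed.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "local-rag-chatbot"   # illustrative name
os.environ["LANGCHAIN_API_KEY"] = "YOUR-LANGSMITH-KEY"  # placeholder
# From here on, every invocation is traced to the named project.
```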

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fakaxqema1mwvhaeuvt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fakaxqema1mwvhaeuvt.png" alt="Tracing" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⇒ Error Analysis&lt;br&gt;
● Trace misfires back to irrelevant or empty document chunks&lt;br&gt;
● Compare expected vs. actual outputs&lt;br&gt;
● Catch malformed inputs or slow model responses&lt;/p&gt;

&lt;p&gt;⇒ Performance Metrics&lt;br&gt;
● Track latency for each step (retriever, LLM)&lt;br&gt;
● Identify slowdowns during Ollama inference&lt;br&gt;
● Start tagging “slow” or “retrieval_miss” runs for dashboards&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qn1a27hcnouub8aqqry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qn1a27hcnouub8aqqry.png" alt="Metrics" width="800" height="327"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📊 Scaling Visibility with LangSmith Dashboards&lt;/p&gt;

&lt;p&gt;LangSmith doesn’t just log traces — it helps you monitor trends over time.&lt;br&gt;
Using their dashboard tools, I now track:&lt;br&gt;
🧠 Number of LLM calls&lt;br&gt;
🕒 Average latency per query&lt;br&gt;
📉 Retrieval failures&lt;br&gt;
💸 Token usage (if using APIs like OpenAI or Anthropic)&lt;br&gt;
❌ Error Rates: Identify failed runs, exceptions, or empty prompts&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci4x1s5cmfgde3vkij85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci4x1s5cmfgde3vkij85.png" alt="Dashboard" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’ve published the full working project on GitHub — complete with TF-IDF + FAISS retrieval, Ollama model integration, LangSmith observability, and a Streamlit interface.&lt;br&gt;
&lt;a href="https://lnkd.in/etpCMPiS" rel="noopener noreferrer"&gt;https://lnkd.in/etpCMPiS&lt;/a&gt;&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>faiss</category>
      <category>langsmith</category>
      <category>streamlit</category>
    </item>
    <item>
      <title>Talk to Your Kubernetes Cluster Using AI</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Tue, 13 May 2025 21:47:14 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/talk-to-your-kubernetes-cluster-using-ai-2al5</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/talk-to-your-kubernetes-cluster-using-ai-2al5</guid>
      <description>&lt;p&gt;Today, I explored kubectl-ai, a powerful CLI from Google Cloud that lets you interact with your Kubernetes cluster using natural language, powered by local LLMs like Mistral (via Ollama) or cloud models like Gemini.&lt;/p&gt;

&lt;p&gt;Imagine saying things like:&lt;/p&gt;

&lt;p&gt;“List all pods in default namespace”&lt;br&gt;
“generate a deployment with 3 nginx replicas”&lt;br&gt;
“debug a pod stuck in CrashLoopBackOff”&lt;br&gt;
“Generate a YAML for a CronJob that runs every 5 minutes”&lt;/p&gt;

&lt;p&gt;And your terminal does the work — no YAML guessing, no docs tab-hopping.&lt;/p&gt;

&lt;p&gt;How does Kubectl-ai work? &lt;/p&gt;

&lt;p&gt;1) You type a natural language prompt like “List all pods in kube-system”.&lt;br&gt;
 2) The kubectl-ai CLI sends your prompt to a connected LLM (like Ollama or Gemini).&lt;br&gt;
3) The LLM interprets the request and returns either a plain explanation, a suggested kubectl command, or a tool-call instruction.&lt;br&gt;
4) kubectl-ai processes the response:&lt;br&gt;
 If --dry-run is enabled, it just prints the command.&lt;br&gt;
 If --enable-tool-use-shim is used, it extracts and runs the command.&lt;/p&gt;

&lt;p&gt;5) The actual kubectl command is executed on your active Kubernetes cluster.&lt;br&gt;
6) The cluster returns the result (like pod lists or deployment status).&lt;br&gt;
7) The output is shown in your terminal — just like you ran the command manually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvalq8nj57xpb25lvp4n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvalq8nj57xpb25lvp4n.png" alt="how the tool works" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;kubectl-ai supports the Model Context Protocol (MCP), the emerging open protocol for AI tool interoperability.&lt;/p&gt;

&lt;p&gt;💡 This means you can:&lt;br&gt;
1) Build structured, agentic workflows&lt;br&gt;
2) Pipe Kubernetes operations into broader AI systems&lt;br&gt;
3) Connect kubectl-ai to MCP clients (e.g., Claude, Amazon Q)&lt;/p&gt;

&lt;p&gt;If you’re already building with MCP, this is a killer entry point into AI-assisted DevOps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrog4ukeyt2mfpgp6jq6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrog4ukeyt2mfpgp6jq6.png" alt="creating a pod" width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try It Yourself&lt;br&gt;
🔗 GitHub: &lt;a href="https://lnkd.in/e57aFwC6" rel="noopener noreferrer"&gt;https://lnkd.in/e57aFwC6&lt;/a&gt;&lt;br&gt;
Want to control Kubernetes with natural language?&lt;br&gt;
 This is the cleanest, most extensible way to do it — and it works entirely in your terminal.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>kubernetes</category>
      <category>mcp</category>
      <category>kubectl</category>
    </item>
    <item>
      <title>𝐀 𝐅𝐮𝐥𝐥𝐲 𝐋𝐨𝐜𝐚𝐥 𝐀𝐈 𝐂𝐡𝐚𝐭𝐛𝐨𝐭 𝐔𝐬𝐢𝐧𝐠 𝐎𝐥𝐥𝐚𝐦𝐚, 𝐋𝐚𝐧𝐠𝐂𝐡𝐚𝐢𝐧 &amp; 𝐂𝐡𝐫𝐨𝐦𝐚𝐃𝐁</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Tue, 13 May 2025 03:12:04 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/--2koi</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/--2koi</guid>
      <description>&lt;p&gt;🚀 Today, I got hands-on with a Retrieval-Augmented Generation (RAG) setup that runs entirely offline. I built a private AI assistant that can answer questions from Markdown and PDF documentation — no cloud, no API keys.&lt;/p&gt;

&lt;p&gt;🧱 Ollama for local LLM &amp;amp; embedding&lt;br&gt;
🔍 LangChain for RAG orchestration + memory&lt;br&gt;
📦 ChromaDB for vector storage&lt;br&gt;
💬 Streamlit for the chatbot UI&lt;/p&gt;

&lt;p&gt;Key features:&lt;br&gt;
 ● Upload .md or .pdf Files&lt;br&gt;
 ● Auto-re-index and embed with nomic-embed-text&lt;br&gt;
 ● Ask natural questions to mistral (or other local LLMs)&lt;br&gt;
 ● Multi-turn chat with memory&lt;br&gt;
 ● Source highlighting for every answer&lt;/p&gt;

&lt;p&gt;🧠 How This Local RAG Chatbot Works (Summary)&lt;/p&gt;

&lt;p&gt;1) Upload Your Docs&lt;br&gt;
 Drag and drop .md and .pdf files into the Streamlit app. The system supports both structured and unstructured formats — no manual formatting needed.&lt;/p&gt;

&lt;p&gt;2) Chunking + Embedding&lt;br&gt;
 Each document is split into small, context-aware text chunks and embedded locally using the nomic-embed-text model via Ollama.&lt;/p&gt;

&lt;p&gt;3) Store in Chroma Vector DB&lt;br&gt;
 The resulting embeddings are stored in ChromaDB, enabling fast and accurate similarity search when queries are made.&lt;/p&gt;

&lt;p&gt;4) Ask Natural Questions&lt;br&gt;
 You type a question like “What are DevOps best practices?”, and the app retrieves the most relevant chunks using semantic search.&lt;/p&gt;

&lt;p&gt;5) Answer with LLM + Memory&lt;br&gt;
 Retrieved context is passed to mistral (or any Ollama-compatible LLM). LangChain manages session memory for multi-turn Q&amp;amp;A.&lt;/p&gt;

&lt;p&gt;6) Sources Included&lt;br&gt;
 Each answer shows where it came from — including the filename and content snippet — so you can trust and trace every response.&lt;/p&gt;
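
&lt;p&gt;The chunking in step 2 is conceptually simple: slide a window over the text with some overlap so each chunk keeps surrounding context. A stdlib-only sketch (the sizes are illustrative defaults, not the app’s actual settings):&lt;/p&gt;

```python
def chunk_text(text, size=500, overlap=100):
    """Split a document into overlapping character windows before embedding."""
    chunks, start = [], 0
    step = max(size - overlap, 1)  # advance less than `size` to create overlap
    while len(text) > start:
        chunks.append(text[start:start + size])
        start += step
    return chunks
```

&lt;p&gt;In the real app this happens per uploaded file, and each chunk is embedded with nomic-embed-text and written to ChromaDB along with its source metadata.&lt;/p&gt;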

&lt;p&gt;Display answer + source documents in Streamlit&lt;/p&gt;

&lt;p&gt;💬 Example Prompts&lt;/p&gt;

&lt;p&gt;"What is a microservice?"&lt;br&gt;
"How does Kubernetes manage pod lifecycle?"&lt;br&gt;
"Give me an example Docker Compose file."&lt;br&gt;
"What are DevOps best practices?"&lt;/p&gt;

&lt;p&gt;Honestly, this was one of those projects that reminded me how far local AI tools have come. No cloud APIs, no fancy GPU rig — just a regular laptop, and I was able to build a fully working RAG chatbot that reads my docs and gives solid, contextual answers.&lt;/p&gt;

&lt;p&gt;If you’ve ever wanted to interact with your own knowledge base — internal docs, PDFs, notes — in a more natural way, this setup is 100% worth trying. It's private, surprisingly fast, and honestly, kind of fun to put together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64obr2ydwzk85h0ewh4g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64obr2ydwzk85h0ewh4g.png" alt="how this works" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra1y3kwvbjk3bnp2et72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra1y3kwvbjk3bnp2et72.png" alt="how it runs in streamlit" width="800" height="497"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo28tf3y4xn8vuri7xrla.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo28tf3y4xn8vuri7xrla.png" alt="how it runs in streamlit" width="800" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>streamlit</category>
      <category>chromadb</category>
    </item>
    <item>
      <title>🏦 Automating Loan Underwriting with Agentic AI: LangGraph, MCP &amp; Amazon SageMaker in Action</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Fri, 09 May 2025 21:17:34 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/automating-loan-underwriting-with-agentic-ai-langgraph-mcp-amazon-sagemaker-in-action-3ah5</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/automating-loan-underwriting-with-agentic-ai-langgraph-mcp-amazon-sagemaker-in-action-3ah5</guid>
      <description>&lt;p&gt;To demonstrate the power of Model Context Protocol (MCP) in real-world enterprise AI, I recently ran a loan underwriting pipeline that combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP for tool-style interaction between LLMs and services &lt;/li&gt;
&lt;li&gt;LangGraph to orchestrate multi-step workflows &lt;/li&gt;
&lt;li&gt;Amazon SageMaker to securely host the LLM &lt;/li&gt;
&lt;li&gt;FastAPI to serve agents with modular endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What Is LangGraph?&lt;/p&gt;

&lt;p&gt;LangGraph is a framework for orchestrating multi-step, stateful workflows across LLM-powered agents.&lt;/p&gt;

&lt;p&gt;🔄 Graph-based execution engine: It lets you define agent workflows as nodes in a graph, enabling branching, retries, and memory — perfect for multi-agent AI systems.&lt;/p&gt;

&lt;p&gt;🔗 Seamless tool and state handling: It maintains structured state across steps, making it easy to pass outputs between agents like Loan Officer → Credit Analyst → Risk Manager.&lt;/p&gt;

&lt;p&gt;Each agent doesn’t run in isolation — LangGraph stitches them together, letting you:&lt;/p&gt;

&lt;p&gt;● Define multi-agent workflows&lt;br&gt;
● Handle flow control, retries, state transitions&lt;br&gt;
● Pass structured data from one agent to the next&lt;/p&gt;

&lt;p&gt;Here’s how it works — and why it’s a powerful architectural pattern for decision automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧾 The Use Case: AI-Driven Loan Underwriting
&lt;/h2&gt;

&lt;p&gt;Loan underwriting typically involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reviewing applicant details&lt;/li&gt;
&lt;li&gt;Evaluating creditworthiness&lt;/li&gt;
&lt;li&gt;Making a final approval or denial decision&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this architecture, each role is performed by a &lt;strong&gt;dedicated AI agent&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loan Officer – Summarizes application details&lt;/li&gt;
&lt;li&gt;Credit Analyst – Assesses financial risk&lt;/li&gt;
&lt;li&gt;Risk Manager – Makes the final decision&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🧱 Architecture Overview
&lt;/h2&gt;

&lt;p&gt;This workflow is powered by a centralized LLM hosted on Amazon SageMaker, with each agent deployed as an &lt;strong&gt;MCP server&lt;/strong&gt; on EC2 and orchestrated via LangGraph:&lt;/p&gt;

&lt;p&gt;Workflow Steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User submits loan details (e.g., name, income, credit score)&lt;/li&gt;
&lt;li&gt;MCP client routes the request to the Loan Officer MCP server&lt;/li&gt;
&lt;li&gt;Output is forwarded to the Credit Analyst MCP server&lt;/li&gt;
&lt;li&gt;Result is passed to the Risk Manager MCP server&lt;/li&gt;
&lt;li&gt;A final prompt is generated, processed by the LLM on SageMaker, and sent back to the user&lt;/li&gt;
&lt;/ol&gt;
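&lt;p&gt;Stripped of LangGraph specifics, the pattern behind these steps is a shared state dictionary flowing through agent nodes in order. A minimal sketch with hypothetical agent stubs (the real agents are MCP servers backed by the SageMaker-hosted LLM):&lt;/p&gt;

```python
# Each "agent" is a node: it reads the shared state and returns an update.
# These stubs stand in for the MCP servers calling the SageMaker-hosted LLM.
def loan_officer(state):
    return {"summary": f"{state['name']}: income {state['income']}, score {state['credit_score']}"}

def credit_analyst(state):
    return {"risk": "low" if state["credit_score"] >= 700 else "high"}

def risk_manager(state):
    return {"decision": "approved" if state["risk"] == "low" else "denied"}

def run_pipeline(application):
    state = dict(application)
    # LangGraph expresses this as graph edges with retries and branching;
    # here the flow is a fixed Loan Officer -> Credit Analyst -> Risk Manager chain.
    for node in (loan_officer, credit_analyst, risk_manager):
        state.update(node(state))
    return state

result = run_pipeline({"name": "Jane", "income": 85000, "credit_score": 720})
print(result["decision"])  # approved
```

LangGraph’s value over this plain chain is the structured state handling, retries, and branching once the workflow stops being strictly linear.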

&lt;p&gt;Image Credit: AWS&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fer0z9g1omojinc9prpre.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fer0z9g1omojinc9prpre.png" alt="AWS" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used the following model for this run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: &lt;code&gt;Qwen/Qwen2.5-1.5B-Instruct&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;Source: Hugging Face &lt;/li&gt;
&lt;li&gt;Hosted on: Amazon SageMaker (Hugging Face LLM Inference Container)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjr39so2fdi1367nal1a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjr39so2fdi1367nal1a.png" alt="execution flow" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image credit: AWS&lt;/p&gt;

&lt;p&gt;🔗 Want to Try It?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;👉 &lt;a href="https://lnkd.in/ebueztnv" rel="noopener noreferrer"&gt;Official AWS Blog&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>sagemaker</category>
      <category>langgraph</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Kubernetes 1.32: Real-World Use Cases &amp; Examples</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Fri, 02 May 2025 13:45:29 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/kubernetes-132-real-world-use-cases-examples-4227</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/kubernetes-132-real-world-use-cases-examples-4227</guid>
      <description>&lt;p&gt;Kubernetes 1.32: Real-World Use Cases &amp;amp; Examples&lt;/p&gt;

&lt;p&gt;The Kubernetes 1.32 release, codenamed &lt;strong&gt;"Penelope"&lt;/strong&gt;, introduces thoughtful features aimed at making workloads more efficient, observable, and manageable.&lt;/p&gt;

&lt;p&gt;In this post, I’ve compiled practical examples for each major feature, making it easier to see how they fit into your everyday Kubernetes workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 1. Dynamic Resource Allocation (DRA) Enhancements
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;A financial services company needs to train ML models that require GPUs with at least 16GB of memory. Instead of hardcoding node selection, &lt;strong&gt;DRA dynamically allocates GPU resources at runtime.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What it does:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Uses a &lt;code&gt;ResourceClaimTemplate&lt;/code&gt; to define GPU access.&lt;/li&gt;
&lt;li&gt;Pods request GPUs without being tied to specific nodes.&lt;/li&gt;
&lt;li&gt;Runs a container that uses an NVIDIA GPU to train a model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why it matters:
&lt;/h3&gt;

&lt;p&gt;Template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  metadata:
    labels:
      resource: nvidia-gpu
  spec:
    resourceClassName: nvidia.com/gpu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  resourceClaims:
    - name: gpu
      source:
        resourceClaimTemplateName: gpu-claim-template
  containers:
    - name: ml-trainer
      image: your-ml-image
      command: ["python", "train.py"]
      resources:
        limits:
          nvidia.com/gpu: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;✅ Dynamically provisions GPU at runtime
&lt;/li&gt;
&lt;li&gt;✅ Avoids node pre-binding
&lt;/li&gt;
&lt;li&gt;✅ Ideal for ML training, AI workloads, and GPU-heavy applications&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧹 2. Auto-Removal of PVCs in StatefulSets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;Your team deploys short-lived stateful workloads (like test environments). Without cleanup, leftover PVCs accumulate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: data-processor
spec:
  serviceName: "data-service"
  replicas: 3
  selector:
    matchLabels:
      app: data-processor
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete
    whenScaled: Delete
  template:
    metadata:
      labels:
        app: data-processor
    spec:
      containers:
        - name: processor
          image: your-data-processor-image
          volumeMounts:
            - name: data-storage
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why it’s useful:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Automatically deletes PVCs when a pod is removed or scaled down
&lt;/li&gt;
&lt;li&gt;✅ Prevents orphaned volumes
&lt;/li&gt;
&lt;li&gt;✅ Great for ephemeral data processing jobs and simulations&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🪟 3. Graceful Shutdown for Windows Nodes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;You run Windows-based apps in your cluster. During node shutdown, you need those apps to clean up gracefully instead of abruptly terminating.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s new:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes 1.32 adds &lt;strong&gt;graceful shutdown&lt;/strong&gt; support for Windows pods.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: windows-app
spec:
  nodeSelector:
    kubernetes.io/os: windows
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      image: your-windows-app-image
      command: ["powershell", "-Command", "Start-Sleep -Seconds 300"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why it’s helpful:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Preserves app data integrity
&lt;/li&gt;
&lt;li&gt;✅ Simple to test with &lt;code&gt;kubectl delete pod&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ Essential for apps with shutdown routines&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  💾 4. Change Block Tracking (CBT) – Alpha
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;You maintain large databases or file systems. Full-volume snapshots are too slow and consume unnecessary storage.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cbt-pvc
  annotations:
    snapshot.storage.kubernetes.io/change-block-tracking: "true"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: csi-cbt-enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Add a special annotation to your PVC to enable CBT.&lt;/li&gt;
&lt;li&gt;Ensure your CSI driver supports CBT (e.g., &lt;code&gt;csi-cbt-enabled&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Snapshots capture only changed blocks, not the whole volume.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why it matters:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Faster incremental backups
&lt;/li&gt;
&lt;li&gt;✅ Reduced snapshot size
&lt;/li&gt;
&lt;li&gt;✅ Improves disaster recovery speed&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚙️ 5. Pod-Level Resource Limits
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;You’re running multiple containers inside a single pod (e.g., app + sidecar in a CI pipeline). Individual container limits are too rigid.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s new:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: resource-shared-pod
spec:
  containers:
    - name: container-a
      image: your-app-image
    - name: container-b
      image: your-app-image
  resources:
    limits:
      cpu: "2"
      memory: "4Gi"
    requests:
      cpu: "1"
      memory: "2Gi"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Set &lt;strong&gt;resource limits at the pod level&lt;/strong&gt;, not just per container.&lt;/li&gt;
&lt;li&gt;Containers can &lt;strong&gt;share total CPU/memory quotas&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why it’s great:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ More efficient resource sharing
&lt;/li&gt;
&lt;li&gt;✅ Great for CI/CD, proxies, and log sidecars
&lt;/li&gt;
&lt;li&gt;✅ Reduces over-provisioning and increases node density&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🔍 6. Enhanced Observability with &lt;code&gt;/statusz&lt;/code&gt; and &lt;code&gt;/flagz&lt;/code&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case:
&lt;/h3&gt;

&lt;p&gt;DevOps and SRE teams can now monitor component health and configuration more efficiently. These endpoints make it easier to audit settings, detect misconfigurations, and ensure runtime consistency during upgrades or debugging.&lt;/p&gt;

&lt;p&gt;🔍 /statusz&lt;br&gt;
Reports the health status of the component.&lt;br&gt;
Example output: ok if the component is functioning properly.&lt;/p&gt;

&lt;p&gt;⚙️ /flagz&lt;br&gt;
Lists runtime flags and configuration values for the component.&lt;br&gt;
Helps verify the active settings on running nodes or control-plane components.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enable the &lt;code&gt;ComponentStatusz&lt;/code&gt; and &lt;code&gt;ComponentFlagz&lt;/code&gt; feature gates.&lt;/li&gt;
&lt;li&gt;Query the built-in endpoints on each component’s HTTPS serving port, e.g. &lt;code&gt;kubectl get --raw /statusz&lt;/code&gt; against the API server.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Final Thoughts&lt;/p&gt;

&lt;p&gt;Kubernetes 1.32 isn’t just a list of features—it’s a set of solutions to common challenges faced by teams managing complex workloads.&lt;br&gt;
Whether you’re focused on AI/ML efficiency, storage hygiene, Windows reliability, or control-plane observability, this release has something valuable for you.&lt;/p&gt;

&lt;p&gt;👉 I’ve created a GitHub repo with all YAML examples for these use cases:&lt;br&gt;
 🔗 &lt;a href="https://lnkd.in/emkKCxuY" rel="noopener noreferrer"&gt;https://lnkd.in/emkKCxuY&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me know which feature you're most excited to try—or if you’re already using it in production!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>k8</category>
      <category>sre</category>
    </item>
    <item>
      <title>Building Smarter Local AI Agents with MCP: A Simple Client-Server Example</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Tue, 29 Apr 2025 13:35:47 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/building-smarter-local-ai-agents-with-mcp-a-simple-client-server-example-4lfm</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/building-smarter-local-ai-agents-with-mcp-a-simple-client-server-example-4lfm</guid>
      <description>&lt;p&gt;In today's AI landscape, enabling a &lt;strong&gt;Local LLM&lt;/strong&gt; (like &lt;strong&gt;Llama3 via Ollama&lt;/strong&gt;) to understand user intent and &lt;strong&gt;dynamically call Python functions&lt;/strong&gt; is a critical capability.&lt;/p&gt;

&lt;p&gt;The foundation of this interaction is &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this blog, I'll show you a &lt;strong&gt;simple working example&lt;/strong&gt; of an &lt;strong&gt;MCP Client&lt;/strong&gt; and &lt;strong&gt;MCP Server&lt;/strong&gt; communicating locally using &lt;strong&gt;pure &lt;code&gt;stdio&lt;/code&gt;&lt;/strong&gt; — no networking needed!&lt;/p&gt;




&lt;h2&gt;
  
  
  🔹 How It Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ✅ MCP Server
&lt;/h3&gt;

&lt;p&gt;The MCP Server acts as a &lt;strong&gt;toolbox&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exposes Python functions (&lt;code&gt;add&lt;/code&gt;, &lt;code&gt;multiply&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;Waits silently for tool-execution requests&lt;/li&gt;
&lt;li&gt;Executes the requested function and returns the result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server.fastmcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastMCP&lt;/span&gt;

&lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastMCP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calculator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

&lt;span class="nd"&gt;@mcp.tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;stdio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Server &lt;strong&gt;registers tools&lt;/strong&gt; and &lt;strong&gt;waits&lt;/strong&gt; for client requests!&lt;/p&gt;




&lt;h3&gt;
  
  
  ✅ MCP Client
&lt;/h3&gt;

&lt;p&gt;The MCP Client is the &lt;strong&gt;messenger&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lists available tools&lt;/li&gt;
&lt;li&gt;Sends tools to the LLM&lt;/li&gt;
&lt;li&gt;Forwards tool call requests to the Server&lt;/li&gt;
&lt;li&gt;Collects and returns results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Client is like a translator and dispatcher — handling everything between the model and the tools.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ClientSession&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.client.stdio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;stdio_client&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.client.server_parameters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StdioServerParameters&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;server_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StdioServerParameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;math_server.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;stdio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;stdio_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server_params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;stdio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initialize&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Available tools:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Result of add(5,8):&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Client manages the conversation and tool execution.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔹 Communication: stdio
&lt;/h2&gt;

&lt;p&gt;Instead of HTTP or network APIs, the Client and Server communicate &lt;strong&gt;directly over &lt;code&gt;stdin&lt;/code&gt;/&lt;code&gt;stdout&lt;/code&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Client ➡️ Server: Requests like &lt;code&gt;list_tools&lt;/code&gt; and &lt;code&gt;call_tool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Server ➡️ Client: Replies with tools and results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ Fast, lightweight, private communication&lt;br&gt;
✅ Perfect for local LLM setups&lt;/p&gt;




&lt;h2&gt;
  
  
  📈 Visual Flow
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felrj904jg9ydlofywwdl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felrj904jg9ydlofywwdl.png" alt="Diagram of stdio protocol flow in AI model interactions showing input/output streams" width="800" height="498"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Pattern Matters
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;No API servers, no networking complexity&lt;/li&gt;
&lt;li&gt;Fast, local, secure communication&lt;/li&gt;
&lt;li&gt;Easily extendable: add new tools, no need to rebuild the architecture&lt;/li&gt;
&lt;li&gt;Foundation for building smart autonomous agents with local LLMs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can easily extend this pattern to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add more Python tools&lt;/li&gt;
&lt;li&gt;Connect Streamlit or FastAPI frontends&lt;/li&gt;
&lt;li&gt;Dockerize the full stack&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📚 Full GitHub Project
&lt;/h2&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/rajeevchandra/mcp-client-server-example" rel="noopener noreferrer"&gt;https://github.com/rajeevchandra/mcp-client-server-example&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✅ MCP Server &amp;amp; MCP Client Code&lt;br&gt;
✅ Local LLM setup with Ollama&lt;br&gt;
✅ Full README + Diagrams&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Final Thought
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Smarter agents don’t know everything — they know how to use the right tool at the right time."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Would love to hear your thoughts if you check it out or build something on top of it! 🚀&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>llm</category>
      <category>ai</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Introducing Postgres MCP Server: Query Your Database in Plain English with AI</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Wed, 23 Apr 2025 02:58:15 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/introducing-postgres-mcp-server-query-your-database-in-plain-english-with-ai-l0o</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/introducing-postgres-mcp-server-query-your-database-in-plain-english-with-ai-l0o</guid>
      <description>&lt;p&gt;Have you ever wished you could just ask your database a question, without writing SQL?&lt;/p&gt;

&lt;p&gt;"Show me the average salary by department."&lt;br&gt;
"List employees in New York earning over $80K."&lt;br&gt;
"Plot monthly sales trends."&lt;/p&gt;

&lt;p&gt;What if you could get these answers instantly, without writing a single SQL query?&lt;/p&gt;

&lt;p&gt;That’s exactly why I built Postgres MCP Server—an open-source AI SQL dashboard that translates natural language into safe, optimized PostgreSQL queries.&lt;/p&gt;

&lt;p&gt;✅ What Postgres MCP Server Can Do&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧠 Natural Language to SQL: Converts human questions into valid SQL queries using LLaMA 3 via Ollama.&lt;/li&gt;
&lt;li&gt;📊 Statistical Data Analysis: Computes summary stats, correlation matrices, and aggregates on your data automatically.&lt;/li&gt;
&lt;li&gt;📅 Time Series &amp;amp; Charts: Detects date fields and visualizes trends using line/bar charts.&lt;/li&gt;
&lt;li&gt;💬 Prompt-Based Filtering: Understands queries like “employees in NY earning over 80K” and applies them as SQL filters.&lt;/li&gt;
&lt;li&gt;📎 MCP-Compliant API Server: Exposes &lt;code&gt;sql://query&lt;/code&gt; and &lt;code&gt;table://list&lt;/code&gt; tools via the Model Context Protocol for LLM and agent compatibility.&lt;/li&gt;
&lt;li&gt;📦 Streamlit Dashboard: Clean, reactive UI to browse data, input prompts, see SQL, and export CSV.&lt;/li&gt;
&lt;li&gt;🔐 Safe Read-Only Queries: Executes only non-destructive SQL with validation; protects your source database.&lt;/li&gt;
&lt;li&gt;🧱 Dockerized Setup: Entire app runs locally using Docker Compose — PostgreSQL, Streamlit, MCP server, Ollama.&lt;/li&gt;
&lt;li&gt;💬 LLM Agent-Ready: Compatible with Claude, GPT, LangChain, or AutoGen frameworks via MCP schema.&lt;/li&gt;
&lt;/ul&gt;
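&lt;p&gt;The read-only guarantee can start with validating generated SQL before it ever reaches PostgreSQL. A minimal sketch of such a check (hypothetical validator, not the repo’s actual implementation):&lt;/p&gt;

```python
import re

# Reject any statement containing a destructive keyword.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b",
    re.IGNORECASE,
)

def is_safe_query(sql: str) -> bool:
    """Allow only a single SELECT statement; reject anything destructive."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # no statement stacking
        return False
    if not stripped.lower().startswith("select"):
        return False
    return not FORBIDDEN.search(stripped)

print(is_safe_query("SELECT avg(salary) FROM employees GROUP BY department"))  # True
print(is_safe_query("DROP TABLE employees"))                                   # False
print(is_safe_query("SELECT 1; DELETE FROM employees"))                        # False
```

A keyword filter like this is deliberately conservative (it would also reject a column literally named &lt;code&gt;update&lt;/code&gt;); pairing it with a read-only database role gives defense in depth.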

&lt;p&gt;Why MCP? (Model Context Protocol)&lt;br&gt;
Most AI agents rely on hardcoded APIs or brittle prompts—but MCP changes that. It’s an open protocol that lets LLMs discover and use tools dynamically.&lt;/p&gt;

&lt;p&gt;MCP enables:&lt;br&gt;
✅ Self-documenting APIs (LLMs understand what your server can do)&lt;br&gt;
✅ Agent-friendly tool discovery (no rigid integrations)&lt;br&gt;
✅ Flexible schema definitions (describe tables, queries, and operations in a model-readable way)&lt;/p&gt;

&lt;p&gt;Instead of writing custom prompts for every agent, MCP lets your LLM automatically understand how to query your database.&lt;/p&gt;

&lt;p&gt;🧪 Sample Prompts&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Show total number of employees"&lt;/li&gt;
&lt;li&gt;"List departments with avg salary &amp;gt; 80K"&lt;/li&gt;
&lt;li&gt;"Number of employees in each location"&lt;/li&gt;
&lt;li&gt;"Plot salary trends over time"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The server translates these into SQL, executes them securely, and returns results in the UI.&lt;/p&gt;
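&lt;p&gt;A minimal sketch of such a read-only guard (illustrative only; the repo's actual validation may be stricter):&lt;/p&gt;

```python
import re

# Illustrative read-only guard; the project's real validation may differ.
READ_ONLY_PREFIXES = ("select", "with", "explain")
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b", re.IGNORECASE
)

def is_safe_query(sql: str) -> bool:
    """Accept only a single, non-destructive SQL statement."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:  # reject stacked statements like "SELECT 1; DROP ..."
        return False
    if not stmt.lower().startswith(READ_ONLY_PREFIXES):
        return False
    return FORBIDDEN.search(stmt) is None

print(is_safe_query("SELECT name FROM employees WHERE salary > 80000"))  # True
print(is_safe_query("DROP TABLE employees"))                             # False
```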

&lt;p&gt;🚀 Run It Locally in 3 Steps&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/rajeevchandra/mcp-ollama-postgres  
cd mcp-ollama-postgres  
docker-compose up --build  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Streamlit UI → &lt;a href="http://localhost:8501" rel="noopener noreferrer"&gt;http://localhost:8501&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MCP Server → &lt;a href="http://localhost:3333" rel="noopener noreferrer"&gt;http://localhost:3333&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note: Requires Ollama with &lt;code&gt;llama3&lt;/code&gt; pulled (&lt;code&gt;ollama pull llama3&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Try it out, star the repo, and let me know what you think!&lt;br&gt;
GitHub - &lt;a href="https://github.com/rajeevchandra/mcp-ollama-postgres" rel="noopener noreferrer"&gt;https://github.com/rajeevchandra/mcp-ollama-postgres&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would love your feedback—what features would make this even more useful for you? 🚀&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>docker</category>
      <category>machinelearning</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building a Local AI Agent with Ollama + MCP + LangChain + Docker</title>
      <dc:creator>RajeevaChandra</dc:creator>
      <pubDate>Mon, 21 Apr 2025 03:55:02 +0000</pubDate>
      <link>https://forem.com/rajeev_3ce9f280cbae73b234/building-a-local-ai-agent-with-ollama-mcp-docker-37a</link>
      <guid>https://forem.com/rajeev_3ce9f280cbae73b234/building-a-local-ai-agent-with-ollama-mcp-docker-37a</guid>
      <description>&lt;h2&gt;
  
  
  🧠 Empowering Local AI with Tools: Ollama + MCP + Docker
&lt;/h2&gt;

&lt;p&gt;Have you ever wanted to run a local AI agent that does more than just chat? What if it could list and summarize files on your machine — using just natural language?&lt;/p&gt;

&lt;p&gt;In this post, you'll build a &lt;strong&gt;fully offline AI agent&lt;/strong&gt; using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; to run local LLMs like &lt;code&gt;qwen2:7b&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://smith.langchain.com/mcp" rel="noopener noreferrer"&gt;LangChain MCP&lt;/a&gt; for tool usage&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://fastapi.tiangolo.com/" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt; to build a tool server&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://streamlit.io/" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt; for an optional frontend&lt;/li&gt;
&lt;li&gt;Docker + Docker Compose to glue it all together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why Use MCP (Model Context Protocol)?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MCP allows models (like those running in LangChain) to dynamically discover tools at runtime via a RESTful API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unlike traditional hardcoded tool integrations, MCP makes it declarative and modular.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can plug and unplug tools without changing model logic.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;“Think of MCP as a universal remote for your LLM tools.”&lt;/p&gt;
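&lt;p&gt;The plug-and-unplug idea can be sketched with a tiny tool registry (hypothetical names, not the actual LangChain MCP adapter API):&lt;/p&gt;

```python
# Minimal sketch of runtime tool discovery; the registry and names here
# are hypothetical, not the actual LangChain MCP adapter API.
import os
from typing import Callable

TOOLS: dict[str, dict] = {}

def register_tool(name: str, description: str):
    """Decorator that adds a function to the discoverable registry."""
    def wrap(fn: Callable) -> Callable:
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return wrap

@register_tool("list_files", "List text files in a directory")
def list_files(path: str) -> list[str]:
    return sorted(f for f in os.listdir(path) if f.endswith(".txt"))

# An agent enumerates TOOLS at runtime; tools can be added or removed
# without touching the model logic.
print(list(TOOLS))  # ['list_files']
```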

&lt;p&gt;🧠 Why Ollama?&lt;br&gt;
Ollama makes it dead-simple to run LLMs locally with one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama run qwen2:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get full data privacy, no usage limits, and fully offline AI.&lt;/p&gt;

&lt;p&gt;🤖 Why qwen2:7b?&lt;/p&gt;

&lt;p&gt;Qwen2 is a strong open-source model from Alibaba that excels at reasoning and tool usage.&lt;/p&gt;

&lt;p&gt;It works well for agents, summaries, and structured-thinking tasks.&lt;/p&gt;

&lt;p&gt;You could also swap in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mistral:7b (lighter weight)&lt;/li&gt;
&lt;li&gt;llama3:8b (strong general-purpose)&lt;/li&gt;
&lt;li&gt;phi3 (fast and capable in low-RAM setups)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚀 What You'll Build
&lt;/h2&gt;

&lt;p&gt;A tool-using AI agent that:&lt;/p&gt;

&lt;p&gt;✅ Lists text files in a local folder&lt;br&gt;
✅ Summarizes any selected file&lt;br&gt;
✅ Runs 100% locally — no API keys, no cloud&lt;/p&gt;

&lt;p&gt;This is perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Private AI assistants&lt;/li&gt;
&lt;li&gt;Offline development&lt;/li&gt;
&lt;li&gt;Custom workflows with local data&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  🧩 Architecture
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[🧠 Ollama (LLM)]
        ↓
[🔗 LangChain Agent w/ MCP Tool Access]
        ↓
[🛠️ FastAPI MCP Server]
        ↓
[📁 Local Filesystem]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;🛠️ 1) Prerequisites&lt;/p&gt;

&lt;p&gt;Install Ollama and run a model like qwen2:7b:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama run qwen2:7b

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📦 2) Clone and Run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/rajeevchandra/mcp-ollama-file-agent
cd mcp-ollama-file-agent

docker-compose up --build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ This starts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;mcp-server: FastAPI tool server &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9bwhn88o8by5qxln1hz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd9bwhn88o8by5qxln1hz.png" alt="Screenshot" width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;streamlit: UI &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6ikes3v63xaurvmb63t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6ikes3v63xaurvmb63t.png" alt="Screenshot" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agent-runner: LangChain agent using Ollama + MCP tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;🔄 Example Interaction&lt;br&gt;
Prompt:&lt;/p&gt;

&lt;p&gt;“List files in ./docs and summarize the first one”&lt;/p&gt;

&lt;p&gt;Flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent selects the list_files tool&lt;/li&gt;
&lt;li&gt;Gets the list of files&lt;/li&gt;
&lt;li&gt;Picks the first file and calls read_and_summarize&lt;/li&gt;
&lt;li&gt;Uses the model to generate a summary&lt;/li&gt;
&lt;/ul&gt;
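&lt;p&gt;The same flow can be mimicked with stub tools (a minimal sketch; in the real project the LangChain agent and the Ollama model handle tool selection and summarization):&lt;/p&gt;

```python
# Stubbed end-to-end flow; the real agent uses LangChain with Ollama for
# both tool selection and summarization, so the logic here is illustrative.
import os
import tempfile

def list_files(path: str) -> list[str]:
    return sorted(f for f in os.listdir(path) if f.endswith(".txt"))

def read_and_summarize(path: str) -> str:
    with open(path) as fh:
        text = fh.read()
    # Stand-in for an LLM summary: truncate instead of summarizing.
    return text if len(text) <= 60 else text[:60] + "..."

def run_agent(prompt: str, docs_dir: str) -> str:
    # 1) The agent selects list_files for a "list ... summarize" request.
    files = list_files(docs_dir)
    # 2) It takes the first file and calls read_and_summarize on it.
    return read_and_summarize(os.path.join(docs_dir, files[0]))

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "notes.txt"), "w") as fh:
        fh.write("MCP lets local agents use tools safely.")
    print(run_agent("List files in ./docs and summarize the first one", d))
```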

&lt;h2&gt;
  
  
  🧠 Conclusion
&lt;/h2&gt;

&lt;p&gt;With just a few tools — &lt;strong&gt;Ollama&lt;/strong&gt;, &lt;strong&gt;MCP&lt;/strong&gt;, and &lt;strong&gt;LangChain&lt;/strong&gt; — you’ve built a local AI agent that goes beyond chatting: it actually &lt;strong&gt;uses tools&lt;/strong&gt;, interacts with your &lt;strong&gt;filesystem&lt;/strong&gt;, and provides real utility — &lt;strong&gt;all offline&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This project demonstrates how easy it is to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Combine &lt;strong&gt;LLM reasoning&lt;/strong&gt; with &lt;strong&gt;real-world actions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Keep your data &lt;strong&gt;private and local&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Extend AI capabilities with &lt;strong&gt;modular tools&lt;/strong&gt; via MCP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As the AI landscape shifts toward more customizable, self-hosted, and privacy-first solutions, this architecture offers a &lt;strong&gt;powerful and flexible blueprint&lt;/strong&gt; for future agents — whether you're automating internal workflows, building developer assistants, or experimenting with multi-agent systems.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;💬 &lt;strong&gt;If you found this useful or inspiring&lt;/strong&gt;, ⭐️ star the repo, fork it with your own tools, or share what you build in the comments. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/rajeevchandra/mcp-ollama-file-agent" rel="noopener noreferrer"&gt;GitHub Repo →&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
