<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rushank Savant</title>
    <description>The latest articles on Forem by Rushank Savant (@rushanksavant).</description>
    <link>https://forem.com/rushanksavant</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F849972%2F166490e0-ca17-4eae-9854-5801fb8c39b4.PNG</url>
      <title>Forem: Rushank Savant</title>
      <link>https://forem.com/rushanksavant</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rushanksavant"/>
    <language>en</language>
    <item>
      <title>Auto-Merging RAG: Hierarchical Retrieval ⛓️</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Mon, 11 May 2026 17:28:12 +0000</pubDate>
      <link>https://forem.com/rushanksavant/auto-merging-rag-hierarchical-retrieval-4dp1</link>
      <guid>https://forem.com/rushanksavant/auto-merging-rag-hierarchical-retrieval-4dp1</guid>
      <description>&lt;h2&gt;
  
  
  🚨 The Problem: Context Fragmentation
&lt;/h2&gt;

&lt;p&gt;Imagine a 50-page legal contract. If you chunk it into tiny 200-character pieces, one chunk might say: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The liability is capped at $1M."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Another chunk might say: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"However, this cap does not apply to gross negligence."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If your retriever only finds the first chunk, your AI will give a dangerously wrong answer because it lacks the &lt;strong&gt;Parent Context&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧩 The Solution: Hierarchical Parenting
&lt;/h2&gt;

&lt;p&gt;Auto-merging retrieval organizes data into a tree structure. &lt;br&gt;
You store small &lt;strong&gt;Child Chunks&lt;/strong&gt; (for high-precision searching) that are linked to larger &lt;strong&gt;Parent Chunks&lt;/strong&gt; (for broad context).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;You index the document at multiple levels (e.g., small, medium, and large chunks).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;During retrieval, if the system finds that &lt;strong&gt;multiple child&lt;/strong&gt; chunks belonging to the same &lt;strong&gt;parent&lt;/strong&gt; have been retrieved, it "merges" them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Instead of sending the fragmented children to the LLM, it sends the entire &lt;strong&gt;Parent Chunk&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
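
&lt;p&gt;The merge decision in steps 2 and 3 can be sketched in plain Python. This is a minimal, self-contained illustration with made-up parent IDs and texts (the &lt;code&gt;merge_hits&lt;/code&gt; helper and its inputs are hypothetical, not part of any library); the full LangChain retriever later in this post applies the same counting logic to real vector-search hits:&lt;/p&gt;

```python
from collections import Counter

def merge_hits(child_hits, parents, threshold=3):
    """Collapse child hits into their parent when enough siblings agree.

    child_hits: list of (parent_id, text) tuples, ranked by similarity.
    parents:    dict mapping parent_id to the full parent text.
    """
    counts = Counter(pid for pid, _ in child_hits)  # hits per parent
    results, seen = [], set()
    for pid, text in child_hits:
        if counts[pid] >= threshold:
            # Enough siblings matched: "zoom out" and emit the parent once.
            if pid not in seen:
                results.append(parents[pid])
                seen.add(pid)
        else:
            # Lone hit: keep the precise child snippet.
            results.append(text)
    return results

hits = [("p1", "dosage row"), ("p1", "warning"), ("p1", "contra"), ("p2", "stray")]
parents = {"p1": "FULL SECTION 1.2", "p2": "FULL SECTION 2.1"}
print(merge_hits(hits, parents))  # ['FULL SECTION 1.2', 'stray']
```

&lt;p&gt;Three of the four hits share parent &lt;code&gt;p1&lt;/code&gt;, so the whole parent section is returned in their place, while the lone &lt;code&gt;p2&lt;/code&gt; hit stays as a small snippet.&lt;/p&gt;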


&lt;h2&gt;
  
  
  ⚕️ A Realistic Example: Medical Protocol Analysis
&lt;/h2&gt;

&lt;p&gt;Imagine a hospital system with a 200-page "Cardiology Emergency Protocol."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The User Query:&lt;/strong&gt; "What is the dosage for Epinephrine during a cardiac arrest for a patient with a history of hypertension?"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; The dosage is listed in a small table, but the contraindications (the "history of hypertension" part) are in the preceding paragraphs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Hierarchical Approach:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Child Chunks: Individual sentences or table rows.&lt;/li&gt;
&lt;li&gt;Parent Chunks: The entire "Cardiac Arrest Sub-section" (2-3 pages).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Auto-Merge:&lt;/strong&gt; If the retriever hits the "Epinephrine dosage" row AND the "Hypertension warning" sentence, the system realizes they are both in the same sub-section and sends the entire protocol section to the LLM. This ensures the LLM sees the &lt;em&gt;full picture&lt;/em&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  🛠️ Practical Implementation with LangChain
&lt;/h2&gt;

&lt;p&gt;The following is an implementation of a one-level tree, i.e., a parent and its children (no sub-children).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init_chat_model&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_huggingface&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceEndpointEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.stores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InMemoryStore&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_text_splitters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.documents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.retrievers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseRetriever&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.callbacks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CallbackManagerForRetrieverRun&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="c1"&gt;# --- STEP 1: ULTRA-REALISTIC MEDICAL DATA ---
&lt;/span&gt;&lt;span class="n"&gt;medical_manual&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
CARDIOVASCULAR PROTOCOL - VERSION 2026
SECTION 1.1: ACUTE MYOCARDIAL INFARCTION (AMI)
Diagnosis: Patient presents with retrosternal chest pressure, radiation to left arm, and diaphoresis. 
ECG Requirements: 12-lead ECG must be performed within 10 minutes of arrival. Look for ST-segment elevation &amp;gt;1mm.
Immediate Treatment: Oxygen saturation maintenance &amp;gt;94%. Aspirin 325mg (chewed). Nitroglycerin 0.4mg sublingual every 5 mins.
Contraindications: Do not use Nitroglycerin if SBP &amp;lt; 90mmHg or if patient has taken PDE5 inhibitors in 24h.

SECTION 1.2: ADULT CARDIAC ARREST (VF/pVT)
Protocol: Initiate high-quality CPR. Attach defibrillator. Shock at 200J (Biphasic).
Drug Therapy: Epinephrine 1mg IV/IO every 3-5 minutes. Amiodarone 300mg IV/IO bolus after 3rd shock.
Post-Resuscitation: If ROSC is achieved, initiate Targeted Temperature Management (32°C-36°C).
Warning: Excessive ventilation (over 10 breaths/min) decreases cardiac output and survival rates.

SECTION 2.1: HYPERTENSIVE CRISIS
Definition: SBP &amp;gt;180 mmHg or DBP &amp;gt;120 mmHg. 
Hypertensive Urgency: No end-organ damage. Treat with oral Labetalol 200mg.
Hypertensive Emergency: Evidence of end-organ damage (Stroke, Encephalopathy). 
Emergency Treatment: Labetalol IV 20mg bolus or Nicardipine IV infusion 5mg/h. 
Goal: Reduce Mean Arterial Pressure (MAP) by no more than 25% in the first hour to prevent cerebral ischemia.

SECTION 3.1: ANAPHYLAXIS EMERGENCY
Symptoms: Urticaria, angioedema, stridor, wheezing, or hypotension following allergen exposure.
Primary Treatment: Epinephrine 0.3mg (1:1000) IM in the lateral thigh. Repeat every 5-15 mins if no improvement.
Secondary Treatment: Diphenhydramine 25-50mg IV. Methylprednisolone 125mg IV.
Observation: Monitor for biphasic reactions for at least 4-6 hours post-symptom resolution.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;


&lt;span class="c1"&gt;# --- STEP 2: HIERARCHICAL SPLITTING ---
&lt;/span&gt;&lt;span class="n"&gt;parent_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;child_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create parent and child docs with metadata links
&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;medical_manual&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="n"&gt;parent_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;all_child_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;docstore_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;parent_docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# one-level parent-child tree
&lt;/span&gt;    &lt;span class="n"&gt;parent_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;docstore_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;parent_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent&lt;/span&gt;

    &lt;span class="c1"&gt;# Split each parent into children
&lt;/span&gt;    &lt;span class="n"&gt;children&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;child_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;child&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;children&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parent_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parent_id&lt;/span&gt;
        &lt;span class="n"&gt;all_child_docs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# --- STEP 3: STORAGE ---
&lt;/span&gt;&lt;span class="n"&gt;embed_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEndpointEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;## this model returns 384 sized vector
&lt;/span&gt;    &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feature-extraction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_child_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embed_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;docstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemoryStore&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;docstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docstore_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt; &lt;span class="c1"&gt;## mset stands for Multiple Set, a high-performance way to save a batch of key-value pairs into storage at once.
&lt;/span&gt;

&lt;span class="c1"&gt;# --- STEP 4: THE CUSTOM AUTO-MERGING RETRIEVER ---
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AutoMergingRetriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseRetriever&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;
    &lt;span class="n"&gt;docstore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InMemoryStore&lt;/span&gt;
    &lt;span class="n"&gt;merge_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="c1"&gt;# If 3+ children found, return Parent
&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_relevant_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                                &lt;span class="n"&gt;run_manager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CallbackManagerForRetrieverRun&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="c1"&gt;# 1. Fetch top K small children
&lt;/span&gt;        &lt;span class="n"&gt;initial_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# 2. Track which parents are represented and how many times
&lt;/span&gt;        &lt;span class="n"&gt;parent_id_map&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parent_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;initial_hits&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_id_map&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;final_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;processed_parents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;initial_hits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;p_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parent_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# 3. MERGE LOGIC: If parent is frequent, "Zoom Out"
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_id&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;merge_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;processed_parents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;parent_doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;docstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mget&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;p_id&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;parent_doc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;final_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parent_doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="n"&gt;processed_parents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# 4. If not frequent enough, keep the precise child snippet
&lt;/span&gt;            &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;p_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;processed_parents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;final_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_results&lt;/span&gt;


&lt;span class="c1"&gt;# --- STEP 5: TEST RUN ---
&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AutoMergingRetriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;docstore&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;docstore&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Ask a query that touches multiple parts of one section
&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the IV dosage for Labetalol and what is the target MAP reduction for high blood pressure?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retrieved &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; merged document(s).&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SOURCE: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;parent_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Child Node&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONTENT: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Print snippet
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⚖️ When to Use vs. When to Avoid
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;✅ Use it when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The document structure matters:&lt;/strong&gt; Legal contracts, technical manuals, or textbooks where a paragraph only makes sense within its chapter.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dense Information:&lt;/strong&gt; When specific facts (like numbers/dates) are scattered but related to a single theme.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High-Accuracy Needs:&lt;/strong&gt; When "half an answer" is worse than no answer (e.g., Medical or Compliance).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;❌ Avoid it when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Simple FAQ datasets:&lt;/strong&gt; If your data is just short, independent Q&amp;amp;A pairs, hierarchy adds needless complexity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Low Latency is King:&lt;/strong&gt; Retrieving and merging larger blocks of text takes more time and uses more LLM tokens.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unstructured "Messy" Data:&lt;/strong&gt; If your documents are random bullet points with no logical flow, a parent chunk might just be "noise".&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯 Summary: Pros and Cons
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;🌟 Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Complete Context: Eliminates the "fragmentation" problem where LLMs miss surrounding warnings.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;High Precision: Searching is still performed on small chunks, so specific facts remain easy to find.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cleaner Logic: Allows the LLM to "read" like a human (sections/chapters) rather than "reading" snippets.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🔴 Cons:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Token Cost: You end up sending more text to the LLM, which increases your API bill.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Complexity: Harder to debug and requires managing two storage layers (Vector + Docstore).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Latency: Merging and fetching parent documents adds a few milliseconds to the RAG loop.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>langchain</category>
      <category>python</category>
    </item>
    <item>
      <title>Beyond Keywords: Mastering HyDE for Smarter Retrieval 🧠</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Sun, 10 May 2026 17:18:01 +0000</pubDate>
      <link>https://forem.com/rushanksavant/beyond-keywords-mastering-hyde-for-smarter-retrieval-3p5c</link>
      <guid>https://forem.com/rushanksavant/beyond-keywords-mastering-hyde-for-smarter-retrieval-3p5c</guid>
      <description>&lt;p&gt;If you’ve ever built a &lt;strong&gt;RAG&lt;/strong&gt; system, you’ve likely felt the frustration of the "Mismatch Problem". You ask a perfectly reasonable question, but it returns completely irrelevant documents.&lt;/p&gt;

&lt;p&gt;Why? Because your retrieval method searches based on your &lt;strong&gt;question's language&lt;/strong&gt;, while your documents are written in the language of answers. In the vector space, these two often don't look alike. &lt;br&gt;
&lt;strong&gt;E.g.:&lt;/strong&gt; users asking casual questions against technical documentation (like a company's legal policies).&lt;/p&gt;

&lt;p&gt;Today, we’re going to master &lt;strong&gt;HyDE (Hypothetical Document Embedding)&lt;/strong&gt;—a technique that flips the script by &lt;u&gt;"hallucinating"&lt;/u&gt; the answer before it even touches your database.&lt;/p&gt;




&lt;h2&gt;
  
  
  📝 What is HyDE?
&lt;/h2&gt;

&lt;p&gt;Instead of taking a user's question and searching for it directly, HyDE follows a three-step dance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Hallucination:&lt;/strong&gt; It asks an LLM to write a "fake" or &lt;strong&gt;hypothetical&lt;/strong&gt; answer to the user's question in document-friendly language (using few-shot prompting).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Embedding:&lt;/strong&gt; It converts that "fake" answer into a vector.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Retrieval:&lt;/strong&gt; It searches your database for &lt;strong&gt;real&lt;/strong&gt; documents that look like that fake answer.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
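
&lt;p&gt;The three-step dance can be sketched as a single function. This is a minimal sketch, not a library API: &lt;code&gt;llm&lt;/code&gt;, &lt;code&gt;embed&lt;/code&gt;, and &lt;code&gt;vector_search&lt;/code&gt; are assumed stand-ins for your chat model, embedding model, and vector store, and the toy implementations below exist only to make the flow runnable:&lt;/p&gt;

```python
def hyde_retrieve(question, llm, embed, vector_search, k=4):
    """HyDE: answer first, search second.

    llm(prompt)         - returns a hypothetical answer string
    embed(text)         - returns an embedding vector
    vector_search(v, k) - returns the k nearest real documents
    """
    # 1. The Hallucination: draft a fake, document-style answer.
    fake_answer = llm(f"Write a short passage that answers: {question}")
    # 2. The Embedding: vectorize the fake answer, not the question.
    query_vector = embed(fake_answer)
    # 3. The Retrieval: find real documents near the fake one.
    return vector_search(query_vector, k)

# Toy stand-ins so the flow runs end to end:
docs = {"change of control clause": [1.0, 0.0], "vacation policy": [0.0, 1.0]}
fake_llm = lambda prompt: "change of control clause"
fake_embed = lambda text: docs.get(text, [0.5, 0.5])

def fake_search(v, k):
    # Rank stored docs by squared distance to the query vector.
    dist = lambda item: sum((a - b) ** 2 for a, b in zip(v, item[1]))
    return [name for name, _ in sorted(docs.items(), key=dist)][:k]

print(hyde_retrieve("what if a rival buys the vendor?",
                    fake_llm, fake_embed, fake_search, k=1))
# -> ['change of control clause']
```

&lt;p&gt;The point of the sketch: the user's question never reaches the vector store directly; only the hypothetical answer does.&lt;/p&gt;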




&lt;h2&gt;
  
  
  🧩 The Problem It Solves: Asymmetric Retrieval
&lt;/h2&gt;

&lt;p&gt;In standard search, we assume &lt;code&gt;vector(Question) ≈ vector(Answer)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But in reality, questions are short, curious, and often informal. &lt;br&gt;
Answers are long, factual, and professional. &lt;br&gt;
This is &lt;strong&gt;Asymmetric Retrieval&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;HyDE turns an Asymmetric problem into a Symmetric one by making the query "look" like the data it’s trying to find.&lt;/p&gt;




&lt;h2&gt;
  
  
  📍 A Real-World Example: The Legal "Needle"
&lt;/h2&gt;

&lt;p&gt;Imagine you are building a RAG for a law firm. A junior associate asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"What happens if a rival company takes over the vendor?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; The 5,000-page contract in your database doesn't use the word &lt;u&gt;"rival"&lt;/u&gt; or &lt;u&gt;"takes over"&lt;/u&gt;. It uses professional jargon like &lt;u&gt;"Change of Control Event"&lt;/u&gt;.&lt;/p&gt;

&lt;p&gt;A standard search might fail because these two vectors aren't close enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  💡 The HyDE Solution:
&lt;/h3&gt;

&lt;p&gt;The LLM generates a "fake" clause: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"In the event of a Change of Control to a Restricted Entity, the Successor shall..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The system searches for that text.&lt;/p&gt;

&lt;p&gt;It immediately finds the correct legal page because the "fake" answer and the "real" document speak the &lt;strong&gt;same language&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ Practical Implementation (The "Professional" Way)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chat_models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;init_chat_model&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_huggingface&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HuggingFaceEndpointEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;
&lt;span class="c1"&gt;# from langchain_classic.chains import HypotheticalDocumentEmbedder ## Not widely used, since custom functions give more flexibility
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FewShotPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.documents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Prompt prep: STYLE EXAMPLES (The "Linguistic DNA")
&lt;/span&gt;&lt;span class="n"&gt;examples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rival acquisition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;In the event of a Change of Control to a Restricted Entity...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sharing info with others&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Confidential Information shall not be disclosed to any third-party without prior written indemnification...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;example_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;input_variables&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User: {question}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Legal Style: {answer}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;hyde_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FewShotPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c1"&gt;## Class provided by langchain for few-shot prompting
&lt;/span&gt;    &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;example_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;example_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a Legal Architect. Translate the query into formal contractual prose.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User: {question}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Legal Style:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input_variables&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# 2. MODELS &amp;amp; HYDE EMBEDDER
&lt;/span&gt;&lt;span class="n"&gt;llm_groq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;init_chat_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai/gpt-oss-120b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;groq&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;base_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HuggingFaceEndpointEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentence-transformers/all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;## this model returns 384 sized vector
&lt;/span&gt;    &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;feature-extraction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# 3. The DUMMY Documents
# We add diverse sections to ensure the retriever can distinguish between them.
&lt;/span&gt;&lt;span class="n"&gt;legal_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="c1"&gt;# THE TARGET: Change of Control
&lt;/span&gt;    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Section 45.1: Control Events. A &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Change of Control&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; occurs if a Restricted Entity acquires 51% of voting shares.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
             &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Corp_Bylaws.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Governance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="c1"&gt;# THE NOISE: Liability &amp;amp; Indemnity
&lt;/span&gt;    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Section 10.2: Limitation of Liability. Neither party shall be liable for indirect, incidental, or consequential damages.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
             &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MSA_Main.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Liability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="c1"&gt;# THE NOISE: IP Rights
&lt;/span&gt;    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Section 5.4: Intellectual Property. All Work Product created during the Term shall be deemed &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Work Made for Hire&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; and owned by the Client.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
             &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MSA_Main.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;IP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="c1"&gt;# THE NOISE: Termination
&lt;/span&gt;    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Section 12.0: Termination for Convenience. Either party may terminate this agreement upon 90 days prior written notice.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
             &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Vendor_Agmt.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Term&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="c1"&gt;# THE NOISE: Confidentiality
&lt;/span&gt;    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Section 3.1: Non-Disclosure. The Receiver shall protect Confidential Information using the same degree of care as its own proprietary data.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
             &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NDA_Standard.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Privacy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="c1"&gt;# THE TARGET: Assignment/Successors
&lt;/span&gt;    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Section 8.9: Successors and Assigns. This Agreement shall be binding upon and inure to the benefit of the Parties and their respective permitted successors.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
             &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Corp_Bylaws.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;section&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;General&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})]&lt;/span&gt;


&lt;span class="c1"&gt;# 4. INITIALIZE VECTOR STORE
&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;legal_docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                            &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="n"&gt;collection_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production_legal_vault&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 


&lt;span class="c1"&gt;# 5. TEST THE RETRIEVAL
# Note: The query is vague and uses 0 keywords from the docs.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hyde_retrieval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="c1"&gt;# 1. Generate the Hypothetical Document (The "Fake" Answer)
&lt;/span&gt;    &lt;span class="n"&gt;formatted_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hyde_prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;hypothetical_doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm_groq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;formatted_prompt&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--- HYPOTHETICAL DOC GENERATED ---&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;hypothetical_doc&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Embed the "Fake" Doc and search the "Real" DB
&lt;/span&gt;    &lt;span class="c1"&gt;# We use base_embeddings here so the 'math' matches the stored data
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hypothetical_doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;

&lt;span class="n"&gt;user_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What about sensitive or important data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s protection?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;final_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hyde_retrieval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--- FINAL REAL DOCUMENT FOUND ---&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Source: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;final_docs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Actual Text: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;final_docs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;page_content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⌚ When to Use HyDE (and When to Skip It)
&lt;/h2&gt;

&lt;h4&gt;
  
  
  ✅ Use it when:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Queries are vague or short:&lt;/strong&gt; If users type "refund" and your docs say "reimbursement protocols," HyDE will bridge that gap.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Terminology Mismatch:&lt;/strong&gt; Your users are "laymen" and your docs are "experts" (Medical, Legal, Engineering).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High-Stakes Accuracy:&lt;/strong&gt; When finding the right page is more important than saving a few pennies on API costs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  ❌ Skip it when:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Factual/Number Lookups:&lt;/strong&gt; If a user asks "What was the revenue in 2023?", the LLM might hallucinate a fake number in the hypothetical doc, leading the search to the wrong year.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency is Critical:&lt;/strong&gt; HyDE requires an extra LLM call, which adds 1–2 seconds of "thinking time."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tight Budgets:&lt;/strong&gt; Every search now costs extra LLM tokens.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎯 Summary: Pros &amp;amp; Cons
&lt;/h2&gt;

&lt;h4&gt;
  
  
  👍 Pros:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Superior Context:&lt;/strong&gt; Maps informal intent to formal data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero Keyword Dependence:&lt;/strong&gt; You don't need exact word matches.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalable:&lt;/strong&gt; Works across thousands of pages without manual tagging.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  👎 Cons:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency:&lt;/strong&gt; Adds an extra step to the search process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hallucination Risk:&lt;/strong&gt; A "too-fake" answer can derail the search.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Increased token usage per query.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Happy coding! Have you tried HyDE in your projects? Let’s discuss in the comments below! 👇&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Day 14: Deployment &amp; LangSmith</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Sat, 09 May 2026 18:53:01 +0000</pubDate>
      <link>https://forem.com/rushanksavant/day-14-deployment-langsmith-7k8</link>
      <guid>https://forem.com/rushanksavant/day-14-deployment-langsmith-7k8</guid>
      <description>&lt;p&gt;When your LangGraph agent runs, a lot happens under the hood. If it gives a wrong answer, you need to know: Was it a bad retrieval? Did the tool fail? Or did the LLM just misinterpret the data?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangSmith&lt;/strong&gt; allows you to trace every single step.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tracing:&lt;/strong&gt; See the exact prompt sent and the raw JSON returned.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Latency:&lt;/strong&gt; Find out which node is slowing down your app.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; Track exactly how many tokens that "loop" consumed.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pro Tip:&lt;/strong&gt; Simply setting two environment variables in your &lt;code&gt;.env&lt;/code&gt; file enables tracing automatically. No code changes required!&lt;/p&gt;
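&lt;p&gt;For reference, these are the variables the LangSmith docs describe (the project name is optional and simply groups your traces):&lt;/p&gt;

```shell
# .env — enables LangSmith tracing for any LangChain/LangGraph run
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-api-key   # placeholder: your real key
LANGCHAIN_PROJECT=my-agent-dev             # optional: groups traces by project
```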




&lt;h2&gt;
  
  
  🌐 2. LangServe: Turning Code into an API
&lt;/h2&gt;

&lt;p&gt;You can't give your users a Python script. You need a &lt;strong&gt;REST API&lt;/strong&gt;.&lt;br&gt;
&lt;strong&gt;LangServe&lt;/strong&gt; helps you deploy your chains and graphs as a professional web service using FastAPI. &lt;br&gt;
It even gives you a built-in "Playground" UI to test your API in the browser.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langserve&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;add_routes&lt;/span&gt;
&lt;span class="c1"&gt;# Import your 'graph' from Day 13
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;my_agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt; 

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My AI Agent API&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# This creates /invoke, /stream, and /batch endpoints automatically!
&lt;/span&gt;&lt;span class="nf"&gt;add_routes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uvicorn&lt;/span&gt;
    &lt;span class="n"&gt;uvicorn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ✅ 3. The Production Checklist
&lt;/h2&gt;

&lt;p&gt;Before you hit "Deploy," ask yourself these three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Does it have a System Prompt?&lt;/strong&gt; Ensure your agent has clear "guardrails" (e.g., "Do not answer political questions").&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Is the Memory capped?&lt;/strong&gt; Use a windowed or summarizing memory (e.g. LangChain's &lt;code&gt;ConversationBufferWindowMemory&lt;/code&gt; or &lt;code&gt;ConversationSummaryMemory&lt;/code&gt;) so your database doesn't explode.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Are the Tools safe?&lt;/strong&gt; If your tool can delete data, make sure there is a "human-in-the-loop" check.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
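&lt;p&gt;Item 3 can be sketched without any framework at all. Here &lt;code&gt;guarded&lt;/code&gt; and &lt;code&gt;confirm&lt;/code&gt; are hypothetical names of my own: &lt;code&gt;confirm&lt;/code&gt; is a placeholder for however you surface approval (a CLI prompt, a Slack message, a UI button):&lt;/p&gt;

```python
# Toy "human-in-the-loop" gate for destructive tools.
def guarded(tool, confirm):
    """Wrap a dangerous tool so it only runs after human approval."""
    def wrapper(*args, **kwargs):
        if not confirm(f"Allow {tool.__name__}{args}?"):
            return "Action rejected by human reviewer."
        return tool(*args, **kwargs)
    return wrapper

def delete_record(record_id: str) -> str:
    # Stands in for a real destructive operation (DB delete, file removal...)
    return f"Deleted {record_id}"

# Auto-deny for the demo; a real confirm would block on human input.
safe_delete = guarded(delete_record, confirm=lambda msg: False)
```

The agent calls `safe_delete` instead of `delete_record`, so nothing destructive happens without sign-off.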




&lt;h2&gt;
  
  
  🎓 Graduation: Where to go from here?
&lt;/h2&gt;

&lt;p&gt;Two weeks ago, LangChain was a mystery. Now, it’s a tool in your belt. The field of AI moves fast, but the fundamentals you learned—&lt;strong&gt;Prompts, Models, Parsers, RAG, and Graphs&lt;/strong&gt;—are the pillars of the industry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Final Challenge:&lt;/strong&gt; Take everything you’ve built over the last 14 days and combine it. Build a "Personal Study Assistant" that can read your local PDFs (RAG), search the web (Tools), and remember your name (Memory).&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 Day 14 Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LangSmith:&lt;/strong&gt; For debugging and tracing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LangServe:&lt;/strong&gt; For deploying as an API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Evaluations:&lt;/strong&gt; Testing your agent against "Golden Sets" of data to ensure it stays smart.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It has been an incredible journey. Keep building, keep breaking things, and most importantly, keep sharing your progress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The future is agentic. And you’re ready for it.&lt;/strong&gt; ☕&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>langsmith</category>
      <category>langserver</category>
    </item>
    <item>
      <title>Small-to-Big RAG: Your AI Needs a Better Context 🧠</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Sat, 09 May 2026 09:55:18 +0000</pubDate>
      <link>https://forem.com/rushanksavant/small-to-big-rag-your-ai-needs-a-better-context-4cp1</link>
      <guid>https://forem.com/rushanksavant/small-to-big-rag-your-ai-needs-a-better-context-4cp1</guid>
      <description>&lt;p&gt;If your text chunks are too small, the AI misses the context. If they are too big, the search becomes "blurry" and inaccurate. To solve this, advanced developers use &lt;strong&gt;Small-to-Big Retrieval&lt;/strong&gt;. Two popular flavors are &lt;strong&gt;Sentence Window&lt;/strong&gt; and &lt;strong&gt;Parent Document Retrieval&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is the breakdown of how they work and which one you should choose.&lt;/p&gt;




&lt;h2&gt;
  
  
  🤝 The Shared Secret: "Search Small, Read Big"
&lt;/h2&gt;

&lt;p&gt;Both techniques follow one rule: Search using a tiny, precise snippet, but give the LLM a large, context-rich block of text to read. It’s like searching a library index for a "keyword" but then pulling the whole "book" off the shelf to get the full story.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔍 1. Sentence Window Retrieval: The "Magnifying Glass"
&lt;/h2&gt;

&lt;p&gt;Imagine you are reading a novel. To understand a specific line of dialogue, you usually just need to know what happened a few seconds before and after.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; You break your data into individual sentences. When the AI finds a relevant sentence, it automatically grabs the 3–5 sentences immediately surrounding it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Vibe:&lt;/strong&gt; Linear and local.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Narrative text, chat transcripts, or articles where ideas flow sentence-by-sentence.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
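The mechanics above can be sketched in a few lines of plain Python. This is a deliberately naive stand-in (no LangChain, no embeddings): the regex splitter and the word-overlap score are placeholders for a real sentence splitter and a real vector search, but the "find one sentence, return its neighbors" shape is the same.

```python
import re

def build_sentence_index(text: str) -> list[str]:
    # Naive sentence splitter; a real pipeline would use an NLP library.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentence_window_retrieve(sentences: list[str], query: str, window: int = 1) -> str:
    # Stand-in for vector search: pick the sentence sharing the most words with the query.
    def score(s: str) -> int:
        return len(set(s.lower().split()) & set(query.lower().split()))
    best = max(range(len(sentences)), key=lambda i: score(sentences[i]))
    # Return the matched sentence plus `window` sentences on each side.
    lo, hi = max(0, best - window), min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])

doc = ("Alice entered the lab. She flipped the red switch. "
       "The alarm stopped instantly. Everyone relaxed.")
sents = build_sentence_index(doc)
print(sentence_window_retrieve(sents, "red switch", window=1))
```

The search only ever "sees" one sentence, but the LLM receives the sentence plus its neighbors, so "the alarm stopped" arrives with the line explaining why.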




&lt;h2&gt;
  
  
  🗺️ 2. Parent Document Retrieval: The "Map"
&lt;/h2&gt;

&lt;p&gt;Imagine a Technical Manual or a Legal Contract. A single sentence like "Tighten the bolt" is useless if the safety warning is at the top of the page. You don't just need the "neighboring sentences"; you need the &lt;strong&gt;whole section&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt; You create a hierarchy. You have &lt;strong&gt;Parent chunks&lt;/strong&gt; (like a full page) and &lt;strong&gt;Child chunks&lt;/strong&gt; (small paragraphs inside that page). The AI searches the "Children" but returns the "Parent" to the LLM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Vibe:&lt;/strong&gt; Structural and organized.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; PDFs, manuals, financial reports, and legal docs where sections are logically grouped.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
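Here is a minimal illustration of the parent/child bookkeeping, with plain Python dicts standing in for the vector store and the document store (LangChain's `ParentDocumentRetriever` automates exactly this pattern; the keyword-overlap "search" below is just a toy substitute for embeddings):

```python
import uuid

# Doc store: parent_id -> full section text (the "Parent")
doc_store: dict[str, str] = {}
# Vector-store stand-in: (child_chunk, parent_id) pairs
child_index: list[tuple[str, str]] = []

def ingest(section: str, child_size: int = 40) -> None:
    parent_id = str(uuid.uuid4())
    doc_store[parent_id] = section
    # Split the parent into small child chunks, each pointing back to its parent.
    for i in range(0, len(section), child_size):
        child_index.append((section[i:i + child_size], parent_id))

def retrieve(query: str) -> str:
    # Search the small children, but return the big parent.
    def score(chunk: str) -> int:
        return len(set(chunk.lower().split()) & set(query.lower().split()))
    best_chunk, parent_id = max(child_index, key=lambda pair: score(pair[0]))
    return doc_store[parent_id]

ingest("SAFETY: disconnect power first. Step 3: tighten the bolt to 12 Nm.")
ingest("Appendix B covers warranty claims and return shipping labels.")
print(retrieve("tighten the bolt"))
```

The query matches the tiny "tighten the bolt" child, but the LLM gets the whole parent section, safety warning included.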




&lt;h2&gt;
  
  
  Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Sentence Window&lt;/th&gt;
&lt;th&gt;Parent Document&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Show me what’s around this."&lt;/td&gt;
&lt;td&gt;"Show me the section this belongs to."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Structure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flat/Linear&lt;/td&gt;
&lt;td&gt;Hierarchical (Big &amp;amp; Small)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Surrounding sentences are stored in each chunk's metadata.&lt;/td&gt;
&lt;td&gt;Parents are stored in a separate database.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best Use Case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Books, Emails, Conversations.&lt;/td&gt;
&lt;td&gt;Technical Specs, Legal, Wiki pages.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 Summary
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Choose Sentence Window&lt;/strong&gt; if your data is "unstructured" and the context is always right next to the answer. It’s easier to set up and works great for simple Q&amp;amp;A.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Parent Document&lt;/strong&gt; if you are building an Enterprise-grade tool. It is more "stable" because it respects document boundaries (like chapters or headers), ensuring the LLM never gets a half-finished thought from a different page.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>langchain</category>
      <category>python</category>
    </item>
    <item>
      <title>Day 13: The Self-Correcting Agent — Building Loops with LangGraph 🔄</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Fri, 08 May 2026 17:39:23 +0000</pubDate>
      <link>https://forem.com/rushanksavant/day-13-the-self-correcting-agent-building-loops-with-langgraph-l4i</link>
      <guid>https://forem.com/rushanksavant/day-13-the-self-correcting-agent-building-loops-with-langgraph-l4i</guid>
      <description>&lt;p&gt;Yesterday, we learned that &lt;strong&gt;LangGraph&lt;/strong&gt; is all about nodes and edges. Today, we’re putting that into practice by building a &lt;strong&gt;Stateful Agent&lt;/strong&gt; that can actually "think," use a tool, and then decide if it needs to do more.&lt;/p&gt;

&lt;p&gt;In the old way (Chains), if a tool returned an error, the whole run simply failed. In the LangGraph way, we create a loop where the agent sees the error and tries a different approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 The Agentic Loop: Thought → Action → Observation
&lt;/h2&gt;

&lt;p&gt;To build this, we need a graph that doesn't just go in a straight line. We need it to circle back.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Agent Node:&lt;/strong&gt; The LLM decides what to do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools Node:&lt;/strong&gt; If the LLM wants to use a tool, this node executes it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Loop:&lt;/strong&gt; The result of the tool goes back to the Agent so it can "observe" the result and decide if it's done.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🛠️ Coding the Loop
&lt;/h2&gt;

&lt;p&gt;We'll use the &lt;code&gt;ToolNode&lt;/code&gt;—a pre-built helper from LangGraph—to make this easy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph.message&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools_condition&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.tools.tavily_search&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TavilySearchResults&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define the State (Our notebook)
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;State&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# add_messages tells LangGraph to append new messages to history
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Setup Tools and Model
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TavilySearchResults&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;bind_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Define the Nodes
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chatbot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])]}&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Build the Graph
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;State&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chatbot&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ToolNode automatically executes tool calls found in the messages
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ToolNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Define the Logic (The "Brain")
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Conditional Edge: If the model called a tool, go to 'tools', 
# otherwise, go to 'END'.
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_conditional_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools_condition&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Loop back! After using tools, go back to the chatbot to see if it's done
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⚡ Why &lt;code&gt;tools_condition&lt;/code&gt; is Magic
&lt;/h2&gt;

&lt;p&gt;This is a built-in "traffic controller." It looks at the last message from the AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;If it contains a &lt;strong&gt;tool_call&lt;/strong&gt;, it sends the flow to the &lt;code&gt;tools&lt;/code&gt; node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If it is a &lt;strong&gt;regular string&lt;/strong&gt;, it realizes the AI is finished and sends it to &lt;code&gt;END&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what lets the loop terminate cleanly: as soon as the AI replies without requesting a tool, the graph exits instead of cycling forever.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎯 Day 13 Summary
&lt;/h2&gt;

&lt;p&gt;Today, you built a truly autonomous agent. You learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cyclic Graphs:&lt;/strong&gt; How to point an edge back to a previous node.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ToolNode:&lt;/strong&gt; Automating the execution of AI-requested actions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conditional Routing:&lt;/strong&gt; Letting the graph decide its own path.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Your Homework:&lt;/strong&gt; Imagine the search tool returns "No results found." Because of the loop we built today, the AI can see that and try a different search query automatically. Try to prompt your agent to find something obscure and watch it work!&lt;/p&gt;

&lt;p&gt;See you tomorrow! ☕&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langgraph</category>
      <category>python</category>
      <category>agents</category>
    </item>
    <item>
      <title>Day 12: Enter LangGraph — Moving from Chains to Cyclic Graphs 🕸️</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Thu, 07 May 2026 17:55:32 +0000</pubDate>
      <link>https://forem.com/rushanksavant/day-12-enter-langgraph-moving-from-chains-to-cyclic-graphs-5ano</link>
      <guid>https://forem.com/rushanksavant/day-12-enter-langgraph-moving-from-chains-to-cyclic-graphs-5ano</guid>
      <description>&lt;p&gt;Today, we leave the world of linear "Chains" and enter the most powerful evolution of the LangChain ecosystem: &lt;strong&gt;LangGraph&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you've been following along, you've noticed that Chains always go forward: &lt;strong&gt;A -&amp;gt; B -&amp;gt; C&lt;/strong&gt;. But real intelligence requires &lt;strong&gt;loops&lt;/strong&gt;. Think about how you work: you write code, you run it, it fails, so you go back and fix it. That's a cycle. LangGraph is designed to let AI agents do exactly that.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔄 Why LangGraph?
&lt;/h2&gt;

&lt;p&gt;Standard LangChain "Chains" are Directed Acyclic Graphs (DAGs). They can't loop.&lt;br&gt;
&lt;strong&gt;LangGraph&lt;/strong&gt; allows for &lt;strong&gt;cycles&lt;/strong&gt;, which are essential for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Self-Correction:&lt;/strong&gt; "I tried to search Google, but got no results. I'll try a different keyword."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-Agent Collaboration:&lt;/strong&gt; "Agent A writes the code, Agent B reviews it and sends it back for edits."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Persistence:&lt;/strong&gt; Saving the state of a conversation so you can pause and resume it days later.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  🏗️ The Core Concepts
&lt;/h2&gt;

&lt;p&gt;To build with LangGraph, you need to understand three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State:&lt;/strong&gt; A shared "notebook" (usually a Python dictionary) that all parts of your agent can read and write to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nodes:&lt;/strong&gt; Simple Python functions that take the current &lt;code&gt;State&lt;/code&gt;, do some work, and return an update.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edges:&lt;/strong&gt; The "roads" between nodes. They decide which node to go to next.&lt;/p&gt;


&lt;h2&gt;
  
  
  🛠️ Building a Basic "Smart Assistant" Graph
&lt;/h2&gt;

&lt;p&gt;Let's build a graph where the AI decides whether to use a tool or just reply.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Union&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define the State
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# This stores our message history
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Define a Node (The Brain)
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Build the Graph
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add our node
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define the flow
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Compile it
&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Run it!
&lt;/span&gt;&lt;span class="n"&gt;input_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain LangGraph in 10 words.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]}&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  ⚡ The "Conditional" Edge: The Secret Sauce
&lt;/h2&gt;

&lt;p&gt;The real power comes when you add a &lt;strong&gt;Conditional Edge&lt;/strong&gt;. This is a function that looks at the AI's response and says:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;"If the AI wants to use a tool → go to the &lt;strong&gt;Tools Node&lt;/strong&gt;."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;"If the AI is finished → go to &lt;strong&gt;END&lt;/strong&gt;."&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates the "Loop" that makes agents truly autonomous.&lt;/p&gt;
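In code, a conditional edge is just a function that inspects the state and returns the name of the next node. A hand-rolled version, in plain Python with a toy message class standing in for LangChain's `AIMessage` (no LangGraph required to see the idea):

```python
from dataclasses import dataclass, field

@dataclass
class AIMessage:
    content: str
    tool_calls: list = field(default_factory=list)  # empty list => the AI is done

def route_after_agent(state: dict) -> str:
    """The 'conditional edge': pick the next node based on the last message."""
    last = state["messages"][-1]
    return "tools" if last.tool_calls else "END"

# AI requested a tool -> loop onward to the tools node
print(route_after_agent({"messages": [AIMessage("", [{"name": "search"}])]}))
# AI gave a plain answer -> finish
print(route_after_agent({"messages": [AIMessage("All done!")]}))
```

In real LangGraph you would register a function like this with `workflow.add_conditional_edges("agent", ...)` (returning the `END` constant rather than a string); the prebuilt `tools_condition` helper performs essentially this same check for you.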




&lt;h2&gt;
  
  
  🎯 Day 12 Summary
&lt;/h2&gt;

&lt;p&gt;Today, you caught a glimpse of the "Brain" of modern AI agents. You learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Chains vs. Graphs:&lt;/strong&gt; Why cycles are necessary for complex tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;State Management:&lt;/strong&gt; How agents keep track of their "thoughts."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Nodes &amp;amp; Edges:&lt;/strong&gt; The building blocks of an agentic workflow.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Your Homework:&lt;/strong&gt; Look at the code above. How would you add a "Review" node that checks the AI's answer for typos before sending it to the user?&lt;/p&gt;

&lt;p&gt;See you tomorrow! ☕&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langgraph</category>
      <category>langchain</category>
      <category>python</category>
    </item>
    <item>
      <title>Why MCP is the "USB-C" of AI Tools</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Wed, 06 May 2026 17:57:37 +0000</pubDate>
      <link>https://forem.com/rushanksavant/why-mcp-is-the-usb-c-of-ai-tools-2gm3</link>
      <guid>https://forem.com/rushanksavant/why-mcp-is-the-usb-c-of-ai-tools-2gm3</guid>
      <description>&lt;p&gt;If you’ve been building with LangChain or OpenAI Functions, you’re used to defining tools as simple lists: &lt;code&gt;tools = [get_weather, send_email]&lt;/code&gt;. It works great for a weekend project, but what happens when your application grows?&lt;/p&gt;

&lt;p&gt;What if you want your tools to work in Claude Desktop, a custom Python script, and a TypeScript dashboard all at once? What if you need to update a tool's logic without redeploying your entire AI agent?&lt;/p&gt;

&lt;p&gt;That is where the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; comes in. Think of it as the USB-C of the AI world—a universal standard that lets any AI "host" talk to any "tool" server.&lt;/p&gt;

&lt;p&gt;Here is a simple breakdown of when you should move past simple tools and embrace MCP.&lt;/p&gt;




&lt;h3&gt;
  
  
  🔌 1. The "USB-C" Effect (Standardization)
&lt;/h3&gt;

&lt;p&gt;In a standard setup, your tool is often locked into a specific library (like LangChain). With MCP, your tool lives on a server. Because it follows a universal protocol, the same server can provide tools to a LangChain agent, a Claude Desktop instance, and a custom-built robot simultaneously.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔄 2. Dynamic Tool Discovery
&lt;/h3&gt;

&lt;p&gt;Normally, you must pass a fixed list of tools to your agent. If you want to add a new feature, you have to change the code and reboot the app.&lt;br&gt;
With MCP, the host queries the server for its current tool list at runtime. If you add a new tool to the server at 2:00 PM, the agent can discover and use it at 2:01 PM without you touching a single line of code in the host application.&lt;/p&gt;
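To make the contrast concrete, here is a toy sketch of the two styles in plain Python. The registry dict stands in for an MCP server's tool-listing endpoint; no real MCP SDK is used, and the function names are illustrative only:

```python
# Style 1: hard-coded — the host bakes in its tool list at startup.
STATIC_TOOLS = ["get_weather", "send_email"]

# Style 2: MCP-like — the "server" owns the registry; the host asks at call time.
server_registry = {"get_weather": lambda city: f"Sunny in {city}"}

def list_tools() -> list[str]:
    # Stand-in for an MCP tool-listing request made by the host.
    return sorted(server_registry)

print(list_tools())  # the host sees one tool

# Later, the server side registers a new tool — no host restart, no host code change.
server_registry["get_stock_price"] = lambda ticker: f"{ticker}: $100"

print(list_tools())  # the host now sees two tools
```

With the static list, adding `get_stock_price` means editing and redeploying the host. With the server-owned registry, the host picks it up on its next query.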

&lt;h3&gt;
  
  
  ✂️ 3. Decoupled Logic and Updates
&lt;/h3&gt;

&lt;p&gt;When your tool logic lives inside the MCP server:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Host Independence:&lt;/strong&gt; If you fix a bug in the tool's math or update an API key, the host application doesn't need to be touched.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Language Agnostic:&lt;/strong&gt; You can write a heavy data-processing tool in Rust or Go for performance, while keeping your AI logic in Python.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔒 4. Security and Stability
&lt;/h3&gt;

&lt;p&gt;MCP acts as a protective layer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Firewall:&lt;/strong&gt; You can deploy tools to a remote server and wrap them in a firewall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Blast Radius:&lt;/strong&gt; If a tool has a vulnerability or crashes, it won't take down your main AI application. They are running in separate environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  📚 5. Sharing More Than Just Functions
&lt;/h3&gt;

&lt;p&gt;Standard tools are usually just "functions" (do X, get Y). MCP allows you to share:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt; Live log files, database schemas, or documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompts:&lt;/strong&gt; Pre-written instruction templates that the server provides to the host.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚖️ The One-Line Rule
&lt;/h2&gt;

&lt;p&gt;Use &lt;strong&gt;MCP&lt;/strong&gt; when building an &lt;strong&gt;"Ecosystem"&lt;/strong&gt; that needs to scale, stay secure, and remain flexible.&lt;br&gt;
Use &lt;strong&gt;Standard&lt;/strong&gt; Tools when building a &lt;strong&gt;"Prototype"&lt;/strong&gt; where speed of development is your only priority.&lt;/p&gt;




&lt;h2&gt;
  
  
  💼 Real-World Scenarios
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;✅ Use MCP for: "The Enterprise Data Hub"&lt;/strong&gt;&lt;br&gt;
Imagine building an AI for a bank. The AI needs to check balances, pull credit scores, and generate PDFs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Why:&lt;/strong&gt; You want the "Credit Score" team to manage their own tool server. If they change their logic, the AI keeps working. You also need a strict security barrier between the AI and the sensitive financial databases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;❌ Skip MCP for: "The Local PDF Summarizer"&lt;/strong&gt;&lt;br&gt;
Imagine a simple script that reads 5 PDFs on your laptop and extracts names using a regex function.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Why:&lt;/strong&gt; Setting up a Client-Server architecture for a single function is massive overkill. A standard &lt;code&gt;@tool&lt;/code&gt; takes two seconds to write and requires zero infrastructure.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📝 Summary
&lt;/h2&gt;

&lt;p&gt;MCP moves us away from "hard-coding" AI capabilities and toward a world where tools are plug-and-play. If you are planning for the future of your app, start thinking in servers, not just lists.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>mcp</category>
      <category>agents</category>
    </item>
    <item>
      <title>Day 11: Conversational RAG — How to Chat with Your Documents 💬</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Wed, 06 May 2026 15:11:58 +0000</pubDate>
      <link>https://forem.com/rushanksavant/day-11-conversational-rag-how-to-chat-with-your-documents-43nm</link>
      <guid>https://forem.com/rushanksavant/day-11-conversational-rag-how-to-chat-with-your-documents-43nm</guid>
      <description>&lt;p&gt;Yesterday, we built a RAG chain that could answer a single question. But if you followed up with "Can you explain that further?", the AI would get confused. Why? Because it didn't have contextual history.&lt;/p&gt;

&lt;p&gt;Today, we solve the hardest part of RAG: &lt;strong&gt;Conversational Memory&lt;/strong&gt;. We'll teach the AI to understand that "it" or "that" refers to things mentioned earlier in the chat.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ The Problem: The "Query Re-writing" Challenge
&lt;/h2&gt;

&lt;p&gt;If you ask:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"How does LangChain work?"&lt;/li&gt;
&lt;li&gt;"Can you give me an example of it?"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The retriever doesn't know what "it" is. It will literally search your database for the word "it," which is useless.&lt;/p&gt;

&lt;p&gt;To fix this, we add a step called &lt;strong&gt;History-Aware Retrieval&lt;/strong&gt;. The AI takes your follow-up question and the chat history, then "re-writes" it into a standalone question that the retriever can understand.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Step 1: Contextualizing the Question
&lt;/h2&gt;

&lt;p&gt;We create a sub-chain that looks at the history and the new question to produce a "search-friendly" query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_history_aware_retriever&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MessagesPlaceholder&lt;/span&gt;

&lt;span class="c1"&gt;# The prompt that tells the AI to re-write the question if history exists
&lt;/span&gt;&lt;span class="n"&gt;contextualize_q_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Given a chat history and the latest user question &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;which might reference context in the chat history, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formulate a standalone question which can be understood &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;without the chat history. Do NOT answer the question.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;contextualize_q_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contextualize_q_system_prompt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;MessagesPlaceholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Wrap your existing retriever (from Day 9)
&lt;/span&gt;&lt;span class="n"&gt;history_aware_retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_history_aware_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contextualize_q_prompt&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🛠️ Step 2: The Full Conversational Chain
&lt;/h2&gt;

&lt;p&gt;Now, we plug this into our document chain to create the final "Conversational RAG" flow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_retrieval_chain&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains.combine_documents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_stuff_documents_chain&lt;/span&gt;

&lt;span class="c1"&gt;# Standard Q&amp;amp;A prompt
&lt;/span&gt;&lt;span class="n"&gt;qa_system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an assistant for question-answering tasks. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use the following pieces of retrieved context to answer the question.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{context}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;qa_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_messages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qa_system_prompt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;MessagesPlaceholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{input}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;question_answer_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_stuff_documents_chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qa_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# The final chain!
&lt;/span&gt;&lt;span class="n"&gt;rag_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_retrieval_chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history_aware_retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question_answer_chain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🚀 Testing it Out
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AIMessage&lt;/span&gt;

&lt;span class="n"&gt;chat_history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="c1"&gt;# First Interaction
&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is LangSmith?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Update History
&lt;/span&gt;&lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;AIMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Follow-up (The AI now knows 'it' refers to LangSmith!)
&lt;/span&gt;&lt;span class="n"&gt;second_question&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How do I get started with it?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;second_question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;chat_history&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🎯 Day 11 Summary
&lt;/h2&gt;

&lt;p&gt;Today, you bridged the final gap in RAG. You learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Contextualization:&lt;/strong&gt; Why "it" and "this" break standard retrievers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Query Re-writing:&lt;/strong&gt; Using an LLM to make search queries smarter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- &lt;code&gt;create_history_aware_retriever&lt;/code&gt;:&lt;/strong&gt; The specific LangChain tool for this job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Homework:&lt;/strong&gt; Try running the chain without updating the &lt;code&gt;chat_history&lt;/code&gt; list. Notice how the second answer becomes generic or fails—this proves how vital history is!&lt;/p&gt;

&lt;p&gt;See you tomorrow! ☕&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>rag</category>
      <category>python</category>
    </item>
    <item>
      <title>Day 10: The Full RAG Chain — From Library to Answers 🔗</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Tue, 05 May 2026 17:59:24 +0000</pubDate>
      <link>https://forem.com/rushanksavant/day-10-the-full-rag-chain-from-library-to-answers-3m9d</link>
      <guid>https://forem.com/rushanksavant/day-10-the-full-rag-chain-from-library-to-answers-3m9d</guid>
      <description>&lt;p&gt;Yesterday, we built the "Library" (Vector Store). Today, we’re going to build the "Librarian."&lt;/p&gt;

&lt;p&gt;A Librarian doesn't just point you to a shelf; they go get the right book, read the relevant page, and explain it to you. In LangChain, we do this by connecting our &lt;strong&gt;Retriever&lt;/strong&gt; to our &lt;strong&gt;LLM&lt;/strong&gt; using a &lt;strong&gt;Retrieval Chain&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ The 2-Step Architecture
&lt;/h2&gt;

&lt;p&gt;To make our AI answer questions based on our data, we need to link two distinct parts:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Retrieval Step:&lt;/strong&gt; Finding the most relevant chunks from our Vector Database.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Generation Step:&lt;/strong&gt; Feeding those chunks into the LLM as "Context" so it can craft an answer.&lt;/p&gt;
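&lt;p&gt;The two steps can be sketched end to end with stub functions (a toy keyword overlap stands in for the vector search, and an f-string stands in for the LLM call; all names here are illustrative):&lt;/p&gt;

```python
# Step 1 stand-in: score documents by keyword overlap with the question
# (a real pipeline would use vector similarity instead).
def retrieve(question, documents, k=2):
    words = set(question.lower().split())
    def overlap(doc):
        return len(words.intersection(doc.lower().split()))
    return sorted(documents, key=overlap, reverse=True)[:k]

# Step 2 stand-in: a real pipeline sends this assembled prompt to the LLM.
def generate(question, context):
    return f"Prompt for LLM -- question: {question!r}, context: {context}"

docs = ["LangSmith traces LLM calls.", "Chroma stores vectors.", "Paris is in France."]
top_chunks = retrieve("What does LangSmith do?", docs)
print(generate("What does LangSmith do?", top_chunks))
```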




&lt;h2&gt;
  
  
  🛠️ Step 1: The "Stuffing" Chain
&lt;/h2&gt;

&lt;p&gt;First, we need a way to tell the AI: "Here is a bunch of text (the context). Use it to answer this specific question." In LangChain, this is handled by &lt;code&gt;create_stuff_documents_chain&lt;/code&gt;, so named because it "stuffs" all retrieved documents into the prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains.combine_documents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_stuff_documents_chain&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# Define the "Instructional" prompt
&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s question based ONLY on the provided context. 
If you don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know the answer, say you don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know.

Context: {context}
Question: {input}
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# This chain knows HOW to format the documents into the prompt
&lt;/span&gt;&lt;span class="n"&gt;document_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_stuff_documents_chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🛠️ Step 2: The Final Retrieval Chain
&lt;/h2&gt;

&lt;p&gt;Now, we link the "Librarian" (the document chain) to the "Library" (the retriever we built yesterday).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.chains&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_retrieval_chain&lt;/span&gt;

&lt;span class="c1"&gt;# 'retriever' is the object we created in Day 9
&lt;/span&gt;&lt;span class="n"&gt;retrieval_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_retrieval_chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document_chain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Let's ask it something!
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retrieval_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What are the main features of LangSmith?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧐 Why This is Better Than a Search Engine
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;- Fewer Hallucinations:&lt;/strong&gt; Because we told the AI "Answer ONLY based on context," it is far less likely to make things up.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Privacy:&lt;/strong&gt; Your documents are never used to train the model; they are only sent to it as temporary context during the query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Source Transparency:&lt;/strong&gt; The &lt;code&gt;response&lt;/code&gt; object actually contains a &lt;code&gt;context&lt;/code&gt; key that shows you exactly which snippets of text the AI used to find the answer.&lt;/p&gt;
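&lt;p&gt;For illustration, the response behaves like a dictionary along these lines (the entries in &lt;code&gt;context&lt;/code&gt; are really LangChain Document objects; plain dicts and hand-written sample data are used here so the snippet runs standalone):&lt;/p&gt;

```python
# Illustrative shape of the retrieval chain's output (hand-written
# sample data; real "context" entries are Document objects).
response = {
    "input": "What are the main features of LangSmith?",
    "answer": "LangSmith offers tracing and evaluation.",
    "context": [
        {"page_content": "LangSmith traces every LLM call...",
         "source": "https://docs.smith.langchain.com/user_guide"},
    ],
}

# List which snippets the AI actually used to ground its answer.
for i, doc in enumerate(response["context"], start=1):
    print(f"Snippet {i} from {doc['source']}:")
    print(doc["page_content"])
```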




&lt;h2&gt;
  
  
  🎯 Day 10 Summary
&lt;/h2&gt;

&lt;p&gt;Today, you built a production-ready AI feature! You learned:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Document Chains:&lt;/strong&gt; How to format multiple text chunks for an LLM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Retrieval Chains:&lt;/strong&gt; How to automate the flow from "Question" to "Answer."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Groundedness:&lt;/strong&gt; How to prevent AI hallucinations using context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Homework:&lt;/strong&gt; Look at the &lt;code&gt;response["context"]&lt;/code&gt; metadata. Can you see which specific part of your document the AI relied on the most?&lt;/p&gt;

&lt;p&gt;See you tomorrow! ☕&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>rag</category>
      <category>python</category>
    </item>
    <item>
      <title>The Rise of the Machine Identity</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Mon, 04 May 2026 15:54:00 +0000</pubDate>
      <link>https://forem.com/rushanksavant/the-rise-of-the-machine-identity-4de7</link>
      <guid>https://forem.com/rushanksavant/the-rise-of-the-machine-identity-4de7</guid>
      <description>&lt;h2&gt;
  
  
  The Autonomous Paradox
&lt;/h2&gt;

&lt;p&gt;In 2026, we’ve moved past simple chatbots. We are building Production-Grade RAG pipelines and autonomous agents that can plan, execute, and iterate. But as an architect, I’ve noticed a glaring hole in our "Agentic" future: &lt;strong&gt;Identity Sprawl&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We are giving agents non-human identities (NHI) with "Full Admin" permissions just to ensure the RAG works smoothly. We are effectively building a workforce of privileged users that never sleep, never get tired, and—most importantly—never verify their own intent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem: The &lt;code&gt;.env&lt;/code&gt; Security Theater&lt;/strong&gt;&lt;br&gt;
Most "Agentic" workflows today rely on a precarious stack of environment variables. Your agent has your OpenAI key, your Pinecone credentials, and often, write-access to your GitHub or cloud infrastructure.&lt;/p&gt;

&lt;p&gt;If your development environment is compromised—even for a second—via a simple browser injection or a typosquatted library, those keys are gone. In the era of &lt;strong&gt;AI-driven social engineering&lt;/strong&gt;, an attacker doesn’t need to hack your code; they just need to "support" your agent into leaking its own context.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why "Human-in-the-Loop" is Failing
&lt;/h2&gt;

&lt;p&gt;We talk about keeping a "Human-in-the-Loop" (HITL) for safety. But if the "Loop" is a web-based dashboard or a browser extension, it’s a battlefield you don't control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Contextual Spoofing:&lt;/strong&gt; An attacker can alter the transaction description in a web UI so a malicious execution looks like a routine "Database Sync."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- The "All Green" Trap:&lt;/strong&gt; AI agents can now simulate perfectly "legitimate" behavior, passing every automated check while exfiltrating data in the background.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building a "Zero-Trust" Agent Architecture
&lt;/h2&gt;

&lt;p&gt;To move from "Experimental" to "Production-Grade," we need to treat Agent Identities with the same rigor we treat Root access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Hardware-Gated Signing:&lt;/strong&gt; No autonomous agent should have the power to move assets or change critical infrastructure without a physical, isolated signature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Short-Lived Tokens:&lt;/strong&gt; Stop using long-lived API keys. Use OAuth flows that require periodic re-authorization via a trusted display.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Independent Interpretation:&lt;/strong&gt; We need "Transaction Interpreters" that decode raw hex and JSON payloads independently of the browser and OS. If you can't read what the agent is actually doing, don't sign it.&lt;/p&gt;
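&lt;p&gt;As a minimal sketch of point 2, a short-lived credential is just a token with an expiry baked in (the names and the 5-minute TTL below are made up; a production system would use a real OAuth provider):&lt;/p&gt;

```python
import time

# Hypothetical short-lived token helper: every credential carries an
# expiry, so a stolen token is only useful for minutes, not months.
def issue_token(subject, ttl_seconds=300):
    return {"subject": subject, "expires_at": time.time() + ttl_seconds}

def is_valid(token):
    remaining = token["expires_at"] - time.time()
    return max(0.0, remaining) != 0.0  # False once no time remains

agent_token = issue_token("rag-agent")
print(is_valid(agent_token))  # True while the 5-minute window lasts
```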

&lt;p&gt;&lt;strong&gt;The 2026 Reality&lt;/strong&gt;&lt;br&gt;
The recent infrastructure compromises we’ve seen—from bridge exploits to "ClickFix" social engineering—prove that the "Front Door" (the user interface) is the weakest link.&lt;/p&gt;

&lt;p&gt;I’m currently rebuilding my local agent stack to move away from software-only keys. The goal is a &lt;strong&gt;"Zero-Software" Trust Boundary&lt;/strong&gt;. I’ll be sharing the technical teardown of this setup, including the Python implementation for hardware-gated RAG, in my next post.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Are you trusting your agents with your master keys, or are you building a firewall?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>devsecops</category>
      <category>rag</category>
    </item>
    <item>
      <title>Day 9: RAG — Giving Your AI a Private Library 📚</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Mon, 04 May 2026 15:34:32 +0000</pubDate>
      <link>https://forem.com/rushanksavant/day-9-rag-giving-your-ai-a-private-library-117o</link>
      <guid>https://forem.com/rushanksavant/day-9-rag-giving-your-ai-a-private-library-117o</guid>
      <description>&lt;p&gt;Have you ever asked an AI about something that happened this morning, or about a private document on your laptop, and it hallucinated an answer? That's because LLMs have a "cutoff date."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG&lt;/strong&gt; fixes this. Instead of trying to memorize the whole world, the AI "looks up" the relevant information in your documents before it answers. Today, we’ll build the foundation of a RAG pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ The 5 Steps of RAG
&lt;/h2&gt;

&lt;p&gt;To give an AI a library, we follow a simple assembly line:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Load:&lt;/strong&gt; Pulling data from a PDF, Website, or Text file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Split:&lt;/strong&gt; Breaking long documents into small, bite-sized "chunks."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Embed:&lt;/strong&gt; Converting those text chunks into numbers (vectors) that represent their meaning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Store:&lt;/strong&gt; Saving those numbers in a "Vector Database."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Retrieve:&lt;/strong&gt; Finding the right chunk when a user asks a question.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Step 1: Loading &amp;amp; Splitting
&lt;/h2&gt;

&lt;p&gt;An LLM can't efficiently digest a 50-page PDF in one go, and embedding search works best on small passages. We have to "chunk" the document so the AI only reads the relevant parts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.document_loaders&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WebBaseLoader&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_text_splitters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Load a webpage
&lt;/span&gt;&lt;span class="n"&gt;loader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WebBaseLoader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://docs.smith.langchain.com/user_guide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Split it into 1000-character chunks
&lt;/span&gt;&lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;splits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Created &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;splits&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; chunks of data.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧠 Step 2: The "Brainy" Storage (Vector DB)
&lt;/h2&gt;

&lt;p&gt;We don't search for words; we search for &lt;strong&gt;meaning&lt;/strong&gt;. Using &lt;strong&gt;Embeddings&lt;/strong&gt;, the word "King" will be numerically close to the word "Queen," even if they are spelled differently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Create the "Library" (Vector Store) using your chunks
&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Chroma&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;splits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAIEmbeddings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Turn the library into a "Retriever"
&lt;/span&gt;&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
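&lt;p&gt;You can see "searching by meaning" with a toy example (hand-made 3-number vectors stand in for real embeddings, which have hundreds of dimensions):&lt;/p&gt;

```python
import math

# Hand-made toy "embeddings": King and Queen get similar numbers,
# Pizza gets very different ones.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "pizza": [0.1, 0.0, 0.9],
}

# Cosine similarity: 1.0 means identical direction, 0.0 unrelated.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine(vectors["king"], vectors["queen"]), 3))  # close to 1.0
print(round(cosine(vectors["king"], vectors["pizza"]), 3))  # much smaller
```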






&lt;h2&gt;
  
  
  🎯 Day 9 Summary
&lt;/h2&gt;

&lt;p&gt;Today, you learned how to bridge the gap between static AI knowledge and your dynamic data. We covered:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- The RAG Concept:&lt;/strong&gt; Why "Search + Generate" is better than just "Generate."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Document Loaders:&lt;/strong&gt; Bringing external data into Python.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Text Splitters:&lt;/strong&gt; Why chunking matters for accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Vector Stores:&lt;/strong&gt; Searching by meaning, not just keywords.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your Homework:&lt;/strong&gt; Find a long article online and try to load it using the &lt;code&gt;WebBaseLoader&lt;/code&gt;. See how many "chunks" it creates when you set the &lt;code&gt;chunk_size&lt;/code&gt; to 500!&lt;/p&gt;

&lt;p&gt;See you tomorrow! ☕&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>python</category>
      <category>rag</category>
    </item>
    <item>
      <title>Day 8: Building Custom Tools — Teaching Your AI New Skills 🛠️</title>
      <dc:creator>Rushank Savant</dc:creator>
      <pubDate>Sun, 03 May 2026 19:30:42 +0000</pubDate>
      <link>https://forem.com/rushanksavant/day-8-building-custom-tools-teaching-your-ai-new-skills-284p</link>
      <guid>https://forem.com/rushanksavant/day-8-building-custom-tools-teaching-your-ai-new-skills-284p</guid>
      <description>&lt;p&gt;You’ve seen how agents can search the web, but what if you need your AI to interact with your specific company database, calculate a proprietary risk score, or even control a smart lightbulb in your house?&lt;/p&gt;

&lt;p&gt;For that, you need Custom Tools. Today, we’ll see how a simple Python decorator can bridge the gap between your local code and an LLM.&lt;/p&gt;




&lt;h2&gt;
  
  
  🎨 The Magic of the &lt;code&gt;@tool&lt;/code&gt; Decorator
&lt;/h2&gt;

&lt;p&gt;The easiest way to create a tool in 2026 is using the &lt;code&gt;@tool&lt;/code&gt; decorator. When you wrap a function with this, LangChain automatically analyzes your code to tell the AI:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What the tool is called.&lt;/li&gt;
&lt;li&gt;What it does (based on your docstring).&lt;/li&gt;
&lt;li&gt;What arguments it needs (based on your type hints).&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What Are Decorators in Python?
&lt;/h2&gt;

&lt;p&gt;In Python, decorators are a powerful design pattern that allows you to modify or enhance the behavior of a function, method, or class without permanently changing its original source code.&lt;/p&gt;

&lt;p&gt;Think of a decorator as a "wrapper" that can execute code before and after the original function runs. The following example makes this concrete:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapper&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Before the function.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;After the function.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapper&lt;/span&gt;

&lt;span class="nd"&gt;@my_decorator&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;say_hello&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;say_hello&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🛠️ Step-by-Step: Creating a "Secret Multiplier" Tool
&lt;/h2&gt;

&lt;p&gt;Let’s build a tool that performs a calculation the AI couldn't possibly know—using a "secret constant" from our local environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_secret_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Calculates a business score by multiplying input by a secret company constant.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;secret_constant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;  &lt;span class="c1"&gt;# This could be from a database or API
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_value&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;secret_constant&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The secret business score is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Let's see what the AI sees!
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calculate_secret_score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calculate_secret_score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;calculate_secret_score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🧠 Why Docstrings and Type Hints Matter
&lt;/h2&gt;

&lt;p&gt;In regular coding, docstrings are for other humans. In LangChain, &lt;strong&gt;docstrings are for the AI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If your docstring is vague (e.g., &lt;code&gt;"Does math"&lt;/code&gt;), the AI won't know when to use the tool. If it’s specific (e.g., &lt;code&gt;"Use this tool when the user asks for a 'business score' or 'proprietary calculation'"&lt;/code&gt;), your agent becomes incredibly reliable.&lt;/p&gt;
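To see why the docstring and type hints carry so much weight, here is a plain-Python sketch of the kind of metadata a framework can read off a function. This is &lt;em&gt;not&lt;/em&gt; LangChain's actual implementation, just the stdlib &lt;code&gt;inspect&lt;/code&gt; module doing the same trick; the &lt;code&gt;describe_tool&lt;/code&gt; helper is my own illustration.

```python
import inspect

def describe_tool(func):
    """Illustrative helper: extract the metadata a framework can read
    from a plain function (name, docstring, typed arguments)."""
    sig = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": (func.__doc__ or "").strip(),
        "args": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in sig.parameters.items()
        },
    }

def calculate_secret_score(base_value: int) -> str:
    """Use this tool when the user asks for a 'business score'
    or 'proprietary calculation'."""
    return f"The secret business score is {base_value * 42}."

schema = describe_tool(calculate_secret_score)
print(schema["name"])  # → calculate_secret_score
print(schema["args"])  # → {'base_value': 'int'}
```

Everything the model "knows" about your tool comes from exactly these three pieces, which is why an empty docstring or a missing type hint directly degrades the agent's behavior.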




&lt;h2&gt;
  
  
  🏗️ Plugging it into the Agent
&lt;/h2&gt;

&lt;p&gt;Once your tool is defined, you just add it to your tool list, exactly like we did yesterday.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatOpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define your custom skills
&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculate_secret_score&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Setup the brain
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatOpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Create the Agent
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Test it!
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is the secret score for a value of 10?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🎯 Day 8 Summary
&lt;/h2&gt;

&lt;p&gt;Today, you moved from "User" to "Creator." You learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The &lt;code&gt;@tool&lt;/code&gt; Decorator:&lt;/strong&gt; Converting functions to AI skills.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Engineering via Docstrings:&lt;/strong&gt; Writing descriptions so the LLM knows when to call your tool.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integration:&lt;/strong&gt; Adding your custom logic into a ReAct agent loop.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Your Homework:&lt;/strong&gt; Write a custom tool called &lt;code&gt;get_user_status&lt;/code&gt; that takes a &lt;code&gt;username&lt;/code&gt; and returns a fake status (like "Active" or "Away"). Try to get your agent to tell you the status of "Alex."&lt;/p&gt;

&lt;p&gt;See you tomorrow! ☕&lt;/p&gt;

</description>
      <category>ai</category>
      <category>langchain</category>
      <category>python</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
