<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Dharmendra Singh</title>
    <description>The latest articles on Forem by Dharmendra Singh (@dharmendra_singh_786).</description>
    <link>https://forem.com/dharmendra_singh_786</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3287399%2F5b4e5985-dca6-4fd3-92c5-35dd646092c9.png</url>
      <title>Forem: Dharmendra Singh</title>
      <link>https://forem.com/dharmendra_singh_786</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dharmendra_singh_786"/>
    <language>en</language>
    <item>
      <title>Building RAG Applications with LangChain(Part-5)</title>
      <dc:creator>Dharmendra Singh</dc:creator>
      <pubDate>Tue, 24 Jun 2025 04:04:03 +0000</pubDate>
      <link>https://forem.com/dharmendra_singh_786/building-rag-applications-with-langchainpart-5-2686</link>
      <guid>https://forem.com/dharmendra_singh_786/building-rag-applications-with-langchainpart-5-2686</guid>
      <description>&lt;h2&gt;
  
  
  Prompting, Chaining &amp;amp; Parsing: Structuring Smart, Reliable LLM Workflows
&lt;/h2&gt;

&lt;p&gt;Welcome back to the &lt;strong&gt;LangChain RAG Series&lt;/strong&gt;. So far, we’ve walked through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchain-part-1-4f1p"&gt;Part 1&lt;/a&gt;: Understanding RAG Architecture&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-2-4p1o"&gt;Part 2&lt;/a&gt;: Document Loaders&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-3-54gb"&gt;Part 3&lt;/a&gt;: Text Splitters&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-4-38i0"&gt;Part 4&lt;/a&gt;: Embeddings &amp;amp; Vector Stores&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now it’s time to bring it all together using &lt;strong&gt;LangChain’s Prompt Templates&lt;/strong&gt;, &lt;strong&gt;Chains&lt;/strong&gt;, and &lt;strong&gt;Output Parsers&lt;/strong&gt; — the heart of a well-structured LLM pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Part Covers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Writing reusable prompts using &lt;code&gt;ChatPromptTemplate&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Building logic flows with &lt;code&gt;Runnable&lt;/code&gt;-based Chains
&lt;/li&gt;
&lt;li&gt;Parsing raw LLM output into structured results
&lt;/li&gt;
&lt;li&gt;Examples using Gemini, OpenAI, and LangChain Expressions
&lt;/li&gt;
&lt;li&gt;Best practices for scalable and debuggable apps&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prompt Engineering with LangChain
&lt;/h2&gt;

&lt;p&gt;LangChain supports a modular approach to prompts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Components:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;SystemMessage&lt;/code&gt; — Set behavior and role of the model&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;HumanMessage&lt;/code&gt; — What the user asks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;AIMessage&lt;/code&gt; — Model-generated replies (can be optional)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;ChatPromptTemplate&lt;/code&gt; — Compose a full message list with placeholders&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.prompts import ChatPromptTemplate

# Use (role, template) tuples so {placeholders} are treated as template
# variables; plain SystemMessage/HumanMessage objects keep them as literal text.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful {domain} expert."),
    ("human", "Tell me about {topic}.")
])

formatted_prompt = prompt.invoke({"domain": "Cricket", "topic": "reverse swing"})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes your prompts reusable and structured across different topics or domains.&lt;/p&gt;

&lt;h3&gt;
  
  
  Another example:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def ask_query(query):
    # Get a retriever from the vector store
    retriever = vectorstore.as_retriever(search_type='similarity', search_kwargs={'k': 2})
    results = retriever.invoke(query)
    context = "\n".join([document.page_content for document in results])

    prompt = f"""
You are an FAQ assistant. Use the following content to answer the user's question accurately and concisely:
{context}
Q: {query}
A:
"""
    response = model.invoke(prompt)
    return response.content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Chains: Connecting Components Using LangChain Expression Language (LCEL)
&lt;/h2&gt;

&lt;p&gt;LangChain &lt;strong&gt;Chains&lt;/strong&gt; combine multiple steps using &lt;strong&gt;runnables&lt;/strong&gt; to manage logic and flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Runnable Types:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RunnableSequence:&lt;/strong&gt; Runs steps in order (e.g., Prompt → LLM → Parser)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RunnableParallel:&lt;/strong&gt; Runs multiple branches in parallel (e.g., fetch metadata + summary)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RunnableLambda:&lt;/strong&gt; Wraps a custom function as a chain step (e.g., preprocessing inputs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RunnableBranch:&lt;/strong&gt; If/else-style branching (e.g., conditional routing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RunnablePassthrough:&lt;/strong&gt; Passes input through unchanged (e.g., identity/default input)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example: RunnableSequence Chain
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain_core.runnables import RunnableSequence
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import OpenAI

# RunnableSequence takes its steps as positional arguments, not a list
chain = RunnableSequence(
    prompt,             # ChatPromptTemplate
    OpenAI(),           # LLM
    StrOutputParser(),  # Clean the output
)

result = chain.invoke({"domain": "Cricket", "topic": "swing bowling"})
print(result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs: &lt;strong&gt;Prompt → LLM → OutputParser&lt;/strong&gt;, all in one clean pipeline.&lt;/p&gt;

&lt;p&gt;You can use &lt;strong&gt;LCEL&lt;/strong&gt;’s pipe operator (&lt;code&gt;|&lt;/code&gt;) in place of &lt;strong&gt;RunnableSequence&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chain = prompt | OpenAI() | StrOutputParser()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
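&lt;p&gt;The pipe syntax works because every component implements a shared Runnable interface whose &lt;code&gt;__or__&lt;/code&gt; method composes steps. Here is a stdlib-only toy sketch of that idea (these are not LangChain’s real classes, just an illustration of the mechanism):&lt;/p&gt;

```python
# Toy illustration of LCEL-style piping built on __or__.
# NOT LangChain's real classes -- a minimal sketch of the idea only.

class Runnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # Compose: run self first, then feed its output into `other`.
        return Runnable(lambda value: other.invoke(self.invoke(value)))

# Stand-ins for prompt, LLM, and output parser
prompt = Runnable(lambda d: f"Tell me about {d['topic']}.")
fake_llm = Runnable(lambda text: f"ANSWER({text})")
parser = Runnable(lambda text: text.strip())

chain = prompt | fake_llm | parser
print(chain.invoke({"topic": "swing bowling"}))
```

&lt;p&gt;Each &lt;code&gt;|&lt;/code&gt; simply wraps “run the left step, feed its output to the right step” into a new runnable, which is why the pipeline reads left to right.&lt;/p&gt;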



&lt;h2&gt;
  
  
  Output Parsers: Making Results Structured
&lt;/h2&gt;

&lt;p&gt;By default, LLMs return plain text. LangChain supports &lt;strong&gt;parsers&lt;/strong&gt; to transform this into structured data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common Parsers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;StrOutputParser:&lt;/strong&gt;  Clean text string&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CommaSeparatedListOutputParser:&lt;/strong&gt; List from CSV text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PydanticOutputParser:&lt;/strong&gt; Typed structured objects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JsonOutputParser:&lt;/strong&gt; JSON dictionaries&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example with Pydantic:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel

class Answer(BaseModel):
    topic: str
    summary: str

parser = PydanticOutputParser(pydantic_object=Answer)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can use this in your chain to &lt;strong&gt;enforce structure&lt;/strong&gt; in the output (e.g., for APIs, UIs, or analytics).&lt;/p&gt;
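&lt;p&gt;The idea behind enforcing structure can be sketched with the standard library alone: ask the model for JSON, then parse and validate it into a typed object. This toy uses &lt;code&gt;dataclasses&lt;/code&gt; in place of Pydantic, and a hard-coded string stands in for real LLM output:&lt;/p&gt;

```python
# Stdlib sketch of the idea behind PydanticOutputParser:
# turn raw LLM text into a validated, typed object.
# (Real LangChain parsers use Pydantic models plus format instructions.)
import json
from dataclasses import dataclass

@dataclass
class Answer:
    topic: str
    summary: str

def parse_answer(raw):
    data = json.loads(raw)  # raises if the model did not return valid JSON
    return Answer(topic=str(data["topic"]), summary=str(data["summary"]))

# Pretend this string came back from the LLM:
raw_output = '{"topic": "reverse swing", "summary": "Late movement of an old ball."}'
answer = parse_answer(raw_output)
print(answer.topic)
```

&lt;p&gt;Downstream code can then rely on &lt;code&gt;answer.topic&lt;/code&gt; and &lt;code&gt;answer.summary&lt;/code&gt; existing, instead of string-matching against free-form text.&lt;/p&gt;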

&lt;h2&gt;
  
  
  Real-World Workflow Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;full_chain = prompt | OpenAI() | StrOutputParser()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure works for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Document-based QA&lt;/li&gt;
&lt;li&gt;  Summarizers&lt;/li&gt;
&lt;li&gt;  Entity extractors&lt;/li&gt;
&lt;li&gt;  Internal tools with LLM logic&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;ChatPromptTemplate&lt;/code&gt; for modularity
&lt;/li&gt;
&lt;li&gt;Parse everything — never trust raw LLM strings
&lt;/li&gt;
&lt;li&gt;Chain small functions using &lt;code&gt;RunnableSequence&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Test components independently
&lt;/li&gt;
&lt;li&gt;Branch with &lt;code&gt;RunnableBranch&lt;/code&gt; to manage complex logic&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Coming Up Next: Part 6
&lt;/h2&gt;

&lt;p&gt;In &lt;strong&gt;Part 6&lt;/strong&gt;, we’ll combine everything we’ve built so far into real-world, production-ready examples.&lt;br&gt;&lt;br&gt;
Think of this as your &lt;strong&gt;RAG Starter Toolkit&lt;/strong&gt; in action.&lt;/p&gt;

&lt;h2&gt;
  
  
  Missed the Previous Parts?
&lt;/h2&gt;

&lt;p&gt;👉 &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchain-part-1-4f1p"&gt;Part 1 – What is RAG&lt;/a&gt;&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-2-4p1o"&gt;Part 2 – Document Loaders&lt;/a&gt;&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-3-54gb"&gt;Part 3 – Text Splitters&lt;/a&gt;&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-4-38i0"&gt;Part 4 – Embeddings &amp;amp; Vectors&lt;/a&gt;&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>langgraph</category>
      <category>rag</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building RAG Applications with LangChain(Part-4)</title>
      <dc:creator>Dharmendra Singh</dc:creator>
      <pubDate>Mon, 23 Jun 2025 17:04:07 +0000</pubDate>
      <link>https://forem.com/dharmendra_singh_786/building-rag-applications-with-langchainpart-4-38i0</link>
      <guid>https://forem.com/dharmendra_singh_786/building-rag-applications-with-langchainpart-4-38i0</guid>
      <description>&lt;h2&gt;
  
  
  Embeddings &amp;amp; Vector Stores: Turning Text into Searchable Intelligence
&lt;/h2&gt;

&lt;p&gt;Welcome to &lt;strong&gt;Part 4&lt;/strong&gt; of our hands-on RAG series with LangChain.&lt;/p&gt;

&lt;p&gt;So far, we’ve covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchain-part-1-4f1p"&gt;Part 1&lt;/a&gt;: RAG Theory and Architecture&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-2-4p1o"&gt;Part 2&lt;/a&gt;: Document Loaders&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-3-54gb"&gt;Part 3&lt;/a&gt;: Text Splitters&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this part, we explore how your split documents are &lt;strong&gt;converted into vectors (Embeddings)&lt;/strong&gt; that can be &lt;strong&gt;searched, ranked, and retrieved&lt;/strong&gt; using LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Embeddings?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Embeddings&lt;/strong&gt; are numerical vector representations of text that capture &lt;strong&gt;semantic meaning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When a model generates an embedding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It transforms “meaning” into a &lt;strong&gt;dense numerical vector&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Similar meanings = &lt;strong&gt;closer vectors&lt;/strong&gt; in multi-dimensional space&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Example:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;king - man + woman ≈ queen&lt;/code&gt;&lt;br&gt;&lt;br&gt;
This analogy works because embeddings preserve &lt;strong&gt;relational meaning&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Let’s understand with some examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;em&gt;Let’s say you have a model that creates embeddings for words.&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;king &amp;lt;--Embeddings--&amp;gt; [0.9, 0.8, 0.7]&lt;/li&gt;
&lt;li&gt;queen &amp;lt;--Embeddings--&amp;gt; [0.88, 0.82, 0.68]&lt;/li&gt;
&lt;li&gt;man &amp;lt;--Embeddings--&amp;gt; [0.5, 0.4, 0.3]&lt;/li&gt;
&lt;li&gt;woman &amp;lt;--Embeddings--&amp;gt; [0.48, 0.42, 0.28]&lt;/li&gt;
&lt;li&gt;apple &amp;lt;--Embeddings--&amp;gt; [0.1, 0.3, 0.4]&lt;/li&gt;
&lt;li&gt;banana &amp;lt;--Embeddings--&amp;gt; [0.09, 0.29, 0.41]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here we can see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;king is very close to queen&lt;/li&gt;
&lt;li&gt;man is close to woman&lt;/li&gt;
&lt;li&gt;apple is close to banana, but in a different semantic cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;&lt;em&gt;Embedding Table: Sentences&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;"How to cook pasta?" &amp;lt;--Embeddings--&amp;gt; [0.65, 0.88, 0.34, ..., 0.72]&lt;/li&gt;
&lt;li&gt;"Steps for making spaghetti" &amp;lt;--Embeddings--&amp;gt; [0.63, 0.90, 0.33, ..., 0.71]&lt;/li&gt;
&lt;li&gt;"What is quantum physics?" &amp;lt;--Embeddings--&amp;gt; [0.11, 0.23, 0.56, ..., 0.19]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now calculate &lt;strong&gt;cosine similarity&lt;/strong&gt; between embeddings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"cook pasta" and "making spaghetti" are very similar&lt;/li&gt;
&lt;li&gt;"cook pasta" and "quantum physics" are not similar&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Semantic Search Works on Meaning, Not Just Words&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In both the word and sentence embedding examples, you’ll notice a key takeaway:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Semantic search operates on vector representations — numerical values that capture meaning — not just literal word matching.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This means even if the exact words don’t appear in the query or document, the model can still understand the &lt;em&gt;context&lt;/em&gt; and retrieve relevant results based on &lt;em&gt;meaning&lt;/em&gt;. This is what makes LLM-powered search far more powerful than traditional keyword-based methods.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Embeddings Power RAG
&lt;/h2&gt;

&lt;p&gt;In RAG, embeddings allow us to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Convert &lt;strong&gt;chunks of documents&lt;/strong&gt; into &lt;strong&gt;vectors&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Store these vectors in a &lt;strong&gt;vector database&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Embed the user &lt;strong&gt;query&lt;/strong&gt; at runtime&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use &lt;strong&gt;similarity search&lt;/strong&gt; to fetch relevant chunks&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Result: the LLM generates answers from the user query plus retrieved context (the relevant documents).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common Embedding Models&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAIEmbeddings&lt;/li&gt;
&lt;li&gt;HuggingFaceEmbeddings&lt;/li&gt;
&lt;li&gt;GoogleGenerativeAIEmbeddings&lt;/li&gt;
&lt;li&gt;OllamaEmbeddings
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="YOUR_API_KEY")

# Document
faq_text = """
Q: What is your return policy?
A: You can return items within 30 days for a full refund.
Q: How long does shipping take?
A: Shipping typically takes 3-5 business days.
Q: Do you offer international shipping?
A: Yes, we ship to over 50 countries.
Q: How can I track my order?
A: You will receive a tracking link via email once your order ships.
"""

# Split the document
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
documents = text_splitter.create_documents([faq_text])

# embed_documents expects raw strings, not Document objects
doc_embeddings = embedding.embed_documents([doc.page_content for doc in documents])
query_embedding = embedding.embed_query("What is the return policy?")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;embed_documents()&lt;/strong&gt; creates a vector for each chunk
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;embed_query()&lt;/strong&gt; lets you compare a query to your document embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Are Vector Databases?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;vector database&lt;/strong&gt; is a special kind of database designed to &lt;strong&gt;store and search through embeddings&lt;/strong&gt; (vectors), which represent the &lt;strong&gt;semantic meaning&lt;/strong&gt; of things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Text (words, sentences, documents)&lt;/li&gt;
&lt;li&gt;  Images&lt;/li&gt;
&lt;li&gt;  Code&lt;/li&gt;
&lt;li&gt;  Audio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These databases are optimized for &lt;strong&gt;fast similarity search&lt;/strong&gt; — like answering:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Find me the most similar documents to this question.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Key Idea
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Traditional databases&lt;/strong&gt; search by exact values, such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT * FROM users WHERE email = 'you@example.com';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Vector databases&lt;/strong&gt;, by contrast, perform semantic search based on the context or meaning of words and sentences, as discussed above. To do this, they use distance measures such as &lt;strong&gt;cosine similarity&lt;/strong&gt; and &lt;strong&gt;Euclidean distance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Case Flow Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    Your PDF → Split into chunks → Embed each chunk → Store in Vector DB

    User query → Embed query → Search DB → Get top chunks → Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
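&lt;p&gt;The flow above can be sketched end to end with a toy “embedding” (a bag-of-words count vector) standing in for a real embedding model and vector database; this is purely illustrative, since real embeddings are dense learned vectors:&lt;/p&gt;

```python
# Toy end-to-end flow: chunk, "embed", store, embed query, fetch top match.
# Bag-of-words counts stand in for real learned embeddings (illustration only).
import math
from collections import Counter

def embed(text):
    # "Embed" a text as a word-count vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "You can return items within 30 days for a full refund.",
    "Shipping typically takes 3-5 business days.",
]
db = [(chunk, embed(chunk)) for chunk in chunks]           # store in the "vector DB"

query_vec = embed("What is the return policy for items?")  # embed the query
best = max(db, key=lambda item: cosine(query_vec, item[1]))
print(best[0])  # the refund chunk scores highest
```

&lt;p&gt;A real pipeline swaps the count vectors for model embeddings and the list for a vector store, but the retrieval logic is the same: embed, compare, return the closest chunks.&lt;/p&gt;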



&lt;h2&gt;
  
  
  Popular Vector Databases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FAISS:&lt;/strong&gt; Open-source by Facebook, fast, local&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pinecone:&lt;/strong&gt; Cloud-native, scalable, real-time updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaviate:&lt;/strong&gt; Semantic graph + vector search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Milvus:&lt;/strong&gt; High-performance, GPU acceleration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant:&lt;/strong&gt; Rust-based, fast, open-source&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chroma:&lt;/strong&gt; Developer-friendly, works well with LangChain&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Vector Database use cases:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Similarity Search:&lt;/strong&gt; Finds meaning, not just keywords&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory for LLMs:&lt;/strong&gt; Used in Retrieval-Augmented Generation (RAG)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast Search on Big Data:&lt;/strong&gt; Search millions of vectors quickly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable + Flexible:&lt;/strong&gt; Easily update, delete, filter, tag data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code Example with Chroma
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter

#Document
faq_text = """
Q: What is your return policy?
A: You can return items within 30 days for a full refund.
Q: How long does shipping take?
A: Shipping typically takes 3-5 business days.
Q: Do you offer international shipping?
A: Yes, we ship to over 50 countries.
Q: How can I track my order?
A: You will receive a tracking link via email once your order ships.
"""

# Split the document
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
documents = text_splitter.create_documents([faq_text])

# Embedding model
embeddings = GoogleGenerativeAIEmbeddings(google_api_key=GOOGLE_API_KEY, model=EMBEDDING_MODEL_NAME)

# Create the vector database
vectorstore = Chroma.from_documents(documents, embeddings, persist_directory='./faq.db')

query = "What is the return policy?"
results = vectorstore.similarity_search(query)
print(results[0].page_content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You just built a semantic search engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;A &lt;strong&gt;vector database&lt;/strong&gt; stores and retrieves &lt;strong&gt;embeddings&lt;/strong&gt;, enabling machines to search by &lt;strong&gt;meaning&lt;/strong&gt; rather than exact matches.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They’re essential for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Chatbots with memory&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Semantic search&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AI-powered search engines&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;RAG pipelines&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is Cosine Similarity?
&lt;/h2&gt;

&lt;p&gt;Similarity between embeddings is usually calculated using &lt;strong&gt;cosine similarity&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Similarity(A, B) = (A · B) / (||A|| ||B||)&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Ranges from -1 to 1&lt;/li&gt;
&lt;li&gt;  1 = Identical direction (most similar)&lt;/li&gt;
&lt;li&gt;  0 = Orthogonal (unrelated)&lt;/li&gt;
&lt;li&gt;  -1 = Opposite direction (most dissimilar)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LangChain handles this internally when using &lt;code&gt;similarity_search()&lt;/code&gt;.&lt;/p&gt;
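&lt;p&gt;The formula can be checked directly against the toy word vectors from the embeddings example above:&lt;/p&gt;

```python
# Cosine similarity, computed exactly as Similarity(A, B) = (A . B) / (||A|| ||B||),
# using the toy word vectors from the embeddings example above.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

king = [0.9, 0.8, 0.7]
queen = [0.88, 0.82, 0.68]
apple = [0.1, 0.3, 0.4]

print(cosine_similarity(king, queen))  # very close to 1.0
print(cosine_similarity(king, apple))  # noticeably lower
```

&lt;p&gt;As expected, king and queen point in almost the same direction, while king and apple are further apart.&lt;/p&gt;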

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use the same embedding model for documents and queries to avoid mismatched vector spaces&lt;/li&gt;
&lt;li&gt;Normalize and clean content before embedding&lt;/li&gt;
&lt;li&gt;Store metadata alongside each chunk&lt;/li&gt;
&lt;li&gt;Choose the vector store that fits your scale and deployment needs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;In &lt;strong&gt;Part 5&lt;/strong&gt;, we’ll bring it all together using:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;LangChain Chains + Output Parsers&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
So that the LLM can not just retrieve context — but generate structured, actionable answers!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missed the earlier parts?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchain-part-1-4f1p"&gt;Part 1 – What is RAG&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-2-4p1o"&gt;Part 2 – Document Loaders&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-3-54gb"&gt;Part 3 – Text Splitters&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>langchain</category>
      <category>langgraph</category>
      <category>rag</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building RAG Applications with LangChain(Part-3)</title>
      <dc:creator>Dharmendra Singh</dc:creator>
      <pubDate>Mon, 23 Jun 2025 13:13:28 +0000</pubDate>
      <link>https://forem.com/dharmendra_singh_786/building-rag-applications-with-langchainpart-3-54gb</link>
      <guid>https://forem.com/dharmendra_singh_786/building-rag-applications-with-langchainpart-3-54gb</guid>
      <description>&lt;p&gt;&lt;strong&gt;Part 3: Text Splitters — The Art of Chunking for LLMs&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series Progress&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchain-part-1-4f1p"&gt;Part 1: RAG Architecture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-2-4p1o"&gt;Part 2: Document Loaders&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  You’re here: Part 3 — Text Splitting&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Do We Need Text Splitting?
&lt;/h2&gt;

&lt;p&gt;Large documents can overwhelm LLMs if passed in raw. Text splitting is essential in Retrieval-Augmented Generation (RAG) systems for these reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Breaks long documents into manageable, context-rich chunks&lt;/li&gt;
&lt;li&gt;  Improves vector search accuracy (better embeddings)&lt;/li&gt;
&lt;li&gt;  Enables retrieving only relevant content&lt;/li&gt;
&lt;li&gt;  Prevents exceeding token limits of LLM prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without smart chunking, your RAG pipeline may hallucinate or return irrelevant results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Concepts
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Chunk Size
&lt;/h3&gt;

&lt;p&gt;The maximum size of each split, typically in characters or tokens.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Bigger chunks = more context, but risk overflow&lt;/li&gt;
&lt;li&gt;  Smaller chunks = less context, but safer for prompt limits&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Chunk Overlap
&lt;/h3&gt;

&lt;p&gt;Extra content from the previous chunk to maintain continuity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Helps the model retain context across chunks&lt;/li&gt;
&lt;li&gt;  Common values: 30–50 tokens or characters&lt;/li&gt;
&lt;/ul&gt;
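&lt;p&gt;The interplay of &lt;code&gt;chunk_size&lt;/code&gt; and &lt;code&gt;chunk_overlap&lt;/code&gt; is easy to see in a minimal fixed-size splitter. This is a deliberate simplification; LangChain’s splitters prefer delimiter boundaries first:&lt;/p&gt;

```python
# Minimal fixed-size splitter showing chunk_size / chunk_overlap mechanics.
# (A simplification: LangChain's real splitters try delimiter boundaries first.)
def split_text(text, chunk_size, chunk_overlap):
    step = chunk_size - chunk_overlap  # each chunk starts this far after the last
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
```

&lt;p&gt;Each chunk repeats the last &lt;code&gt;chunk_overlap&lt;/code&gt; characters of the previous one, which is exactly what preserves continuity across chunk boundaries.&lt;/p&gt;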

&lt;h2&gt;
  
  
  Common Text Splitters in LangChain
&lt;/h2&gt;

&lt;p&gt;LangChain offers various built-in splitters, each optimized for different use cases:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;code&gt;CharacterTextSplitter&lt;/code&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Simple, general-purpose splitter by character length&lt;/li&gt;
&lt;li&gt;  Works well for raw or unstructured text&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Smart splitter that tries to preserve structure (e.g., paragraphs, sections)&lt;/li&gt;
&lt;li&gt;  Ideal for Markdown, source code, or articles&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;code&gt;TokenTextSplitter&lt;/code&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Token-aware (e.g., works with OpenAI/Gemini tokens)&lt;/li&gt;
&lt;li&gt;  Prevents prompt overflow&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;code&gt;MarkdownHeaderTextSplitter&lt;/code&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Splits based on heading levels in Markdown documents&lt;/li&gt;
&lt;li&gt;  Great for blogs, technical docs, wikis&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Language-Specific Splitters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  e.g., &lt;code&gt;PythonCodeTextSplitter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  Maintains function/class blocks in source code files&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Do Text Splitters Work?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step-by-Step Breakdown
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Receive Raw Content:&lt;/strong&gt; usually &lt;code&gt;Document&lt;/code&gt; objects loaded from PDFs, web pages, etc.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Choose a Splitting Strategy:&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;  By characters: &lt;code&gt;\n&lt;/code&gt;, &lt;code&gt;.&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;  By tokens: using tokenizer&lt;/li&gt;
&lt;li&gt;  By structure: headers, code blocks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Split into Segments&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Uses a hierarchy: try largest delimiter first (&lt;code&gt;\n\n&lt;/code&gt; → &lt;code&gt;\n&lt;/code&gt; → &lt;code&gt;.&lt;/code&gt; → space)&lt;/li&gt;
&lt;li&gt;  If still too long, falls back to character-level splits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Build Overlapping Chunks&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Ensures each chunk fits within &lt;code&gt;chunk_size&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  Adds &lt;code&gt;chunk_overlap&lt;/code&gt; tokens for context preservation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Return New Document Chunks&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Each chunk retains metadata (source, page number, etc.)&lt;/li&gt;
&lt;/ul&gt;
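&lt;p&gt;The delimiter hierarchy in step 3 can be sketched as a small recursive function: try the largest separator first, and recurse with the remaining ones when a piece is still too long. This is a simplified sketch of the strategy, not LangChain’s actual implementation:&lt;/p&gt;

```python
# Simplified sketch of recursive splitting: try '\n\n', then '\n', then '. ',
# then ' '; fall back to hard character cuts if a piece is still too long.
def recursive_split(text, chunk_size, separators=("\n\n", "\n", ". ", " ")):
    if len(text) <= chunk_size:
        return [text]
    for i, sep in enumerate(separators):
        if sep in text:
            chunks = []
            for piece in text.split(sep):
                # Recurse with the remaining (smaller) separators only
                chunks.extend(recursive_split(piece, chunk_size, separators[i + 1:]))
            return chunks
    # No separator left: hard cut by characters
    return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

doc = "First paragraph here.\n\nSecond paragraph. It has two sentences."
for chunk in recursive_split(doc, chunk_size=25):
    print(repr(chunk))
```

&lt;p&gt;Paragraph breaks are tried first, so chunks tend to align with natural structure, and the character-level fallback only fires for pieces with no usable delimiter.&lt;/p&gt;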

&lt;h2&gt;
  
  
  Code Example 1: Recursive Character Splitter
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.text_splitter import RecursiveCharacterTextSplitter  

splitter = RecursiveCharacterTextSplitter(  
    chunk_size=500,  
    chunk_overlap=50  
)  

chunks = splitter.split_documents(documents)  

print(f"Chunks created: {len(chunks)}")  
print(chunks[0].page_content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Recommended for most use cases.&lt;br&gt;&lt;br&gt;
Intelligently handles structure and fallback splitting.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Code Example 2: Token Splitter
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.text_splitter import TokenTextSplitter  

splitter = TokenTextSplitter(  
    chunk_size=200,  
    chunk_overlap=20  
)  

chunks = splitter.split_documents(documents)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Useful when working with LLMs that have strict token limits (e.g., OpenAI, Gemini, Claude).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Code Example 3: Markdown Header Splitter
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.text_splitter import MarkdownHeaderTextSplitter  

md_text = """# RAG Tutorial  
LangChain is awesome.  

## Embeddings  
This is how it works."""  

splitter = MarkdownHeaderTextSplitter(  
    headers_to_split_on=[('#', 'H1'), ('##', 'H2')]  
)  

docs = splitter.split_text(md_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Best for docs, blogs, or tutorials with a clear header structure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Mini Workflow Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.document_loaders import PyPDFLoader  
from langchain.text_splitter import RecursiveCharacterTextSplitter  

#Load PDF  
loader = PyPDFLoader("data/report.pdf")  
documents = loader.load()  

#Split into chunks  
splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)  
chunks = splitter.split_documents(documents)  

#Preview first chunks  
for chunk in chunks[:2]:  
    print(chunk.metadata)  
    print(chunk.page_content[:100])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Use &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt; for general use&lt;/li&gt;
&lt;li&gt;  Always set &lt;code&gt;chunk_overlap&lt;/code&gt; (30–50) to retain context&lt;/li&gt;
&lt;li&gt;  Keep &lt;code&gt;chunk_size&lt;/code&gt; within your model’s max context window&lt;/li&gt;
&lt;li&gt;  Clean up input data before splitting (especially scanned PDFs)&lt;/li&gt;
&lt;li&gt;  Preserve original metadata (title, page number, etc.) in each chunk&lt;/li&gt;
&lt;/ul&gt;
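&lt;p&gt;To see why overlap matters, here is a dependency-free sketch of sliding-window chunking (the &lt;code&gt;chunk_text&lt;/code&gt; helper and sample text are made up for illustration; LangChain’s splitters are smarter about respecting boundaries):&lt;/p&gt;

```python
def chunk_text(text: str, chunk_size: int = 40, chunk_overlap: int = 10) -> list[str]:
    """Sliding-window chunking: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so neighbouring chunks share context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "LangChain splits long documents into overlapping chunks for retrieval."
chunks = chunk_text(text)
# The tail of one chunk reappears at the head of the next:
print(chunks[0][-10:] == chunks[1][:10])  # → True
```

&lt;p&gt;Because each chunk repeats the tail of the previous one, a sentence cut at a chunk boundary still appears whole in at least one chunk.&lt;/p&gt;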

&lt;h2&gt;
  
  
  TL;DR — What to Do / What to Avoid
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Do This:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Use structure-aware splitters like &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  Tune &lt;code&gt;chunk_size&lt;/code&gt; and &lt;code&gt;chunk_overlap&lt;/code&gt; to match your use case&lt;/li&gt;
&lt;li&gt;  Retain and attach document metadata to each chunk&lt;/li&gt;
&lt;li&gt;  Use token-aware splitting for LLM compatibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Avoid This:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Splitting by fixed length without overlap&lt;/li&gt;
&lt;li&gt;  Using chunks that are too small or too large&lt;/li&gt;
&lt;li&gt;  Dropping metadata (leads to loss of context)&lt;/li&gt;
&lt;li&gt;  Ignoring token limitations of your LLM&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Coming Next
&lt;/h2&gt;

&lt;p&gt;In &lt;strong&gt;Part 4&lt;/strong&gt;, we’ll explore &lt;strong&gt;Embeddings and Vector Stores&lt;/strong&gt; — turning chunks into vectors and enabling semantic search through similarity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Missed a Part?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchain-part-1-4f1p"&gt;Part 1: RAG Architecture&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchainpart-2-4p1o"&gt;Part 2: Document Loaders&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>langchain</category>
      <category>langgraph</category>
      <category>rag</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building RAG Applications with LangChain(Part-2)</title>
      <dc:creator>Dharmendra Singh</dc:creator>
      <pubDate>Mon, 23 Jun 2025 13:01:54 +0000</pubDate>
      <link>https://forem.com/dharmendra_singh_786/building-rag-applications-with-langchainpart-2-4p1o</link>
      <guid>https://forem.com/dharmendra_singh_786/building-rag-applications-with-langchainpart-2-4p1o</guid>
      <description>&lt;h2&gt;
  
  
  Part 2: Document Loaders — Theory, Usage, and Examples
&lt;/h2&gt;

&lt;p&gt;Welcome back to our  &lt;strong&gt;LangChain RAG series&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchain-part-1-4f1p"&gt;In Part 1, we covered the  &lt;strong&gt;architecture and theory&lt;/strong&gt;&lt;/a&gt; behind Retrieval-Augmented Generation (RAG). We broke down the full pipeline and its major components.&lt;/p&gt;

&lt;p&gt;Today, in  &lt;strong&gt;Part 2&lt;/strong&gt;, we’ll take our first deep dive — into the  &lt;strong&gt;Document Loader&lt;/strong&gt;  component.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A RAG pipeline is only as good as the data you feed into it. That journey starts with&lt;/em&gt; document loaders.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Are Document Loaders?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Document loaders&lt;/strong&gt;  are responsible for reading raw content (from files, databases, URLs, APIs, etc.) and converting it into a format that LangChain can work with — typically a list of  &lt;code&gt;Document&lt;/code&gt;  objects.&lt;/p&gt;

&lt;p&gt;Each  &lt;code&gt;Document&lt;/code&gt;  contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;page_content&lt;/code&gt;  — the actual text&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;metadata&lt;/code&gt;  — file name, source, page number, etc.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document(  
    page_content="Technical support is available 24/7 through chat or phone.",  
    metadata={"source": "faq.txt"}  
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Do We Need Document Loaders?
&lt;/h2&gt;

&lt;p&gt;LLMs like GPT or Gemini can’t natively read PDFs, CSVs, or websites. You need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Extract text&lt;/li&gt;
&lt;li&gt;  Clean it up&lt;/li&gt;
&lt;li&gt;  Split it into chunks&lt;/li&gt;
&lt;li&gt;  Embed &amp;amp; retrieve it&lt;/li&gt;
&lt;/ul&gt;
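&lt;p&gt;The “extract and clean” steps can be as small as a few lines of standard-library Python; a toy cleanup that drops page-number lines before splitting (the &lt;code&gt;clean_text&lt;/code&gt; helper and the &lt;code&gt;Page N&lt;/code&gt; pattern are assumptions for illustration):&lt;/p&gt;

```python
import re

def clean_text(raw: str) -> str:
    """Drop page-number lines and collapse runs of blank lines before splitting."""
    lines = [ln for ln in raw.splitlines() if not re.fullmatch(r"Page \d+", ln.strip())]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()

raw = "Intro text\nPage 1\n\n\n\nMore text\nPage 2"
print(clean_text(raw))  # → "Intro text\n\nMore text"
```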

&lt;p&gt;Without good document ingestion, your RAG model is flying blind.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supported Content Sources in LangChain
&lt;/h2&gt;

&lt;p&gt;LangChain makes it easy to load and process content from a wide variety of source types. Whether you’re working with PDFs, web pages, or structured data, there’s likely a loader (or two) that fits your use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Source Types &amp;amp; Loaders
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;PDFs&lt;/strong&gt;
Use cases: Reports, eBooks, scanned documents
&lt;strong&gt;Loaders&lt;/strong&gt;:  &lt;code&gt;PyPDFLoader&lt;/code&gt;,  &lt;code&gt;PDFMinerLoader&lt;/code&gt;,  &lt;code&gt;UnstructuredPDFLoader&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Text / Markdown&lt;/strong&gt;
Use cases: Notes, technical documentation, blog posts
&lt;strong&gt;Loaders&lt;/strong&gt;:  &lt;code&gt;TextLoader&lt;/code&gt;,  &lt;code&gt;UnstructuredMarkdownLoader&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Word Documents&lt;/strong&gt;
Use cases: Contracts, resumes, letters
&lt;strong&gt;Loader&lt;/strong&gt;:  &lt;code&gt;Docx2txtLoader&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Web Pages&lt;/strong&gt;
Use cases: Articles, blog content, public websites
&lt;strong&gt;Loaders&lt;/strong&gt;:  &lt;code&gt;WebBaseLoader&lt;/code&gt;  (static),  &lt;code&gt;SeleniumURLLoader&lt;/code&gt;  (JavaScript-heavy)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Images / OCR&lt;/strong&gt;
Use cases: Scanned forms, handwritten notes, image-based PDFs
&lt;strong&gt;Loader&lt;/strong&gt;:  &lt;code&gt;UnstructuredImageLoader&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;APIs &amp;amp; Structured Data&lt;/strong&gt;
Use cases: JSON files, databases, Google Sheets
&lt;strong&gt;Approach&lt;/strong&gt;: Use  &lt;strong&gt;custom loaders&lt;/strong&gt;  or make direct API/database calls to fetch content&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Document Loaders Work
&lt;/h2&gt;

&lt;p&gt;A few examples of how to load documents:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 1: Load a PDF&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.document_loaders import PyPDFLoader  

loader = PyPDFLoader("files/ai_report.pdf")  
docs = loader.load()  

print(docs[0].page_content)  
print(docs[0].metadata)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example 2: Load a website
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.document_loaders import WebBaseLoader  
loader = WebBaseLoader("https://openai.com/research")  
docs = loader.load()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example 3: Load a folder of  &lt;code&gt;.txt&lt;/code&gt;  files
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.document_loaders import TextLoader  
from pathlib import Path  

loaders = [TextLoader(str(file)) for file in Path("notes").glob("*.txt")]  
docs = []  
for loader in loaders:  
    docs.extend(loader.load())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Example 4: Load  &lt;code&gt;.CSV&lt;/code&gt;  files
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain_community.document_loaders import CSVLoader  

loader = CSVLoader(file_path='Social_Network_Ads.csv')  

docs = loader.load()  

print(len(docs))  
print(docs[1])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pro Tip: Use Metadata
&lt;/h2&gt;

&lt;p&gt;Good metadata (e.g. page number, source file, date) can be used to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Improve retrieval accuracy&lt;/li&gt;
&lt;li&gt;  Add filters (e.g. date, topic)&lt;/li&gt;
&lt;li&gt;  Show context in results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;print(docs[0].metadata)&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right Document Loader in LangChain
&lt;/h2&gt;

&lt;p&gt;LangChain provides a wide range of document loaders tailored to different content types and use cases. Here’s a quick guide to help you choose the best one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;PyPDFLoader&lt;/strong&gt;  — Ideal for general PDF files with mostly text content.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;PDFMinerLoader&lt;/strong&gt;  — Best for PDFs where layout and positioning of content matter.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;UnstructuredPDFLoader&lt;/strong&gt;  — Great for scanned PDFs or those with mixed content (images + text).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;WebBaseLoader&lt;/strong&gt;  — Use this for simple, static HTML web pages.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;SeleniumURLLoader&lt;/strong&gt;  — Designed for JavaScript-heavy websites like Medium, LinkedIn, or dynamic dashboards.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;TextLoader&lt;/strong&gt;  — Perfect for plain  &lt;code&gt;.txt&lt;/code&gt;  or  &lt;code&gt;.md&lt;/code&gt;  (Markdown) files.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Docx2txtLoader&lt;/strong&gt;  — Loads content from Microsoft Word  &lt;code&gt;.docx&lt;/code&gt;  files.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Unstructured&lt;/strong&gt;  — A versatile loader for scanned images, documents, forms, and content in mixed or unknown formats.&lt;/li&gt;
&lt;/ul&gt;
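&lt;p&gt;The guide above can be condensed into a simple dispatch table; a sketch that picks a loader name by file extension (the mapping and the &lt;code&gt;pick_loader&lt;/code&gt; helper are illustrative, not a LangChain API):&lt;/p&gt;

```python
from pathlib import Path

# Hypothetical extension-to-loader mapping, following the guide above.
LOADER_BY_SUFFIX = {
    ".pdf": "PyPDFLoader",
    ".txt": "TextLoader",
    ".md": "TextLoader",
    ".docx": "Docx2txtLoader",
}

def pick_loader(path: str) -> str:
    """Return the loader name for a file, defaulting to the general-purpose one."""
    return LOADER_BY_SUFFIX.get(Path(path).suffix.lower(), "UnstructuredFileLoader")

print(pick_loader("report.pdf"))  # → PyPDFLoader
```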

&lt;h2&gt;
  
  
  Best Practices for Document Loading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Clean up raw content (remove headers/footers)&lt;/li&gt;
&lt;li&gt;  Store source info for traceability&lt;/li&gt;
&lt;li&gt;  Use  &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;  &lt;strong&gt;after&lt;/strong&gt;  loading&lt;/li&gt;
&lt;li&gt;  Combine multiple loaders in pipelines&lt;/li&gt;
&lt;li&gt;  Avoid unnecessary chunking during loading stage&lt;/li&gt;
&lt;/ul&gt;
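&lt;p&gt;To show what “store source info for traceability” looks like in practice, here is a minimal stand-in that mirrors the shape of LangChain’s &lt;code&gt;Document&lt;/code&gt; (the &lt;code&gt;load_folder&lt;/code&gt; helper is made up for illustration):&lt;/p&gt;

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class Document:
    # Mirrors LangChain's Document shape: text plus traceability metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_folder(folder: Path) -> list[Document]:
    """Load every .txt file in a folder, keeping the source path as metadata."""
    return [
        Document(page_content=f.read_text(), metadata={"source": str(f)})
        for f in sorted(folder.glob("*.txt"))
    ]

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("hello")
    docs = load_folder(Path(tmp))
    print(docs[0].metadata["source"].endswith("a.txt"))  # → True
```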

&lt;h2&gt;
  
  
  Putting It All Together (Mini App)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.document_loaders import PyPDFLoader  
from langchain.text_splitter import RecursiveCharacterTextSplitter  

loader = PyPDFLoader("sample.pdf")  
documents = loader.load()  

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)  
docs = splitter.split_documents(documents)  

for doc in docs[:3]:  
    print(doc.metadata)  
    print(doc.page_content[:100])

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Coming Up Next
&lt;/h2&gt;

&lt;p&gt;In  &lt;strong&gt;Part 3&lt;/strong&gt;, we’ll explore  &lt;strong&gt;Text Splitters&lt;/strong&gt;  — how to break large documents into chunks that actually work well with vector search and LLM prompts.&lt;/p&gt;

&lt;p&gt;📖 Catch up on:&lt;br&gt;&lt;br&gt;
&lt;a href="https://dev.to/dharmendra_singh_786/building-rag-applications-with-langchain-part-1-4f1p"&gt;Part 1 — What is RAG &amp;amp; Why It Matters&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Have a use case in mind?
&lt;/h2&gt;

&lt;p&gt;Drop it in the comments! We’ll include community examples in upcoming parts.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>llm</category>
      <category>rag</category>
      <category>langgraph</category>
    </item>
    <item>
      <title>Building RAG Applications with LangChain: Part 1</title>
      <dc:creator>Dharmendra Singh</dc:creator>
      <pubDate>Mon, 23 Jun 2025 12:51:18 +0000</pubDate>
      <link>https://forem.com/dharmendra_singh_786/building-rag-applications-with-langchain-part-1-4f1p</link>
      <guid>https://forem.com/dharmendra_singh_786/building-rag-applications-with-langchain-part-1-4f1p</guid>
      <description>&lt;p&gt;&lt;strong&gt;Welcome to a brand new series&lt;/strong&gt;  where we deep-dive into building  &lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt;  applications using  &lt;strong&gt;LangChain&lt;/strong&gt;,  &lt;strong&gt;LLMs (like ChatGPT/Gemini)&lt;/strong&gt;, and modern vector databases.&lt;/p&gt;

&lt;p&gt;In the previous part of this series, we explored how to build foundational LLM applications using tools like chains, structured output parsers, prompt engineering, and more.&lt;/p&gt;

&lt;p&gt;👉 If you’re not yet familiar with concepts like LangChain basics, prompt templates, output parsers, LCEL (LangChain Expression Language), and chains, I recommend checking out the earlier articles in this series for a solid foundation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@dharamai2024/supercharge-llms-with-langchain-091c8a245f19" rel="noopener noreferrer"&gt;LangChain basics &amp;amp; LCEL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@dharamai2024/building-rag-applications-with-langchain-part-2-4d6b82eef077" rel="noopener noreferrer"&gt;Part-2: Document Loader&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, we’re taking it a step further:  &lt;strong&gt;infusing LLMs with factual, external knowledge&lt;/strong&gt;  using RAG — one of the most important design patterns in LLM-powered systems today.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is RAG (&lt;strong&gt;Retrieval-Augmented Generation&lt;/strong&gt;)?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Retrieval:&lt;/strong&gt;  Retrieve relevant documents or passages based on the user query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Augmentation&lt;/strong&gt;: Use the retrieved documents as additional context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generation&lt;/strong&gt;: Generate a response based on the retrieved content plus the user query.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt;  is a technique that combines  &lt;strong&gt;information retrieval&lt;/strong&gt;  and  &lt;strong&gt;text generation&lt;/strong&gt;. Instead of asking an LLM to generate answers from its internal knowledge alone, we first  &lt;strong&gt;retrieve relevant documents&lt;/strong&gt;  from a data source and feed them into the prompt.&lt;/p&gt;

&lt;p&gt;This allows LLMs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Generate responses grounded in external data&lt;/li&gt;
&lt;li&gt;  Work with  &lt;strong&gt;up-to-date&lt;/strong&gt;  and  &lt;strong&gt;domain-specific&lt;/strong&gt;  knowledge&lt;/li&gt;
&lt;li&gt;  Reduce hallucination&lt;/li&gt;
&lt;li&gt;  Enable enterprise and private data use&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Think of RAG as “search + summarize” powered by an LLM.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Why Use RAG?
&lt;/h3&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) offers key advantages over using traditional LLMs alone. Here's how they compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Both Traditional LLMs and RAG-enabled LLMs&lt;/em&gt;&lt;/strong&gt; are trained on large datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Traditional LLMs&lt;/em&gt;&lt;/strong&gt; cannot access real-time or private data, but &lt;strong&gt;&lt;em&gt;RAG-enabled LLMs&lt;/em&gt;&lt;/strong&gt; can, via external sources or databases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Traditional LLMs&lt;/em&gt;&lt;/strong&gt; are prone to hallucinations, while &lt;strong&gt;&lt;em&gt;RAG-enabled LLMs&lt;/em&gt;&lt;/strong&gt; are more reliable due to grounding with real data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Traditional LLMs&lt;/em&gt;&lt;/strong&gt; often give generic or unverified answers, whereas &lt;strong&gt;&lt;em&gt;RAG-enabled LLMs&lt;/em&gt;&lt;/strong&gt; provide grounded, source-backed responses.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Traditional LLMs&lt;/em&gt;&lt;/strong&gt; may not be ideal for production use alone,  but &lt;strong&gt;&lt;em&gt;RAG-enabled LLMs&lt;/em&gt;&lt;/strong&gt; are well-suited for real-world production apps.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re building apps like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  AI search assistants&lt;/li&gt;
&lt;li&gt;  Chat with PDFs or websites&lt;/li&gt;
&lt;li&gt;  Domain-specific Q&amp;amp;A&lt;/li&gt;
&lt;li&gt;  Legal/medical document readers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…you’ll want RAG.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Components of a RAG Pipeline
&lt;/h2&gt;

&lt;p&gt;Here’s a breakdown of  &lt;strong&gt;each core building block&lt;/strong&gt;  in a LangChain-based RAG app:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Document Loader
&lt;/h2&gt;

&lt;p&gt;LangChain offers a wide variety of  &lt;strong&gt;document loaders&lt;/strong&gt;  to help you ingest and process data from various sources and formats. These loaders are essential for preparing unstructured data for use in LLM-powered applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supported Sources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Local files (PDFs, text, markdown, etc.)&lt;/li&gt;
&lt;li&gt;  URLs and web pages&lt;/li&gt;
&lt;li&gt;  APIs and JSON endpoints&lt;/li&gt;
&lt;li&gt;  Databases (e.g., SQL, MongoDB)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Formats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  PDF, CSV, Markdown, HTML, DOCX&lt;/li&gt;
&lt;li&gt;  Web pages and plain text&lt;/li&gt;
&lt;li&gt;  Notion, Airtable, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Under-the-Hood Tools
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;unstructured&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;BeautifulSoup&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;PyMuPDF&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;pdfminer.six&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;pypdf&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;html2text&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Popular LangChain Loaders
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;PyPDFLoader&lt;/code&gt;  – For reading PDF files&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;WebBaseLoader&lt;/code&gt;  – For scraping and parsing content from web pages&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;UnstructuredFileLoader&lt;/code&gt;  – For general-purpose file parsing using the  &lt;code&gt;unstructured&lt;/code&gt;  library&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;BSHTMLLoader&lt;/code&gt;  – Parses raw HTML using BeautifulSoup&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;CSVLoader&lt;/code&gt;  – Ingests CSV files into document chunks&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;NotionDBLoader&lt;/code&gt;  – Loads structured content directly from Notion databases&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;DirectoryLoader&lt;/code&gt;  – Loads multiple documents from a folder in bulk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These loaders make it easy to turn raw content into structured  &lt;code&gt;Document&lt;/code&gt;  objects ready for chunking, embedding, or retrieval.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.document_loaders import PyPDFLoader  
loader = PyPDFLoader("sample.pdf")  
documents = loader.load()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Text Splitter
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Splits large texts into manageable chunks.&lt;/li&gt;
&lt;li&gt;  Improves vector relevance and performance.&lt;/li&gt;
&lt;li&gt;  Tools:  &lt;code&gt;RecursiveCharacterTextSplitter&lt;/code&gt;,  &lt;code&gt;TokenTextSplitter&lt;/code&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.text_splitter import RecursiveCharacterTextSplitter  
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)  
chunks = splitter.split_documents(documents)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Embeddings &amp;amp; Vector Store
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Converts text chunks into  &lt;strong&gt;numerical vectors&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Stores them in a  &lt;strong&gt;vector database&lt;/strong&gt;  for similarity search.&lt;/li&gt;
&lt;li&gt;  Tools:  &lt;code&gt;OpenAIEmbeddings&lt;/code&gt;,  &lt;code&gt;GooglePalmEmbeddings&lt;/code&gt;,  &lt;code&gt;FAISS&lt;/code&gt;,  &lt;code&gt;Chroma&lt;/code&gt;,  &lt;code&gt;Pinecone&lt;/code&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.vectorstores import FAISS  
from langchain.embeddings import OpenAIEmbeddings  
db = FAISS.from_documents(chunks, OpenAIEmbeddings())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Retriever
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Interfaces with the vector store to fetch similar documents based on a query.&lt;/li&gt;
&lt;li&gt;  Returns top  &lt;code&gt;k&lt;/code&gt;  relevant chunks.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;retriever = db.as_retriever(search_type="similarity", k=3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Prompt Template
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Formats the retrieved chunks and the user’s question into a single prompt.&lt;/li&gt;
&lt;li&gt;  May include instructions for the LLM.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
template = """Use the context below to answer the question:  
{context}  
Question: {question}  
Answer:  
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
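&lt;p&gt;Filling the template is plain string formatting; a quick sketch (the sample context and question are made up):&lt;/p&gt;

```python
template = """Use the context below to answer the question:
{context}
Question: {question}
Answer:
"""

# Retrieved chunks are joined into one context string before formatting.
prompt = template.format(
    context="LangChain provides loaders, splitters, and retrievers.",
    question="What does LangChain provide?",
)
print(prompt)
```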



&lt;h2&gt;
  
  
  6. LLM / ChatModel
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  The large language model (ChatGPT, Gemini, Claude) that processes the prompt.&lt;/li&gt;
&lt;li&gt;  Can be tuned for summarization, Q&amp;amp;A, or reasoning.&lt;/li&gt;
&lt;/ul&gt;
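&lt;p&gt;Every chat model in a chain exposes the same &lt;code&gt;invoke&lt;/code&gt;-style surface; a minimal stand-in (not a real model, and deliberately simpler than LangChain’s message-based interface) shows the shape the rest of the pipeline expects:&lt;/p&gt;

```python
class FakeChatModel:
    """Minimal stand-in for a chat model such as ChatGPT, Gemini, or Claude.
    Real models are API-backed; this one just returns a canned reply."""

    def invoke(self, prompt: str) -> str:
        # A real model would generate text conditioned on the prompt.
        return f"Answer based on a {len(prompt)}-character prompt."

llm = FakeChatModel()
print(llm.invoke("Use the context below to answer: ..."))
```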

&lt;h2&gt;
  
  
  7. RAG Chain
&lt;/h2&gt;

&lt;p&gt;LangChain lets you connect all these with a  &lt;code&gt;RetrievalQA&lt;/code&gt;  or a custom  &lt;strong&gt;LCEL chain&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain.chains import RetrievalQA  

qa_chain = RetrievalQA.from_chain_type(  
    llm=llm,  
    retriever=retriever,  
    chain_type="stuff"  # or refine, map_reduce  
)  
qa_chain.run("What did the author say about LangChain?")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;LCEL chain:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from langchain_core.prompts import PromptTemplate  
from langchain.chains.combine_documents.stuff import StuffDocumentsChain  
from langchain_core.runnables import RunnablePassthrough  

## Define a simple prompt  
prompt = PromptTemplate.from_template(  
    "Answer the following question based on the context:\n\n{context}\n\nQuestion: {question}"  
)  

## Combine documents using the 'stuff' method  
document_chain = prompt | llm  

## Build the full LCEL chain  
qa_chain = {  
    "context": retriever | RunnablePassthrough(),  
    "question": RunnablePassthrough()  
} | document_chain  

## Invoke the chain  
response = qa_chain.invoke("What did the author say about LangChain?")  
print(response)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;RAG Flow Diagram&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;[Document Loader] → [Text Splitter] → [Embeddings]  &lt;br&gt;
                                       ↓  &lt;br&gt;
                                   [Vector Store]  &lt;br&gt;
                                       ↑  &lt;br&gt;
                            [Retriever (k documents)]  &lt;br&gt;
                                       ↑  &lt;br&gt;
[User Query] → [Prompt Template] + [Docs] → [LLM] → [Answer]&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why RAG?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;RAG bridges the gap between static LLMs and dynamic, real-world applications. Instead of retraining models, we  &lt;strong&gt;teach them via retrieval&lt;/strong&gt;  — making them faster, safer, and more context-aware.&lt;/p&gt;

&lt;p&gt;Whether you’re building internal tools, smart search engines, or AI copilots — RAG is a must-have skill.&lt;/p&gt;

&lt;p&gt;This article outlines the complete technology stack we use to build Retrieval-Augmented Generation (RAG) applications, along with the reasoning behind their growing importance in the GenAI landscape. Beginning with this introduction, we’ll explore each component of the RAG architecture in detail. Once we’ve covered all the essential building blocks, we’ll move on to developing several real-world, end-to-end RAG applications.&lt;/p&gt;

&lt;p&gt;Let’s get started — the RAG journey begins here.&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>llm</category>
      <category>ai</category>
      <category>langgraph</category>
    </item>
  </channel>
</rss>
