Forem: Zarrar Shaikh

PostgreSQL MCP Server with Built-in SSH Tunneling

Zarrar Shaikh — Sat, 27 Dec 2025 14:57:43 +0000

Zlash65 / postgresql-ssh-mcp

PostgreSQL MCP server with SSH tunneling for Claude Desktop and ChatGPT

PostgreSQL SSH MCP Server

A secure PostgreSQL MCP server with built-in SSH tunneling. Connect to databases through bastion hosts automatically — no manual ssh -L required.

Features

Dual Transport — STDIO for Claude Desktop, Streamable HTTP for ChatGPT
SSH Tunneling — Built-in tunnel with auto-reconnect and TOFU (trust on first use)
Read-Only by Default — Safe for production; enable writes explicitly
OAuth Support — Auth0 integration for secure ChatGPT connections
Connection Pooling — Efficient resource management with configurable limits

Architecture

Quick Start

Claude Desktop (STDIO)

Add to your Claude Desktop config:

Platform	Config Location
macOS	`~/Library/Application Support/Claude/claude_desktop_config.json`
Windows	`%APPDATA%/Claude/claude_desktop_config.json`

{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@zlash65/postgresql-ssh-mcp"],
      "env": {
        "DATABASE_URI": "postgresql://user:password@localhost:5432/mydb"
      }
    }
  }
}

ChatGPT (Streamable HTTP)

DATABASE_URI="postgresql://user:pass@localhost:5432/mydb" npx @zlash65/postgresql-ssh-mcp-http

Then configure ChatGPT to connect to…


Why SSH Tunneling Matters
Architecture
Deployment Options
- Method 1: Claude Desktop (Local STDIO)
- Remote HTTP Server Deployment
Available Tools
Security
Troubleshooting

Why SSH Tunneling Matters

Production databases live in private networks (VPCs, private subnets) isolated from the public internet. You can't simply hand an AI a database URL and expect it to connect.

Companies use bastion hosts as secure gateways to private infrastructure—servers that authenticate via SSH and act as the only entry point to internal resources.

Without SSH Tunneling:
   AI → Database URL → Connection Refused (private network)

With SSH Tunneling (this server):
   AI → MCP Server → SSH Tunnel → Bastion → Database

Traditional approach (manual):

# Terminal 1: SSH tunnel
ssh -L 5432:db.internal:5432 user@bastion.company.com

# Terminal 2: MCP server
DATABASE_URI=postgresql://localhost:5432/db npx @zlash65/postgresql-ssh-mcp

This server (automatic):

{
  "env": {
    "DATABASE_URI": "postgresql://db.internal:5432/db",
    "SSH_ENABLED": "true",
    "SSH_HOST": "bastion.company.com",
    "SSH_PRIVATE_KEY_PATH": "~/.ssh/id_rsa"
  }
}

The server handles tunnel establishment, database traffic forwarding, auto-reconnection on failure, and clean shutdown automatically.
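To make the equivalence concrete, here is a minimal Python sketch that derives the manual ssh -L command from the same environment settings. It is only an illustration: the helper name manual_tunnel_command and the default local port are assumptions, and the server itself sets up the tunnel internally rather than shelling out to ssh.

```python
from urllib.parse import urlsplit


def manual_tunnel_command(env: dict, local_port: int = 5432) -> list:
    """Build the `ssh -L` argv that the env-based config replaces (illustrative only)."""
    db = urlsplit(env["DATABASE_URI"])
    db_host = db.hostname
    db_port = db.port or 5432              # PostgreSQL's default port
    target = env["SSH_HOST"]
    if env.get("SSH_USER"):
        target = f'{env["SSH_USER"]}@{target}'
    # -N: forward ports only, do not open a remote shell
    cmd = ["ssh", "-N", "-L", f"{local_port}:{db_host}:{db_port}"]
    if env.get("SSH_PRIVATE_KEY_PATH"):
        cmd += ["-i", env["SSH_PRIVATE_KEY_PATH"]]
    cmd.append(target)
    return cmd
```

Feeding it the four variables from the config above yields the same forwarding rule as the Terminal 1 command, which is the step the server takes off your hands.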

Without built-in SSH tunneling, you're stuck with bad options: VPN for everyone (overhead), exposed databases (security risk), or manual SSH tunneling (requires terminal skills). Product managers, operations teams, and analysts who need database insights shouldn't have to learn SSH commands or manage private keys. Configure it once, and it just works.

Key Features:

🔐 Automatic SSH tunneling with trust-on-first-use and auto-reconnection
🛡️ Read-only by default with smart SQL validation (60+ blocked patterns)
🔄 Dual transport — STDIO for local, HTTP + OAuth for remote
📊 12 database tools — query, schema discovery, monitoring
⚡ Connection pooling with cursor-based result limiting

Try it now: npx @zlash65/postgresql-ssh-mcp

Architecture

Local:  Claude Desktop → STDIO → MCP Server → [SSH Tunnel] → PostgreSQL
Remote: Claude/ChatGPT → HTTPS → MCP HTTP Server → [SSH Tunnel] → PostgreSQL

Deployment Options

Method	Client	Transport	Use Case
1	Claude Desktop	STDIO	Local development, direct DB access
2	Claude Desktop	HTTP + OAuth	Remote DB via Connectors
3	ChatGPT	HTTP + OAuth	Remote DB via Developer Mode

Method 1: Claude Desktop (Local STDIO)

Best for: Local development, databases accessible from your machine

Step 1: Find Config File

Platform	Location
macOS	`~/Library/Application Support/Claude/claude_desktop_config.json`
Windows	`%APPDATA%/Claude/claude_desktop_config.json`

In Claude Desktop: Settings > Developer > Edit Config

Step 2: Add MCP Server

Basic (local database):

{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@zlash65/postgresql-ssh-mcp"],
      "env": {
        "DATABASE_URI": "postgresql://user:password@localhost:5432/mydb"
      }
    }
  }
}

With SSH Tunnel (production database):

{
  "mcpServers": {
    "postgres-prod": {
      "command": "npx",
      "args": ["-y", "@zlash65/postgresql-ssh-mcp"],
      "env": {
        "DATABASE_URI": "postgresql://dbuser:dbpass@db.internal:5432/mydb",
        "SSH_ENABLED": "true",
        "SSH_HOST": "bastion.example.com",
        "SSH_USER": "ec2-user",
        "SSH_PRIVATE_KEY_PATH": "/Users/you/.ssh/id_rsa"
      }
    }
  }
}

Optional: SSH_PRIVATE_KEY_PASSPHRASE, SSH_MAX_RECONNECT_ATTEMPTS
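If you would rather not hand-edit the JSON, the entry can also be merged in with a short script. This is a sketch under assumptions: the function name add_mcp_server is made up for illustration, and you pass in whichever config path applies to your platform (see Step 1).

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, env: dict) -> dict:
    """Merge one MCP server entry into a Claude Desktop config, keeping existing keys."""
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": "npx",
        "args": ["-y", "@zlash65/postgresql-ssh-mcp"],
        "env": env,                      # e.g. DATABASE_URI, SSH_ENABLED, SSH_HOST, ...
    }
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Other servers already listed under mcpServers are left untouched; only the entry with the given name is added or overwritten.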

Enable Write Mode

{
  "env": {
    "DATABASE_URI": "postgresql://user:password@localhost:5432/mydb",
    "READ_ONLY": "false"
  }
}

⚠️ Only enable for non-production databases.

Step 3: Restart Claude Desktop

Done! Ask Claude: "What tables are in my database?"

Remote HTTP Server Deployment

Best for: Shared team access, production databases, ChatGPT integration

This section covers deploying the MCP HTTP server with OAuth on a Linux server. Both Claude Desktop (via Connectors) and ChatGPT connect to the same server.

Prerequisites

Linux server (Ubuntu 22.04/24.04 recommended)
Domain name with DNS access
Firewall allowing ports 22, 80, 443
PostgreSQL database

Step 1: Configure DNS

Create an A record pointing your domain to your server's public IP:

Type	Name	Value
A	your-subdomain	Server IP

Verify: nslookup your-subdomain.example.com

Step 2: Install Dependencies

# Connect to server
ssh -i your-key.pem ubuntu@your-server-ip

# Update system
sudo apt update && sudo apt upgrade -y

# Install Node.js via nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
source ~/.bashrc
nvm install 22

# Install nginx and Certbot
sudo apt install -y nginx certbot python3-certbot-nginx

Step 3: Configure nginx

sudo vim /etc/nginx/sites-available/mcp

server {
    listen 80;
    server_name your-subdomain.example.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 86400;
    }
}

Enable and reload:

sudo ln -s /etc/nginx/sites-available/mcp /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Step 4: Obtain SSL Certificate

sudo certbot --nginx -d your-subdomain.example.com

# Verify auto-renewal
sudo certbot renew --dry-run

Step 5: Install MCP Server

sudo mkdir -p /opt/mcp-server
sudo chown ubuntu:ubuntu /opt/mcp-server
cd /opt/mcp-server

git clone https://github.com/zlash65/postgresql-ssh-mcp.git
cd postgresql-ssh-mcp
npm install && npm run build

Step 6: Configure Auth0

6.1: Create Auth0 Tenant

Sign up at Auth0
Set tenant domain (e.g., postgresql-ssh-mcp)
Select region

Your AUTH0_DOMAIN: {tenant-domain}.{region}.auth0.com

6.2: Create API

Applications > APIs > + Create API
Name: PostgreSQL SSH MCP
Identifier: https://your-subdomain.example.com/mcp (becomes AUTH0_AUDIENCE)
Signing Algorithm: RS256

6.3: Set Default Audience

Settings > General > API Authorization Settings
Set Default Audience to your API Identifier
Save

6.4: Enable DCR

Settings > Advanced
Enable Dynamic Client Registration (DCR)

6.5: Configure Database Connection

Authentication > Database > Username-Password-Authentication
Turn on both toggles: Disable Sign Ups and Promote Connection to Domain Level

6.6: Create User

User Management > Users > + Create User
Connection: Username-Password-Authentication
Email and password

Remember these credentials for authentication.

Step 7: Configure Environment

vim /opt/mcp-server/postgresql-ssh-mcp/.env

# Database
DATABASE_URI=postgresql://user:password@your-db-host:5432/your-database
DATABASE_SSL=false
READ_ONLY=true

# SSH Tunnel (optional)
# SSH_ENABLED=true
# SSH_HOST=bastion.example.com
# SSH_USER=ubuntu
# SSH_PRIVATE_KEY_PATH=/home/ubuntu/.ssh/id_rsa

# HTTP Server
MCP_HOST=0.0.0.0

# Auth0
MCP_AUTH_MODE=oauth
AUTH0_DOMAIN=your-tenant.us.auth0.com
AUTH0_AUDIENCE=https://your-subdomain.example.com/mcp

Step 8: Create systemd Service

sudo vim /etc/systemd/system/postgresql-ssh-mcp.service

[Unit]
Description=PostgreSQL SSH MCP Server
After=network.target

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/opt/mcp-server/postgresql-ssh-mcp
EnvironmentFile=/opt/mcp-server/postgresql-ssh-mcp/.env
Environment=PATH=/home/ubuntu/.nvm/versions/node/v22.21.1/bin:/usr/bin:/bin
ExecStart=/home/ubuntu/.nvm/versions/node/v22.21.1/bin/node /opt/mcp-server/postgresql-ssh-mcp/dist/http.js
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target

Check your Node.js path with which node and update accordingly.

sudo systemctl daemon-reload
sudo systemctl enable postgresql-ssh-mcp
sudo systemctl start postgresql-ssh-mcp

Step 9: Verify Deployment

# Check status
sudo systemctl status postgresql-ssh-mcp

# View logs
sudo journalctl -u postgresql-ssh-mcp -f

# Test health endpoint
curl https://your-subdomain.example.com/health

Expected response:

{"status": "ok", "timestamp": "...", "version": "1.x.x"}

Useful Commands

Command	Description
`sudo systemctl restart postgresql-ssh-mcp`	Restart service
`sudo journalctl -u postgresql-ssh-mcp -f`	Live logs
`sudo nginx -t && sudo systemctl reload nginx`	Reload nginx
`sudo certbot renew`	Renew SSL

Method 2: Connect Claude Desktop (via Connectors)

Open Claude Desktop Settings > Connectors
Click on the Add custom connector button

Click Add
Authenticate with Auth0 (credentials from Step 6.6)

Method 3: Connect ChatGPT

Profile > Settings > Apps > Enable Developer Mode
Create App:
- Name: PostgreSQL SSH MCP
- MCP Server URL: https://your-subdomain.example.com
- Authentication: OAuth (leave Client ID/Secret empty)

Authenticate with Auth0 (credentials from Step 6.6)

In chat: + > More > Select your MCP server

Available Tools

Query

Tool	Description
`execute_query`	SQL with parameterized queries (results capped by `MAX_ROWS`)
`explain_query`	EXPLAIN plans in text/JSON/YAML/XML

Schema

Tool	Description
`list_schemas`	Database schemas (excludes system)
`list_tables`	Tables with row counts and sizes
`describe_table`	Columns, constraints, indexes
`list_databases`	All databases with sizes

Monitoring

Tool	Description
`get_connection_status`	Pool stats, tunnel state
`list_active_connections`	Active connections
`list_long_running_queries`	Slow queries
`get_database_version`	PostgreSQL version
`get_database_size`	Size breakdown
`get_table_stats`	Vacuum/analyze stats

Security

Read-only by default — The server blocks INSERT, UPDATE, DELETE, DROP, TRUNCATE, ALTER, CREATE, GRANT, REVOKE, even SELECT INTO and PREPARE/EXECUTE.
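To give a feel for how a deny-list check like this can work, here is a deliberately simplified Python sketch. It is not the server's validator: the three patterns below are a hypothetical subset, and the literal/comment stripping is naive (for example, it does not handle dollar-quoted strings).

```python
import re

# Hypothetical, much smaller deny-list than the server's real one.
BLOCKED_PATTERNS = [
    r"\b(insert|update|delete|drop|truncate|alter|create|grant|revoke)\b",
    r"\bselect\b.*\binto\b",      # SELECT ... INTO creates a new table
    r"\b(prepare|execute)\b",     # could smuggle a write past the check
]


def strip_literals_and_comments(sql: str) -> str:
    """Blank out comments and string literals so keywords inside them are ignored."""
    sql = re.sub(r"--[^\n]*", " ", sql)                  # line comments
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)     # block comments
    return re.sub(r"'(?:[^']|'')*'", "''", sql)          # single-quoted strings


def is_read_only(sql: str) -> bool:
    """Return True when none of the blocked patterns appear in the statement."""
    cleaned = strip_literals_and_comments(sql).lower()
    return not any(re.search(p, cleaned, flags=re.S) for p in BLOCKED_PATTERNS)
```

Word boundaries keep column names such as last_update from tripping the check, while a quoted identifier named "update" would still be rejected. A deny-list errs on the side of blocking.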

SSH Host Key Verification — Trust-on-First-Use (TOFU) saves unknown keys on first connection and verifies subsequent connections. Disable with SSH_TRUST_ON_FIRST_USE=false.
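The TOFU idea itself fits in a few lines. The sketch below is an assumption-laden illustration, not the server's code: it pins a SHA-256 fingerprint per host in a JSON file (the file format and the function names are invented here), accepts an unknown host once, and rejects any later key change.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(host_key: bytes) -> str:
    """Hex-encoded SHA-256 digest of a raw host key blob."""
    return hashlib.sha256(host_key).hexdigest()


def verify_host_key(store: Path, host: str, host_key: bytes,
                    trust_on_first_use: bool = True) -> bool:
    """Accept a pinned key, pin an unknown one (if allowed), reject a changed one."""
    known = json.loads(store.read_text()) if store.exists() else {}
    seen = fingerprint(host_key)
    if host in known:
        return known[host] == seen        # later connections must match the pinned key
    if not trust_on_first_use:
        return False                      # unknown host and TOFU disabled: reject
    known[host] = seen                    # first use: remember the key
    store.write_text(json.dumps(known, indent=2))
    return True
```

The trade-off is the usual one for TOFU: the very first connection is trusted blindly, so if that matters in your environment, pre-populate the known keys and turn TOFU off.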

Connection string obfuscation — Passwords are redacted in logs: postgresql://user:***@host:5432/mydb
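If you log connection strings in your own tooling, the same redaction is easy to reproduce. A minimal sketch using only the standard library (the function name redact_uri is an assumption, and the server's own implementation may differ):

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri(uri: str) -> str:
    """Replace the password in a connection URI with *** before logging it."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri                        # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Note that urlsplit lowercases the hostname and drops IPv6 brackets, which is fine for log output but means the result should not be reused as a connection string.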

Query limits — MAX_ROWS=1000, QUERY_TIMEOUT=30000, MAX_CONCURRENT_QUERIES=10

CORS/Origin validation (HTTP mode):

MCP_ALLOWED_ORIGINS="https://chatgpt.com,https://chat.openai.com"
MCP_ALLOWED_HOSTS="your-subdomain.example.com"

Environment Variables Reference

# Database
DATABASE_URI="postgresql://user:pass@host:5432/db"
DATABASE_SSL="true"
DATABASE_SSL_CA="/path/to/ca.pem"

# SSH Tunnel
SSH_ENABLED="true"
SSH_HOST="bastion.example.com"
SSH_USER="ec2-user"
SSH_PRIVATE_KEY_PATH="/path/to/key"
SSH_TRUST_ON_FIRST_USE="true"
SSH_MAX_RECONNECT_ATTEMPTS="5"

# Query
READ_ONLY="true"
MAX_ROWS="1000"
QUERY_TIMEOUT="30000"

# HTTP Server
PORT="3000"
MCP_AUTH_MODE="oauth"
AUTH0_DOMAIN="tenant.us.auth0.com"
AUTH0_AUDIENCE="https://your-domain.com"

Troubleshooting

Claude Desktop: Server Not Starting

Check logs:

macOS: ~/Library/Logs/Claude/mcp*.log
Windows: %APPDATA%/Claude/logs/mcp*.log

Common issues: Invalid DATABASE_URI, SSH key permissions (should be 600), PostgreSQL not accessible.

ChatGPT: "Unable to connect"

# Verify server
curl https://your-subdomain.example.com/health

# Check OAuth metadata
curl https://your-subdomain.example.com/.well-known/oauth-protected-resource

Verify Auth0: Default Audience set, DCR enabled, database connection promoted.

401 Unauthorized

AUTH0_AUDIENCE must match API Identifier exactly
Check Default Audience in Auth0 tenant settings
Tokens may have expired
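When chasing a 401, it often helps to look at what the access token actually claims. The sketch below decodes the payload of a JWT without verifying its signature, so treat it strictly as a debugging aid; the helper names are made up for illustration.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (debugging only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)       # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_matches(token: str, expected: str) -> bool:
    """Check whether the expected audience appears in the token's `aud` claim."""
    aud = jwt_claims(token).get("aud")
    audiences = aud if isinstance(aud, list) else [aud]   # `aud` may be a string or a list
    return expected in audiences
```

Compare the aud claim character for character with AUTH0_AUDIENCE (a trailing slash is enough to break the match), and check the exp claim if you suspect the token has expired.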

Resources

Star on GitHub

Fullstack RAG PDFBot - From Prototype to Production-Ready-ish

Zarrar Shaikh — Mon, 07 Jul 2025 12:06:15 +0000

🔗 If you're new to this project, start with the original guide here:

Building a RAG-powered PDF Chatbot - V1

🔗 Follow-up guide after the first iteration of the bot:

Refactoring RAG PDFBot - V2

In Version 2, we built on our Version 1 foundation by splitting everything into separate files. That was great - we cleaned up our monolithic code and gave our chatbot more structure. But let’s be honest: everything still lived inside one Streamlit app. The logic for uploading files, generating answers, and even managing the vector store - all of it was handled inside Streamlit. That’s fine for a prototype, but not quite production-ready.

With Version 3, we’ve taken a major step forward.

📦 Source Code V3: Zlash65/rag-bot-fastapi

🚀 What’s New in Iteration 3?

We’ve split the application into a real Frontend and Backend:

Frontend: Built using Streamlit, it handles all the UI.
Backend: Powered by FastAPI, it takes care of PDF processing, vector storage, querying, and AI interactions.

✅ Why this split?

Separation of Concerns: UI doesn’t need to know how AI logic or embeddings work.
Flexibility: Want to use Gradio, or React for your UI? Now you can, without touching the backend.
Scalability: This separation allows better logging, monitoring, and potential deployment on different servers.

👆 Here's a quick look

🧱 Our Project Structure

We now have two separate folders:

📂 `client/` - Streamlit Frontend

client/
├── app.py                      # Main entrypoint for Streamlit
├── components/                 # Chat UI, inspector, sidebar
│   ├── chat.py
│   ├── inspector.py
│   └── sidebar.py
├── state/
│   └── session.py              # Session setup and helper functions
├── utils/
│   ├── api.py                  # API calls to FastAPI server
│   ├── config.py               # API URL config
│   └── helpers.py              # High-level API abstractions
├── requirements.txt
└── README.md

Stateless API interactions via requests
UI elements handled via sidebar, chat input, and toggleable views
Modular components for Chat, Inspector, and Uploads

📂 `server/` - FastAPI Backend

server/
├── api/
│   ├── routes.py               # API endpoints for upload, chat, models etc.
│   └── schemas.py              # Input/output data validation with Pydantic
├── core/
│   ├── document_processor.py   # PDF handling: save, chunk, split
│   ├── llm_chain_factory.py    # LLM, embeddings, chain creation
│   └── vector_database.py      # ChromaDB handling: load, upsert, search
├── config/
│   └── settings.py             # API keys, model setup, directories
├── utils/
│   └── logger.py               # Logging for debugging and monitoring
├── main.py                     # FastAPI app setup
├── requirements.txt
└── README.md

Our backend now has full control over:

What LLM is used
Which model the user selects
How PDFs are stored and processed
What embeddings we generate
What responses we send back

This means we can extend things easily - add another model, another embedding technique, or change vector store - without touching our frontend.

🔄 What Changed from Iteration 2

Here’s a quick breakdown of how we evolved:

Feature	Iteration 2	Iteration 3
Codebase	One Streamlit app	Separate client (UI) + server (logic)
PDF Handling	Inside frontend	Via FastAPI API
LLM Response	Direct from Streamlit	API-based response
Embeddings + Vectorstore	Managed by UI	Fully controlled by backend
Inspector	Inside sidebar (cramped)	Main UI toggle - cleaner
Extending Models	Needed code change in UI	Plug-and-play via config
File Validation	None	PDF size/type check in backend
Future Extensions	Hard	Clean hooks for scaling
UX	Basic	Toggle-based views, downloads, resets
Text Splitting	RecursiveCharacterTextSplitter	TokenTextSplitter (LLM-aware, cleaner splits)

🔍 Why We Switched to TokenTextSplitter

In earlier versions, we used RecursiveCharacterTextSplitter to chunk our documents. It works by splitting the text at "natural" breakpoints - by default paragraphs, then lines, then words, then characters - to get close to the target chunk size in characters.

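But here's the problem: LLMs like GPT, Claude, or Gemini don’t read text in characters - they read tokens. For English text, a token is roughly 3–4 characters or 0.75 words, but the ratio shifts with the content: code, URLs, and non-English text often need far more tokens per character. That means your 1000-character chunk might be 300 tokens… or 1200 tokens. It’s unpredictable.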

To fix this, we now use TokenTextSplitter, which splits based on actual token counts, giving precise control over chunk size and overlap. This leads to more reliable inputs and avoids going over model limits.

🔬 Simple Example

Let’s take this sentence:

"LangChain helps developers build applications with LLMs more efficiently."

That’s 73 characters but around 12 tokens.

RecursiveCharacterTextSplitter

With RecursiveCharacterTextSplitter(chunk_size=30, chunk_overlap=0), we might get:

Chunk 1: "LangChain helps developers"
Chunk 2: "build applications with LLMs"
Chunk 3: "more efficiently."

Visually clean, but token count varies and could overflow model limits.

⚠️ Notice how the bot was not able to give a correct response to our question because of improper chunking

TokenTextSplitter

With TokenTextSplitter(chunk_size=10, chunk_overlap=2), we get chunks like:

Chunk 1: ["Lang", "Chain", "helps", ...] (the first 10 tokens)
Chunk 2: [..., "efficiently"] (the last 2 tokens of Chunk 1, then the remaining tokens)

Each chunk holds at most 10 tokens, and neighboring chunks share 2, making chunk sizes predictable and LLM-friendly.

✅ We get more accurate responses when splitting chunks by token

By using TokenTextSplitter, we gain better control, consistency, and contextual accuracy - making our RAG pipeline more reliable.
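The sliding-window logic behind token-based splitting is simple enough to sketch by hand. The version below is a simplified illustration, not LangChain's code: the function name is made up, and whitespace-separated words stand in for real model tokens, which in practice would come from a tokenizer.

```python
def split_by_tokens(tokens: list, chunk_size: int, chunk_overlap: int) -> list:
    """Slide a fixed-size window over the tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break                      # this window already reached the end
        start += step
    return chunks


# Stand-in tokenizer: whitespace words instead of real model tokens.
sentence = "LangChain helps developers build applications with LLMs more efficiently."
words = sentence.split()
chunks = split_by_tokens(words, chunk_size=4, chunk_overlap=1)
for i, chunk in enumerate(chunks, start=1):
    print(f"Chunk {i}: {chunk}")
```

Every chunk except possibly the last has exactly chunk_size items, and each chunk starts with the tail of the previous one, which is what keeps context from being cut off at chunk boundaries.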

✨ Better User Experience

Previously, the inspector tool was a bit hidden. We crammed it into the sidebar and showed the results there too - not the best experience.

In this iteration, we made it visible in the main chat area. There's a toggle in the sidebar where we can switch between:

💬 Chat View
🔬 Inspector View

Now, we get full-width, readable responses whether we’re chatting with PDFs or inspecting our vectorstore. It’s simple and intuitive.

⚡ Why a Production-Ready Backend Matters

Splitting the codebase into frontend and backend isn’t just good structure - it unlocks real power:

Async APIs by Default: Our FastAPI backend supports async endpoints. That means heavy operations like PDF uploads can later be offloaded to background task queues like Celery or RQ, keeping the app responsive.
Plug-and-Play Model Integration: Want to add OpenAI, Cohere, or any other LLM provider? Just update the model config in settings.py. The frontend automatically reflects the new options - no need to touch UI code.
Independent Scalability: The backend can now be scaled separately. You could deploy it on a more powerful server or container, while keeping the Streamlit frontend lightweight.
Extendability: You can now plug in:
- Authentication & authorization
- Persistent chat history
- User sessions
- Rate limiting
- Admin dashboards
- and more...
Cleaner Logs & Traceability: Errors, API calls, and internal processing can now be logged systematically using utils/logger.py.
Ready for Containerization: Frontend and backend can be deployed on different services and containers (e.g. Streamlit Cloud + Render, EC2, etc).
Frontend Agnostic: Want a more custom UI? You can now build one in React, Gradio, or even mobile - and keep using the same backend APIs.

🌐 How the Frontend Talks to the Backend

The Streamlit frontend acts purely as a UI renderer. Every interaction routes through the FastAPI backend via well-defined HTTP endpoints:

Model & Provider Fetching:
- GET /llm → Fetches available providers.
- GET /llm/{model_provider} → Fetches models for the selected provider.
PDF Upload & Processing:
- POST /upload_and_process_pdfs → Uploads selected PDFs, splits them, creates embeddings, and stores them.
Inspector Tools:
- GET /vector_store/count/{model_provider} → Gets the number of indexed documents.
- POST /vector_store/search → Returns top document matches for a query.
Chat Endpoint:
- POST /chat → Sends user message + model info, and returns LLM-generated response.

This setup ensures separation of responsibilities:

Frontend: handles layout, inputs, displaying results.
Backend: handles logic, computation, storage, and integration with LLMs.
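Because everything goes through plain HTTP, the endpoints listed above can be called from any client, not just Streamlit. Here is a small sketch that builds two of those requests with the standard library. The base URL and the JSON field names are assumptions (check server/api/schemas.py for the real schema), and the actual client uses requests rather than urllib.

```python
import json
from urllib.request import Request

API_URL = "http://localhost:8000"   # assumed address of the FastAPI server


def build_chat_request(message: str, provider: str, model: str,
                       base_url: str = API_URL) -> Request:
    """Prepare a POST /chat request; the payload field names are hypothetical."""
    body = json.dumps({"message": message, "model_provider": provider, "model": model})
    return Request(
        f"{base_url}/chat",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_count_request(provider: str, base_url: str = API_URL) -> Request:
    """Prepare a GET /vector_store/count/{model_provider} request."""
    return Request(f"{base_url}/vector_store/count/{provider}", method="GET")
```

With the backend running, passing either object to urllib.request.urlopen sends it; swapping Streamlit for React or a CLI only means rebuilding this thin layer.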

🏗️ Why We Still Use Streamlit for the Frontend

While we’ve upgraded our backend, we’re sticking with Streamlit for the frontend - for now. Here’s why:

🧱 Rapid Prototyping: We can build interactive UIs in minutes, not days.
💬 Built-in Components: Features like chat_input, expander, sidebar, and st.tabs simplify layout.
🧪 Focus on Learning AI: We avoid the overhead of building a custom UI in React or HTML/CSS - saving our energy for improving LLM workflows.

Eventually, we’ll likely switch to a custom-built frontend. But until then, Streamlit lets us move fast and learn faster.

📦 Architecture Benefits at a Glance

Here’s what we gain by this separation of concerns:

✅ Maintainability: Code is modular and easier to debug or extend.
✅ Scalability: Frontend and backend can grow independently.
✅ Developer Experience: No fear of adding new models, chains, or workflows.
✅ Deployment Flexibility: Deploy frontend and backend to different services with ease.
✅ Tooling Support: Easier to add monitoring, tracing, logging, background jobs, or security layers.

This structure mirrors how real-world AI products are built.

🔁 Recap

Let’s summarize our journey so far:

Iteration 1: A single-file prototype with Streamlit and FAISS. Quick and dirty.
Iteration 2: Modularized the logic but still kept everything inside one Streamlit app.
Iteration 3: Split into a decoupled frontend (Streamlit) and backend (FastAPI), creating a scalable, production-leaning RAG bot.

From hacking things together to building an extensible, maintainable system - we're not just playing with AI anymore. We're engineering real tools.

📦 Source Code

Version 1: Zlash65/rag-bot-basic

Version 2: Zlash65/rag-bot-chroma

Version 3: Zlash65/rag-bot-fastapi

💭 Final thoughts

We started with a single file prototype. Then, we broke things into modules. Now, we’ve split the app into an actual frontend and backend. And that’s a huge deal.

If you’ve made it this far, you’ve not only built something functional - you’ve learned how real-world AI tools are structured.

Don't stop here. Keep exploring. Keep tweaking. Build weird stuff. Break things and fix them.

This is how engineers grow.

Let’s keep shipping and improving - one iteration at a time.

Happy building! 🛠️

Refactoring RAG PDFBot: Modular Design with LangChain, Streamlit and ChromaDB

Zarrar Shaikh — Sat, 05 Jul 2025 20:04:50 +0000

🔗 If you're new to this project, start with the original guide here: Building a RAG-powered PDF Chatbot

In real-world production systems, it’s common practice to split responsibilities into multiple well-defined modules. Instead of cramming everything into a single file, code is grouped based on functionality - making it easier to debug, scale, and maintain. In this version of the RAG PDFBot, we’re simulating that same structure.

In this post we will walk through how we can evolve our original chatbot into a modular, production-style app using LangChain, ChromaDB, and Streamlit.

👆 Here's a quick look at what you'll be building in this guide.

📦 Source Code: Zlash65/rag-bot-chroma

🧱 What's New in the Modular Version

Area	Original Version	Modular Edition
File Structure	One big `app.py` file	Multiple logical modules (chat, sidebar, LLM, PDF, vectorstore, config)
PDF Parser	PyPDF2	Switched to `pypdf`
Embedding Store	FAISS	Switched to `ChromaDB` (for learning & experimentation)
LLM Chains	Simple `load_qa_chain`	LangChain `RetrievalQA` with structured prompts
Prompting	Static prompt template	Modular prompt with system/human roles
Dev Tools	None	Built-in vectorstore inspector

🔁 From FAISS to ChromaDB

Both FAISS and ChromaDB are popular options for storing and searching vector embeddings.

⚠️ In this version, we're switching to ChromaDB - not because FAISS isn't good, but to experiment with a different vector database and learn its tradeoffs.

Feature	FAISS	ChromaDB
Persistence	In-memory by default (requires manual saving/loading with `.save_local()` / `.load_local()`)	Persists to disk automatically once a `persist_directory` is set (in-memory otherwise)
Setup Complexity	Simple for in-memory; more manual steps for persistence	Plug-and-play with auto-persistence
Metadata Support	Stores metadata, but querying/filtering support is limited	Rich metadata filtering and querying support
Built-in Filtering	Minimal (not intuitive for metadata-based filtering)	Native filtering with conditions on metadata
Performance	Highly optimized for similarity search at scale (especially with GPU)	Good performance, but not optimized for billion-scale datasets
Indexing Options	Multiple indexing algorithms (Flat, IVF, HNSW, etc.)	Abstracted away - we don't control indexing

Use FAISS if:

You want high performance similarity search.
You’re comfortable managing manual persistence.
You’re deploying on-device or at scale, especially with GPU acceleration.

Use ChromaDB if:

You want auto-persistence with minimal setup.
You need metadata filtering (e.g., retrieve only documents from a specific source).
You're in rapid prototyping mode and want a simple dev experience.

Code Snippet: ChromaDB Setup

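# Newer LangChain releases ship community vector stores in langchain_community
from langchain_community.vectorstores import Chroma

def create_chroma_vectorstore(chunks, embedding):
    # Embed the text chunks and write the collection to disk so it survives restarts
    vectorstore = Chroma.from_texts(
        texts=chunks,
        embedding=embedding,
        persist_directory="./data/chroma_store"
    )
    return vectorstore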

🔍 VectorStore Inspector

One of the highlights of this version is the new vectorstore inspector.

In the previous version, the vectorstore was a black box. Now, we can:

Run ad-hoc test queries
See matching chunks returned from Chroma
Visually debug which documents were used for answering

Code Snippet: Vector Inspector

import streamlit as st

def inspect_vectorstore(vs):
    st.subheader("🔬 Vectorstore Inspector")
    query = st.text_input("Enter a test query")
    if query:
        # similarity_search returns the closest chunks (4 by default in LangChain)
        results = vs.similarity_search(query)
        for i, doc in enumerate(results):
            st.markdown(f"**Result {i+1}**")
            st.code(doc.page_content.strip())


🧠 Improved Prompt & Chain Logic

In the original version, we used:

load_qa_chain(llm, chain_type="stuff", prompt=...)

That worked - but now we’re using LangChain's RetrievalQA with a cleaner, modular prompt built using ChatPromptTemplate.

Code Snippet: New Chain Logic

from langchain.chains import RetrievalQA
from langchain.prompts import ChatPromptTemplate

def get_qa_chain(llm, retriever):
    # The "stuff" chain injects the retrieved chunks into the {context} variable,
    # so the prompt has to declare it.
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant. Use the provided context to answer.\n\nContext:\n{context}"),
        ("human", "{question}")
    ])
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type="stuff",
        chain_type_kwargs={"prompt": prompt}
    )

Why it’s better:

We separate system and user roles clearly
It's easier to extend for follow-up questions or history
It aligns better with modern LLM chat paradigms

🧩 UI & Handler Logic: Cleaner, Separated, Smarter

The user interface behavior is mostly the same - but under the hood, it's been broken into logical handlers:

File	Role
`sidebar_handler.py`	Handles model selection, API key input, PDF upload, and utility buttons
`chat_handler.py`	Handles rendering chat bubbles, input box, and chat history download
`llm_handler.py`	Manages chain and prompt setup for different model providers
`vectorstore_handler.py`	Embeds and stores PDF chunks into ChromaDB
`pdf_handler.py`	Extracts and chunks text from uploaded PDFs
`developer_mode.py`	Adds optional vectorstore inspector
`config.py`	Holds model metadata and keys from `.env`

🎛️ Smarter UI Behavior with `disabled` Components

Previous Version

We used conditions like:

if not model_provider:
    return

Which meant entire sections of the UI wouldn’t render at all until something was selected.


Current Version

In this version, all components are always rendered, but disabled until their prerequisites are met.

Why this matters:

The UI feels more responsive and intuitive
Users can "see" what steps are required
No jumping around or missing UI elements

Example:

The model select dropdown is active only after choosing a provider
The PDF uploader is active only after choosing a model
The chat input is shown but disabled until PDFs are submitted

This approach improves clarity, especially for new users.
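One way to keep this logic tidy is to compute every widget's disabled flag in a single place. The sketch below is an assumption about how that could look, not code from the repo; the function name and dictionary keys are made up, and the Streamlit calls are shown as comments so the snippet runs on its own.

```python
def component_states(provider, model, pdfs_submitted: bool) -> dict:
    """Return the `disabled` flag for each widget, given what the user has done so far."""
    return {
        "model_select": not provider,
        "pdf_uploader": not (provider and model),
        "chat_input": not (provider and model and pdfs_submitted),
    }


# In a Streamlit app these flags would feed each widget's `disabled` argument, e.g.:
#   states = component_states(provider, model, pdfs_submitted)
#   model = st.selectbox("Model", options, disabled=states["model_select"])
#   files = st.file_uploader("PDFs", type="pdf", disabled=states["pdf_uploader"])
#   prompt = st.chat_input("Ask a question", disabled=states["chat_input"])
```

Since every widget is always rendered, the page layout never shifts; only the flags change as the user works through the steps.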

🚀 Want to Try It?

You can find the full source code here 👉 Zlash65/rag-bot-chroma

git clone https://github.com/Zlash65/rag-bot-chroma.git
cd rag-bot-chroma

python3 -m venv venv
source venv/bin/activate

pip3 install -r requirements.txt

Create a .env file for your API keys:

GROQ_API_KEY=your-groq-key
GOOGLE_API_KEY=your-google-key

Then launch the app:

streamlit run app.py

💭 Final Thoughts

This version of RAG PDFBot isn’t just a refactor - it’s a learning step toward building production-grade RAG apps. With ChromaDB, internal tools, modular code, and more intuitive UI, it's easier to maintain and extend.

Still learning?

👉 Start here: Building a RAG-powered PDF Chatbot

Then come back and modularize like a pro.

Happy building! 🛠️

Building a RAG-powered PDF Chatbot with LangChain, Streamlit and FAISS

Zarrar Shaikh — Fri, 04 Jul 2025 11:15:10 +0000

In this guide, we’re going to build a working AI chatbot that can read our PDFs and answer questions from them. We’ll use a method called Retrieval-Augmented Generation (RAG) to help our chatbot connect the dots between static AI models and our own documents.

The thing with large language models (LLMs) is - they’re great with language, but they don’t know anything about our specific data. Their knowledge is frozen at the point they were trained. They can’t read our lease agreements, product manuals, or meeting notes. That’s where RAG comes in.

RAG gives us a way to feed our data into the model. So instead of asking, “When does this lease expire?” and hoping the AI knows what we’re talking about, we give it the lease PDF - and it finds the answer based on what’s actually written in there.

Some useful ways we can apply this:

Legal documents - Ask the AI to summarize a case or find key terms.
Research papers - Get summaries or compare studies without reading it all ourselves.
Customer support - Use our product docs and chat history to create a smart support assistant.
HR/Policy docs - Help our team quickly find rules, policies, and procedures.

We’ll walk through this step by step, using:

Streamlit for a simple, clean frontend
LangChain to handle the LLM logic
FAISS for fast vector search over our document content

Our goal is to build something useful, easy to understand, and flexible enough to plug in any LLM we want later.

Let’s get started.

🏗️ What We're Building

Working Demo

📐 System Design

We’re building RAG PDFBot - a chatbot that lets us upload PDFs, choose a model provider (like Groq or Gemini), and ask natural-language questions based on the content of those PDFs.

All the core pieces - PDF parsing, embedding, vector search, and LLM interaction - will be stitched together using LangChain, FAISS, and Streamlit.

And we’re keeping the architecture modular, so we can plug in new models or features anytime.

📚 Core Concepts (Explained with Real Examples)

Before we dive into the code, let’s quickly go over the key ideas behind what we’re building - explained in a way that makes sense without needing a PhD.

Large Language Models (LLMs)

LLMs are programs trained to guess the next word in a sentence - but they’ve seen billions of examples, so their guesses are usually spot on.

For example:

What is the capital of France?
Paris.

We’ll use an LLM later in our chatbot to generate answers based on the content we retrieve from our PDFs.

Embeddings

Embeddings turn our text into numbers - or more accurately, into vectors. These numbers capture meaning. For example:

“Cats drink milk.” → [0.12, -0.34, 0.89, …]
“Kittens consume dairy.” → [0.10, -0.31, 0.87, …]

The two sentences mean nearly the same thing, and their embeddings are close too. That’s how we compare meanings using math, not just words.
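
To make "close" concrete, here is a minimal sketch that scores closeness with cosine similarity. The three-number vectors are the truncated illustrations from above plus one made-up vector for an unrelated sentence; real embeddings have hundreds of dimensions, so treat the numbers as placeholders.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Truncated, illustrative vectors (not real model output).
cats = [0.12, -0.34, 0.89]
kittens = [0.10, -0.31, 0.87]
unrelated = [-0.80, 0.55, 0.05]  # hypothetical vector for an unrelated sentence

sim_close = cosine_similarity(cats, kittens)
sim_far = cosine_similarity(cats, unrelated)
print(f"cats vs kittens:   {sim_close:.3f}")
print(f"cats vs unrelated: {sim_far:.3f}")
```

A score near 1 means the vectors point in almost the same direction; scores near 0 or below mean the texts have little in common.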

Vectors and Vector Databases

A vector is just a list of numbers. A vector database stores lots of these vectors and helps us search by meaning.

Let’s say we ask:

“Who signed this document?”

Instead of keyword matching, our app finds chunks like:

“Authorized representative: John Doe”

We’re using FAISS - a super fast, lightweight vector store that runs locally and keeps things snappy.
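
As a rough mental model, a vector store is "a list of (text, vector) pairs plus a nearest-neighbour search". The sketch below does that search by brute force in plain Python. The vectors are invented for illustration and `search` is a hypothetical helper; FAISS performs the same kind of lookup with optimized indexes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# A toy "vector store": each entry pairs a chunk of text with a made-up vector.
store = [
    ("Authorized representative: John Doe", [0.9, 0.1, 0.0]),
    ("Payment is due within 30 days.", [0.1, 0.9, 0.1]),
    ("The premises are on the second floor.", [0.0, 0.2, 0.9]),
]

def search(query_vector, k=1):
    """Return the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Hypothetical embedding of "Who signed this document?"
query = [0.8, 0.2, 0.1]
top_chunks = search(query, k=1)
print(top_chunks)
```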

Retrieval-Augmented Generation (RAG)

This is the secret sauce. Instead of dumping the entire PDF into the LLM, we do something smarter:

Retrieval - We find the chunks that best match our question
Augmented - We add those chunks to the prompt
Generation - The model uses them to craft a relevant answer

So when we ask a question, it doesn’t guess blindly - it answers based on the actual content from our documents.
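
The three steps can be sketched end to end without any libraries. This is a toy under stated assumptions: `retrieve` ranks chunks by shared words instead of embeddings, and `generate` is a placeholder where the real LLM call would go. All three function names are hypothetical.

```python
def retrieve(question, chunks, k=1):
    """Rank chunks by how many words they share with the question.

    This stands in for vector search; a real app compares embeddings.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(q_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def augment(question, context_chunks):
    """Put the retrieved chunks in front of the question."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion:\n{question}\n\nAnswer:"

def generate(prompt):
    """Placeholder for the LLM call; it only reports the prompt size."""
    return f"[an LLM would answer here from a {len(prompt)}-character prompt]"

chunks = [
    "The lease term ends on June 30, 2025.",
    "The lessee shall pay rent monthly.",
    "Pets are not allowed on the premises.",
]
question = "When does the lease expire?"

best = retrieve(question, chunks, k=1)
prompt = augment(question, best)
print(prompt)
print(generate(prompt))
```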

🧬 Stack Overview

Here’s what we’re using to build our chatbot - and why each piece matters:

LangChain
The glue holding everything together. LangChain helps us connect to LLMs, manage prompts, load chains, and run contextual queries. It saves us from writing a ton of boilerplate.
Streamlit
A Python-based web framework that turns our scripts into a working UI - no need to mess with HTML or JavaScript. Just write Python functions, and Streamlit builds the interface.
Groq
A service that runs open-source models like LLaMA 3 at lightning speed using custom chips. It’s perfect for low-latency responses. And with their generous free tier, it’s great for getting started.
Google Gemini
Google’s LLM platform and a strong alternative to ChatGPT. We can access advanced models like Gemini Flash for reasoning and dialogue. The free tier gives us more than enough for prototyping.
FAISS
A fast, local vector store from Facebook AI. It lets us search our document chunks by meaning, not just keywords. Lightweight, efficient, and easy to use with embeddings.

💡 We’re skipping OpenAI for now because of cost - both Groq and Gemini have fast and generous free tiers.

🧰 Setting Up the Project

Let’s start by creating a fresh project from scratch - and version-controlling it right from the beginning.

1. Create a New GitHub Repository

Head over to your GitHub profile and create a new repository.
You can name it something like rag-pdf-chatbot.

Keep it public or private - up to you.
Add a .gitignore and choose the Python template.

Once created, copy the repo’s URL (use HTTPS).

2. Clone the Repo to Your Local Machine

Open your terminal and run:

git clone https://github.com/your-username/rag-pdf-chatbot.git
cd rag-pdf-chatbot

Replace your-username with your actual GitHub username.

3. Set Up a Virtual Environment

Let’s keep our project dependencies clean and isolated from the rest of our system:

python3 -m venv .venv
source .venv/bin/activate  # For MacOS / Linux

🧪 Why use a virtual environment?
It keeps our project’s packages separate from everything else on our machine. This avoids version conflicts and makes our setup easier to manage and share.

🧩 Code Overview

Directory Structure

rag-pdf-chatbot/
│
├── app.py                 # Streamlit frontend
├── requirements.txt       # Dependencies
├── data/                  # FAISS vector store
├── README.md              # Description and instructions
└── .env                   # API keys (not committed to Git)

`requirements.txt`

pandas
PyPDF2
streamlit
faiss-cpu
langchain
langchain-community
langchain-groq
langchain-google-genai
sentence-transformers

Install with:

pip install -r requirements.txt

`app.py`

from datetime import datetime

import pandas as pd
import streamlit as st
from PyPDF2 import PdfReader

from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.question_answering import load_qa_chain

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_google_genai import (
  GoogleGenerativeAIEmbeddings,
  ChatGoogleGenerativeAI
)
from langchain_groq import ChatGroq

These cover everything from PDF parsing and embedding to using Groq and Gemini with LangChain.

💡 If you plan to use a different LLM provider later (like OpenAI or Mistral), you can add their package when needed.

📂 Full Code Reference

You can find the complete code for this project here:

👉 GitHub Repo - Zlash65/rag-bot-basic

Feel free to give it a ⭐ if you like it!

🔍 Full Walkthrough of `main()`

This walkthrough follows the top-down order of the main() function in your app and explains utility functions in-depth as they're introduced.

1. 🚀 App Initialization and State Setup

def main():
  st.set_page_config(page_title="RAG PDFBot", layout="centered")
  st.title("👽 RAG PDFBot")
  st.caption("Chat with multiple PDFs :books:")

The app starts by setting a page title and layout, followed by a heading and caption. This makes the UI friendly and helps users immediately understand the purpose of the app.

  for key, default in {
    "chat_history": [],
    "pdfs_submitted": False,
    "vector_store": None,
    "pdf_files": [],
    "last_provider": None,
    "unsubmitted_files": False,
  }.items():
    if key not in st.session_state:
      st.session_state[key] = default

This loop initializes st.session_state with default values. Streamlit reruns the script on every interaction, so we use the session state to persist key pieces of information across reruns, such as uploaded files, model selections, chat history, and whether reprocessing is needed.
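
To see why the `if key not in ...` check matters, here is a small simulation that uses a plain dict as a stand-in for st.session_state. `DEFAULTS` and `init_state` are hypothetical names for this sketch; the point is only that a rerun must not overwrite values that already exist.

```python
DEFAULTS = {
    "chat_history": [],
    "pdfs_submitted": False,
    "vector_store": None,
}

def init_state(state):
    """Fill in missing keys only; existing values are left untouched."""
    for key, default in DEFAULTS.items():
        if key not in state:
            # Copy list defaults so separate sessions never share one list.
            state[key] = list(default) if isinstance(default, list) else default
    return state

session = {}  # stand-in for st.session_state
init_state(session)  # first run: every key gets its default
session["chat_history"].append(("Q1", "A1"))
init_state(session)  # simulated rerun: nothing is reset
print(session["chat_history"])
```

Because `DEFAULTS` lives at module level in this sketch, the list default is copied. In the app the dict literal is rebuilt on every run, so that extra step is not needed there.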

2. 🎛️ Sidebar: Model Configuration UI

  with st.sidebar:
    with st.expander("⚙️ Configuration", expanded=True):

The sidebar groups all model-related configurations inside an expandable section to keep the interface organized and uncluttered.

🧾 Below the imports, define model options.

MODEL_OPTIONS = {
  "Groq": {
    "playground": "https://console.groq.com/",
    "models": ["llama-3.1-8b-instant", "llama3-70b-8192"]
  },
  "Gemini": {
    "playground": "https://ai.google.dev",
    "models": ["gemini-2.0-flash", "gemini-2.5-flash"]
  }
}

🧾 Inside the sidebar expander, add this.

      model_provider = st.selectbox(
        "🔌 Model Provider",
        ["Select a model provider", "Groq", "Gemini"],
        index=0,
        key="model_provider"
      )

      if model_provider == "Select a model provider":
        return

The user is required to pick either Groq or Gemini as their LLM provider. If nothing is selected, the function returns early. This prevents the rest of the interface from loading and avoids initializing model-specific logic prematurely.

🧾 Still inside the same sidebar expander, add this.

      api_key = st.text_input(
        "🔑 Enter your API Key",
        help=f"Get API key from [here]({MODEL_OPTIONS[model_provider]['playground']})"
      )
      if not api_key:
        return

This input prompts the user for their API key. It dynamically links to the relevant model provider's API dashboard. Again, if no key is entered, we return early, which ensures no model or embedding operations run without credentials.

🧾 Still inside the same sidebar expander, add this.

      models = MODEL_OPTIONS[model_provider]["models"]
      model = st.selectbox("🧠 Select a model", models, key="model")

The available model options are loaded based on the selected provider. For Groq, it includes LLaMA variants, and for Gemini, it includes Gemini 2.0 and 2.5.

3. 📥 PDF Upload and Submission

🧾 Still inside the same sidebar expander, add this.

      uploaded_files = st.file_uploader(
        "📚 Upload PDFs",
        type=["pdf"],
        accept_multiple_files=True,
        key="pdf_uploader"
      )

      if uploaded_files and uploaded_files != st.session_state.pdf_files:
        st.session_state.unsubmitted_files = True

Users can upload one or more PDFs. If the uploaded files differ from the current state, we flag them as “unsubmitted.” This allows us to prompt users later to submit their files explicitly, avoiding silent reprocessing.

🧾 Still inside the same sidebar expander, add this.

      if st.button("➡️ Submit"):
        if uploaded_files:
          with st.spinner("Processing PDFs..."):
            process_and_store_pdfs(uploaded_files, model_provider, api_key)
            st.session_state.pdf_files = uploaded_files
            st.session_state.unsubmitted_files = False
            st.toast("PDFs processed successfully!", icon="✅")
        else:
          st.warning("No files uploaded.")

When the user clicks Submit, we process the uploaded files and generate their vector representation. This is done only after confirmation to prevent accidental reprocessing or loading large documents unintentionally.

3.1 ⚙️ Utility: `process_and_store_pdfs()`

🧾 Add this function above your main() function.

def process_and_store_pdfs(pdfs, provider, api_key):
  raw_text = get_pdf_text(pdfs)
  chunks = get_text_chunks(raw_text)
  store = get_vectorstore(chunks, provider, api_key)
  st.session_state.vector_store = store
  st.session_state.pdfs_submitted = True

This function is the core of the ingestion pipeline. It extracts all text from the uploaded PDFs, chunks that text into manageable overlapping segments, embeds them, stores them in a FAISS vectorstore, and then keeps the store in session memory.

3.2 📄 `get_pdf_text()`

🧾 Add this function above your process_and_store_pdfs() function.

def get_pdf_text(pdf_files):
  text = ""
  for file in pdf_files:
    reader = PdfReader(file)
    for page in reader.pages:
      text += page.extract_text() or ""
  return text

This function loops through each uploaded PDF and extracts raw text from every page using PyPDF2. If a page doesn't contain extractable text (like a scanned image), extract_text() may return None, so we use or "" to ensure the process doesn’t fail. The function returns a long string containing all text concatenated together.
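
Because pages are concatenated with no separator, the last word of one page can run straight into the first word of the next. One possible variation, sketched here with a hypothetical `join_pages` helper that is not part of the app, joins the per-page text with a newline and skips empty pages:

```python
def join_pages(page_texts, separator="\n"):
    """Join per-page text, skipping pages where extraction gave None or ''."""
    return separator.join(text for text in page_texts if text)

# The middle entry mimics a scanned page with no extractable text.
pages = [
    "Terms and Conditions",
    None,
    "Refunds will not be issued after 30 days.",
]
combined = join_pages(pages)
print(repr(combined))
```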

Example:
If you upload a PDF with 2 pages:

Page 1: "Terms and Conditions"
Page 2: "Refunds will not be issued after 30 days."

Then this function will return:
"Terms and ConditionsRefunds will not be issued after 30 days."

3.3 ✂️ `get_text_chunks()`

🧾 Add this function below your get_pdf_text() function.

def get_text_chunks(text):
  splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=500)
  return splitter.split_text(text)

This function breaks the extracted text into overlapping chunks using LangChain’s RecursiveCharacterTextSplitter.

The chunk_size is set to 5000 characters, and each chunk overlaps the previous one by 500 characters. This ensures that if important context spans across two chunks, the LLM doesn’t lose meaning due to hard boundaries.

Example:

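Let’s say you have the following text, which is roughly 140 characters long:

"In case of early termination, the lessee shall forfeit the security deposit. A 60-day written notice is mandatory for all cancellations."

With a chunk size of 80 and an overlap of 20, the chunks would look roughly like this (the splitter prefers to break at natural separators, so exact boundaries and overlap lengths vary):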

Chunk 1: "In case of early termination, the lessee shall forfeit the security deposit."
Chunk 2: "shall forfeit the security deposit. A 60-day written notice is mandatory"

This overlap ensures LLMs don’t miss context when answering questions later.
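
The arithmetic behind chunk size and overlap is easier to see with a plain sliding window. This is a simplified sketch, not what RecursiveCharacterTextSplitter does internally: the real splitter tries to break at paragraph, line, and word boundaries, while the hypothetical `sliding_chunks` below cuts at fixed character offsets.

```python
def sliding_chunks(text, chunk_size, overlap):
    """Cut text into fixed-size windows.

    Each window starts (chunk_size - overlap) characters after the previous one,
    so consecutive windows share `overlap` characters.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

text = (
    "In case of early termination, the lessee shall forfeit the security deposit. "
    "A 60-day written notice is mandatory for all cancellations."
)
chunks = sliding_chunks(text, chunk_size=80, overlap=20)
for i, chunk in enumerate(chunks, start=1):
    print(f"Chunk {i} ({len(chunk)} chars): {chunk!r}")
```

The last 20 characters of the first chunk are repeated at the start of the second, which is the overlap at work.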

3.4 🧠 `get_vectorstore()` and `get_embeddings()`

🧾 Add these functions below your get_text_chunks() function.

def get_vectorstore(chunks, provider, api_key):
  embedding = get_embeddings(provider, api_key)
  store = FAISS.from_texts(chunks, embedding)
  store.save_local(f"./data/{provider.lower()}_vector_store.faiss")
  return store

This function creates the actual FAISS vectorstore. It first retrieves the appropriate embedding function using get_embeddings(), applies it to each chunk, and stores the resulting vectors in a FAISS index saved locally.

def get_embeddings(provider, api_key=None):
  if provider.lower() == "groq":
    return HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
  elif provider.lower() == "gemini":
    return GoogleGenerativeAIEmbeddings(
      model="models/embedding-001",
      google_api_key=api_key
    )
  else:
    raise ValueError("Unsupported provider")

Since the Groq integration for LangChain does not yet include an embedding model, we use a general-purpose HuggingFace embedding model called all-MiniLM-L6-v2, which works well for a wide range of semantic search tasks. For Gemini, LangChain provides official support for Google’s embedding API (models/embedding-001), so we use that when Gemini is the selected provider.

4. 🔁 Auto-Reprocess on Provider Change

🧾 Place this right below the Submit button inside the same sidebar block.

      if model_provider != st.session_state.last_provider:
        st.session_state.last_provider = model_provider
        if st.session_state.pdf_files:
          with st.spinner("Reprocessing PDFs..."):
            process_and_store_pdfs(st.session_state.pdf_files, model_provider, api_key)
            st.toast("PDFs reprocessed successfully!", icon="🔁")

If the user changes the provider (e.g., from Groq to Gemini), the system automatically reprocesses existing PDFs using the new embedding model. This prevents mismatched embeddings and ensures consistency.
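
To illustrate what "mismatched embeddings" means, here is a small guard that refuses to compare vectors of different sizes. `ensure_same_dimension` is a hypothetical helper, and the 768 figure is an assumed size used only for contrast.

```python
def ensure_same_dimension(index_dim, query_vector):
    """Raise if a query vector cannot be compared with the stored vectors."""
    if len(query_vector) != index_dim:
        raise ValueError(
            f"Index holds {index_dim}-dimensional vectors, "
            f"but the query has {len(query_vector)} dimensions."
        )

# all-MiniLM-L6-v2 produces 384-dimensional vectors; 768 is an assumed size
# for a second provider's embedding model, used here only for illustration.
index_dim = 384
ensure_same_dimension(index_dim, [0.0] * 384)  # same model: accepted

try:
    ensure_same_dimension(index_dim, [0.0] * 768)  # different model: rejected
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```

Even when two models happen to share a dimension, their vectors live in different spaces, so rebuilding the index after a provider switch is the safe choice.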

5. 🛠 Sidebar Tools

🧾 Still inside the sidebar but outside the config expander, add this.

    with st.expander("🛠️ Tools", expanded=False):
      col1, col2, col3 = st.columns(3)

      if col1.button("🔄 Reset"):
        st.session_state.clear()
        st.session_state.model_provider = "Select a model provider"
        st.rerun()

      if col2.button("🧹 Clear Chat"):
        st.session_state.chat_history = []
        st.session_state.pdf_files = []
        st.session_state.vector_store = None
        st.session_state.pdfs_submitted = False
        st.toast("Chat and PDF cleared.", icon="🧼")

      if col3.button("↩️ Undo") and st.session_state.chat_history:
        st.session_state.chat_history.pop()
        st.rerun()

The following tools allow users to reset, clear, or undo chat state, making the UI much more usable and fault-tolerant:

Reset: Clears everything and resets the dropdown
Clear Chat: Removes chat and uploaded data
Undo: Removes the last chat interaction only

6. 📎 Show Uploaded File List

🧾 Add this function below your process_and_store_pdfs() function.

def render_uploaded_files():
  pdf_files = st.session_state.get("pdf_files", [])
  if pdf_files:
    with st.expander("**📎 Uploaded Files:**"):
      for f in pdf_files:
        st.markdown(f"- {f.name}")

The render_uploaded_files() function shows the names of all submitted PDF files in a collapsible section. It gives users quick visual confirmation of which files are currently active in the chatbot.

We only call this function after the PDFs have been submitted and processed, using:

🧾 Place this outside the sidebar, right below the entire sidebar block in main().

  if st.session_state.pdfs_submitted and st.session_state.pdf_files:
    render_uploaded_files()

This avoids showing the file list prematurely or when the files are not yet embedded, keeping the UI clean and relevant.

7. 📖 Show Chat History

🧾 Add this inside main() just after uploaded files list.

  for q, a, *_ in st.session_state.chat_history:
    with st.chat_message("user"):
      st.markdown(q)
    with st.chat_message("ai"):
      st.markdown(a)

This loop recreates all past questions and responses in chat-style bubbles.

8. ⚠️ Warn About Unsubmitted Files

🧾 Add this inside main() after chat history.

  if st.session_state.unsubmitted_files:
    st.warning("📄 New PDFs uploaded. Please submit before chatting.")
    return

This prevents the user from asking questions before submitting the newly uploaded PDFs, ensuring only processed documents are used for answering.

9. 💬 Chat Input and Answer Generation

🧾 Add this next inside main() after the unsubmitted check.

  if st.session_state.pdfs_submitted:
    question = st.chat_input("💬 Ask a Question from the PDF Files")
    if question:
      with st.chat_message("user"):
        st.markdown(question)
      with st.chat_message("ai"):
        with st.spinner("Thinking..."):
          try:
            docs = st.session_state.vector_store.similarity_search(question)
            chain = get_qa_chain(model_provider, model, api_key)
            output = chain(
              {"input_documents": docs, "question": question},
              return_only_outputs=True
            )["output_text"]
            st.markdown(output)
            pdf_names = [f.name for f in st.session_state.pdf_files]
            st.session_state.chat_history.append(
              (question, output, model_provider, model, pdf_names, datetime.now())
            )
          except Exception as e:
            st.error(f"Error: {str(e)}")
  else:
    st.info("📄 Please upload and submit PDFs to start chatting.")

This section renders the chat input only after PDFs have been successfully submitted. When a user asks a question, it performs a similarity search over the FAISS vector store to find relevant document chunks. It then sends those chunks and the user’s question to the selected LLM (Groq or Gemini) using a prompt chain, and returns a detailed answer. The result, along with metadata, is saved into session state for chat history and optional download.

🧠 9.1 `get_qa_chain()`

🧾 Add this function below your get_vectorstore() function.

def get_qa_chain(provider, model, api_key):
  prompt = PromptTemplate(
    template="""
    Answer the question as detailed as possible.
    If the question cannot be answered using the provided context, please say "I don't know."

    Context:
    {context}

    Question:
    {question}?

    Answer:
    """,
    input_variables=["context", "question"]
  )
  llm = ChatGroq(model=model, api_key=api_key) if provider.lower() == "groq" else ChatGoogleGenerativeAI(model=model, api_key=api_key)
  return load_qa_chain(llm, chain_type="stuff", prompt=prompt)

This utility creates a custom QA chain using LangChain’s load_qa_chain method. The chain is configured to respond strictly based on context, and not to hallucinate when the answer isn’t found. It supports both Groq and Gemini as LLM backends, selecting the appropriate one based on the provider.
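
The "stuff" chain type simply stuffs every retrieved chunk into the {context} slot of one prompt. The sketch below imitates that with plain string formatting. `stuff_prompt` is a hypothetical helper, and joining chunks with a blank line is an assumption about the chain’s default separator.

```python
TEMPLATE = (
    "Answer the question as detailed as possible.\n"
    "If the question cannot be answered using the provided context, "
    'please say "I don\'t know."\n\n'
    "Context:\n{context}\n\n"
    "Question:\n{question}\n\n"
    "Answer:\n"
)

def stuff_prompt(docs, question):
    """Join every retrieved chunk into one context block and fill the template."""
    context = "\n\n".join(docs)  # assumed separator between chunks
    return TEMPLATE.format(context=context, question=question)

docs = [
    "The lease term ends on June 30, 2025.",
    "Authorized representative: John Doe",
]
final_prompt = stuff_prompt(docs, "Who signed the agreement?")
print(final_prompt)
```

This works well while the retrieved chunks fit in the model’s context window; with many large chunks the prompt can grow past that limit.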

10. 💾 Download Chat History

🧾 Add this function below your render_uploaded_files() function.

def render_download_chat_history():
  df = pd.DataFrame(
    st.session_state.chat_history,
    columns=["Question", "Answer", "Model", "Model Name", "PDF File", "Timestamp"]
  )
  with st.sidebar.expander("**📎 Download Chat History:**"):
    st.download_button(
      "📥 Download Chat History",
      data=df.to_csv(index=False),
      file_name="chat_history.csv",
      mime="text/csv"
    )

This function creates a downloadable CSV file containing the full chat history. It uses Pandas to format the data and adds a download button in the sidebar for users to save their conversations along with model and file metadata.

🧾 At the bottom of main(), add this.

  if st.session_state.chat_history:
    render_download_chat_history()

This checks if there’s any chat history and, if so, calls the utility function to render the download option.

💡 What This Does

If any chat has occurred, a new expander shows up in the sidebar.
Users can download a .csv file with all conversation metadata:
- Questions, Answers
- Model Provider and Model Name
- PDF files used
- Timestamp of each entry

✅ Final Behavior Example

A downloaded CSV might look like:

Question	Answer	Model	Model Name	PDF File	Timestamp
What’s the lease end date?	June 30, 2025	Groq	llama-3.1-8b-instant	lease.pdf	2025-07-04 14:02:00
Who signed the agreement?	John Doe	Groq	llama-3.1-8b-instant	lease.pdf	2025-07-04 14:03:15
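
Each history entry stores the PDF names as a list, which a DataFrame export would likely write in its Python list form. If a cleaner cell is preferred, one option is to build the CSV with the standard library and join the names first. `history_to_csv` is a hypothetical alternative to the Pandas-based function, shown only as a sketch.

```python
import csv
import io
from datetime import datetime

COLUMNS = ["Question", "Answer", "Model", "Model Name", "PDF File", "Timestamp"]

def history_to_csv(chat_history):
    """Serialize chat history tuples to CSV text, one row per question."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for question, answer, provider, model, pdf_names, asked_at in chat_history:
        writer.writerow([
            question,
            answer,
            provider,
            model,
            "; ".join(pdf_names),  # flatten the list of file names
            asked_at.strftime("%Y-%m-%d %H:%M:%S"),
        ])
    return buffer.getvalue()

history = [
    (
        "What's the lease end date?",
        "June 30, 2025",
        "Groq",
        "llama-3.1-8b-instant",
        ["lease.pdf"],
        datetime(2025, 7, 4, 14, 2, 0),
    ),
]
csv_text = history_to_csv(history)
print(csv_text)
```

The returned string can be passed as the data argument of the download button in place of df.to_csv(index=False).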

11. 🎬 Launch the Script

🧾 Finally, call main() at the bottom of your file.

if __name__ == "__main__":
  main()

This standard Python entry point ensures the app runs when app.py is executed directly.

12. 🏁 How to Run the Code

Once everything is set up, running the app is simple. Just use:

streamlit run app.py

This command will start a local web server and open the chatbot in your browser - usually at http://localhost:8501.

From there, you’ll be able to:

Upload one or more PDFs
Choose a model provider (Groq or Gemini)
Enter your API key
Start asking questions based on your documents

🧠 If the browser doesn’t open automatically, just copy the localhost URL from the terminal and paste it into your browser.

💭 Final Thoughts

That’s it - our chatbot is ready to rock.

It reads PDFs, finds context using RAG, and gives relevant answers using your choice of LLM - all from a clean, modular setup that’s easy to extend.

Here are some ideas you can explore next:

🧠 Add memory so it remembers past messages
📁 Support multiple PDFs at once
🌐 Deploy it on the web (Streamlit Cloud or Hugging Face Spaces)
🔌 Try swapping in different LLMs like Claude or Mistral
🧪 Add advanced features like source highlighting or confidence scores

This isn’t just a chatbot - it’s a real-world template we can build upon to create context-aware AI tools. No retraining, no black-box magic. Just good engineering and the right tools.

So let’s fork it, build on it, break it, fix it - and make it ours.

Happy building. 🛠️

Building an AI Chatbot with LangChain, FastAPI & Streamlit

Zarrar Shaikh — Wed, 02 Jul 2025 18:33:07 +0000

In this comprehensive guide, we’ll build a robust and modular AI chatbot from scratch using powerful, free-to-use large language models (LLMs) like Groq and Gemini. We’ll integrate optional web search functionality to enhance the chatbot’s responses.

Throughout this project, we’ll learn how to:

Dynamically select AI models
Create responsive AI agents
Develop structured backend services
Design an intuitive frontend user interface

By the end of this guide, we’ll have a solid understanding of how to integrate AI models into practical applications and how to maintain code modularity, enabling easy extension and future-proofing of our projects.

Key Technologies Explained

LLM (Large Language Model): Advanced AI models capable of understanding context and generating human-like text.
Groq: Provides extremely fast inference using open‑source models like LLaMA. It’s free to use and ideal for rapid prototyping.
Gemini: Google’s advanced LLM with strong performance in reasoning and dialogue tasks.
LangChain: Simplifies the use of LLMs by providing tools for prompts, memory management, and chaining models with various tools and APIs.
LangGraph: A LangChain extension to structure and manage stateful AI agents, allowing complex decision‑making processes.
FastAPI: A modern, high‑performance Python framework for quickly building robust APIs. It’s intuitive, fast, and easy to learn.
Streamlit: Enables us to build interactive and attractive web applications quickly and effortlessly using Python.

Why Groq and Gemini and not OpenAI?

We chose Groq and Gemini instead of OpenAI primarily because OpenAI models are not available for free. Groq and Gemini, on the other hand, provide free access to advanced, capable language models like LLaMA and Google’s own Gemini models, making them perfect for experimenting, learning, and prototyping without worrying about burning through our pocket money and living like a hermit for the rest of the month.

Project Overview

System Design

Code Structure

You can find the source code here - Zlash65/agentic-ai-chatbot-example

agentic-ai-chatbot-example/
├── .env
├── requirements.txt
├── main.py                     # FastAPI entry point
├── agents/                     # Phase 1 - AI logic
│   ├── llm_provider.py         # Select LLM provider
│   ├── tools.py                # Tavily search tool
│   └── ai_agent.py             # Build LangGraph agent
├── backend/                    # Phase 2 - Backend API
│   ├── config.py               # Load env vars
│   ├── schema.py               # Pydantic schema
│   └── router.py               # /chat endpoint
├── frontend/                   # Phase 3 - Streamlit UI
│   └── streamlit_app.py        # Streamlit chat UI

Phase 1 - AI Agent Configuration

`llm_provider.py`

This code snippet dynamically chooses the appropriate AI model provider (Groq or Gemini). We structured this code to be modular, allowing easy future addition of more LLM providers:

from langchain_groq import ChatGroq
from langchain_google_genai import ChatGoogleGenerativeAI
from backend.config import GROQ_API_KEY, GOOGLE_API_KEY

def get_llm(model_provider: str, model_name: str):
    if model_provider == "groq":
        if not GROQ_API_KEY:
            raise ValueError("GROQ_API_KEY is not set")
        return ChatGroq(model=model_name, api_key=GROQ_API_KEY)

    elif model_provider == "gemini":
        if not GOOGLE_API_KEY:
            raise ValueError("GOOGLE_API_KEY is not set")
        return ChatGoogleGenerativeAI(model=model_name, api_key=GOOGLE_API_KEY)
    else:
        raise ValueError(f"Invalid model provider: {model_provider}")
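
As the number of providers grows, the if/elif chain can be swapped for a lookup table. The sketch below shows that idea with stub factories that return strings, so it runs without any API keys. `PROVIDER_FACTORIES`, `make_groq`, `make_gemini`, and `get_llm_from_registry` are hypothetical names, not part of the project.

```python
from typing import Callable, Dict

# Stub factories standing in for ChatGroq / ChatGoogleGenerativeAI.
def make_groq(model_name: str) -> str:
    return f"groq:{model_name}"

def make_gemini(model_name: str) -> str:
    return f"gemini:{model_name}"

# Adding a provider means adding one entry here.
PROVIDER_FACTORIES: Dict[str, Callable[[str], str]] = {
    "groq": make_groq,
    "gemini": make_gemini,
}

def get_llm_from_registry(model_provider: str, model_name: str) -> str:
    """Look up the factory for a provider and build the client with it."""
    factory = PROVIDER_FACTORIES.get(model_provider.lower())
    if factory is None:
        raise ValueError(f"Invalid model provider: {model_provider}")
    return factory(model_name)

llm = get_llm_from_registry("Groq", "llama-3.1-8b-instant")
print(llm)
```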

`tools.py`

We conditionally include a web search tool. This gives users the flexibility to enable or disable web searching as needed:

from langchain_community.tools.tavily_search import TavilySearchResults

def get_tools(allow_search: bool):
    tools = []
    if allow_search:
        tools.append(TavilySearchResults())
    return tools

`ai_agent.py`

This function is the core engine of our AI chatbot. It actually calls the LLM (Groq or Gemini), optionally enables tools like web search, and returns the AI-generated message:

from langgraph.prebuilt import create_react_agent
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from agents.llm_provider import get_llm
from agents.tools import get_tools

def get_response_from_agent(
    model_provider: str,
    model_name: str,
    allow_search: bool,
    user_messages: list[str],
    system_prompt: str
) -> str:
    llm = get_llm(model_provider, model_name)
    tools = get_tools(allow_search)

    agent = create_react_agent(model=llm, tools=tools)

    state = {
        "messages": [
            SystemMessage(content=system_prompt),
            HumanMessage(content=user_messages[-1])
        ]
    }

    response = agent.invoke(state)
    messages = response.get("messages", [])

    ai_messages = [msg.content for msg in messages if isinstance(msg, AIMessage)]
    return ai_messages[-1] if ai_messages else "No response from AI"

Why only the last user message?

We’re simulating a stateless, simple interaction to keep responses fast and reduce LLM token usage.
Passing all previous messages would allow the LLM to remember context and have a longer conversation memory.
We can extend this later by feeding in more HumanMessage and AIMessage history for memory support.
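
One way to add memory later is to send a bounded window of recent messages instead of only the last one. In this sketch plain (role, content) tuples stand in for SystemMessage, HumanMessage, and AIMessage; `build_messages` and its `max_messages` parameter are hypothetical.

```python
def build_messages(system_prompt, history, max_messages=6):
    """Return the system prompt followed by the most recent messages.

    history is a list of (role, content) tuples, oldest first, where role is
    "human" or "ai" and the final entry is the question being asked now.
    """
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    return [("system", system_prompt)] + history[-max_messages:]

history = [
    ("human", "What is FastAPI?"),
    ("ai", "A Python framework for building APIs."),
    ("human", "Does it validate request bodies?"),
]
messages = build_messages("You are a helpful assistant.", history, max_messages=2)
for role, content in messages:
    print(f"{role}: {content}")
```

Capping the window keeps token usage predictable; the trade-off is that anything older than the window is forgotten.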

Phase 2 - Backend Setup (FastAPI)

`config.py`

Securely load environment variables from the .env file:

from dotenv import load_dotenv
import os

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
TAVILY_API_KEY = os.getenv("TAVILY_API_KEY")
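
os.getenv returns None when a variable is missing, so a typo in .env only surfaces later, when a provider is first used. If failing at startup is preferred, a small helper can enforce that. `require_env` and the `DEMO_API_KEY` variable are hypothetical and exist only for this sketch.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Demo only: set a throwaway variable so the sketch runs on its own.
os.environ["DEMO_API_KEY"] = "abc123"
demo_key = require_env("DEMO_API_KEY")
print(demo_key)
```

Keys that are optional, such as the search key when web search is disabled, can keep using os.getenv.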

`schema.py`

Define structured input from the frontend:

from pydantic import BaseModel
from typing import List

class ChatRequest(BaseModel):
    model_name: str
    model_provider: str
    system_prompt: str
    messages: List[str]
    allow_search: bool

`router.py`

from fastapi import APIRouter, HTTPException
from backend.schema import ChatRequest
from agents.ai_agent import get_response_from_agent

router = APIRouter()

ALLOWED_PROVIDERS = ["groq", "gemini"]
ALLOWED_MODELS = [
    "llama-3.1-8b-instant",
    "llama3-70b-8192",
    "gemini-2.0-flash",
    "gemini-2.5-flash"
]

@router.post("/chat")
def chat(request: ChatRequest):
    model_provider = request.model_provider.lower()
    model_name = request.model_name
    allow_search = request.allow_search

    # Reject unsupported input with a 400 response; an uncaught ValueError
    # would reach the client as a 500 Internal Server Error.
    if model_provider not in ALLOWED_PROVIDERS:
        raise HTTPException(
            status_code=400,
            detail=(
                f"Invalid model provider: {request.model_provider}. "
                f"Must be one of {ALLOWED_PROVIDERS}."
            ),
        )
    if model_name not in ALLOWED_MODELS:
        raise HTTPException(
            status_code=400,
            detail=(
                f"Invalid model name: {model_name}. "
                f"Must be one of {ALLOWED_MODELS}."
            ),
        )

    response = get_response_from_agent(
        model_provider=model_provider,
        model_name=model_name,
        allow_search=allow_search,
        user_messages=request.messages,
        system_prompt=request.system_prompt
    )

    return {"response": response}

ALLOWED_PROVIDERS & ALLOWED_MODELS - safety checks to block unsupported data.
We define a POST endpoint at /chat; FastAPI auto‑validates the body against ChatRequest.
The validated data flows into get_response_from_agent(), which handles model init, tools, and response building.
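
The two flat lists accept any allowed model with any allowed provider, so a request pairing "groq" with "gemini-2.0-flash" would pass the checks. A stricter variant keys the models by provider. `ALLOWED` and `validate_choice` are hypothetical names in this sketch, and it raises a plain ValueError to stay framework-free.

```python
ALLOWED = {
    "groq": {"llama-3.1-8b-instant", "llama3-70b-8192"},
    "gemini": {"gemini-2.0-flash", "gemini-2.5-flash"},
}

def validate_choice(model_provider: str, model_name: str):
    """Return the normalized (provider, model) pair or raise ValueError."""
    provider = model_provider.lower()
    if provider not in ALLOWED:
        raise ValueError(
            f"Invalid model provider: {model_provider}. "
            f"Must be one of {sorted(ALLOWED)}."
        )
    if model_name not in ALLOWED[provider]:
        raise ValueError(
            f"Invalid model name for {provider}: {model_name}. "
            f"Must be one of {sorted(ALLOWED[provider])}."
        )
    return provider, model_name

choice = validate_choice("Groq", "llama-3.1-8b-instant")
print(choice)
```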

Phase 3 - Frontend Setup (Streamlit)

`streamlit_app.py`

Find the full code here.

Dynamic Model Selection

To prevent invalid provider‑model combinations, we dynamically update model selections based on the chosen provider:

MODEL_OPTIONS = {
    "Groq": ["llama-3.1-8b-instant", "llama3-70b-8192"],
    "Gemini": ["gemini-2.0-flash", "gemini-2.5-flash"]
}

model_provider = st.selectbox("🔌 Model Provider", list(MODEL_OPTIONS.keys()))
model_choices = MODEL_OPTIONS[model_provider]

if "model_name" in st.session_state and st.session_state.model_name not in model_choices:
    st.session_state.model_name = model_choices[0]

model_name = st.selectbox("🧠 Model Name", model_choices)

Running Our Application

# Terminal tab 1 - backend
source .venv/bin/activate
uvicorn main:app --reload

# Terminal tab 2 - frontend
source .venv/bin/activate
streamlit run frontend/streamlit_app.py

Final Chatbot in Action

🧠 Wrapping Up

By following this detailed guide, we’ve learned how to:

Dynamically integrate multiple powerful AI providers
Construct structured backend services using FastAPI
Create user‑friendly interfaces with Streamlit
Keep our code modular and easy to extend

🔧 What We Can Explore Next

Now that we have a solid foundation, here are a few ideas to level up:

Add context memory - retain previous questions and answers for richer conversations.
Enable streaming responses - stream long responses token by token for a smoother user experience.
Plug in more tools - let the AI do math, call APIs, or browse docs via LangChain tools.
Deploy our app - share your chatbot with the world on Render, Railway, or Hugging Face Spaces.

🚀 Final Thoughts

This project isn’t just a chatbot - it’s a starter kit for any LLM‑powered product. Whether you’re prototyping an assistant, automating research, or experimenting with multi‑agent workflows, this architecture gives you room to grow.

So tweak it, break it, improve it - and most importantly, have fun while learning.

Happy building! 💻✨

Build a Gen-AI Dockerfile Generator with AWS Bedrock, Lambda and Terraform

Zarrar Shaikh — Wed, 02 Jul 2025 13:40:19 +0000

Let’s be honest - AWS is powerful but not always the friendliest thing to set up. What started as a simple curiosity - “Can I generate a Dockerfile using Amazon Bedrock?” - quickly turned into a full-blown, end-to-end project.

In the process, I learned how to:

Set up and use Amazon Bedrock
Create an AWS Lambda function that talks to an LLM
Use Terraform to build and tear down infrastructure automatically
Deal with Bedrock’s quirks and debug confusing prompt behavior

If you’re new to Bedrock, this project is a great way to learn how to actually use it - not just browse through the documentation. We’ll build a real service that accepts a programming language name via an HTTP request and returns a Dockerfile generated by an LLM. All of this is done using AWS Lambda, API Gateway, S3, and Bedrock - with Terraform managing the setup.

Here’s what we’re building:

We’ll go step by step. You don’t need to know everything up front. By the end, you’ll understand how to:

Configure Amazon Bedrock and request model access
Write a Python-based Lambda function that calls Bedrock
Store and retrieve files in S3 securely
Wire everything together with API Gateway
Use Terraform to deploy and manage all of it

Let’s get started.

Create an AWS Account (skip if you already have one)

Go to aws.amazon.com and create an account.
Once you’re in, head to the AWS Management Console.

Enable Bedrock and Request Model Access

In the search bar, type Amazon Bedrock and open it. Choose a region that supports Bedrock models - I used ap-south-1.

In the left sidebar, click Model access and request access to Meta Llama 3 8B Instruct. That’s the model we’ll use to generate the Dockerfile.

Access might take a few minutes to be granted.

You can also use any other model of your choice. I chose this one because of its low price and because I have already been running Llama models locally.

Install Terraform CLI

We’ll use Terraform to set up all the infrastructure - Lambda, API Gateway, IAM roles, and so on. Install Terraform CLI locally:

On macOS:

brew tap hashicorp/tap
brew install hashicorp/tap/terraform

Or follow the instructions for your OS: Install Terraform

Create a Terraform Cloud Account

Go to Terraform Cloud and sign up.

Create a new organization. I named mine zlash65-ai-ml.
Inside the org, create a workspace named aws-bedrock-example.

We’ll connect this workspace to GitHub later to deploy infrastructure from code.

Set Up AWS Credentials for Terraform

In the AWS Console:

Go to IAM > Users
Create a user called Terraform User
Enable programmatic access
Attach the AdministratorAccess policy

This gives Terraform permission to create any AWS resource
If you’re a paranoid person (and rightly so), you can limit the scope to only services you need - like Lambda, S3, IAM, Bedrock, etc.

Now back in Terraform Cloud:

Go to Organization Settings > Variable Sets
Click the Create organization variable set button
Add:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY

Attach the variable set to the aws-bedrock-example workspace.

Why Terraform?

If you’ve never used Terraform before, think of it like this: instead of setting everything up manually in the AWS Console - clicking through pages, configuring settings, creating roles - you write everything you need in a few .tf files. Terraform reads those files and builds the infrastructure for you.

That might sound like extra work upfront, but here’s why it’s actually a huge win:

Trackability: Everything is written in code. You can track it in version control, collaborate with others, and roll back changes if something breaks.
Reusability: Want to recreate the same setup for another region or project? Just tweak a variable and redeploy.
Automation: With a single command, Terraform provisions AWS services like Lambda, API Gateway, S3, IAM roles, and more.
Clean Teardown: Done with your experiment? A simple terraform destroy removes every resource you created. No more forgetting something and getting billed for it.

For this project specifically, Terraform helps us avoid a manual setup that would’ve involved:

Creating a Lambda function and uploading a custom layer
Writing IAM roles and policies manually
Setting up an S3 bucket with the correct permissions
Building an API Gateway that connects to the Lambda function
Wiring it all together

By using Terraform, we’re making this setup:

Faster to build
Easier to understand
Simpler to reproduce and delete

If you’re building anything that involves multiple AWS services, Terraform isn’t just a nice-to-have - it’s a necessity.

Instead of creating Lambda, S3, and API Gateway one by one, we’ll just run a command and let Terraform do the work.

Set Up Project Structure

Create a GitHub repo called aws-bedrock-example. Then clone it:

git clone https://github.com/YOUR_USERNAME/aws-bedrock-example.git
cd aws-bedrock-example

Create this structure:

aws-bedrock-example/
├── .gitignore
├── README.md
├── lambda/
└── terraform/

Add .gitignore and .terraformignore with Python and Terraform-specific ignores.

Write Lambda Code

We’ll use Python and boto3 to talk to Amazon Bedrock.

Inside the lambda/ folder, add the following to requirements.txt:

boto3

Lambda Code: Unpacked

The app.py file is the heart of our project.

Here's a breakdown of what each function does:

✅ generate_dockerfile(language: str) -> str

This function creates a structured prompt and sends it to Amazon Bedrock to generate a Dockerfile based on the language input.

We build a plain prompt that tells the model exactly what to return. Avoiding extra formatting like chat headers helps get a usable output. We’ll use a simplified prompt format (more on that later) -

import json
from datetime import datetime

import boto3
import botocore.config

def generate_dockerfile(language: str) -> str:
  formatted_prompt = f"""
  ONLY generate an ideal Dockerfile for {language} with best practices. Do not provide any explanation.
  Include:
  - Base image
  - Installing dependencies
  - Setting working directory
  - Adding source code
  - Running the application
  """

These parameters control the model’s output length and creativity -

  body = {
    "prompt": formatted_prompt,
    "max_gen_len": 1024,
    "temperature": 0.5,
    "top_p": 0.9
  }
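
To catch a bad parameter before it reaches Bedrock, the body can be built by a small function that validates its inputs. This is only a sketch: build_request_body is a hypothetical helper, and the bounds it enforces (max_gen_len up to 2048, temperature and top_p between 0 and 1) are my understanding of the limits for Meta Llama models on Bedrock, so check them against the model's parameter reference.

```python
import json


def build_request_body(prompt: str, max_gen_len: int = 1024,
                       temperature: float = 0.5, top_p: float = 0.9) -> str:
    """Serialize a Llama-style request body, rejecting out-of-range values early."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Assumed limits; verify against the Bedrock documentation for your model
    if not 1 <= max_gen_len <= 2048:
        raise ValueError("max_gen_len must be between 1 and 2048")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0 and 1")
    return json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
        "top_p": top_p,
    })
```

The returned string can be passed directly as the body argument of invoke_model.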

This block calls Bedrock, reads the JSON response, and extracts the generated Dockerfile -

  try:
    # Long read timeout because generation can take a while; retry transient failures
    bedrock = boto3.client("bedrock-runtime", region_name="ap-south-1",
                           config=botocore.config.Config(read_timeout=300, retries={"max_attempts": 3}))
    response = bedrock.invoke_model(body=json.dumps(body), modelId="meta.llama3-8b-instruct-v1:0")
    # The response body is a stream of JSON bytes
    response_content = response.get("body").read().decode("utf-8")
    response_data = json.loads(response_content)
    print(response_data)
    dockerfile = response_data["generation"]
    return dockerfile
  except Exception as e:
    print("Error generating Dockerfile:", e)
    return ""
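
The parsing step can be separated from the network call so that it is testable with plain bytes. A minimal sketch follows; extract_generation is a hypothetical helper and it assumes the response shape used above, a JSON object with a "generation" field. Unlike the original code, it strips surrounding whitespace and treats a blank generation as a failure.

```python
import json


def extract_generation(raw: bytes) -> str:
    """Return the generated text, or "" if the payload is malformed or empty."""
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return ""
    if not isinstance(data, dict):
        return ""
    text = data.get("generation")
    if not isinstance(text, str) or not text.strip():
        # An empty generation raises no exception, so log why the model stopped
        print("Empty generation:", data.get("stop_reason"), data.get("generation_token_count"))
        return ""
    return text.strip()
```

Inside generate_dockerfile, this could replace the decode, json.loads, and key lookup lines, called as extract_generation(response["body"].read()).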

✅ save_dockerfile(s3_key, s3_bucket, dockerfile)

This function uploads the Dockerfile to S3 and returns a presigned URL.

Uploads the Dockerfile into a folder named dockerfiles/ inside the specified bucket -

def save_dockerfile(s3_key: str, s3_bucket: str, dockerfile: str, expiry: int = 3600) -> str:
  s3 = boto3.client("s3", region_name="ap-south-1", endpoint_url="https://s3.ap-south-1.amazonaws.com")
  try:
    s3.put_object(
      Body=dockerfile,
      Bucket=s3_bucket,
      Key=s3_key,
      ContentType="text/plain"
    )
    print("Dockerfile saved to S3")

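Returns a temporary presigned URL so that anyone with the link can download the file without AWS credentials until it expires (one hour by default) -

    return s3.generate_presigned_url("get_object", Params={"Bucket": s3_bucket, "Key": s3_key}, ExpiresIn=expiry)
  except Exception as e:  # assumed handler: the excerpt omits the clause that closes the try block
    print("Error saving Dockerfile to S3:", e)
    return ""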

✅ handler(event, context)

This is the entrypoint Lambda uses when triggered by API Gateway.

Extracts the language from the incoming POST body -

def handler(event, context):
  event = json.loads(event["body"])
  language = event["language"]
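
The two lines above raise an exception if the body is missing, is not valid JSON, or has no language field. A more defensive variant might look like the sketch below; parse_language is a hypothetical helper, not part of the original app.py.

```python
import json
from typing import Optional


def parse_language(event: dict) -> Optional[str]:
    """Return the requested language, or None if the request body is unusable."""
    try:
        payload = json.loads(event.get("body") or "")
    except (TypeError, json.JSONDecodeError):
        return None
    if not isinstance(payload, dict):
        return None
    language = payload.get("language")
    if not isinstance(language, str) or not language.strip():
        return None
    return language.strip()
```

The handler could then return a 400 status when parse_language gives None, instead of letting the invocation fail.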

If the Dockerfile was generated successfully, we:

Store it in S3 with a timestamped key
Generate a presigned URL
Return a 200 status with the URL
If something failed, we return a 500 error

  dockerfile = generate_dockerfile(language)
  current_time = datetime.now().strftime("%H-%M-%S")
  s3_bucket = "zlash65-aws-bedrock-example"
  if dockerfile:
    s3_key = f"dockerfiles/{language}-{current_time}.Dockerfile"
    dockerfile_url = save_dockerfile(s3_key, s3_bucket, dockerfile)
    return {
      "statusCode": 200,
      "body": json.dumps({
        "message": "Dockerfile generated",
        "url": dockerfile_url
      })
    }
  else:
    return {
      "statusCode": 500,
      "body": json.dumps({"message": "Failed to generate Dockerfile"})
    }
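
One thing to note about the key above: the timestamp holds only the time of day, so two requests for the same language at the same clock time on different days would overwrite each other. The sketch below adds the date and cleans up the language string before it goes into the key. build_s3_key is a hypothetical helper, and the character whitelist is my own choice.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def build_s3_key(language: str, now: Optional[datetime] = None) -> str:
    """Build a key such as dockerfiles/node-20250702-134019.Dockerfile."""
    now = now or datetime.now(timezone.utc)
    # Collapse anything other than letters, digits, dot, underscore, and hyphen into one hyphen
    safe_language = re.sub(r"[^A-Za-z0-9._-]+", "-", language.strip()).strip("-") or "unknown"
    return f"dockerfiles/{safe_language}-{now.strftime('%Y%m%d-%H%M%S')}.Dockerfile"
```

Passing now explicitly keeps the function deterministic in tests; the handler would call it with only the language.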

Create Lambda Build Script

In the root directory, add build.sh:

#!/bin/bash
rm -rf lambda_build lambda.zip
mkdir -p lambda_build
pip3 install -r lambda/requirements.txt -t lambda_build
cp lambda/app.py lambda_build
cd lambda_build && zip -r ../lambda.zip .
cd ..

Run:

chmod +x build.sh
./build.sh

This script installs dependencies and packages the Lambda function into a zip file for Terraform to deploy.

Write Terraform Code

Inside the terraform/ folder, create the following files and fill them with the linked code:

backends.tf - Connects to your Terraform Cloud workspace.

providers.tf - Sets the AWS provider and region.

variables.tf - Stores common variables like region.

outputs.tf - Outputs the API URL and Lambda function name.

bedrock.tf - This code sets up logging for Amazon Bedrock model invocations using CloudWatch Logs. Basically, if your Bedrock model fails or returns an unexpected response and you don’t see anything in the Lambda logs - this setup helps capture what’s happening behind the scenes in Bedrock itself.

`main.tf`

This file is the central piece of infrastructure setup for the entire project.

🔧 What This File Does Overall

This Terraform file provisions everything needed to expose our Lambda function as an API, give it permissions, and wire it up to Amazon Bedrock and S3. It’s the core infrastructure-as-code file for the project.

Creates a Lambda function that:
- Invokes Amazon Bedrock to generate Dockerfiles
- Stores the output in an S3 bucket
Sets up an S3 bucket to:
- Store the generated Dockerfiles
- Allow presigned access for download
Configures API Gateway to:
- Expose the Lambda as a public-facing HTTP POST endpoint (/generate-dockerfile)
Grants IAM permissions to the Lambda function so it can:
- Write logs to CloudWatch
- Read and write objects in S3
- Call Bedrock models via bedrock:InvokeModel permission
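
To make the permission list concrete, here is roughly what such a policy document looks like, written as a Python dict rather than Terraform. This is an illustration only: the statements and resource ARNs are assumptions based on the bucket, region, and model used in this post, and the actual main.tf may scope things differently.

```python
import json

# Values taken from earlier in this post
BUCKET = "zlash65-aws-bedrock-example"
REGION = "ap-south-1"
MODEL_ID = "meta.llama3-8b-instruct-v1:0"

lambda_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Write logs to CloudWatch
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:*:*:*",
        },
        {   # Read and write objects in the Dockerfile bucket
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {   # Invoke only the one Bedrock model this project uses
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": f"arn:aws:bedrock:{REGION}::foundation-model/{MODEL_ID}",
        },
    ],
}

print(json.dumps(lambda_policy, indent=2))
```

Scoping the Bedrock statement to a single model ARN, rather than "*", keeps the function from invoking models you did not intend to pay for.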

In short: this file is the heart of our project infrastructure. Once applied, we get a fully working, callable API that uses AI (via Bedrock) to return a Dockerfile and give a downloadable link via S3.

Format everything:

cd terraform
terraform fmt

Push Code to GitHub and Deploy

Commit and push your code to GitHub.

git add .
git commit -m "feat: lambda and infra code for aws-bedrock-example"
git push

Then in Terraform Cloud:

Connect the repo to your aws-bedrock-example workspace
Trigger a new run
Review the plan, then click Confirm & Apply

Terraform will create everything: Lambda, Gateway, S3, permissions - all in one go.

Test Using Postman

Once deployed, Terraform will show the API Gateway URL in outputs.

In Postman:

Set method: POST
URL: API Gateway URL from Terraform output
Body: {"language": "node"}
Click Send

If everything went well, you’ll get a presigned URL pointing to your generated Dockerfile in S3.
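
If you would rather test from a script than from Postman, the same request can be sent with the Python standard library. This is a sketch under assumptions: build_request and request_dockerfile_url are hypothetical helpers, the URL you pass in is whatever Terraform printed in its outputs, and the response is assumed to carry the "url" field returned by the handler above.

```python
import json
import urllib.request


def build_request(api_url: str, language: str) -> urllib.request.Request:
    """Prepare the same POST request that the Postman steps describe."""
    payload = json.dumps({"language": language}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_dockerfile_url(api_url: str, language: str, timeout: float = 60.0) -> str:
    """Send the request and return the presigned URL from the JSON response."""
    with urllib.request.urlopen(build_request(api_url, language), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["url"]
```

Calling request_dockerfile_url with your API Gateway URL and "node" should return the same presigned URL that Postman shows.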

Debugging: The Prompt Format Got Me

This part tripped me up for an hour or so.

I followed this official example and wrote my prompt like this:

formatted_prompt = f"""
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
ONLY Generate an ideal Dockerfile for {language}...
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

Looks right, but it didn’t work.

My Lambda ran fine. No errors. But the response from Bedrock was -

{'generation': '', 'prompt_token_count': 64, 'generation_token_count': 1, 'stop_reason': 'stop'}

Turns out this prompt format is for chat-based streaming. What I needed was a simple single-shot prompt:

formatted_prompt = f"""
ONLY generate an ideal Dockerfile for {language} with best practices. Do not provide any explanation.
Include:
- Base image
- Installing dependencies
- Setting working directory
- Adding source code
- Running the application
"""

Once I removed all the fancy tokens, it worked instantly.

This is also why we enabled Bedrock logging in bedrock.tf. If Lambda logs don’t help, Bedrock logs will.

Final Folder Structure

aws-bedrock-example/
├── .gitignore
├── .terraformignore
├── build.sh
├── lambda/
│   ├── app.py
│   └── requirements.txt
├── lambda.zip
├── README.md
└── terraform/
    ├── backends.tf
    ├── bedrock.tf
    ├── main.tf
    ├── outputs.tf
    ├── providers.tf
    └── variables.tf

🧹 Cleanup AWS Resources

Now that our project is complete, it’s time to clean up. And because we used Terraform Cloud, deleting everything is just as easy as provisioning it.

No manual AWS cleanup. No guessing what resources you created. No surprise charges a month later. Just a few clicks.

To destroy all AWS resources via Terraform Cloud:

Go to Terraform Cloud
Open your organization and the workspace you used (e.g., zlash65-ai-ml > aws-bedrock-example)
Go to Workspace Settings > Destruction and Deletion
Click Queue destroy plan
Wait for the plan to complete, then click the Confirm and apply button to remove the resources from AWS

Terraform Cloud will now safely tear down every AWS resource it created - Lambda function, S3 bucket, API Gateway, IAM roles - everything.

⚠️ Important: You won’t be able to recover anything once it’s destroyed, so double-check before proceeding.

All the code is here: Zlash65/aws-bedrock-example

It’s working, simplified, and ready to be cloned.

I hope this gives you a clear, real-world path to start building with Amazon Bedrock. If you run into weird issues—start with the prompt. Always the prompt.
