<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Oswin Heman-Ackah</title>
    <description>The latest articles on Forem by Oswin Heman-Ackah (@ackahman).</description>
    <link>https://forem.com/ackahman</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3505158%2F3982e211-8e28-4def-bc7f-fddc7c6881a1.jpg</url>
      <title>Forem: Oswin Heman-Ackah</title>
      <link>https://forem.com/ackahman</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ackahman"/>
    <language>en</language>
    <item>
      <title>Building a chatbot with Python (Backend)</title>
      <dc:creator>Oswin Heman-Ackah</dc:creator>
      <pubDate>Tue, 16 Sep 2025 08:37:38 +0000</pubDate>
      <link>https://forem.com/ackahman/building-a-chatbot-with-python-backend-119b</link>
      <guid>https://forem.com/ackahman/building-a-chatbot-with-python-backend-119b</guid>
      <description>&lt;p&gt;&lt;strong&gt;Documentation: Backend (backend.py)&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;This script processes PDF documents into vector embeddings and builds a FAISS index for semantic search.&lt;br&gt;
It is the offline preprocessing pipeline for the Indaba RAG chatbot.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Responsibilities&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load PDFs from a folder.&lt;/li&gt;
&lt;li&gt;Extract raw text using PyPDF2.&lt;/li&gt;
&lt;li&gt;Chunk large documents into smaller overlapping text segments.&lt;/li&gt;
&lt;li&gt;Convert chunks into embeddings using SentenceTransformers.&lt;/li&gt;
&lt;li&gt;Build and persist a &lt;em&gt;FAISS index&lt;/em&gt; for similarity search.&lt;/li&gt;
&lt;li&gt;Save the raw chunks for later retrieval.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Step-by-Step Breakdown&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Imports and Setup&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import pickle
import numpy as np
from PyPDF2 import PdfReader
from sentence_transformers import SentenceTransformer
import faiss

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;os&lt;/strong&gt; → file system operations.&lt;br&gt;
&lt;strong&gt;pickle&lt;/strong&gt; → save preprocessed chunks.&lt;br&gt;
&lt;strong&gt;numpy&lt;/strong&gt; → numerical array handling.&lt;br&gt;
&lt;strong&gt;PyPDF2&lt;/strong&gt; → extract text from PDF files.&lt;br&gt;
&lt;strong&gt;SentenceTransformer&lt;/strong&gt; → embedding model (all-MiniLM-L6-v2).&lt;br&gt;
&lt;strong&gt;faiss&lt;/strong&gt; → efficient similarity search.&lt;/p&gt;

&lt;p&gt;Constants:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;embedder = SentenceTransformer("all-MiniLM-L6-v2")
INDEX_FILE = "faiss_index.bin"
CHUNKS_FILE = "chunks.pkl"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;embedder is the model instance; loading it downloads the model weights (the first run may take some time).&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;INDEX_FILE&lt;/em&gt; and &lt;em&gt;CHUNKS_FILE&lt;/em&gt; define where the FAISS index and chunks are saved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Function to Load PDF&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def load_pdf(file_path):
    pdf = PdfReader(file_path)
    text = ""
    for page in pdf.pages:
        # extract_text() can return None (e.g. image-only pages), so guard it
        text += (page.extract_text() or "") + "\n"
    return text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Reads a PDF file with PyPDF2.&lt;/li&gt;
&lt;li&gt;Extracts text page by page (guarding against pages where extract_text() returns None, such as image-only pages).&lt;/li&gt;
&lt;li&gt;Returns the full document text as a single string.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Function for Text Chunking&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def chunk_text(text, chunk_size=500, overlap=100):
    chunks = []
    start = 0
    while start &amp;lt; len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start += chunk_size - overlap
    return chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Splits the text into chunks of chunk_size characters, shifting by chunk_size - overlap each time (so consecutive chunks overlap by overlap characters).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Representation (see the quick check after this list):&lt;br&gt;
Chunk 1 = 0–500&lt;br&gt;
Chunk 2 = 400–900 (100 characters of overlap)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
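
&lt;p&gt;To make the overlap concrete, here is a quick illustrative check (not part of backend.py) that prints the character range each chunk covers:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative only: verify where each chunk starts and ends.
sample = "x" * 1200
for i, chunk in enumerate(chunk_text(sample, chunk_size=500, overlap=100)):
    start = i * (500 - 100)  # each chunk starts 400 characters after the last
    print(f"Chunk {i + 1}: chars {start}-{start + len(chunk)}")
# Chunk 1: chars 0-500
# Chunk 2: chars 400-900
# Chunk 3: chars 800-1200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;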

&lt;p&gt;&lt;strong&gt;Full Pipeline Info&lt;/strong&gt;&lt;br&gt;
Walkthrough of the function:&lt;/p&gt;

&lt;p&gt;1. Collect chunks for all PDFs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pdf_folder = "vault"  # the folder where the PDFs are stored

all_chunks = []
for filename in os.listdir(pdf_folder):
    if filename.endswith(".pdf"):
        text = load_pdf(os.path.join(pdf_folder, filename))
        chunks = chunk_text(text)
        all_chunks.extend(chunks)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Extracts and chunks the text of each PDF, collecting every chunk as a string in the &lt;strong&gt;all_chunks&lt;/strong&gt; list.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Order matters (index ids align with order).&lt;/p&gt;

&lt;p&gt;2. Embed chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vectors = embedder.encode(all_chunks)
vectors = np.array(vectors)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;embedder.encode(list_of_texts) returns an array of vectors. &lt;em&gt;It is normally float32, though this can vary by version and settings&lt;/em&gt; — FAISS expects float32, so in practice it is safer to force the dtype explicitly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Important: embedding all chunks at once can run out of memory (OOM) if you have many chunks. Use batching:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vectors = embedder.encode(all_chunks, batch_size=32, show_progress_bar=True)
vectors = np.array(vectors).astype('float32')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3. Create FAISS index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dim = vectors.shape[1]
index = faiss.IndexFlatL2(dim)
index.add(vectors)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(Basic)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates a FAISS index and adds all chunk vectors into the index. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(Technical)&lt;br&gt;
&lt;strong&gt;IndexFlatL2&lt;/strong&gt; = exact (brute-force) nearest neighbor search using L2 distance. Works for small-to-medium collections. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pros: simple and exact. &lt;/li&gt;
&lt;li&gt;Cons: slow on large collections.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;index.add(vectors)&lt;/strong&gt; adds vectors in the same order as all_chunks. FAISS internal ids = 0..N-1 in that order — that’s how you map back to chunks.&lt;/p&gt;
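
&lt;p&gt;A minimal sketch of that mapping (assuming the index and all_chunks built above): a search returns internal ids, which index straight into the chunk list.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: map FAISS ids back to chunk text (assumes index and all_chunks above).
query_vec = np.array(embedder.encode(["sample query"])).astype("float32")
distances, ids = index.search(query_vec, 3)  # top 3 nearest chunks
for rank, chunk_id in enumerate(ids[0]):
    print(rank, distances[0][rank], all_chunks[chunk_id][:80])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;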

&lt;p&gt;4. Save index and chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;faiss.write_index(index, INDEX_FILE)
with open(CHUNKS_FILE, "wb") as f:
    pickle.dump(all_chunks, f)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Saves FAISS index to faiss_index.bin.&lt;br&gt;
Saves chunks (raw text) to chunks.pkl.&lt;/p&gt;

&lt;p&gt;These files are later loaded by the Streamlit frontend at runtime.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to run this script&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# make sure your PDFs are in the vault/ folder
python -m backend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>programming</category>
      <category>tutorial</category>
      <category>python</category>
      <category>documentation</category>
    </item>
    <item>
      <title>Building a Chatbot with Python (Frontend)</title>
      <dc:creator>Oswin Heman-Ackah</dc:creator>
      <pubDate>Tue, 16 Sep 2025 03:34:38 +0000</pubDate>
      <link>https://forem.com/ackahman/building-a-chatbot-with-python-frontend-2f2c</link>
      <guid>https://forem.com/ackahman/building-a-chatbot-with-python-frontend-2f2c</guid>
      <description>&lt;p&gt;&lt;strong&gt;Documentation: Frontend (frontend.py)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This file defines the Streamlit-based frontend for the Indaba Retrieval-Augmented Generation (RAG) chatbot.&lt;br&gt;
It provides the user interface, manages queries, retrieves relevant chunks from the FAISS index, and generates answers using the Groq LLM API.&lt;br&gt;
It also applies a custom CSS theme for a modern UI.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Responsibilities&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load the FAISS index and pre-processed text chunks.&lt;/li&gt;
&lt;li&gt;Take user input (questions) through a chat-like form.&lt;/li&gt;
&lt;li&gt;Retrieve the most relevant chunks using semantic search.&lt;/li&gt;
&lt;li&gt;Pass retrieved chunks into the Groq-powered LLM to generate answers.&lt;/li&gt;
&lt;li&gt;Display responses in a styled Streamlit interface.&lt;/li&gt;
&lt;li&gt;Provide a button to clear chat history.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step-by-Step Breakdown&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;1. Imports and Setup&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
import streamlit as st
import pickle
import faiss
from sentence_transformers import SentenceTransformer
from groq import Groq
import os
# from dotenv import load_dotenv

# load_dotenv()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;What is happening:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;streamlit&lt;/strong&gt; → UI framework.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pickle&lt;/strong&gt; → loads pre-saved chunks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;faiss&lt;/strong&gt; → similarity search engine for embeddings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SentenceTransformer&lt;/strong&gt; → embedding model (all-MiniLM-L6-v2).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Groq&lt;/strong&gt; → client to access the LLM API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;os&lt;/strong&gt; → (optional) environment variable access.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Unlike local development with a .env file, API keys are pulled from st.secrets for deployment safety; that is why the &lt;strong&gt;dotenv&lt;/strong&gt; import is commented out.&lt;/em&gt;&lt;/p&gt;
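
&lt;p&gt;For reference, st.secrets reads from a TOML file. An illustrative .streamlit/secrets.toml layout matching the st.secrets["grok"]["api_key"] lookup used below (the section and key names are the app's own; the value is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# .streamlit/secrets.toml (illustrative; value is a placeholder)
[grok]
api_key = "your-groq-api-key"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;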

&lt;p&gt;&lt;strong&gt;2. Load FAISS Index and Chunks&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
INDEX_FILE = "faiss_index.bin"
CHUNKS_FILE = "chunks.pkl"
embedder = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")

index = faiss.read_index(INDEX_FILE)
with open(CHUNKS_FILE, "rb") as f:
    chunks = pickle.load(f)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Loads FAISS index (vector database).&lt;/li&gt;
&lt;li&gt;Loads preprocessed document chunks (chunks.pkl).&lt;/li&gt;
&lt;li&gt;Ensures embeddings match the index by using the same model.&lt;/li&gt;
&lt;li&gt;Forces CPU for compatibility on Streamlit Cloud.&lt;/li&gt;
&lt;/ul&gt;
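
&lt;p&gt;One optional guard (not part of frontend.py) is to confirm that the index dimension matches the model's embedding size before serving queries:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Optional guard (not in frontend.py): index and model must agree on dimension.
assert index.d == embedder.get_sentence_embedding_dimension(), \
    "FAISS index and embedding model dimensions do not match"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;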

&lt;p&gt;&lt;strong&gt;3. Initialize Groq Client&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client = Groq(api_key=st.secrets["grok"]["api_key"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Retrieves API key from Streamlit secrets.&lt;/li&gt;
&lt;li&gt;Sets up Groq client for LLM queries.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Semantic Search Function&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def search_index(query, k=10):
    q_vec = embedder.encode([query]).astype("float32")  # FAISS expects float32
    D, I = index.search(q_vec, k)
    return [chunks[i] for i in I[0]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Encodes the query into a vector.&lt;/li&gt;
&lt;li&gt;Searches FAISS for the top k most relevant chunks.&lt;/li&gt;
&lt;li&gt;Returns those chunks for answer generation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. LLM Answer Generation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def generate_answer(question, context_chunks):
    context = "\n\n".join(context_chunks)
    prompt = (
        f"Answer the question based on the context provided. "
        "If the question is not related to the context in any way, do NOT attempt to answer. "
        "Instead, strictly reply: 'My knowledge base does not have information about this. Please contact the technical team.'\n\n"
        f"Context: {context}\n\nQuestion: {question}\nAnswer:"
    )
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="llama-3.3-70b-versatile",
    )
    return response.choices[0].message.content.strip()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Builds a RAG prompt with:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieved context.&lt;/li&gt;
&lt;li&gt;Question.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sends to &lt;strong&gt;Groq’s LLaMA-3.3-70B model&lt;/strong&gt;.&lt;br&gt;
Returns a &lt;strong&gt;clean answer.&lt;/strong&gt;&lt;br&gt;
Enforces that if &lt;strong&gt;no context exists&lt;/strong&gt; → &lt;strong&gt;chatbot says&lt;/strong&gt;:&lt;br&gt;
&lt;em&gt;“My knowledge base does not have information about this. Please contact the technical team.”&lt;/em&gt;&lt;/p&gt;
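
&lt;p&gt;Putting the two functions together, a call looks like this (the question text is just an example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Example wiring of retrieval + generation (question text is illustrative).
retrieved = search_index("What is a bipartite graph?")
print(generate_answer("What is a bipartite graph?", retrieved))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;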

&lt;p&gt;&lt;strong&gt;6. Custom Chat UI (CSS)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;st.markdown(
    """
&amp;lt;style&amp;gt;
...
&amp;lt;/style&amp;gt;
""",
    unsafe_allow_html=True,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Dark theme with neon-blue highlights.&lt;/li&gt;
&lt;li&gt;Styles input box, buttons, and retrieved chunks.&lt;/li&gt;
&lt;li&gt;Enhances readability and adds a polished UI feel.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;7. Streamlit UI: Title &amp;amp; Instructions&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;st.title("🤖 Indaba")
st.write("Ask questions based on Discrete Mathematics.")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Displays app title.&lt;/li&gt;
&lt;li&gt;Provides short instructions to user.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;8. Question Input Form&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with st.form(key="chat_form", clear_on_submit=True):
    question = st.text_input("Your question:", key="question_input")
    submit_button = st.form_submit_button("Send")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Input form for user questions.&lt;/li&gt;
&lt;li&gt;Clears text box on submit.&lt;/li&gt;
&lt;li&gt;Controlled by a Send button.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;9. Main Chat Logic&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if submit_button and question:
    retrieved = search_index(question)
    answer = generate_answer(question, retrieved)

    st.markdown("### 🤖 Answer")
    st.write(answer)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;On submit:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieves relevant chunks.&lt;/li&gt;
&lt;li&gt;Generates answer from Groq LLM.&lt;/li&gt;
&lt;li&gt;Displays result under “🤖 Answer”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;10. Clear Chat Button&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if st.button("Clear Chat"):
    st.session_state.messages = []
    st.rerun()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Resets chat state.&lt;/li&gt;
&lt;li&gt;Reruns app for a fresh session.&lt;/li&gt;
&lt;/ul&gt;
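
&lt;p&gt;Note that the button resets st.session_state.messages, which the excerpt above never populates. A minimal sketch of maintaining such a history (an assumed pattern, not shown in the original code):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Assumed pattern: keep a chat history so "Clear Chat" has something to reset.
if "messages" not in st.session_state:
    st.session_state.messages = []

if submit_button and question:
    retrieved = search_index(question)
    answer = generate_answer(question, retrieved)
    st.session_state.messages.append(("user", question))
    st.session_state.messages.append(("bot", answer))

for role, text in st.session_state.messages:
    st.markdown(f"**{role}:** {text}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;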

&lt;p&gt;&lt;strong&gt;Workflow (Frontend)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User submits a question.&lt;/li&gt;
&lt;li&gt;Query is embedded and searched in FAISS.&lt;/li&gt;
&lt;li&gt;Top chunks are retrieved.&lt;/li&gt;
&lt;li&gt;Chunks + question are passed into LLM.&lt;/li&gt;
&lt;li&gt;LLM generates an answer based only on retrieved knowledge.&lt;/li&gt;
&lt;li&gt;Answer is displayed in a styled interface.&lt;/li&gt;
&lt;li&gt;User can clear chat anytime.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Other Notes&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The frontend relies on the backend (backend.py) having run beforehand; it does not rebuild the index.&lt;/li&gt;
&lt;li&gt;Chat memory is session-based only (it clears on refresh).&lt;/li&gt;
&lt;li&gt;When running locally, the GROQ_API_KEY environment variable must be set in a local .env file.&lt;/li&gt;
&lt;li&gt;During deployment, the key is instead set in Streamlit secrets as ["grok"]["api_key"].&lt;/li&gt;
&lt;/ol&gt;
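
&lt;p&gt;For local runs, the commented-out dotenv lines from the imports can be re-enabled, roughly like this (a sketch, assuming python-dotenv is installed and .env contains GROQ_API_KEY):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Local-development sketch: read the key from .env instead of st.secrets.
from dotenv import load_dotenv
from groq import Groq
import os

load_dotenv()  # loads GROQ_API_KEY from the local .env file
client = Groq(api_key=os.getenv("GROQ_API_KEY"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;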

</description>
      <category>programming</category>
      <category>python</category>
      <category>documentation</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
