<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Soham Sharma</title>
    <description>The latest articles on Forem by Soham Sharma (@sohamactive).</description>
    <link>https://forem.com/sohamactive</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3410188%2F7172c825-50d0-469a-8c70-6e8975334e98.jpeg</url>
      <title>Forem: Soham Sharma</title>
      <link>https://forem.com/sohamactive</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sohamactive"/>
    <language>en</language>
    <item>
      <title>From Hallucinations to Grounded AI: Building a Gemini RAG System with Qdrant</title>
      <dc:creator>Soham Sharma</dc:creator>
      <pubDate>Wed, 04 Mar 2026 16:48:16 +0000</pubDate>
      <link>https://forem.com/sohamactive/from-hallucinations-to-grounded-ai-building-a-gemini-rag-system-with-qdrant-ae5</link>
      <guid>https://forem.com/sohamactive/from-hallucinations-to-grounded-ai-building-a-gemini-rag-system-with-qdrant-ae5</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mlh-built-with-google-gemini-02-25-26"&gt;Built with Google Gemini: Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built with Google Gemini
&lt;/h2&gt;

&lt;p&gt;Large Language Models are powerful, but they still struggle with one major issue — hallucinations.&lt;/p&gt;

&lt;p&gt;While building AI assistants, I often found that models could generate answers that sounded convincing but were not actually grounded in real data. That led me to explore &lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; and build a system that allows Gemini to answer questions using real documents instead of guesses.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Large Language Models are incredibly powerful, but they have a well-known limitation: they can generate answers that sound convincing but are actually wrong. This behavior is called &lt;strong&gt;AI hallucination&lt;/strong&gt;, where a model produces fluent text that is not grounded in real facts or evidence.&lt;/p&gt;

&lt;p&gt;These hallucinations don’t happen randomly — they usually occur because of structural limitations in how LLMs work.&lt;/p&gt;

&lt;p&gt;Some common causes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Limited context window&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
LLMs can only “remember” a fixed number of tokens in a conversation. When the context becomes too long, earlier information may drop out of the window, causing the model to lose important instructions or details.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Long or complex documents&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
When documents are very large, the model may struggle to reason over the entire content and can miss dependencies between different parts of the text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Outdated training knowledge&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
LLMs rely on training data collected at a specific point in time. If new information appears after that, the model may provide answers based on stale or incomplete knowledge.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Probabilistic text generation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Language models generate responses by predicting the most likely next token rather than verifying facts, which can lead to confident but incorrect outputs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1l2kcwq5uz3gqftck43.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1l2kcwq5uz3gqftck43.png" alt="Illustration showing AI hallucination and developer frustration"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Why Retrieval-Augmented Generation (RAG)
&lt;/h3&gt;

&lt;p&gt;For applications like &lt;strong&gt;document search, knowledge assistants, or research tools&lt;/strong&gt;, these limitations become a serious problem. Users need answers that are grounded in real documents, not guesses.&lt;/p&gt;

&lt;p&gt;This challenge led me to explore &lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; — a technique that helps language models answer questions using real data instead of relying only on what they remember.&lt;/p&gt;

&lt;p&gt;Instead of relying only on the model’s training data, a RAG system first retrieves relevant information from external documents and then uses that information as context when generating an answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66jn26ro5dwbhls8ct0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F66jn26ro5dwbhls8ct0g.png" alt="Before vs After illustration showing LLM vs RAG system"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The idea is simple: rather than asking the model to rely purely on memory, we give it access to the correct information at the moment it generates the response.&lt;/p&gt;

&lt;p&gt;By grounding responses in retrieved documents, RAG systems help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reduce hallucinations&lt;/li&gt;
&lt;li&gt;provide answers based on real data&lt;/li&gt;
&lt;li&gt;work with private or domain-specific knowledge&lt;/li&gt;
&lt;li&gt;keep information up-to-date without retraining the model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsawafeht1qgaba73sgo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhsawafeht1qgaba73sgo.png" alt="Visual explanation of the RAG pipeline"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  System Architecture
&lt;/h3&gt;

&lt;p&gt;The system is built as a &lt;strong&gt;Retrieval-Augmented Generation (RAG) pipeline&lt;/strong&gt; using &lt;strong&gt;FastAPI&lt;/strong&gt;, &lt;strong&gt;Google Gemini (for both LLMs and embeddings)&lt;/strong&gt;, and &lt;strong&gt;Qdrant (as the vector database)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Users upload documents, which are processed into embeddings and stored in a vector database. When a query is made, relevant document chunks are retrieved and used as context for Gemini to generate a grounded response.&lt;/p&gt;




&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;HTML, CSS, JavaScript&lt;/td&gt;
&lt;td&gt;User interface for uploading PDFs and interacting with the chatbot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Server&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FastAPI&lt;/td&gt;
&lt;td&gt;Handles routes, request processing, and serves the frontend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embedding Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini &lt;code&gt;gemini-embedding-001&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Converts queries and documents into vector embeddings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qdrant&lt;/td&gt;
&lt;td&gt;Stores embeddings and retrieves similar document chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini Flash models&lt;/td&gt;
&lt;td&gt;Generates answers based on retrieved context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Backend Modules
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;main.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Defines endpoints (&lt;code&gt;/upload&lt;/code&gt;, &lt;code&gt;/search&lt;/code&gt;, &lt;code&gt;/chat&lt;/code&gt;) and initializes services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Document Ingestion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ingest.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Extracts PDF text, cleans it, and splits it into overlapping chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embeddings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;embeddings.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generates embeddings for queries and document chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector DB Utility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;qdrant_utils.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Handles Qdrant connection and collection initialization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Search Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;search.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Performs retrieval and generates answers using Gemini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chat Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;chat.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Implements conversational RAG with memory and streaming responses&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
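&lt;p&gt;The ingestion module's key trick is splitting text into &lt;em&gt;overlapping&lt;/em&gt; chunks, so that a sentence cut at a chunk boundary still appears whole in the neighboring chunk. A dependency-free sketch of the idea (the sizes here are illustrative assumptions, not the exact settings in &lt;code&gt;ingest.py&lt;/code&gt;):&lt;/p&gt;

```python
# Sketch of overlapping chunking, as done conceptually in ingest.py.
# chunk_size and overlap are illustrative, not the project's real values.
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into chunks of up to chunk_size characters,
    each sharing `overlap` characters with the previous chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```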




&lt;h3&gt;
  
  
  Data Flow
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Process&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;User uploads a PDF document&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Text is extracted and split into chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chunks are converted into embeddings using Gemini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Embeddings and text are stored in Qdrant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;User submits a query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query is embedded and matched against stored vectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;7&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Relevant document chunks are retrieved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gemini generates an answer using the retrieved context&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
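&lt;p&gt;To make steps 5–7 concrete: vector search boils down to embedding the query and ranking stored chunks by similarity. In the real system Qdrant does this at scale over Gemini embeddings; the toy 3-dimensional vectors below are stand-ins just to show the mechanics:&lt;/p&gt;

```python
import math

# Toy illustration of steps 5-7: cosine-similarity top-k retrieval.
# The 3-dim vectors are stand-ins for real Gemini embeddings.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, store, k=2):
    """store: list of (chunk_text, vector) pairs; returns best-matching texts."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    scored.sort(reverse=True)
    return [text for score, text in scored[:k]]

store = [
    ("Qdrant stores embeddings.",    [0.9, 0.1, 0.0]),
    ("Gemini generates the answer.", [0.1, 0.9, 0.1]),
    ("PDFs are split into chunks.",  [0.0, 0.2, 0.9]),
]
# A query vector close to the first chunk's embedding retrieves it first.
context = top_k([0.85, 0.15, 0.05], store, k=2)
```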




&lt;h3&gt;
  
  
  Tech Arsenal
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;FastAPI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;Google Gemini Flash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embeddings&lt;/td&gt;
&lt;td&gt;Gemini &lt;code&gt;gemini-embedding-001&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector Database&lt;/td&gt;
&lt;td&gt;Qdrant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Framework&lt;/td&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;HTML, CSS, JavaScript&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  System Workflow
&lt;/h3&gt;

&lt;h5&gt;
  
  
  1) Overall RAG System Workflow
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11etxz7tureqe0lyuv94.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11etxz7tureqe0lyuv94.png" alt="Overall RAG system workflow diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  2) Document Ingestion Pipeline
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6043ebd6jw77da7knb2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6043ebd6jw77da7knb2f.png" alt="Document ingestion pipeline diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  3) Query Retrieval Pipeline
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x6kv338p2ha8bqn233t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x6kv338p2ha8bqn233t.png" alt="Query retrieval pipeline diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  4) Conversational Chat Flow
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhn88dczzgfu8t5nturcb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhn88dczzgfu8t5nturcb.png" alt="Conversational chat flow diagram"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;To make the system easier to explore, I built a simple web interface where users can upload documents and interact with the RAG system in real time.&lt;/p&gt;

&lt;h4&gt;
  
  
  Interaction
&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;Note: the hosted demo may respond slowly or behave inconsistently because it is deployed on Render.&lt;/em&gt;&lt;/p&gt;

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://rag-miee.onrender.com/" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;rag-miee.onrender.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;h4&gt;
  
  
  Video
&lt;/h4&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/rNZZKdj1-MI"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h4&gt;
  
  
  GitHub Repo
&lt;/h4&gt;

&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/Sohamactive" rel="noopener noreferrer"&gt;
        Sohamactive
      &lt;/a&gt; / &lt;a href="https://github.com/Sohamactive/Rag-implementation-using-qdrant-gemini" rel="noopener noreferrer"&gt;
        Rag-implementation-using-qdrant-gemini
      &lt;/a&gt;
    &lt;/h2&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Gemini-Qdrant RAG Backend&lt;/h2&gt;

&lt;/div&gt;
&lt;p&gt;This is a production-ready RAG (Retrieval-Augmented Generation) backend built with FastAPI. It uses Google Gemini Embeddings for high-quality semantic search and Qdrant as the vector database.&lt;/p&gt;
&lt;p&gt;The pipeline is non-blocking, designed for concurrent requests, and optimized for indexing documents.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;⚙️ Setup and Installation&lt;/h2&gt;

&lt;/div&gt;
&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;&lt;pre class="notranslate"&gt;&lt;code&gt;python -m venv .venv
.venv/scripts/activate
pip install -r requirements.txt
fastapi dev backend/main.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file in the root of your project directory and set your access keys and database URL.&lt;/p&gt;
&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;&lt;pre class="notranslate"&gt;&lt;code&gt;# Gemini API Key (Required for embedding and generation)
GEMINI_API_KEY="YOUR_GEMINI_API_KEY_HERE"

# Qdrant Vector Database Connection
QDRANT_URL="http://localhost:6333"
QDRANT_API_KEY="" # Use only if your Qdrant instance requires it
QDRANT_COLLECTION="rag_documents_768"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Future endeavors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;[ ] Implementing GraphRAG, PageIndex, and other RAG variants&lt;/li&gt;
&lt;li&gt;[ ] Deploying a full-fledged app&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;



&lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/Sohamactive/Rag-implementation-using-qdrant-gemini" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;








&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;This project helped me understand how &lt;strong&gt;RAG actually works in practice&lt;/strong&gt;, not just in theory. Building the pipeline made me see how document ingestion, chunking, embeddings, vector search, and LLM generation all connect together. 🧩🤖&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I also learned how &lt;strong&gt;embeddings convert text into vectors&lt;/strong&gt; and why they are essential for semantic search. 🔢🔍&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Working with &lt;strong&gt;vector databases like Qdrant&lt;/strong&gt; helped me understand how similarity search and top-k retrieval power document-based AI systems. 🗂️⚡&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And honestly, &lt;strong&gt;working with documentation can sometimes be a headache&lt;/strong&gt; 🥲. A lot of the learning came from experimenting, debugging, and figuring things out along the way. 🛠️😇&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Future Endeavours ⛰️
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Adding an authentication and user management system to manage chats and create &lt;strong&gt;separate databases for each user&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Adding support for &lt;strong&gt;different database services like MongoDB&lt;/strong&gt;, both local and cloud-based.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Using &lt;strong&gt;local LLMs&lt;/strong&gt; to minimize inference costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Experimenting with &lt;strong&gt;advanced RAG architectures&lt;/strong&gt; such as &lt;a href="https://github.com/VectifyAI/PageIndex" rel="noopener noreferrer"&gt;PageIndex&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deploying the system properly on &lt;strong&gt;cloud platforms like AWS, Azure, or GCP&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Google Gemini Feedback
&lt;/h2&gt;

&lt;h4&gt;
  
  
  LLM API — Smooth Experience 👍👍
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Calling the Gemini generative models was straightforward.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;google-genai&lt;/code&gt; SDK has a clean interface. Methods like:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client.models.generate_content()
client.models.embed_content()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;feel intuitive, and getting a working prototype running was quick.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The models themselves performed well for RAG use cases. Once retrieved document chunks were provided as context, Gemini produced grounded responses and streaming worked reliably in conversational flows.&lt;/li&gt;
&lt;/ul&gt;
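&lt;p&gt;For reference, a grounded call looks roughly like this. The prompt template and model name are my assumptions rather than the project's exact code, and the API call only runs when a key is present in the environment:&lt;/p&gt;

```python
import os

# Sketch of grounded generation: retrieved chunks are stuffed into the
# prompt so Gemini answers from evidence. The prompt wording below is an
# assumption, not the project's actual template.
def build_grounded_prompt(chunks, question):
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = build_grounded_prompt(
    ["Qdrant stores vector embeddings."], "What does Qdrant store?"
)

if os.environ.get("GEMINI_API_KEY"):  # only call the API when a key exists
    from google import genai
    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-2.5-flash",  # assumed Flash model id
        contents=prompt,
    )
    print(response.text)
```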




&lt;h4&gt;
  
  
  Embeddings API — Documentation Could Be Clearer
&lt;/h4&gt;

&lt;p&gt;The embeddings API worked well overall, but the documentation could be easier to navigate.&lt;/p&gt;

&lt;p&gt;While integrating &lt;code&gt;gemini-embedding-001&lt;/code&gt;, some configuration details were not immediately obvious or well explained, especially parameters like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;task_type = "RETRIEVAL_QUERY"
task_type = "RETRIEVAL_DOCUMENT"
output_dimensionality
batch_size_limits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because of that, I ended up relying on &lt;strong&gt;trial and error&lt;/strong&gt; (and occasionally AI coding tools) to determine the correct request structure.&lt;/p&gt;

&lt;p&gt;The API itself works well, but a &lt;strong&gt;more consolidated explanation of embedding configuration&lt;/strong&gt; would make the developer experience even smoother.&lt;/p&gt;
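&lt;p&gt;For anyone hitting the same wall, here is a sketch of the request shape that ties those scattered parameters together, assuming the &lt;code&gt;google-genai&lt;/code&gt; SDK. The &lt;code&gt;task_type_for&lt;/code&gt; helper is my own convenience, and the API call runs only when a key is set:&lt;/p&gt;

```python
import os

# Sketch of the embedding request shape: task_type distinguishes queries
# from documents, and output_dimensionality controls vector size
# (768 here, matching a 768-dim Qdrant collection).
def task_type_for(role):
    """Map a role ("query" or "document") to the Gemini task_type string."""
    return {"query": "RETRIEVAL_QUERY", "document": "RETRIEVAL_DOCUMENT"}[role]

if os.environ.get("GEMINI_API_KEY"):  # only call the API when a key exists
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents=["What is RAG?"],
        config=types.EmbedContentConfig(
            task_type=task_type_for("query"),
            output_dimensionality=768,
        ),
    )
    vector = result.embeddings[0].values  # list of 768 floats
```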




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp30qxa811hgxmdt6q01w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp30qxa811hgxmdt6q01w.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>geminireflections</category>
      <category>gemini</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
