<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Prajwal</title>
    <description>The latest articles on Forem by Prajwal (@prajwal_ee759ffa925a7429e).</description>
    <link>https://forem.com/prajwal_ee759ffa925a7429e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3698384%2Fcf592bbc-2f2d-44f7-ac2b-a9e585a63899.png</url>
      <title>Forem: Prajwal</title>
      <link>https://forem.com/prajwal_ee759ffa925a7429e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/prajwal_ee759ffa925a7429e"/>
    <language>en</language>
    <item>
      <title>Designing a Production-Oriented RAG System for Technical Documentation</title>
      <dc:creator>Prajwal</dc:creator>
      <pubDate>Sun, 17 May 2026 05:57:07 +0000</pubDate>
      <link>https://forem.com/prajwal_ee759ffa925a7429e/designing-a-production-oriented-rag-system-for-technical-documentation-11p3</link>
      <guid>https://forem.com/prajwal_ee759ffa925a7429e/designing-a-production-oriented-rag-system-for-technical-documentation-11p3</guid>
      <description>&lt;p&gt;Large Language Models are incredibly powerful, but they have a major limitation:&lt;/p&gt;

&lt;p&gt;They do not inherently know your infrastructure, your internal documentation, your deployment standards, or your engineering workflows.&lt;/p&gt;

&lt;p&gt;Generic LLMs can explain concepts like Docker, Terraform, or NGINX at a broad level, but when building real engineering systems, broad knowledge is not enough. Engineering teams need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accurate retrieval,&lt;/li&gt;
&lt;li&gt;contextual understanding,&lt;/li&gt;
&lt;li&gt;domain-specific responses,&lt;/li&gt;
&lt;li&gt;conversational continuity,&lt;/li&gt;
&lt;li&gt;and reliable citations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is where Retrieval-Augmented Generation (RAG) systems become important.&lt;/p&gt;

&lt;p&gt;This article explores the architecture and implementation of the RAG pipeline built for &lt;a href="https://vizlab.xyz" rel="noopener noreferrer"&gt;VizLab.xyz&lt;/a&gt;, an internal AI-powered documentation assistant and developer copilot designed around real engineering workflows.&lt;/p&gt;

&lt;p&gt;Instead of functioning as a generic chatbot, the system was designed to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve highly relevant technical documentation,&lt;/li&gt;
&lt;li&gt;maintain conversational context,&lt;/li&gt;
&lt;li&gt;reduce hallucinations,&lt;/li&gt;
&lt;li&gt;provide citation-backed answers,&lt;/li&gt;
&lt;li&gt;and operate entirely within a controlled documentation ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI&lt;/li&gt;
&lt;li&gt;FAISS&lt;/li&gt;
&lt;li&gt;BM25&lt;/li&gt;
&lt;li&gt;Amazon Bedrock&lt;/li&gt;
&lt;li&gt;Amazon Titan Embeddings&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;Tailscale&lt;/li&gt;
&lt;li&gt;Caddy Reverse Proxy&lt;/li&gt;
&lt;li&gt;S3-backed vector persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;while remaining lightweight enough to self-host on a private infrastructure stack.&lt;/p&gt;




&lt;h1&gt;
  
  
  Why Traditional LLMs Fail for Engineering Workflows
&lt;/h1&gt;

&lt;p&gt;One of the biggest problems with using general-purpose LLMs in technical environments is hallucination.&lt;/p&gt;

&lt;p&gt;A model may:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generate outdated commands,&lt;/li&gt;
&lt;li&gt;invent configuration syntax,&lt;/li&gt;
&lt;li&gt;confuse versions,&lt;/li&gt;
&lt;li&gt;or answer from unrelated training data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For engineering environments, this is dangerous.&lt;/p&gt;

&lt;p&gt;If a system generates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;incorrect Terraform configurations,&lt;/li&gt;
&lt;li&gt;invalid NGINX directives,&lt;/li&gt;
&lt;li&gt;broken IAM policies,&lt;/li&gt;
&lt;li&gt;or misleading Docker instructions,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the consequences can directly affect infrastructure stability.&lt;/p&gt;

&lt;p&gt;The goal of the VizLab RAG system was therefore not to create a “smart chatbot.”&lt;/p&gt;

&lt;p&gt;The goal was to create a retrieval-first architecture where:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;trusted documentation is indexed,&lt;/li&gt;
&lt;li&gt;relevant context is retrieved,&lt;/li&gt;
&lt;li&gt;and the LLM is forced to answer primarily from that retrieved context.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architectural approach significantly improves response reliability.&lt;/p&gt;
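
&lt;p&gt;A minimal sketch of the third step, assuming each retrieved chunk is a dictionary with hypothetical &lt;code&gt;source&lt;/code&gt; and &lt;code&gt;text&lt;/code&gt; keys. The function name and the prompt wording are illustrative and are not taken from the VizLab codebase.&lt;/p&gt;

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n\n".join(
        f"[{i}] ({chunk['source']})\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below and cite sources as [n]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical chunk, standing in for a real retrieval result.
prompt = build_grounded_prompt(
    "How do I expose a container port?",
    [{"source": "docker-docs", "text": "Use -p HOST:CONTAINER to publish a port."}],
)
```

&lt;p&gt;Numbering the chunks is one simple way to make citations checkable: every &lt;code&gt;[n]&lt;/code&gt; in the answer can be traced back to a specific indexed source.&lt;/p&gt;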




&lt;h1&gt;
  
  
  System Goals
&lt;/h1&gt;

&lt;p&gt;The system was specifically designed for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;internal technical documentation retrieval,&lt;/li&gt;
&lt;li&gt;educational AI workflows,&lt;/li&gt;
&lt;li&gt;developer assistance,&lt;/li&gt;
&lt;li&gt;DevOps troubleshooting,&lt;/li&gt;
&lt;li&gt;infrastructure guidance,&lt;/li&gt;
&lt;li&gt;and conversational technical support.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The indexed knowledge base includes curated documentation from domains such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;NGINX&lt;/li&gt;
&lt;li&gt;Terraform&lt;/li&gt;
&lt;li&gt;GitHub Actions&lt;/li&gt;
&lt;li&gt;Solidity&lt;/li&gt;
&lt;li&gt;AWS Policies&lt;/li&gt;
&lt;li&gt;Infrastructure tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rather than scraping the entire internet, the system intentionally targets a highly curated set of engineering documentation sources.&lt;/p&gt;

&lt;p&gt;This dramatically improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieval precision,&lt;/li&gt;
&lt;li&gt;chunk quality,&lt;/li&gt;
&lt;li&gt;and contextual relevance.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  High-Level Architecture
&lt;/h1&gt;

&lt;p&gt;The architecture is divided into three major pipelines:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Offline Ingestion Pipeline&lt;/li&gt;
&lt;li&gt;Retrieval Pipeline&lt;/li&gt;
&lt;li&gt;Generation Pipeline&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The complete architecture diagram is shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5if8r5m7p41zl1wrkmt0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5if8r5m7p41zl1wrkmt0.png" alt="VizLab RAG architecture diagram: ingestion, retrieval, and generation pipelines" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Step 1: Documentation Scraping
&lt;/h1&gt;

&lt;p&gt;The system begins by scraping a curated list of documentation URLs and parsing the returned HTML with &lt;code&gt;BeautifulSoup&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Instead of indexing random internet pages, the scraper focuses on trusted engineering sources.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker documentation&lt;/li&gt;
&lt;li&gt;Terraform documentation&lt;/li&gt;
&lt;li&gt;NGINX references&lt;/li&gt;
&lt;li&gt;AWS documentation&lt;/li&gt;
&lt;li&gt;Solidity references&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Restricting ingestion to these sources keeps unreliable or off-topic pages out of the knowledge base.&lt;/p&gt;

&lt;p&gt;One important architectural decision was writing the raw scraped content to Amazon S3 immediately, before any processing.&lt;/p&gt;

&lt;p&gt;This provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;durability,&lt;/li&gt;
&lt;li&gt;backup recovery,&lt;/li&gt;
&lt;li&gt;ingestion reproducibility,&lt;/li&gt;
&lt;li&gt;and debugging visibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If chunking or embedding pipelines fail, raw documentation is still preserved.&lt;/p&gt;




&lt;h1&gt;
  
  
  Step 2: Text Cleaning &amp;amp; Normalization
&lt;/h1&gt;

&lt;p&gt;Raw documentation contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;navigation elements,&lt;/li&gt;
&lt;li&gt;repeated menus,&lt;/li&gt;
&lt;li&gt;headers,&lt;/li&gt;
&lt;li&gt;footers,&lt;/li&gt;
&lt;li&gt;formatting artifacts,&lt;/li&gt;
&lt;li&gt;and inconsistent whitespace.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before chunking, the pipeline normalizes and cleans the content.&lt;/p&gt;

&lt;p&gt;This stage improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;embedding quality,&lt;/li&gt;
&lt;li&gt;retrieval relevance,&lt;/li&gt;
&lt;li&gt;and token efficiency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Garbage input produces garbage embeddings.&lt;/p&gt;

&lt;p&gt;Cleaning matters more than many people realize.&lt;/p&gt;
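
&lt;p&gt;As a rough illustration of this stage, the sketch below collapses whitespace and drops short lines that repeat, which is a common signature of menus and footers. The function name and the four-word threshold are assumptions chosen for the example, not the rules the VizLab pipeline actually applies.&lt;/p&gt;

```python
import re


def clean_text(raw: str) -> str:
    """Normalize whitespace and drop short repeated lines such as menu entries."""
    seen = set()
    kept = []
    for line in raw.splitlines():
        # Collapse runs of spaces and tabs, then trim the line.
        line = re.sub(r"[ \t]+", " ", line).strip()
        if not line:
            continue
        # Heuristic (assumption): a short line seen before is navigation residue.
        is_long = len(line.split()) > 4
        if not is_long and line in seen:
            continue
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)


sample = "Home\nDocs\n\nInstall   Docker\ton Ubuntu using the apt repository.\nHome\nDocs\n"
cleaned = clean_text(sample)
print(cleaned)
```

&lt;p&gt;The first occurrence of each short line is kept, so real headings survive while their repeats are removed.&lt;/p&gt;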




&lt;h1&gt;
  
  
  Step 3: Chunking Strategy
&lt;/h1&gt;

&lt;p&gt;One of the most important parts of any RAG system is chunking.&lt;/p&gt;

&lt;p&gt;The VizLab pipeline uses a recursive text splitting strategy designed to preserve semantic meaning.&lt;/p&gt;

&lt;p&gt;Poor chunking destroys retrieval quality.&lt;/p&gt;

&lt;p&gt;If chunks are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;too small → context becomes fragmented&lt;/li&gt;
&lt;li&gt;too large → embeddings become noisy&lt;/li&gt;
&lt;li&gt;overlapping incorrectly → retrieval becomes redundant&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system therefore uses overlapping semantic chunks to preserve continuity across boundaries.&lt;/p&gt;

&lt;p&gt;This allows the retrieval system to maintain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;contextual coherence,&lt;/li&gt;
&lt;li&gt;command relationships,&lt;/li&gt;
&lt;li&gt;and infrastructure explanations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The indexed system currently maintains roughly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;300–500 chunks,&lt;/li&gt;
&lt;li&gt;300–500 embeddings,&lt;/li&gt;
&lt;li&gt;with a FAISS index smaller than 5 MB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because the dataset is intentionally curated and focused, retrieval quality remains high without massive storage overhead.&lt;/p&gt;
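
&lt;p&gt;The sketch below shows one way recursive splitting with overlap can work: split on the coarsest separator first, fall back to finer ones only when a piece is still too long, then pack pieces into chunks and carry the last piece forward. The separator list, the 40-character limit, and both function names are assumptions for illustration; the article does not state the real chunk size or overlap.&lt;/p&gt;

```python
SEPARATORS = ("\n\n", "\n", ". ", " ")


def split_recursive(text, max_chars, separators=SEPARATORS):
    """Split on the coarsest separator, recursing only into oversized parts."""
    if max_chars >= len(text):
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep in text:
            parts = text.split(sep)
            # Re-attach the separator so no characters are lost.
            parts = [p + sep for p in parts[:-1]] + [parts[-1]]
            pieces = []
            for part in parts:
                pieces.extend(split_recursive(part, max_chars, separators[i + 1:]))
            return pieces
    # No separator left: fall back to a hard cut.
    return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]


def chunk_text(text, max_chars=40):
    """Pack pieces into chunks, repeating the last piece as overlap when it fits."""
    chunks, current = [], []
    for piece in split_recursive(text, max_chars):
        if current and len("".join(current)) + len(piece) > max_chars:
            chunks.append("".join(current).strip())
            tail = current[-1]
            current = [tail] if max_chars >= len(tail) + len(piece) else []
        current.append(piece)
    if current:
        chunks.append("".join(current).strip())
    return chunks


text = "Install Docker. Start the daemon. Pull an image. Run a container. Check the logs."
for chunk in chunk_text(text):
    print(chunk)
```

&lt;p&gt;With this input every chunk shares one sentence with its neighbor, so a command and the explanation that precedes it are less likely to end up in separate chunks.&lt;/p&gt;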




&lt;h1&gt;
  
  
  Step 4: Embedding Generation
&lt;/h1&gt;

&lt;p&gt;After chunking, the documents are converted into vector embeddings using Amazon Titan Embeddings through Amazon Bedrock.&lt;/p&gt;

&lt;p&gt;Embeddings convert semantic meaning into numerical vector representations.&lt;/p&gt;

&lt;p&gt;This enables similarity search based on meaning rather than exact keyword matching.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;A user asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do I configure reverse proxying?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;can still retrieve chunks mentioning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;upstream routing,&lt;/li&gt;
&lt;li&gt;proxy_pass,&lt;/li&gt;
&lt;li&gt;load balancing,&lt;/li&gt;
&lt;li&gt;or NGINX forwarding,&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;even if the exact wording differs.&lt;/p&gt;
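
&lt;p&gt;To make the idea concrete, the toy example below ranks two chunks by cosine similarity to a query. The three-dimensional vectors are hand-written stand-ins, not real Titan embeddings, and the chunk labels are hypothetical.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumption): real embeddings have hundreds of dimensions.
query = [0.9, 0.1, 0.0]  # "How do I configure reverse proxying?"
chunks = {
    "proxy_pass directive": [0.8, 0.2, 0.1],
    "terraform state locking": [0.0, 0.1, 0.9],
}

best = max(chunks, key=lambda name: cosine_similarity(query, chunks[name]))
print(best)
```

&lt;p&gt;The query shares no words with the winning label; it ranks first only because the two vectors point in nearly the same direction.&lt;/p&gt;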




&lt;h1&gt;
  
  
  Why Hybrid Search Was Used
&lt;/h1&gt;

&lt;p&gt;One of the most important architectural decisions was combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FAISS dense retrieval&lt;/li&gt;
&lt;li&gt;with BM25 sparse retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dense vector search is excellent for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic understanding,&lt;/li&gt;
&lt;li&gt;paraphrased questions,&lt;/li&gt;
&lt;li&gt;conceptual similarity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But dense retrieval sometimes struggles with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exact command syntax,&lt;/li&gt;
&lt;li&gt;specific keywords,&lt;/li&gt;
&lt;li&gt;infrastructure flags,&lt;/li&gt;
&lt;li&gt;version identifiers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;BM25 solves this by ranking exact keyword relevance.&lt;/p&gt;

&lt;p&gt;Combining both systems creates a hybrid retrieval architecture that balances:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic similarity,&lt;/li&gt;
&lt;li&gt;and exact-match retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is the Reciprocal Rank Fusion (RRF) logic that merges the dense and sparse result lists:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_rrf_merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;dense&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RetrievedChunk&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;sparse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RetrievedChunk&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RetrievedChunk&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Reciprocal Rank Fusion.
    RRF score = Σ 1 / (k + rank_i)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RetrievedChunk&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dense&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;
        &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;rrf_score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;faiss_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;faiss_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;faiss_score&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sparse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;
        &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;rrf_score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;bm25_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;bm25_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bm25_score&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rrf_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
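
&lt;p&gt;A small worked example may help when reading the function above. It applies the same formula to plain chunk IDs instead of &lt;code&gt;RetrievedChunk&lt;/code&gt; objects; the IDs and their order are made up for illustration.&lt;/p&gt;

```python
from collections import defaultdict


def rrf_scores(rankings, k=60):
    """Sum 1 / (k + rank) over every ranking in which a chunk ID appears."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return dict(scores)


dense = ["a", "b", "c"]  # hypothetical FAISS order
sparse = ["b", "d"]      # hypothetical BM25 order

scores = rrf_scores([dense, sparse])
fused = sorted(scores, key=scores.get, reverse=True)
print(fused)  # ['b', 'a', 'd', 'c']
```

&lt;p&gt;Chunk &lt;code&gt;b&lt;/code&gt; scores 1/62 + 1/61 because it appears in both lists, so it outranks &lt;code&gt;a&lt;/code&gt;, which scores only 1/61 despite being the top dense result. RRF uses ranks rather than raw scores, which avoids having to normalize FAISS distances against BM25 scores.&lt;/p&gt;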



&lt;p&gt;Fusing both rankings improves results for engineering queries, especially for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CLI commands,&lt;/li&gt;
&lt;li&gt;configuration syntax,&lt;/li&gt;
&lt;li&gt;infrastructure tooling,&lt;/li&gt;
&lt;li&gt;and troubleshooting workflows.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Retrieval Pipeline
&lt;/h1&gt;

&lt;p&gt;Once ingestion is complete, the system becomes queryable.&lt;/p&gt;

&lt;p&gt;The retrieval pipeline is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understanding user intent,&lt;/li&gt;
&lt;li&gt;retrieving relevant context,&lt;/li&gt;
&lt;li&gt;and assembling prompt-ready information.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Query Cache Layer
&lt;/h1&gt;

&lt;p&gt;Before entering retrieval, queries first pass through an LRU cache with a time-to-live (TTL) on each entry.&lt;/p&gt;

&lt;p&gt;If an identical query is already cached, the system bypasses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vector retrieval,&lt;/li&gt;
&lt;li&gt;embedding generation,&lt;/li&gt;
&lt;li&gt;and LLM invocation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cached responses return in under 5 ms.&lt;/p&gt;

&lt;p&gt;This dramatically reduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bedrock API usage,&lt;/li&gt;
&lt;li&gt;latency,&lt;/li&gt;
&lt;li&gt;and infrastructure cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Caching becomes especially important in developer environments where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated troubleshooting questions,&lt;/li&gt;
&lt;li&gt;repeated configuration lookups,&lt;/li&gt;
&lt;li&gt;and repeated deployment issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;occur frequently.&lt;/p&gt;
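
&lt;p&gt;One possible shape for such a cache is sketched below using only the standard library. The class name, the default size, and the default TTL are assumptions; the article does not give the values used in production.&lt;/p&gt;

```python
import time
from collections import OrderedDict


class LRUTTLCache:
    """Small LRU cache whose entries also expire after ttl seconds."""

    def __init__(self, max_size=256, ttl=300.0, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data = OrderedDict()

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._data[key]  # expired
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (value, self.clock())
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently used entry


# Fake clock so the example runs instantly.
now = [0.0]
cache = LRUTTLCache(max_size=2, ttl=10.0, clock=lambda: now[0])
cache.put("how do i restart nginx?", "Use systemctl restart nginx.")
hit = cache.get("how do i restart nginx?")
now[0] = 11.0
expired = cache.get("how do i restart nginx?")
```

&lt;p&gt;Here &lt;code&gt;hit&lt;/code&gt; holds the cached answer and &lt;code&gt;expired&lt;/code&gt; is &lt;code&gt;None&lt;/code&gt;. Normalizing the query before using it as a key (for example lowercasing and trimming) would raise the hit rate, at the cost of treating near-identical questions as the same one.&lt;/p&gt;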




&lt;h1&gt;
  
  
  Conversational Memory System
&lt;/h1&gt;

&lt;p&gt;The conversational memory implementation uses an in-memory sliding-window architecture.&lt;/p&gt;

&lt;p&gt;Each session maintains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recent user messages,&lt;/li&gt;
&lt;li&gt;assistant replies,&lt;/li&gt;
&lt;li&gt;timestamps,&lt;/li&gt;
&lt;li&gt;and conversational ordering.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The memory system stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6 user turns&lt;/li&gt;
&lt;li&gt;and 6 assistant turns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;before automatically truncating older context.&lt;/p&gt;

&lt;p&gt;This sliding-window approach prevents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt explosion,&lt;/li&gt;
&lt;li&gt;token overflows,&lt;/li&gt;
&lt;li&gt;and degraded retrieval quality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The memory subsystem influences two separate stages:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Query Rewriting
&lt;/h2&gt;

&lt;p&gt;If a user asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do I configure that for NGINX?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;the retriever analyzes previous messages to understand what “that” refers to.&lt;/p&gt;

&lt;p&gt;This significantly improves retrieval accuracy for follow-up conversations.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Prompt Compilation
&lt;/h2&gt;

&lt;p&gt;The conversation history is also injected into the final LLM prompt.&lt;/p&gt;

&lt;p&gt;This enables conversational continuity while keeping context size controlled.&lt;/p&gt;

&lt;p&gt;The implementation currently uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an in-memory Python dictionary,&lt;/li&gt;
&lt;li&gt;keyed by &lt;code&gt;session_id&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture is lightweight and extremely fast for single-instance deployments.&lt;/p&gt;

&lt;p&gt;However, because each replica would hold its own copy of the dictionary, scaling to multiple replicas would require moving session state to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Redis,&lt;/li&gt;
&lt;li&gt;or another distributed memory layer.&lt;/li&gt;
&lt;/ul&gt;
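
&lt;p&gt;The sliding window and the per-session dictionary described above can be sketched with a bounded &lt;code&gt;deque&lt;/code&gt;. The helper names and the message fields are assumptions for illustration; only the 6 + 6 turn limit and the &lt;code&gt;session_id&lt;/code&gt; key come from the description.&lt;/p&gt;

```python
import time
from collections import defaultdict, deque

MAX_MESSAGES = 12  # 6 user turns + 6 assistant turns

# session_id -> bounded deque; the oldest message drops off automatically
sessions = defaultdict(lambda: deque(maxlen=MAX_MESSAGES))


def remember(session_id: str, role: str, content: str) -> None:
    """Append one message to the session's sliding window."""
    sessions[session_id].append(
        {"role": role, "content": content, "timestamp": time.time()}
    )


def history(session_id: str) -> list[dict]:
    """Return the retained messages, oldest first."""
    return list(sessions[session_id])


# Eight exchanges: only the last six survive.
for turn in range(8):
    remember("demo", "user", f"question {turn}")
    remember("demo", "assistant", f"answer {turn}")

window = history("demo")
print(window[0]["content"], "...", window[-1]["content"])
```

&lt;p&gt;A single deque of 12 messages holds exactly six turns per role only when user and assistant messages strictly alternate; separate per-role deques would be needed to guarantee the split otherwise.&lt;/p&gt;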




&lt;h1&gt;
  
  
  Query Rewriting
&lt;/h1&gt;

&lt;p&gt;One major issue with naive RAG systems is weak query formulation.&lt;/p&gt;

&lt;p&gt;Users rarely ask perfectly structured questions.&lt;/p&gt;

&lt;p&gt;The VizLab system expands queries into multiple search variants before retrieval.&lt;/p&gt;

&lt;p&gt;This dramatically improves recall.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;A query like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do I secure my containers?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;may internally generate variants related to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Docker security&lt;/li&gt;
&lt;li&gt;container isolation&lt;/li&gt;
&lt;li&gt;runtime permissions&lt;/li&gt;
&lt;li&gt;capabilities&lt;/li&gt;
&lt;li&gt;reverse proxy security&lt;/li&gt;
&lt;li&gt;TLS hardening&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implementation for expanding queries looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_rewrite_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Expand the query into search-optimised variants.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;rewrites&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;q_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;rewrites&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;q_lower&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; causes fix solution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;rewrites&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;troubleshoot &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;q_lower&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;how_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;rewrites&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;q_lower&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; step by step guide configuration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;rewrites&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;q_lower&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; example&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Cross-domain context injection
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;rewrites&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; integration &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromkeys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rewrites&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This improves retrieval coverage substantially.&lt;/p&gt;
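
&lt;p&gt;To see what the variants look like in practice, here is a standalone adaptation of the method above, run on the example question. The &lt;code&gt;how_to&lt;/code&gt; intent label comes from the code; the &lt;code&gt;docker&lt;/code&gt; domain value is an assumption about what the upstream classifier would return.&lt;/p&gt;

```python
def rewrite_query(query: str, intent: str, domains: list[str]) -> list[str]:
    """Standalone adaptation of the method above, without the class."""
    rewrites = [query]
    q_lower = query.lower()
    prefix = " ".join(domains)

    if intent == "debug":
        rewrites.append(f"{prefix} {q_lower} causes fix solution")
        rewrites.append(f"troubleshoot {q_lower}")
    elif intent == "how_to":
        rewrites.append(f"{q_lower} step by step guide configuration")
        rewrites.append(f"{prefix} {q_lower} example")

    # Cross-domain variant only when more than one domain was detected
    if len(domains) > 1:
        rewrites.append(f"{prefix} integration {query}")

    # dict.fromkeys removes duplicates while preserving insertion order
    return list(dict.fromkeys(rewrites))


variants = rewrite_query("How do I secure my containers?", "how_to", ["docker"])
for variant in variants:
    print(variant)
```

&lt;p&gt;This prints the original question followed by two expanded variants. The added terms mainly help the BM25 side, since the extra keywords give exact-match retrieval more to work with.&lt;/p&gt;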




&lt;h1&gt;
  
  
  Re-Ranking Pipeline
&lt;/h1&gt;

&lt;p&gt;After retrieval, results are re-ranked based on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keyword density,&lt;/li&gt;
&lt;li&gt;contextual relevance,&lt;/li&gt;
&lt;li&gt;freshness,&lt;/li&gt;
&lt;li&gt;and semantic confidence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without re-ranking, vector systems often retrieve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;partially related chunks,&lt;/li&gt;
&lt;li&gt;noisy semantic neighbors,&lt;/li&gt;
&lt;li&gt;or overly broad context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To solve this, a custom re-ranking step applies domain-specific heuristic boosts:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RetrievedChunk&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;qu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;QueryUnderstanding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;RetrievedChunk&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Boost chunks by:
      +0.1 per query keyword found in chunk text (case-insensitive substring match)
      +0.05 if chunk domain is one of the query's domains
      +0.05 if chunk_index == 0 (intro / overview)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;q_keywords_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;qu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keywords&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;text_lower&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Keyword density bonus
&lt;/span&gt;        &lt;span class="n"&gt;keyword_hits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;kw&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;q_keywords_lower&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;kw&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text_lower&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rrf_score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;keyword_hits&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;

        &lt;span class="c1"&gt;# Domain match bonus
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;domain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;qu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;domains&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rrf_score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;

        &lt;span class="c1"&gt;# Introductory chunk bonus
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chunk_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rrf_score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rrf_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Re-ranking significantly improves response precision.&lt;/p&gt;
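&lt;p&gt;To see how the boosts reorder results, here is a self-contained sketch of the same logic as a free function. The two dataclasses are hypothetical stand-ins that carry only the fields &lt;code&gt;_rerank&lt;/code&gt; reads, and the sample chunks and scores are made up for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class RetrievedChunk:
    # Hypothetical stand-in for the real RetrievedChunk type.
    text: str
    rrf_score: float
    metadata: dict = field(default_factory=dict)


@dataclass
class QueryUnderstanding:
    # Hypothetical stand-in: only the two fields the re-ranker reads.
    keywords: list[str]
    domains: list[str]


def rerank(chunks: list[RetrievedChunk], qu: QueryUnderstanding) -> list[RetrievedChunk]:
    """Apply the same three boosts as _rerank, then sort by score (descending)."""
    keywords = {k.lower() for k in qu.keywords}
    for chunk in chunks:
        text = chunk.text.lower()
        # +0.1 for each distinct query keyword that appears in the chunk text.
        chunk.rrf_score += 0.1 * sum(1 for kw in keywords if kw in text)
        # +0.05 when the chunk's domain is one of the query's domains.
        if chunk.metadata.get("domain") in qu.domains:
            chunk.rrf_score += 0.05
        # +0.05 for the first chunk of a document.
        if chunk.metadata.get("chunk_index", 99) == 0:
            chunk.rrf_score += 0.05
    return sorted(chunks, key=lambda c: c.rrf_score, reverse=True)


# Made-up sample data: the second chunk starts lower but matches the query better.
chunks = [
    RetrievedChunk("Kubernetes pods overview", 0.030,
                   {"domain": "kubernetes", "chunk_index": 3}),
    RetrievedChunk("Configure an NGINX reverse proxy with upstream blocks", 0.020,
                   {"domain": "nginx", "chunk_index": 0}),
]
qu = QueryUnderstanding(keywords=["nginx", "proxy"], domains=["nginx"])
ranked = rerank(chunks, qu)
print([c.metadata["domain"] for c in ranked])
```

&lt;p&gt;One thing to keep in mind when tuning the weights: if &lt;code&gt;rrf_score&lt;/code&gt; comes from reciprocal rank fusion with the common constant k = 60, a top-ranked chunk contributes only about 0.016 per result list, so a single 0.1 keyword boost can outweigh the retrieval ranking entirely.&lt;/p&gt;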




&lt;h1&gt;
  
  
  Prompt Engineering &amp;amp; Hallucination Control
&lt;/h1&gt;

&lt;p&gt;One of the hardest engineering problems during development was strict citation enforcement.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock Titan occasionally attempted to answer from its base model training rather than from the retrieved documentation context.&lt;/p&gt;

&lt;p&gt;This is a common RAG failure mode.&lt;/p&gt;

&lt;p&gt;The solution required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;strict system prompts,&lt;/li&gt;
&lt;li&gt;structured prompt assembly,&lt;/li&gt;
&lt;li&gt;guardrails,&lt;/li&gt;
&lt;li&gt;context enforcement,&lt;/li&gt;
&lt;li&gt;and fallback validation logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The final prompt compiler injects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieved context,&lt;/li&gt;
&lt;li&gt;system instructions,&lt;/li&gt;
&lt;li&gt;conversation history,&lt;/li&gt;
&lt;li&gt;user query,&lt;/li&gt;
&lt;li&gt;and formatting constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;into a single structured payload.&lt;/p&gt;

&lt;p&gt;The model is heavily instructed to answer primarily from retrieved context.&lt;/p&gt;

&lt;p&gt;This significantly reduces hallucinations and improves citation reliability.&lt;/p&gt;
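&lt;p&gt;The prompt assembly described above can be sketched roughly as follows. This is not the project's actual prompt compiler: the section labels, the instruction wording, the section order, and the &lt;code&gt;build_prompt&lt;/code&gt; name are all assumptions chosen for illustration.&lt;/p&gt;

```python
# Hypothetical instruction text; the real system prompt is not shown in this post.
SYSTEM_INSTRUCTIONS = (
    "Answer only from the numbered context below. "
    "Cite sources as [n]. If the context does not contain the answer, say so."
)


def build_prompt(context_chunks: list[str], history: list[tuple[str, str]], query: str) -> str:
    """Assemble one structured prompt string from the five ingredients."""
    # Number each chunk so the model can cite it as [1], [2], ...
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(context_chunks, start=1))
    turns = "\n".join(f"{role}: {message}" for role, message in history)
    sections = [
        ("SYSTEM", SYSTEM_INSTRUCTIONS),
        ("CONTEXT", context or "(no context retrieved)"),
        ("HISTORY", turns or "(none)"),
        ("QUESTION", query),
        ("FORMAT", 'Return JSON: {"answer": "...", "citations": [1, 2]}'),
    ]
    return "\n\n".join(f"### {name}\n{body}" for name, body in sections)


prompt = build_prompt(
    ["NGINX upstream blocks define backend pools."],
    [("user", "How do I set up a reverse proxy?")],
    "Which directive defines the backend pool?",
)
print(prompt)
```

&lt;p&gt;Numbering the chunks inside the prompt gives the model something concrete to cite, which in turn makes the citations checkable after generation.&lt;/p&gt;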




&lt;h1&gt;
  
  
  Why Streaming Responses Were Avoided
&lt;/h1&gt;

&lt;p&gt;Many AI systems use token streaming.&lt;/p&gt;

&lt;p&gt;This architecture intentionally avoids it.&lt;/p&gt;

&lt;p&gt;Instead, the system waits for the complete generation before returning a strict JSON payload.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because the system prioritizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;citation integrity,&lt;/li&gt;
&lt;li&gt;structured outputs,&lt;/li&gt;
&lt;li&gt;deterministic formatting,&lt;/li&gt;
&lt;li&gt;and response validation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Streaming complicated reliable citation attachment.&lt;/p&gt;

&lt;p&gt;For technical documentation systems, correctness was prioritized over token-by-token rendering speed.&lt;/p&gt;
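&lt;p&gt;A minimal sketch of why the full response is needed before anything is returned: citation checks can only run on complete output. The JSON shape (&lt;code&gt;answer&lt;/code&gt; plus a &lt;code&gt;citations&lt;/code&gt; list of chunk numbers) and the &lt;code&gt;validate_response&lt;/code&gt; name are assumptions, not the project's actual schema.&lt;/p&gt;

```python
import json
import re


def validate_response(raw: str, num_chunks: int) -> dict:
    """Parse a complete model output and reject it unless its citations check out."""
    payload = json.loads(raw)  # raises ValueError (JSONDecodeError) on malformed JSON
    answer = payload["answer"]
    citations = payload["citations"]

    # Citations must exist and must point at chunks that were actually retrieved.
    valid_ids = set(range(1, num_chunks + 1))
    if not citations or not set(citations).issubset(valid_ids):
        raise ValueError("citations missing or pointing at unknown chunks")

    # Every [n] marker in the answer text must also be listed in citations.
    inline = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    if not inline.issubset(set(citations)):
        raise ValueError("answer cites a source not listed in citations")
    return payload


ok = validate_response('{"answer": "Use an upstream block [1].", "citations": [1]}', num_chunks=2)
print(ok["citations"])
```

&lt;p&gt;With token streaming, the first words would already be on screen before a check like this could reject the answer.&lt;/p&gt;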




&lt;h1&gt;
  
  
  Deployment Architecture
&lt;/h1&gt;

&lt;p&gt;The entire RAG service is containerized using Docker.&lt;/p&gt;

&lt;p&gt;The backend stack includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI&lt;/li&gt;
&lt;li&gt;Gunicorn&lt;/li&gt;
&lt;li&gt;AWS Bedrock&lt;/li&gt;
&lt;li&gt;FAISS&lt;/li&gt;
&lt;li&gt;Caddy&lt;/li&gt;
&lt;li&gt;Tailscale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Apart from the calls to Bedrock, which is a managed AWS service, the infrastructure is self-hosted.&lt;/p&gt;

&lt;p&gt;Importantly, the backend is never directly exposed to the public internet.&lt;/p&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the service operates inside a private Tailscale network,&lt;/li&gt;
&lt;li&gt;secured behind a Caddy reverse proxy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Caddy was specifically chosen because it integrates cleanly with Tailscale and automatically manages TLS certificates inside the private mesh network.&lt;/p&gt;

&lt;p&gt;This removed significant operational overhead.&lt;/p&gt;
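&lt;p&gt;For the application side of this setup, a Gunicorn configuration file (which is plain Python) might look like the sketch below. The file name, port, and worker count are assumptions, and it presumes Caddy runs in the same network namespace as the app so that a loopback bind is reachable by the proxy.&lt;/p&gt;

```python
# gunicorn.conf.py (hypothetical values, not the project's actual config)
import multiprocessing

# Listen on loopback only, so the app is reachable solely through the reverse proxy.
bind = "127.0.0.1:8000"

# Uvicorn's worker class lets Gunicorn serve an ASGI app such as FastAPI.
worker_class = "uvicorn.workers.UvicornWorker"

# Assumption: each worker process loads its own copy of any in-memory index,
# so the worker count is capped to keep memory use predictable.
workers = min(4, multiprocessing.cpu_count())
```

&lt;p&gt;Binding to loopback rather than all interfaces keeps the "never directly exposed" property enforced at the process level as well as at the network level.&lt;/p&gt;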




&lt;h1&gt;
  
  
  Request Lifecycle Walkthrough
&lt;/h1&gt;

&lt;p&gt;A typical request flows through the following stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User submits query from frontend UI&lt;/li&gt;
&lt;li&gt;Request enters Tailscale-secured network&lt;/li&gt;
&lt;li&gt;Caddy proxies request to FastAPI backend&lt;/li&gt;
&lt;li&gt;Query cache checks for existing response&lt;/li&gt;
&lt;li&gt;Conversational memory enriches context&lt;/li&gt;
&lt;li&gt;Query rewriting expands retrieval scope&lt;/li&gt;
&lt;li&gt;FAISS + BM25 perform hybrid search&lt;/li&gt;
&lt;li&gt;Retrieved chunks are re-ranked&lt;/li&gt;
&lt;li&gt;Prompt compiler assembles final payload&lt;/li&gt;
&lt;li&gt;AWS Bedrock Titan generates response&lt;/li&gt;
&lt;li&gt;Structured JSON response returned to frontend&lt;/li&gt;
&lt;li&gt;Response stored in memory and cache&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This entire process currently averages:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;~8.9 seconds end-to-end latency.&lt;/p&gt;
&lt;/blockquote&gt;
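&lt;p&gt;The application-level stages of that lifecycle (steps 4 to 12) can be sketched as a single function. Everything here is a simplified stand-in: the in-process dictionaries, the &lt;code&gt;handle_query&lt;/code&gt; name, and the injected &lt;code&gt;retrieve&lt;/code&gt; and &lt;code&gt;generate&lt;/code&gt; callables are assumptions that replace the real hybrid search and Bedrock call.&lt;/p&gt;

```python
import hashlib
import time
from typing import Callable

cache: dict[str, dict] = {}           # query hash -> response (steps 4 and 12)
memory: list[tuple[str, str]] = []    # (query, answer) turns (steps 5 and 12)


def handle_query(query: str,
                 retrieve: Callable[[str], list[str]],
                 generate: Callable[[str], str]) -> dict:
    """Run one request through cache, memory, retrieval, and generation."""
    started = time.perf_counter()
    key = hashlib.sha256(query.strip().lower().encode()).hexdigest()
    if key in cache:                                   # step 4: a hit skips everything else
        return {**cache[key], "cached": True}

    recent = " ".join(q for q, _ in memory[-2:])       # step 5: pull in recent turns
    search_query = f"{recent} {query}".strip()         # step 6: stand-in for query rewriting
    chunks = retrieve(search_query)                    # steps 7-8: hybrid search + re-ranking
    prompt = "\n".join(chunks) + "\n\nQuestion: " + query   # step 9: simplified prompt
    answer = generate(prompt)                          # step 10: model call
    response = {                                       # step 11: structured payload
        "answer": answer,
        "sources": len(chunks),
        "cached": False,
        "latency_s": round(time.perf_counter() - started, 3),
    }
    cache[key] = response                              # step 12: store for next time
    memory.append((query, answer))
    return response


first = handle_query("What is NGINX?",
                     retrieve=lambda q: ["NGINX is a web server."],
                     generate=lambda p: "A web server.")
second = handle_query("what is nginx? ",
                      retrieve=lambda q: [],
                      generate=lambda p: "unused")
print(first["cached"], second["cached"])
```

&lt;p&gt;Recording per-stage timings in the same way would show which stage dominates the end-to-end latency.&lt;/p&gt;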




&lt;h1&gt;
  
  
  What Broke During Development
&lt;/h1&gt;

&lt;p&gt;The hardest parts of the project were not the LLM APIs.&lt;/p&gt;

&lt;p&gt;The hardest parts were infrastructure and operational consistency.&lt;/p&gt;




&lt;h1&gt;
  
  
  1. Dockerizing FAISS &amp;amp; Bedrock Dependencies
&lt;/h1&gt;

&lt;p&gt;Packaging the ingestion pipeline inside Docker caused repeated failures during CI/CD.&lt;/p&gt;

&lt;p&gt;The challenge was getting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python dependencies,&lt;/li&gt;
&lt;li&gt;FAISS native bindings,&lt;/li&gt;
&lt;li&gt;Bedrock credentials,&lt;/li&gt;
&lt;li&gt;and ingestion runtime behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;to work consistently inside the container environment.&lt;/p&gt;

&lt;p&gt;This took extensive debugging across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;container startup,&lt;/li&gt;
&lt;li&gt;dependency compatibility,&lt;/li&gt;
&lt;li&gt;and runtime initialization.&lt;/li&gt;
&lt;/ul&gt;
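&lt;p&gt;One way to make this class of failure show up early is a preflight check that runs when the container starts. The sketch below uses only the standard library; the module names and the environment variable in the two lists are assumptions about what the image needs, and credentials may well come from an instance role rather than the environment.&lt;/p&gt;

```python
import importlib.util
import os

REQUIRED_MODULES = ["faiss", "boto3", "fastapi"]   # assumed top-level import names
REQUIRED_ENV = ["AWS_DEFAULT_REGION"]              # assumed; adjust to your credential setup


def preflight(modules: list[str], env_vars: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready to start."""
    problems = []
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python module: {name}")
    for var in env_vars:
        if not os.environ.get(var):
            problems.append(f"missing environment variable: {var}")
    return problems


problems = preflight(REQUIRED_MODULES, REQUIRED_ENV)
for problem in problems:
    print(problem)
```

&lt;p&gt;Running this as the first step of the container entrypoint turns a confusing mid-ingestion crash into a one-line message in the startup log.&lt;/p&gt;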




&lt;h1&gt;
  
  
  2. Tailscale + Caddy Networking
&lt;/h1&gt;

&lt;p&gt;One of the most painful debugging sessions involved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reverse proxying,&lt;/li&gt;
&lt;li&gt;CORS,&lt;/li&gt;
&lt;li&gt;Tailscale networking,&lt;/li&gt;
&lt;li&gt;and frontend-backend communication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ensuring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;correct TLS handling,&lt;/li&gt;
&lt;li&gt;proper headers,&lt;/li&gt;
&lt;li&gt;secure private networking,&lt;/li&gt;
&lt;li&gt;and browser compatibility&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;required multiple iterations.&lt;/p&gt;

&lt;p&gt;Networking problems are often far harder than application logic.&lt;/p&gt;
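&lt;p&gt;The CORS part of that debugging comes down to an origin allow-list. In a FastAPI app this is normally handled by its &lt;code&gt;CORSMiddleware&lt;/code&gt;; the standard-library sketch below only illustrates the decision that middleware makes. The origin URL and the &lt;code&gt;cors_headers&lt;/code&gt; name are hypothetical.&lt;/p&gt;

```python
from typing import Optional

# Hypothetical frontend origin served inside the private network.
ALLOWED_ORIGINS = frozenset({"https://docs-ui.example.ts.net"})


def cors_headers(origin: Optional[str], allowed: frozenset = ALLOWED_ORIGINS) -> dict:
    """Return CORS response headers for an allowed origin, or none at all."""
    if origin is None or origin not in allowed:
        return {}   # without these headers the browser blocks the cross-origin read
    return {
        "Access-Control-Allow-Origin": origin,        # echo the exact origin
        "Access-Control-Allow-Methods": "POST, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
        "Vary": "Origin",                             # keep caches from mixing origins
    }


print(cors_headers("https://docs-ui.example.ts.net"))
print(cors_headers("https://elsewhere.example"))
```

&lt;p&gt;Origins must match exactly, including scheme and port, which is a frequent source of errors once TLS is terminated at a reverse proxy.&lt;/p&gt;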




&lt;h1&gt;
  
  
  3. Prompt Instability
&lt;/h1&gt;

&lt;p&gt;Early versions of the system occasionally ignored retrieved context and answered generically.&lt;/p&gt;

&lt;p&gt;This produced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;weak citations,&lt;/li&gt;
&lt;li&gt;hallucinated explanations,&lt;/li&gt;
&lt;li&gt;inconsistent formatting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The solution required extensive prompt engineering and retrieval refinement.&lt;/p&gt;

&lt;p&gt;This reinforced an important lesson:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In RAG systems, retrieval quality matters more than model size.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h1&gt;
  
  
  Future Improvements
&lt;/h1&gt;

&lt;p&gt;Several improvements are planned for future iterations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;distributed vector databases&lt;/li&gt;
&lt;li&gt;semantic caching&lt;/li&gt;
&lt;li&gt;Redis-backed distributed memory&lt;/li&gt;
&lt;li&gt;streaming retrieval pipelines&lt;/li&gt;
&lt;li&gt;reranking models&lt;/li&gt;
&lt;li&gt;agentic retrieval workflows&lt;/li&gt;
&lt;li&gt;observability dashboards&lt;/li&gt;
&lt;li&gt;tracing and telemetry&lt;/li&gt;
&lt;li&gt;Kubernetes-native deployment&lt;/li&gt;
&lt;li&gt;multi-user access control&lt;/li&gt;
&lt;li&gt;prompt injection detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current architecture intentionally prioritizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;correctness,&lt;/li&gt;
&lt;li&gt;reliability,&lt;/li&gt;
&lt;li&gt;retrieval quality,&lt;/li&gt;
&lt;li&gt;and operational simplicity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;over premature scaling complexity.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;Building RAG systems is not simply:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“connecting an LLM to a vector database.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Production-oriented retrieval systems require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieval engineering,&lt;/li&gt;
&lt;li&gt;chunking strategy,&lt;/li&gt;
&lt;li&gt;ranking pipelines,&lt;/li&gt;
&lt;li&gt;conversational memory,&lt;/li&gt;
&lt;li&gt;prompt control,&lt;/li&gt;
&lt;li&gt;infrastructure reliability,&lt;/li&gt;
&lt;li&gt;and operational debugging.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most valuable lesson from building this architecture was understanding that AI systems are fundamentally systems engineering problems, not just machine learning problems.&lt;/p&gt;

&lt;p&gt;The quality of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieval,&lt;/li&gt;
&lt;li&gt;infrastructure,&lt;/li&gt;
&lt;li&gt;networking,&lt;/li&gt;
&lt;li&gt;caching,&lt;/li&gt;
&lt;li&gt;and prompt orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ultimately determines whether the system feels reliable in real-world engineering workflows.&lt;/p&gt;

&lt;p&gt;And as RAG architectures continue to evolve, the engineering surrounding retrieval pipelines may become even more important than the models themselves.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Build Your Own Private Cloud in 5 Minutes with Docker, Syncthing &amp; Tailscale ☁️🔒</title>
      <dc:creator>Prajwal</dc:creator>
      <pubDate>Wed, 07 Jan 2026 12:09:08 +0000</pubDate>
      <link>https://forem.com/prajwal_ee759ffa925a7429e/build-your-own-private-cloud-in-5-minutes-with-docker-syncthing-tailscale-d4b</link>
      <guid>https://forem.com/prajwal_ee759ffa925a7429e/build-your-own-private-cloud-in-5-minutes-with-docker-syncthing-tailscale-d4b</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;We all rely on cloud storage—Google Drive, Dropbox, iCloud. But sometimes, you just want total control over your data. You want privacy, speed, and zero subscription fees.&lt;/p&gt;

&lt;p&gt;I recently built a Private Backup Cloud project that solves this problem using open-source tools. It’s self-hosted, encrypted, and accessible from anywhere without opening any public ports on your router.&lt;/p&gt;

&lt;p&gt;In this post, I’ll show you how I built it using Docker, Syncthing, File Browser, and Tailscale.&lt;/p&gt;

&lt;h2&gt;
  
  
  🤔 Why Build This?
&lt;/h2&gt;

&lt;p&gt;Privacy: Your data stays on your devices. No "scanning" by big tech.&lt;/p&gt;

&lt;p&gt;Security: Uses a private VPN (Tailscale) so you don't need to expose your IP to the public internet.&lt;/p&gt;

&lt;p&gt;Cost: Free (if using existing hardware like an old laptop, Raspberry Pi, or the free tier of a VPS).&lt;/p&gt;

&lt;p&gt;Simplicity: Deploys in minutes with Docker Compose.&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ The Tech Stack
&lt;/h2&gt;

&lt;p&gt;Here are the heroes of this project:&lt;/p&gt;

&lt;p&gt;Docker: For containerizing the applications so they run anywhere.&lt;/p&gt;

&lt;p&gt;Syncthing: The engine that syncs files continuously between your devices (phone, laptop, server).&lt;/p&gt;

&lt;p&gt;Tailscale: A zero-config VPN that connects your devices as if they were on the same local network (a mesh network).&lt;/p&gt;

&lt;p&gt;File Browser: (Optional) A beautiful web interface to manage your files, just like Google Drive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The setup is simple:&lt;/p&gt;

&lt;p&gt;Tailscale connects all your devices (Laptop, Phone, Server) into a secure private network.&lt;/p&gt;

&lt;p&gt;Syncthing runs in a Docker container, syncing folders in the background.&lt;/p&gt;

&lt;p&gt;File Browser runs in another container, giving you a Web UI to view, download, and upload files remotely.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x9rqq17u9tbiyb6cwdc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x9rqq17u9tbiyb6cwdc.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  How to Deploy (The "Advanced" Setup)
&lt;/h2&gt;

&lt;p&gt;Let's go for the full experience with both the Sync engine and the Web Dashboard.&lt;/p&gt;

&lt;p&gt;Step 1: Prerequisites&lt;br&gt;
Make sure you have Docker and Docker Compose installed. You also need to install Tailscale on your host machine and log in.&lt;/p&gt;

&lt;p&gt;Step 2: Prepare the Environment&lt;br&gt;
We need to create a few folders and a database file for the File Browser to work correctly.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir -p fb_config
touch fb_config/settings.json
touch fb_config/filebrowser.db
echo "{}" &amp;gt; fb_config/settings.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 3: The Docker Compose File&lt;br&gt;
Create a docker-compose.yml file and paste this in. This sets up both Syncthing and the File Browser dashboard.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: '3'
services:
  syncthing:
    image: lscr.io/linuxserver/syncthing:latest
    container_name: syncthing
    hostname: syncthing
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Asia/Kolkata # Change this to your timezone
    volumes:
      - ./config:/config
      - ./data:/data
    ports:
      - 8384:8384
      - 22000:22000/tcp
      - 22000:22000/udp
      - 21027:21027/udp
    restart: always

  filebrowser:
    image: filebrowser/filebrowser:latest
    container_name: filebrowser
    user: 1000:1000
    volumes:
      - ./data:/srv
      - ./fb_config/filebrowser.db:/database/filebrowser.db
      - ./fb_config/settings.json:/config/settings.json
    ports:
      - 8080:80
    restart: always
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 4: Launch It!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker-compose up -d&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Accessing Your Cloud
&lt;/h2&gt;

&lt;p&gt;Because you are using Tailscale, you can access this safely from anywhere (even a coffee shop) using your Tailscale IP.&lt;/p&gt;

&lt;p&gt;Syncthing UI: &lt;a href="http://YOUR-TAILSCALE-IP:8384" rel="noopener noreferrer"&gt;http://YOUR-TAILSCALE-IP:8384&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use this to link your phone or laptop and start syncing folders.&lt;/p&gt;

&lt;p&gt;File Dashboard: &lt;a href="http://YOUR-TAILSCALE-IP:8080" rel="noopener noreferrer"&gt;http://YOUR-TAILSCALE-IP:8080&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Default Login: admin / admin (Change this immediately!)&lt;/p&gt;

&lt;p&gt;Now you have a fully functional private cloud.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project is a perfect weekend build for anyone interested in Self-Hosting or DevOps. It separates your data from big providers and gives you total ownership.&lt;/p&gt;

&lt;p&gt;If you want to try this out, check out the full repository below. It includes a basic setup (lite version) and more detailed instructions.&lt;/p&gt;

&lt;p&gt;🔗 GitHub Repository: &lt;a href="https://github.com/prajwal-1703/private_backup_cloud_syncthing.git" rel="noopener noreferrer"&gt;private_backup_cloud_syncthing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let me know in the comments if you have any questions or ideas for improvements! Happy hosting! &lt;/p&gt;

</description>
      <category>devops</category>
      <category>docker</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
