<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Abhishek</title>
    <description>The latest articles on Forem by Abhishek (@iamabhishek-n).</description>
    <link>https://forem.com/iamabhishek-n</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3682682%2F43917f4d-a961-4322-aa62-1dbb5929e605.jpg</url>
      <title>Forem: Abhishek</title>
      <link>https://forem.com/iamabhishek-n</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/iamabhishek-n"/>
    <language>en</language>
    <item>
      <title>Vectra: A Provider-Agnostic RAG SDK That Lets You Swap Anything Without Rewriting Code</title>
      <dc:creator>Abhishek</dc:creator>
      <pubDate>Sun, 15 Mar 2026 16:33:40 +0000</pubDate>
      <link>https://forem.com/iamabhishek-n/vectra-a-provider-agnostic-rag-sdk-that-lets-you-swap-anything-without-rewriting-code-41hj</link>
      <guid>https://forem.com/iamabhishek-n/vectra-a-provider-agnostic-rag-sdk-that-lets-you-swap-anything-without-rewriting-code-41hj</guid>
      <description>&lt;p&gt;I got tired of rebuilding RAG pipelines every time I wanted to change a provider. So I built an SDK where the entire pipeline is config-driven. 4.5K+ downloads and counting.&lt;/p&gt;

&lt;p&gt;Every RAG tutorial follows the same pattern: pick an embedding model, pick a vector DB, pick an LLM, write a bunch of glue code, and hope it works. Three months later, you need to swap from OpenAI to Gemini, or move from Chroma to Postgres — and suddenly you're rewriting half your backend.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;Vectra&lt;/strong&gt; to fix that. It's an open-source, provider-agnostic SDK for building full RAG pipelines where &lt;em&gt;every component is swappable through config&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;vectra-js
&lt;span class="c"&gt;# or&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;vectra-rag-py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What It Actually Does
&lt;/h2&gt;

&lt;p&gt;Vectra covers the full pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of it is configured through a single object. Here's a complete working example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;VectraClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ProviderType&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vectra-js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Pool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pg&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;VectraClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ProviderType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ProviderType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GEMINI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GOOGLE_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;modelName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;postgres&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;clientInstance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Pool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;connectionString&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DATABASE_URL&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="na"&gt;tableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;document&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;columnMap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;metadata&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vector&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;chunking&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;recursive&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;chunkOverlap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;retrieval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hybrid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;reranking&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;windowSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;topN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Ingest&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingestDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./docs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Query&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;queryRAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;What is the vacation policy?&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Stream&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;queryRAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Summarize the policy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how the embedding provider is OpenAI and the LLM is Gemini. You can mix and match freely. Want to switch to Anthropic for generation? Change one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ProviderType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;modelName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;claude-sonnet-4-20250514&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your application code stays identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Full Config — Everything That's Swappable
&lt;/h2&gt;

&lt;p&gt;The quick start above shows a partial config. Here's the &lt;strong&gt;full&lt;/strong&gt; picture — every stage of the RAG pipeline you can control:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pipeline Stage&lt;/th&gt;
&lt;th&gt;What You Configure&lt;/th&gt;
&lt;th&gt;Options&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embedding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provider, model, dimensions, API key&lt;/td&gt;
&lt;td&gt;OpenAI, Gemini, Ollama, HuggingFace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provider, model, temperature, max tokens&lt;/td&gt;
&lt;td&gt;OpenAI, Gemini, Anthropic, Ollama, OpenRouter, HuggingFace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector Store&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Backend, connection, table/collection&lt;/td&gt;
&lt;td&gt;PostgreSQL (pgvector), Prisma, ChromaDB, Qdrant, Milvus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chunking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strategy, chunk size, overlap&lt;/td&gt;
&lt;td&gt;Recursive (character-aware) or Agentic (LLM-driven semantic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retrieval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Search strategy&lt;/td&gt;
&lt;td&gt;Naive, HyDE, Multi-Query, Hybrid RRF, MMR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reranking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enable/disable, window size, top N&lt;/td&gt;
&lt;td&gt;LLM-based reordering of retrieved chunks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Backend, max messages, session config&lt;/td&gt;
&lt;td&gt;In-memory, Redis, PostgreSQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enable/disable, storage path&lt;/td&gt;
&lt;td&gt;SQLite-backed traces + web dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metadata Enrichment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-chunk summaries, keywords, hypothetical Qs&lt;/td&gt;
&lt;td&gt;Generated at ingestion time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Planning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Grounding strictness, context assembly&lt;/td&gt;
&lt;td&gt;How strictly answers must cite retrieved text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Streaming&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Toggle per query&lt;/td&gt;
&lt;td&gt;Unified async generator across all providers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ingestion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;File/directory, format handling&lt;/td&gt;
&lt;td&gt;PDF, DOCX, XLSX, TXT, Markdown&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Prototype with Chroma + Ollama on your laptop. Ship with Postgres + OpenAI in prod. Your app code doesn't change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Features That Matter in Production
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Agentic Chunking
&lt;/h3&gt;

&lt;p&gt;Instead of blindly splitting by character count, Vectra can use an LLM to split documents into semantic propositions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;chunking&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agentic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;agenticLlm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ProviderType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;modelName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes a huge difference for policy documents, legal text, and anything with complex structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in Observability
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;observability&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;sqlitePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vectra-observability.db&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run &lt;code&gt;vectra dashboard&lt;/code&gt; to get a local web UI showing ingestion latency, query traces, retrieval performance, and chat sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conversation Memory
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;redis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;maxMessages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;clientInstance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;redisClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;keyPrefix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;vectra:chat:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pass a &lt;code&gt;sessionId&lt;/code&gt; and Vectra maintains multi-turn context automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evaluation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Capital of France?&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;expectedGroundTruth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Paris&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Built-in faithfulness and relevance metrics. Know if your pipeline is actually working before shipping.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Ingest from terminal&lt;/span&gt;
vectra ingest ./docs &lt;span class="nt"&gt;--config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./config.json

&lt;span class="c"&gt;# Query from terminal&lt;/span&gt;
vectra query &lt;span class="s2"&gt;"What is our leave policy?"&lt;/span&gt; &lt;span class="nt"&gt;--config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./config.json &lt;span class="nt"&gt;--stream&lt;/span&gt;

&lt;span class="c"&gt;# Interactive config builder&lt;/span&gt;
vectra webconfig

&lt;span class="c"&gt;# Observability dashboard&lt;/span&gt;
vectra dashboard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;I'm a solo dev who kept running into the same problem: every RAG project started with hours of plumbing before I could write any actual application logic. And when requirements changed (they always do), switching providers meant touching code everywhere.&lt;/p&gt;

&lt;p&gt;Vectra's design principle is simple: &lt;strong&gt;RAG is a pipeline, not a pile of libraries.&lt;/strong&gt; Configure the pipeline once. Change any piece without touching the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Numbers
&lt;/h2&gt;

&lt;p&gt;~4,500 downloads across npm and PyPI. 8 stars on GitHub. It's early, but the developers using it are using it because it actually solves the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/iamabhishek-n/vectra-js" rel="noopener noreferrer"&gt;github.com/iamabhishek-n/vectra-js&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/vectra-js" rel="noopener noreferrer"&gt;npmjs.com/package/vectra-js&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;a href="https://pypi.org/project/vectra-rag-py/" rel="noopener noreferrer"&gt;pypi.org/project/vectra-rag-py&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs:&lt;/strong&gt; &lt;a href="https://vectra.thenxtgenagents.com" rel="noopener noreferrer"&gt;vectra.thenxtgenagents.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Star the repo if this resonates. Open an issue if something's broken. PRs welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by &lt;a href="https://github.com/iamabhishek-n" rel="noopener noreferrer"&gt;@iamabhishek-n&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>discuss</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
