Vectra: A Provider-Agnostic RAG SDK That Lets You Swap Anything Without Rewriting Code

Abhishek — Sun, 15 Mar 2026 16:33:40 +0000

I got tired of rebuilding RAG pipelines every time I wanted to change a provider. So I built an SDK where the entire pipeline is config-driven. 4.5K+ downloads and counting.

Every RAG tutorial follows the same pattern: pick an embedding model, pick a vector DB, pick an LLM, write a bunch of glue code, and hope it works. Three months later, you need to swap from OpenAI to Gemini, or move from Chroma to Postgres — and suddenly you're rewriting half your backend.

I built Vectra to fix that. It's an open-source, provider-agnostic SDK for building full RAG pipelines where every component is swappable through config.

npm install vectra-js
# or
pip install vectra-rag-py

What It Actually Does

Vectra covers the full pipeline:

Load → Chunk → Embed → Store → Retrieve → Rerank → Plan → Ground → Generate → Stream

All of it is configured through a single object. Here's a complete working example:

const { VectraClient, ProviderType } = require('vectra-js');
const { Pool } = require('pg');

const client = new VectraClient({
  embedding: {
    provider: ProviderType.OPENAI,
    apiKey: process.env.OPENAI_API_KEY,
    modelName: 'text-embedding-3-small'
  },
  llm: {
    provider: ProviderType.GEMINI,
    apiKey: process.env.GOOGLE_API_KEY,
    modelName: 'gemini-2.5-flash'
  },
  database: {
    type: 'postgres',
    clientInstance: new Pool({ connectionString: process.env.DATABASE_URL }),
    tableName: 'document',
    columnMap: { content: 'content', metadata: 'metadata', vector: 'vector' }
  },
  chunking: {
    strategy: 'recursive',
    chunkSize: 1000,
    chunkOverlap: 200
  },
  retrieval: { strategy: 'hybrid' },
  reranking: { enabled: true, windowSize: 20, topN: 5 }
});

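async function main() {
  // Ingest every supported file under ./docs
  await client.ingestDocuments('./docs');

  // Query: resolves to a result object carrying the generated answer
  const res = await client.queryRAG('What is the vacation policy?');
  console.log(res.answer);

  // Stream: passing true as the third argument returns an async iterable of chunks
  const stream = await client.queryRAG('Summarize the policy', null, true);
  for await (const chunk of stream) process.stdout.write(chunk.delta || '');
}

// CommonJS modules have no top-level await, so run the async steps inside main()
main().catch((err) => {
  console.error(err);
  process.exit(1);
});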

Notice that the embedding provider is OpenAI while the LLM is Gemini: you can mix and match freely. Want to switch to Anthropic for generation? Change only the llm block:

llm: {
  provider: ProviderType.ANTHROPIC,
  apiKey: process.env.ANTHROPIC_API_KEY,
  modelName: 'claude-sonnet-4-20250514'
}

Your application code stays identical.
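One common way to get this swap-by-config behaviour is an adapter registry: each provider is wrapped behind the same small interface, and the config decides which adapter gets built. The sketch below illustrates that pattern under stated assumptions; it is not Vectra's actual internals. The `adapters` map, `createLlm`, `answer`, and the two fake vendor streams are hypothetical stand-ins so the example runs without API keys. It also shows how differently shaped vendor streams can be normalized into a single `{ delta }` chunk shape, like the `chunk.delta` read in the streaming example.

```javascript
// Hypothetical adapter registry: every provider exposes the same interface,
// so application code never branches on which vendor is configured.

// Fake vendor streams standing in for real SDK responses (shapes differ on purpose).
async function* fakeVendorAStream(prompt) {
  // Vendor A yields objects like { text: '...' }
  for (const word of ['Hello', ' from', ' A']) yield { text: word };
}

async function* fakeVendorBStream(prompt) {
  // Vendor B yields objects like { content: [{ value: '...' }] }
  for (const word of ['Hello', ' from', ' B']) yield { content: [{ value: word }] };
}

// Each adapter normalizes its vendor's stream into { delta } chunks.
const adapters = {
  vendorA: () => ({
    async *stream(prompt) {
      for await (const part of fakeVendorAStream(prompt)) yield { delta: part.text };
    },
  }),
  vendorB: () => ({
    async *stream(prompt) {
      for await (const part of fakeVendorBStream(prompt)) {
        yield { delta: part.content[0].value };
      }
    },
  }),
};

// Build an LLM client from config alone; unknown providers fail fast.
function createLlm(config) {
  const factory = adapters[config.provider];
  if (!factory) throw new Error(`Unknown LLM provider: ${config.provider}`);
  return factory(config);
}

// Application code: identical no matter which provider the config names.
async function answer(llmConfig, prompt) {
  const llm = createLlm(llmConfig);
  let text = '';
  for await (const chunk of llm.stream(prompt)) text += chunk.delta || '';
  return text;
}
```

Calling `answer({ provider: 'vendorB' }, 'hi')` instead of `answer({ provider: 'vendorA' }, 'hi')` changes only the config object; the calling code is untouched.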

The Full Config — Everything That's Swappable

The quick start above shows a partial config. Here's the full picture — every stage of the RAG pipeline you can control:

Pipeline Stage	What You Configure	Options
Embedding	Provider, model, dimensions, API key	OpenAI, Gemini, Ollama, HuggingFace
LLM	Provider, model, temperature, max tokens	OpenAI, Gemini, Anthropic, Ollama, OpenRouter, HuggingFace
Vector Store	Backend, connection, table/collection	PostgreSQL (pgvector), Prisma, ChromaDB, Qdrant, Milvus
Chunking	Strategy, chunk size, overlap	Recursive (character-aware) or Agentic (LLM-driven semantic)
Retrieval	Search strategy	Naive, HyDE, Multi-Query, Hybrid RRF, MMR
Reranking	Enable/disable, window size, top N	LLM-based reordering of retrieved chunks
Memory	Backend, max messages, session config	In-memory, Redis, PostgreSQL
Observability	Enable/disable, storage path	SQLite-backed traces + web dashboard
Metadata Enrichment	Per-chunk summaries, keywords, hypothetical Qs	Generated at ingestion time
Query Planning	Grounding strictness, context assembly	How strictly answers must cite retrieved text
Streaming	Toggle per query	Unified async generator across all providers
Ingestion	File/directory, format handling	PDF, DOCX, XLSX, TXT, Markdown
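The quick-start config sets retrieval: { strategy: 'hybrid' }, and the table lists Hybrid RRF. Reciprocal Rank Fusion merges several ranked lists (typically a vector-similarity ranking and a keyword ranking) by giving each document 1 / (k + rank) for every list it appears in and summing those scores. Below is a minimal sketch of standard RRF; k = 60 is the commonly used default, and whether Vectra uses the same constant or the same input rankings is an assumption the post doesn't settle.

```javascript
// Reciprocal Rank Fusion: combine several ranked lists of document IDs.
// score(doc) = sum over lists of 1 / (k + rank), with rank starting at 1.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      const rank = index + 1;
      scores.set(docId, (scores.get(docId) || 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId, score]) => ({ docId, score }));
}

// Example: a vector-search ranking and a keyword (BM25-style) ranking.
const vectorHits = ['doc-a', 'doc-b', 'doc-c'];
const keywordHits = ['doc-c', 'doc-a', 'doc-d'];
const fused = reciprocalRankFusion([vectorHits, keywordHits]);
console.log(fused.map((r) => r.docId)); // → doc-a, doc-c, doc-b, doc-d
```

Documents that rank reasonably well in both lists (doc-a, doc-c) beat a document that tops only one list, which is why RRF is a popular way to blend semantic and keyword search without tuning score scales.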

Prototype with Chroma + Ollama on your laptop. Ship with Postgres + OpenAI in prod. Your app code doesn't change.
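A minimal sketch of that workflow, assuming you keep one config per environment and pick it at startup. The provider and database strings ('ollama', 'chroma', 'openai', 'postgres') and the local model names are illustrative placeholders, not confirmed Vectra values; a real app would use the ProviderType constants and database options the SDK documents.

```javascript
// Hypothetical per-environment configs: only the data differs,
// the code that consumes the config stays the same.
const configs = {
  development: {
    embedding: { provider: 'ollama', modelName: 'local-embedding-model' }, // placeholder name
    llm: { provider: 'ollama', modelName: 'local-chat-model' }, // placeholder name
    database: { type: 'chroma' },
  },
  production: {
    embedding: { provider: 'openai', modelName: 'text-embedding-3-small' },
    llm: { provider: 'openai', modelName: 'gpt-4o-mini' },
    database: { type: 'postgres', tableName: 'document' },
  },
};

// Pick a config by environment name, defaulting to development.
function loadConfig(env = process.env.NODE_ENV || 'development') {
  const config = configs[env];
  if (!config) throw new Error(`No config defined for environment "${env}"`);
  return config;
}
```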

Features That Matter in Production

Agentic Chunking

Instead of blindly splitting by character count, Vectra can use an LLM to split documents into semantic propositions:

chunking: {
  strategy: 'agentic',
  agenticLlm: {
    provider: ProviderType.OPENAI,
    apiKey: process.env.OPENAI_API_KEY,
    modelName: 'gpt-4o-mini'
  }
}

This makes a big difference for policy documents, legal text, and anything with complex structure, where a fixed-size character split can separate a clause from the condition that qualifies it.
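The post doesn't show how the agentic strategy works internally, so the following is only a sketch of the general idea: ask an LLM to rewrite a passage as standalone propositions (one per line), then pack consecutive propositions into chunks up to a size limit so no statement is cut in half. The `agenticChunk` function, the prompt wording, `maxChars`, and the injected `callLlm` are all hypothetical; `callLlm` is passed in so the example runs against a stub instead of a real model.

```javascript
// Hypothetical agentic chunker: an LLM splits text into propositions,
// then propositions are packed greedily into chunks of at most maxChars.
async function agenticChunk(text, callLlm, maxChars = 500) {
  const prompt =
    'Rewrite the following text as a list of short, self-contained statements, ' +
    'one per line, with pronouns replaced by the nouns they refer to:\n\n' + text;

  // Expect one proposition per line; strip "- ", "* ", "1. " or "2) " bullets.
  const raw = await callLlm(prompt);
  const propositions = raw
    .split('\n')
    .map((line) => line.replace(/^\s*(?:[-*]|\d+[.)])\s+/, '').trim())
    .filter(Boolean);

  // Whole propositions only; a single oversized proposition becomes its own chunk.
  const chunks = [];
  let current = '';
  for (const prop of propositions) {
    const candidate = current ? `${current} ${prop}` : prop;
    if (candidate.length > maxChars && current) {
      chunks.push(current);
      current = prop;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Stub LLM so the sketch runs offline; a real one would call a model API.
async function stubLlm() {
  return [
    '- Employees get 20 vacation days per year.',
    '- Vacation days must be approved by a manager.',
    '- Unused vacation days expire at year end.',
  ].join('\n');
}
```

The trade-off is cost and latency: every document passes through an LLM at ingestion time, which is why a cheap model such as the gpt-4o-mini in the config above is a natural fit for this step.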

Built-in Observability

observability: {
  enabled: true,
  sqlitePath: 'vectra-observability.db'
}

Then run vectra dashboard to get a local web UI showing ingestion latency, query traces, retrieval performance, and chat sessions.
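To make "ingestion latency" and "query traces" concrete, here is a minimal sketch of the span recording an observability layer typically does: wrap each pipeline stage, time it, and store a record, including failures. It keeps traces in an in-memory array rather than SQLite, and `traced`, `runQuery`, the stage names, and the record fields are hypothetical; this is not Vectra's trace schema.

```javascript
// Hypothetical tracer: records one entry per stage with its duration and status.
const traces = [];

async function traced(stage, fn) {
  const start = process.hrtime.bigint();
  let status = 'ok';
  try {
    return await fn();
  } catch (err) {
    status = 'error';
    throw err; // record the failure, but still surface it to the caller
  } finally {
    const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
    traces.push({ stage, status, durationMs });
  }
}

// Example: trace two fake stages of a query.
async function runQuery(question) {
  const hits = await traced('retrieve', async () => ['doc-a', 'doc-b']);
  return traced('generate', async () => `Answer to "${question}" from ${hits.length} chunks`);
}
```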

Conversation Memory

memory: {
  enabled: true,
  type: 'redis',
  maxMessages: 20,
  redis: {
    clientInstance: redisClient,
    keyPrefix: 'vectra:chat:'
  }
}

Pass a sessionId and Vectra maintains multi-turn context automatically.
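A minimal sketch of what maxMessages plausibly means: keep a per-session message list and drop the oldest entries once it exceeds the cap. It uses a plain in-memory Map instead of Redis, and the `SessionMemory` class and its methods are hypothetical; the post doesn't say whether Vectra trims by message count exactly this way.

```javascript
// Hypothetical in-memory session store with a sliding window of messages.
class SessionMemory {
  constructor(maxMessages = 20) {
    this.maxMessages = maxMessages;
    this.sessions = new Map(); // sessionId -> [{ role, content }]
  }

  add(sessionId, role, content) {
    const history = this.sessions.get(sessionId) || [];
    history.push({ role, content });
    // Keep only the most recent maxMessages entries.
    if (history.length > this.maxMessages) {
      history.splice(0, history.length - this.maxMessages);
    }
    this.sessions.set(sessionId, history);
  }

  // Render history as context text to prepend to the next prompt.
  context(sessionId) {
    return (this.sessions.get(sessionId) || [])
      .map((m) => `${m.role}: ${m.content}`)
      .join('\n');
  }
}

const memory = new SessionMemory(3);
memory.add('user-42', 'user', 'How many vacation days do I get?');
memory.add('user-42', 'assistant', '20 days per year.');
memory.add('user-42', 'user', 'Do they roll over?');
memory.add('user-42', 'assistant', 'No, they expire at year end.');
console.log(memory.context('user-42')); // the first question has been dropped
```

With Redis, the same sliding window is typically done with RPUSH followed by LTRIM key -N -1, which keeps only the last N list entries.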

Evaluation

await client.evaluate([
  { question: 'Capital of France?', expectedGroundTruth: 'Paris' }
]);

Built-in faithfulness and relevance metrics. Know if your pipeline is actually working before shipping.
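The post doesn't describe how faithfulness and relevance are computed (such metrics are often scored by an LLM judge). As a much cruder and clearly different stand-in, here is a sketch of a smoke-test harness that runs each case through an injected `ask` function and checks whether the answer mentions the expected ground truth. `runEval`, `ask`, and the substring pass criterion are hypothetical; only the { question, expectedGroundTruth } case shape comes from the example above.

```javascript
// Hypothetical smoke-test harness: case-insensitive substring match.
// This is NOT a faithfulness metric; it only catches obviously wrong answers.
async function runEval(cases, ask) {
  const results = [];
  for (const { question, expectedGroundTruth } of cases) {
    const answer = await ask(question);
    const passed = answer.toLowerCase().includes(expectedGroundTruth.toLowerCase());
    results.push({ question, answer, passed });
  }
  const passRate = results.length
    ? results.filter((r) => r.passed).length / results.length
    : 0;
  return { passRate, results };
}

// Stub answerer so the example runs offline.
const fakeAnswers = {
  'Capital of France?': 'The capital of France is Paris.',
  'Capital of Spain?': 'I am not sure.',
};
const ask = async (q) => fakeAnswers[q] || '';
```

A check like this is cheap enough to run in CI on every config change, leaving the more expensive judged metrics for release candidates.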

CLI

# Ingest from terminal
vectra ingest ./docs --config=./config.json

# Query from terminal
vectra query "What is our leave policy?" --config=./config.json --stream

# Interactive config builder
vectra webconfig

# Observability dashboard
vectra dashboard

Why I Built This

I'm a solo dev who kept running into the same problem: every RAG project started with hours of plumbing before I could write any actual application logic. And when requirements changed (they always do), switching providers meant touching code everywhere.

Vectra's design principle is simple: RAG is a pipeline, not a pile of libraries. Configure the pipeline once. Change any piece without touching the rest.

Numbers

~4,500 downloads across npm and PyPI, and 8 stars on GitHub. It's early, but the developers who use it do so because it solves a real problem for them.

Forem: Abhishek