Forem: Yasser B.

Building a RAG Application with pgvector and the OpenAI API

Yasser B. — Wed, 13 May 2026 15:25:43 +0000

Retrieval-Augmented Generation is now the default architecture for adding knowledge to LLM-powered applications. When a user asks a question, you retrieve relevant context from your own data, pass that context to the model alongside the question, and the model answers based on what you gave it.

Most RAG tutorials reach for LangChain immediately. This guide skips the framework and builds the pipeline from scratch: pgvector for vector storage, the OpenAI Python SDK for embeddings and generation, and psycopg for the database connection.

Originally published at rivestack.io

Database Setup

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  source TEXT
);

CREATE TABLE chunks (
  id BIGSERIAL PRIMARY KEY,
  document_id BIGINT REFERENCES documents(id) ON DELETE CASCADE,
  content TEXT NOT NULL,
  token_count INTEGER,
  embedding VECTOR(1536)
);

CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

Chunking and Embedding

from openai import OpenAI
import psycopg, os

client = OpenAI()
EMBEDDING_MODEL = "text-embedding-3-small"

def chunk_text(text, chunk_size=400, overlap=50):
    words = text.split()
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        yield " ".join(words[start:end])
        if end == len(words):
            break
        start += chunk_size - overlap

def embed_texts(texts):
    return [i.embedding for i in client.embeddings.create(
        input=texts, model=EMBEDDING_MODEL).data]

def ingest_document(title, source, text):
    chunks = list(chunk_text(text))
    embeddings = []
    for i in range(0, len(chunks), 256):
        embeddings.extend(embed_texts(chunks[i:i+256]))
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO documents (title, source) VALUES (%s, %s) RETURNING id", (title, source))
            doc_id = cur.fetchone()[0]
            for chunk, emb in zip(chunks, embeddings):
                cur.execute("INSERT INTO chunks (document_id, content, token_count, embedding) VALUES (%s, %s, %s, %s)",
                            (doc_id, chunk, len(chunk.split()), emb))
        conn.commit()

Retrieval and Generation

def retrieve_chunks(question, limit=5):
    q_emb = embed_texts([question])[0]
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT c.content, d.title,
                  1 - (c.embedding <=> %s::vector) AS similarity
                FROM chunks c
                JOIN documents d ON d.id = c.document_id
                ORDER BY c.embedding <=> %s::vector LIMIT %s
            """, (q_emb, q_emb, limit))
            return [{"content": r[0], "title": r[1], "sim": r[2]} for r in cur.fetchall()]

def answer_question(question):
    chunks = retrieve_chunks(question)
    context = "\n\n---\n\n".join(f"Source: {c['title']}\n{c['content']}" for c in chunks)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n\n{context}"},
            {"role": "user", "content": question},
        ],
        temperature=0.2,
    )
    return resp.choices[0].message.content

Why pgvector Over a Dedicated Vector Store

Since embeddings live in PostgreSQL, metadata filtering is just a WHERE clause:

WHERE d.source = 'user-123-docs'

No separate metadata index. No extra service to keep in sync. Transactions, JOINs, and your existing backup setup all work automatically.

Production Notes

Batch embedding calls. The API accepts up to 2048 inputs per call. Batching 256 at a time keeps latency low without hitting rate limits.

Track embedding model per row. Add an embedding_model column so you can identify and re-embed stale rows when switching models.

Limit by tokens, not chunk count. Research shows 3 to 8 highly relevant chunks outperform 20 loosely related ones. Cap context by token count, not by number of chunks.

For the full guide including HNSW index tuning, async ingestion patterns, and similarity thresholds, see the complete post at rivestack.io.

pgvector with Node.js: Build Semantic Search on PostgreSQL

Yasser B. — Wed, 06 May 2026 15:17:41 +0000

This post was originally published on Rivestack.

Most pgvector tutorials are written for Python. That makes sense: Python dominates the AI tooling ecosystem and most RAG frameworks are Python-first. But a large share of production web applications are built with Node.js, and those applications need vector search too. If you are building an API with Express, a full-stack app with Next.js, or a background worker with plain Node.js, this guide has everything you need to get pgvector working.

We will cover connecting with node-postgres, inserting and querying embeddings, building an HNSW index, and wiring it all into a functional RAG pipeline. At the end, there is also a section on using Drizzle ORM for TypeScript projects that prefer an ORM layer.

What You Need Before You Start

You need:

A PostgreSQL database (14 or newer) with the vector extension enabled
Node.js 18 or newer
An OpenAI API key for generating embeddings (or any other embedding provider)

If you are running PostgreSQL locally, install pgvector from the pgvector GitHub repo. On Rivestack, the extension is pre-installed and enabled by default on every database. Connect to your database and run:

CREATE EXTENSION IF NOT EXISTS vector;

Installing the Node.js Packages

npm install pg pgvector openai
npm install --save-dev @types/pg typescript

Connecting to PostgreSQL

import pg from 'pg';
import { fromSql, toSql } from 'pgvector';

const { Pool } = pg;

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});

Using a Pool rather than a single Client is important for any application that handles concurrent requests.

Creating the Schema

async function createSchema(): Promise<void> {
  await pool.query(`
    CREATE TABLE IF NOT EXISTS documents (
      id        BIGSERIAL PRIMARY KEY,
      content   TEXT NOT NULL,
      source    TEXT,
      embedding VECTOR(1536)
    )
  `);
}

The dimension (1536) matches OpenAI's text-embedding-3-small model. Adjust based on the model you use.

Generating and Inserting Embeddings

import OpenAI from 'openai';
import { toSql } from 'pgvector';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function embed(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return response.data[0].embedding;
}

async function insertDocument(content: string, source?: string): Promise<void> {
  const embedding = await embed(content);
  await pool.query(
    'INSERT INTO documents (content, source, embedding) VALUES ($1, $2, $3)',
    [content, source ?? null, toSql(embedding)]
  );
}

The toSql call converts the JavaScript number[] into the string format PostgreSQL expects. Without it, the query will fail with a type error.

Querying by Vector Similarity

interface SearchResult {
  id: number;
  content: string;
  source: string | null;
  similarity: number;
}

async function search(query: string, limit = 5): Promise<SearchResult[]> {
  const queryEmbedding = await embed(query);

  const result = await pool.query<SearchResult>(
    `SELECT id, content, source, 1 - (embedding <=> $1) AS similarity
     FROM documents
     ORDER BY embedding <=> $1
     LIMIT $2`,
    [toSql(queryEmbedding), limit]
  );

  return result.rows;
}

pgvector provides three distance operators: <=> (cosine), <-> (L2), <#> (inner product). For text embeddings, cosine distance is almost always the right choice.

Adding an HNSW Index

Without an index, PostgreSQL does a full sequential scan for each similarity query. This is fine for small datasets but becomes a bottleneck at scale.

CREATE INDEX CONCURRENTLY documents_embedding_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

At query time, tune the recall versus speed tradeoff with hnsw.ef_search:

async function searchWithHighRecall(query: string, limit = 5) {
  const queryEmbedding = await embed(query);
  const client = await pool.connect();
  try {
    await client.query('SET LOCAL hnsw.ef_search = 100');
    const result = await client.query(
      'SELECT id, content, 1 - (embedding <=> $1) AS similarity FROM documents ORDER BY embedding <=> $1 LIMIT $2',
      [toSql(queryEmbedding), limit]
    );
    return result.rows;
  } finally {
    client.release();
  }
}

Use SET LOCAL (not SET) to scope the setting to the current transaction.

Building a Complete RAG Pipeline

async function retrieve(query: string, k = 5): Promise<string[]> {
  const qEmbed = await embed(query);
  const result = await pool.query(
    'SELECT content FROM documents ORDER BY embedding <=> $1 LIMIT $2',
    [toSql(qEmbed), k]
  );
  return result.rows.map((r) => r.content as string);
}

async function answer(question: string): Promise<string> {
  const docs = await retrieve(question);
  const context = docs.join('\n\n');

  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: 'Answer questions using only the provided context.' },
      { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
    ],
  });

  return completion.choices[0].message.content ?? '';
}

Metadata Filtering

One advantage of pgvector over standalone vector databases is standard SQL WHERE clauses:

async function searchBySource(query: string, source: string, limit = 5) {
  const queryEmbedding = await embed(query);
  const result = await pool.query(
    `SELECT id, content, source, 1 - (embedding <=> $1) AS similarity
     FROM documents
     WHERE source = $2
     ORDER BY embedding <=> $1
     LIMIT $3`,
    [toSql(queryEmbedding), source, limit]
  );
  return result.rows;
}

Next.js Connection Pool Pattern

For Next.js, avoid creating a new pool on every request:

// lib/db.ts
import pg from 'pg';

declare global {
  var pgPool: pg.Pool | undefined;
}

export const pool =
  globalThis.pgPool ??
  new pg.Pool({ connectionString: process.env.DATABASE_URL });

if (process.env.NODE_ENV !== 'production') {
  globalThis.pgPool = pool;
}

The globalThis pattern prevents Next.js hot reload from creating a new pool on every file change.

Common Mistakes

Forgetting toSql on insert. You will get invalid input syntax for type vector. Always call toSql(embedding) before passing it to pool.query.

Using SET hnsw.ef_search without LOCAL. Connection pools reuse sessions, so earlier settings can bleed into later queries. Always use SET LOCAL inside a transaction.

Creating the HNSW index before loading data. Load your data first, then build the index. Building on an empty table reduces index quality.

Where to Go From Here

The next steps: add hybrid search combining vector and full-text, add row-level security for multi-tenant apps, or check the HNSW tuning guide for production performance.

The operational side — keeping pgvector updated, managing NVMe storage, connection pooling — is where self-hosted PostgreSQL gets complicated. Rivestack provides managed PostgreSQL with pgvector pre-installed and connection pooling built in.

Hybrid Search with pgvector and PostgreSQL Full-Text Search

Yasser B. — Wed, 29 Apr 2026 15:27:46 +0000

Pure vector search is not always enough. If you have built a semantic search system with pgvector and noticed that exact keyword matches sometimes get buried under loosely related results, you have run into the fundamental limitation of embedding-only retrieval. Hybrid search fixes this by combining vector similarity with traditional keyword matching. In PostgreSQL, you can do both in a single query, without any additional infrastructure.

This guide covers:

Why pure vector search falls short for exact-term queries
How Reciprocal Rank Fusion (RRF) works
Full SQL implementation combining pgvector and tsvector
Python wrapper and LangChain integration
Metadata filtering, tuning, and performance considerations

Read the full post on Rivestack: https://rivestack.io/blog/hybrid-search-pgvector-postgres

PostgreSQL Row Level Security: A Complete Guide

Yasser B. — Wed, 22 Apr 2026 18:20:31 +0000

Your application code knows which tenant owns which row. Your ORM always filters by WHERE tenant_id = $1. Your team has reviewed the queries and they look fine.

Then someone forgets the WHERE clause. Or a bulk operation skips the filter. Or a new developer writes a raw query without knowing the convention. Suddenly one tenant can read another tenant's data, and you find out from a support ticket two weeks later.

Row Level Security (RLS) moves the tenant isolation logic inside PostgreSQL itself. The database enforces the policy automatically on every access, regardless of how the query was written.

What Row Level Security Does

Enable RLS on a table:

ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

Without any policies, no rows are visible to non-superusers. The safe default is deny, not permit. Then create a policy:

CREATE POLICY documents_tenant_isolation
  ON documents FOR ALL
  USING (tenant_id = current_setting('app.tenant_id')::uuid);

Setting the Tenant Context

Always use SET LOCAL (not SET) with connection poolers. SET LOCAL resets when the transaction ends, so pooled connections do not carry the wrong tenant context into the next request:

BEGIN;
SET LOCAL app.tenant_id = '550e8400-e29b-41d4-a716-446655440000';
-- your queries here
COMMIT;

FORCE ROW LEVEL SECURITY

Table owners bypass RLS by default. Close this gap:

ALTER TABLE documents FORCE ROW LEVEL SECURITY;

Without it, an application connecting as the table owner silently ignores all policies. This is the most common RLS gotcha.

Permissive vs Restrictive Policies

Multiple policies on the same operation combine with OR by default (permissive). For rules that must always apply, use AS RESTRICTIVE. Restrictive policies combine with AND against all other policies.

Performance

Add an index on the tenant_id column:

CREATE INDEX idx_documents_tenant_id ON documents (tenant_id);

Without it, every query with an RLS filter becomes a full table scan.

Common Mistakes

Not using FORCE ROW LEVEL SECURITY when the app connects as the table owner
Using SET instead of SET LOCAL with PgBouncer in transaction mode (tenant context leaks between clients)
Missing the index on the tenant_id column
Not testing cross-tenant access explicitly in your test suite

For the full guide with multi-tenant schema setup, testing patterns, EXPLAIN output, and inspecting existing policies, read the full post at rivestack.io.

Originally published at rivestack.io

PostgreSQL Monitoring: A Practical Guide to Metrics, Tools, and Alerts

Yasser B. — Wed, 22 Apr 2026 15:17:27 +0000

Something is wrong with your database. You just don't know it yet.

Maybe it's a query that takes 200ms on average but spikes to 8 seconds every few minutes. Maybe autovacuum fell behind three weeks ago and your tables are now bloated to twice their logical size. Maybe replication lag crept from 100ms to 4 seconds and nobody noticed because the app still works. Until it doesn't.

PostgreSQL gives you everything you need to catch these problems early. The issue is knowing which of the hundreds of available metrics actually matter, how to collect them without adding overhead, and what to alert on versus what to watch passively. This guide walks through all of it.

Why PostgreSQL Monitoring Is Different

PostgreSQL is not a black box. It exposes a rich set of internal statistics through system views that you can query with plain SQL. No special agent, no proprietary protocol. The challenge is not access but interpretation: there are dozens of views, each tracking a different subsystem, and the relationships between them are not obvious until you've debugged a few production incidents.

The other challenge is overhead. Collecting metrics too aggressively can itself impact performance, particularly on systems with high connection counts or heavy write workloads.

A good monitoring setup has three layers:

Metrics collection ‚Äî snapshot key numbers every 15 to 60 seconds
Query analysis ‚Äî track which queries are slow and how often they run
Alerts ‚Äî fire on conditions that require human action, not on every blip

The Metrics That Actually Matter

Connection Usage

PostgreSQL creates a new OS process for every client connection. This isn't free. Each connection consumes around 5 to 10MB of RAM, and the OS has to schedule and context-switch between all of them.

The most important connection metric is what fraction of your max_connections limit you're actually using:

SELECT
  count(*) AS total_connections,
  count(*) FILTER (WHERE state = 'active') AS active,
  count(*) FILTER (WHERE state = 'idle') AS idle,
  count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_transaction,
  (SELECT setting::int FROM pg_settings WHERE name = 'max_connections') AS max_connections
FROM pg_stat_activity
WHERE pid <> pg_backend_pid();

idle in transaction connections are particularly dangerous. A connection sitting in an open transaction holds locks and prevents autovacuum from cleaning up dead rows. If you see this number climbing, you likely have application code that opens transactions and then waits on external resources before committing.

For most applications, if you're regularly exceeding 80% of max_connections, you need connection pooling. PgBouncer in transaction mode can multiplex hundreds of application connections through a small handful of actual Postgres connections.

Query Performance

pg_stat_statements is the single most useful view for understanding production performance. It tracks execution statistics for every distinct query seen since the extension was last reset, aggregated across all calls:

SELECT
  query,
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  round(stddev_exec_time::numeric, 2) AS stddev_ms,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

The total_exec_time column gives you the queries consuming the most cumulative time. This is usually the right place to start: a fast query that runs a million times a day costs more than a slow query that runs once.

The stddev_exec_time column is underused. A query with a mean of 10ms and a stddev of 200ms has pathological behavior: most calls are fast but occasional calls are catastrophically slow. This pattern often means missing indexes on filtered columns or a query plan that changes depending on parameter values.

To enable pg_stat_statements:

-- Add to postgresql.conf:
-- shared_preload_libraries = 'pg_stat_statements'

-- After restart:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Cache Hit Ratio

PostgreSQL uses shared buffers as its in-memory cache. If your working set fits in memory, most reads are served from cache rather than disk. The buffer hit ratio tells you how well this is working:

SELECT
  sum(heap_blks_read) AS heap_read,
  sum(heap_blks_hit) AS heap_hit,
  round(
    sum(heap_blks_hit)::numeric /
    nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) * 100,
    2
  ) AS hit_ratio_pct
FROM pg_statio_user_tables;

A ratio below 90% on an OLTP database usually means your shared_buffers setting is too small, or your working set is genuinely larger than available RAM. The fix is either more RAM or a read replica to distribute load.

Table Bloat and Autovacuum

PostgreSQL uses MVCC (multi-version concurrency control). When you update or delete a row, the old version stays on disk until autovacuum cleans it up. If autovacuum falls behind, tables accumulate dead rows: "bloat." A highly bloated table is slower to scan and wastes disk space.

This query shows which tables have the most dead tuples relative to live ones:

SELECT
  schemaname,
  relname AS table_name,
  n_live_tup,
  n_dead_tup,
  round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0) * 100, 1) AS dead_pct,
  last_autovacuum,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE n_live_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 20;

Tables with a dead_pct above 20% need attention. If autovacuum is running but not keeping up, you need to tune it: either lower autovacuum_vacuum_scale_factor for large tables, or increase autovacuum_max_workers if you have many tables that need vacuuming simultaneously.

For tables that see extremely high write throughput, you can override autovacuum settings at the table level:

ALTER TABLE orders
SET (autovacuum_vacuum_scale_factor = 0.01,
     autovacuum_vacuum_threshold = 100);

This tells autovacuum to trigger a vacuum on the orders table after 1% of rows change, rather than the default 20%. For a 10 million row table, the default means waiting until 2 million rows are dead before cleaning up.

Replication Lag

If you're running streaming replication (either for HA or read replicas), monitoring lag is critical. On the primary:

SELECT
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;

replication_lag_bytes tells you how far behind each replica is in bytes of WAL. You can convert this to approximate time, but the byte measure is more reliable: time is dependent on how busy the primary is, whereas bytes directly measures how much data hasn't been applied yet.

For a complete picture of how high availability and replication interact, see our PostgreSQL High Availability guide.

Lock Waits

Lock contention is one of the sneakiest performance problems. A query that normally takes 10ms will wait indefinitely if it needs a lock held by another transaction. This query shows active locks and what they're waiting on:

SELECT
  blocked.pid AS blocked_pid,
  blocked.query AS blocked_query,
  blocking.pid AS blocking_pid,
  blocking.query AS blocking_query,
  blocked.wait_event_type,
  blocked.wait_event
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;

Long-running idle in transaction connections are the most common culprit. A connection that opens a transaction, acquires a lock, and then waits on application code blocks every other transaction that needs the same lock. Setting idle_in_transaction_session_timeout = '30s' in PostgreSQL 9.6+ will automatically terminate these connections.

Tools for Collecting PostgreSQL Metrics

Prometheus and postgres_exporter

The most common open source stack is postgres_exporter feeding into Prometheus, with Grafana for visualization. postgres_exporter runs as a sidecar, connects to your database with a read-only user, and scrapes all the stats views above on a configurable interval.

The advantage is flexibility: you control exactly which metrics you collect, how long you retain them, and how you visualize them. The disadvantage is that you own all of it. Setting up a reliable Prometheus stack with proper retention, alerting, and visualization takes real engineering time.

A minimal postgres_exporter setup:

# docker-compose.yml
services:
  postgres-exporter:
    image: prometheuscommunity/postgres-exporter
    environment:
      DATA_SOURCE_NAME: "postgresql://monitoring_user:password@postgres:5432/mydb?sslmode=require"
    ports:
      - "9187:9187"

The monitoring_user needs these grants:

CREATE USER monitoring_user WITH PASSWORD 'password';
GRANT pg_monitor TO monitoring_user;

pg_monitor is a predefined role in PostgreSQL 10+ that grants read access to all monitoring-related views without superuser privileges.

pganalyze

pganalyze is a hosted service purpose-built for PostgreSQL monitoring. It does query performance analysis, index recommendations, and automated bloat detection. The main differentiator is that it normalizes query statistics: it groups queries by their structure (ignoring parameter values) and tracks performance over time with a timeline view that shows when a query got slower and whether it correlates with a schema change or data growth.

The free tier covers one database and is enough to get started. The main tradeoff versus Prometheus is that you're sending query statistics to an external service, which requires a data processing agreement for regulated industries.

Datadog and CloudWatch

If you're already paying for Datadog or AWS CloudWatch, the PostgreSQL integrations are solid and require minimal setup. Datadog's postgres integration scrapes all the key stats views and provides out-of-the-box dashboards and alert policies.

The tradeoff is cost: Datadog pricing is per host, which gets expensive at scale. CloudWatch RDS Enhanced Monitoring is cheaper but only applies to RDS instances.

pg_activity

For interactive debugging, pg_activity is a top-like interface for PostgreSQL. It shows running queries, their duration, locks, I/O, and CPU usage in real time:

pip install pg_activity
pg_activity -U postgres --dbname=mydb

This isn't for automated monitoring; it's for when you're actively debugging an incident and need to see what's happening right now.

What to Alert On

Not every metric worth tracking needs an alert. Alert fatigue is real: if your alerts fire frequently and rarely require action, engineers start ignoring them.

These are the conditions worth waking someone up for:

High connection usage. Alert when active connections exceed 85% of max_connections. At 90%, new connections start failing, which causes application errors.

Replication lag above threshold. What the threshold is depends on your recovery time objective. For most applications, lag above 60 seconds on a replica that serves traffic is worth alerting on. For HA failover replicas, even 10 seconds of lag means potential data loss in a failover scenario.

Long-running transactions. Any transaction open for more than 5 minutes is almost certainly stuck or abandoned. This is a lock risk and a bloat risk.

Autovacuum not running. If last_autovacuum on a high-write table is more than a few hours old, autovacuum is either failing or misconfigured. The table will bloat.

Disk usage above 80%. PostgreSQL will crash if it runs out of disk space, and it will do so in a way that can require recovery. Alert early and plan for disk growth.

Checkpoint frequency too high. If checkpoints_req in pg_stat_bgwriter is growing faster than checkpoints_timed, your checkpoint_completion_target and max_wal_size settings need tuning. Frequent required checkpoints hurt write throughput.

SELECT
  checkpoints_timed,
  checkpoints_req,
  round(checkpoints_req::numeric / nullif(checkpoints_timed + checkpoints_req, 0) * 100, 1) AS pct_required
FROM pg_stat_bgwriter;

If pct_required is above 10%, increase max_wal_size.

Index Health

Monitoring query performance tells you when queries are slow, but it doesn't always tell you why. Two index-specific queries help diagnose the most common cause: missing or unused indexes.

Unused indexes waste write performance and vacuum time without helping any query:

SELECT
  schemaname,
  relname AS table_name,
  indexrelname AS index_name,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND NOT indisprimary
  AND NOT indisunique
ORDER BY pg_relation_size(indexrelid) DESC;

Indexes with idx_scan = 0 since the last stats reset (or since the database was restarted) are candidates for removal. Check the PostgreSQL Indexing Guide for the full picture on when to drop versus keep.

Monitoring in Production: Practical Setup

A minimal but effective monitoring setup for a production PostgreSQL database:

Enable pg_stat_statements as a shared preload library. This has almost no overhead and is the single highest-value observability addition you can make.
Create a dedicated monitoring user with pg_monitor role. Never use superuser credentials for monitoring.
Collect metrics every 30 seconds to a time-series store (Prometheus, CloudWatch, or Datadog). This gives you enough resolution to catch spikes without overwhelming the database.
Set up a weekly query review. Look at the top 10 queries by total time in pg_stat_statements every week. This is where most optimization opportunities hide.
Alert on the conditions above, not on every metric. Keep the signal-to-noise ratio high so alerts remain meaningful.
Back up your monitoring data separately from your database. If your database is having a problem, you need to be able to look at historical metrics from before the incident started. See our guide on PostgreSQL backup and recovery for a full production backup strategy.

What Rivestack Handles for You

If you're running PostgreSQL on Rivestack, the monitoring fundamentals are already in place. Rivestack exposes connection counts, active query duration, replication lag, and disk usage through a built-in dashboard, and alerts fire automatically on the conditions above without requiring any setup on your part.

pg_stat_statements is enabled by default on every Rivestack database. You can query it directly through your preferred SQL client or connect to it via the Rivestack metrics API.

For teams that want deeper observability, Rivestack supports read-only monitoring credentials so you can connect postgres_exporter or pganalyze without any additional configuration.

Starting Points

The most important thing is to start simple. You don't need the full Prometheus stack before you've shipped. What you do need, immediately, is pg_stat_statements enabled and a dashboard that shows connection usage and disk growth. Those two things will catch 80% of production database problems before they become incidents.

Add alerts for connection spikes and replication lag. Set idle_in_transaction_session_timeout. Review slow queries weekly. Everything else you can add as you learn what actually causes problems in your specific workload.

When you're ready to take the operational burden off your team entirely, try Rivestack for managed PostgreSQL with monitoring built in from day one.

Originally published at rivestack.io

PostgreSQL Backup and Recovery: A Complete Guide

Yasser B. — Sat, 18 Apr 2026 08:24:11 +0000

Your database has never failed. That's not because you're careful. It's because you haven't been running long enough.

Hardware dies. Disk controllers corrupt data silently for weeks before anyone notices. An engineer runs a DELETE without a WHERE clause on production. A cloud provider has a multi-zone outage. The only thing standing between you and catastrophic data loss is a backup you can actually restore from.

This guide covers how PostgreSQL backup actually works: the difference between logical and physical backups, how point-in-time recovery lets you undo bad queries, and what a production-grade backup strategy looks like. We will go through the tools, the tradeoffs, and the one mistake most teams make that turns a "we have backups" story into a "we lost two days of data" incident.

Logical vs Physical Backups

Before touching any tools, you need to understand the two categories of PostgreSQL backup:

Logical backups capture the data as SQL. The output is a file of CREATE TABLE, INSERT, and COPY statements that recreate your schema and data from scratch.

Physical backups copy the raw files that PostgreSQL uses to store data on disk, including the base directory and Write-Ahead Log (WAL) files.

Each has a different purpose:

	Logical (pg_dump)	Physical (pg_basebackup)
Portability	High: restore to different Postgres versions	Low: must match major version
Granularity	Table, schema, database	Entire cluster
Restore time	Slow for large databases	Fast: just copy files back
Point-in-time recovery	No	Yes (with WAL archiving)
Consistency	Always consistent	Consistent (with pg_start_backup)

For small databases or specific table exports, logical backups with pg_dump are perfectly reasonable. For production databases over a few gigabytes, or whenever you need point-in-time recovery, physical backups are the right foundation.

pg_dump: Logical Backups for Specific Needs

pg_dump is a utility that comes bundled with PostgreSQL. It connects to a running database and produces a self-consistent snapshot you can restore elsewhere.

Basic usage

# Dump a database to a plain SQL file
pg_dump -h localhost -U postgres mydb > mydb.sql

# Dump in custom format (recommended: faster restore, selective restore)
pg_dump -h localhost -U postgres -Fc mydb > mydb.dump

# Dump to a directory with parallel workers
pg_dump -h localhost -U postgres -Fd -j 4 -f mydb_dir/ mydb

The custom format (-Fc) is almost always better than plain SQL. It compresses the output, supports parallel restore, and lets you restore individual tables or schemas selectively.

Restoring a pg_dump backup

# Restore a plain SQL dump
psql -h localhost -U postgres newdb < mydb.sql

# Restore a custom format dump
pg_restore -h localhost -U postgres -d newdb mydb.dump

# Restore with parallel workers (much faster for large databases)
pg_restore -h localhost -U postgres -j 4 -d newdb mydb.dump

Limitations of pg_dump

pg_dump works by running a transaction against your running database. For large databases, this can take hours, and the entire dump runs in a single transaction to ensure consistency. If you need to restore 500 GB with pg_restore, you are looking at several hours of downtime.

More importantly, pg_dump gives you a snapshot at a point in time. If you run it at midnight and your application corrupts data at 10 PM the following day, you lose 22 hours of data. That might be acceptable for some workloads. For most production databases, it is not.

pg_basebackup: Physical Backups of the Entire Cluster

pg_basebackup creates a physical copy of the PostgreSQL data directory. Combined with WAL archiving, it forms the foundation of production backup strategies.

What pg_basebackup does

PostgreSQL writes every change to the Write-Ahead Log before applying it to the actual data files. This means a physical backup taken at any moment is internally consistent if you also keep the WAL generated during the backup. pg_basebackup handles the coordination automatically.

pg_basebackup \
  -h localhost \
  -U replicator \
  -D /backups/base \
  -Ft \
  -z \
  -P \
  --wal-method=stream

The replicator user needs REPLICATION privilege:

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'yourpassword';

Restoring from pg_basebackup

sudo systemctl stop postgresql
sudo rm -rf /var/lib/postgresql/16/main/*
sudo tar -xzf /backups/base/base.tar.gz -C /var/lib/postgresql/16/main/
sudo systemctl start postgresql

This gives you the database as it was at the time of the backup. To recover to a point after the backup, you need WAL archiving.

WAL Archiving and Point-In-Time Recovery

Point-in-time recovery (PITR) lets you restore a database to any moment between your last base backup and now. It works by replaying WAL files on top of a base backup.

Enabling WAL archiving

In postgresql.conf:

wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/wal-archive/%f'

In production, ship to object storage instead:

archive_command = 'aws s3 cp %p s3://my-wal-bucket/wal/%f'

Performing a point-in-time recovery

# PostgreSQL 12+: create a recovery signal file
touch /var/lib/postgresql/16/main/recovery.signal

# Add to postgresql.conf
restore_command = 'cp /mnt/wal-archive/%f %p'
recovery_target_time = '2026-04-18 14:30:00'
recovery_target_action = 'promote'

You can also recover to a named restore point you create before migrations:

SELECT pg_create_restore_point('before_migration');

pgBackRest: Production-Grade Backup Management

pgBackRest handles everything from incremental backups to parallel compression and verification.

Minimal pgBackRest configuration

/etc/pgbackrest/pgbackrest.conf:

[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
repo1-retention-diff=7
process-max=4

[main]
pg1-path=/var/lib/postgresql/16/main

In postgresql.conf:

archive_command = 'pgbackrest --stanza=main archive-push %p'
archive_mode = on
wal_level = replica

Running and restoring backups

# Full backup
sudo -u postgres pgbackrest --stanza=main --type=full backup

# Restore to a specific point in time
sudo -u postgres pgbackrest --stanza=main restore \
  --type=time \
  --target="2026-04-18 14:30:00"

Backup Verification: The Step Most Teams Skip

An untested backup is not a backup. It is an assumption.

WAL segments can be silently corrupted. Archive commands can silently succeed (exit 0) while producing empty files. pg_dump output can be truncated. You will not know until you try to restore.

Monitor archive health continuously:

SELECT
  archived_count,
  last_archived_wal,
  last_archived_time,
  failed_count,
  last_failed_wal,
  last_failed_time
FROM pg_stat_archiver;

If failed_count is climbing and last_archived_time is stale, your WAL archive is silently broken.

For mission-critical databases, run automated restore drills: take a backup, restore to an isolated instance, compare row counts against production. This sounds like overhead, and it is. It is also the thing that saves teams from finding out their backups were corrupted during an actual disaster.

Backup Retention and Recovery Time

A common production policy:

Daily full backups kept for 7 days
Continuous WAL archiving for PITR within the retention window
Weekly backups kept for 4 weeks
Monthly snapshots kept for 12 months

For recovery time: pg_restore with a single worker does roughly 1 GB per minute. A 100 GB database is 100 minutes. With parallel restore (-j 4), that drops to 30 to 40 minutes. A physical restore from pgBackRest over a 1 Gbps connection takes under 15 minutes for the base backup, plus WAL replay time.

Know your restore time before the incident, not during it.

A Practical Backup Checklist

[ ] Base backups running on a schedule (at least daily)
[ ] WAL archiving enabled and destination confirmed
[ ] Backup verification job running at least weekly
[ ] Restore procedure documented and tested
[ ] Retention policy defined and enforced
[ ] Monitoring for archive failures via pg_stat_archiver
[ ] Offsite storage (not on the same machine or availability zone as the primary)
[ ] Recovery time measured on actual restore drill

Summary

PostgreSQL gives you the primitives for a solid backup strategy. The gap between "we have backups" and "we can actually recover" is almost always process: untested restores, silent archive failures, and retention policies set once and never revisited.

If you want backups that work without the operational overhead, try Rivestack. Continuous backup, point-in-time recovery, and restore in a few clicks.

For more production PostgreSQL reading, see our guides on connection pooling with PgBouncer and PostgreSQL indexing strategies.

Originally published at rivestack.io

How to Search Gmail Attachments by File Type, Date, or Sender

Yasser B. — Sun, 12 Apr 2026 13:41:23 +0000

You know the file is somewhere in your inbox. A PDF invoice, a spreadsheet from your accountant, a signed contract someone sent three months ago. You scroll, you search, you give up and ask the sender to resend it. Sound familiar?

Gmail holds thousands of messages for most professionals, and attachments get buried fast. The built-in search bar is more powerful than most people realize, though. With the right search operators, you can filter attachments by file type, narrow results to a specific date range, isolate emails from one sender, and combine all of these filters to pinpoint exactly what you need in seconds.

This guide covers every useful technique for searching Gmail attachments, from basic operators to advanced combinations.

Why Gmail Search Feels Broken (And How to Fix It)

The default Gmail search works reasonably well for finding emails by keyword or sender name. But when you type "invoice PDF" into the search bar, Gmail searches the full body of every message, not just attachments. You get hundreds of results, most of them irrelevant.

The fix is search operators: special keywords and syntax that tell Gmail exactly what to look for and where. Once you know a handful of these, you'll never spend more than a few seconds hunting for an attachment.

The Foundation: Finding Emails with Attachments

The simplest operator for attachment search is has:attachment. Add it to any Gmail search and your results will only show emails that include attached files.

has:attachment

That single operator cuts out all the noise. From here, you layer on additional filters to zero in on exactly what you need.

Searching Gmail Attachments by File Type

Gmail lets you filter attachments by MIME type using the filename: operator. You can search by extension or by the actual filename if you know it.

Search by file extension

To find all PDFs in your inbox:

has:attachment filename:pdf

For Excel files:

has:attachment filename:xlsx

For Word documents:

has:attachment filename:docx

For ZIP archives:

has:attachment filename:zip

For CSV files:

has:attachment filename:csv

Search by partial filename

If you remember part of the filename, Gmail does partial matching:

has:attachment filename:invoice

Search by exact filename

filename:"Q1 2026 Report.pdf"

Searching Gmail Attachments by Date

Using after: and before:

To find attachments within a specific range:

has:attachment after:2026/01/01 before:2026/02/01

Using older_than: and newer_than:

Find attachments received in the last 30 days:

has:attachment newer_than:30d

Find PDFs received in the past three months:

has:attachment filename:pdf newer_than:3m

Searching Gmail Attachments by Sender

Search by email address:

has:attachment from:john@company.com

Search by domain (useful for vendors or clients):

has:attachment from:@company.com

Combining Filters for Precision Search

PDFs from a specific sender this year:

has:attachment filename:pdf from:accountant@firm.com after:2026/01/01

Large attachments from anyone:

has:attachment larger:5M

Using OR Logic for Multiple File Types

has:attachment {filename:pdf filename:docx}

Searching for Attachments You Sent

in:sent has:attachment

The Limits of Gmail's Built-in Search

Gmail's operators are powerful but have friction points: no bulk download, no attachment-level folder organization, and search doesn't look inside attachment content. Storage management still requires manual action per email.

How Dioveo Extends Gmail Attachment Management

Dioveo is built specifically for Gmail attachment management. Where Gmail's search helps you find attachments, Dioveo helps you act on them at scale — automatically saving attachments to Google Drive based on rules you define: sender, file type, subject keywords, or label.

It also handles the storage problem. Dioveo identifies large and old attachments, gives you a clear view of what's using space, and lets you clean up in bulk. Start with the free plan at dioveo.com.

Quick Reference: Gmail Attachment Search Operators

What you want	Operator
Emails with any attachment	`has:attachment`
Specific file type	`filename:pdf`
From a specific sender	`from:email@domain.com`
From a domain	`from:@company.com`
After a date	`after:2026/01/01`
Last N days	`newer_than:30d`
Larger than a size	`larger:5M`
Sent by you	`in:sent has:attachment`

Originally published at dioveo.com/blog/how-to-search-gmail-attachments

PostgreSQL JSONB: A Complete Guide to Storing and Querying JSON Data

Yasser B. — Sun, 12 Apr 2026 13:14:50 +0000

Every application reaches a point where the relational model gets awkward. User preferences. API responses. Feature flags. Event payloads. You could create a new column for each field, run a migration every time the schema changes, and normalize everything into lookup tables. Or you could store the data as JSON.

PostgreSQL gives you both options, and unlike databases that bolt on JSON support as an afterthought, PostgreSQL treats JSONB as a first class data type with real indexing, operators, and query planning. You get the flexibility of a document store with the guarantees of a relational database: transactions, constraints, joins, and a query planner that actually knows what it's doing.

This guide covers everything you need to work with JSONB in production: when to use it, how to query it, how to index it, and where teams get tripped up.

Originally published at rivestack.io

pgvector with LangChain: Build a RAG Pipeline on PostgreSQL

Yasser B. — Fri, 10 Apr 2026 09:19:05 +0000

LangChain has a vectorstore abstraction that lets you swap out the underlying vector database without rewriting your application logic. Swap Chroma for Pinecone, Pinecone for Weaviate, whatever. In theory, it's clean. In practice, most teams end up staying with whatever they picked first, because migration is never as simple as swapping a class name.

So the decision matters. And if you're building on PostgreSQL, the answer is almost always: use pgvector. Your embeddings live in the same database as your users, documents, and application state. No sync pipeline. No eventual consistency. Full SQL.

This guide walks through the LangChain PGVector integration from scratch, including document loading, embedding, similarity search, metadata filtering, and wiring it into a working retrieval chain.

Why pgvector Over a Dedicated Vectorstore

Before the code, let's be direct about the tradeoff.

Dedicated vectorstores like Pinecone are fast and scale to billions of vectors without you thinking about infrastructure. If you're building something where vectors are the entire product, they're a reasonable choice.

But most applications aren't like that. You have users. You have documents. You have metadata. You have access control logic. When your vectors live in Pinecone and everything else lives in PostgreSQL, you've just created a sync problem. Document gets deleted from Postgres, the embedding stays in Pinecone. You want to filter by user_id, now you need to implement that in a separate system. You want a transaction that inserts a document and its embedding atomically, you can't have one.

pgvector collapses this. Everything lives in one place. JOINs work. Transactions work. Your existing backup strategy, monitoring setup, and connection pooler all work with zero changes.

Setting Up

You'll need:

PostgreSQL with the vector extension enabled
Python 3.9+
An OpenAI API key (or any LangChain-compatible embeddings model)

Install dependencies:

pip install langchain langchain-openai langchain-postgres psycopg

The langchain-postgres package is the current home for the PGVector integration. On a managed PostgreSQL service with pgvector pre-installed, just run this once:

CREATE EXTENSION IF NOT EXISTS vector;

Connecting LangChain to pgvector

The PGVector class takes a connection string and an embeddings object:

from langchain_postgres import PGVector
from langchain_openai import OpenAIEmbeddings

connection_string = "postgresql+psycopg://user:password@localhost:5432/mydb"
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="documents",
    connection=connection_string,
    use_jsonb=True,
)

Always set use_jsonb=True — it stores metadata as JSONB, unlocking proper metadata filtering. LangChain creates the required tables automatically on first use.

Loading and Embedding Documents

from langchain_core.documents import Document

docs = [
    Document(
        page_content="PostgreSQL supports ACID transactions and is MVCC-based.",
        metadata={"source": "postgres-intro", "category": "database"}
    ),
    Document(
        page_content="pgvector adds vector similarity search using HNSW and IVFFlat indexes.",
        metadata={"source": "pgvector-intro", "category": "extensions"}
    ),
]

vectorstore.add_documents(docs)

For real use cases, split long documents first:

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = splitter.split_documents(raw_docs)
vectorstore.add_documents(split_docs)

Similarity Search

results = vectorstore.similarity_search("How does pgvector index work?", k=4)

results_with_scores = vectorstore.similarity_search_with_score("pgvector index", k=4)
for doc, score in results_with_scores:
    print(f"Score: {score:.4f} — {doc.page_content[:80]}")

Lower scores mean higher similarity when using cosine distance.

Metadata Filtering

# Filter by a single field
results = vectorstore.similarity_search(
    "database indexing",
    k=4,
    filter={"category": "database"}
)

# Multi-tenant: scope to a specific user
results = vectorstore.similarity_search(
    "vector search",
    k=4,
    filter={"user_id": "user_123"}
)

With use_jsonb=True, this translates to a JSONB containment query that can be indexed.

MMR Search for Diverse Results

Maximum Marginal Relevance avoids returning five chunks that all say the same thing:

results = vectorstore.max_marginal_relevance_search(
    "PostgreSQL extensions",
    k=4,
    fetch_k=20,
    lambda_mult=0.5  # 0 = max diversity, 1 = max relevance
)

Building a Retrieval Chain

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20}
)

prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context:

{context}

Question: {question}
""")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

chain = (
    {"context": retriever | (lambda docs: "\n\n".join(d.page_content for d in docs)),
     "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

answer = chain.invoke("What indexes does pgvector support?")

For multi-tenant RAG, scope the retriever per authenticated user:

retriever = vectorstore.as_retriever(
    search_kwargs={"k": 4, "filter": {"user_id": current_user_id}}
)

HNSW Indexing for Production

By default, pgvector uses exact k-NN search. For datasets over 100k vectors, add an HNSW index:

CREATE INDEX ON langchain_pg_embedding
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- At query time:
SET hnsw.ef_search = 100;

LangChain won't create this index automatically. Run it once after initial load.

Connection Pooling

Configure an engine with a pool instead of relying on per-instance connections:

from sqlalchemy import create_engine

engine = create_engine(
    connection_string,
    pool_size=10,
    max_overflow=20,
    pool_pre_ping=True,
)

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="documents",
    connection=engine,
    use_jsonb=True,
)

Async Usage

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="documents",
    connection=async_engine,
    use_jsonb=True,
    async_mode=True,
)

results = await vectorstore.asimilarity_search("your query", k=4)
answer = await chain.ainvoke("What indexes does pgvector support?")

Document Upsert and Deletion

# Upsert by ID
vectorstore.add_documents(updated_docs, ids=["doc_id_1", "doc_id_2"])

# Delete by ID
vectorstore.delete(ids=["doc_id_1"])

Summary

pgvector with LangChain is production-ready. The langchain-postgres package handles embeddings storage, similarity search, MMR, and metadata filtering. The underlying database is PostgreSQL, so transactions, JOINs, and your existing operational tooling all work.

Key things to get right: use use_jsonb=True, create an HNSW index before your dataset grows large, configure a connection pool at the engine level, and scope retrievers to the authenticated user for multi-tenant workloads.

If you want a managed PostgreSQL setup with pgvector pre-installed, PgBouncer included, and backups handled: check out Rivestack.

Originally published at rivestack.io

PostgreSQL Connection Pooling with PgBouncer: A Complete Guide

Yasser B. — Tue, 31 Mar 2026 15:22:09 +0000

You launch your app. Traffic is light, everything works. A few weeks later you start seeing FATAL: remaining connection slots are reserved for non-replication superuser connections. Your PostgreSQL server is out of connections and your app is falling over.

This is one of the most common PostgreSQL scaling problems, and connection pooling is the fix. But the fix has its own complexity: PgBouncer has three modes with different tradeoffs, the configuration is full of footguns, and if you get it wrong you get subtle bugs that are much harder to debug than the original connection error.

This guide covers how PostgreSQL connections actually work, how to set up and configure PgBouncer correctly, and how to choose the right pool mode for your application.

Why PostgreSQL Connections Are Expensive

PostgreSQL handles each connection with a dedicated server process. When a client connects, Postgres forks a new OS process. That process:

Allocates its own memory (typically 5-10 MB per connection including shared memory overhead)
Maintains its own backend state, transaction state, and lock tables
Requires the kernel to schedule it like any other process

At 50 connections, this is fine. At 500 connections, you have 500 OS processes and the scheduler starts showing up in your performance profiles. At 1,000 connections, you are likely hitting the max_connections limit (default 100 in stock PostgreSQL) and your app is returning errors.

The naive fix is to increase max_connections. Don't do that without thinking it through. Each connection costs memory. Set max_connections = 1000 on a server with 8 GB of RAM and you've allocated the entire heap to idle connections before a single query runs. The shared_buffers and work_mem math goes sideways fast.

The right fix is to reduce the number of actual connections to PostgreSQL. That's what connection poolers do.

What PgBouncer Does

PgBouncer sits between your application and PostgreSQL. Your app thinks it's talking to Postgres, but it's actually talking to PgBouncer. PgBouncer maintains a pool of real connections to Postgres and hands them out to client requests.

The numbers look like this in practice:

Before PgBouncer: 300 app threads, 300 Postgres connections
After PgBouncer (transaction mode): 300 app threads, 20 actual Postgres connections

Those 20 connections serve 300 clients because most clients are not actually executing SQL at any given moment. They're waiting for network I/O, processing results, or sitting idle. Transaction mode takes advantage of this by returning a connection to the pool the moment a transaction commits.

Installing PgBouncer

On Ubuntu/Debian:

sudo apt-get install pgbouncer

On macOS with Homebrew:

brew install pgbouncer

Configuring PgBouncer

A minimal working config:

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
server_idle_timeout = 600

For production, use hashed passwords. Generate them with:

SELECT concat('"', rolname, '" "', rolpassword, '"')
FROM pg_authid
WHERE rolname = 'myuser';

Your app connects to port 6432 instead of 5432. Nothing else changes in the app code.

The Three Pool Modes

Session Mode

A server connection is assigned when the client connects and held until the client disconnects. This is the safest mode, behaves identically to a direct PostgreSQL connection. Prepared statements, advisory locks, LISTEN/NOTIFY all work correctly.

Session mode does not help much with connection counts at steady state. Use it when you need full compatibility and your problem is peak load, not constant high concurrency.

Transaction Mode

A server connection is assigned for the duration of a transaction and returned to the pool immediately after. This gives you the dramatic reduction in server connections.

The tradeoff: session-level state does not persist across transactions. This breaks:

PREPARE and server-side prepared statement caching
SET commands that are not wrapped in a transaction
Advisory locks (session-scoped)
LISTEN and NOTIFY subscriptions
Temp tables that are supposed to persist across transactions

Transaction mode works well for stateless web applications using pg (Node.js), psycopg2/psycopg3 (Python), or JDBC (Java), as long as those frameworks don't use session-level features.

Statement Mode

A connection is held only for a single SQL statement, then returned. Breaks multi-statement transactions entirely. Rarely the right choice for web applications.

Choosing Your Mode

Your app	Recommended mode
Stateless API using ORM (Django, Rails, Prisma)	Transaction
Long-lived connections with prepared statements	Session
Connection count issues at peak only	Session
Connection count issues at steady state	Transaction
Serverless functions	Transaction
Application using LISTEN/NOTIFY	Session

Pool Sizing

A reasonable starting formula:

default_pool_size = (number of PostgreSQL CPU cores) * 2 + number of disks

In practice, 20-30 works well for most web applications. Check pool utilization while running:

SHOW POOLS;

cl_waiting > 0 sustained means the pool is undersized. sv_idle consistently high means it's oversized.

Monitoring PgBouncer

Connect to the admin console:

psql -h 127.0.0.1 -p 6432 -U pgbouncer pgbouncer

Key commands: SHOW POOLS, SHOW STATS, SHOW CLIENTS, SHOW SERVERS, RELOAD.

SHOW STATS tells you avg_query_time and avg_wait_time. If either is climbing, something is backing up.

Common Pitfalls

Prepared statements in transaction mode: If your app uses server-side prepared statements, transaction mode will break it. Disable them in your driver: prepare_threshold=None in psycopg2, { prepare: false } in pg (Node.js).

Connection storms at startup: Set reserve_pool_size = 5 and max_client_conn to at least 2-3x your expected peak.

SSL: Always use SSL for both client-to-PgBouncer and PgBouncer-to-PostgreSQL in production.

When to Skip PgBouncer

You probably don't need it if:

Fewer than 50 concurrent connections at peak
Using a modern ORM with client-side pooling already (Prisma, Django, Rails)
Serverless workloads with 1-2 connections per function instance

You need it if multiple processes or pods each maintain their own pool and the total exceeds max_connections.

Bottom Line

PostgreSQL connection pooling is not optional at scale. PgBouncer in transaction mode is the right default for most web applications.

The main things to get right: pool size (start at 20-30 for OLTP), mode (transaction for stateless apps, session for anything using prepared statements or advisory locks), monitoring (cl_waiting and avg_wait_time), and SSL in production.

Connection pooling is unglamorous infrastructure, but it's the difference between an app that falls over at traffic spikes and one that just handles them.

Originally published at rivestack.io

PostgreSQL Full Text Search: A Complete Guide

Yasser B. — Mon, 30 Mar 2026 14:49:54 +0000

There's a database already running in your stack. It has your users, your content, your transactions. And buried in that same PostgreSQL instance is a full text search engine you've probably never turned on.

PostgreSQL full text search has been production ready for over a decade. It handles stemming, stop words, multiple languages, weighted ranking, and trigram fuzzy matching. You don't need Elasticsearch for a search feature. You don't need Algolia if your data is already in Postgres. For most applications, especially those with under a few million documents, built-in full text search is the right call.

This guide covers everything you need to ship full text search in PostgreSQL: how the underlying model works, how to index correctly, how to rank results, and how it compares to vector search with pgvector.

How PostgreSQL Full Text Search Works

PostgreSQL doesn't search raw text. It converts text into a normalized representation called a tsvector, then matches queries expressed as tsquery objects. This two-step process is what makes it fast.

A tsvector is a sorted list of lexemes: normalized word forms that strip suffixes and reduce words to their base form. The word "running" becomes "run". "Postgres" becomes "postgr". Stop words like "the", "a", "an" are dropped entirely.

SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

A tsquery is what you match against:

SELECT to_tsquery('english', 'jumping');
-- Returns: 'jump'

-- Boolean operators
SELECT to_tsquery('english', 'postgres & search');  -- AND
SELECT to_tsquery('english', 'postgres | mysql');   -- OR
SELECT to_tsquery('english', 'database & !oracle'); -- NOT

The match operator is @@:

SELECT to_tsvector('english', 'PostgreSQL is a powerful database')
  @@ to_tsquery('english', 'powerful');
-- Returns: true

Setting Up Full Text Search

Option 1: Generated column (recommended for PostgreSQL 12+)

ALTER TABLE articles
  ADD COLUMN search_vector TSVECTOR
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED;

setweight assigns priority to fields: 'A' (highest) to title, 'B' to body. Documents where the search term appears in the title rank higher.

Query with ranking:

SELECT id, title,
  ts_rank(search_vector, query) AS rank
FROM articles,
  to_tsquery('english', 'postgresql') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

Option 2: Trigger-maintained column

ALTER TABLE articles ADD COLUMN search_vector TSVECTOR;

CREATE FUNCTION articles_search_vector_trigger() RETURNS TRIGGER AS $$
BEGIN
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.body, '')), 'B');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_search_vector_update
  BEFORE INSERT OR UPDATE ON articles
  FOR EACH ROW EXECUTE FUNCTION articles_search_vector_trigger();

Indexing for Performance

Without an index, full text search does a full table scan. GIN (Generalized Inverted Index) is purpose built for tsvectors.

CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

With this index, full text search queries return in milliseconds even on tables with millions of rows.

Ranking Results

SELECT id, title,
  ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgresql & database') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

ts_rank_cd uses cover density (how close matching terms are to each other) and often gives better results for multi-word queries.

Snippet Generation

SELECT id, title,
  ts_headline(
    'english', body,
    to_tsquery('english', 'postgresql'),
    'MaxWords=50, MinWords=15, StartSel=<mark>, StopSel=</mark>'
  ) AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql')
LIMIT 5;

Note: ts_headline does not use the GIN index. Call it only on the final page of results, not before pagination.

Handling User Input Safely

-- websearch_to_tsquery: safe, handles Google-style syntax
SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', '"full text search" postgres -oracle');

websearch_to_tsquery (PostgreSQL 11+) is the best default for user input. It's injection-safe, handles partial syntax, and supports quoted phrases and exclusions.

Full Text Search vs pgvector

They solve different problems:

Full text search finds documents containing specific words or phrases. Fast, precise, no ML model required.

pgvector finds semantically similar documents, even without shared keywords. Needs an embedding model.

Use case	Best approach
Blog/docs search	Full text search
Semantic Q&A, RAG	pgvector
E-commerce	Often both

Many systems use both, merging scores with Reciprocal Rank Fusion. For RAG pipelines, see our guides on getting started with pgvector and pgvector with Python.

Complete Production Setup

CREATE TABLE articles (
  id BIGSERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  search_vector TSVECTOR GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED
);

CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

CREATE FUNCTION search_articles(query_text TEXT, page_size INT DEFAULT 20, page_offset INT DEFAULT 0)
RETURNS TABLE (id BIGINT, title TEXT, snippet TEXT, rank FLOAT4) AS $$
DECLARE q TSQUERY := websearch_to_tsquery('english', query_text);
BEGIN
  RETURN QUERY
  SELECT a.id, a.title,
    ts_headline('english', a.body, q, 'MaxWords=40, MinWords=10, StartSel=<mark>, StopSel=</mark>'),
    ts_rank(a.search_vector, q)
  FROM articles a
  WHERE a.search_vector @@ q
  ORDER BY ts_rank(a.search_vector, q) DESC
  LIMIT page_size OFFSET page_offset;
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT * FROM search_articles('postgresql replication', 20, 0);

Keyword based search, relevance ranking, HTML ready snippets, and pagination, all within PostgreSQL.

Try It on Rivestack

PostgreSQL full text search and pgvector work side by side on the same database. No separate infrastructure for keyword vs semantic search.

Rivestack gives you a fully managed PostgreSQL instance with pgvector, pg_trgm, and all standard extensions enabled by default. Try it free, no credit card required.

Originally published at rivestack.io

PostgreSQL High Availability: A Practical Guide for Production

Yasser B. — Sun, 29 Mar 2026 16:08:53 +0000

Your application is live. Customers are using it. The database goes down.

How long before traffic routes around the failure? Ten seconds? Five minutes? Never, because you're paged at 2 AM and have to manually promote a replica while the on-call engineer Slacks you asking if the database is "doing a thing"?

PostgreSQL high availability is one of those topics that looks straightforward in blog posts and turns out to be deeply humbling when you actually implement it in production. This guide covers how PostgreSQL HA actually works, the main tools people use, what typically goes wrong, and when the complexity of DIY HA stops being worth it.

What "High Availability" Means for PostgreSQL

High availability means your database keeps serving requests even when individual components fail. For PostgreSQL, that typically requires three things working together:

Data replication — at least one copy of your data exists on a server other than the primary
Failure detection — something notices when the primary is unreachable
Automatic failover — the replica promotes itself to primary without a human in the loop

PostgreSQL ships with excellent replication primitives but no built-in automatic failover. The replication part is solid and well-understood. The failover part is where teams get into trouble.

Streaming Replication: The Foundation

PostgreSQL replication is based on Write-Ahead Log (WAL) shipping. Every write to the primary is first written to the WAL. Replicas connect to the primary and stream that WAL in near-real-time, replaying it to stay current.

Setting up a basic standby looks like this in postgresql.conf on the primary:

-- postgresql.conf on the primary
wal_level = replica
max_wal_senders = 3
wal_keep_size = 1GB

And in pg_hba.conf, you allow the replica to connect for replication:

# pg_hba.conf on the primary
host replication replicator 10.0.0.2/32 scram-sha-256

The replica connects with a primary_conninfo in its configuration and starts streaming WAL from the primary. Once streaming, the replica is typically only milliseconds behind.

Synchronous vs Asynchronous Replication

By default, PostgreSQL replication is asynchronous: the primary commits a transaction and returns success to the client before confirming the replica received the data. If the primary dies at exactly the wrong moment, you can lose the last few transactions.

Synchronous replication waits for at least one replica to confirm it has received and written the WAL before reporting the commit as successful:

-- postgresql.conf on the primary
synchronous_standby_names = 'replica1'
synchronous_commit = on

This gives you zero-RPO (recovery point objective) — no committed data is ever lost. The tradeoff is latency: every write waits for a round trip to the replica. On a local network this is typically 1–5ms. Across availability zones it can be 10–30ms depending on the cloud provider.

Most production setups use synchronous replication for the hot standby and asynchronous replication for additional read replicas or DR standbys that are geographically distant.

The Problem: PostgreSQL Doesn't Fail Over Itself

With streaming replication running, you have your data in two places. But if the primary goes down, PostgreSQL doesn't automatically promote the replica. You have two choices:

Manual failover — someone runs pg_ctl promote or SELECT pg_promote() on the replica. Fast if someone is awake, catastrophic if not.
Automated failover via an HA tool — a separate process watches the primary and promotes the replica when it detects a failure.

Almost every production PostgreSQL HA setup uses one of three tools for automated failover: Patroni, pg_auto_failover, or repmgr. They all solve the same problem; they have meaningfully different complexity and tradeoff profiles.

Patroni: The Industry Standard (and Why It's Hard)

Patroni is what most teams with serious PostgreSQL HA requirements end up using. It's battle-tested, highly configurable, and runs at scale. It's also genuinely complex to operate.

Patroni uses a distributed consensus store — either etcd, Consul, or ZooKeeper — to maintain cluster state and elect a primary. A minimal production Patroni setup is 3 etcd nodes + 2–3 PostgreSQL nodes + HAProxy. That's 5–6 servers before you've even started.

A minimal patroni.yml:

scope: prod-cluster
namespace: /db/
name: pg-node-1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.1:8008

etcd3:
  hosts: 10.0.1.1:2379,10.0.1.2:2379,10.0.1.3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 30
    maximum_lag_on_failover: 1048576
    synchronous_mode: true

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.0.1:5432
  data_dir: /var/lib/postgresql/data

HAProxy needs health checks against the Patroni REST API (port 8008), not just the PostgreSQL port — that's how it knows which node is the current primary.

Patroni is genuinely good software. But "run Patroni in production" is a weeks-long project, not an afternoon task.

pg_auto_failover: Simpler, More Opinionated

pg_auto_failover uses a dedicated monitor node instead of etcd:

# On the monitor node
pg_autoctl create monitor \
  --pgdata /var/lib/postgresql/monitor \
  --pgport 5000 \
  --hostname monitor.internal

# On the primary node
pg_autoctl create postgres \
  --pgdata /var/lib/postgresql/data \
  --monitor postgres://autoctl_node@monitor.internal:5000/pg_auto_failover \
  --hostname primary.internal \
  --pgport 5432

Easier to set up than Patroni, but the monitor is a single point of failure and it's less flexible for complex topologies. Good choice for teams that want something working quickly without running etcd.

What Goes Wrong in Production

Split-Brain

The most dangerous failure mode: a network partition causes both nodes to think they're primary and accept writes. Patroni prevents this with etcd distributed locks — a node can only be primary if it holds the lock. If it can't reach etcd, it demotes itself. pg_auto_failover prevents it by centralizing all promotion decisions in the monitor.

Replication Lag at Failover Time

With async replication, a lagged replica that promotes will be missing the last N transactions. Patroni's maximum_lag_on_failover controls this — but set it too conservatively and failover blocks entirely if all replicas are lagged after a network partition.

Application Not Reconnecting

After failover, point your connection string at a VIP or load balancer, not a node's IP:

postgresql://ha-proxy.internal:5432/mydb?target_session_attrs=read-write

The target_session_attrs=read-write parameter tells libpq to reject connections to read-only servers, which helps clients find the current primary automatically.

Testing Your Failover

# Kill the primary while watching application logs
sudo systemctl stop postgresql@17-main

# Check the cluster state
patronictl -c /etc/patroni/config.yml list

If you haven't tested your failover, you don't have HA. You have a plan that might work.

The Failover Time Question

Typical Patroni failover: 10–30 seconds. The timeline:

Primary unreachable (0s)
Patroni TTL expires, primary declared dead (default: 30s)
Replica acquires DCS lock and promotes (1–2s)
HAProxy detects new primary (2–5s)
Application reconnects (depends on pool settings)

Reducing TTL speeds detection but increases false positives from transient network blips. Most teams settle on 15–30 seconds.

When Managed PostgreSQL High Availability Makes Sense

DIY HA works. Large companies run Patroni at massive scale. But you're running 5+ nodes, maintaining etcd, and debugging edge cases at the worst possible time.

As we've broken down before, the real cost of self-hosted PostgreSQL HA includes infrastructure, tooling, and engineering time — and engineering time dominates.

Rivestack's HA clusters handle streaming replication, automatic failover, and connection routing automatically. Failover happens in seconds, your connection string doesn't change, and there's no etcd cluster to maintain. HA clusters start at $99/month with NVMe storage, automated backups, point-in-time recovery, and monitoring included.

What to Do If You're Starting From Scratch

For a managed setup: Try Rivestack — spin up an HA cluster, verify failover works from the dashboard, and move on to building your application.

For a self-managed setup:

Start with pg_auto_failover for 2–3 nodes and fast setup
Move to Patroni if you need multi-datacenter support or fine-grained control
Use HAProxy with Patroni REST API health checks (port 8008) for connection routing
Enable synchronous replication if your application can't tolerate data loss
Test failover in staging before relying on it in production

The Bottom Line

PostgreSQL has excellent HA building blocks. Streaming replication is fast, reliable, and well-understood. The gap is automated failover, and Patroni or pg_auto_failover fills it — but adds real operational complexity.

Test your failover before you need it. The worst time to discover your replica is 10 minutes behind is during an actual outage.

If you want to skip the infrastructure work, Rivestack handles the HA layer for you — including pgvector if you're building AI applications. See the getting started guide if that's your stack.

Originally published at rivestack.io