<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Finny Collins</title>
    <description>The latest articles on Forem by Finny Collins (@finny_collins).</description>
    <link>https://forem.com/finny_collins</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3778295%2F09068bb4-3299-4502-91a2-603a3c2fc684.png</url>
      <title>Forem: Finny Collins</title>
      <link>https://forem.com/finny_collins</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/finny_collins"/>
    <language>en</language>
    <item>
      <title>How to backup MySQL in Docker — 5 strategies that actually work</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Fri, 03 Apr 2026 07:58:02 +0000</pubDate>
      <link>https://forem.com/finny_collins/how-to-backup-mysql-in-docker-5-strategies-that-actually-work-38nc</link>
      <guid>https://forem.com/finny_collins/how-to-backup-mysql-in-docker-5-strategies-that-actually-work-38nc</guid>
      <description>&lt;p&gt;Running MySQL in Docker is easy to set up. Backing it up properly is where most people stumble. Containers are ephemeral by design, and a &lt;code&gt;docker rm&lt;/code&gt; on the wrong container can wipe your data if you don't have a backup strategy in place. The default Docker setup doesn't do anything to protect your MySQL data beyond a named volume.&lt;/p&gt;

&lt;p&gt;This article walks through five strategies for backing up MySQL in Docker. They range from quick manual dumps to fully automated solutions with remote storage and monitoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. mysqldump via docker exec
&lt;/h2&gt;

&lt;p&gt;The most common way to back up MySQL in Docker is running &lt;code&gt;mysqldump&lt;/code&gt; inside the container itself. You don't need to expose any ports or install MySQL tools on the host. Docker gives you everything you need with &lt;code&gt;docker exec&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ddfbvt0oohpz3xj0efw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ddfbvt0oohpz3xj0efw.png" alt="MySQL backup in Docker" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the basic command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec &lt;/span&gt;mysql-container mysqldump &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-u&lt;/span&gt; root &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="s1"&gt;'yourpassword'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--single-transaction&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--routines&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--triggers&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  mydatabase &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; backup_&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d_%H%M%S&lt;span class="si"&gt;)&lt;/span&gt;.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--single-transaction&lt;/code&gt; flag is critical for InnoDB tables. It takes a consistent snapshot without locking tables, so your application keeps running normally during the backup. The &lt;code&gt;--routines&lt;/code&gt; and &lt;code&gt;--triggers&lt;/code&gt; flags capture stored procedures and triggers that &lt;code&gt;mysqldump&lt;/code&gt; skips by default.&lt;/p&gt;

&lt;p&gt;To back up all databases at once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec &lt;/span&gt;mysql-container mysqldump &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-u&lt;/span&gt; root &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="s1"&gt;'yourpassword'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--single-transaction&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--all-databases&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; full_backup_&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d_%H%M%S&lt;span class="si"&gt;)&lt;/span&gt;.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restoring is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; mysql-container mysql &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-u&lt;/span&gt; root &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="s1"&gt;'yourpassword'&lt;/span&gt; mydatabase &amp;lt; backup_20260403_040000.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well for development and small databases where you're running backups by hand. It's simple, needs no extra setup and gives you a portable SQL file. But it's entirely manual: there's no scheduling, no compression and no remote storage.&lt;/p&gt;
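&lt;p&gt;Since nothing checks the dump for you, it's worth a quick sanity check before trusting a manual backup. One cheap signal: &lt;code&gt;mysqldump&lt;/code&gt; appends a &lt;code&gt;-- Dump completed&lt;/code&gt; trailer line when it finishes cleanly, so a truncated dump is easy to spot. A minimal sketch (the &lt;code&gt;verify_dump&lt;/code&gt; helper name is mine, not a standard tool):&lt;/p&gt;

```shell
# Sanity-check a mysqldump output file: non-empty and ending with the
# "-- Dump completed" trailer that mysqldump writes on a clean exit.
verify_dump() {
  [ -s "$1" ] || { echo "empty or missing: $1"; return 1; }
  tail -n 1 "$1" | grep -q -- '-- Dump completed' || { echo "no completion trailer, dump may be truncated: $1"; return 1; }
  echo "looks complete: $1"
}

# Demo on a stand-in dump file (use your real backup_*.sql instead)
printf 'CREATE TABLE t (id INT);\n-- Dump completed on 2026-04-03\n' > demo.sql
verify_dump demo.sql
```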

&lt;h2&gt;
  
  
  2. mysqldump from the host machine
&lt;/h2&gt;

&lt;p&gt;If your MySQL container exposes its port to the host, you can run &lt;code&gt;mysqldump&lt;/code&gt; from the host machine instead of going through &lt;code&gt;docker exec&lt;/code&gt;. This requires a MySQL client installed locally and a port mapping in your container configuration. It's essentially the same dump operation, just initiated from outside the container.&lt;/p&gt;

&lt;p&gt;Your Docker Compose file needs to map the port:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;mysql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql:8&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3306:3306"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;MYSQL_ROOT_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yourpassword&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mysql-data:/var/lib/mysql&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run &lt;code&gt;mysqldump&lt;/code&gt; from the host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mysqldump &lt;span class="nt"&gt;-h&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;-P&lt;/span&gt; 3306 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-u&lt;/span&gt; root &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="s1"&gt;'yourpassword'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--single-transaction&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  mydatabase &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; backup.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach is useful when the host has a different &lt;code&gt;mysqldump&lt;/code&gt; version than the container. Some &lt;code&gt;mysqldump&lt;/code&gt; flags and behaviors change between MySQL versions, and using the host binary lets you control exactly which version runs. It also integrates more naturally with existing backup scripts that already run on the host.&lt;/p&gt;

&lt;p&gt;The tradeoff is port exposure. In development, that's not a concern. In production, make sure port 3306 is bound to localhost only or sits behind a firewall.&lt;/p&gt;
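&lt;p&gt;For example, binding the published port to the loopback interface keeps it reachable from backup scripts on the host but not from other machines:&lt;/p&gt;

```yaml
services:
  mysql:
    ports:
      # Only processes on this host can reach 3306; remote hosts cannot.
      - "127.0.0.1:3306:3306"
```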

&lt;h2&gt;
  
  
  3. Backing up Docker volumes directly
&lt;/h2&gt;

&lt;p&gt;Instead of dumping SQL, you can copy the raw MySQL data files from the Docker volume. This is a file-level (physical) backup. For large databases it can be faster than &lt;code&gt;mysqldump&lt;/code&gt; because you're copying binary files instead of serializing rows into SQL text.&lt;/p&gt;

&lt;p&gt;The critical requirement is that MySQL must be stopped for a consistent copy. Running a file-level backup against a live MySQL instance will almost certainly produce corrupted files.&lt;/p&gt;

&lt;p&gt;Stop the container, copy the volume, then start it again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker stop mysql-container

docker volume inspect mysql-data &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'{{ .Mountpoint }}'&lt;/span&gt;

&lt;span class="nb"&gt;sudo cp&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; /var/lib/docker/volumes/mysql-data/_data ./mysql-volume-backup

docker start mysql-container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're using bind mounts instead of named volumes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker stop mysql-container
&lt;span class="nb"&gt;tar &lt;/span&gt;czf mysql-backup-&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d&lt;span class="si"&gt;)&lt;/span&gt;.tar.gz ./mysql-data/
docker start mysql-container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This copies everything — all databases, user accounts, binary logs and server configuration. Restore means copying files back to the volume and starting the container. It's fast and complete. But the required downtime, even if brief, makes it impractical for production systems that can't afford interruptions.&lt;/p&gt;
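&lt;p&gt;If you'd rather not touch &lt;code&gt;/var/lib/docker&lt;/code&gt; directly (the path varies by setup, and rootless Docker keeps volumes elsewhere), a common alternative is archiving the named volume through a throwaway container. A sketch of that pattern, reusing the container and volume names from the examples above:&lt;/p&gt;

```shell
docker stop mysql-container

# Mount the named volume read-only into a disposable Alpine container
# and tar its contents to the current directory on the host.
docker run --rm \
  -v mysql-data:/volume:ro \
  -v "$(pwd)":/backup \
  alpine tar czf "/backup/mysql-volume-$(date +%Y%m%d).tar.gz" -C /volume .

docker start mysql-container
```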

&lt;h2&gt;
  
  
  4. Cron-based automated mysqldump
&lt;/h2&gt;

&lt;p&gt;The three strategies above are all manual. Someone has to remember to run the command. For production, you need backups running automatically on a schedule without human intervention.&lt;/p&gt;

&lt;p&gt;The classic approach is wrapping &lt;code&gt;mysqldump&lt;/code&gt; in a shell script and scheduling it with cron. Here's a script that handles compression, timestamps and basic retention:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;BACKUP_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/opt/backups/mysql"&lt;/span&gt;
&lt;span class="nv"&gt;CONTAINER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mysql-container"&lt;/span&gt;
&lt;span class="nv"&gt;DB_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"root"&lt;/span&gt;
&lt;span class="nv"&gt;DB_PASS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"yourpassword"&lt;/span&gt;
&lt;span class="nv"&gt;DATABASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mydatabase"&lt;/span&gt;
&lt;span class="nv"&gt;RETENTION_DAYS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;7

&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nv"&gt;FILENAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_DIR&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DATABASE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d_%H%M%S&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;.sql.gz"&lt;/span&gt;

docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CONTAINER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; mysqldump &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DB_USER&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DB_PASS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--single-transaction&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--routines&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--triggers&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DATABASE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;gzip&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILENAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt; &lt;span class="nt"&gt;-eq&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Backup completed: &lt;/span&gt;&lt;span class="nv"&gt;$FILENAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;else
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Backup failed!"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
  &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi

&lt;/span&gt;find &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.sql.gz"&lt;/span&gt; &lt;span class="nt"&gt;-mtime&lt;/span&gt; +&lt;span class="nv"&gt;$RETENTION_DAYS&lt;/span&gt; &lt;span class="nt"&gt;-delete&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Schedule it with cron to run daily at 4 AM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;0 4 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; /opt/scripts/mysql-backup.sh &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /var/log/mysql-backup.log 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gets the job done for a single server with a single database. But it has real limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No alerting when backups fail silently — you won't know unless you check logs&lt;/li&gt;
&lt;li&gt;No built-in remote storage — backups live and die with the server&lt;/li&gt;
&lt;li&gt;Managing multiple databases means duplicating and maintaining separate scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a small side project, this might be enough. For anything you'd lose sleep over, the gaps start to matter.&lt;/p&gt;
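&lt;p&gt;One way to build confidence in the retention step before cron runs it unattended: exercise the same &lt;code&gt;find&lt;/code&gt; expression against throwaway files, with &lt;code&gt;-print&lt;/code&gt; in place of &lt;code&gt;-delete&lt;/code&gt; first. A small self-contained demo (paths are examples):&lt;/p&gt;

```shell
# Simulate a backup dir with one stale and one fresh archive.
BACKUP_DIR=$(mktemp -d)
RETENTION_DAYS=7
touch -d "10 days ago" "$BACKUP_DIR/old.sql.gz"   # past retention
touch "$BACKUP_DIR/fresh.sql.gz"                  # within retention

# Dry run: -print shows what -delete would remove.
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +"$RETENTION_DAYS" -print

# The real cleanup, same expression with -delete.
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +"$RETENTION_DAYS" -delete
```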

&lt;h2&gt;
  
  
  5. Automated backup with Databasus
&lt;/h2&gt;

&lt;p&gt;Databasus is a dedicated &lt;a href="https://databasus.com/mysql-backup" rel="noopener noreferrer"&gt;MySQL backup&lt;/a&gt; tool built for exactly this job. It handles scheduling, compression, remote storage, encryption and monitoring through a web interface. No shell scripts to maintain, no cron jobs to debug.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install Databasus
&lt;/h3&gt;

&lt;p&gt;With Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; databasus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 4005:4005 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ./databasus-data:/databasus-data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--restart&lt;/span&gt; unless-stopped &lt;span class="se"&gt;\&lt;/span&gt;
  databasus/databasus:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with Docker Compose. Create a &lt;code&gt;docker-compose.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;databasus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;databasus&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;databasus/databasus:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4005:4005"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./databasus-data:/databasus-data&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create your first backup
&lt;/h3&gt;

&lt;p&gt;Open &lt;code&gt;http://localhost:4005&lt;/code&gt; in your browser and follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add your database.&lt;/strong&gt; Click "New Database" and enter your MySQL connection details — host, port, username and password. Databasus validates the connection before saving.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select storage.&lt;/strong&gt; Choose where backups should go. Databasus supports local disk, S3, Cloudflare R2, Google Drive, SFTP and other targets through Rclone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select schedule.&lt;/strong&gt; Pick a backup frequency — hourly, daily, weekly, monthly or a custom cron expression. Set the exact time you want backups to run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Click "Create backup."&lt;/strong&gt; Databasus validates your configuration and starts running backups on the schedule you defined. You'll get notifications through Slack, Telegram, email or Discord if something goes wrong.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Databasus also supports retention policies including time-based, count-based and GFS (Grandfather-Father-Son) for layered long-term history. Backup files are encrypted with AES-256-GCM. For teams and enterprise users, there are workspaces with role-based access control and audit logging to track who did what across your backup infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing the 5 strategies
&lt;/h2&gt;

&lt;p&gt;Each strategy fits a different situation. Here's how they stack up across the features that matter most when your data is on the line:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Setup effort&lt;/th&gt;
&lt;th&gt;Automated&lt;/th&gt;
&lt;th&gt;Compression&lt;/th&gt;
&lt;th&gt;Remote storage&lt;/th&gt;
&lt;th&gt;Monitoring&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;mysqldump via docker exec&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mysqldump from host&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker volume backup&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cron + mysqldump script&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Script-based&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Databasus&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The first three strategies are good for manual, one-off backups during development or emergencies. Strategy 4 adds scheduling but leaves you responsible for everything else. Strategy 5 covers the full picture without custom scripting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes when backing up MySQL in Docker
&lt;/h2&gt;

&lt;p&gt;Even with a solid strategy in place, there are recurring mistakes that catch people off guard. These aren't edge cases. They show up in production incidents regularly and they're all preventable.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Skipping &lt;code&gt;--single-transaction&lt;/code&gt;.&lt;/strong&gt; Without it, &lt;code&gt;mysqldump&lt;/code&gt; acquires table-level locks during the dump. Your application stalls while the backup runs. For InnoDB tables this flag gives you a consistent snapshot without blocking writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never testing restores.&lt;/strong&gt; A backup you've never restored is a backup you can't trust. Schedule periodic test restores on a throwaway environment. It takes 10 minutes and can save you hours during a real incident.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keeping backups only on the database server.&lt;/strong&gt; If the server goes down, backups go with it. Always store at least one copy on remote storage — S3, a second VPS, anything off the same machine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Running file-level copies on a live MySQL instance.&lt;/strong&gt; Copying data files while MySQL is running almost always produces corrupted backups. Stop the container first or use a dump-based approach instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storing database credentials in plain text.&lt;/strong&gt; Backup scripts often contain passwords in the clear. Use environment variables, Docker secrets or a credentials file with restricted permissions instead.&lt;/li&gt;
&lt;/ul&gt;
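&lt;p&gt;On that last point, a lightweight pattern is keeping the password in a file only the backup user can read, and having the script refuse to run otherwise. A sketch; the file name is an example, and &lt;code&gt;stat -c&lt;/code&gt; is the GNU coreutils form:&lt;/p&gt;

```shell
# Create the credentials file with owner-only permissions (umask 077 makes it 600).
CRED_FILE="./mysql-backup.pass"
umask 077
printf '%s\n' 'yourpassword' > "$CRED_FILE"

# Refuse to use a credentials file that other users can read.
PERMS=$(stat -c '%a' "$CRED_FILE")
if [ "$PERMS" != "600" ]; then
  echo "credentials file $CRED_FILE has mode $PERMS, expected 600"
  exit 1
fi

DB_PASS=$(cat "$CRED_FILE")
echo "credentials loaded"
```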

&lt;h2&gt;
  
  
  Which strategy should you pick?
&lt;/h2&gt;

&lt;p&gt;The right approach depends on what you're protecting and how much maintenance you're willing to take on. Here's a rough guide:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Recommended strategy&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Local development&lt;/td&gt;
&lt;td&gt;mysqldump via docker exec&lt;/td&gt;
&lt;td&gt;Quick, no setup overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Staging environment&lt;/td&gt;
&lt;td&gt;Cron + mysqldump&lt;/td&gt;
&lt;td&gt;Basic automation, acceptable risk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small production database&lt;/td&gt;
&lt;td&gt;Databasus&lt;/td&gt;
&lt;td&gt;Monitoring and remote storage matter once data matters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large production database&lt;/td&gt;
&lt;td&gt;Databasus&lt;/td&gt;
&lt;td&gt;Built-in compression and storage integration at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team or enterprise&lt;/td&gt;
&lt;td&gt;Databasus&lt;/td&gt;
&lt;td&gt;Access management, audit logs and role-based permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For anything you'd actually need to recover from, automate your backups and store them somewhere other than the database server. That's the principle that matters most, regardless of which specific tool you choose.&lt;/p&gt;

</description>
      <category>database</category>
      <category>mysql</category>
    </item>
    <item>
      <title>7 PostgreSQL extensions that will supercharge your database in 2026</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Thu, 02 Apr 2026 05:58:50 +0000</pubDate>
      <link>https://forem.com/finny_collins/7-postgresql-extensions-that-will-supercharge-your-database-in-2026-1ab6</link>
      <guid>https://forem.com/finny_collins/7-postgresql-extensions-that-will-supercharge-your-database-in-2026-1ab6</guid>
      <description>&lt;p&gt;PostgreSQL ships with a solid set of features out of the box. But where it really pulls ahead of other databases is extensibility. You can bolt on entirely new data types, index methods, background workers and query planners without switching to a different database engine. The extension ecosystem has grown a lot over the past few years, and some of the options available today are genuinely impressive.&lt;/p&gt;

&lt;p&gt;Here are seven extensions worth knowing about in 2026 — whether you're running a side project or managing production infrastructure at scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7v3tpy1q9na2fi4tagt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy7v3tpy1q9na2fi4tagt.png" alt="PostgreSQL extensions" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  pgvector — vector similarity search
&lt;/h2&gt;

&lt;p&gt;If you've done any work with embeddings, recommendations or semantic search, you've probably run into the question of where to store and query vectors. A lot of teams reach for a dedicated vector database. But if your data already lives in PostgreSQL, adding a separate system creates sync headaches and operational overhead that you probably don't need.&lt;/p&gt;

&lt;p&gt;pgvector adds native vector column types and similarity search operators directly to PostgreSQL. You store embeddings alongside your relational data and query them with standard SQL. No extra infrastructure, no data synchronization pipelines.&lt;/p&gt;

&lt;p&gt;The extension supports multiple distance functions and index types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;L2 (Euclidean) distance for spatial and numeric similarity&lt;/li&gt;
&lt;li&gt;Cosine distance for text embeddings and NLP&lt;/li&gt;
&lt;li&gt;Inner product for recommendation systems&lt;/li&gt;
&lt;li&gt;HNSW and IVFFlat indexes for fast approximate nearest neighbor search&lt;/li&gt;
&lt;li&gt;Exact nearest neighbor search when precision matters more than speed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The typical workflow looks like this. You create a table with a &lt;code&gt;vector&lt;/code&gt; column, insert your embeddings from whatever model you use, then query with &lt;code&gt;ORDER BY embedding &amp;lt;=&amp;gt; query_vector LIMIT 10&lt;/code&gt;. It feels like regular SQL because it is regular SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.1, 0.2, ...]'&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Performance is solid for most workloads. For millions of vectors, HNSW indexes handle queries in single-digit milliseconds. It won't replace a dedicated vector database for billion-scale datasets, but the vast majority of applications don't operate at that scale anyway.&lt;/p&gt;

&lt;p&gt;pgvector has become the go-to choice for teams that want vector search without the operational cost of running a separate system.&lt;/p&gt;

&lt;h2&gt;
  
  
  TimescaleDB — time-series data at scale
&lt;/h2&gt;

&lt;p&gt;Time-series data shows up everywhere. Server metrics, IoT sensor readings, financial ticks, application events. The volume tends to grow fast, and the query patterns are different from typical OLTP workloads — you're usually aggregating over time windows, downsampling or running continuous computations.&lt;/p&gt;

&lt;p&gt;TimescaleDB extends PostgreSQL with hypertables that automatically partition data by time. You interact with them through normal SQL, but under the hood the extension handles chunking, compression and retention policies. Inserts stay fast even as the table grows to billions of rows because each chunk is a manageable size.&lt;/p&gt;

&lt;p&gt;Compression is one of the standout features. TimescaleDB can compress time-series data by 90-95%, which makes a real difference when you're storing months or years of high-frequency data. Compressed chunks are still queryable — you don't have to decompress them first.&lt;/p&gt;

&lt;p&gt;Continuous aggregates let you precompute rollups (hourly averages, daily maximums) that refresh automatically as new data arrives. This saves you from writing and maintaining materialized view refresh logic yourself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt; &lt;span class="nb"&gt;DOUBLE&lt;/span&gt; &lt;span class="nb"&gt;PRECISION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;humidity&lt;/span&gt; &lt;span class="nb"&gt;DOUBLE&lt;/span&gt; &lt;span class="nb"&gt;PRECISION&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;create_hypertable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'metrics'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;by_range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'time'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;add_compression_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'metrics'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'7 days'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;add_retention_policy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'metrics'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'1 year'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
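&lt;p&gt;A continuous aggregate on the &lt;code&gt;metrics&lt;/code&gt; hypertable above might look like this — the view name, bucket width and refresh offsets are illustrative, not prescriptive:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- hourly rollup that refreshes incrementally as new rows arrive
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM metrics
GROUP BY bucket, device_id;

-- keep the rollup up to date on a schedule
SELECT add_continuous_aggregate_policy('metrics_hourly',
    start_offset =&amp;gt; INTERVAL '3 hours',
    end_offset =&amp;gt; INTERVAL '1 hour',
    schedule_interval =&amp;gt; INTERVAL '1 hour');
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;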



&lt;p&gt;If you're currently shoehorning time-series data into regular PostgreSQL tables and struggling with query performance or storage costs, TimescaleDB is probably the first thing to try.&lt;/p&gt;

&lt;h2&gt;
  
  
  PostGIS — geospatial data
&lt;/h2&gt;

&lt;p&gt;PostGIS has been around for over two decades, and it's still the most capable open-source geospatial database extension available. It turns PostgreSQL into a full-featured geographic information system with support for geometry, geography, raster data and topology.&lt;/p&gt;

&lt;p&gt;The practical applications are broad. Store and query locations, calculate distances, find points within polygons, route between coordinates, analyze spatial relationships. If your application deals with maps, addresses, delivery zones, geofencing or any kind of location data, PostGIS handles it.&lt;/p&gt;

&lt;p&gt;What sets PostGIS apart from simpler spatial solutions is the depth of its spatial function library. Over 300 functions cover everything from basic distance calculations to complex geometric operations, spatial joins and 3D analysis. It implements the OGC Simple Features standard and integrates with tools like QGIS, GeoServer and MapServer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;stores&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;GEOGRAPHY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4326&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;stores&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GIST&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;location&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- find stores within 5 km&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ST_Distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ST_MakePoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;73&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;73&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="n"&gt;geography&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance_m&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;stores&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ST_DWithin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;location&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ST_MakePoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;73&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;73&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="n"&gt;geography&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance_m&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PostGIS is mature, well-documented and widely used in production. Governments, logistics companies and mapping platforms rely on it daily. If you need spatial capabilities, there's really nothing else in the PostgreSQL ecosystem that comes close.&lt;/p&gt;

&lt;h2&gt;
  
  
  pg_cron — in-database job scheduling
&lt;/h2&gt;

&lt;p&gt;There's a common pattern where you need to run periodic database tasks — purging old records, refreshing materialized views, computing aggregates, vacuuming specific tables. The typical approach is to set up an external cron job or a separate scheduler service that connects to the database and runs the query.&lt;/p&gt;

&lt;p&gt;pg_cron lets you schedule these jobs directly inside PostgreSQL using familiar cron syntax. No external scheduler needed. Jobs run as background workers within the database server itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- refresh a materialized view every hour&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'refresh-dashboard'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0 * * * *'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'REFRESH MATERIALIZED VIEW CONCURRENTLY dashboard_stats'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- purge old log entries every night at 3 AM&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'purge-logs'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0 3 * * *'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'DELETE FROM application_logs WHERE created_at &amp;lt; now() - interval &lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;30 days&lt;/span&gt;&lt;span class="se"&gt;''&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- check scheduled jobs&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's simple and it works. You define a schedule, point it at a SQL statement, and pg_cron runs it. You can list jobs, check execution history and unschedule tasks with straightforward function calls.&lt;/p&gt;
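&lt;p&gt;A couple of those management calls, using the job names scheduled above (&lt;code&gt;cron.job_run_details&lt;/code&gt; is available in recent pg_cron versions):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- remove a job by name
SELECT cron.unschedule('purge-logs');

-- inspect recent executions and their outcomes
SELECT jobid, status, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;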

&lt;p&gt;One thing to keep in mind — pg_cron runs at most one instance of a given job at a time, so a long-running job delays its next scheduled run. For heavy ETL workloads you might still want an external orchestrator. But for routine maintenance tasks, it removes a piece of infrastructure that you'd otherwise have to manage separately.&lt;/p&gt;

&lt;p&gt;For teams that want their &lt;a href="https://databasus.com" rel="noopener noreferrer"&gt;PostgreSQL backup&lt;/a&gt; process managed with the same simplicity — scheduled, compressed and sent to remote storage automatically — Databasus handles logical, physical and incremental backups through a web interface, with no scripts to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  pg_stat_statements — query performance monitoring
&lt;/h2&gt;

&lt;p&gt;If you're running PostgreSQL in production and not using pg_stat_statements, you're flying blind. This extension tracks execution statistics for every SQL statement that runs against your database. It's bundled with PostgreSQL itself, so there's nothing extra to install — you just need to enable it.&lt;/p&gt;

&lt;p&gt;Once active, it records how many times each query ran, total and average execution time, rows returned, buffer hits versus disk reads, and more. This data is invaluable for identifying slow queries, spotting regressions after deployments and understanding your actual workload patterns.&lt;/p&gt;

&lt;p&gt;The key metrics it tracks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total and mean execution time per query&lt;/li&gt;
&lt;li&gt;Number of calls (how often each query runs)&lt;/li&gt;
&lt;li&gt;Rows returned per execution&lt;/li&gt;
&lt;li&gt;Shared buffer hits vs reads (cache effectiveness)&lt;/li&gt;
&lt;li&gt;WAL generation per query (write impact)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- top 10 slowest queries by total time&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_exec_time&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_time_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean_exec_time&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_time_ms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_statements&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total_exec_time&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A common workflow is to reset statistics after a deployment (&lt;code&gt;SELECT pg_stat_statements_reset()&lt;/code&gt;) and then check back after a few hours to see if any new query patterns emerged. It's also useful for capacity planning — if your top query's call count doubled last month, you know what's driving load growth.&lt;/p&gt;

&lt;p&gt;The extension normalizes queries by replacing literal values with placeholders, so &lt;code&gt;SELECT * FROM users WHERE id = 5&lt;/code&gt; and &lt;code&gt;SELECT * FROM users WHERE id = 42&lt;/code&gt; show up as a single entry. This gives you a clean view of query patterns rather than millions of individual executions.&lt;/p&gt;

&lt;p&gt;Enabling it requires adding &lt;code&gt;pg_stat_statements&lt;/code&gt; to &lt;code&gt;shared_preload_libraries&lt;/code&gt; in &lt;code&gt;postgresql.conf&lt;/code&gt; and restarting the server. Worth the 30-second setup.&lt;/p&gt;
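&lt;p&gt;Concretely, that's one line in &lt;code&gt;postgresql.conf&lt;/code&gt;, a restart, and one SQL statement (if other libraries are already preloaded, append to the existing comma-separated list rather than replacing it):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;shared_preload_libraries = 'pg_stat_statements'
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- after the restart, in the database you want to monitor
CREATE EXTENSION pg_stat_statements;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;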

&lt;h2&gt;
  
  
  pg_partman — automatic table partitioning
&lt;/h2&gt;

&lt;p&gt;PostgreSQL has had native declarative partitioning since version 10. But managing partitions by hand gets tedious fast. You need to create new partitions ahead of time, detach old ones when they age out and make sure there are always enough future partitions ready. Miss a partition creation, and inserts start failing.&lt;/p&gt;

&lt;p&gt;pg_partman automates all of this. You tell it how you want the table partitioned — by time range, by integer range or by a list of values — and it handles partition creation, maintenance and optional cleanup on a schedule.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;BIGSERIAL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;event_type&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="n"&gt;TIMESTAMPTZ&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;partman&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_parent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;p_parent_table&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'public.events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;p_control&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;p_interval&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'daily'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;p_premake&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates daily partitions and keeps 7 future partitions ready at all times. The background worker takes care of creating new partitions and optionally dropping old ones based on your retention settings.&lt;/p&gt;
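&lt;p&gt;Retention is configured per parent table in &lt;code&gt;partman.part_config&lt;/code&gt;. A sketch, assuming the &lt;code&gt;events&lt;/code&gt; table above and a 90-day retention window:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- drop partitions older than 90 days during maintenance runs
UPDATE partman.part_config
SET retention = '90 days',
    retention_keep_table = false
WHERE parent_table = 'public.events';

-- normally run by the background worker; can be invoked manually
SELECT partman.run_maintenance();
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;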

&lt;p&gt;The practical benefits show up at scale. Queries that filter by the partition key skip irrelevant partitions entirely — a query for yesterday's events doesn't touch last month's data. Maintenance operations like VACUUM and REINDEX run per-partition instead of locking the whole table. And dropping old data is instant because you're detaching and dropping a partition rather than deleting millions of rows.&lt;/p&gt;

&lt;p&gt;If you have tables with tens of millions of rows that grow over time, pg_partman is one of those extensions that pays for itself quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Citus — horizontal scaling
&lt;/h2&gt;

&lt;p&gt;At some point, a single PostgreSQL server hits its limits. The dataset outgrows available RAM, write throughput maxes out, or analytical queries on large tables take too long even with good indexes. Citus distributes your PostgreSQL database across multiple nodes while keeping the SQL interface you already know.&lt;/p&gt;

&lt;p&gt;The core idea is sharding. You pick a distribution column (usually a tenant ID or some natural partition key), and Citus spreads the data across worker nodes. Queries that include the distribution column get routed to the right shard. Aggregation queries run in parallel across all nodes and results get merged.&lt;/p&gt;

&lt;p&gt;There are a few scenarios where Citus makes sense over single-node PostgreSQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-tenant SaaS applications where each tenant's data is independent&lt;/li&gt;
&lt;li&gt;Real-time analytics dashboards that aggregate across large datasets&lt;/li&gt;
&lt;li&gt;High write throughput workloads that exceed single-node IOPS&lt;/li&gt;
&lt;li&gt;Large reference tables that need to be joined with distributed data
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- distribute a table by tenant&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;create_distributed_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'events'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'tenant_id'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- queries with tenant_id are routed to a single shard&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- aggregations run in parallel across all nodes&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="s1"&gt;'1 hour'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
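&lt;p&gt;Reference tables, mentioned in the list above, get their own call: the table is copied in full to every worker, so joins with distributed tables stay local. The &lt;code&gt;plans&lt;/code&gt; table and &lt;code&gt;plan_id&lt;/code&gt; column here are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- replicate a small lookup table to every worker node
SELECT create_reference_table('plans');

-- joins between distributed and reference tables run locally on each worker
SELECT e.tenant_id, p.name, count(*)
FROM events e JOIN plans p ON p.id = e.plan_id
GROUP BY e.tenant_id, p.name;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;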



&lt;p&gt;Citus isn't the right choice for every workload. Cross-shard joins on non-distribution columns can be expensive. Schema changes require coordination across nodes. And the operational complexity increases compared to a single server. But when your data genuinely outgrows one machine, it lets you scale horizontally without rewriting your application for a different database.&lt;/p&gt;

&lt;h2&gt;
  
  
  How these extensions compare
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Extension&lt;/th&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Primary use case&lt;/th&gt;
&lt;th&gt;Installation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;pgvector&lt;/td&gt;
&lt;td&gt;AI/ML&lt;/td&gt;
&lt;td&gt;Vector similarity search and embeddings&lt;/td&gt;
&lt;td&gt;CREATE EXTENSION&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TimescaleDB&lt;/td&gt;
&lt;td&gt;Time-series&lt;/td&gt;
&lt;td&gt;High-volume time-stamped data&lt;/td&gt;
&lt;td&gt;Separate package&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PostGIS&lt;/td&gt;
&lt;td&gt;Geospatial&lt;/td&gt;
&lt;td&gt;Location data and spatial queries&lt;/td&gt;
&lt;td&gt;CREATE EXTENSION&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pg_cron&lt;/td&gt;
&lt;td&gt;Scheduling&lt;/td&gt;
&lt;td&gt;In-database job scheduling&lt;/td&gt;
&lt;td&gt;shared_preload_libraries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pg_stat_statements&lt;/td&gt;
&lt;td&gt;Monitoring&lt;/td&gt;
&lt;td&gt;Query performance tracking&lt;/td&gt;
&lt;td&gt;shared_preload_libraries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pg_partman&lt;/td&gt;
&lt;td&gt;Partitioning&lt;/td&gt;
&lt;td&gt;Automatic table partition management&lt;/td&gt;
&lt;td&gt;CREATE EXTENSION&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Citus&lt;/td&gt;
&lt;td&gt;Scaling&lt;/td&gt;
&lt;td&gt;Horizontal sharding across nodes&lt;/td&gt;
&lt;td&gt;Separate package&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Installation method&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;th&gt;Restart needed&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CREATE EXTENSION&lt;/td&gt;
&lt;td&gt;Installed via SQL, loads on demand&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;pgvector, PostGIS, pg_partman&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;shared_preload_libraries&lt;/td&gt;
&lt;td&gt;Must be added to postgresql.conf&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;pg_stat_statements, pg_cron&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Separate package&lt;/td&gt;
&lt;td&gt;Requires its own package or Docker image&lt;/td&gt;
&lt;td&gt;Depends on setup&lt;/td&gt;
&lt;td&gt;TimescaleDB, Citus&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Picking the right extensions for your stack
&lt;/h2&gt;

&lt;p&gt;Not every project needs all seven of these. A small web application might only benefit from pg_stat_statements and maybe pg_cron. A data-heavy SaaS product might need TimescaleDB or Citus. The right set depends on your actual problems, not on what sounds impressive.&lt;/p&gt;

&lt;p&gt;Start with the ones that address a pain point you already have. If your queries are slow and you don't know why, enable pg_stat_statements first. If you're building anything with location data, PostGIS is a no-brainer. If your AI features currently call out to a separate vector store, try pgvector and see if you can simplify.&lt;/p&gt;

&lt;p&gt;The nice thing about PostgreSQL extensions is that most of them play well together. You can run pgvector and TimescaleDB and PostGIS in the same database. They operate on different data types and don't step on each other's toes.&lt;/p&gt;

&lt;p&gt;Whatever extensions you end up using, make sure your monitoring and backup strategy keeps up with the added complexity. Extensions that add new data types or storage engines sometimes need specific handling during backup and restore. Getting that right early saves you from unpleasant surprises later.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
    </item>
    <item>
      <title>PostgreSQL logical replication — 5 steps to set up real-time data sync across servers</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Wed, 01 Apr 2026 10:43:57 +0000</pubDate>
      <link>https://forem.com/finny_collins/postgresql-logical-replication-5-steps-to-set-up-real-time-data-sync-across-servers-29l3</link>
      <guid>https://forem.com/finny_collins/postgresql-logical-replication-5-steps-to-set-up-real-time-data-sync-across-servers-29l3</guid>
      <description>&lt;p&gt;PostgreSQL has had logical replication built in since version 10, and it remains one of the most practical ways to keep data in sync between two or more servers. Unlike physical replication, which copies the entire database cluster byte-for-byte, logical replication works at the row level. You pick which tables to replicate, and PostgreSQL streams the changes in real time.&lt;/p&gt;

&lt;p&gt;This article walks through the full setup in five steps. By the end you'll have a working publisher-subscriber pair and know how to monitor it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxh2m7kqk21b2ux403ph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxh2m7kqk21b2ux403ph.png" alt="PostgreSQL logical replication"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is logical replication and when do you need it
&lt;/h2&gt;

&lt;p&gt;Logical replication uses a publish-subscribe model. The source database (publisher) defines a publication — a set of tables whose changes should be broadcast. The target database (subscriber) creates a subscription that connects to the publisher, pulls the initial data snapshot and then receives a continuous stream of INSERT, UPDATE and DELETE operations.&lt;/p&gt;

&lt;p&gt;The key difference from physical (streaming) replication is granularity. Physical replication mirrors the entire cluster, operates at the WAL byte level and requires identical PostgreSQL major versions on both sides. Logical replication works per-table, decodes WAL into logical change events and allows different major versions or even different schemas on the subscriber.&lt;/p&gt;

&lt;p&gt;This makes logical replication useful in a range of scenarios that physical replication simply cannot handle.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Logical replication&lt;/th&gt;
&lt;th&gt;Physical replication&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Granularity&lt;/td&gt;
&lt;td&gt;Per-table&lt;/td&gt;
&lt;td&gt;Entire cluster&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-version support&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No (same major version required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subscriber writes&lt;/td&gt;
&lt;td&gt;Allowed&lt;/td&gt;
&lt;td&gt;Read-only standby&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema differences&lt;/td&gt;
&lt;td&gt;Allowed (column subset)&lt;/td&gt;
&lt;td&gt;Must be identical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DDL replication&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (WAL-level)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Typical use case&lt;/td&gt;
&lt;td&gt;Selective sync, migrations, consolidation&lt;/td&gt;
&lt;td&gt;High availability, failover&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Use logical replication when you need to replicate a subset of tables, consolidate data from multiple sources into one reporting database, migrate between PostgreSQL major versions with minimal downtime or feed changes into a data warehouse alongside normal writes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you start, make sure you have the following in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two PostgreSQL instances running version 10 or later (publisher and subscriber). They can be on different major versions&lt;/li&gt;
&lt;li&gt;Network connectivity between the two servers on the PostgreSQL port (default 5432)&lt;/li&gt;
&lt;li&gt;A superuser or a user with the &lt;code&gt;REPLICATION&lt;/code&gt; role on the publisher&lt;/li&gt;
&lt;li&gt;The tables you want to replicate must have a primary key or a &lt;code&gt;REPLICA IDENTITY&lt;/code&gt; set. Without one, UPDATE and DELETE operations on the published table will fail on the publisher&lt;/li&gt;
&lt;/ul&gt;
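&lt;p&gt;If a table genuinely has no primary key or unique index, you can fall back to a full replica identity (the table name below is illustrative). Be aware that this writes the entire old row to the WAL on every update and delete, so a primary key remains the better option:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- last resort for tables without a primary key or unique index
ALTER TABLE audit_log REPLICA IDENTITY FULL;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;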

&lt;p&gt;If both instances are on the same machine for testing, just use different ports. The rest of the setup is identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 — configure the publisher database
&lt;/h2&gt;

&lt;p&gt;Logical replication requires the WAL level to be set to &lt;code&gt;logical&lt;/code&gt;. By default PostgreSQL uses &lt;code&gt;replica&lt;/code&gt;, so you need to change this in &lt;code&gt;postgresql.conf&lt;/code&gt; on the publisher.&lt;/p&gt;

&lt;p&gt;Open the config file and set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;wal_level&lt;/span&gt; = &lt;span class="n"&gt;logical&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You also need at least one replication slot available. Check that &lt;code&gt;max_replication_slots&lt;/code&gt; is high enough. The default of 10 is fine for most setups, but if you already use slots for physical replication or other subscribers, increase it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;max_replication_slots&lt;/span&gt; = &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;max_wal_senders&lt;/span&gt; = &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, update &lt;code&gt;pg_hba.conf&lt;/code&gt; to allow the subscriber to connect with replication privileges. Add a line like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;host&lt;/span&gt;    &lt;span class="n"&gt;all&lt;/span&gt;    &lt;span class="n"&gt;replication_user&lt;/span&gt;    &lt;span class="m"&gt;192&lt;/span&gt;.&lt;span class="m"&gt;168&lt;/span&gt;.&lt;span class="m"&gt;1&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;/&lt;span class="m"&gt;24&lt;/span&gt;    &lt;span class="n"&gt;md5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace the IP range with your subscriber's actual address. After editing both files, restart PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can verify the WAL level is active by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;wal_level&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It should return &lt;code&gt;logical&lt;/code&gt;. If it still shows &lt;code&gt;replica&lt;/code&gt;, the restart didn't pick up the config change. Double-check the file path and try again.&lt;/p&gt;
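
&lt;p&gt;The &lt;code&gt;pg_hba.conf&lt;/code&gt; entry above assumes a &lt;code&gt;replication_user&lt;/code&gt; role already exists. If it doesn't, one way to create it on the publisher (role name and password here are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE ROLE replication_user WITH REPLICATION LOGIN PASSWORD 'secret';
GRANT SELECT ON users, orders TO replication_user;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;SELECT&lt;/code&gt; grant lets the role read the published tables during the initial data copy.&lt;/p&gt;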

&lt;h2&gt;
  
  
  Step 2 — create a publication
&lt;/h2&gt;

&lt;p&gt;On the publisher database, create a publication for the tables you want to replicate. Connect to the database as a superuser or the replication user and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;PUBLICATION&lt;/span&gt; &lt;span class="n"&gt;my_publication&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This publishes changes from the &lt;code&gt;users&lt;/code&gt; and &lt;code&gt;orders&lt;/code&gt; tables. If you want to publish all tables in the database, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;PUBLICATION&lt;/span&gt; &lt;span class="n"&gt;my_publication&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also control which operations are published. By default all of them (INSERT, UPDATE, DELETE, TRUNCATE) are included. To limit it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;PUBLICATION&lt;/span&gt; &lt;span class="n"&gt;my_publication&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;
    &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;publish&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'insert,update'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To check what publications exist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_publication&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And to see which tables are in a publication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_publication_tables&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Publications are lightweight metadata objects. Creating one doesn't start any replication by itself. That happens when a subscriber connects.&lt;/p&gt;
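
&lt;p&gt;Publications can also be changed after creation. For example, to add a table (using a hypothetical &lt;code&gt;products&lt;/code&gt; table) or remove one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;ALTER PUBLICATION my_publication ADD TABLE products;
ALTER PUBLICATION my_publication DROP TABLE orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Existing subscribers don't pick up membership changes automatically. Run &lt;code&gt;ALTER SUBSCRIPTION my_subscription REFRESH PUBLICATION;&lt;/code&gt; on each subscriber afterwards.&lt;/p&gt;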

&lt;h2&gt;
  
  
  Step 3 — prepare the subscriber database
&lt;/h2&gt;

&lt;p&gt;The subscriber needs to have the same table structure as the publisher. Logical replication does not copy DDL, so you must create the tables manually before starting.&lt;/p&gt;

&lt;p&gt;The easiest way is to dump the schema from the publisher and restore it on the subscriber:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pg_dump &lt;span class="nt"&gt;-h&lt;/span&gt; publisher_host &lt;span class="nt"&gt;-U&lt;/span&gt; postgres &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; &lt;span class="nb"&gt;users&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; orders mydb &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; schema.sql
psql &lt;span class="nt"&gt;-h&lt;/span&gt; subscriber_host &lt;span class="nt"&gt;-U&lt;/span&gt; postgres &lt;span class="nt"&gt;-d&lt;/span&gt; mydb &lt;span class="nt"&gt;-f&lt;/span&gt; schema.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-s&lt;/code&gt; flag dumps schema only, no data. The initial data will come through the replication snapshot when the subscription is created.&lt;/p&gt;

&lt;p&gt;Make sure the target tables are empty. If they already contain data, you'll get duplicate key errors during the initial sync. Either truncate them or create the subscription with &lt;code&gt;copy_data = false&lt;/code&gt; (more on that in the next step).&lt;/p&gt;

&lt;p&gt;The subscriber user needs permissions to write to these tables. If you're using the same superuser, that's already covered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4 — create a subscription
&lt;/h2&gt;

&lt;p&gt;On the subscriber database, create a subscription that points to the publisher:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;SUBSCRIPTION&lt;/span&gt; &lt;span class="n"&gt;my_subscription&lt;/span&gt;
    &lt;span class="k"&gt;CONNECTION&lt;/span&gt; &lt;span class="s1"&gt;'host=publisher_host port=5432 dbname=mydb user=replication_user password=secret'&lt;/span&gt;
    &lt;span class="n"&gt;PUBLICATION&lt;/span&gt; &lt;span class="n"&gt;my_publication&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As soon as you run this, PostgreSQL will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Connect to the publisher&lt;/li&gt;
&lt;li&gt;Create a replication slot on the publisher&lt;/li&gt;
&lt;li&gt;Copy the initial table data (snapshot)&lt;/li&gt;
&lt;li&gt;Start streaming live changes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the tables on the subscriber already have data and you only want to start streaming from now, skip the initial copy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;SUBSCRIPTION&lt;/span&gt; &lt;span class="n"&gt;my_subscription&lt;/span&gt;
    &lt;span class="k"&gt;CONNECTION&lt;/span&gt; &lt;span class="s1"&gt;'host=publisher_host port=5432 dbname=mydb user=replication_user password=secret'&lt;/span&gt;
    &lt;span class="n"&gt;PUBLICATION&lt;/span&gt; &lt;span class="n"&gt;my_publication&lt;/span&gt;
    &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;copy_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To check the subscription status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_subscription&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a row with a non-null &lt;code&gt;pid&lt;/code&gt;, which means the subscription worker is running and connected.&lt;/p&gt;
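
&lt;p&gt;It's also worth checking the &lt;code&gt;pg_subscription&lt;/code&gt; catalog, since a subscription can exist but be disabled, in which case no worker runs at all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT subname, subenabled FROM pg_subscription;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;subenabled&lt;/code&gt; is false, start it with &lt;code&gt;ALTER SUBSCRIPTION my_subscription ENABLE;&lt;/code&gt;.&lt;/p&gt;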

&lt;h2&gt;
  
  
  Step 5 — verify and monitor replication
&lt;/h2&gt;

&lt;p&gt;After creating the subscription, verify that data is actually flowing. Insert a row on the publisher and check if it appears on the subscriber:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- On publisher&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'test_user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'test@example.com'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- On subscriber&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'test_user'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the row shows up, replication is working. For ongoing monitoring, these views are your main tools.&lt;/p&gt;

&lt;p&gt;On the &lt;strong&gt;publisher&lt;/strong&gt;, check replication slot status and lag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;slot_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;restart_lsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;confirmed_flush_lsn&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_replication_slots&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;client_addr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sent_lsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_lsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush_lsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replay_lsn&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_replication&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the &lt;strong&gt;subscriber&lt;/strong&gt;, check subscription state and table sync progress:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;subname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;received_lsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latest_end_lsn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;latest_end_time&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_subscription&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;srsubid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;srrelid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;srsublsn&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_subscription_rel&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;pg_stat_replication&lt;/code&gt; shows a growing gap between &lt;code&gt;sent_lsn&lt;/code&gt; and &lt;code&gt;replay_lsn&lt;/code&gt;, the subscriber is falling behind. This usually means the subscriber is under heavy load or the network is slow. Check the subscriber's PostgreSQL logs for errors.&lt;/p&gt;
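
&lt;p&gt;To put a number on that gap, compute the lag in bytes with &lt;code&gt;pg_wal_lsn_diff&lt;/code&gt; on the publisher:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT client_addr,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A small, fluctuating value is normal. A value that keeps growing across repeated checks means the subscriber genuinely isn't keeping up.&lt;/p&gt;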

&lt;h2&gt;
  
  
  Common issues and how to fix them
&lt;/h2&gt;

&lt;p&gt;Most problems with logical replication happen during setup or when the publisher schema changes. Here are the ones you'll run into most often.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ERROR: logical decoding requires wal_level &amp;gt;= logical&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;wal_level&lt;/code&gt; not set or restart not done&lt;/td&gt;
&lt;td&gt;Set &lt;code&gt;wal_level = logical&lt;/code&gt; in &lt;code&gt;postgresql.conf&lt;/code&gt; and restart PostgreSQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Initial sync stuck or very slow&lt;/td&gt;
&lt;td&gt;Large tables being copied&lt;/td&gt;
&lt;td&gt;Monitor &lt;code&gt;pg_stat_subscription&lt;/code&gt;. Consider increasing &lt;code&gt;max_sync_workers_per_subscription&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ERROR: could not create replication slot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;All slots in use&lt;/td&gt;
&lt;td&gt;Increase &lt;code&gt;max_replication_slots&lt;/code&gt; and restart&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;UPDATE&lt;/code&gt; or &lt;code&gt;DELETE&lt;/code&gt; fails on subscriber&lt;/td&gt;
&lt;td&gt;Table has no primary key&lt;/td&gt;
&lt;td&gt;Add a primary key or set &lt;code&gt;REPLICA IDENTITY FULL&lt;/code&gt; on the publisher table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subscriber stops receiving changes&lt;/td&gt;
&lt;td&gt;Network issue or publisher restart&lt;/td&gt;
&lt;td&gt;Check &lt;code&gt;pg_stat_subscription&lt;/code&gt; for errors. The subscription auto-reconnects in most cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duplicate key errors during initial sync&lt;/td&gt;
&lt;td&gt;Target table already has data&lt;/td&gt;
&lt;td&gt;Truncate the table or use &lt;code&gt;copy_data = false&lt;/code&gt; when creating subscription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema mismatch after &lt;code&gt;ALTER TABLE&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;DDL changes are not replicated&lt;/td&gt;
&lt;td&gt;Apply the same DDL on the subscriber manually, ideally before it runs on the publisher&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;When something breaks, the subscriber logs are the first place to look. PostgreSQL is usually specific about what went wrong.&lt;/p&gt;
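
&lt;p&gt;For the missing-primary-key case in the table, the fix looks like this on the publisher (with &lt;code&gt;events&lt;/code&gt; as a stand-in table name):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;ALTER TABLE events REPLICA IDENTITY FULL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that &lt;code&gt;REPLICA IDENTITY FULL&lt;/code&gt; logs the entire old row for every &lt;code&gt;UPDATE&lt;/code&gt; and &lt;code&gt;DELETE&lt;/code&gt;, which inflates WAL volume. Adding a real primary key is the better long-term fix.&lt;/p&gt;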

&lt;h2&gt;
  
  
  Logical replication limitations
&lt;/h2&gt;

&lt;p&gt;Logical replication covers a lot, but it has clear boundaries worth knowing before you commit to it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DDL changes (CREATE TABLE, ALTER TABLE, DROP) are not replicated. You need to apply schema changes on both sides manually or use a migration tool&lt;/li&gt;
&lt;li&gt;Sequences are not synced. If you fail over to the subscriber, sequence values will be out of date. You'll need to reset them manually&lt;/li&gt;
&lt;li&gt;Large objects (the &lt;code&gt;lo&lt;/code&gt; type) are not supported&lt;/li&gt;
&lt;li&gt;TRUNCATE replication was added in PostgreSQL 11. Earlier versions don't replicate it&lt;/li&gt;
&lt;li&gt;There's no built-in conflict resolution. If the same row is modified on both publisher and subscriber, you get an error and replication stops until you fix it&lt;/li&gt;
&lt;/ul&gt;
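
&lt;p&gt;The sequence limitation matters most during failover. After promoting a subscriber, each sequence has to be bumped past the highest replicated value. A sketch for a hypothetical &lt;code&gt;users_id_seq&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SELECT setval('users_id_seq', (SELECT COALESCE(MAX(id), 1) FROM users));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
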

&lt;p&gt;These limitations are manageable for most use cases. But if you need full cluster replication with automatic failover, physical replication or a tool like Patroni is a better fit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping replicated data safe
&lt;/h2&gt;

&lt;p&gt;Replication is not a backup. If someone runs a bad &lt;code&gt;DELETE&lt;/code&gt; on the publisher, that delete gets replicated too. You need actual backups alongside replication. Databasus is the most widely used tool for &lt;a href="https://databasus.com" rel="noopener noreferrer"&gt;PostgreSQL backup&lt;/a&gt; and the industry standard for managing scheduled backups. It supports logical, physical and incremental backups with point-in-time recovery, handles multiple storage destinations and takes a few minutes to set up through its web UI.&lt;/p&gt;

&lt;p&gt;Setting up logical replication takes about 15 minutes once you've done it a couple of times. The five steps above cover the core flow. From there, you can add more tables to the publication, create additional subscribers or combine logical replication with physical standby servers for both selective sync and high availability.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
    </item>
    <item>
      <title>MongoDB aggregation pipeline — 8 stages you need to master</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Tue, 31 Mar 2026 07:27:21 +0000</pubDate>
      <link>https://forem.com/finny_collins/mongodb-aggregation-pipeline-8-stages-you-need-to-master-1l2m</link>
      <guid>https://forem.com/finny_collins/mongodb-aggregation-pipeline-8-stages-you-need-to-master-1l2m</guid>
      <description>&lt;p&gt;The aggregation pipeline is one of the most powerful features in MongoDB. It lets you transform, filter and analyze documents step by step — each stage takes the output of the previous one and passes the result forward. Think of it like a Unix pipe for your data.&lt;/p&gt;

&lt;p&gt;If you've been relying on &lt;code&gt;find()&lt;/code&gt; with simple queries, there's a good chance you're doing too much work in application code. The aggregation pipeline can handle most of that for you, and it does it closer to the data, which usually means faster.&lt;/p&gt;

&lt;p&gt;This article walks through 8 stages that cover the vast majority of real-world use cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foemx0lt7sdxqcjum5uqs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foemx0lt7sdxqcjum5uqs.png" alt="MongoDB aggregation" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How the pipeline works
&lt;/h2&gt;

&lt;p&gt;Before jumping into stages, it helps to understand the basic mechanics. An aggregation pipeline is an array of stage objects. MongoDB processes documents through each stage sequentially. The output of one stage becomes the input for the next.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$group&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$customerId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$amount&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each stage narrows, reshapes or enriches the data. The order matters — putting &lt;code&gt;$match&lt;/code&gt; early reduces the number of documents later stages have to process.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. $match — filter documents early
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;$match&lt;/code&gt; filters documents, much like a &lt;code&gt;find()&lt;/code&gt; query. It accepts standard query operators — &lt;code&gt;$gt&lt;/code&gt;, &lt;code&gt;$in&lt;/code&gt;, &lt;code&gt;$regex&lt;/code&gt; and everything else you'd use in a regular query.&lt;/p&gt;

&lt;p&gt;The most important thing about &lt;code&gt;$match&lt;/code&gt; is placement. Always put it as early as possible. When &lt;code&gt;$match&lt;/code&gt; is the first stage, MongoDB can use indexes. Push it further down the pipeline and you lose that optimization.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$gte&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ISODate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2025-01-01&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;shipped&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not just a best practice — on large collections, the difference between an indexed &lt;code&gt;$match&lt;/code&gt; at stage one and an unindexed filter at stage three can be orders of magnitude in execution time.&lt;/p&gt;
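
&lt;p&gt;You can verify index usage with &lt;code&gt;explain&lt;/code&gt; (assuming an index on &lt;code&gt;status&lt;/code&gt; exists):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;db.orders.explain("executionStats").aggregate([
  { $match: { status: "completed" } }
])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An &lt;code&gt;IXSCAN&lt;/code&gt; stage in the winning plan means the index was used. A &lt;code&gt;COLLSCAN&lt;/code&gt; means a full collection scan.&lt;/p&gt;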

&lt;h2&gt;
  
  
  2. $project — reshape your documents
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;$project&lt;/code&gt; controls which fields appear in the output. You can include fields, exclude them, rename them or compute new ones.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$project&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;orderId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;customer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$customerId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;totalCents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$multiply&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$amount&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;year&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$year&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$createdAt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to keep in mind. Setting &lt;code&gt;_id: 0&lt;/code&gt; suppresses the default &lt;code&gt;_id&lt;/code&gt; field. You can use expressions like &lt;code&gt;$year&lt;/code&gt;, &lt;code&gt;$concat&lt;/code&gt; and &lt;code&gt;$multiply&lt;/code&gt; to derive new values. And you can rename fields by mapping a new name to an existing field path.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$project&lt;/code&gt; is also useful for trimming payload size. If your documents have 30 fields but the client needs 4, project early and save bandwidth.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. $group — aggregate values
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;$group&lt;/code&gt; is where the real analytical power lives. It groups documents by a key and applies accumulator expressions to each group.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$group&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$customerId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;orderCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;totalSpent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$amount&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;avgOrder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$avg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$amount&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;lastOrder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$max&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$createdAt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;_id&lt;/code&gt; field defines the grouping key. It can be a single field, a computed expression or an object for compound grouping.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Accumulator&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$sum&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Adds values or counts documents&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ $sum: "$amount" }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$avg&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Calculates the average&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ $avg: "$rating" }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;$min&lt;/code&gt; / &lt;code&gt;$max&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Finds minimum or maximum&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ $max: "$createdAt" }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$push&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Collects values into an array&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ $push: "$product" }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;$addToSet&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Collects unique values into an array&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ $addToSet: "$category" }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;$first&lt;/code&gt; / &lt;code&gt;$last&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Takes the first or last value in each group&lt;/td&gt;
&lt;td&gt;&lt;code&gt;{ $first: "$name" }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One gotcha: &lt;code&gt;$group&lt;/code&gt; does not preserve document order within groups unless you &lt;code&gt;$sort&lt;/code&gt; before it. If you need &lt;code&gt;$first&lt;/code&gt; or &lt;code&gt;$last&lt;/code&gt; to be meaningful, sort first.&lt;/p&gt;
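&lt;p&gt;To see why the sort matters, here's the sort-then-&lt;code&gt;$first&lt;/code&gt; behavior sketched in plain JavaScript with made-up orders:&lt;/p&gt;

```javascript
// Plain-JavaScript sketch of why $first needs a $sort in front of it.
// Pipeline equivalent: [{ $sort: { createdAt: 1 } },
//   { $group: { _id: "$customerId", firstOrder: { $first: "$amount" } } }]
// Data and field names are illustrative only.
const orders = [
  { customerId: "a", amount: 30, createdAt: 3 },
  { customerId: "a", amount: 10, createdAt: 1 },
  { customerId: "b", amount: 50, createdAt: 2 },
];

// Without sorting, $first takes whatever document happens to arrive first.
// Sorting by createdAt makes "first" mean "earliest order".
const sorted = [...orders].sort((x, y) => x.createdAt - y.createdAt);

const firstOrderByCustomer = {};
for (const doc of sorted) {
  if (!(doc.customerId in firstOrderByCustomer)) {
    firstOrderByCustomer[doc.customerId] = doc.amount; // $first: "$amount"
  }
}

console.log(firstOrderByCustomer); // { a: 10, b: 50 }
```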

&lt;h2&gt;
  
  
  4. $sort — order the results
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;$sort&lt;/code&gt; orders documents by one or more fields. Use &lt;code&gt;1&lt;/code&gt; for ascending and &lt;code&gt;-1&lt;/code&gt; for descending.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$group&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$customerId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;totalSpent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$amount&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;totalSpent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;$sort&lt;/code&gt; is the first stage (or immediately follows a &lt;code&gt;$match&lt;/code&gt;), MongoDB can use an index. Later in the pipeline, it becomes an in-memory sort, which has a 100 MB memory limit by default. For large result sets, you either need to set &lt;code&gt;allowDiskUse: true&lt;/code&gt; or restructure the pipeline so the sort can use an index.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;totalSpent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;allowDiskUse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can sort by multiple fields — MongoDB applies them in order, so &lt;code&gt;{ status: 1, createdAt: -1 }&lt;/code&gt; sorts by status ascending first, then by date descending within each status group.&lt;/p&gt;
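&lt;p&gt;The multi-key rule is easy to sketch in plain JavaScript: compare by the first key, and only fall through to the second key on ties (sample data is made up):&lt;/p&gt;

```javascript
// Plain-JavaScript sketch of the multi-key sort { status: 1, createdAt: -1 }:
// status ascending, then createdAt descending within equal statuses.
// Documents here are illustrative only.
const docs = [
  { status: "pending", createdAt: 5 },
  { status: "completed", createdAt: 1 },
  { status: "completed", createdAt: 9 },
];

docs.sort((a, b) => {
  if (a.status !== b.status) return a.status < b.status ? -1 : 1; // status: 1
  return b.createdAt - a.createdAt;                               // createdAt: -1
});

console.log(docs.map(d => `${d.status}:${d.createdAt}`));
// [ 'completed:9', 'completed:1', 'pending:5' ]
```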

&lt;h2&gt;
  
  
  5. $lookup — join collections
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;$lookup&lt;/code&gt; performs a left outer join with another collection. This is the closest thing MongoDB has to SQL joins.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$lookup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;localField&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customerId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;foreignField&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;as&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customerDetails&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result adds an array field (&lt;code&gt;customerDetails&lt;/code&gt; in this case) to each document. If no match is found, you get an empty array. If you expect a single match, you'll typically follow with an &lt;code&gt;$unwind&lt;/code&gt; to flatten it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$lookup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;localField&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customerId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;foreignField&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;as&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;customer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$unwind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$customer&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more complex join conditions, there's a pipeline form of &lt;code&gt;$lookup&lt;/code&gt; that lets you run a sub-pipeline inside the join.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$lookup&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;products&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;let&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;productIds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$items.productId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$expr&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$$productIds&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$project&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;as&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;productDetails&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This form is more flexible, but keep an eye on performance: the sub-pipeline runs once for each input document.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. $unwind — flatten arrays
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;$unwind&lt;/code&gt; deconstructs an array field, outputting one document per array element. It's commonly used after &lt;code&gt;$lookup&lt;/code&gt; or when you need to aggregate across array items.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$unwind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$items&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$group&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$items.productId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;totalQuantity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$items.quantity&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;totalRevenue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$multiply&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$items.price&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$items.quantity&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;totalRevenue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default, &lt;code&gt;$unwind&lt;/code&gt; removes documents where the array is missing or empty. If you want to preserve them, use the expanded form.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;$unwind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$items&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;preserveNullAndEmptyArrays&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be careful with &lt;code&gt;$unwind&lt;/code&gt; on large arrays — an order with 100 line items becomes 100 documents. That multiplication can blow up memory usage if you're not filtering or limiting beforehand.&lt;/p&gt;
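&lt;p&gt;The multiplication is easy to see in plain JavaScript; this sketch (with a made-up order) mirrors what &lt;code&gt;$unwind: "$items"&lt;/code&gt; produces for a single document:&lt;/p&gt;

```javascript
// Plain-JavaScript sketch of what $unwind: "$items" does to one order:
// a document with N array elements becomes N documents, each carrying a
// single element in place of the array. Data is illustrative only.
const order = {
  _id: 1,
  items: [
    { productId: "p1", quantity: 2 },
    { productId: "p2", quantity: 1 },
  ],
};

// Copy every other field, replacing the array with one element per output doc.
const unwound = order.items.map(item => ({ ...order, items: item }));

console.log(unwound.length);             // 2 documents from 1 order
console.log(unwound[0].items.productId); // "p1"
```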

&lt;h2&gt;
  
  
  7. $addFields — enrich without losing data
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;$addFields&lt;/code&gt; adds new fields to documents without removing existing ones. It's like &lt;code&gt;$project&lt;/code&gt;, but non-destructive.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$addFields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;itemCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$items&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;isHighValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$gte&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$amount&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;dayOfWeek&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$dayOfWeek&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$createdAt&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is particularly useful in the middle of a pipeline when you need a computed field for a later stage but don't want to manually re-include every other field with &lt;code&gt;$project&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can also overwrite existing fields.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$addFields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$round&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$amount&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since MongoDB 4.2, &lt;code&gt;$set&lt;/code&gt; has been available as an alias for &lt;code&gt;$addFields&lt;/code&gt;; the two are functionally identical. Use whichever name reads better in your context.&lt;/p&gt;
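&lt;p&gt;In plain JavaScript terms, &lt;code&gt;$addFields&lt;/code&gt; behaves like an object spread: existing fields are kept, new ones are added, and name clashes overwrite. A sketch with made-up values:&lt;/p&gt;

```javascript
// Plain-JavaScript sketch of $addFields semantics: keep everything, add
// computed fields, overwrite on name clash. Values are illustrative only.
const doc = { amount: 19.999, status: "completed" };

const withFields = {
  ...doc,                                     // keep every existing field
  isHighValue: doc.amount >= 1000,            // add a computed field
  amount: Math.round(doc.amount * 100) / 100, // overwrite an existing one
};

console.log(withFields); // { amount: 20, status: 'completed', isHighValue: false }
```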

&lt;h2&gt;
  
  
  8. $facet — run multiple pipelines at once
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;$facet&lt;/code&gt; lets you run several sub-pipelines in parallel on the same set of input documents. Each sub-pipeline produces its own output field. This is perfect for dashboards where you need aggregated data and paginated results from the same query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$facet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$group&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;totalOrders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;totalRevenue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$amount&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;avgOrderValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$avg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$amount&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;topCustomers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$group&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$customerId&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;spent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sum&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;$amount&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;spent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;recentOrders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$project&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each facet is independent. They share the same input but don't affect each other. The output is a single document with one field per facet.&lt;/p&gt;

&lt;p&gt;One limitation — you can't use &lt;code&gt;$out&lt;/code&gt; or &lt;code&gt;$merge&lt;/code&gt; inside a &lt;code&gt;$facet&lt;/code&gt;. And because all sub-pipelines share the same input, make sure your initial &lt;code&gt;$match&lt;/code&gt; is doing enough filtering.&lt;/p&gt;
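&lt;p&gt;On the driver side, the output shape is worth internalizing. This sketch uses made-up values to show how you'd read the single result document a &lt;code&gt;$facet&lt;/code&gt; pipeline returns:&lt;/p&gt;

```javascript
// The $facet stage emits exactly ONE document with one array field per facet.
// This result object is shape-only - the values are made up for illustration.
const facetResult = {
  summary: [{ _id: null, totalOrders: 42, totalRevenue: 8400, avgOrderValue: 200 }],
  topCustomers: [{ _id: "c1", spent: 900 }],
  recentOrders: [{ customerId: "c1", amount: 120, createdAt: "2026-04-01" }],
};

// Every facet is an array, even one that logically holds a single summary
// row, so guard against the empty-input case before indexing into it.
const summary = facetResult.summary[0] ?? { totalOrders: 0, totalRevenue: 0 };
console.log(summary.totalOrders); // 42
```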

&lt;h2&gt;
  
  
  Performance tips
&lt;/h2&gt;

&lt;p&gt;Getting a pipeline to return correct results is step one. Getting it to run fast is step two. Here are the things that matter most.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tip&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Put &lt;code&gt;$match&lt;/code&gt; first&lt;/td&gt;
&lt;td&gt;Enables index usage and reduces documents flowing through later stages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create compound indexes for &lt;code&gt;$match&lt;/code&gt; + &lt;code&gt;$sort&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;MongoDB can satisfy both in a single index scan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use &lt;code&gt;$project&lt;/code&gt; early to drop unused fields&lt;/td&gt;
&lt;td&gt;Less data per document means less memory and faster processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set &lt;code&gt;allowDiskUse: true&lt;/code&gt; for large sorts&lt;/td&gt;
&lt;td&gt;Prevents failures when in-memory sort exceeds the 100 MB limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avoid &lt;code&gt;$unwind&lt;/code&gt; on large arrays without filtering first&lt;/td&gt;
&lt;td&gt;Each array element creates a new document — this multiplies quickly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use &lt;code&gt;explain()&lt;/code&gt; to inspect the pipeline plan&lt;/td&gt;
&lt;td&gt;Shows whether indexes are used and where bottlenecks are&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;aggregate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;$sort&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;explain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;executionStats&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;explain()&lt;/code&gt; output tells you if MongoDB used an index scan or a collection scan, how many documents were examined and how long each stage took.&lt;/p&gt;

&lt;h2&gt;
  
  
  Protecting your data with backups
&lt;/h2&gt;

&lt;p&gt;Aggregation pipelines don't modify your data unless they end in a &lt;code&gt;$out&lt;/code&gt; or &lt;code&gt;$merge&lt;/code&gt; stage; everything else is read-only. But once you start building complex analytical workflows on top of MongoDB, the data itself becomes more valuable. A corrupted collection or an accidental &lt;code&gt;drop()&lt;/code&gt; can wipe out months of carefully structured documents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://databasus.com/mongodb-backup" rel="noopener noreferrer"&gt;MongoDB backup&lt;/a&gt; is something worth setting up before you need it. Databasus is the industry standard for MongoDB backup tools and the most widely used solution in its category. It supports scheduled logical backups with compression, multiple storage destinations like S3 and Google Drive, and retention policies — all through a self-hosted UI that takes a few minutes to deploy with Docker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;These 8 stages handle the vast majority of what you'll need from MongoDB's aggregation framework: &lt;code&gt;$match&lt;/code&gt; and &lt;code&gt;$project&lt;/code&gt; for filtering and shaping, &lt;code&gt;$group&lt;/code&gt; for aggregation, &lt;code&gt;$sort&lt;/code&gt; for ordering, &lt;code&gt;$lookup&lt;/code&gt; and &lt;code&gt;$unwind&lt;/code&gt; for joins and array handling, &lt;code&gt;$addFields&lt;/code&gt; for enrichment, and &lt;code&gt;$facet&lt;/code&gt; for multi-output queries.&lt;/p&gt;

&lt;p&gt;The key is stage ordering. Filter early, project what you need, aggregate, then sort. Most performance problems in pipelines come from doing these steps in the wrong order or skipping the filtering step entirely.&lt;/p&gt;

&lt;p&gt;Start with simple pipelines and build up. The aggregation framework is deep — there are dozens of stages and hundreds of expressions beyond what's covered here — but these 8 will carry you through most real-world scenarios.&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>database</category>
    </item>
    <item>
      <title>MariaDB vs PostgreSQL — 7 differences that matter in 2026</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Mon, 30 Mar 2026 12:24:54 +0000</pubDate>
      <link>https://forem.com/finny_collins/mariadb-vs-postgresql-7-differences-that-matter-in-2026-3eie</link>
      <guid>https://forem.com/finny_collins/mariadb-vs-postgresql-7-differences-that-matter-in-2026-3eie</guid>
      <description>&lt;p&gt;Both MariaDB and PostgreSQL are open source, mature and widely used in production. But they come from different lineages and make different trade-offs. MariaDB forked from MySQL in 2009 and stays close to that heritage. PostgreSQL has been its own thing since the 1980s, growing steadily into one of the most feature-rich relational databases available.&lt;/p&gt;

&lt;p&gt;If you're choosing between them for a new project or considering a migration, this article covers 7 areas where they actually differ in practice. Just the things that tend to matter when you're making the decision for a real system.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag08pwvzwymoevozmvnx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fag08pwvzwymoevozmvnx.png" alt="MariaDB vs PostgreSQL" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. SQL standards compliance
&lt;/h2&gt;

&lt;p&gt;PostgreSQL has always taken SQL standards seriously. It implements large parts of SQL:2023 and enforces strict type checking out of the box. If your query has a type mismatch or ambiguous expression, PostgreSQL will tell you about it at parse time rather than silently doing something unexpected.&lt;/p&gt;

&lt;p&gt;MariaDB is more relaxed here. It inherited MySQL's permissive approach where implicit conversions happen quietly and certain non-standard syntax is accepted without complaint. MariaDB has been tightening things up with strict mode enabled by default since version 10.2, but the underlying behavior still differs in several places.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;PostgreSQL&lt;/th&gt;
&lt;th&gt;MariaDB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SELECT 'abc' + 1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Error (no implicit cast)&lt;/td&gt;
&lt;td&gt;Returns &lt;code&gt;1&lt;/code&gt; (string cast to 0)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;INSERT&lt;/code&gt; with wrong column count&lt;/td&gt;
&lt;td&gt;Always an error&lt;/td&gt;
&lt;td&gt;Error in strict mode, warning otherwise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;GROUP BY&lt;/code&gt; with non-aggregated columns&lt;/td&gt;
&lt;td&gt;Error&lt;/td&gt;
&lt;td&gt;Allowed unless &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt; is set&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boolean type&lt;/td&gt;
&lt;td&gt;Native &lt;code&gt;BOOLEAN&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;TINYINT(1)&lt;/code&gt; alias&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Window functions&lt;/td&gt;
&lt;td&gt;Full support since v8.4&lt;/td&gt;
&lt;td&gt;Full support since v10.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Common table expressions (CTEs)&lt;/td&gt;
&lt;td&gt;Optimized, can be materialized or inlined&lt;/td&gt;
&lt;td&gt;Supported since v10.2, always materialized until v11&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you're writing SQL that needs to be portable or you want the database to catch mistakes early, PostgreSQL gives you less room to shoot yourself in the foot. MariaDB is fine too if you configure strict mode and &lt;code&gt;ONLY_FULL_GROUP_BY&lt;/code&gt;, but you have to be intentional about it.&lt;/p&gt;
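
&lt;p&gt;Here's a quick illustration of the difference (exact error messages vary by version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- PostgreSQL: rejects the implicit cast outright
SELECT 'abc' + 1;
-- ERROR: invalid input syntax for type integer: "abc"

-- MariaDB: quietly casts 'abc' to 0 and returns 1
SELECT 'abc' + 1;

-- Opting into stricter grouping behavior in MariaDB (per session here)
SET SESSION sql_mode = CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;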

&lt;h2&gt;
  
  
  2. JSON and semi-structured data
&lt;/h2&gt;

&lt;p&gt;Storing JSON in a relational database is common now. Application configs, user preferences and API responses often contain semi-structured data that doesn't fit neatly into columns. Both databases support JSON, but the implementations are quite different under the hood.&lt;/p&gt;

&lt;p&gt;PostgreSQL introduced &lt;code&gt;JSONB&lt;/code&gt; back in version 9.4. It stores JSON in a decomposed binary format, which means the database can index individual keys with GIN indexes, use containment operators like &lt;code&gt;@&amp;gt;&lt;/code&gt;, and run efficient partial queries without parsing the full document every time. You can also create expression indexes on specific JSON paths.&lt;/p&gt;

&lt;p&gt;MariaDB's &lt;code&gt;JSON&lt;/code&gt; type is essentially an alias for &lt;code&gt;LONGTEXT&lt;/code&gt; with validation. The data is stored as text, not in a binary format. You can query it using &lt;code&gt;JSON_EXTRACT()&lt;/code&gt; and other functions, and create virtual generated columns to index specific paths. But there's no equivalent of JSONB's containment operators or native binary indexing.&lt;/p&gt;

&lt;p&gt;For applications that occasionally store a JSON blob and read the whole thing back, this difference won't matter much. But if you're querying inside JSON documents frequently or building features around semi-structured data, PostgreSQL's JSONB is meaningfully faster and more flexible.&lt;/p&gt;
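
&lt;p&gt;A sketch of what querying looks like in each (the &lt;code&gt;users&lt;/code&gt; table and &lt;code&gt;preferences&lt;/code&gt; column are made up for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- PostgreSQL: GIN index plus a containment query on JSONB
CREATE INDEX idx_user_prefs ON users USING GIN (preferences);
SELECT * FROM users WHERE preferences @&amp;gt; '{"theme": "dark"}';

-- MariaDB: extract the path into a virtual column, then index that
ALTER TABLE users
  ADD COLUMN theme VARCHAR(32)
    AS (JSON_VALUE(preferences, '$.theme')) VIRTUAL,
  ADD INDEX idx_user_theme (theme);
SELECT * FROM users WHERE theme = 'dark';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;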

&lt;h2&gt;
  
  
  3. Replication and high availability
&lt;/h2&gt;

&lt;p&gt;MariaDB and PostgreSQL take different approaches to replication. Both can achieve high availability, but the right choice depends on whether you want multi-master simplicity out of the box or the flexibility to assemble your own HA stack.&lt;/p&gt;

&lt;p&gt;MariaDB ships with support for Galera Cluster, which provides synchronous multi-master replication. Every node can accept writes and the cluster certifies transactions across all nodes before committing. This gives you true multi-master capability without external tooling. MariaDB also supports traditional asynchronous and semi-synchronous replication for simpler setups.&lt;/p&gt;

&lt;p&gt;PostgreSQL uses streaming replication as its primary HA mechanism. A primary server streams WAL (write-ahead log) records to one or more replicas in real time, in either asynchronous or synchronous mode. Since version 10, PostgreSQL also offers logical replication, which lets you selectively replicate specific tables and even replicate between different PostgreSQL major versions. For automated failover, most teams use Patroni or similar orchestration tools on top of streaming replication.&lt;/p&gt;

&lt;p&gt;The trade-off is straightforward. MariaDB gives you multi-master out of the box, which simplifies write scaling for certain workloads. PostgreSQL's approach is more modular. You pick the replication mode and failover tool that fits your setup. PostgreSQL's logical replication is also useful for zero-downtime major version upgrades, which is something that's harder to pull off with MariaDB.&lt;/p&gt;
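
&lt;p&gt;For example, a minimal logical replication setup takes two statements (names and connection details are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- On the publisher (source server)
CREATE PUBLICATION app_pub FOR TABLE orders, customers;

-- On the subscriber (target server)
CREATE SUBSCRIPTION app_sub
  CONNECTION 'host=primary.internal dbname=myapp user=repl_user'
  PUBLICATION app_pub;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;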

&lt;h2&gt;
  
  
  4. Storage engines vs unified architecture
&lt;/h2&gt;

&lt;p&gt;This is one of the most fundamental architectural differences between the two databases. MariaDB inherited MySQL's pluggable storage engine design, which means different tables in the same database can use different engines optimized for different workloads. PostgreSQL went the opposite direction with a single engine and a powerful extension system.&lt;/p&gt;

&lt;p&gt;MariaDB ships with several engines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;InnoDB&lt;/strong&gt;: the default engine that handles ACID transactions and row-level locking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aria&lt;/strong&gt;: a crash-safe replacement for MyISAM, used internally for temporary tables&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ColumnStore&lt;/strong&gt;: columnar storage designed for analytical queries on large datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spider&lt;/strong&gt;: a sharding engine that distributes data across multiple MariaDB instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MyRocks&lt;/strong&gt;: a write-optimized engine based on RocksDB, good for high write throughput with compression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3&lt;/strong&gt;: allows storing archived tables directly in S3-compatible object storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PostgreSQL has a single heap-based storage engine and extends functionality through its extension system instead. You don't swap engines per table. Every table behaves the same way, MVCC works identically everywhere, and you don't have to think about which engine to use for which table.&lt;/p&gt;

&lt;p&gt;Neither approach is objectively better. MariaDB's engine diversity is useful if you have genuinely different workload types in the same database. PostgreSQL's unified model is simpler to reason about and avoids the complexity of mixing engine behaviors.&lt;/p&gt;
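
&lt;p&gt;In MariaDB the engine choice is per table, declared right in the DDL (table names are illustrative, and ColumnStore requires its plugin to be installed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Transactional table on the default engine
CREATE TABLE orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  total DECIMAL(10,2)
) ENGINE=InnoDB;

-- Analytical table on columnar storage
CREATE TABLE events_archive (
  event_time DATETIME,
  payload TEXT
) ENGINE=ColumnStore;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;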

&lt;h2&gt;
  
  
  5. Performance at scale
&lt;/h2&gt;

&lt;p&gt;Performance comparisons between databases are tricky because results depend heavily on schema design, query patterns and hardware. Benchmarks are easy to cherry-pick. But there are some general tendencies worth knowing about.&lt;/p&gt;

&lt;p&gt;For simple read-heavy workloads, both databases perform well. MariaDB tends to have slightly lower overhead for basic point lookups and simple joins, partly because of its lighter query planner. PostgreSQL's planner is more sophisticated. It considers more execution strategies, which adds a small cost for trivial queries but pays off for complex ones with multiple joins, subqueries or CTEs.&lt;/p&gt;

&lt;p&gt;For write-heavy concurrent workloads, PostgreSQL's MVCC implementation generally handles contention better. Readers never block writers and vice versa. MariaDB with InnoDB also uses MVCC, but the implementations differ in how they handle undo logs and cleanup. Under high concurrency with mixed reads and writes, PostgreSQL tends to maintain more consistent throughput.&lt;/p&gt;

&lt;p&gt;Both databases support table partitioning for large datasets. PostgreSQL's declarative partitioning has improved significantly since version 10 and works well for time-series data. MariaDB supports range, list, hash and key partitioning. For analytical workloads on MariaDB, ColumnStore can process columnar scans significantly faster than row-based engines.&lt;/p&gt;
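
&lt;p&gt;PostgreSQL's declarative partitioning, for instance, takes only a couple of statements (table names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Parent table partitioned by time range
CREATE TABLE metrics (
  recorded_at TIMESTAMPTZ NOT NULL,
  value DOUBLE PRECISION
) PARTITION BY RANGE (recorded_at);

-- One partition per quarter; queries prune to the matching partition
CREATE TABLE metrics_2026_q1 PARTITION OF metrics
  FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;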

&lt;p&gt;The honest answer is that both databases are fast enough for most applications. The differences show up at scale or under specific workload patterns, and by that point you're usually tuning configuration anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Extensibility and ecosystem
&lt;/h2&gt;

&lt;p&gt;PostgreSQL's extension system is one of its biggest strengths. Extensions can add new data types, index types, functions and even query languages without modifying the core database. This has created a rich ecosystem where specialized tools build on top of PostgreSQL rather than competing with it.&lt;/p&gt;

&lt;p&gt;Some of the most widely used extensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PostGIS&lt;/strong&gt;: geospatial data support, the standard for location-based applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TimescaleDB&lt;/strong&gt;: time-series data with automatic partitioning and retention policies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pgvector&lt;/strong&gt;: vector similarity search for AI and ML embedding workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pg_cron&lt;/strong&gt;: job scheduling directly inside the database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citus&lt;/strong&gt;: distributed PostgreSQL for horizontal scaling across multiple nodes&lt;/li&gt;
&lt;/ul&gt;
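
&lt;p&gt;Installing one of these is a one-liner per database, once the extension package is present on the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS vector;  -- pgvector registers itself as "vector"

-- List what's installed in the current database
SELECT extname, extversion FROM pg_extension;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;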

&lt;p&gt;MariaDB has a plugin system too, but the ecosystem is smaller. Most MariaDB-specific extensions come as storage engines (ColumnStore, Spider) rather than the broad capability additions you see in PostgreSQL's extension catalog. MariaDB does have good MySQL compatibility, which gives it access to a larger tooling ecosystem indirectly.&lt;/p&gt;

&lt;p&gt;On the community side, PostgreSQL has been gaining ground steadily. It topped the DB-Engines ranking for "DBMS of the Year" multiple years running, and the contributor base is large and active. MariaDB has strong backing from the MariaDB Foundation and corporate sponsors, but the community is comparatively smaller. If you're evaluating long-term ecosystem health, PostgreSQL's trajectory is hard to argue with.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Backup and disaster recovery
&lt;/h2&gt;

&lt;p&gt;Both databases have solid backup tooling, but the approaches and maturity levels differ. How you handle backups matters more than most people think. A failed restore during an actual outage is a very bad day.&lt;/p&gt;

&lt;p&gt;MariaDB offers &lt;code&gt;mariadb-dump&lt;/code&gt; for logical backups and &lt;code&gt;mariabackup&lt;/code&gt; for physical ones. For point-in-time recovery, you combine a full backup with binary log files and replay them up to the desired timestamp. The process works but involves some manual coordination. You need to manage binary log retention, apply logs in the right order and handle the restore sequence carefully.&lt;/p&gt;

&lt;p&gt;PostgreSQL provides &lt;code&gt;pg_dump&lt;/code&gt; for logical backups and &lt;code&gt;pg_basebackup&lt;/code&gt; for physical ones. Where PostgreSQL shines is WAL-based continuous archiving. By streaming WAL segments to a separate location, you get continuous point-in-time recovery (PITR) with the ability to restore your database to any specific second. This is built into PostgreSQL natively and has been battle-tested for years.&lt;/p&gt;
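
&lt;p&gt;The moving parts are two configuration settings plus a monitoring view (the archive destination below is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- In postgresql.conf, continuous archiving comes down to:
--   archive_mode = on
--   archive_command = 'cp %p /mnt/wal_archive/%f'

-- Then check that the archiver is keeping up
SELECT archived_count, last_archived_wal, failed_count
FROM pg_stat_archiver;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;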

&lt;p&gt;For teams that need backup automation for either database, Databasus is an open-source, self-hosted tool for &lt;a href="https://databasus.com" rel="noopener noreferrer"&gt;PostgreSQL backup&lt;/a&gt; that also supports MariaDB. It covers scheduling, compression, multiple storage destinations (S3, Google Drive, SFTP) and PITR from a single dashboard.&lt;/p&gt;

&lt;p&gt;Regardless of which database you choose, test your restores regularly. A backup you've never restored is just a hope.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which one should you pick
&lt;/h2&gt;

&lt;p&gt;There's no universal answer, but there are patterns. Your choice should depend on what you're actually building and what your team already knows. Here's a side-by-side summary.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;PostgreSQL&lt;/th&gt;
&lt;th&gt;MariaDB&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL compliance&lt;/td&gt;
&lt;td&gt;Strict, standards-focused&lt;/td&gt;
&lt;td&gt;Relaxed, configurable via strict mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON support&lt;/td&gt;
&lt;td&gt;JSONB with binary storage and GIN indexing&lt;/td&gt;
&lt;td&gt;JSON as validated text, virtual column indexing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replication&lt;/td&gt;
&lt;td&gt;Streaming + logical replication&lt;/td&gt;
&lt;td&gt;Galera multi-master + async replication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;Single engine, extension-based&lt;/td&gt;
&lt;td&gt;Multiple pluggable storage engines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex queries&lt;/td&gt;
&lt;td&gt;Advanced planner, optimized CTEs&lt;/td&gt;
&lt;td&gt;Good support, lighter planner&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extension ecosystem&lt;/td&gt;
&lt;td&gt;Large (PostGIS, TimescaleDB, pgvector)&lt;/td&gt;
&lt;td&gt;Smaller, MySQL-compatible tooling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backup and PITR&lt;/td&gt;
&lt;td&gt;Native WAL archiving, granular PITR&lt;/td&gt;
&lt;td&gt;Binary logs + mariabackup&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few practical guidelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick PostgreSQL if you need advanced SQL features, JSONB, geospatial data or vector search. Its extension ecosystem and strict SQL compliance make it the stronger choice for complex applications&lt;/li&gt;
&lt;li&gt;Pick MariaDB if you're migrating from MySQL or need Galera-based multi-master replication. The MySQL compatibility and pluggable engine architecture are genuine advantages for those use cases&lt;/li&gt;
&lt;li&gt;If your team already knows one of them well, that's usually the strongest argument. Operational experience with a database matters more than feature comparisons on paper&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both databases are production-ready, well-maintained and free. You won't regret picking either one if it matches your workload. The biggest risk is overthinking the choice instead of just building the thing.&lt;/p&gt;

</description>
      <category>mariadb</category>
      <category>postgres</category>
      <category>database</category>
    </item>
    <item>
      <title>How to manage PostgreSQL roles and permissions — a practical guide</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Sat, 28 Mar 2026 15:39:24 +0000</pubDate>
      <link>https://forem.com/finny_collins/how-to-manage-postgresql-roles-and-permissions-a-practical-guide-lm8</link>
      <guid>https://forem.com/finny_collins/how-to-manage-postgresql-roles-and-permissions-a-practical-guide-lm8</guid>
      <description>&lt;p&gt;PostgreSQL has a permission system that trips up a lot of people, especially those coming from MySQL or simpler databases. The thing is, PostgreSQL doesn't really have "users" and "groups" as separate concepts. Everything is a role. Once you get that, the rest starts to make sense. This guide walks through the practical side of managing roles and permissions — the stuff you actually need day to day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsco7l2fq7a5nnxfh81tn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsco7l2fq7a5nnxfh81tn.png" alt="PostgreSQL permissions" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What are roles in PostgreSQL
&lt;/h2&gt;

&lt;p&gt;In PostgreSQL, a role is a single entity that can represent a user, a group, or both. There's no &lt;code&gt;CREATE USER&lt;/code&gt; vs &lt;code&gt;CREATE GROUP&lt;/code&gt; distinction at the engine level. &lt;code&gt;CREATE USER&lt;/code&gt; is just an alias for &lt;code&gt;CREATE ROLE&lt;/code&gt; with the &lt;code&gt;LOGIN&lt;/code&gt; attribute set by default. This simplification is actually useful once you stop fighting it.&lt;/p&gt;

&lt;p&gt;Every role has a set of attributes that control what it can do at the cluster level. These are separate from object-level privileges like SELECT or INSERT — attributes are about system-wide capabilities.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attribute&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LOGIN&lt;/td&gt;
&lt;td&gt;Allows the role to connect to a database&lt;/td&gt;
&lt;td&gt;NO (unless created with CREATE USER)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SUPERUSER&lt;/td&gt;
&lt;td&gt;Bypasses all permission checks&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CREATEDB&lt;/td&gt;
&lt;td&gt;Can create new databases&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CREATEROLE&lt;/td&gt;
&lt;td&gt;Can create, alter and drop other roles&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REPLICATION&lt;/td&gt;
&lt;td&gt;Can initiate streaming replication&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INHERIT&lt;/td&gt;
&lt;td&gt;Automatically inherits privileges of roles it belongs to&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BYPASSRLS&lt;/td&gt;
&lt;td&gt;Bypasses row-level security policies&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CONNECTION LIMIT&lt;/td&gt;
&lt;td&gt;Max concurrent connections for this role&lt;/td&gt;
&lt;td&gt;-1 (unlimited)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PASSWORD&lt;/td&gt;
&lt;td&gt;Sets a password for authentication&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VALID UNTIL&lt;/td&gt;
&lt;td&gt;Password expiration timestamp&lt;/td&gt;
&lt;td&gt;No expiration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most of these you'll leave at their defaults. The ones you'll touch most often are LOGIN, CREATEDB and sometimes CREATEROLE.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating and managing roles
&lt;/h2&gt;

&lt;p&gt;The basic syntax is straightforward. &lt;code&gt;CREATE ROLE&lt;/code&gt; makes a role that can't log in. &lt;code&gt;CREATE USER&lt;/code&gt; makes one that can. Here are the patterns you'll use most:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- A basic login role (application user)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;app_user&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;LOGIN&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'strong_password_here'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- A role that can create databases (for a developer)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;dev_user&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;LOGIN&lt;/span&gt; &lt;span class="k"&gt;CREATEDB&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'dev_password'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- A group role (no login, used for grouping permissions)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;readonly_group&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- A role with a password expiration&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;temp_contractor&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;LOGIN&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'temp_pass'&lt;/span&gt; &lt;span class="k"&gt;VALID&lt;/span&gt; &lt;span class="k"&gt;UNTIL&lt;/span&gt; &lt;span class="s1"&gt;'2026-06-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ALTER ROLE&lt;/code&gt; lets you change attributes after creation. For example, &lt;code&gt;ALTER ROLE app_user WITH CONNECTION LIMIT 10;&lt;/code&gt; caps connections. You can also rename roles with &lt;code&gt;ALTER ROLE old_name RENAME TO new_name&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DROP ROLE&lt;/code&gt; removes a role, but only if it owns no objects and has no granted privileges. PostgreSQL will refuse to drop a role that still owns tables or has active grants — you need to reassign or drop those first using &lt;code&gt;REASSIGN OWNED BY&lt;/code&gt; and &lt;code&gt;DROP OWNED BY&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;\du&lt;/code&gt; in psql lists all roles with their attributes. It's the quickest way to check what exists and what permissions are assigned at the role level&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing worth noting: passwords in PostgreSQL are stored as hashes (md5 or scram-sha-256 depending on your config). Since PostgreSQL 10, scram-sha-256 is the recommended method and you should use it if your client libraries support it.&lt;/p&gt;
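
&lt;p&gt;To check what your server is doing and migrate a role to scram (the role name is illustrative, and the last query needs superuser access to &lt;code&gt;pg_authid&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SHOW password_encryption;

-- New password hashes use whatever method is set when ALTER runs
SET password_encryption = 'scram-sha-256';
ALTER ROLE app_user WITH PASSWORD 'strong_password_here';

-- Inspect which method each stored password uses
SELECT rolname, substr(rolpassword, 1, 14) AS hash_prefix
FROM pg_authid WHERE rolpassword IS NOT NULL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;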

&lt;h2&gt;
  
  
  Granting and revoking privileges
&lt;/h2&gt;

&lt;p&gt;Attributes control what a role can do system-wide. Privileges control what a role can do with specific objects — tables, schemas, sequences, functions. The GRANT and REVOKE commands handle this.&lt;/p&gt;

&lt;p&gt;The general syntax is &lt;code&gt;GRANT privilege ON object TO role&lt;/code&gt; and &lt;code&gt;REVOKE privilege ON object FROM role&lt;/code&gt;. PostgreSQL supports granular control, so you can grant SELECT on one table and INSERT on another to the same role.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Privilege&lt;/th&gt;
&lt;th&gt;Applies to&lt;/th&gt;
&lt;th&gt;What it allows&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SELECT&lt;/td&gt;
&lt;td&gt;Tables, views, sequences&lt;/td&gt;
&lt;td&gt;Read data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;INSERT&lt;/td&gt;
&lt;td&gt;Tables&lt;/td&gt;
&lt;td&gt;Add new rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UPDATE&lt;/td&gt;
&lt;td&gt;Tables&lt;/td&gt;
&lt;td&gt;Modify existing rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DELETE&lt;/td&gt;
&lt;td&gt;Tables&lt;/td&gt;
&lt;td&gt;Remove rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TRUNCATE&lt;/td&gt;
&lt;td&gt;Tables&lt;/td&gt;
&lt;td&gt;Empty the table entirely&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;REFERENCES&lt;/td&gt;
&lt;td&gt;Tables&lt;/td&gt;
&lt;td&gt;Create foreign key constraints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TRIGGER&lt;/td&gt;
&lt;td&gt;Tables&lt;/td&gt;
&lt;td&gt;Create triggers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CREATE&lt;/td&gt;
&lt;td&gt;Databases, schemas&lt;/td&gt;
&lt;td&gt;Create new schemas or objects within them&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CONNECT&lt;/td&gt;
&lt;td&gt;Databases&lt;/td&gt;
&lt;td&gt;Connect to the database&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;USAGE&lt;/td&gt;
&lt;td&gt;Schemas, sequences&lt;/td&gt;
&lt;td&gt;Access objects in a schema or use a sequence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EXECUTE&lt;/td&gt;
&lt;td&gt;Functions&lt;/td&gt;
&lt;td&gt;Run a function&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ALL PRIVILEGES&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Grants everything applicable to the object type&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here's what granting looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Grant read access to a specific table&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Grant full CRUD on all tables in a schema&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Grant usage on a schema (required before any table access works)&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Make future tables in a schema automatically accessible&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;
    &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;readonly_group&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last one — &lt;code&gt;ALTER DEFAULT PRIVILEGES&lt;/code&gt; — is easy to forget and causes a lot of confusion. Without it, tables created later won't be accessible to the roles you've already set up, and you'll find yourself re-running GRANT statements after every migration.&lt;/p&gt;

&lt;p&gt;Revoking is the mirror image: &lt;code&gt;REVOKE SELECT ON orders FROM app_user;&lt;/code&gt;. Worth remembering that REVOKE only removes what was explicitly granted. If the role gets the privilege through group membership, you need to revoke it from the group instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Role inheritance and group roles
&lt;/h2&gt;

&lt;p&gt;Group roles are just regular roles without the LOGIN attribute. You grant them to other roles, and those other roles inherit the group's privileges. This is where PostgreSQL's "everything is a role" design pays off.&lt;/p&gt;

&lt;p&gt;The INHERIT attribute (which is on by default) means a role automatically gets all privileges of roles it belongs to. With NOINHERIT, the role has to explicitly &lt;code&gt;SET ROLE group_name&lt;/code&gt; to activate those privileges — useful for privileged roles where you want an explicit opt-in.&lt;/p&gt;

&lt;p&gt;Setting up group-based access follows a predictable pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a group role without LOGIN. Something like &lt;code&gt;CREATE ROLE analytics_team&lt;/code&gt;. Then grant the specific privileges this group should have — maybe SELECT on certain schemas or tables&lt;/li&gt;
&lt;li&gt;Grant the group role to individual users. &lt;code&gt;GRANT analytics_team TO alice, bob;&lt;/code&gt; means Alice and Bob now inherit whatever privileges analytics_team has. Add or remove people from the group without touching any table-level grants&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;SET ROLE&lt;/code&gt; for elevated privileges. If a role has NOINHERIT membership in an admin group, the user has to run &lt;code&gt;SET ROLE admin_group&lt;/code&gt; to activate those permissions. This works like sudo — it's a conscious escalation
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create group and set up privileges&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;analytics_team&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;reporting&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;analytics_team&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;reporting&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;analytics_team&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;reporting&lt;/span&gt;
    &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;analytics_team&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Add users to the group&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;analytics_team&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;alice&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="n"&gt;analytics_team&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;bob&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Alice leaves the team, one command does it: &lt;code&gt;REVOKE analytics_team FROM alice;&lt;/code&gt;. No need to touch any table-level grants. This approach scales well once you have more than a handful of users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical permission patterns
&lt;/h2&gt;

&lt;p&gt;Most PostgreSQL setups need a few standard roles. Here are the ones that come up over and over.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Read-only user&lt;/strong&gt; for reporting tools, dashboards or monitoring. This role should only SELECT and never modify anything. Grant USAGE on the schemas it needs and SELECT on tables. If you use &lt;code&gt;ALTER DEFAULT PRIVILEGES&lt;/code&gt;, new tables get covered automatically
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;readonly_user&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;LOGIN&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'readonly_pass'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;CONNECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;myapp&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;readonly_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;readonly_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;readonly_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;
    &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;readonly_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Application user&lt;/strong&gt; for your backend service. Needs SELECT, INSERT, UPDATE and DELETE but shouldn't create or drop objects. Definitely shouldn't be a superuser — even though that's the quick fix people reach for
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;app_service&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;LOGIN&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'app_pass'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;CONNECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;myapp&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;SEQUENCES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;
    &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;PRIVILEGES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt;
    &lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;SEQUENCES&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;app_service&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backup user&lt;/strong&gt; for running pg_dump or streaming WAL. Needs minimal privileges — typically just SELECT on tables and the REPLICATION attribute for physical or incremental backups. Keeping this role locked down is important since backup credentials often sit in config files or cron jobs
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;ROLE&lt;/span&gt; &lt;span class="n"&gt;backup_user&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;LOGIN&lt;/span&gt; &lt;span class="n"&gt;REPLICATION&lt;/span&gt; &lt;span class="n"&gt;PASSWORD&lt;/span&gt; &lt;span class="s1"&gt;'backup_pass'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;CONNECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;myapp&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;backup_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;USAGE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;backup_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;GRANT&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;TABLES&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;backup_user&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The common thread here is least privilege. Give each role exactly what it needs and nothing more. It's a few extra lines upfront but saves you when something goes wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Securing your PostgreSQL backups with proper permissions
&lt;/h2&gt;

&lt;p&gt;Speaking of backup users: getting the permissions right is only half the battle. You also need a reliable backup process behind that user. Databasus is an open source, self-hosted tool for automated &lt;a href="https://databasus.com" rel="noopener noreferrer"&gt;PostgreSQL backup&lt;/a&gt;. It connects to your database using a read-only user by default, which fits the least-privilege approach described above.&lt;/p&gt;

&lt;p&gt;Databasus supports logical, physical and incremental backup types. The incremental mode uses continuous WAL streaming to enable Point-in-Time Recovery — you can restore your database to any specific second between backups. This is critical for disaster recovery where even a few minutes of data loss matters. Backups are compressed and streamed directly to storage destinations like S3, Google Drive, SFTP or local disk, so there are no large temporary files sitting on your server.&lt;/p&gt;

&lt;p&gt;Beyond the backup itself, Databasus handles scheduling, retention policies (including GFS for enterprise requirements) and AES-256-GCM encryption. It runs as a self-hosted Docker container, so your data never leaves your infrastructure. You set up the backup user with the right permissions, point Databasus at your database, configure a schedule and storage — and it handles the rest. Notifications go out via Slack, Telegram, email or webhooks so you know immediately if something fails.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes and how to avoid them
&lt;/h2&gt;

&lt;p&gt;A few permission-related issues show up again and again in PostgreSQL setups. Knowing about them beforehand saves debugging time.&lt;/p&gt;

&lt;p&gt;Grants on the public schema are the biggest source of surprise. Before PostgreSQL 15, every role gets CREATE and USAGE on the public schema by default, which means any authenticated user can create tables in public unless you explicitly revoke it. Run &lt;code&gt;REVOKE CREATE ON SCHEMA public FROM PUBLIC;&lt;/code&gt; on every new database (PostgreSQL 15 and later no longer grant CREATE to begin with). The second "PUBLIC" there is a special keyword meaning "all roles", which is confusing, but that's how it works.&lt;/p&gt;
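&lt;p&gt;As a sketch, locking down a fresh database might look like this (the database name &lt;code&gt;myapp&lt;/code&gt; is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Stop all roles from creating objects in the public schema
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

-- Optionally, stop unrelated roles from connecting to the database at all
REVOKE CONNECT ON DATABASE myapp FROM PUBLIC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;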

&lt;p&gt;Forgetting &lt;code&gt;ALTER DEFAULT PRIVILEGES&lt;/code&gt; ranks a close second. You set up perfect grants, everything works, then a migration adds a new table and suddenly the app can't read it. Default privileges solve this, but they only apply to objects created by the role that set them. If your migration tool connects as &lt;code&gt;postgres&lt;/code&gt; but you set default privileges as &lt;code&gt;admin&lt;/code&gt;, they won't apply. Make sure the role running migrations is the same one that configured default privileges.&lt;/p&gt;
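&lt;p&gt;One way around the mismatch is to set default privileges on behalf of the role that creates objects, using the &lt;code&gt;FOR ROLE&lt;/code&gt; clause. A sketch, with &lt;code&gt;migration_role&lt;/code&gt; as an illustrative name:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Run as a superuser or as a member of migration_role;
-- applies to objects migration_role creates from now on
ALTER DEFAULT PRIVILEGES FOR ROLE migration_role IN SCHEMA public
    GRANT SELECT ON TABLES TO readonly_user;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;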

&lt;p&gt;Overusing the superuser role is tempting because it makes permission errors go away. But it also makes your attack surface enormous. If an application connects as a superuser and gets compromised, the attacker has full control over every database in the cluster. Use superuser for administration only. Your application, backup tools and reporting dashboards should each have their own role with just enough access to do their job.&lt;/p&gt;

&lt;p&gt;Finally, not testing permissions after setting them up leads to nasty surprises in production. After configuring roles, connect as each one and verify you can do what you expect — and can't do what you shouldn't. A quick &lt;code&gt;SET ROLE app_service;&lt;/code&gt; followed by some test queries catches most issues before they become incidents.&lt;/p&gt;
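&lt;p&gt;A minimal check, assuming the &lt;code&gt;app_service&lt;/code&gt; role from earlier and an illustrative &lt;code&gt;users&lt;/code&gt; table, might look like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;SET ROLE app_service;
SELECT count(*) FROM users;  -- should succeed
DROP TABLE users;            -- should fail: app_service is not the owner
RESET ROLE;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;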

</description>
      <category>database</category>
      <category>postgres</category>
    </item>
    <item>
      <title>5 MySQL InnoDB settings you should change right now</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Wed, 25 Mar 2026 19:31:06 +0000</pubDate>
      <link>https://forem.com/finny_collins/5-mysql-innodb-settings-you-should-change-right-now-2d89</link>
      <guid>https://forem.com/finny_collins/5-mysql-innodb-settings-you-should-change-right-now-2d89</guid>
      <description>&lt;p&gt;Most MySQL installations ship with default InnoDB settings that were designed years ago for modest hardware. If you're running a production workload on a server with 8 GB or more of RAM and you haven't touched these values, you're leaving performance on the table. These five settings are the ones that matter most and take minutes to adjust.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnh22b58zis2xhoydxw4k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnh22b58zis2xhoydxw4k.png" alt="InnoDB settings" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. innodb_buffer_pool_size
&lt;/h2&gt;

&lt;p&gt;The buffer pool is where InnoDB caches table data and indexes in memory. Reads that hit the buffer pool skip disk entirely, so this single setting has the largest impact on query performance. The default is typically 128 MB, which is absurdly small for anything beyond a toy database.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommended value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dedicated database server&lt;/td&gt;
&lt;td&gt;70-80% of total RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared server (app + DB)&lt;/td&gt;
&lt;td&gt;50-60% of total RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small VPS (2 GB RAM)&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Development machine&lt;/td&gt;
&lt;td&gt;512 MB - 1 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Set it in your &lt;code&gt;my.cnf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_buffer_pool_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;12G&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A good rule of thumb: check your total InnoDB data size with &lt;code&gt;SELECT SUM(data_length + index_length) FROM information_schema.tables WHERE engine = 'InnoDB'&lt;/code&gt;. If it fits in RAM, set the buffer pool large enough to hold it all. If it doesn't, get as close as your memory allows.&lt;/p&gt;
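&lt;p&gt;For readability, the same query can report the total in gigabytes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Total InnoDB data + index size in GB
SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS innodb_gb
FROM information_schema.tables
WHERE engine = 'InnoDB';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;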

&lt;p&gt;After changing this value, monitor the buffer pool hit rate. You want it above 99%:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;STATUS&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'Innodb_buffer_pool_read_requests'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;STATUS&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'Innodb_buffer_pool_reads'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Divide reads by read_requests. If that ratio is above 1%, your buffer pool is too small.&lt;/p&gt;
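&lt;p&gt;On MySQL 5.7 and later, both counters also live in &lt;code&gt;performance_schema&lt;/code&gt;, so the miss ratio can be computed in a single query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Percentage of read requests that missed the buffer pool; aim for under 1%
SELECT (SELECT variable_value
        FROM performance_schema.global_status
        WHERE variable_name = 'Innodb_buffer_pool_reads')
     / (SELECT variable_value
        FROM performance_schema.global_status
        WHERE variable_name = 'Innodb_buffer_pool_read_requests')
     * 100 AS miss_pct;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;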

&lt;h2&gt;
  
  
  2. innodb_log_file_size
&lt;/h2&gt;

&lt;p&gt;The redo log (WAL in other databases) records every write before it hits the data files. Larger log files mean InnoDB can batch more writes before flushing, which reduces I/O pressure during heavy write workloads. The default of 48 MB fills up quickly on busy systems, forcing frequent checkpoints that stall writes.&lt;/p&gt;

&lt;p&gt;For most production systems, set this to 1-2 GB. High-write workloads benefit from even larger values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_log_file_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2G&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a tradeoff here. Larger log files improve write throughput but increase crash recovery time. With a 2 GB log file, recovery after an unexpected restart might take a few minutes instead of seconds. For nearly every production system, that's a reasonable trade.&lt;/p&gt;

&lt;p&gt;On MySQL 8.0.30+, you can also set &lt;code&gt;innodb_redo_log_capacity&lt;/code&gt; instead, which replaces the older &lt;code&gt;innodb_log_file_size&lt;/code&gt; and &lt;code&gt;innodb_log_files_in_group&lt;/code&gt; combination. If you're on a recent version, prefer the new variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_redo_log_capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;4G&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. innodb_flush_log_at_trx_commit
&lt;/h2&gt;

&lt;p&gt;This setting controls how aggressively InnoDB flushes the redo log to disk on each transaction commit. It has three possible values, and the performance difference between them is significant.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Durability&lt;/th&gt;
&lt;th&gt;Performance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Flush and sync to disk on every commit&lt;/td&gt;
&lt;td&gt;Full ACID compliance&lt;/td&gt;
&lt;td&gt;Slowest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Write to OS buffer on every commit, sync once per second&lt;/td&gt;
&lt;td&gt;Possible loss of ~1 second of transactions on OS crash&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Write and sync once per second regardless of commits&lt;/td&gt;
&lt;td&gt;Possible loss of ~1 second on any crash&lt;/td&gt;
&lt;td&gt;Fastest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The default is &lt;code&gt;1&lt;/code&gt;, which is the safest option. And for most production databases, you should keep it that way. But if you're running a workload where losing up to one second of committed transactions is acceptable (analytics ingestion, session stores, caching layers), switching to &lt;code&gt;2&lt;/code&gt; can double your write throughput.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_flush_log_at_trx_commit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't set this to &lt;code&gt;0&lt;/code&gt; in production unless you really understand the consequences. Value &lt;code&gt;2&lt;/code&gt; gives you most of the speed benefit while only losing durability on an OS-level crash, not a MySQL crash.&lt;/p&gt;

&lt;p&gt;Before tuning write durability, make sure you have a solid &lt;a href="https://databasus.com/mysql-backup" rel="noopener noreferrer"&gt;MySQL backup&lt;/a&gt; strategy in place. No amount of performance tuning replaces the ability to restore from a known good backup when things go wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. innodb_flush_method
&lt;/h2&gt;

&lt;p&gt;This controls how InnoDB opens and flushes data files and log files. The default depends on your OS, but on Linux you almost always want &lt;code&gt;O_DIRECT&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Without &lt;code&gt;O_DIRECT&lt;/code&gt;, writes go through the OS page cache. That means your data gets cached twice: once in the InnoDB buffer pool and once in the OS cache. This wastes memory and adds unnecessary overhead. &lt;code&gt;O_DIRECT&lt;/code&gt; bypasses the OS cache and lets InnoDB manage its own memory through the buffer pool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_flush_method&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;O_DIRECT&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Linux with &lt;code&gt;ext4&lt;/code&gt; or &lt;code&gt;xfs&lt;/code&gt; filesystems, this is the right choice for virtually all workloads. On Windows, the equivalent is &lt;code&gt;normal&lt;/code&gt; or &lt;code&gt;unbuffered&lt;/code&gt;, but MySQL on Windows handles this differently and the defaults are generally fine.&lt;/p&gt;

&lt;p&gt;One note: if your buffer pool is too small relative to your working set, &lt;code&gt;O_DIRECT&lt;/code&gt; can actually hurt performance because you lose the OS cache safety net. Fix the buffer pool size first (setting #1), then switch to &lt;code&gt;O_DIRECT&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. innodb_io_capacity and innodb_io_capacity_max
&lt;/h2&gt;

&lt;p&gt;These two settings tell InnoDB how fast your storage is, so it can schedule background I/O operations (flushing dirty pages, merging change buffer entries) appropriately. The defaults assume a single spinning disk, which is far too conservative for SSDs or NVMe drives.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;innodb_io_capacity&lt;/code&gt; — the number of I/O operations per second available for background tasks. Default is 200.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;innodb_io_capacity_max&lt;/code&gt; — the upper limit InnoDB can use during heavy flushing. Default is 2000.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For SSDs, set &lt;code&gt;innodb_io_capacity&lt;/code&gt; to 1000-2000. For NVMe, 5000-10000 is reasonable. The max should be 2-3x the base value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_io_capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2000&lt;/span&gt;
&lt;span class="py"&gt;innodb_io_capacity_max&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;5000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If these values are too low, dirty pages accumulate in the buffer pool and you'll see periodic stalls when InnoDB is forced to flush aggressively. If they're too high, you burn I/O bandwidth on background work that could go to query processing. Start conservative and increase if you see checkpoint age climbing in &lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping your data safe with Databasus
&lt;/h2&gt;

&lt;p&gt;Tuning InnoDB improves performance, but it doesn't protect you from data loss. Hardware fails, someone runs a bad &lt;code&gt;DELETE&lt;/code&gt; without a &lt;code&gt;WHERE&lt;/code&gt; clause, or a migration goes sideways. You need automated backups that you don't have to think about.&lt;/p&gt;

&lt;p&gt;Databasus is an open source, self-hosted backup tool built for MySQL (along with PostgreSQL, MariaDB and MongoDB). It connects to your database, runs &lt;code&gt;mysqldump&lt;/code&gt; or physical backups on a schedule you define, compresses the output and ships it to whatever storage you use — local disk, S3, Cloudflare R2, Google Drive, SFTP. It handles retention policies automatically, so old backups get cleaned up without manual intervention.&lt;/p&gt;

&lt;p&gt;What makes it practical for production use is the operational side. Databasus sends notifications through Slack, Telegram, Discord or email when backups succeed or fail. It encrypts backup files with AES-256-GCM before they leave the server. And because it's self-hosted, your data never passes through a third-party service. Setup takes a few minutes with Docker.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to apply these changes
&lt;/h2&gt;

&lt;p&gt;Most of these settings require a MySQL restart. Edit your &lt;code&gt;my.cnf&lt;/code&gt; (or &lt;code&gt;my.ini&lt;/code&gt; on Windows), add or modify the values, and restart the MySQL service. On MySQL 8.0+, some variables like &lt;code&gt;innodb_buffer_pool_size&lt;/code&gt; can be changed dynamically with &lt;code&gt;SET GLOBAL&lt;/code&gt;, but it's still good practice to persist them in the config file.&lt;/p&gt;
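&lt;p&gt;For example, on MySQL 8.0+ the buffer pool can be resized online, and &lt;code&gt;SET PERSIST&lt;/code&gt; writes a value to &lt;code&gt;mysqld-auto.cnf&lt;/code&gt; so it survives a restart:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Online resize, no restart needed (12 GB)
SET GLOBAL innodb_buffer_pool_size = 12 * 1024 * 1024 * 1024;

-- Apply now and persist across restarts (MySQL 8.0+)
SET PERSIST innodb_io_capacity = 2000;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;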

&lt;p&gt;Before making changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take a full backup of your database&lt;/li&gt;
&lt;li&gt;Note your current values with &lt;code&gt;SHOW VARIABLES LIKE 'innodb%'&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Change one setting at a time and monitor for a day before adjusting the next&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After applying changes, keep an eye on &lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt; and your slow query log. The buffer pool hit rate, checkpoint age and pages flushed per second will tell you whether your new settings are working. Give each change at least 24 hours under normal load before drawing conclusions.&lt;/p&gt;

</description>
      <category>mysql</category>
      <category>database</category>
    </item>
    <item>
      <title>5 MySQL InnoDB settings you should change right now</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Wed, 25 Mar 2026 19:24:42 +0000</pubDate>
      <link>https://forem.com/finny_collins/5-mysql-innodb-settings-you-should-change-right-now-5d64</link>
      <guid>https://forem.com/finny_collins/5-mysql-innodb-settings-you-should-change-right-now-5d64</guid>
      <description>&lt;p&gt;Most MySQL installations ship with default InnoDB settings that were designed years ago for modest hardware. If you're running a production workload on a server with 8 GB or more of RAM and you haven't touched these values, you're leaving performance on the table. These five settings are the ones that matter most and take minutes to adjust.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foi2gn03e80e2pbj60jj5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foi2gn03e80e2pbj60jj5.png" alt="InnoDB settings" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. innodb_buffer_pool_size
&lt;/h2&gt;

&lt;p&gt;The buffer pool is where InnoDB caches table data and indexes in memory. Reads that hit the buffer pool skip disk entirely, so this single setting has the largest impact on query performance. The default is typically 128 MB, which is absurdly small for anything beyond a toy database.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommended value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dedicated database server&lt;/td&gt;
&lt;td&gt;70-80% of total RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared server (app + DB)&lt;/td&gt;
&lt;td&gt;50-60% of total RAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small VPS (2 GB RAM)&lt;/td&gt;
&lt;td&gt;1 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Development machine&lt;/td&gt;
&lt;td&gt;512 MB - 1 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Set it in your &lt;code&gt;my.cnf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_buffer_pool_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;12G&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A good rule of thumb: check your total InnoDB data size with &lt;code&gt;SELECT SUM(data_length + index_length) FROM information_schema.tables WHERE engine = 'InnoDB'&lt;/code&gt;. If it fits in RAM, set the buffer pool large enough to hold it all. If it doesn't, get as close as your memory allows.&lt;/p&gt;
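&lt;p&gt;For readability, the same query can report the total in gigabytes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Total InnoDB data + index size in GB
SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS innodb_gb
FROM information_schema.tables
WHERE engine = 'InnoDB';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;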

&lt;p&gt;After changing this value, monitor the buffer pool hit rate. You want it above 99%:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;STATUS&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'Innodb_buffer_pool_read_requests'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SHOW&lt;/span&gt; &lt;span class="n"&gt;STATUS&lt;/span&gt; &lt;span class="k"&gt;LIKE&lt;/span&gt; &lt;span class="s1"&gt;'Innodb_buffer_pool_reads'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Divide reads by read_requests. If that ratio is above 1%, your buffer pool is too small.&lt;/p&gt;
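&lt;p&gt;On MySQL 5.7 and later, both counters also live in &lt;code&gt;performance_schema&lt;/code&gt;, so the miss ratio can be computed in a single query:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Percentage of read requests that missed the buffer pool; aim for under 1%
SELECT (SELECT variable_value
        FROM performance_schema.global_status
        WHERE variable_name = 'Innodb_buffer_pool_reads')
     / (SELECT variable_value
        FROM performance_schema.global_status
        WHERE variable_name = 'Innodb_buffer_pool_read_requests')
     * 100 AS miss_pct;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;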

&lt;h2&gt;
  
  
  2. innodb_log_file_size
&lt;/h2&gt;

&lt;p&gt;The redo log (WAL in other databases) records every write before it hits the data files. Larger log files mean InnoDB can batch more writes before flushing, which reduces I/O pressure during heavy write workloads. The default of 48 MB fills up quickly on busy systems, forcing frequent checkpoints that stall writes.&lt;/p&gt;

&lt;p&gt;For most production systems, set this to 1-2 GB. High-write workloads benefit from even larger values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_log_file_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2G&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a tradeoff here. Larger log files improve write throughput but increase crash recovery time. With a 2 GB log file, recovery after an unexpected restart might take a few minutes instead of seconds. For nearly every production system, that's a reasonable trade.&lt;/p&gt;

&lt;p&gt;On MySQL 8.0.30+, you can also set &lt;code&gt;innodb_redo_log_capacity&lt;/code&gt; instead, which replaces the older &lt;code&gt;innodb_log_file_size&lt;/code&gt; and &lt;code&gt;innodb_log_files_in_group&lt;/code&gt; combination. If you're on a recent version, prefer the new variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_redo_log_capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;4G&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. innodb_flush_log_at_trx_commit
&lt;/h2&gt;

&lt;p&gt;This setting controls how aggressively InnoDB flushes the redo log to disk on each transaction commit. It has three possible values, and the performance difference between them is significant.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Durability&lt;/th&gt;
&lt;th&gt;Performance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Flush and sync to disk on every commit&lt;/td&gt;
&lt;td&gt;Full ACID compliance&lt;/td&gt;
&lt;td&gt;Slowest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Write to OS buffer on every commit, sync once per second&lt;/td&gt;
&lt;td&gt;Possible loss of ~1 second of transactions on OS crash&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Write and sync once per second regardless of commits&lt;/td&gt;
&lt;td&gt;Possible loss of ~1 second on any crash&lt;/td&gt;
&lt;td&gt;Fastest&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The default is &lt;code&gt;1&lt;/code&gt;, which is the safest option. And for most production databases, you should keep it that way. But if you're running a workload where losing up to one second of committed transactions is acceptable (analytics ingestion, session stores, caching layers), switching to &lt;code&gt;2&lt;/code&gt; can double your write throughput.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_flush_log_at_trx_commit&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't set this to &lt;code&gt;0&lt;/code&gt; in production unless you really understand the consequences. Value &lt;code&gt;2&lt;/code&gt; gives you most of the speed benefit while only losing durability on an OS-level crash, not a MySQL crash.&lt;/p&gt;
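&lt;p&gt;&lt;code&gt;innodb_flush_log_at_trx_commit&lt;/code&gt; is a dynamic variable, so you can try the change at runtime before committing it to your config file. A quick sketch:&lt;/p&gt;

```sql
-- Takes effect immediately; reverts on restart unless also set in my.cnf
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- Confirm the running value
SELECT @@GLOBAL.innodb_flush_log_at_trx_commit;
```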

&lt;p&gt;Before tuning write durability, make sure you have a solid &lt;a href="https://databasus.com/mysql-backup" rel="noopener noreferrer"&gt;MySQL backup&lt;/a&gt; strategy in place. No amount of performance tuning replaces the ability to restore from a known good backup when things go wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. innodb_flush_method
&lt;/h2&gt;

&lt;p&gt;This controls how InnoDB opens and flushes data files and log files. The default depends on your OS, but on Linux you almost always want &lt;code&gt;O_DIRECT&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Without &lt;code&gt;O_DIRECT&lt;/code&gt;, writes go through the OS page cache. That means your data gets cached twice: once in the InnoDB buffer pool and once in the OS cache. This wastes memory and adds unnecessary overhead. &lt;code&gt;O_DIRECT&lt;/code&gt; bypasses the OS cache and lets InnoDB manage its own memory through the buffer pool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_flush_method&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;O_DIRECT&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Linux with &lt;code&gt;ext4&lt;/code&gt; or &lt;code&gt;xfs&lt;/code&gt; filesystems, this is the right choice for virtually all workloads. On Windows, the equivalent is &lt;code&gt;normal&lt;/code&gt; or &lt;code&gt;unbuffered&lt;/code&gt;, but MySQL on Windows handles this differently and the defaults are generally fine.&lt;/p&gt;

&lt;p&gt;One note: if your buffer pool is too small relative to your working set, &lt;code&gt;O_DIRECT&lt;/code&gt; can actually hurt performance because you lose the OS cache safety net. Fix the buffer pool size first (setting #1), then switch to &lt;code&gt;O_DIRECT&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. innodb_io_capacity and innodb_io_capacity_max
&lt;/h2&gt;

&lt;p&gt;These two settings tell InnoDB how fast your storage is, so it can schedule background I/O operations (flushing dirty pages, merging change buffer entries) appropriately. The defaults assume a single spinning disk, which is far too conservative for SSDs or NVMe drives.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;innodb_io_capacity&lt;/code&gt; — the number of I/O operations per second available for background tasks. Default is 200.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;innodb_io_capacity_max&lt;/code&gt; — the upper limit InnoDB can use during heavy flushing. Default is 2000.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For SSDs, set &lt;code&gt;innodb_io_capacity&lt;/code&gt; to 1000-2000. For NVMe, 5000-10000 is reasonable. The max should be 2-3x the base value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[mysqld]&lt;/span&gt;
&lt;span class="py"&gt;innodb_io_capacity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2000&lt;/span&gt;
&lt;span class="py"&gt;innodb_io_capacity_max&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;5000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If these values are too low, dirty pages accumulate in the buffer pool and you'll see periodic stalls when InnoDB is forced to flush aggressively. If they're too high, you burn I/O bandwidth on background work that could go to query processing. Start conservative and increase if you see checkpoint age climbing in &lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt;.&lt;/p&gt;
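&lt;p&gt;Both I/O capacity settings are dynamic as well, which makes them easy to experiment with under live load. A sketch using the SSD values from above:&lt;/p&gt;

```sql
-- Raise the background I/O budget without a restart
SET GLOBAL innodb_io_capacity = 2000;
SET GLOBAL innodb_io_capacity_max = 5000;

-- Then watch pending reads/writes and checkpoint age
SHOW ENGINE INNODB STATUS;
```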

&lt;h2&gt;
  
  
  Keeping your data safe with Databasus
&lt;/h2&gt;

&lt;p&gt;Tuning InnoDB improves performance, but it doesn't protect you from data loss. Hardware fails, someone runs a bad &lt;code&gt;DELETE&lt;/code&gt; without a &lt;code&gt;WHERE&lt;/code&gt; clause, or a migration goes sideways. You need automated backups that you don't have to think about.&lt;/p&gt;

&lt;p&gt;Databasus is an open source, self-hosted backup tool built for MySQL (along with PostgreSQL, MariaDB and MongoDB). It connects to your database, runs &lt;code&gt;mysqldump&lt;/code&gt; or physical backups on a schedule you define, compresses the output and ships it to whatever storage you use — local disk, S3, Cloudflare R2, Google Drive, SFTP. It handles retention policies automatically, so old backups get cleaned up without manual intervention.&lt;/p&gt;

&lt;p&gt;What makes it practical for production use is the operational side. Databasus sends notifications through Slack, Telegram, Discord or email when backups succeed or fail. It encrypts backup files with AES-256-GCM before they leave the server. And because it's self-hosted, your data never passes through a third-party service. Setup takes about two minutes with Docker.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to apply these changes
&lt;/h2&gt;

&lt;p&gt;Most of these settings require a MySQL restart. Edit your &lt;code&gt;my.cnf&lt;/code&gt; (or &lt;code&gt;my.ini&lt;/code&gt; on Windows), add or modify the values, and restart the MySQL service. On MySQL 8.0+, some variables like &lt;code&gt;innodb_buffer_pool_size&lt;/code&gt; can be changed dynamically with &lt;code&gt;SET GLOBAL&lt;/code&gt;, but it's still good practice to persist them in the config file.&lt;/p&gt;
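&lt;p&gt;On MySQL 8.0 you can combine the runtime change and the config persistence in one step with &lt;code&gt;SET PERSIST&lt;/code&gt;, which writes the value to &lt;code&gt;mysqld-auto.cnf&lt;/code&gt; so it survives restarts. A sketch with an assumed 8 GB buffer pool:&lt;/p&gt;

```sql
-- Applies now and persists across restarts (MySQL 8.0+)
SET PERSIST innodb_buffer_pool_size = 8589934592;  -- 8 GB

-- See which variables have been persisted this way
SELECT * FROM performance_schema.persisted_variables;
```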

&lt;p&gt;Before making changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take a full backup of your database&lt;/li&gt;
&lt;li&gt;Note your current values with &lt;code&gt;SHOW VARIABLES LIKE 'innodb%'&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Change one setting at a time and monitor for a day before adjusting the next&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After applying changes, keep an eye on &lt;code&gt;SHOW ENGINE INNODB STATUS&lt;/code&gt; and your slow query log. The buffer pool hit rate, checkpoint age and pages flushed per second will tell you whether your new settings are working. Give each change at least 24 hours under normal load before drawing conclusions.&lt;/p&gt;

</description>
      <category>database</category>
      <category>mysql</category>
    </item>
    <item>
      <title>PostgreSQL full-text search — How to build fast search without Elasticsearch</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Tue, 24 Mar 2026 16:40:33 +0000</pubDate>
      <link>https://forem.com/finny_collins/postgresql-full-text-search-how-to-build-fast-search-without-elasticsearch-2ddj</link>
      <guid>https://forem.com/finny_collins/postgresql-full-text-search-how-to-build-fast-search-without-elasticsearch-2ddj</guid>
      <description>&lt;p&gt;Most teams reach for Elasticsearch the moment someone mentions "search." It makes sense on the surface — Elasticsearch was built for search. But adding it to your stack means another service to deploy, monitor, keep in sync with your primary database, and debug when things go sideways. For a lot of applications, that complexity is not justified.&lt;/p&gt;

&lt;p&gt;PostgreSQL has had full-text search capabilities since version 8.3. They have gotten better with every release. And for many workloads — internal tools, SaaS products, content platforms with moderate data sizes — PostgreSQL's built-in search is more than enough.&lt;/p&gt;

&lt;p&gt;This article walks through how full-text search works in PostgreSQL, how to set it up properly, and where it starts to hit its limits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcskpemukgydezz37o2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcskpemukgydezz37o2g.png" alt="PostgreSQL full-text search" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What full-text search actually means in PostgreSQL
&lt;/h2&gt;

&lt;p&gt;Full-text search is not the same as &lt;code&gt;LIKE '%term%'&lt;/code&gt;. Pattern matching with &lt;code&gt;LIKE&lt;/code&gt; or &lt;code&gt;ILIKE&lt;/code&gt; scans every row, ignores word boundaries and has no concept of language. It cannot match "running" when you search for "run." It has no ranking. It is brute force.&lt;/p&gt;

&lt;p&gt;PostgreSQL full-text search works differently. It breaks text into tokens, normalizes them (lowercasing, stemming, removing stop words), and stores the result as a &lt;code&gt;tsvector&lt;/code&gt;. Your search query becomes a &lt;code&gt;tsquery&lt;/code&gt;. The database then matches these two structures using an inverted index, which is fast.&lt;/p&gt;

&lt;p&gt;The two core types you will work with:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tsvector&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stores preprocessed, searchable document text&lt;/td&gt;
&lt;td&gt;&lt;code&gt;'quick':1 'brown':2 'fox':3&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tsquery&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Stores the search query in normalized form&lt;/td&gt;
&lt;td&gt;&lt;code&gt;'quick' &amp;amp; 'fox'&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here is a basic example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'The quick brown fox jumps over the lazy dog'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;@@&lt;/span&gt; &lt;span class="n"&gt;to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'quick &amp;amp; fox'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns &lt;code&gt;true&lt;/code&gt;. The &lt;code&gt;@@&lt;/code&gt; operator is the match operator. The &lt;code&gt;english&lt;/code&gt; argument tells PostgreSQL which text search configuration to use for stemming and stop word removal.&lt;/p&gt;
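&lt;p&gt;It helps to see what the normalization actually produces. Running &lt;code&gt;to_tsvector&lt;/code&gt; on its own shows stop words dropped, words stemmed and positions recorded:&lt;/p&gt;

```sql
SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
```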

&lt;h2&gt;
  
  
  Setting up a searchable column
&lt;/h2&gt;

&lt;p&gt;You can call &lt;code&gt;to_tsvector&lt;/code&gt; on the fly in a &lt;code&gt;WHERE&lt;/code&gt; clause, but that means PostgreSQL has to process the text for every row on every query. For anything beyond toy datasets, you want a dedicated column.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;search_vector&lt;/span&gt; &lt;span class="n"&gt;tsvector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;search_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create a GIN index on it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_articles_search&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;GIN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_vector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GIN (Generalized Inverted Index) is the standard index type for full-text search. It builds an inverted index — a mapping from each lexeme to the rows that contain it. This is what makes search fast.&lt;/p&gt;

&lt;p&gt;To keep the column updated automatically, add a trigger:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;articles_search_vector_update&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;RETURNS&lt;/span&gt; &lt;span class="k"&gt;trigger&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;search_vector&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;' '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;$$&lt;/span&gt; &lt;span class="k"&gt;LANGUAGE&lt;/span&gt; &lt;span class="n"&gt;plpgsql&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;trg_articles_search_vector&lt;/span&gt;
    &lt;span class="k"&gt;BEFORE&lt;/span&gt; &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
    &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;EACH&lt;/span&gt; &lt;span class="k"&gt;ROW&lt;/span&gt;
    &lt;span class="k"&gt;EXECUTE&lt;/span&gt; &lt;span class="k"&gt;FUNCTION&lt;/span&gt; &lt;span class="n"&gt;articles_search_vector_update&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every insert and update will automatically maintain the search vector. No application-level sync logic needed.&lt;/p&gt;
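&lt;p&gt;On PostgreSQL 12 and later there is a simpler alternative to the trigger: a stored generated column, which the database keeps in sync automatically. A sketch, assuming the same &lt;code&gt;articles&lt;/code&gt; table (with a hypothetical column name to avoid clashing with the trigger-maintained one):&lt;/p&gt;

```sql
-- PostgreSQL 12+: generated column instead of a trigger
ALTER TABLE articles
    ADD COLUMN search_vector_gen tsvector
    GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
    ) STORED;

CREATE INDEX idx_articles_search_gen ON articles USING GIN (search_vector_gen);
```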

&lt;h2&gt;
  
  
  Weighting and ranking results
&lt;/h2&gt;

&lt;p&gt;Not all text is equal. A match in the title should rank higher than a match in the body. PostgreSQL supports this through weight labels — A, B, C and D (A being the highest).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;search_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="n"&gt;setweight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="s1"&gt;'A'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
    &lt;span class="n"&gt;setweight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="s1"&gt;'B'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then use &lt;code&gt;ts_rank&lt;/code&gt; or &lt;code&gt;ts_rank_cd&lt;/code&gt; to sort results by relevance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ts_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'postgresql &amp;amp; replication'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;search_vector&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt; &lt;span class="n"&gt;to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'postgresql &amp;amp; replication'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ts_rank_cd&lt;/code&gt; uses cover density ranking, which considers how close the matching terms are to each other. It tends to produce more intuitive results for multi-word queries.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Ranking method&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ts_rank&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Frequency-based — counts how often query terms appear&lt;/td&gt;
&lt;td&gt;General-purpose ranking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ts_rank_cd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cover density — rewards terms appearing close together&lt;/td&gt;
&lt;td&gt;Phrase-like queries where proximity matters&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Query syntax and operators
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;tsquery&lt;/code&gt; type supports several operators that give you control over how terms are combined.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;amp;&lt;/code&gt; — AND. Both terms must be present.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;|&lt;/code&gt; — OR. Either term matches.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;!&lt;/code&gt; — NOT. Excludes documents with the term.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; — FOLLOWED BY. Terms must appear adjacent and in order (phrase search).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Documents about PostgreSQL but not MySQL&lt;/span&gt;
&lt;span class="n"&gt;to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'postgresql &amp;amp; !mysql'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;-- Phrase search: "full text" as adjacent words&lt;/span&gt;
&lt;span class="n"&gt;to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'full &amp;lt;-&amp;gt; text'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;-- Either term matches&lt;/span&gt;
&lt;span class="n"&gt;to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'backup | restore'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is also &lt;code&gt;plainto_tsquery&lt;/code&gt; which takes a plain string and ANDs all the words together. And &lt;code&gt;websearch_to_tsquery&lt;/code&gt; (PostgreSQL 11+) which supports a Google-like syntax with quotes for phrases and minus for exclusion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- User types: postgresql "full text" -elasticsearch&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;search_vector&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt; &lt;span class="n"&gt;websearch_to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'postgresql "full text" -elasticsearch'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;websearch_to_tsquery&lt;/code&gt; is usually the right choice for user-facing search boxes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Highlighting search results
&lt;/h2&gt;

&lt;p&gt;When showing search results, you want to highlight where the match occurred. &lt;code&gt;ts_headline&lt;/code&gt; does this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;ts_headline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
           &lt;span class="n"&gt;to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'replication'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
           &lt;span class="s1"&gt;'StartSel=&amp;lt;mark&amp;gt;, StopSel=&amp;lt;/mark&amp;gt;, MaxWords=50, MinWords=20'&lt;/span&gt;
       &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;snippet&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;articles&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;search_vector&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt; &lt;span class="n"&gt;to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'replication'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;ts_rank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_tsquery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'english'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'replication'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing to be aware of: &lt;code&gt;ts_headline&lt;/code&gt; re-processes the original text, not the &lt;code&gt;tsvector&lt;/code&gt;. It is slower than the match itself. For large result sets, apply it only to the top N results after filtering and ranking.&lt;/p&gt;
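&lt;p&gt;One way to restrict &lt;code&gt;ts_headline&lt;/code&gt; to the top N is to rank and limit in a subquery first, then build snippets only for the survivors. A sketch (using &lt;code&gt;**&lt;/code&gt; as highlight markers to keep the example self-contained):&lt;/p&gt;

```sql
SELECT title,
       ts_headline('english', body,
           websearch_to_tsquery('english', 'replication'),
           'StartSel=**, StopSel=**, MaxWords=50, MinWords=20') AS snippet
FROM (
    SELECT title, body
    FROM articles
    WHERE search_vector @@ websearch_to_tsquery('english', 'replication')
    ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', 'replication')) DESC
    LIMIT 20
) AS top_hits;
```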

&lt;h2&gt;
  
  
  Performance considerations
&lt;/h2&gt;

&lt;p&gt;GIN indexes make full-text search fast, but there are a few things that affect performance in practice.&lt;/p&gt;

&lt;p&gt;Index size matters. GIN indexes can be large — sometimes larger than the table itself for text-heavy data. Monitor the index size with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'idx_articles_search'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write overhead is real. GIN indexes use a "fastupdate" mechanism by default, which batches pending entries and merges them later. This helps write performance but means the index can be slightly stale. You can tune this with &lt;code&gt;gin_pending_list_limit&lt;/code&gt; or disable fastupdate entirely if your workload is read-heavy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_articles_search&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fastupdate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;off&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your workload is write-heavy and the GIN index's update cost becomes a problem, consider a &lt;code&gt;GiST&lt;/code&gt; index instead. GiST indexes are smaller and faster to update, but slower for lookups and lossy, meaning matches must be rechecked against the table. The tradeoff depends on your read/write ratio.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GIN indexes: faster reads, slower writes, larger on disk&lt;/li&gt;
&lt;li&gt;GiST indexes: faster writes, slower reads, smaller on disk&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Multilingual search
&lt;/h2&gt;

&lt;p&gt;PostgreSQL ships with text search configurations for many languages. Each configuration defines how text is tokenized and which dictionary is used for stemming.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- List available configurations&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;cfgname&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_ts_config&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Use German configuration&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;to_tsvector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'german'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Die schnelle braune Fuchs springt'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your application handles multiple languages, you can store the language per row and build the tsvector accordingly. Or maintain multiple tsvector columns — one per language.&lt;/p&gt;
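&lt;p&gt;The per-row variant can be done with a &lt;code&gt;regconfig&lt;/code&gt; column that feeds &lt;code&gt;to_tsvector&lt;/code&gt;. A sketch, assuming a hypothetical &lt;code&gt;lang&lt;/code&gt; column:&lt;/p&gt;

```sql
-- Store the text search configuration per row
ALTER TABLE articles ADD COLUMN lang regconfig NOT NULL DEFAULT 'english';

-- Build the vector with each row's own configuration
UPDATE articles
SET search_vector = to_tsvector(lang, coalesce(title, '') || ' ' || coalesce(body, ''));
```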

&lt;p&gt;For languages not supported out of the box (like Chinese, Japanese or Korean), you will need extensions. &lt;code&gt;pg_bigm&lt;/code&gt; and &lt;code&gt;pgroonga&lt;/code&gt; handle CJK text well. The &lt;code&gt;unaccent&lt;/code&gt; extension is useful for languages with diacritics.&lt;/p&gt;

&lt;h2&gt;
  
  
  When PostgreSQL search is not enough
&lt;/h2&gt;

&lt;p&gt;PostgreSQL full-text search works well for a lot of use cases, but it does have limitations. It does not support fuzzy matching out of the box (you would need the &lt;code&gt;pg_trgm&lt;/code&gt; extension for that). It does not do faceted search or aggregations the way Elasticsearch does. And for datasets in the hundreds of millions of rows with complex, multi-field queries, a dedicated search engine will perform better.&lt;/p&gt;
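
&lt;p&gt;For completeness, a minimal &lt;code&gt;pg_trgm&lt;/code&gt; sketch; fuzzy matching is a separate mechanism from tsvector search, and the table name here is illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram similarity between two strings (score from 0 to 1)
SELECT similarity('postgres', 'postgrse');

-- The % operator matches rows above the similarity threshold
SELECT title FROM articles WHERE title % 'postgrse';
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;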

&lt;p&gt;But for most applications — and honestly, that is the majority — PostgreSQL handles search just fine. You avoid the operational overhead of running a separate search cluster, you do not need to worry about data synchronization and you get transactional consistency for free.&lt;/p&gt;

&lt;p&gt;The rule of thumb: start with PostgreSQL. Move to Elasticsearch when you have measurable evidence that you need it, not because someone on the team assumed you would.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backing up PostgreSQL with full-text search indexes
&lt;/h2&gt;

&lt;p&gt;Full-text search indexes can get large, and rebuilding them from scratch takes time. That makes reliable backups even more important: if you lose data and have to restore, you do not want to spend hours reindexing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://databasus.com" rel="noopener noreferrer"&gt;PostgreSQL backup&lt;/a&gt; tools should handle this transparently. Databasus is an open-source, self-hosted backup tool that has become the industry standard for PostgreSQL backups. It supports logical, physical and incremental backup types — including Point-in-Time Recovery with WAL streaming, so you can restore your database to any specific second.&lt;/p&gt;

&lt;p&gt;Databasus handles compression automatically with configurable algorithms and levels, typically achieving 4-8x space savings. It supports multiple storage destinations including S3, Google Drive, SFTP and local storage. You set up your backup schedule (hourly, daily, weekly, or cron-based), configure retention policies and Databasus takes care of the rest.&lt;/p&gt;

&lt;p&gt;What stands out is the operational side. Databasus gives you notifications through Slack, Discord, Telegram or email when backups succeed or fail. It encrypts backups with AES-256-GCM. And because it is open source under the Apache 2.0 license, you can inspect every line of code and avoid vendor lock-in — you can even restore backups without Databasus itself if needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick reference
&lt;/h2&gt;

&lt;p&gt;Here is a summary of the key functions and operators covered in this article.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function / Operator&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;to_tsvector(config, text)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Converts text into a searchable tsvector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;to_tsquery(config, query)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Converts a query string into a tsquery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;websearch_to_tsquery(config, query)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Parses Google-like search syntax into a tsquery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;@@&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Match operator — checks if tsvector matches tsquery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ts_rank(vector, query)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scores results by term frequency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ts_rank_cd(vector, query)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scores results by cover density (proximity)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ts_headline(config, text, query)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Returns text snippet with highlighted matches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;setweight(vector, label)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Assigns a weight (A/B/C/D) to a tsvector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Phrase operator — terms must be adjacent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;PostgreSQL full-text search is a practical tool that most teams underestimate. It handles tokenization, stemming, ranking, phrase search and multilingual text out of the box. With GIN indexes, it scales well into the millions of rows. And because it lives inside your database, there is no synchronization problem to solve.&lt;/p&gt;

&lt;p&gt;The setup is straightforward: add a &lt;code&gt;tsvector&lt;/code&gt; column, create a GIN index, write a trigger to keep it updated and use &lt;code&gt;websearch_to_tsquery&lt;/code&gt; for your search endpoint. That covers 80% of search needs with no additional infrastructure.&lt;/p&gt;
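
&lt;p&gt;Condensed into one sketch (the table and column names are illustrative, not from a specific schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;ALTER TABLE articles ADD COLUMN search_vector tsvector;

CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

-- Keep the tsvector in sync on every insert or update
CREATE FUNCTION articles_search_update() RETURNS trigger AS $$
BEGIN
  NEW.search_vector := to_tsvector('english', coalesce(NEW.title, '') || ' ' || coalesce(NEW.body, ''));
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_articles_search
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_update();

-- Search endpoint query
SELECT id, title FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'postgres search');
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;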

&lt;p&gt;Not every project needs a dedicated search engine. Sometimes the database you already have is good enough.&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Top 5 MongoDB monitoring tools every team should use in 2026</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Sun, 22 Mar 2026 18:27:07 +0000</pubDate>
      <link>https://forem.com/finny_collins/top-5-mongodb-monitoring-tools-every-team-should-use-in-2026-31he</link>
      <guid>https://forem.com/finny_collins/top-5-mongodb-monitoring-tools-every-team-should-use-in-2026-31he</guid>
      <description>&lt;p&gt;MongoDB is one of the most popular document databases out there, and if you're running it in production, you already know that things can go sideways fast without proper monitoring. Slow queries, replication lag, disk pressure — these problems don't announce themselves politely. You need tools that catch them early. Here's a look at five monitoring tools worth considering in 2026, what they do well and where they fall short.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhiwjafeq6a3bcky9z6tl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhiwjafeq6a3bcky9z6tl.png" alt="MongoDB monitoring tool" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. MongoDB Atlas built-in monitoring
&lt;/h2&gt;

&lt;p&gt;Atlas is MongoDB's own cloud platform, and it comes with monitoring baked in. If you're already running your databases on Atlas, this is the most straightforward option since there's nothing extra to install or configure.&lt;/p&gt;

&lt;p&gt;The built-in dashboards cover the essentials: operation counters, query targeting, replication lag, connections and disk I/O. The Real-Time Performance Panel is genuinely useful for spotting slow operations as they happen. You also get automated alerts for things like high CPU or replication delays.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Cloud-only (Atlas)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query profiling&lt;/td&gt;
&lt;td&gt;Yes, with Performance Advisor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alerting&lt;/td&gt;
&lt;td&gt;Built-in with configurable thresholds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing&lt;/td&gt;
&lt;td&gt;Included with Atlas tier (M10+)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom dashboards&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The main drawback is that it only works with Atlas-hosted clusters. If you're self-hosting MongoDB or running a hybrid setup, you'll need something else. The alerting is also somewhat basic compared to dedicated monitoring platforms — you can set thresholds, but complex alert routing or escalation policies aren't really its thing.&lt;/p&gt;

&lt;p&gt;For teams fully committed to Atlas, this covers the basics well enough that you might not need anything else for smaller deployments.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Percona Monitoring and Management (PMM)
&lt;/h2&gt;

&lt;p&gt;PMM is an open-source monitoring platform from Percona that supports MongoDB alongside PostgreSQL and MySQL. It bundles Grafana for dashboards and VictoriaMetrics for time-series storage, and gives you a pretty detailed view of what's going on inside your database.&lt;/p&gt;

&lt;p&gt;What makes PMM stand out for MongoDB specifically is the query analytics. It captures slow queries, shows you execution plans and helps you figure out which operations are dragging things down. The QAN (Query Analytics) dashboard breaks down query patterns by response time, count and load, which is extremely helpful when you're trying to optimize a workload.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Self-hosted (Docker or bare metal)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query analytics&lt;/td&gt;
&lt;td&gt;Yes, detailed QAN dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replication monitoring&lt;/td&gt;
&lt;td&gt;Yes, including oplog window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing&lt;/td&gt;
&lt;td&gt;Free and open source&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-database support&lt;/td&gt;
&lt;td&gt;MongoDB, PostgreSQL, MySQL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The setup takes some effort — you need to install the PMM server and then deploy PMM clients on each database host. It's not a quick five-minute job, especially if you have a large fleet. And because it's self-hosted, you're responsible for keeping the monitoring infrastructure itself running and updated.&lt;/p&gt;

&lt;p&gt;But if you want deep MongoDB monitoring without a SaaS bill, PMM is hard to beat.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Datadog MongoDB integration
&lt;/h2&gt;

&lt;p&gt;Datadog is a cloud monitoring platform that does a lot more than just database monitoring, but its MongoDB integration is solid. It collects metrics from MongoDB through an agent running on your database hosts, and you can correlate database performance with application metrics, infrastructure data and logs all in one place.&lt;/p&gt;

&lt;p&gt;The MongoDB-specific dashboards show connections, operations per second, memory usage, replication status and lock percentages. Datadog also supports custom queries, so you can track application-specific metrics alongside the standard ones.&lt;/p&gt;

&lt;p&gt;Where Datadog really shines is in the broader observability picture. If you're already using it for APM or infrastructure monitoring, adding MongoDB monitoring means you can trace a slow API response all the way down to a specific database query. That kind of correlation saves real debugging time.&lt;/p&gt;

&lt;p&gt;The downside is cost. Datadog's pricing model charges per host per month, and database monitoring is an add-on on top of the base infrastructure monitoring. For a team with a handful of MongoDB nodes it's reasonable, but costs can climb quickly at scale. There's also a learning curve to get the most out of the platform — it does a lot, and configuring everything properly takes time.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Grafana with MongoDB exporter
&lt;/h2&gt;

&lt;p&gt;If you're already running Grafana and Prometheus (or compatible backends like VictoriaMetrics), adding MongoDB monitoring through the percona/mongodb_exporter is a natural extension. This approach gives you full control over what you collect and how you visualize it.&lt;/p&gt;

&lt;p&gt;The MongoDB exporter exposes metrics in Prometheus format — things like replica set status, oplog size, WiredTiger cache usage, document operations and connection counts. From there, you build whatever dashboards you need in Grafana. The community has published several pre-built dashboards that serve as a good starting point.&lt;/p&gt;
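
&lt;p&gt;Wiring the exporter into an existing Prometheus is a single scrape job. A minimal sketch, where the target host is an assumption and 9216 is the exporter's default port:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# prometheus.yml
scrape_configs:
  - job_name: mongodb
    static_configs:
      - targets: ['mongodb-host:9216']  # mongodb_exporter default port
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;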

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Self-hosted (requires Prometheus + Grafana)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customization&lt;/td&gt;
&lt;td&gt;Fully customizable dashboards and alerts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alerting&lt;/td&gt;
&lt;td&gt;Through Grafana alerting or Alertmanager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing&lt;/td&gt;
&lt;td&gt;Free and open source&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup complexity&lt;/td&gt;
&lt;td&gt;Moderate to high&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This approach demands more upfront work than a turnkey solution. You need to maintain Prometheus, configure scraping targets, build or customize dashboards and set up alerting rules. It's not something you just turn on. But for teams that already have a Prometheus/Grafana stack, it fits naturally into the existing workflow without adding another tool to the pile.&lt;/p&gt;

&lt;p&gt;The flexibility is the real selling point. You can build dashboards that combine MongoDB metrics with application metrics, system-level data and anything else you're already collecting.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. New Relic MongoDB integration
&lt;/h2&gt;

&lt;p&gt;New Relic offers MongoDB monitoring through its infrastructure agent and on-host integration. Like Datadog, it's a full observability platform, so MongoDB monitoring is one piece of a larger puzzle.&lt;/p&gt;

&lt;p&gt;The integration collects metrics on throughput, latency, connections, memory and replication. New Relic's query interface (NRQL) lets you slice and dice the data however you want, and you can build custom dashboards or use the pre-built ones. The alerting system is flexible — you can set up static thresholds, baseline alerts or anomaly detection.&lt;/p&gt;

&lt;p&gt;One thing New Relic does well is making it easy to get started. The guided installation walks you through setting up the MongoDB integration step by step, and the default dashboards are immediately useful. The free tier is also generous enough for small teams to get real value without paying anything.&lt;/p&gt;

&lt;p&gt;The paid tiers get expensive at scale, similar to Datadog. And the MongoDB-specific features aren't as deep as what you'd get from PMM or Atlas — it's more of a generalist tool that happens to support MongoDB rather than a MongoDB specialist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison overview
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;th&gt;Query analytics&lt;/th&gt;
&lt;th&gt;Self-hosted option&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Atlas monitoring&lt;/td&gt;
&lt;td&gt;Atlas-hosted clusters&lt;/td&gt;
&lt;td&gt;Included with Atlas&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PMM&lt;/td&gt;
&lt;td&gt;Deep MongoDB analysis on a budget&lt;/td&gt;
&lt;td&gt;Free (open source)&lt;/td&gt;
&lt;td&gt;Yes (detailed QAN)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Datadog&lt;/td&gt;
&lt;td&gt;Full-stack observability&lt;/td&gt;
&lt;td&gt;Per-host subscription&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grafana + exporter&lt;/td&gt;
&lt;td&gt;Teams with existing Prometheus stack&lt;/td&gt;
&lt;td&gt;Free (open source)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New Relic&lt;/td&gt;
&lt;td&gt;Quick setup with generous free tier&lt;/td&gt;
&lt;td&gt;Free tier + paid plans&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Note about MongoDB backups
&lt;/h2&gt;

&lt;p&gt;Monitoring tells you what's happening with your database. But monitoring alone doesn't protect your data when something goes wrong — a bad deployment, accidental deletion or hardware failure. That's where backups come in, and for &lt;a href="https://databasus.com/mongodb-backup" rel="noopener noreferrer"&gt;MongoDB backup&lt;/a&gt;, Databasus is a tool teams increasingly rely on.&lt;/p&gt;

&lt;p&gt;Databasus is an open-source, self-hosted backup tool that has become an industry standard for MongoDB backup. It handles scheduled backups with flexible policies — hourly, daily, weekly or cron-based — and streams compressed backups directly to your storage without intermediate files on disk. You can send backups to local storage, S3, Cloudflare R2, Google Drive, SFTP and other destinations.&lt;/p&gt;

&lt;p&gt;What makes it particularly useful for MongoDB teams is the combination of reliability and simplicity. You configure your MongoDB connection, pick a schedule and a retention policy, and Databasus handles the rest. It supports both remote connections and a lightweight agent mode for environments where the database shouldn't be exposed to the network. Backups are encrypted with AES-256-GCM, so even if someone gets access to your storage bucket, the data is useless without the key. Databasus also ships with smart retention policies including GFS (Grandfather-Father-Son), so you can keep hourly, daily, weekly and monthly snapshots independently without manual cleanup.&lt;/p&gt;

&lt;p&gt;Databasus also supports multiple notification channels — Slack, Discord, Telegram, email and webhooks — so your team knows immediately when a backup succeeds or fails. Pair that with the monitoring tools above and you have both visibility into your MongoDB cluster's health and confidence that your data is protected if things go wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Picking the right tool
&lt;/h2&gt;

&lt;p&gt;There's no single best choice here. It depends on where your MongoDB runs, what you're already using for monitoring and how much you want to spend.&lt;/p&gt;

&lt;p&gt;If you're on Atlas, start with the built-in monitoring and see if it covers your needs. If you're self-hosting and want deep MongoDB-specific insights without a recurring bill, PMM is the strongest option. Teams that need to correlate database performance with application behavior across their whole stack will get the most value from Datadog or New Relic. And if you already have Grafana and Prometheus running, the exporter approach keeps things simple and consistent.&lt;/p&gt;

&lt;p&gt;Whatever you pick for monitoring, make sure your backup strategy is solid too. Monitoring shows you the fire. Backups are the insurance policy.&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>database</category>
    </item>
    <item>
      <title>Databasus released physical and incremental backups with WAL streaming for PITR</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Sat, 21 Mar 2026 18:52:53 +0000</pubDate>
      <link>https://forem.com/finny_collins/databasus-released-physical-and-incremental-backups-with-wal-streaming-for-pitr-408f</link>
      <guid>https://forem.com/finny_collins/databasus-released-physical-and-incremental-backups-with-wal-streaming-for-pitr-408f</guid>
      <description>&lt;p&gt;Until now, Databasus, despite being the most widely used open source tool for PostgreSQL backup, supported logical backups only. That covered the majority of use cases, but larger databases and disaster recovery scenarios needed something more. This release adds physical backups, incremental backups with continuous WAL archiving and Point-in-Time Recovery. All of it is powered by a new lightweight agent that runs alongside your database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftb7o03fml1nzpzvuoimp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftb7o03fml1nzpzvuoimp.png" alt="Point-in-time-recovery with Databasus" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed in this release
&lt;/h2&gt;

&lt;p&gt;Databasus started as a tool focused on logical backups. You point it at a database over the network, it creates a dump, compresses it, encrypts it and ships it to your storage of choice. Simple and effective.&lt;/p&gt;

&lt;p&gt;But logical backups have limits. For large databases, the dump process can take a long time and put noticeable load on the server. And the restore window is tied to how often you run backups — if you back up daily and something breaks at 5 PM, you lose everything since the morning.&lt;/p&gt;

&lt;p&gt;This release introduces two new backup types that address both problems. Physical backups copy the entire database cluster at the file level, which is significantly faster for large datasets. Incremental backups go a step further — they combine a physical base backup with continuous WAL (Write-Ahead Log) archiving, so you can restore your database to any second between backups.&lt;/p&gt;

&lt;p&gt;There's a catch, though. These new backup types can't work over a simple network connection the way logical backups do. They need direct access to the database files. That's where the agent comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backup types compared
&lt;/h2&gt;

&lt;p&gt;Here's how the three backup types stack up against each other.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Logical&lt;/th&gt;
&lt;th&gt;Physical&lt;/th&gt;
&lt;th&gt;Incremental&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;How it works&lt;/td&gt;
&lt;td&gt;Database dump in native format&lt;/td&gt;
&lt;td&gt;File-level copy of the entire cluster&lt;/td&gt;
&lt;td&gt;Base backup + continuous WAL archiving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Connection mode&lt;/td&gt;
&lt;td&gt;Remote (over network)&lt;/td&gt;
&lt;td&gt;Agent (runs alongside DB)&lt;/td&gt;
&lt;td&gt;Agent (runs alongside DB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backup speed&lt;/td&gt;
&lt;td&gt;Slower for large databases&lt;/td&gt;
&lt;td&gt;Fast — copies files directly&lt;/td&gt;
&lt;td&gt;Fast base + tiny WAL segments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Restore speed&lt;/td&gt;
&lt;td&gt;Slower (re-imports all data)&lt;/td&gt;
&lt;td&gt;Fast (copies files back)&lt;/td&gt;
&lt;td&gt;Fast base + WAL replay&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Point-in-time recovery&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes — restore to any second&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;Small to medium databases&lt;/td&gt;
&lt;td&gt;Large databases needing fast backup/restore&lt;/td&gt;
&lt;td&gt;Disaster recovery and near-zero data loss&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Logical backups are still the default and still the right choice for most setups. They work over the network without any extra software, and for databases under a few gigabytes the performance difference is negligible. Physical and incremental backups are for when you need speed or granular recovery.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the agent works
&lt;/h2&gt;

&lt;p&gt;The Databasus agent is a lightweight binary written in Go. You install it on the same machine (or in the same environment) as your PostgreSQL instance. It works with both host-installed PostgreSQL and databases running in Docker containers.&lt;/p&gt;

&lt;p&gt;Once started, the agent connects outbound to your Databasus instance. This is an important detail — the agent initiates the connection, not the other way around.&lt;/p&gt;

&lt;h3&gt;
  
  
  No public database exposure
&lt;/h3&gt;

&lt;p&gt;With the remote connection mode, Databasus needs network access to your database. That means opening a port, configuring firewall rules, maybe setting up a VPN or SSH tunnel. For databases in private networks, this can be a real headache.&lt;/p&gt;

&lt;p&gt;The agent flips this model. It sits next to the database and reaches out to Databasus on its own. Your database port stays closed. No firewall changes, no tunnels. The agent handles authentication with a token you configure during setup, and all communication is encrypted.&lt;/p&gt;

&lt;p&gt;This is especially useful for databases running in private cloud VPCs, Kubernetes clusters or on-premise servers where exposing the database externally isn't an option (or isn't allowed by policy).&lt;/p&gt;

&lt;h3&gt;
  
  
  How WAL streaming works
&lt;/h3&gt;

&lt;p&gt;For incremental backups, the agent does two things continuously. First, it takes periodic full base backups of the database cluster according to your configured schedule. Second, it watches for new WAL segments — small files that PostgreSQL generates as it processes transactions — and streams them to Databasus as they appear.&lt;/p&gt;

&lt;p&gt;Each WAL segment captures every change made to the database. Together, a base backup and the WAL segments recorded after it form a continuous chain. You can replay that chain up to any point in time, which is exactly what Point-in-Time Recovery does.&lt;/p&gt;

&lt;p&gt;The agent compresses everything before sending it, so bandwidth usage stays reasonable even with busy databases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Point-in-time recovery explained
&lt;/h2&gt;

&lt;p&gt;Regular backups give you snapshots. If you back up every 6 hours and a problem happens between backups, you lose the data written since the last one. For many applications this is fine. For others — financial systems, healthcare or anything where every transaction matters — it's not acceptable.&lt;/p&gt;

&lt;p&gt;PITR changes the equation. Instead of restoring to the last backup, you restore to a specific moment. "Give me the database as it was at 14:32:07 today" — and that's exactly what you get.&lt;/p&gt;
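
&lt;p&gt;Databasus automates the restore, but the replay itself is driven by PostgreSQL's own recovery settings. A hand-rolled equivalent would look roughly like this, where the archive path and timestamp are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# postgresql.conf on the server being restored (PostgreSQL 12+)
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2026-03-21 14:32:07+00'
recovery_target_action = 'promote'
# plus an empty recovery.signal file in the data directory
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;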

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backup type&lt;/th&gt;
&lt;th&gt;Recovery point objective (RPO)&lt;/th&gt;
&lt;th&gt;What you can restore to&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Logical (daily)&lt;/td&gt;
&lt;td&gt;Up to 24 hours of data loss&lt;/td&gt;
&lt;td&gt;Last completed backup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logical (hourly)&lt;/td&gt;
&lt;td&gt;Up to 1 hour of data loss&lt;/td&gt;
&lt;td&gt;Last completed backup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Physical&lt;/td&gt;
&lt;td&gt;Depends on backup frequency&lt;/td&gt;
&lt;td&gt;Last completed backup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Incremental with PITR&lt;/td&gt;
&lt;td&gt;Seconds of data loss&lt;/td&gt;
&lt;td&gt;Any point in time between base backups&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The restore process is straightforward. You pick a target timestamp, and Databasus figures out which base backup and which WAL segments are needed. The agent downloads them, places the files where PostgreSQL expects them, and PostgreSQL handles the replay automatically. When the database starts, it's in exactly the state it was at that moment.&lt;/p&gt;

&lt;p&gt;This makes incremental backups with PITR the right choice for disaster recovery. If a bad migration runs, if someone accidentally deletes a table, if data gets corrupted — you rewind to the moment before the problem happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to use which backup type
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logical backups&lt;/strong&gt; work well for small to medium databases where backup speed isn't critical. They don't require an agent, work over the network and are the simplest to set up. If your database is under a few gigabytes, start here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Physical backups&lt;/strong&gt; make sense when you have a large database and need faster backup and restore times. They require the agent but don't add the overhead of continuous WAL archiving. Good for when you want speed but don't need second-level recovery granularity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental backups with PITR&lt;/strong&gt; are for production databases where data loss must be minimized. Financial applications, SaaS platforms, e-commerce — anything where losing even an hour of transactions creates real problems. The agent continuously streams WAL segments, so your recovery point is always just seconds behind the live database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also combine approaches. Run logical backups for a quick safety net and incremental backups for disaster recovery on the same database. Databasus manages both from the same dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;

&lt;p&gt;Setting up the agent takes a few minutes. You download the binary to the machine running PostgreSQL, configure it with your Databasus instance URL and an authentication token, and start it. Databasus provides the token and connection details through its web interface when you add a new database in agent mode.&lt;/p&gt;

&lt;p&gt;Once the agent is running, you configure the backup schedule and retention policy the same way you would for logical backups — through the Databasus dashboard. The only difference is that you now have physical and incremental options available in the backup type selector.&lt;/p&gt;

&lt;p&gt;For incremental backups, you also choose a schedule for base backups (for example, daily or weekly) while WAL archiving runs continuously in the background. Databasus handles retention for both base backups and WAL segments according to your configured policy.&lt;/p&gt;

&lt;p&gt;The agent supports host-installed PostgreSQL (versions 12 through 18) and PostgreSQL running in Docker containers. It auto-updates itself, so you don't need to worry about keeping it in sync with the Databasus version.&lt;/p&gt;

&lt;p&gt;Databasus is free, open source (Apache 2.0) and self-hosted. You can find the project on GitHub and install it in under two minutes.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
    </item>
    <item>
      <title>PostgreSQL backup tool Databasus supported by OpenAI open source program</title>
      <dc:creator>Finny Collins</dc:creator>
      <pubDate>Thu, 12 Mar 2026 08:57:44 +0000</pubDate>
      <link>https://forem.com/finny_collins/postgresql-backup-tool-databasus-supported-by-openai-open-source-program-lpa</link>
      <guid>https://forem.com/finny_collins/postgresql-backup-tool-databasus-supported-by-openai-open-source-program-lpa</guid>
      <description>&lt;p&gt;In March 2026, Databasus was accepted into OpenAI's Codex for Open Source program. The program provides tools and API credits to maintainers of important open-source software. For Databasus, this means access to ChatGPT Pro with Codex, security analysis tools and API credits. Anthropic's Claude for Open Source also accepted the project the same month, making it two major AI companies supporting the same tool.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86y736pcbdrxahr1b7i5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86y736pcbdrxahr1b7i5.png" alt="Email from OpenAI" width="800" height="503"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Email screenshot taken from &lt;a href="https://databasus.com/faq" rel="noopener noreferrer"&gt;FAQ page&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Codex for Open Source
&lt;/h2&gt;

&lt;p&gt;OpenAI launched Codex for Open Source to help maintainers who keep the open-source ecosystem running. The program grew out of the Codex Open Source Fund, a $1 million initiative that helped projects integrate AI into their development workflows. Now it offers a broader set of tools for day-to-day coding, code review and security analysis.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT Pro with Codex&lt;/td&gt;
&lt;td&gt;Six months of access for coding, triage, review and maintainer workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex Security&lt;/td&gt;
&lt;td&gt;Conditional access for repositories needing deeper security coverage, reviewed case by case&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API credits&lt;/td&gt;
&lt;td&gt;Through the Codex Open Source Fund for PR review, automation and release workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Benefits are personal and non-transferable. Codex Security access is limited to repositories the applicant owns or is authorized to administer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who can apply
&lt;/h2&gt;

&lt;p&gt;The program targets core maintainers of widely used public projects. You don't need to run something massive. OpenAI looks at repository usage, ecosystem importance, evidence of active maintenance and program capacity. Projects that don't fit strict criteria but play an important role are still encouraged to apply.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Must be a core maintainer or run a widely used public project&lt;/li&gt;
&lt;li&gt;Must have write access to the repository&lt;/li&gt;
&lt;li&gt;Must have a valid ChatGPT account&lt;/li&gt;
&lt;li&gt;Must provide accurate information about the repository and maintainer role&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Applications are reviewed individually and can be approved or denied at OpenAI's discretion.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Databasus
&lt;/h2&gt;

&lt;p&gt;Databasus is a free, open-source and self-hosted &lt;a href="https://databasus.com" rel="noopener noreferrer"&gt;PostgreSQL backup tool&lt;/a&gt; that also supports MySQL and MongoDB. It runs in Docker, provides a web UI for managing backups and supports flexible scheduling, retention policies, encryption and notifications. You set it up once and it handles your database backups from there.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports PostgreSQL 12-18, MySQL 5.7-9, MariaDB 10-12 and MongoDB 4-8&lt;/li&gt;
&lt;li&gt;Flexible scheduling with hourly, daily, weekly, monthly or cron intervals&lt;/li&gt;
&lt;li&gt;Multiple storage destinations including S3, Google Drive, Cloudflare R2, SFTP and local storage&lt;/li&gt;
&lt;li&gt;AES-256-GCM encryption with zero-trust storage approach&lt;/li&gt;
&lt;li&gt;GFS retention policies for enterprise-grade backup history&lt;/li&gt;
&lt;li&gt;Team features with workspaces, role-based access and audit logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project is Apache 2.0 licensed and works with both self-hosted databases and cloud-managed services like AWS RDS, Google Cloud SQL and Azure Database for PostgreSQL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recognized by both Anthropic and OpenAI
&lt;/h2&gt;

&lt;p&gt;Databasus was accepted into two major AI open-source programs at the same time. In March 2026, Anthropic's Claude for Open Source and OpenAI's Codex for Open Source each reviewed the project and decided, independently of one another, that it was worth supporting.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Program&lt;/th&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;What it provides&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude for Open Source&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;Access to Claude for development and code review workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex for Open Source&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;ChatGPT Pro with Codex, Codex Security, API credits&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a database backup tool, this kind of recognition signals that both companies consider it part of the critical open-source infrastructure worth investing in.&lt;/p&gt;

&lt;h2&gt;
  
  
  How this will improve development
&lt;/h2&gt;

&lt;p&gt;Being accepted into both programs gives Databasus access to better tools for two things that matter most in a backup tool: security and code quality.&lt;/p&gt;

&lt;p&gt;Codex Security will add an extra layer of automated security checks over pull requests. For a project that handles database credentials, encryption keys and backup files, catching vulnerabilities before they reach production is critical. This comes on top of the existing CI/CD pipeline with tests and linting that already runs on every PR.&lt;/p&gt;

&lt;p&gt;Access to stronger AI models from both Anthropic and OpenAI also means better assistance during development. Code review, vulnerability scanning, documentation cleanup and triage all get more capable tools behind them.&lt;/p&gt;

&lt;h2&gt;
  
  
  How AI is used in Databasus development
&lt;/h2&gt;

&lt;p&gt;Since Databasus deals with database security and production backups, it's fair to ask how these AI tools are actually used. The team has &lt;a href="https://databasus.com/faq#oss-programs" rel="noopener noreferrer"&gt;clear rules about AI usage&lt;/a&gt;. AI is a helper, not a code generator. Every change goes through human review regardless of whether AI assisted with it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI helps with code quality verification, vulnerability scanning, documentation cleanup and PR review&lt;/li&gt;
&lt;li&gt;All code goes through line-by-line human review and vibe-coded PRs are rejected by default&lt;/li&gt;
&lt;li&gt;The project maintains solid test coverage, CI/CD automation and verification by experienced developers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools from both programs will strengthen these existing workflows. The development approach stays the same.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>postgres</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
