DEV Community

alexey.zh

Testing PostgreSQL WAL Streamers for Byte-Level Fidelity

Verifying that WAL streamers preserve exact database state — bit by bit.

🧭 Context

In the previous post, we explored the motivations behind building pgrwl, a PostgreSQL WAL receiver designed for zero data loss (RPO=0) scenarios in containerized environments. We covered its architecture, features like compression/encryption, and its suitability for Kubernetes-based disaster recovery.

This follow-up post focuses on testing — specifically validating that pgrwl produces WAL archives that are byte-for-byte identical to PostgreSQL’s official tool (pg_receivewal) and that it supports full PITR (Point-in-Time Recovery) after abrupt system crashes.

🚀 Intro

Write-Ahead Logs (WALs) are at the heart of PostgreSQL’s crash recovery and replication capabilities. But what happens when we replace the native WAL receiver (pg_receivewal) with a third-party tool like pgrwl? Can we trust it to preserve data integrity byte-for-byte?

This post dives into a golden test designed to answer that question — by simulating real-world PostgreSQL workloads, abrupt crashes, and full recovery workflows.

Note: All Bash scripts shown here are simplified examples to illustrate the core logic. The full implementation with deep technical details and automation scripts is available in the pgrwl GitHub repository. This post focuses on explaining the primary test goal, rather than every integration nuance.

Integration Test Source Code

✅ Goal

To verify that:

  • pgrwl can reliably stream WALs during active writes.
  • The restored database is identical to its pre-crash state.
  • WAL files produced by pgrwl match those produced by pg_receivewal bit-for-bit.

🛠️ Tools Used

  • PostgreSQL 16+
  • pg_receivewal — the official WAL receiver.
  • pgrwl — WAL receiver with encryption/compression/backends.
  • pg_dumpall, pgbench
  • Bash for orchestration

🧪 Test Procedure: Step-by-Step

We simulate a live system, insert tons of data, kill everything mid-flight, and then recover from base backup + WALs.

1. Start PostgreSQL

Initialize a clean cluster:

initdb -D /tmp/pgdata
pg_ctl -D /tmp/pgdata -l logfile start

2. Launch WAL Receivers

Run both in parallel (in background):

pg_receivewal --slot=test_slot -D /tmp/pgwal_pg ...
pgrwl --mode=receive -c config.yml ...

3. Take a Base Backup

pg_basebackup \
  --pgdata="/tmp/base_backup" \
  --wal-method=none \
  --checkpoint=fast \
  --progress \
  --no-password \
  --verbose

4. Simulate Real Workload

Insert timestamps every second:

psql -c 'CREATE TABLE ticks(ts TIMESTAMPTZ DEFAULT now());'
while true; do psql -c 'INSERT INTO ticks DEFAULT VALUES;'; sleep 1; done &
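
Because the inserter runs in the background, a later step needs a handle to stop it. A minimal sketch of one way to keep that handle is shown here; the helper names `run_every` and `stop_background` and the variable `TICKS_PID` are illustrative assumptions, not names taken from the pgrwl test scripts.

```shell
# Hypothetical helper: run a command repeatedly with a pause between runs.
run_every() {
  interval=$1; shift
  while true; do
    "$@" || true          # keep looping even if a single run fails
    sleep "$interval"
  done
}

# Hypothetical helper: stop a background job by PID and reap it.
stop_background() {
  kill "$1" 2>/dev/null || true
  wait "$1" 2>/dev/null || true
}

# Usage (mirrors the loop above):
#   run_every 1 psql -c 'INSERT INTO ticks DEFAULT VALUES;' &
#   TICKS_PID=$!
#   ...
#   stop_background "$TICKS_PID"
```

Recording `$!` right after the `&` avoids having to find the loop again with `pkill` pattern matching.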

Initialize pgbench at scale factor 10 (about one million rows in pgbench_accounts) to generate a burst of WAL:

pgbench -i -s 10

Create 100 tables in parallel:

for i in $(seq 1 100); do
  psql -c "CREATE TABLE t_$i AS SELECT * FROM generate_series(1, 10000) AS g(id);" &
done
wait

5. Capture Golden Snapshot

Stop the ticks inserter first, so that no row is committed after the dump is taken and the later comparison stays deterministic. Then dump the whole cluster:

pg_dumpall > /tmp/before.sql

6. Simulate Crash

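# Kill the server processes outright, with no shutdown checkpoint.
pkill -9 postgres || true
# The server is already gone, so this stop may report an error; ignore it.
pg_ctl -D /tmp/pgdata -m immediate stop || true
# Discard the data directory; only the base backup and the WAL archives remain.
rm -rf /tmp/pgdata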

7. Restore from Base + WALs

cp -r /tmp/base_backup /tmp/pgdata
touch /tmp/pgdata/recovery.signal
echo "restore_command = 'pgrwl restore-command --serve-addr=127.0.0.1:7070 %f %p'" >> /tmp/pgdata/postgresql.conf

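💡 Before the restart, rename every *.partial segment in the WAL archive to its final name (strip the .partial suffix), so that the segment being written at the time of the crash can be requested by restore_command under its regular name.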
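
A minimal sketch of that rename, assuming the WAL directory is flat and `.partial` is the only suffix to strip. The function name `finalize_partial_wals` is hypothetical.

```shell
# Strip the .partial suffix from every segment in a WAL directory, so that
# 000000010000000000000005.partial becomes 000000010000000000000005.
finalize_partial_wals() {
  wal_dir=$1
  for f in "$wal_dir"/*.partial; do
    [ -e "$f" ] || continue        # the glob matched nothing
    mv -- "$f" "${f%.partial}"
  done
}

# Usage, with the directories compared in step 10:
#   finalize_partial_wals /tmp/pgwal_pg
#   finalize_partial_wals /tmp/pgwal_pgrwl
```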

8. Restart PostgreSQL

pg_ctl -D /tmp/pgdata -l logfile start

Wait for recovery to complete.
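
One way to wait is to poll `pg_is_in_recovery()` until it returns `f`. The sketch below wraps that query in a generic retry helper; the names `wait_until` and `recovery_finished`, the one-second pause, and the reliance on default psql connection settings are all assumptions made for illustration.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
wait_until() {
  attempts=$1; shift
  while [ "$attempts" -gt 0 ]; do
    if "$@"; then return 0; fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# True once the server accepts connections and has left recovery.
# -X skips psqlrc, -A and -t print the bare value, -c runs the query.
recovery_finished() {
  [ "$(psql -XAtc 'SELECT pg_is_in_recovery()' 2>/dev/null)" = "f" ]
}

# Usage:
#   wait_until 60 recovery_finished || echo "recovery did not finish" >&2
```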

9. Validate Database Consistency

pg_dumpall > /tmp/after.sql
diff -u /tmp/before.sql /tmp/after.sql

✅ Expect: No differences.

Also verify ticks table for the latest inserted row — confirming no data loss.
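
The ticks check can be sketched as follows, assuming the latest timestamp is saved before the crash and compared after recovery. The function names and the `/tmp/ticks_before` path are hypothetical.

```shell
# Latest timestamp in the ticks table, as text (empty if the table is empty).
latest_tick() {
  psql -XAtc 'SELECT max(ts) FROM ticks'
}

# Compare a value saved before the crash with the value after recovery.
check_latest_tick() {
  expected=$1
  actual=$(latest_tick)
  if [ "$expected" = "$actual" ]; then
    echo "ticks OK: $actual"
  else
    echo "ticks MISMATCH: expected=$expected actual=$actual" >&2
    return 1
  fi
}

# Usage:
#   latest_tick > /tmp/ticks_before                   # before the crash
#   check_latest_tick "$(cat /tmp/ticks_before)"      # after recovery
```

Both calls should run with the same session time zone, since `timestamptz` values are rendered according to that setting.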

10. Compare WAL Files

diff -r /tmp/pgwal_pg /tmp/pgwal_pgrwl

✅ Expect: Identical content and filenames.
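
When the directories differ, a checksum manifest can make the report easier to read than raw `diff -r` output on binary files. The following is a sketch under the assumption that GNU `sha256sum` is available; `wal_manifest` and `compare_wal_dirs` are hypothetical helper names.

```shell
# Print "checksum  ./relative/path" for every file under a directory,
# sorted by path so that two manifests can be compared line by line.
wal_manifest() {
  (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# Succeeds only if both directories hold the same file names with the same bytes.
compare_wal_dirs() {
  left=$(wal_manifest "$1") || return 2
  right=$(wal_manifest "$2") || return 2
  [ "$left" = "$right" ]
}

# Usage:
#   compare_wal_dirs /tmp/pgwal_pg /tmp/pgwal_pgrwl && echo "WAL archives match"
```

Note that two empty directories also compare as equal, so a real test would additionally assert that at least one segment was received.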

📉 Post-Crash: Retest on New Timeline

Archive recovery ends with the server switching to a new timeline. Restart both WAL streamers against the recovered server and verify that each one follows the timeline switch.

Then rerun the directory diff.
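
WAL segment file names consist of 24 hexadecimal digits, and the first 8 encode the timeline ID. A rough sketch of checking that both archives reached the same timeline follows; the helper names `wal_timeline` and `max_timeline` are assumptions for illustration.

```shell
# Timeline ID (decimal) encoded in the first 8 hex digits of a WAL segment name.
wal_timeline() {
  hex=$(printf '%s' "$1" | cut -c1-8)
  printf '%d\n' "0x$hex"
}

# Highest timeline seen among the segment files in a directory.
max_timeline() {
  max=0
  for f in "$1"/*; do
    name=$(basename "$f")
    # Only names starting with 24 hex digits are segments (skips *.history).
    printf '%s\n' "$name" | grep -Eq '^[0-9A-F]{24}' || continue
    tli=$(wal_timeline "$name")
    if [ "$tli" -gt "$max" ]; then max=$tli; fi
  done
  echo "$max"
}

# Usage:
#   [ "$(max_timeline /tmp/pgwal_pg)" = "$(max_timeline /tmp/pgwal_pgrwl)" ]
```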

🧠 What This Test Proves

  • WALs received by pgrwl are valid and byte-identical to those written by pg_receivewal.
  • PostgreSQL can recover from pgrwl's archived WALs to the latest committed transaction.

🔬 Bonus: Add Compression and Encryption

Add this to the config:

compression:
  algo: gzip
encryption:
  algo: aesgcm
  pass: "${PGRWL_ENCRYPT_PASS}"

💡 WALs will no longer match byte-for-byte (they’re transformed), but recovery should still work identically.


✅ Conclusion

Testing WAL archiving isn’t just about receiving files — it’s about trust.
This golden test validates pgrwl as a reliable WAL receiver with byte-level fidelity and advanced features like encryption and compression.

📦 Check out the code: github.com/hashmap-kz/pgrwl


🙌 Get Involved

pgrwl is an open-source project built for the PostgreSQL community — and your feedback matters!

  • 🐞 Found a bug? Open an issue
  • 💡 Have an idea or feature request? We'd love to hear it.
  • 🧪 Want to improve WAL testing coverage? Run the integration tests or add your own cases.
  • 🔧 Found a rough edge or an unclear doc? Contributions are always welcome.

Start by starring ⭐ the repo, trying it out in your own cluster, and sharing what you learn.

Let’s build better PostgreSQL backup tooling — together.

Top comments (2)

Jessica Brown

This is a thorough approach to byte-level verification, but is it always necessary for practical disaster recovery that WAL files are completely bit-for-bit identical, or could there be cases where logical equivalence suffices?

alexey.zh

Hello, the very first MVP I made showed poor performance because of a message processing loop that was missing one small but significant detail here: github.com/hashmap-kz/pgrwl/blob/m.... This caused a lot of unnecessary confirmation requests.
Additionally, I later optimized the fsync functions with syscalls, which provided further performance improvements.
So this test is not only for byte-level verification, but also for timing. I rebuilt the pg_receivewal binary from source, injecting log messages and timing, and did the same for pgrwl, to compare that the flow is identical.
The very first implementation lagged behind pg_receivewal, whereas the current version performs similarly.
