Verifying that WAL streamers preserve exact database state — bit by bit.
🧭 Context
In the previous post, we explored the motivations behind building pgrwl, a PostgreSQL WAL receiver designed for zero data loss (RPO=0) scenarios in containerized environments. We covered its architecture, features like compression and encryption, and its suitability for Kubernetes-based disaster recovery.
This follow-up post focuses on testing: specifically, validating that pgrwl produces WAL archives that are byte-for-byte identical to those of PostgreSQL's official tool (pg_receivewal), and that it supports full point-in-time recovery (PITR) after abrupt system crashes.
🚀 Intro
Write-Ahead Logs (WALs) are at the heart of PostgreSQL's crash recovery and replication capabilities. But what happens when we replace the native WAL receiver (pg_receivewal) with a third-party tool like pgrwl? Can we trust it to preserve data integrity byte-for-byte?
This post dives into a golden test designed to answer that question — by simulating real-world PostgreSQL workloads, abrupt crashes, and full recovery workflows.
Note: All Bash scripts shown here are simplified examples to illustrate the core logic. The full implementation with deep technical details and automation scripts is available in the pgrwl GitHub repository. This post focuses on explaining the primary test goal, rather than every integration nuance.
✅ Goal
To verify that:
- pgrwl can reliably stream WALs during active writes.
- The restored database is identical to its pre-crash state.
- WAL files produced by pgrwl match those produced by pg_receivewal bit-for-bit.
🛠️ Tools Used
- PostgreSQL 16+
- pg_receivewal — the official WAL receiver
- pgrwl — WAL receiver with encryption, compression, and storage backends
- pg_dumpall, pgbench
- Bash for orchestration
🧪 Test Procedure: Step-by-Step
We simulate a live system, insert tons of data, kill everything mid-flight, and then recover from base backup + WALs.
1. Start PostgreSQL
Initialize a clean cluster:
initdb -D /tmp/pgdata
pg_ctl -D /tmp/pgdata -l logfile start
2. Launch WAL Receivers
Run both in parallel (in background):
pg_receivewal --slot=test_slot -D /tmp/pgwal_pg ...
pgrwl --mode=receive -c config.yml ...
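The exact flags are trimmed above. A minimal sketch of how the two receivers might be launched side by side, assuming a local cluster, a dedicated replication slot for pg_receivewal, and the archive directories compared later in step 10 (pgrwl's own slot and target directory /tmp/pgwal_pgrwl are assumed to be configured in config.yml):
# Minimal sketch with assumed flags and paths; not the exact test harness.
pg_receivewal --create-slot --slot=test_slot --if-not-exists
pg_receivewal --slot=test_slot -D /tmp/pgwal_pg --no-password &
pgrwl --mode=receive -c config.yml &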
3. Take a Base Backup
pg_basebackup \
--pgdata="/tmp/base_backup" \
--wal-method=none \
--checkpoint=fast \
--progress \
--no-password \
--verbose
4. Simulate Real Workload
Insert timestamps every second:
psql -c 'CREATE TABLE ticks(ts TIMESTAMPTZ DEFAULT now());'
# Insert one row per second in the background; keep the PID for step 5.
while true; do psql -c 'INSERT INTO ticks DEFAULT VALUES;'; sleep 1; done &
TICKS_PID=$!
Run pgbench to add load. The initialization at scale factor 10 alone produces a burst of bulk writes:
pgbench -i -s 10
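The -i step only builds and bulk-loads the pgbench tables. To keep transactional load flowing while WAL is being streamed, a short benchmark run can follow; the client, thread, and duration values below are arbitrary illustrations:
# Hedged example: a brief read/write benchmark; values are illustrative.
pgbench -c 4 -j 2 -T 30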
Create 100 tables in parallel:
TABLE_PIDS=()
for i in $(seq 1 100); do
  psql -c "CREATE TABLE t_$i AS SELECT * FROM generate_series(1, 10000) AS g(id);" &
  TABLE_PIDS+=("$!")
done
# Wait only for the table-creation jobs; a bare `wait` would also block on the
# still-running ticks inserter.
wait "${TABLE_PIDS[@]}"
5. Capture Golden Snapshot
Stop the ticks inserter first, so no rows land between the snapshot and the crash, then take a logical dump of the whole cluster:
kill "${TICKS_PID}"
pg_dumpall > /tmp/before.sql
6. Simulate Crash
# Kill all PostgreSQL processes abruptly to simulate a hard crash.
pkill -9 postgres || true
# Ensure nothing is left running, then wipe the data directory.
pg_ctl -D /tmp/pgdata -m immediate stop || true
rm -rf /tmp/pgdata
7. Restore from Base + WALs
cp -r /tmp/base_backup /tmp/pgdata
touch /tmp/pgdata/recovery.signal
echo "restore_command = 'pgrwl restore-command --serve-addr=127.0.0.1:7070 %f %p'" >> /tmp/pgdata/postgresql.conf
💡 Rename all *.partial WALs to their final names before restart.
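The receivers keep the segment currently being written under a .partial suffix. One way to finalize those files, assuming /tmp/pgwal_pgrwl is the pgrwl archive directory compared in step 10:
# Hedged sketch: drop the .partial suffix so restore_command can find the
# last, still-open segment; adjust the directory to your setup.
for f in /tmp/pgwal_pgrwl/*.partial; do
  [ -e "$f" ] || continue
  mv "$f" "${f%.partial}"
done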
8. Restart PostgreSQL
pg_ctl -D /tmp/pgdata -l logfile start
Wait for recovery to complete.
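A simple way to block until recovery has finished, assuming local access with default connection settings:
# Hedged sketch: wait until the server accepts connections and reports that
# it is no longer in recovery.
until pg_isready -q; do sleep 1; done
until [ "$(psql -Atqc 'SELECT pg_is_in_recovery();')" = "f" ]; do sleep 1; done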
9. Validate Database Consistency
pg_dumpall > /tmp/after.sql
diff -u /tmp/before.sql /tmp/after.sql
✅ Expect: No differences.
Also verify the ticks table for the latest inserted row — confirming no data loss.
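A quick hedged check: the row count should match the golden dump, and the newest timestamp should be close to the moment the inserter was stopped.
# The newest tick and the total row count confirm nothing was lost in recovery.
psql -c 'SELECT count(*) AS rows, max(ts) AS last_tick FROM ticks;'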
10. Compare WAL Files
diff -r /tmp/pgwal_pg /tmp/pgwal_pgrwl
✅ Expect: Identical content and filenames.
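If a recursive diff is too coarse, comparing sorted per-file checksums gives the same guarantee with a more readable failure report (output paths below are illustrative):
# Hedged alternative: per-file SHA-256 checksums of both archives.
( cd /tmp/pgwal_pg    && sha256sum * | sort -k2 ) > /tmp/sums_pg.txt
( cd /tmp/pgwal_pgrwl && sha256sum * | sort -k2 ) > /tmp/sums_pgrwl.txt
diff -u /tmp/sums_pg.txt /tmp/sums_pgrwl.txt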
📉 Post-Crash: Retest on New Timeline
Restart both WAL streamers and verify that they pick up streaming on the new timeline created by the crash and recovery, then rerun the WAL directory diff.
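Before restarting the streamers, you can confirm the new timeline directly from the control data (a hedged check):
# After crash recovery the cluster moves to a new timeline; print it.
psql -Atc 'SELECT timeline_id FROM pg_control_checkpoint();'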
🧠 What This Test Proves
- WALs received by pgrwl are valid and byte-identical to official ones.
- PostgreSQL can recover from pgrwl's archived WALs to the latest committed transaction.
🔬 Bonus: Add Compression and Encryption
Add this to the config:
compression:
  algo: gzip
encryption:
  algo: aesgcm
  pass: "${PGRWL_ENCRYPT_PASS}"
💡 WALs will no longer match byte-for-byte (they’re transformed), but recovery should still work identically.
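With transformed archives, verification falls back to the logical comparison from step 9, reusing the same dump files:
# Recovery through compressed/encrypted WALs should still yield an identical
# logical dump.
pg_dumpall > /tmp/after.sql
diff -u /tmp/before.sql /tmp/after.sql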
✅ Conclusion
Testing WAL archiving isn’t just about receiving files — it’s about trust.
This golden test validates pgrwl as a reliable WAL receiver with byte-level fidelity and advanced features like encryption and compression.
📦 Check out the code: github.com/hashmap-kz/pgrwl
🙌 Get Involved
pgrwl is an open-source project built for the PostgreSQL community — and your feedback matters!
- 🐞 Found a bug? Open an issue
- 💡 Have an idea or feature request? We'd love to hear it.
- 🧪 Want to improve WAL testing coverage? Run the integration tests or add your own cases.
- 🔧 Found a rough edge or an unclear doc? Contributions are always welcome.
Start by starring ⭐ the repo, trying it out in your own cluster, and sharing what you learn.
Let’s build better PostgreSQL backup tooling — together.
Top comments (2)
This is a thorough approach to byte-level verification, but is it always necessary for practical disaster recovery that WAL files are completely bit-for-bit identical, or could there be cases where logical equivalence suffices?
Hello, the very first MVP I made showed poor performance because of a message processing loop that was missing one small but significant detail here: github.com/hashmap-kz/pgrwl/blob/m.... This caused a lot of unnecessary confirmation requests.
Additionally, I later optimized the fsync functions with syscalls, which provided further performance improvements.
So this test is not only for byte-level verification, but also for timing. I rebuilt the pg_receivewal binary from source, injecting log messages and timing, and did the same for pgrwl, to compare that the flow is identical.
The very first implementation lagged behind pg_receivewal, whereas the current version performs similarly.