<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Mateus Rauli </title>
    <description>The latest articles on Forem by Mateus Rauli  (@mateus-rauli).</description>
    <link>https://forem.com/mateus-rauli</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F864333%2F22f9e2a3-8a12-40a2-b2b9-b31cfb887ac5.png</url>
      <title>Forem: Mateus Rauli </title>
      <link>https://forem.com/mateus-rauli</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mateus-rauli"/>
    <language>en</language>
    <item>
      <title>How fsync and synchronous_commit Affect PostgreSQL Performance</title>
      <dc:creator>Mateus Rauli </dc:creator>
      <pubDate>Mon, 05 May 2025 19:19:57 +0000</pubDate>
      <link>https://forem.com/mateus-rauli/how-fsync-and-synchronouscommit-affect-postgresql-performance-22di</link>
      <guid>https://forem.com/mateus-rauli/how-fsync-and-synchronouscommit-affect-postgresql-performance-22di</guid>
      <description>&lt;p&gt;While diving into PostgreSQL's configuration options, two settings caught my attention: &lt;code&gt;fsync&lt;/code&gt; and &lt;code&gt;synchronous_commit&lt;/code&gt;. At first glance, they might seem like minor toggles, but a closer look reveals their critical role in balancing database performance and data durability. Through research and experimentation, I discovered how these settings can significantly affect not just the performance of your PostgreSQL instance, but also its ability to recover from failures. In this post, I’ll guide you through the inner workings of these settings, their trade-offs, and how to make informed decisions based on your use case.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is fsync
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;fsync&lt;/code&gt; parameter in PostgreSQL determines whether all changes to the database are physically written to the disk before a transaction is confirmed. When &lt;code&gt;fsync&lt;/code&gt; is enabled, PostgreSQL issues &lt;code&gt;fsync()&lt;/code&gt; system calls (or equivalent methods) to ensure that data is securely flushed to disk. This guarantees that the database cluster can recover to a consistent state in the event of an operating system or hardware crash.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;fsync&lt;/code&gt; is disabled, PostgreSQL skips this step, which can dramatically improve performance but introduces significant risks. In the event of a crash or unexpected shutdown, data that was not yet written to disk can be lost or corrupted. Worse, because both the Write-Ahead Log (WAL) and checkpoints are critical for maintaining the consistency of the entire cluster, a crash with &lt;code&gt;fsync&lt;/code&gt; disabled could render the &lt;strong&gt;entire cluster unusable&lt;/strong&gt;, requiring a full restore from backups.&lt;/p&gt;
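&lt;p&gt;You can inspect and toggle the setting directly; the sketch below is for throwaway benchmarking only (&lt;code&gt;fsync&lt;/code&gt; takes effect on a configuration reload, no restart needed), never for data you care about:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Check the current value
SHOW fsync;

-- Disable it for a benchmark (requires superuser), then reload
ALTER SYSTEM SET fsync = off;
SELECT pg_reload_conf();
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;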

&lt;h3&gt;
  
  
  What is synchronous_commit
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;synchronous_commit&lt;/code&gt; parameter determines how PostgreSQL handles transaction commits in relation to the WAL. When enabled, PostgreSQL ensures that the WAL records for a transaction are flushed to durable storage before the transaction is reported as committed. This provides a high level of data safety, ensuring that committed transactions are not lost even in the case of a crash.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;synchronous_commit&lt;/code&gt; is disabled, PostgreSQL does not wait for the WAL to be flushed to disk. Instead, it relies on the operating system to handle the flushing asynchronously. This can significantly improve transaction throughput but comes with a trade-off: in the event of a crash, some transactions reported as committed might not actually be saved.&lt;/p&gt;
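&lt;p&gt;Unlike &lt;code&gt;fsync&lt;/code&gt;, &lt;code&gt;synchronous_commit&lt;/code&gt; can be changed per session or even per transaction, so durability can be relaxed only for the writes that tolerate it. A minimal sketch (the table name is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Relax durability for this session only
SET synchronous_commit = off;

-- Or for a single transaction
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO audit_log (event) VALUES ('non-critical event');
COMMIT;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;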

&lt;h3&gt;
  
  
  Testing performance with fsync on/off
&lt;/h3&gt;

&lt;p&gt;To benchmark the performance impact, I used &lt;a href="https://www.postgresql.org/docs/current/pgbench.html" rel="noopener noreferrer"&gt;pgbench&lt;/a&gt; with 10 client connections (&lt;code&gt;-c 10&lt;/code&gt;), 2 worker threads (&lt;code&gt;-j 2&lt;/code&gt;), and a 30-second run (&lt;code&gt;-T 30&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pgbench -c 10 -j 2 -T 30&lt;/code&gt;&lt;/p&gt;
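&lt;p&gt;One way to reproduce the comparison end to end (a sketch; the database name and pgbench scale factor are assumptions, adjust for your environment):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Initialize the pgbench tables once
pgbench -i -s 10 mydb

# Baseline run with fsync = on (the default)
pgbench -c 10 -j 2 -T 30 mydb

# Disable fsync, reload the configuration, and run again
psql -d mydb -c "ALTER SYSTEM SET fsync = off; SELECT pg_reload_conf();"
pgbench -c 10 -j 2 -T 30 mydb
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;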

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmljy3diybh61f9u8ylzj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmljy3diybh61f9u8ylzj.png" alt="fsync benchmark" width="800" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With fsync disabled, we observed a performance improvement of approximately 58% in TPS and a latency reduction of about 37% when compared to having it enabled. &lt;/p&gt;

&lt;h3&gt;
  
  
  Testing performance with synchronous_commit on/off
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno212q38lgk0uazwin7i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno212q38lgk0uazwin7i.png" alt="synchronous_commit benchmark" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With synchronous_commit disabled, we observed a performance improvement of approximately 3.5% in TPS and a latency reduction of about 3.4% when compared to having it enabled. When both fsync and synchronous_commit were disabled, there was an additional performance boost, resulting in a 10.7% improvement in TPS and a 6.5% reduction in latency compared to having only synchronous_commit disabled.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;These results highlight the need to carefully evaluate the trade-offs between performance and data safety based on the application requirements. For production environments, it's generally recommended to keep fsync enabled and consider disabling synchronous_commit only if the risk of losing recent transactions is acceptable. &lt;br&gt;
However, for high-throughput applications where performance is critical and occasional data loss is tolerable, such as analytics pipelines, caching layers, or systems with external redundancy, relaxing durability guarantees can lead to significant throughput improvements. Ultimately, tuning these settings should be guided by thorough benchmarking and a deep understanding of your workload's tolerance for failure.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>performance</category>
      <category>postgressql</category>
      <category>database</category>
    </item>
    <item>
      <title>How TOAST and Tombstones Work in PostgreSQL</title>
      <dc:creator>Mateus Rauli </dc:creator>
      <pubDate>Mon, 14 Apr 2025 19:31:03 +0000</pubDate>
      <link>https://forem.com/mateus-rauli/how-toast-and-tombstones-work-in-postgresql-1cg6</link>
      <guid>https://forem.com/mateus-rauli/how-toast-and-tombstones-work-in-postgresql-1cg6</guid>
      <description>&lt;p&gt;In relational databases like PostgreSQL, seemingly simple operations — such as updating JSON field — can hide significant complexity. Behind the scenes, mechanisms like &lt;strong&gt;TOAST&lt;/strong&gt; and &lt;strong&gt;tombstones&lt;/strong&gt; come into play, impacting performance, disk usage and even query efficiency. &lt;/p&gt;

&lt;p&gt;Depending on how data is structured, even a tiny modification can trigger disk rewrites, compression processes or silent fragmentation. In this article we'll explore: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What TOAST is&lt;/strong&gt;, when it activates, and why it matters for large data types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The role of tombstones&lt;/strong&gt; and how they affect read/write operations&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;practical benchmark&lt;/strong&gt;, exposing how tiny updates can cause surprising overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you work with types like text[], jsonb, or dynamically growing data, understanding these concepts is key to avoiding decisions that might compromise your database's scalability.&lt;/p&gt;

&lt;h2&gt;
  
  
  What TOAST is
&lt;/h2&gt;

&lt;p&gt;TOAST (The Oversized-Attribute Storage Technique) is PostgreSQL's clever solution for handling large data values that exceed the database's default page size (commonly 8 kB). When a column's data — like a lengthy text field, a hefty jsonb object or an array — would otherwise bloat a table row and degrade performance, TOAST steps in. It automatically compresses, slices, or even moves the data out-of-line into a secondary storage area, leaving behind only a compact reference in the main table. This optimization keeps frequent operations (like full-table scans) efficient, but it's not free: updates to TOASTed data can introduce overhead, as PostgreSQL may need to rewrite or recompress chunks behind the scenes. &lt;/p&gt;

&lt;p&gt;TOAST targets variable-length or potentially large data types, including: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JSON/JSONB: Especially when storing deeply nested or verbose documents.&lt;/li&gt;
&lt;li&gt;Text data: text, varchar (if values exceed ~2KB, even with varchar(n)’s length limit).&lt;/li&gt;
&lt;li&gt;Binary data: bytea (e.g., images, files).&lt;/li&gt;
&lt;li&gt;Geometric types: PostgreSQL’s built-in path, polygon, or spatial types like PostGIS geometry/geography.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  TOAST example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Step 1: Create a table with a JSONB column (toastable)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;toast_demo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;small_data&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;-- Will NOT be toasted&lt;/span&gt;
  &lt;span class="n"&gt;large_data&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt;           &lt;span class="c1"&gt;-- Will be toasted&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 2: Insert a small record (no TOAST)&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;toast_demo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;small_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;large_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'short text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'{"key": "small_value"}'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 3: Insert a large JSONB payload (triggers TOAST)&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;toast_demo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;small_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;large_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'short text'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;jsonb_build_object&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="s1"&gt;'key'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'value'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s1"&gt;'nested'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;array_agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;g&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;g&lt;/span&gt;
        &lt;span class="p"&gt;)));&lt;/span&gt;

&lt;span class="c1"&gt;-- This query should return the TOAST table&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;reltoastrelid&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;regclass&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_class&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'toast_demo'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   

&lt;span class="c1"&gt;-- Step 4: Verify TOAST usage&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'toast_demo'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;main_table_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pg_toast.pg_toast_32803'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;toast_size&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_class&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'toast_demo'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 5: Observe UPDATE overhead&lt;/span&gt;
&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="k"&gt;ANALYZE&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;toast_demo&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;large_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;large_data&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'{"new_key": "value"}'&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu8p5d58j03ekl59bqbqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu8p5d58j03ekl59bqbqn.png" alt="Update query plan" width="800" height="127"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key observations&lt;/strong&gt;: For a single-row update, the execution time of 26.5ms is noticeably slower than typical non-TOASTed updates (which usually take 1-5ms). This overhead occurs because PostgreSQL must:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fetch and decompress the TOASTed large_data value.&lt;/li&gt;
&lt;li&gt;Modify it (appending {"new_key": "value"}).
&lt;/li&gt;
&lt;li&gt;Recompress and potentially relocate the data in the TOAST table.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  When does this become a problem?
&lt;/h4&gt;

&lt;p&gt;The 26.5ms latency is acceptable if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Updates are infrequent (e.g., background jobs rather than user-facing operations).&lt;/li&gt;
&lt;li&gt;The JSONB payload is very large (&amp;gt;10KB) and requires compression.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Investigate further if you see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Updates exceeding &amp;gt;100ms (indicates severe TOAST fragmentation or bloat).&lt;/li&gt;
&lt;li&gt;Concurrent updates causing lock contention (check for blocked queries in pg_stat_activity).&lt;/li&gt;
&lt;li&gt;Autovacuum falling behind on TOAST table maintenance (monitor with pg_stat_user_tables).&lt;/li&gt;
&lt;/ul&gt;
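&lt;p&gt;These signals can be read straight from the statistics views (a minimal sketch; the thresholds above are rules of thumb):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Queries currently blocked waiting on locks
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

-- Dead-tuple counts and last (auto)vacuum runs per table
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;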

&lt;h2&gt;
  
  
  The role of Tombstones and their impact on Read/Write Operations
&lt;/h2&gt;

&lt;p&gt;In PostgreSQL, tombstones (often called "dead tuples") are remnants of rows that have been deleted or updated but not yet physically removed from disk. They play a crucial role in PostgreSQL's MVCC (Multi-Version Concurrency Control) system, but if left unchecked, they can degrade performance and bloat storage. &lt;/p&gt;

&lt;p&gt;Tombstones are created on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;delete&lt;/strong&gt; - When a row is deleted, it's not immediately erased; instead, it's marked as dead (a tombstone).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;update&lt;/strong&gt; - PostgreSQL treats updates as a delete + insert, leaving the old row version as a tombstone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vacuum&lt;/strong&gt; - PostgreSQL's autovacuum daemon (or manual vacuum) eventually cleans up these tombstones, reclaiming space.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  How tombstones affect performance
&lt;/h3&gt;

&lt;p&gt;On the read side, they contribute to table bloat by forcing the database to scan through dead rows that remain in heap pages, while index scans must still check MVCC visibility for these obsolete entries, adding CPU overhead. The presence of excessive tombstones can also prevent PostgreSQL from using visibility map optimizations, slowing down sequential scans. For write operations, update-heavy workloads suffer from write amplification: each modification generates new tombstones, requiring additional I/O for the same logical changes. This accumulation poses serious risks, including transaction ID wraparound if VACUUM can't keep pace with tombstone generation, potentially leading to database shutdowns. Storage efficiency takes a hit as well, with dead tuples occupying disk space until vacuumed — sometimes doubling a table's footprint — while leaving behind fragmented pages that degrade storage utilization. These compounding effects make proper tombstone management crucial for maintaining database performance and stability. &lt;/p&gt;

&lt;h3&gt;
  
  
  Tombstone Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Step 1: Create a table and disable autovacuum (for demo purposes)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;tombstone_demo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;tombstone_demo&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;autovacuum_enabled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 2: Insert initial data&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;tombstone_demo&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'original_value'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 3: Check dead tuples (should be 0)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;n_dead_tup&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_user_tables&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'tombstone_demo'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 4: Run 1000 updates (each creates a dead tuple)&lt;/span&gt;
&lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;
    &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;tombstone_demo&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'updated_'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="err"&gt;$$&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 5: Verify dead tuples accumulated&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;n_dead_tup&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_user_tables&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'tombstone_demo'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 6: Show table bloat&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;pg_size_pretty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pg_relation_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tombstone_demo'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;size_with_bloat&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 7: Manual VACUUM to clean tombstones&lt;/span&gt;
&lt;span class="k"&gt;VACUUM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;VERBOSE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;tombstone_demo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 8: Confirm if dead tuples are gone&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;n_dead_tup&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;pg_stat_user_tables&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;relname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'tombstone_demo'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even when repeatedly updating the same row, PostgreSQL doesn't overwrite the original data. Instead, it creates a new version of the row and marks the previous one as a dead tuple. This means that with every UPDATE, the table accumulates older versions of the data. By disabling autovacuum, we can observe how these dead tuples accumulate without automatic cleanup. The result? The table begins to bloat on disk, even though it logically contains just one row.&lt;/p&gt;

&lt;p&gt;The VACUUM command removes these old versions and makes the space reusable, but doesn't physically reduce the table's size on disk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6iwe4qwry0ods0msnf6w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6iwe4qwry0ods0msnf6w.png" alt="size with bloat" width="744" height="188"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;VACUUM FULL goes further: it creates a compacted copy of the table, eliminating all empty space and returning it to the operating system. However, this process is blocking (it takes an ACCESS EXCLUSIVE lock) and resource-intensive, as it must completely rewrite the table from scratch. For production systems where downtime isn't an option, the pg_repack extension is the superior alternative. It delivers all the benefits of VACUUM FULL (actual space recovery) without long-held table locks, operating in parallel with normal database operations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1sd8her7x5z5g0vgb2kz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1sd8her7x5z5g0vgb2kz.png" alt="Vacuum full" width="340" height="165"&gt;&lt;/a&gt;&lt;/p&gt;
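&lt;p&gt;A minimal pg_repack invocation might look like this (the database name is a placeholder; the extension must be created in the target database first):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Once, inside the target database
CREATE EXTENSION pg_repack;

# Then, from the shell: rebuild just this table online
pg_repack --table tombstone_demo mydb
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;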

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;While TOAST and tombstones are powerful mechanisms that support PostgreSQL’s flexibility and MVCC architecture, they also introduce performance trade-offs — especially in high-write or JSON-heavy applications.&lt;/p&gt;

&lt;p&gt;Monitoring tools like pg_stat_user_tables, using VACUUM VERBOSE, and periodically checking bloat via pgstattuple or pg_repack can go a long way in keeping your database healthy and efficient.&lt;/p&gt;

</description>
      <category>postgressql</category>
      <category>database</category>
      <category>postgres</category>
      <category>performance</category>
    </item>
    <item>
      <title>POSTGRESQL - GIN INDEX IN PRACTICE</title>
      <dc:creator>Mateus Rauli </dc:creator>
      <pubDate>Thu, 09 Jan 2025 20:53:44 +0000</pubDate>
      <link>https://forem.com/mateus-rauli/postgresql-indice-gin-na-pratica-2c91</link>
      <guid>https://forem.com/mateus-rauli/postgresql-indice-gin-na-pratica-2c91</guid>
      <description>&lt;p&gt;Today I'd like to share my experience with the GIN index, which led to a very positive performance improvement in some of the slowest queries of a system I worked on. &lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction to the GIN index
&lt;/h2&gt;

&lt;p&gt;GIN (Generalized Inverted Index) is an index designed to handle data types that are subdividable, that is, data that can be "split" into smaller parts, allowing individual values to be searched. It is widely used for queries on structured and unstructured data such as arrays, jsonb, and text fields for full-text search.&lt;/p&gt;

&lt;p&gt;Introduced in PostgreSQL 8.2, it quickly became an indispensable solution for scenarios that demand fast searches over complex data. Unlike the default B-TREE index, which is better suited to looking up single values, GIN is optimized for cases where the same value can be associated with many rows, such as the elements of an array.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I used the GIN index
&lt;/h2&gt;

&lt;p&gt;One day, I ran into a critical problem in the system: a query was causing significant slowness and hurting the overall performance of the application. After analyzing it, I noticed that its filter used a column of type ARRAY.&lt;/p&gt;

&lt;p&gt;Consulting the official PostgreSQL documentation, I found the GIN index as a possible solution. I decided to apply it to the problematic column to evaluate the impact. The result was impressive: the query's execution time dropped drastically and, with that, the system became visibly more responsive.&lt;/p&gt;

&lt;p&gt;The impact was even greater because the column where I applied the GIN index was widely used across the system, making this improvement a game changer for overall performance. The experience reinforced my understanding of why GIN is often called a "magic index".&lt;/p&gt;

&lt;h2&gt;
  
  
  Drawbacks of using the GIN index
&lt;/h2&gt;

&lt;p&gt;As we've seen so far, the index does bring several advantages when it comes to handling complex data, but nothing is perfect: it has drawbacks that should be considered and analyzed if you're thinking about adding it to your table. &lt;/p&gt;

&lt;p&gt;A major drawback is that write operations become more expensive, since the index must be updated to reflect changes in the data; in other words, every operation that modifies the data also requires updating the index, which is costly on tables with a high write frequency.&lt;/p&gt;

&lt;p&gt;In addition, GIN consumes more memory than other index types, both for storage and during queries, especially if the indexed column contains many unique or complex values. &lt;/p&gt;
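&lt;p&gt;GIN exposes a storage parameter that trades between these costs: with &lt;code&gt;fastupdate&lt;/code&gt; (on by default), new entries are buffered in a pending list and merged in bulk later, making writes cheaper at the expense of slightly less predictable reads. A sketch (the table and column names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Default behavior, shown explicitly: buffer new entries in a pending list
CREATE INDEX idx_docs_gin ON documents USING gin (payload) WITH (fastupdate = on);

-- Or pay the full cost at write time to keep reads predictable
CREATE INDEX idx_docs_gin_sync ON documents USING gin (payload) WITH (fastupdate = off);
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;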

&lt;h2&gt;
  
  
  Index creation example
&lt;/h2&gt;

&lt;p&gt;To illustrate the use of the index, I created a table called users containing only id and data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    data JSONB
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To create the index, just use USING to indicate which index type should be used:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;CREATE INDEX idx_data_gin ON users USING gin (data);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Done! With the index created, analyzing the query will now show that it uses the index to fetch the data.&lt;br&gt;
&lt;/p&gt;
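&lt;p&gt;The plan below came from a containment query of the following shape (the inserted row is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO users (data) VALUES ('{"name": "Mateus", "city": "SP"}');

EXPLAIN ANALYZE
SELECT *
FROM users u
WHERE data @&amp;gt; '{"name": "Mateus"}';
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;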

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query plan
"Bitmap Heap Scan on users u  (cost=12.00..16.01 rows=1 width=68) (actual time=0.022..0.023 rows=1 loops=1)"
"  Recheck Cond: (data @&amp;gt; '{""name"": ""Mateus""}'::jsonb)"
"  Heap Blocks: exact=1"
"  -&amp;gt;  Bitmap Index Scan on idx_data_gin  (cost=0.00..12.00 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)"
"        Index Cond: (data @&amp;gt; '{""name"": ""Mateus""}'::jsonb)"
"Planning Time: 0.120 ms"
"Execution Time: 0.061 ms"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that you will always see a Bitmap Index Scan in the execution plan, because that is the scan type compatible with the GIN index. In other words, this does not mean your index is being used inefficiently; it is simply the expected behavior. &lt;/p&gt;

&lt;h2&gt;
  
  
  Multicolumn GIN, an alternative to B-TREE vs. GIN
&lt;/h2&gt;

&lt;p&gt;In queries involving columns with distinct characteristics, it may be necessary to combine B-TREE and GIN indexes. An initial approach would be to create separate indexes: a B-TREE index for columns that work well with single values, and a GIN index for columns with more complex data. Although this solution works, it may not be efficient in terms of performance, since each index is evaluated separately.&lt;/p&gt;

&lt;p&gt;An interesting alternative is a multicolumn GIN index. With this approach, several columns can be indexed in a single index, covering different data types. However, for columns that are not subdividable (such as integer or timestamp), you must enable PostgreSQL's btree_gin extension. This extension allows those types to be indexed compatibly within a GIN index.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION btree_gin;
CREATE INDEX ON records USING gin (data, customer_id);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these commands you can create a multicolumn GIN index.&lt;/p&gt;

&lt;p&gt;Although it is a less common solution, it can be useful in specific scenarios. Keep in mind, however, that larger indexes lead to more I/O and additional cost on write operations such as inserts and updates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;With great power comes great responsibility.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Although GIN is extremely efficient in specific scenarios, it is not a universal solution for every query-performance problem. In many cases, the traditional B-TREE remains the most appropriate choice, especially for simple queries or lookups of single values. It is essential to carefully evaluate the problem at hand to determine whether adding an index is really necessary and, if so, which index type best meets the system's needs. &lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/gin.html" rel="noopener noreferrer"&gt;https://www.postgresql.org/docs/current/gin.html&lt;/a&gt;&lt;br&gt;
&lt;a href="https://pganalyze.com/blog/gin-index" rel="noopener noreferrer"&gt;https://pganalyze.com/blog/gin-index&lt;/a&gt;&lt;/p&gt;

</description>
      <category>postgressql</category>
      <category>database</category>
      <category>learning</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
