Franck Pachot for MongoDB

JSONB DeTOASTing (read amplification)

PostgreSQL tries to keep tuples under a quarter of the block size, which caps them at about 2KB with the default 8KB blocks. In document data modeling, where documents represent business transactions, sizes often exceed this limit. Storing an entire transaction document as a single JSONB value then triggers compression and out-of-line splitting via TOAST (The Oversized-Attribute Storage Technique). While this is fine for static documents that are infrequently accessed, it is less optimal for queries that access the document's fields. Let's take an example to detect this issue.
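
As a quick sanity check of these numbers (a sketch; the ~2KB threshold itself is a compile-time constant, TOAST_TUPLE_THRESHOLD, not a runtime setting):

-- The block size is 8192 bytes by default; TOAST kicks in when a tuple
-- exceeds roughly a quarter of it (about 2KB):
show block_size;
-- The compression applied before data is moved out of line (PostgreSQL 14+):
show default_toast_compression;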

I create a user profile table similar to the one in the previous post, but add a bio field with large text:


create table users (
  id bigserial primary key,
  data jsonb not null
);

INSERT INTO users (data)
SELECT
  jsonb_build_object(
    'name', 'u' || n::text,
    'bio', (SELECT string_agg(chr(trunc(random() * (126 - 33) + 33)::int), '')   FROM generate_series(1, 5000)),
    'email', jsonb_build_array(
      'u' || n::text || '@compuserve.com'
    ),
    'login', jsonb_build_object(
      'last', to_char(current_timestamp, 'YYYY-MM-DD'),
      'count', 1
    )
  )
FROM generate_series(1, 100000) n
;
vacuum analyze users
;
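
Before checking the catalog, a quick look at one document confirms it exceeds the ~2KB threshold (a sketch; the exact numbers vary with the random bio):

-- attstorage 'x' (EXTENDED) means: compress first, then move out of line when
-- the tuple is still too large.
select attname, attstorage
from pg_attribute
where attrelid = 'users'::regclass and attname = 'data';

-- Stored size of one datum vs. its serialized text; the random bio does not
-- compress well, so the value stays above the 2KB threshold and gets toasted:
select pg_column_size(data) as stored_bytes,
       octet_length(data::text) as text_bytes
from users
limit 1;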

I check the size of the table, and also the TOAST overflow:

with users_cte as (
  select * from pg_class
  where oid = 'users'::regclass
)
select oid, relname, relpages, reltuples, reltoastrelid
 from users_cte
union all
select oid, relname, relpages, reltuples, reltoastrelid
 from pg_class
where oid = (select reltoastrelid from users_cte)
;

  oid  |    relname     | relpages | reltuples | reltoastrelid 
-------+----------------+----------+-----------+---------------
 32314 | users          |      736 |    100000 |         32318
 32318 | pg_toast_32314 |    71430 |    300000 |             0


The table contains 100000 rows across 736 pages, with three chunks per row stored externally in the TOAST table. This results in a total of 736 + 71430 = 72166 pages. Each chunk is approximately (71430 * 8192) / 300000 = 1951 bytes, which keeps each TOAST tuple under the 2KB target.
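
The chunk layout can also be verified directly against the TOAST table (a sketch: pg_toast_32314 is the name from reltoastrelid above and will differ in your database, and reading it directly usually requires superuser privileges):

-- Each out-of-line value is split into (chunk_id, chunk_seq) rows of up to
-- TOAST_MAX_CHUNK_SIZE bytes, slightly under 2KB with 8KB blocks:
select count(*) as chunks,
       count(distinct chunk_id) as toasted_values,
       round(avg(octet_length(chunk_data))) as avg_chunk_bytes
from pg_toast.pg_toast_32314;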

I use EXPLAIN ANALYZE to query all documents, which shows the number of pages accessed. To actually retrieve the JSON documents, I add the SERIALIZE option, as EXPLAIN does not fetch the column values by default:

explain (analyze, verbose, buffers, wal, serialize text, costs off)
select id, data
from users
;

                                QUERY PLAN                                
--------------------------------------------------------------------------
 Seq Scan on public.users (actual time=0.040..10.973 rows=100000 loops=1)
   Output: id, data
   Buffers: shared read=736
 Planning Time: 0.038 ms
 Serialization: time=2482.359 ms  output=509831kB  format=text
   Buffers: shared hit=384887 read=72253
 Execution Time: 2504.164 ms


The table scan read 736 pages from the base table (shared read=736), while reading the JSONB content accessed 72253 pages of the TOAST table (shared read=72253). Each TOAST page was accessed an average of 384887 / 72253 ≈ 5 times. Fortunately the pages stayed in the shared buffer cache (shared hit=384887), but each access to shared buffers still costs CPU and lightweight locks.

We observe a fivefold read amplification when querying the JSONB column, as it requires de-toasting.
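
To confirm that the amplification comes from de-toasting rather than from the scan itself, a query that does not reference the JSONB column serves as a baseline (a sketch; expect only the 736 base-table pages in the buffer counters, with no TOAST access):

-- Only the heap pages appear under Buffers because the jsonb column
-- is never de-toasted:
explain (analyze, buffers, costs off)
select id
from users;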

In a document database, you can retrieve the entire document or access specific fields for filtering and sorting. For example, I include a projection of the "name" field.

explain (analyze, verbose, buffers, wal, serialize text, costs off)
select id, data
, (data->>'name')
from users
;

                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Seq Scan on public.users (actual time=0.085..532.367 rows=100000 loops=1)
   Output: id, data, (data ->> 'name'::text)
   Buffers: shared hit=384887 read=72989
 Planning Time: 0.039 ms
 Serialization: time=2276.025 ms  output=510222kB  format=text
   Buffers: shared hit=457140
 Execution Time: 2819.235 ms


PostgreSQL does not optimize repeated access to the same JSONB value, which can lead to de-toasting it multiple times. As indicated in the Output line of EXPLAIN VERBOSE, the scan projects the "name" field in addition to id and the whole document, and it accesses the TOAST pages many times (shared hit=384887 read=72989). Serializing the whole document then causes an additional shared hit=457140.

I continue by projecting one more field, "login.last":

explain (analyze, verbose, buffers, wal, serialize text, costs off)
select id, data
, (data->'login'->>'last')
, (data->>'name')
from users
;

                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Seq Scan on public.users (actual time=0.079..855.246 rows=100000 loops=1)
   Output: id, data, ((data -> 'login'::text) ->> 'last'::text), (data ->> 'name'::text)
   Buffers: shared hit=842027 read=72989
 Planning Time: 0.040 ms
 Serialization: time=2261.679 ms  output=511589kB  format=text
   Buffers: shared hit=457140
 Execution Time: 3128.017 ms


Even when both projections occur in the same scan, the JSONB document is de-toasted twice: once for each field, resulting in shared hit=842027 read=72989.

To avoid retrieving the entire document, I project only the fields I need. I also run the query with an additional field, "login.count", but exclude "data" from the SELECT statement:

explain (analyze, verbose, buffers, wal, serialize text, costs off)
select id
, (data->'login'->>'count')::int
, (data->'login'->>'last')
, (data->>'name')
from users
;

                                                                                                                                       QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.users (actual time=0.087..1149.260 rows=100000 loops=1)
   Output: id, (((data -> 'login'::text) ->> 'count'::text))::integer, ((data -> 'login'::text) ->> 'last'::text), (data ->> 'name'::text)
   Buffers: shared hit=1299167 read=72989
 Planning Time: 0.042 ms
 Serialization: time=21.837 ms  o


This saves the buffer accesses during result serialization, but the scan itself shows shared hit=1299167 read=72989. It read only the necessary data from disk: 72989 pages from the base and TOAST tables. However, to extract only three fields, it accessed the buffers 1299167 / 72989 ≈ 18 times per page, leading to excessive CPU usage and potential lightweight lock contention during concurrent access.

To work around this problem, do not use the JSON operators to access individual fields; instead, use a JSON function to extract the required fields into SQL columns, like this:

select "user"."name", "login".last, "login"."count"
from "users",                         -- the table with JSONB
 lateral jsonb_to_record("data")      -- first level fields
 as "user" ("name" text, login jsonb),
 lateral jsonb_to_record("login")     -- sub-document fields
 as "login" ("last" date, count int)

This de-toasts the JSONB document only once and gets back to the same number of shared buffer hits and reads as when reading a single field:

                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Nested Loop (actual time=0.092..793.016 rows=100000 loops=1)
   Output: "user".name, login.last, login.count
   Buffers: shared hit=384887 read=72989
   ->  Nested Loop (actual time=0.085..656.744 rows=100000 loops=1)
         Output: "user".name, "user".login
         Buffers: shared hit=384887 read=72989
         ->  Seq Scan on public.users (actual time=0.041..11.807 rows=100000 loops=1)
               Output: users.id, users.data
               Buffers: shared read=736
         ->  Function Scan on pg_catalog.jsonb_to_record "user" (actual time=0.006..0.006 rows=1 loops=100000)
               Output: "user".name, "user".login
               Function Call: jsonb_to_record(users.data)
               Buffers: shared hit=384887 read=72253
   ->  Function Scan on pg_catalog.jsonb_to_record login (actual time=0.001..0.001 rows=1 loops=100000)
         Output: login.last, login.count
         Function Call: jsonb_to_record("user".login)
 Planning Time: 0.068 ms
 Serialization: time=25.103 ms  output=3017kB  format=text
 Execution Time: 828.917 ms

Using jsonb_to_record(), the JSONB of each row is de-toasted once as a single document, and the individual fields are then extracted from the record in process memory. However, this approach cannot be used when the field access must appear in an index condition.
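
For example, an expression index matches only the exact expression written in the query, so a filter on the jsonb_to_record() output cannot use it, while the operator form can (a sketch, with a hypothetical index name):

-- Hypothetical expression index on the "name" field:
create index users_data_name_idx on users ((data ->> 'name'));
analyze users;

-- The operator form matches the indexed expression and is eligible for the index:
explain select id from users where data ->> 'name' = 'u42';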


In PostgreSQL, JSONB is a datatype for a single column, designed to store and retrieve the entire document. In contrast, a document database like MongoDB reads and writes individual fields, keeping the document in memory for quick access, like an object cache. Its WiredTiger storage engine decompresses the on-disk representation when loading it into memory and compresses it during cache eviction or checkpoint.

I do not recommend using PostgreSQL blindly as a document database. Instead, fields requiring individual access should be normalized into SQL columns rather than being embedded in a JSONB document. To identify suboptimal designs, examine the shared buffer hits using EXPLAIN (ANALYZE, BUFFERS, SERIALIZE). If you choose a document model, then use a document database that was built to access and index document fields efficiently.
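
One way to normalize a frequently accessed field without changing the write path is a stored generated column, which materializes the value in the heap tuple itself (a sketch, assuming PostgreSQL 12 or later; it duplicates the data and rewrites the table when added):

-- The extracted value lives in the base table row, so reading "name"
-- no longer de-toasts the whole jsonb document:
alter table users
  add column name text generated always as (data ->> 'name') stored;

explain (analyze, buffers, costs off)
select id, name from users;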

Top comments (1)

Franck Pachot
There was a discussion about this on LinkedIn saying that it is not specific to the JSON data type, and that's true: I can observe the same with an ARRAY: dbfiddle.uk/Sl2ipvEV (I thought it could use an extended format in memory to avoid that, but apparently not). However, I see more developers accessing multiple JSON fields because the API looks like accessing an in-memory object.
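
A quick way to reproduce the same effect with an array column (a sketch, not the exact fiddle linked above):

-- Large text[] values are toasted too; each arr[i] in the target list
-- de-toasts the value again, which shows up as extra shared buffer hits:
create table arr_demo (id bigserial primary key, arr text[]);
insert into arr_demo (arr)
select array(select md5(random()::text) from generate_series(1, 200))
from generate_series(1, 10000);
vacuum analyze arr_demo;
explain (analyze, buffers, costs off)
select arr[1], arr[2], arr[3] from arr_demo;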
