PostgreSQL Tutorial: LZ4 compression for TOAST tables

June 13, 2024

Summary: This tutorial introduces LZ4 TOAST compression feature which is available in PostgreSQL version 14, and demonstrates its usage.

Table of Contents

Background

TOAST is a mechanism PostgreSQL uses to keep physical data rows from exceeding the size of a data block (typically 8KB). PostgreSQL does not support physical rows that cross block boundaries, so the block size is a hard upper limit on row size. To allow user tables to have rows wider than this, the TOAST mechanism breaks up wide field values into smaller pieces, which are stored “out of line” in a TOAST table associated with the user table.

Each table you create has its own associated (unique) TOAST table, which may or may not ever end up being used, depending on the size of rows you insert. All of this is transparent to the user, and enabled by default.

When a row that is to be stored is “too wide” (the threshold for that is 2KB by default), the TOAST mechanism first attempts to compress any wide field values. If that isn’t enough to get the row under 2KB, it breaks up the wide field values into chunks that get stored in the associated TOAST table. Each original field value is replaced by a small pointer that shows where to find this “out of line” data in the TOAST table. TOAST will attempt to squeeze the user-table row down to 2KB in this way, but as long as it can get below 8KB, that’s good enough and the row can be stored successfully.

TOAST compression method

Prior to version 14, there is only one algorithm to compress the data in PostgreSQL that is pglz. Which is a very old homegrown algorithm. There is now a per-column COMPRESSION option which can be set to either pglz (which is default) or lz4. A new postgresql.conf configuration parameter, default_toast_compression, is introduced to set the default compression method with default value being pglz. The value of this configuration parameter is used for new table columns when no compression method is specified. We don’t have lz4 support in the core PostgreSQL, so to use lz4 compression, PostgreSQL must be built –with-lz4.

Example

Let’s create two tables: One with the default compression and one with the new LZ4 compression:

CREATE TABLE t1 ( a text );

CREATE TABLE t2 ( a text compression lz4 );

Both tables automatically got a toast table attached:

SELECT oid, relname FROM pg_class WHERE oid IN
  (SELECT reltoastrelid FROM pg_class WHERE relname IN ('t1', 't2'));
  oid  |    relname
-------+----------------
 16387 | pg_toast_16384
 16392 | pg_toast_16389
(2 rows)

\d+ pg_toast.pg_toast_16384
TOAST table "pg_toast.pg_toast_16384"
   Column   |  Type   | Storage
------------+---------+---------
 chunk_id   | oid     | plain
 chunk_seq  | integer | plain
 chunk_data | bytea   | plain
Owning table: "public.t1"
Indexes:
    "pg_toast_16384_index" PRIMARY KEY, btree (chunk_id, chunk_seq)
Access method: heap

Let’s check if there is a difference if we populate those tables with some dummy data:

\timing on

INSERT INTO t1(a) SELECT lpad('a',1000000,'a') FROM generate_series(1,1000);
Time: 10943.866 ms (00:10.944)

INSERT INTO t2(a) SELECT lpad('a',1000000,'a') FROM generate_series(1,1000);
Time: 280.503 ms

This is a huge difference, and I’ve repeated that test several times and got almost the same numbers. That’s really a great improvement regarding speed. But what about the size on disk?

SELECT pg_size_pretty(pg_relation_size('pg_toast.pg_toast_16384')) AS toast1_size;
 toast1_size
-------------
 12 MB
(1 row)

SELECT pg_size_pretty(pg_relation_size('pg_toast.pg_toast_16389')) AS toast2_size;
 toast2_size
-------------
 4000 kB
(1 row)

Quite impressive. In addition to the speed, we also get a great reduction on disk. Because compression is better with LZ4, we see fewer rows in the toast table for t2:

SELECT count(*) FROM pg_toast.pg_toast_16384 AS toast1_rows;
 count
-------
  6000
(1 row)

SELECT count(*) FROM pg_toast.pg_toast_16389 AS toast2_rows;
 count
-------
  2000
(1 row)

Of course the string I used here is not very representative, but this new feature really looks promising. There is also a new parameter if you want to change to this behavior globally;

SHOW default_toast_compression;
 default_toast_compression
---------------------------
 pglz
(1 row)

ALTER SYSTEM SET default_toast_compression = 'lz4';

SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)