PostgreSQL Subtransaction Problem: Subtransaction cache overflow

By John Doe May 21, 2024

Summary: In this article, we will know a subtransaction problem in PostgreSQL: subtransaction cache overflow.

Table of Contents

Getting ready for testing

To demonstrate the problems caused by overuse of subtransactions, I created a test table:

CREATE UNLOGGED TABLE contend (
    id integer PRIMARY KEY,
    val integer NOT NULL
)
WITH (fillfactor='50');

INSERT INTO contend (id, val)
SELECT i, 0
FROM generate_series(1, 10000) AS i;

VACUUM (ANALYZE) contend;

The table is small, unlogged and has a low fillfactor to reduce the required I/O as much as possible. This way, I can observe the effects of subtransactions better.

I’ll use pgbench, the benchmarking tool shipped with PostgreSQL, to run the following custom SQL script:

BEGIN;
PREPARE sel(integer) AS
   SELECT count(*)
   FROM contend
   WHERE id BETWEEN $1 AND $1 + 100;
PREPARE upd(integer) AS
   UPDATE contend SET val = val + 1
   WHERE id IN ($1, $1 + 10, $1 + 20, $1 + 30);

SAVEPOINT a;
set rnd random(1,990)
EXECUTE sel(10 * :rnd + :client_id + 1);
EXECUTE upd(10 * :rnd + :client_id);

SAVEPOINT a;
set rnd random(1,990)
EXECUTE sel(10 * :rnd + :client_id + 1);
EXECUTE upd(10 * :rnd + :client_id);

...

SAVEPOINT a;
set rnd random(1,990)
EXECUTE sel(10 * :rnd + :client_id + 1);
EXECUTE upd(10 * :rnd + :client_id);

DEALLOCATE ALL;
COMMIT;

The script will set 60 savepoints for test number 1, and 90 for test number 2. It uses prepared statements to minimize the overhead of query parsing.

pgbench will replace :client_id with a number unique to the database session. So as long as there are no more than 10 clients, each client’s UPDATEs won’t collide with those of other clients, but they will SELECT each other’s rows.

Performance testing

Since my machine has 8 cores, I’ll run the tests with 6 concurrent clients for ten minutes.

To see meaningful information with “perf top”, you need the PostgreSQL debugging symbols installed. This is particularly recommended on production systems.

Test 1 (60 subtransactions)

$ pgbench -f subtrans.sql -n -c 6 -T 600

transaction type: subtrans.sql
scaling factor: 1
query mode: simple
number of clients: 6
number of threads: 1
duration: 600 s
number of transactions actually processed: 100434
latency average = 35.846 ms
tps = 167.382164 (including connections establishing)
tps = 167.383187 (excluding connections establishing)

This is what “perf top --no-children --call-graph=fp --dsos=/usr/pgsql-12/bin/postgres” shows while the test is running:

+    1.86%  [.] tbm_iterate
+    1.77%  [.] hash_search_with_hash_value
     1.75%  [.] AllocSetAlloc
+    1.36%  [.] pg_qsort
+    1.12%  [.] base_yyparse
+    1.10%  [.] TransactionIdIsCurrentTransactionId

Test 2 (90 subtransactions)

$ pgbench -f subtrans.sql -n -c 6 -T 600

transaction type: subtrans.sql
scaling factor: 1
query mode: simple
number of clients: 6
number of threads: 1
duration: 600 s
number of transactions actually processed: 41400
latency average = 86.965 ms
tps = 68.993634 (including connections establishing)
tps = 68.993993 (excluding connections establishing)

This is what “perf top --no-children --call-graph=fp --dsos=/usr/pgsql-12/bin/postgres” has to say:

+   10.59%  [.] LWLockAttemptLock
+    7.12%  [.] LWLockRelease
+    2.70%  [.] LWLockAcquire
+    2.40%  [.] SimpleLruReadPage_ReadOnly
+    1.30%  [.] TransactionIdIsCurrentTransactionId

Even if we take into account that the transactions in test 2 are one longer, this is still a performance regression of 60% compared with test 1.

Implementation of subtransactions

To understand what is going on, we have to understand how transactions and subtransactions are implemented.

Whenever a transaction or subtransaction modifies data, it is assigned a transaction ID. PostgreSQL keeps track of these transaction IDs in the commit log, which is persisted in the pg_xact subdirectory of the data directory.

However, there are some differences between transactions and subtransactions:

  • Each subtransaction has a containing transaction or subtransaction (the “parent”)
  • Committing a subtransaction does not require a WAL flush
  • There can be only one transaction per database session, but many subtransactions

The information which (sub)transaction is the parent of a given subtransaction is persisted in the pg_subtrans subdirectory of the data directory. Since this information becomes obsolete as soon as the containing transaction has ended, this data do not have to be preserved across a shutdown or crash.

Subtransactions and visibility

The visibility of a row version (“tuple”) in PostgreSQL is determined by the xmin and xmax system columns, which contain the transaction ID of the creating and destroying transactions. If the transaction ID stored is that of a subtransaction, PostgreSQL also has to consult the state of the containing (sub)transaction to determine if the transaction ID is valid or not.

To determine which tuples a statement can see, PostgreSQL takes a snapshot of the database at the beginning of the statement (or the transaction). Such a snapshot consists of:

  • a maximum transaction ID: everything from that transaction on is invisible
  • the list of transactions and subtransactions that were active when the snapshot was taken
  • the command number of the earliest visible command in the current (sub)transaction

A snapshot is initialized by looking at the process array, which is stored in shared memory and contains information about all currently running backends. This, of course, contains the current transaction ID of the backend and has room for at most 64 non-aborted subtransactions per session. If there are more than 64 such subtransactions, the snapshot is marked as suboverflowed.

Analysis of test results

A suboverflowed snapshot does not contain all data required to determine visibility, so PostgreSQL will occasionally have to resort to pg_subtrans. These pages are cached in shared buffers, but you can see the overhead of looking them up in the high rank of SimpleLruReadPage_ReadOnly in the perf output. Other transactions have to update pg_subtrans to register subtransactions, and you can see in the perf output how they vie for lightweight locks with the readers.