PostgreSQL Tutorial: Measure the volume of data returned by a query and its network transmission cost

July 26, 2024

Summary: in this tutorial, you will learn how to measure the volume of data returned by a query and the elapsed time.

Introduction

PostgreSQL 17 introduced EXPLAIN (ANALYZE, SERIALIZE), which collects statistics about the volume of data emitted by a query, as well as the time taken to convert that data to the on-the-wire format. Previously there was no way to investigate this without actually sending the data to the client, in which case network transmission costs might swamp what you wanted to see. In particular, this feature lets you investigate the cost of de-TOASTing compressed or out-of-line data during output formatting.
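The option also takes an argument: per the PostgreSQL 17 EXPLAIN documentation, SERIALIZE accepts NONE (the behavior when the option is omitted), TEXT (the default when it is given), or BINARY, so you can compare the cost of either wire format:

EXPLAIN (ANALYZE, SERIALIZE) SELECT * FROM some_table;        -- text output format
EXPLAIN (ANALYZE, SERIALIZE BINARY) SELECT * FROM some_table; -- binary output format

(some_table is just a placeholder here.) Note that the serialized data is discarded rather than actually sent anywhere, so the measured time does not include network transmission.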

Example

Let’s look at a simple example, and start with a little test setup:

CREATE TABLE t_large (id integer, str text);

INSERT INTO t_large (id, str)
  SELECT i, repeat(chr(65 + mod(i, 26)), 8000) as str
    FROM generate_series(1, 300000) AS s(i);
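Before querying it, it can be instructive to check how much space this data actually occupies on disk. Since each str value repeats a single character 8000 times, it should compress extremely well under the default TOAST compression. A quick check, using pg_table_size() (which includes the table's TOAST data):

SELECT pg_size_pretty(pg_table_size('t_large'));

Assuming default compression, the stored size should come out far smaller than the roughly 2.4 GB of raw text the rows represent, which is exactly the gap this tutorial is about to measure.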

A simple table containing 300000 rows. Let’s see how long it would take to get all records:

EXPLAIN (analyze) SELECT * FROM t_large;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on t_large  (cost=0.00..8173.00 rows=300000 width=107) (actual time=0.019..36.160 rows=300000 loops=1)
 Planning Time: 0.179 ms
 Execution Time: 46.754 ms
(3 rows)

Pretty OK. But it's not the full picture. The thing is that while this did run the whole query, it never actually fetched the data from the TOASTed columns (and potentially some other places) into memory.

Now, with this new SERIALIZE option we can:

EXPLAIN (analyze, serialize) SELECT * FROM t_large;
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Seq Scan on t_large  (cost=0.00..8173.00 rows=300000 width=107) (actual time=0.027..38.856 rows=300000 loops=1)
 Planning Time: 0.063 ms
 Serialization: time=913.841 ms  output=2348329kB  format=text
 Execution Time: 971.575 ms
(4 rows)

Please note the new Serialization line, and the much higher total execution time. Reading the rows from the table took only ~47 ms. But fetching the data from TOAST and generating the output to send to the client took almost a full second.

Plus, please note that the plan shows rows=300000 with width=107, which would be ~ 30MB (300,000 rows × 107 bytes ≈ 32 MB).

But serialization showed that the full output was output=2348329kB, i.e. ~ 2.24GB: over 76 times more. The planner's width estimate is based on the stored (TOASTed, compressed) row size, while serialization measures the fully de-TOASTed output: 300,000 rows × 8,000 characters ≈ 2.4 GB, which matches.

This is a good feature for anyone working on query optimization. Finally, we can easily measure the volume of data returned by a query and the time it takes to produce it.
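If your client uses the binary protocol, the serialization cost can differ from the text format measured above. A follow-up check on the same table could be:

EXPLAIN (ANALYZE, SERIALIZE BINARY) SELECT * FROM t_large;

The Serialization line will then report format=binary instead of format=text, letting you compare both conversion costs without involving a client at all.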

See more

PostgreSQL Optimization