PostgreSQL Tutorial: Tune the estimated execution cost of a function

February 17, 2024

Summary: in this tutorial, you will learn how to tune the estimated execution cost of a function.

Table of Contents

Introduction

When you reference/call functions in PostgreSQL the optimizer does not really know much about the cost nor the amount of rows that a function returns. This is not really surprising as it is hard to predict what the functions is doing and how many rows will be returned for a given set of input parameters. What you might not know is, that indeed you can tell the optimizer a bit more about your functions.

PostgreSQL offers ROWS and COST declarations for a function. These can be used for any PostgreSQL function written in any language. These declarations allow the function designer to dictate to the planner how many records to expect and provide a hint as to how expensive a function call is. COST is measured in CPU cycles. A higher COST number means more costly. For example a high cost function called in an AND where condition will not be called if any of the less costly functions result in a false evaluation. The number of ROWs as well as COST will give the planner a better idea of which strategy to use.

Use functions in WHERE clause

Let’s start with a little test setup:

CREATE TABLE t1 (id integer, str text);

INSERT INTO t1 (id, str)
SELECT a.i, repeat(chr(65 + mod(a.r, 26)), 10 + mod(a.r, 100)) as str
FROM
  (SELECT i, CAST(random() * 1000 AS int) + 1 AS r
     FROM generate_series(1, 1000) AS s(i)) AS a;

A simple table containing 1000 rows. In addition let’s create 2 simple functions:

CREATE FUNCTION fn_cheapfunction(id in int)
  RETURNS integer AS $$
BEGIN
 RETURN mod(id, 10);
END $$ LANGUAGE plpgsql;

CREATE FUNCTION fn_costlyfunction(str in text)
  RETURNS integer AS $$
BEGIN
 PERFORM pg_sleep(0.01); -- make this function slower
 RETURN length(str);
END $$ LANGUAGE plpgsql;

What is the optimizer doing when you call these functions in WHERE clause?

EXPLAIN (analyze) SELECT count(*) FROM t1
  WHERE (fn_costlyfunction(str) > 80 AND fn_cheapfunction(id) > 8);
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=526.28..526.29 rows=1 width=8) (actual time=15710.681..15710.681 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..526.00 rows=111 width=0) (actual time=293.123..15710.539 rows=27 loops=1)
         Filter: ((fn_costlyfunction(str) > 80) AND (fn_cheapfunction(id) > 8))
         Rows Removed by Filter: 973
 Planning time: 2.181 ms
 Execution time: 15714.909 ms
(6 rows)

Specify the function estimated cost

PostgreSQL offers a COST parameter for the function definition. It is a positive number giving the estimated execution cost for the function, in units of cpu_operator_cost. If the function returns a set, this is the cost per returned row. If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary.

To change the estimated execution cost of the function:

ALTER FUNCTION fn_cheapfunction(id in int) COST 10;      -- a lower cost
ALTER FUNCTION fn_costlyfunction(str in text) COST 1000; -- a higher cost

The planner will detect that fn_costlyfunction() is expensive and runs fn_cheapfunction() first even though it is second in order. Looking again at the execution plan again:

EXPLAIN (analyze) SELECT count(*) FROM t1
  WHERE (fn_costlyfunction(str) > 80 AND fn_cheapfunction(id) > 8);
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2776.28..2776.29 rows=1 width=8) (actual time=1575.873..1575.873 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=0.00..2776.00 rows=111 width=0) (actual time=36.927..1575.726 rows=27 loops=1)
         Filter: ((fn_cheapfunction(id) > 8) AND (fn_costlyfunction(str) > 80))
         Rows Removed by Filter: 973
 Planning time: 0.142 ms
 Execution time: 1575.958 ms
(6 rows)

Now, the query runs much more fast than before.

See more

PostgreSQL Optimization