PostgreSQL Tutorial: Improving VACUUM processing speed

July 17, 2024

Summary: In this tutorial, you will learn how to improve VACUUM processing speed in PostgreSQL.

Table of Contents

An autovacuum related problem you might encounter is that your tables are being vacuumed too slowly. This may manifest as bloat growing because your rate of cleaning up bloat is slower than your transaction rate. Or, you will see vacuum processes running constantly on your system when you check pg_stat_activity.

There are a few ways you can speed up vacuuming: these recommendations apply both to autovacuum and to manually triggered VACUUM.

Reducing the impact of cost limiting

The first thing you should check is if you have cost limiting enabled. When vacuum is running, the system maintains a counter that tracks estimated cost of different I/O operations. When that cost exceeds autovacuum_vacuum_cost_limit (or vacuum_cost_limit), the process sleeps for autovacuum_vacuum_cost_delay (or vacuum_cost_delay) ms. This is called cost limiting and is done to reduce the impact of vacuuming on other processes.

If you notice that vacuum is falling behind, you could disable cost limiting (by setting autovacuum_vacuum_cost_delay to 0) or reduce its impact by either decreasing autovacuum_vacuum_cost_delay or increasing autovacuum_vacuum_cost_limit to a high value (like 10000).

Increasing the number of parallel workers

Autovacuum can only vacuum autovacuum_max_workers tables in parallel. So, if you have hundreds of tables being actively written to (and needing to be vacuumed), doing them 3 at a time might take a while (3 is the default value for autovacuum_max_workers).

Therefore, in scenarios with a large number of active tables, it might be worth increasing autovacuum_max_workers to a higher value—assuming you have enough compute to support running more autovacuum workers.

Before increasing the number of autovacuum workers, make sure that you are not being limited by cost limiting. Cost limits are shared among all active autovacuum workers, so just increasing the number of parallel workers may not help, as each of them will then start doing lesser work.

To find more ideas on what to tune, it might be worth looking into pg_stat_progress_vacuum to understand what phase your ongoing vacuums are in and how you can improve their performance. Let’s look at a few examples where it might give useful insights.

Improving scan speed by prefetching and caching

To see how fast vacuum is progressing, you could compare heap_blks_scanned with heap_blks_total in pg_stat_progress_vacuum over time. If you see progress is slow and the phase is scanning heap, that means vacuum needs to scan a lot of heap blocks to complete.

In this case, you can scan the heap faster by prefetching larger relations in memory by using something like pg_prewarm or by increasing shared_buffers.

Increasing memory to store more dead tuples

When scanning the heap, vacuum collects dead tuples in memory. The number of dead tuples it can store is determined by maintenance_work_mem (or autovacuum_work_mem, if set). Once the maximum number of tuples have been collected, vacuum must switch to vacuuming indexes and then return to scanning heap again after the indexes and heap are vacuumed (i.e. after an index vacuuming cycle).

So, if you notice that index_vacuum_count in pg_stat_progress_vacuum is high — well, it means that vacuum is having to go through many such index vacuum cycles.

To reduce the number of cycles vacuum needs and to make it faster, you can increase autovacuum_work_mem so that vacuum can store more dead tuples per cycle.

Vacuum indexes in parallel

If you see that the phase in pg_stat_progress_vacuum is vacuuming indexes for a long time, you should check if you have a lot of indexes on the table being vacuumed.

If you have many indexes, you could make vacuuming faster by increasing max_parallel_maintenance_workers to process indexes in parallel. Note that this configuration change will help only if you manually run VACUUM commands. (Unfortunately, parallel vacuum is currently not supported for autovacuum.)

With all these recommendations, you should be able to speed up vacuuming significantly.

See more

PostgreSQL Optimization