January 29, 2024
Summary: in this tutorial, you will learn how to check unused indexes in PostgreSQL.
Table of Contents
Why do indexes become unused?
Large databases typically have a life of their own. Many applications come, go and evolve, the Postgres server gets upgraded, configuration settings get tuned, hardware gets replaced, tables get restructured, and the queries, data and access patterns change.
The decision whether to use a certain index for a certain query is taken by PostgreSQL’s query planner, based on:
- it’s heuristics and algorithm – which may itself have changed as a result of Postgres version upgrades
- configuration settings – which may or may not have been tuned for current hardware
- statistics of the data in the table – which may have changed over time since the index was created
- the cost of using the index – using other, newer indexes or table scans would be cheaper
Over multiple releases of the applications that use the database, the original queries for which the index was created might itself not be executed anymore or anywhere as frequently. The application itself might have been refactored or rewritten with a new stack, a new ORM or totally new technology.
In theory it is possible to review each query of each application against the actual data in the database and decide on the optimal set of indexes. In practice, for large, long-lived databases in organizations with larger teams, it is nearly impossible. From a database operations point of view, it is more practical to include regular review and possible deletion of existing indexes as a maintenance task.
Why should I get rid of unused indexes?
Everybody knows that a database index is a good thing because it can speed up SQL queries. But this does not come for free.
The disadvantages of indexes are:
- Indexes use up space. It is not unusual for database indexes to use as much storage space as the data themselves. And the kind of reliable, fast storage you want for a database is not necessarily cheap. The space used up by indexes also increases the size and duration of physical backups.
- Indexes slow down data modification. Whenever you
INSERT
into orDELETE
from a table, all indexes have to be modified, in addition to the table itself (the “heap”). And it is much more expensive to modify the complicated data structure of an index than the heap itself, which has its name precisely because it is basically an unordered “pile” of data (and as everybody knows, maintaining order is more work than having a mess). Modifying an indexed table can easily be an order of magnitude more expensive than modifying an unindexed table. - Indexes prevent HOT updates. Because of the architecture of PostgreSQL, every
UPDATE
causes a new row version (“tuple”) to be written, and that causes a new entry in every index on the table. This behavior has been dubbed “write amplification” and has drawn a lot of fire. This undesirable effect can be avoided if a) the new tuple fits into the same table block as the old one and b) no indexed column is modified. Then PostgreSQL creates the new tuple as a “Heap Only Tuple” (hence HOT), which is much more efficient and also reduces the workVACUUM
has to do.
The many uses of indexes
Now we know that we don’t want unnecessary indexes. The problem is that indexes serve so many purposes that it is difficult to determine if a certain index is needed or not.
Here is a list of all benefits of indexes in PostgreSQL:
- Indexes can speed up queries that use indexed columns (or expressions) in the
WHERE
clause. The traditional B-tree index supports the<
,<=
,=
,>=
and>
operators, while the many other index types in PostgreSQL can support more exotic operators like “overlaps” (for ranges or geometries), “distance” (for words) or regular expression matches. - B-tree indexes can speed up the
max()
andmin()
aggregates. - B-tree indexes can speed up
ORDER BY
clauses. - Indexes can speed up joins. This depends on the “join strategy” chosen by the optimizer: hash joins, for example, will never make use of an index.
- A B-tree index on the origin of a
FOREIGN KEY
constraint avoids a sequential scan when rows are deleted (or keys modified) in the target table. A scan on the origin of the constraint is necessary to make sure that the constraint will not be violated by the modification. - Indexes are used to enforce constraints. Unique B-tree indexes are used to enforce
PRIMARY KEY
andUNIQUE
constraints, while exclusion constraints use GiST indexes. - Indexes can provide the optimizer with better value distribution statistics.
If you create an index on an expression,
ANALYZE
and the autoanalyze daemon will not only collect statistics for the data distribution in table columns, but also for each expression that occurs in an index. This helps the optimizer to get a good estimate for the “selectivity” of complicated conditions that contain the indexed expression, which causes better plans to be chosen. This is a widely ignored benefit of indexes!
Find the unused indexes
The following query will show you all indexes that serve none of the above mentioned purposes.
It makes use of the fact that all uses of indexes in the above list with the exception of the last two result in an index scan.
For completeness’ sake, I have to add that the parameter track_counts
has to remain “on” for the query to work, otherwise index usage is not tracked in pg_stat_user_indexes
. But you must not change that parameter anyway, otherwise autovacuum will stop working.
To find the indexes that have never been used since the last statistics reset with pg_stat_reset()
, use
SELECT s.schemaname,
s.relname AS tablename,
s.indexrelname AS indexname,
pg_relation_size(s.indexrelid) AS index_size
FROM pg_catalog.pg_stat_user_indexes s
JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0 -- has never been scanned
AND 0 <> ALL (i.indkey) -- no index column is an expression
AND NOT i.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint c
WHERE c.conindid = s.indexrelid)
AND NOT EXISTS -- is not an index partition
(SELECT 1 FROM pg_catalog.pg_inherits AS inh
WHERE inh.inhrelid = s.indexrelid)
ORDER BY pg_relation_size(s.indexrelid) DESC;
Some remarks:
- Don’t do that on your test database, but on the production database!
- If your software is running at several customer sites, run the query on all of them. Different users have different ways to use a software, which can cause different indexes to be used.
- You can replace
s.idx_scan = 0
in the query with a different condition, e.g.s.idx_scan < 10
. Indexes that are very rarely used are also good candidates for removal.