PostgreSQL Tutorial: Altering a column with minimal downtime

July 8, 2024

Summary: in this tutorial, you will learn how to alter a column with minimal downtime in PostgreSQL.

Background

Assume that there is an id column in your table: an auto-incrementing integer field, whose value can go up to around 2.1 billion in PostgreSQL. Two billion may sound like a lot, but for commercial use cases it is not always enough. If one of your tables is approaching this limit, you must do something before it is reached, at which point it will no longer be possible to add new rows to the table.
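To gauge how much headroom is left, a quick check of the current maximum against the 4-byte integer ceiling helps (a sketch; it assumes id is the column approaching the cap and is indexed, so max() is cheap):

SELECT max(id) AS current_max,
       2147483647 - max(id) AS ids_remaining
FROM large_table;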

Of course, a good precaution would have been to make this field bigint (an 8-byte integer) in the first place, but it may be too late for that. The most straightforward answer is to run an ALTER command:

ALTER TABLE large_table ALTER COLUMN id TYPE BIGINT;

The issue is that it's not that simple! Because of the MVCC model Postgres operates on, altering a column's type will usually cause a rewrite of the entire table, with few exceptions. While it does so, it holds an exclusive lock on the table, which means no one else can access it.
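If you want to confirm whether a given ALTER actually rewrote the table, one way is to compare pg_class.relfilenode before and after - it changes whenever Postgres writes out a new copy of the table (a sketch):

SELECT relfilenode FROM pg_class WHERE relname = 'large_table';
-- run the ALTER, then run this query again: a different relfilenode means a full rewrite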

If the table is very large, we are looking at several days of downtime for the table. There are live processes that depend on this table; they can be paused temporarily, but definitely not for several days.

Replicating the column with a trigger

Fortunately, triggers come to our rescue. A trigger is a feature of Postgres (and of many other relational databases) that executes queries automatically upon certain events. It allows us to replicate the id column into another column of a different data type.

The procedure will be something like this:

1. Create a new bigid column, of type bigint:

ALTER TABLE large_table ADD COLUMN bigid BIGINT NULL;

2. Add a trigger to the table, so that bigid is updated whenever a new row is inserted.

CREATE OR REPLACE FUNCTION update_large_table_bigid() RETURNS TRIGGER AS $BODY$
BEGIN
    NEW.bigid = NEW.id;
    RETURN NEW;
END
$BODY$ LANGUAGE PLPGSQL;

CREATE TRIGGER large_table_bigid_update
BEFORE INSERT ON large_table
FOR EACH ROW EXECUTE PROCEDURE update_large_table_bigid();

3. Backfill bigid for existing rows:

UPDATE large_table SET bigid = id WHERE bigid IS NULL;

Step 3 is bound by your system's I/O capability and is the most time-consuming part of the entire migration. You should optimise it by breaking it down into small batches (e.g. only updating a small id range at a time), so that it doesn't lock your entire table at once and can execute more efficiently. It may still take several days to finish; if you have a couple of weeks before the deadline, you can live with that.
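One way to batch it is a simple loop over id ranges. The sketch below assumes PostgreSQL 11 or later (which allows COMMIT inside a DO block when it is not wrapped in an outer transaction) and an arbitrary batch size of 100,000 - tune both to your environment:

DO $$
DECLARE
    batch_start BIGINT := 0;
    batch_size  BIGINT := 100000;
    max_id      BIGINT;
BEGIN
    SELECT max(id) INTO max_id FROM large_table;
    WHILE batch_start <= max_id LOOP
        UPDATE large_table
        SET bigid = id
        WHERE id >= batch_start
          AND id < batch_start + batch_size
          AND bigid IS NULL;
        COMMIT;  -- release locks between batches instead of holding them for days
        batch_start := batch_start + batch_size;
    END LOOP;
END
$$;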

Once these steps are done, the bigid column will have exactly the same values as id. The migration can then be completed by running the following queries in a transaction:

BEGIN;
LOCK TABLE large_table IN SHARE ROW EXCLUSIVE MODE;
DROP TRIGGER large_table_bigid_update ON large_table;
ALTER SEQUENCE large_table_seq OWNED BY large_table.bigid;
ALTER TABLE large_table ALTER COLUMN bigid SET DEFAULT nextval('large_table_seq');
ALTER TABLE large_table DROP COLUMN id;
ALTER TABLE large_table RENAME COLUMN bigid TO id;
COMMIT;

Two things to keep in mind:

  • We need to move the sequence over, so that bigid can auto-increment (see the check after this list)
  • Table large_table must be locked at the beginning of the transaction, to prevent new rows being inserted for the duration
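Once the transaction has committed, pg_get_serial_sequence can confirm that the sequence did follow the column:

SELECT pg_get_serial_sequence('large_table', 'id');
-- should return the large_table_seq sequence (schema-qualified), now owned by the renamed column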

This transaction should take no more than a few seconds, a great improvement over several days of downtime!
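One extra safeguard, not part of the steps above: set a lock_timeout inside the transaction, so the LOCK TABLE statement fails fast and can simply be retried instead of queueing behind a long-running query and blocking every other writer while it waits. A sketch:

BEGIN;
SET LOCAL lock_timeout = '5s';  -- give up quickly and retry rather than block others
LOCK TABLE large_table IN SHARE ROW EXCLUSIVE MODE;
-- ...the rest of the migration statements from above...
COMMIT;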

Making it a primary key

You may have noticed something missing in the process above - the new id column is not a primary key. As this column is nullable, making it a primary key involves a NOT NULL constraint check and the addition of a unique index. This will again lock the table and incur further downtime.

We can shorten the downtime by creating the unique index beforehand and telling Postgres to use this index when creating the primary key constraint:

CREATE UNIQUE INDEX CONCURRENTLY large_table_uniq ON large_table(id);
ALTER TABLE large_table ADD PRIMARY KEY USING INDEX large_table_uniq;

Note the CONCURRENTLY keyword - it allows the index to be built without acquiring an exclusive lock on the table. Of course, adding the primary key will still lock the table, as it needs to enforce the NOT NULL constraint, but it should be a lot faster since it can skip the index-creation phase.
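One caveat with CONCURRENTLY: if the build fails or is cancelled, it leaves behind an invalid index that must be dropped and recreated. Leftovers can be spotted with a query like this (a sketch):

SELECT c.relname
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid;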

An alternative is to create an invalid NOT NULL CHECK constraint first and then validate it:

ALTER TABLE large_table ADD CONSTRAINT large_table_not_null CHECK (id IS NOT NULL) NOT VALID;
ALTER TABLE large_table VALIDATE CONSTRAINT large_table_not_null;

The validation acquires a SHARE UPDATE EXCLUSIVE lock, so other connections can still perform reads and writes. With this approach the table won't have a primary key, but the unique index plus the NOT NULL constraint are functionally equivalent, and we face no downtime here.
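If you do want a genuine primary key after all, note that on PostgreSQL 12 and later a validated CHECK constraint like the one above lets SET NOT NULL skip its full-table scan, so the remaining exclusive lock is brief. A sketch, assuming the constraint has been validated and the unique index from earlier exists:

ALTER TABLE large_table ALTER COLUMN id SET NOT NULL;             -- fast: proven by the CHECK constraint
ALTER TABLE large_table ADD PRIMARY KEY USING INDEX large_table_uniq;
ALTER TABLE large_table DROP CONSTRAINT large_table_not_null;     -- redundant once the column is NOT NULL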

Handling dependencies

So far so good - this plan works perfectly if large_table is a standalone table. Now assume that one of your tables depends on it; let's call that table reftable, and give it a foreign key constraint referencing large_table's id column. We need to migrate the referencing column, large_table_id, together with the constraint.

Similar to the large_table table, we replicate the large_table_id column into a new large_table_bigid column.
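Before setting up the trigger, reftable needs its own bigint column for the trigger and backfill to write into; a statement mirroring the earlier ADD COLUMN does the job:

ALTER TABLE reftable ADD COLUMN large_table_bigid BIGINT NULL;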

1. First, add a trigger:

CREATE OR REPLACE FUNCTION update_reftable_large_table_bigid() RETURNS TRIGGER AS $BODY$
BEGIN
    NEW.large_table_bigid = NEW.large_table_id;
    RETURN NEW;
END
$BODY$ LANGUAGE PLPGSQL;

CREATE TRIGGER reftable_large_table_bigid_update
BEFORE INSERT ON reftable
FOR EACH ROW EXECUTE PROCEDURE update_reftable_large_table_bigid();

2. Then backfill existing rows:

UPDATE reftable SET large_table_bigid = large_table_id WHERE large_table_bigid IS NULL;

Again, break it down into batches and perhaps parallelise.

3. Add an index concurrently:

CREATE INDEX CONCURRENTLY reftable_large_table_bigid ON reftable(large_table_bigid);

We need a foreign key constraint on this column, referencing the large_table table. Adding it naively would lock the table, so we fall back to the "create an invalid constraint and validate it" approach. Note that the referenced column must already be covered by a unique index before the foreign key can be added.

ALTER TABLE reftable ADD CONSTRAINT reftable_large_table_bigid_fk
  FOREIGN KEY (large_table_bigid) REFERENCES large_table(bigid) NOT VALID;
ALTER TABLE reftable VALIDATE CONSTRAINT reftable_large_table_bigid_fk;

Last but not least, run this foreign key migration snippet before finalising the large_table migration:

BEGIN;
LOCK TABLE reftable IN SHARE ROW EXCLUSIVE MODE;
ALTER TABLE reftable DROP CONSTRAINT reftable_large_table_id_fk;
ALTER TABLE reftable RENAME CONSTRAINT reftable_large_table_bigid_fk TO reftable_large_table_id_fk;
DROP TRIGGER reftable_large_table_bigid_update ON reftable;
ALTER TABLE reftable DROP COLUMN large_table_id;
ALTER INDEX reftable_large_table_bigid RENAME TO reftable_large_table_id;
ALTER TABLE reftable RENAME COLUMN large_table_bigid TO large_table_id;
COMMIT;

Conclusions

All done. With triggers and NOT VALID constraints, we can reduce the migration downtime from several days to just a few hours. And remember that 99% of that remaining downtime comes from adding the primary key - if you are migrating a non-primary-key column, if the primary key can be left out, or if you don't mind using a NOT NULL CHECK constraint instead, the downtime can be reduced dramatically, to just minutes or even seconds.