PostgreSQL Tutorial: Setting up Logical Replication

October 19, 2023

Summary: Built-in logical replication and logical decoding were introduced in PostgreSQL 10. Over a period of time, more and more features are being implemented within logical replication. It is helpful when a selected list of tables needs to be replicated from multiple OLTP databases to a single OLAP database for reporting and analysis.

Logical replication is also helpful to perform replication between two major PostgreSQL versions to perform rolling-fashion upgrades. In this tutorial, we shall discuss the steps involved in setting up logical replication between two PostgreSQL servers.

Getting ready

Logical replication uses a publish and subscribe model. The node that sends the changes becomes a publisher. And the node that subscribes to those changes becomes a subscriber. There can be one or more subscriptions to a publication. We could choose what to replicate – INSERT or DELETE or UPDATE or ALL. By default, it is ALL.

It is always recommended to enable logical replication for tables with primary keys or a column with unique and not null values. Such columns can be set as a replica identity. A replica identity will be used to uniquely identify a record in the event of updates or deletions especially. If that requirement is not met, we may use a replica identity of FULL. This means an entire record will be used as a key, but it may be inefficient.

How to do it…

We will initiate logical replication using the following steps:

1. Enable the required parameters:

$ psql -c "ALTER SYSTEM SET wal_level TO 'logical'"

2. Restart PostgreSQL:

$ pg_ctl -D $PGDATA restart -mf

3. Add all or one tables for replication:

$ psql -d redrock -c "CREATE PUBLICATION rockpub FOR ALL TABLES"

Or

$ psql -d redrock -c "CREATE PUBLICATION rockpub FOR TABLE scott.employee scott.departments"

4. Copy the schema from the publisher to the subscriber:

$ pg_dump -h publisher_server_ip -p 5432 -d redrock -Fc -s -U postgres | pg_restore -d redrock -h subscriber_node_ip -p 5432 -U postgres

5. Create the subscription:

$ psql -d redrock -c "CREATE SUBSCRIPTION rocksub CONNECTION 'host=publisher_server_ip dbname=redrock user=postgres password=secret port=5432' PUBLICATION rockpub"

To avoid copying pre-existing data:

$ psql -d redrock -c "CREATE SUBSCRIPTION rocksub CONNECTION 'host=publisher_server_ip dbname=redrock user=postgres password=oracle port=5432' PUBLICATION rockpub WITH (copy_data = false)"

How it works…

Setting up logical replication is as simple as setting up streaming replication but with a couple of additional steps or validations (sometimes). As seen in Step 1, we must ensure that both the PostgreSQL servers have wal_level set to logical. If it’s required to change it, a restart is required, which may be performed using the command seen in Step 2.

Once the change to wal_level is in effect, we could then create a publication on the master aka publisher. If you carefully observe Step 3, the first command shows how all the tables in the database redrock can be enabled for replication. However, if you need to add one or more tables, you could use the next command seen in the same step.

Once we have created the publication on the master, we will need to copy the schema of the tables that need to be replicated to the subscriber or the standby. This can be achieved using the command seen in Step 4. This command performs pg_dump using -s, which copies only the schema over PIPE to the subscriber. This can be executed on the publisher or the subscriber.

Once the schema is copied, we can simply create the subscription that will do all the work for us. When we create the subscription, we specify the name of the subscription along with the connection string of the publisher and the name of the publication.

As seen in Step 5, there are two ways to create the subscription. The first one helps us by copying all the pre-existing data and starts replication upon copying all the data. The second avoids copying the pre-existing data and starts replication from the point when the subscription was created.

The second approach is useful when we want to use multiple processes to copy and dump pre-existing data from the publisher to the subscriber manually. However, it involves some downtime as the data copy needs to be consistent. The first approach may not require any downtime as it takes care of both data copying and starting the replication appropriately, but they cannot be performed in parallel. So, the first approach may be recommended but may be slower for large PostgreSQL instances. Once Step 5 is performed, the logical replication setup is complete between the publisher and the subscriber.