PostgreSQL High Availability with Patroni

April 8, 2024

Summary: Patroni is a template for high availability PostgreSQL solutions using Python. For maximum accessibility, Patroni supports a variety of distributed configuration stores like ZooKeeper, etcd, Consul or Kubernetes.

Table of Contents

High Availability

PostgreSQL has been widely adopted as a modern, high-performance transactional database. A highly available PostgreSQL cluster can withstand failures caused by network outages, resource saturation, hardware failures, operating system crashes or unexpected reboots. Such cluster is often a critical component of the enterprise application landscape, where four nines of availability is a minimum requirement.

There are several methods to achieve high availability in PostgreSQL. This solution document provides Patroni - the open-source extension to facilitate and manage the deployment of high availability in PostgreSQL.

High availability methods

There are several native methods for achieving high availability with PostgreSQL:

  • shared disk failover,
  • file system replication,
  • trigger-based replication,
  • statement-based replication,
  • logical replication,
  • Write-Ahead Log (WAL) shipping, and
  • streaming replication

Streaming replication

Streaming replication is part of Write-Ahead Log shipping, where changes to the WALs are immediately made available to standby replicas. With this approach, a standby instance is always up-to-date with changes from the primary node and can assume the role of primary in case of a failover.

Why native streaming replication is not enough

Although the native streaming replication in PostgreSQL supports failing over to the primary node, it lacks some key features expected from a truly highly-available solution. These include:

  • No consensus-based promotion of a “leader” node during a failover
  • No decent capability for monitoring cluster status
  • No automated way to bring back the failed primary node to the cluster
  • A manual or scheduled switchover is not easy to manage

To address these shortcomings, there are a multitude of third-party, open-source extensions for PostgreSQL. The challenge for a database administrator here is to select the right utility for the current scenario.

PostgreSQL solves this challenge by providing the Patroni extension for achieving PostgreSQL high availability.

Patroni

Patroni is a template for you to create your own customized, high-availability solution using Python and - for maximum accessibility - a distributed configuration store like ZooKeeper, etcd, Consul or Kubernetes.

How Patroni works

With Patroni you can customize and automate PostgreSQL HA (High Availability) clusters. Based on modern consensus algorithms, it ensures high availability of your database and protects you from data loss by deciding which actions to perform in the cluster.

Patroni monitors liveliness of primary and replica and can change the configuration of all cluster members. It handles synchronicity requirements and planned switchovers, as well as unplanned failovers. Patroni performs those complex tasks automatically. Moreover, it can be guaranteed that certain conditions are always fulfilled in order to completely rule out irreversible damage to your data.

The architecture of Patroni is such that every PostgreSQL instance is accompanied by a designated Patroni instance that monitors and controls it.

Key benefits of Patroni:

  • Continuous monitoring and automatic failover
  • Manual/scheduled switchover with a single command
  • Built-in automation for bringing back a failed node to cluster again.
  • REST APIs for entire cluster configuration and further tooling.
  • Provides infrastructure for transparent application failover
  • Distributed consensus for every action and configuration.
  • Integration with Linux watchdog for avoiding split-brain syndrome.

Architecture layout

The following diagram shows the architecture of a three-node PostgreSQL cluster with a single-leader node.

Architecture of the three-node, single primary PostgreSQL cluster

Components

The components in this architecture are:

  • PostgreSQL nodes
  • Patroni - a template for configuring a highly available PostgreSQL cluster.
  • ETCD - a Distributed Configuration store that stores the state of the PostgreSQL cluster.
  • HAProxy - the load balancer for the cluster and is the single point of entry to client applications.
  • pgBackRest - the backup and restore solution for PostgreSQL
  • Postgres Monitoring and Management (PMM) - the solution to monitor the health of your cluster

How components work together

Each PostgreSQL instance in the cluster maintains consistency with other members through streaming replication. Each instance hosts Patroni - a cluster manager that monitors the cluster health. Patroni relies on the operational ETCD cluster to store the cluster configuration and sensitive data about the cluster health there.

Patroni periodically sends heartbeat requests with the cluster status to ETCD. ETCD writes this information to disk and sends the response back to Patroni. If the current primary fails to renew its status as leader within the specified timeout, Patroni updates the state change in ETCD, which uses this information to elect the new primary and keep the cluster up and running.

The connections to the cluster do not happen directly to the database nodes but are routed via a connection proxy like HAProxy. This proxy determines the active node by querying the Patroni REST API.