PostgreSQL Tutorial: Splitting wide tables

June 12, 2024

Summary: In this tutorial, you will learn how to split wide tables in PostgreSQL.

Table of Contents

Introduction

Consider all the attributes of an entity that has rows that are too wide for good performance. Look for some theme or principle to divide them into two groups. Then split the table into two tables, a primary table and a companion table, repeating the primary key in each one.

The shorter rows allow you to query or update each table quickly. For example, we have a table defined as following:

CREATE TABLE orders (
  order_num     integer primary key,
  order_date    date,
  customer_num  integer,
  ship_instruct varchar(40),
  ship_date     date,
  ship_weight   decimal(8,2),
  ship_charge   money(6,2),
  paid_date     date);

Division by Bulk

One principle on which you can divide an entity table is bulk. Move the bulky attributes, which are usually character strings, to the companion table. Keep the numeric and other small attributes in the primary table. In the above example, you can split the ship_instruct column from the orders table. You can call the companion table orders_ship. It has two columns, a primary key that is a copy of orders.order_num and the original ship_instruct column.

Division by Frequency of Use

Another principle for division of an entity is frequency of use. If a few attributes are rarely queried, move them to a companion table. In the above table, for example, perhaps only one program queries the ship_instruct, ship_weight, and ship_charge columns. In that case, you can move them to a companion table.

Division by Frequency of Update

Updates take longer than queries, and updating programs lock data pages and rows of data during the update process, preventing querying programs from accessing the tables. If you can separate one table into two companion tables, one with the most-updated entities and the other with the most-queried entities, you can often improve overall response time.

PostgreSQL MVCC works by saving an internal copy of updated or deleted rows (also called tuples) until a transaction is either committed or rolled back. This saved internal copy is invisible to users. However, table bloat can occur when those invisible copies aren’t cleaned up regularly by the VACUUM or AUTOVACUUM utilities. You can separate the frequently updated columns into an individual table, to save storage costs and speed up your database system.

Performance Costs of Splitting Tables

Splitting a table uses extra disk space and adds complexity. Two copies of the primary key occur for each row, one copy in each table. Two primary-key indexes also exist.

You must modify existing programs, reports, and forms that use SELECT * because fewer columns are returned. Programs, reports, and forms that use attributes from both tables must perform a join to bring the tables together.

In this case, when you insert or delete a row, two tables are altered instead of one. If you do not coordinate the alteration of the two tables (by making them within a single transaction, for example), you lose semantic integrity.

See more

PostgreSQL Optimization