Difference Between GROUP BY and PARTITION BY

By John Doe April 8, 2023

Summary: The GROUP BY and PARTITION BY clauses are both used in SQL to group data and calculate aggregate values, then what’s the difference between them?

Table of Contents

Introduction

The GROUP BY and PARTITION BY clauses are used in SQL to group data and calculate aggregate values. The GROUP BY clause divides the result set into groups and returns one row per group, while the PARTITION BY clause divides the result set into partitions and returns aggregated columns with each record in the specified table. The GROUP BY clause is often used with aggregate functions like SUM() and AVG() to calculate sums or averages for each group. The PARTITION BY clause does not reduce the number of rows returned. The PARTITION BY clause gives more flexibility in choosing the grouping columns and allows for different columns in each partition.

Example of GROUP BY

Here is an example that shows how to compare the average salaries of employees in each department:

SELECT depname, avg(salary) FROM empsalary GROUP BY depname;
  depname  |          avg
-----------+-----------------------
 develop   | 5020.0000000000000000
 personnel | 3700.0000000000000000
 sales     | 4866.6666666666666667
(3 rows)

The first output column comes directly from the table empsalary, and the column should be specified as a column in GROUP BY clause. The second column represents the average salary value in the group.

Example of PARTITION BY

Here is an example that shows how to compare each employee’s salary with the average salary in his or her department:

SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
  depname  | empno | salary |          avg
-----------+-------+--------+-----------------------
 develop   |    11 |   5200 | 5020.0000000000000000
 develop   |     7 |   4200 | 5020.0000000000000000
 develop   |     9 |   4500 | 5020.0000000000000000
 develop   |     8 |   6000 | 5020.0000000000000000
 develop   |    10 |   5200 | 5020.0000000000000000
 personnel |     5 |   3500 | 3700.0000000000000000
 personnel |     2 |   3900 | 3700.0000000000000000
 sales     |     3 |   4800 | 4866.6666666666666667
 sales     |     1 |   5000 | 4866.6666666666666667
 sales     |     4 |   4800 | 4866.6666666666666667
(10 rows)

The first three output columns come directly from the table empsalary, and there is one output row for each row in the table. The fourth column represents an average taken across all the table rows that have the same depname value as the current row. (This actually is the same function as the non-window avg aggregate, but the OVER clause causes it to be treated as a window function and computed across the window frame.)

Summary

Although we use a GROUP BY most of the time, there are numerous cases when a PARTITION BY would be a better choice. In some cases, you could use a GROUP BY using subqueries to simulate a PARTITION BY, but these can end up with very complex queries.

Let’s wrap everything up with the most important similarities and differences:

  • Similarity: Both are used to return aggregated values.
  • Difference: Using a GROUP BY clause collapses original rows, then you cannot access the original values later in the query. On the other hand, using a PARTITION BY clause keeps original values while also allowing us to produce aggregated values.
  • Difference: The PARTITION BY is combined with OVER() and windows functions to add a lot more functionalities.