Top N rows from each group
Sample data
id | name | dep | salary |
---|---|---|---|
1 | Mike | Management | 5000 |
2 | Devon | Accounting | 1000 |
3 | Jane | Accounting | 1500 |
4 | Chiara | IT | 2000 |
5 | Alex | IT | 3000 |
6 | Tim | IT | 3000 |
Problem
For each department, find the N employee(s) with the highest salary.
Solutions
select distinct on (dep)
*
from
employees
order by
dep
, salary desc
id | name | dep | salary
----+------+------------+--------
3 | Jane | Accounting | 1500
5 | Alex | IT | 3000
1 | Mike | Management | 5000
Note
With DISTINCT ON
we are limited to exactly the first row from each group.
If there are multiple rows with the same salary in a group,
provide additional ORDER BY
columns to resolve the ambiguity.
Note
The DISTINCT ON
expression(s) must match the leftmost ORDER BY
expression(s).
ORDER BY
will usually contain additional expression(s)
that determine the desired precedence of rows within each group.
DISTINCT ON
accepts multiple columns.
select distinct on (department, sub_department)
...
DISTINCT ON
is a PostgreSQL extension to the SQL standard.
select
e.*
from
employees e
, lateral (
select
*
from
employees
where
dep = e.dep
order by salary desc
limit 1 -- or limit N
) sq
where
e.id = sq.id
order by
dep, salary desc
id | name | dep | salary
----+------+------------+--------
3 | Jane | Accounting | 1500
5 | Alex | IT | 3000
1 | Mike | Management | 5000
Note
If there are multiple rows with the same salary in a group,
provide additional ORDER BY
columns to the lateral subquery
to resolve the ambiguity.
Note
The ,
in the FROM
clause is shorthand for CROSS JOIN
.
select
*
from
(
select
*
, rank() over (partition by dep order by salary desc) rank
from
employees
) sq
where
rank = 1 -- or rank <= N
id | name | dep | salary | rank
----+------+------------+--------+------
3 | Jane | Accounting | 1500 | 1
5 | Alex | IT | 3000 | 1
6 | Tim | IT | 3000 | 1
1 | Mike | Management | 5000 | 1
select
*
from
employees e
where
id = (
select
id
from
employees
where
dep = e.dep
order by salary desc
limit 1 -- or limit N
)
)
id | name | dep | salary
----+------+------------+--------
1 | Mike | Management | 5000
3 | Jane | Accounting | 1500
5 | Alex | IT | 3000
Note
If there are multiple rows with the same salary in a group,
provide additional ORDER BY
columns to the subquery
to resolve the ambiguity.