Group
Partitions the rows into groups and applies a pipeline to each of the groups.
group {key_columns} (pipeline)
The partitioning of groups are determined by the key_columns (first argument).
The most conventional use of group is with aggregate:
PRQL
from employees
group {title, country} (
aggregate {
average salary,
ct = count salary
}
)
SQL
SELECT
title,
country,
AVG(salary),
COUNT(*) AS ct
FROM
employees
GROUP BY
title,
country
In concept, a transform in context of a group does the same transformation to
the group as it would to the table — for example finding the employee who joined
first across the whole table:
PRQL
from employees
sort join_date
take 1
SQL
SELECT
*
FROM
employees
ORDER BY
join_date
LIMIT
1
To find the employee who joined first in each department, it’s exactly the same
pipeline, but within a group expression:
PRQL
from employees
group role (
sort join_date # taken from above
take 1
)
SQL
WITH table_0 AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY role
ORDER BY
join_date
) AS _expr_0
FROM
employees
)
SELECT
*
FROM
table_0
WHERE
_expr_0 <= 1