Group

Partitions the rows into groups and applies a pipeline to each of the groups.

group [{key_columns}] {pipeline}

The partitioning of groups are determined by the key_columns (first argument).

The most conventional use of group is with aggregate:

PRQL

from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

In concept, a transform in context of a group does the same transformation to the group as it would to the table — for example finding the employee who joined first across the whole table:

PRQL

from employees
sort join_date
take 1

SQL

SELECT
  employees.*
FROM
  employees
ORDER BY
  join_date
LIMIT
  1

To find the employee who joined first in each department, it’s exactly the same pipeline, but within a group expression:

PRQL

from employees
group role (
  sort join_date  # taken from above
  take 1
)

SQL

WITH table_0 AS (
  SELECT
    employees.*,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _rn_82
  FROM
    employees
)
SELECT
  table_0.*
FROM
  table_0
WHERE
  _rn_82 <= 1