Group

Partitions the rows into groups and applies a pipeline to each of the groups.

group [{key_columns}] {pipeline}

The partitioning of groups are determined by the key_columns (first argument).

The most conventional use of group is with aggregate:

PRQL

from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

In concept, a transform in context of a group does the same transformation to the group as it would to the table — for example finding the employee who joined first across the whole table:

PRQL

from employees
sort join_date
take 1

SQL

SELECT
  *
FROM
  employees
ORDER BY
  join_date
LIMIT
  1

To find the employee who joined first in each department, it’s exactly the same pipeline, but within a group expression:

PRQL

from employees
group role (
  sort join_date  # taken from above
  take 1
)

SQL

WITH table_1 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_1
WHERE
  _expr_0 <= 1