Pipelines

A PRQL query is a sequence of lines (or transforms) that form a pipeline. Each line transforms the data and passes its result to the next.

The simplest pipeline is just:

PRQL

from employees

SQL

SELECT
  *
FROM
  employees

Adding transforms

Each line we add transforms the result of the line before it:

PRQL

from employees
derive gross_salary = (salary + payroll_tax)

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary
FROM
  employees

…and so on:

PRQL

from employees
derive gross_salary = (salary + payroll_tax)
sort gross_salary
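
The compiled SQL for this last query is not shown above. Assuming the compiler keeps every transform in a single SELECT, as it does in the previous example, the output would be expected to look roughly like this sketch, with the sort becoming an ORDER BY clause:

```sql
-- Expected shape of the output (a sketch, not verified compiler output):
-- `derive` adds a column to the SELECT list, `sort` becomes ORDER BY.
SELECT
  *,
  salary + payroll_tax AS gross_salary
FROM
  employees
ORDER BY
  gross_salary
```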

Compiling to SQL

PRQL compiles the query to SQL. The PRQL compiler tries to represent as many transforms as possible with a single SELECT statement. When a transform cannot be folded into that SELECT, as with the join that follows take in the query below, the compiler “overflows” and creates CTEs (common table expressions):

PRQL

from e = employees
derive gross_salary = (salary + payroll_tax)
sort gross_salary
take 10
join d = department [==dept_no]
select [e.name, gross_salary, d.name]

SQL

WITH table_1 AS (
  SELECT
    name,
    salary + payroll_tax AS gross_salary,
    dept_no
  FROM
    employees AS e
  ORDER BY
    gross_salary
  LIMIT
    10
)
SELECT
  table_0.name,
  table_0.gross_salary,
  d.name
FROM
  table_1 AS table_0
  JOIN department AS d ON table_0.dept_no = d.dept_no
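
To see why a CTE is needed here, compare the output with a hand-written attempt to flatten the same pipeline into one SELECT. This is an illustrative sketch, not something the compiler produces:

```sql
-- NOT equivalent to the pipeline above (hand-written for comparison).
-- Here the join runs first, so LIMIT applies to the joined rows
-- instead of to the ten employees chosen before the join.
SELECT
  e.name,
  e.salary + e.payroll_tax AS gross_salary,
  d.name
FROM
  employees AS e
  JOIN department AS d ON e.dept_no = d.dept_no
ORDER BY
  gross_salary
LIMIT
  10
```

In the pipeline, take 10 comes before join, so the ten rows are chosen from employees alone. The flattened form can return different rows, for instance when an employee has no matching department (the inner join drops that row before the limit is applied) or when one dept_no matches several department rows. Wrapping the first four transforms in a CTE preserves the order of operations that the pipeline states.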

See also