PRQL Language Book

Pipelined Relational Query Language, pronounced “Prequel”.

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it compiles to SQL.

This book serves as a tutorial and reference guide on the language and the broader project. It currently has three sections, navigated by links on the left:

  • Tutorial — A friendly & accessible guide for learning PRQL. It has a gradual increase of difficulty and requires only basic understanding of programming languages. Knowledge of SQL is beneficial, because of many comparisons to SQL, but not required.
  • Reference — In-depth information about the PRQL language. Includes justifications for language design decisions and formal specifications for parts of the language.
  • Project — General information about the project, tooling and development.

Examples of PRQL with a comparison to the generated SQL. PRQL queries can be as simple as:

PRQL

from tracks filter artist == "Bob Marley" # Each line transforms the previous result aggregate { # `aggregate` reduces each column to a value plays = sum plays, longest = max length, shortest = min length, # Trailing commas are allowed }

SQL

SELECT COALESCE(SUM(plays), 0) AS plays, MAX(length) AS longest, MIN(length) AS shortest FROM tracks WHERE artist = 'Bob Marley'

…and here’s a larger example:

PRQL

from employees filter start_date > @2021-01-01 # Clear date syntax derive { # `derive` adds columns / variables gross_salary = salary + (tax ?? 0), # Terse coalesce gross_cost = gross_salary + benefits, # Variables can use other variables } filter gross_cost > 0 group {title, country} ( # `group` runs a pipeline over each group aggregate { # `aggregate` reduces each group to a value average gross_salary, sum_gross_cost = sum gross_cost, # `=` sets a column name } ) filter sum_gross_cost > 100_000 # `filter` replaces both of SQL's `WHERE` & `HAVING` derive id = f"{title}_{country}" # F-strings like Python derive country_code = s"LEFT(country, 2)" # S-strings permit SQL as an escape hatch sort {sum_gross_cost, -country} # `-country` means descending order take 1..20 # Range expressions (also valid as `take 20`)

SQL

WITH table_1 AS ( SELECT title, country, salary + COALESCE(tax, 0) + benefits AS _expr_1, salary + COALESCE(tax, 0) AS _expr_2 FROM employees WHERE start_date > DATE '2021-01-01' ), table_0 AS ( SELECT title, country, AVG(_expr_2) AS _expr_0, COALESCE(SUM(_expr_1), 0) AS sum_gross_cost FROM table_1 WHERE _expr_1 > 0 GROUP BY title, country ) SELECT title, country, _expr_0, sum_gross_cost, CONCAT(title, '_', country) AS id, LEFT(country, 2) AS country_code FROM table_0 WHERE sum_gross_cost > 100000 ORDER BY sum_gross_cost, country DESC LIMIT 20