Syntax

Summary

A summary of PRQL syntax

SyntaxUsageExample
|Pipelinesfrom employees | select first_name
=Assigns & Aliasesfrom e = employees
derive total = (sum salary)
:Named args & Parametersinterp lower:0 1600 sat_score
[]Listsselect [id, amount]
()Precedence & Parenthesesderive celsius = (fahrenheit - 32) / 1.8
'' & ""Stringsderive name = 'Mary'
` `Quoted identifiersselect `first name`
#Comments# A comment
@Dates & Times@2021-01-01
==Expressionsfilter a == b and c != d and e > f
==Self-equality in joinjoin s=salaries [==id]
->Function definitionsfunc add a b -> a + b
+/-Sort ordersort [-amount, +date]
??Coalesceamount ?? 0

Pipes

Pipes — the connection between transforms that make up a pipeline — can be either line breaks or a pipe character (|).

In almost all situations, line-breaks pipe the result of a line’s transform into the transform on the following line. For example, the filter transform operates on the result of from employees (which is just the employees table), and the select transform operates on the result of the filter transform.

PRQL

from employees
filter department == "Product"
select [first_name, last_name]

SQL

SELECT
  first_name,
  last_name
FROM
  employees
WHERE
  department = 'Product'

In the place of a line-break, it’s also possible to use the | character to pipe results, such that this is equivalent:

PRQL

from employees | filter department == "Product" | select [first_name, last_name]

SQL

SELECT
  first_name,
  last_name
FROM
  employees
WHERE
  department = 'Product'

A line-break doesn’t create a pipeline in a couple of cases:

  • within a list (e.g. the derive examples below),
  • when the following line is a new statement, which starts with a keyword of func, table or from.

Lists

Lists are represented with [], and can span multiple lines. A final trailing comma is optional.

PRQL

from numbers
derive [x = 1, y = 2]
derive [
  a = x,
  b = y
]
derive [
  c = a,
  d = b,
]

SQL

SELECT
  *,
  1 AS x,
  2 AS y,
  1 AS a,
  2 AS b,
  1 AS c,
  2 AS d
FROM
  numbers

Most transforms can take either a list or a single item, so these are equivalent:

PRQL

from employees
select [first_name]

SQL

SELECT
  first_name
FROM
  employees

PRQL

from employees
select first_name

SQL

SELECT
  first_name
FROM
  employees

Expressions

PRQL is made up of expressions, like 2 + 3 or ((1 + x) * y). In the example below, note the use of expressions to calculate the alias circumference and in the filter transform.

PRQL

from foo
select [
  circumference = diameter * 3.14159,
  color,
]
filter circumference > 10 and color != "red"

SQL

WITH table_1 AS (
  SELECT
    diameter * 3.14159 AS circumference,
    color
  FROM
    foo
)
SELECT
  circumference,
  color
FROM
  table_1
WHERE
  circumference > 10
  AND color <> 'red'

Precedence and Parentheses

Parentheses — () — are used to give precedence to inner expressions.

Note

We realize some of the finer points here are not intuitive. We are considering approaches to make this more intuitive — even at the cost of requiring more syntax in some circumstances. And we’re planning to make the error messages much better, so the compiler is there to help out.

Parentheses are required around:

  • Any nested function call containing a pipe, either the | symbol or a new line. “Nested” means within a transform; i.e. not just the main pipeline.
  • Any function call that isn’t a single item in a list or a pipeline, like sum distance in round 0 (sum distance)1.
  • A minus sign in a function argument, like in add (-1) (-3)
  • Inner transforms for group, window, and other transforms.

Parentheses are not required around expressions which use operators but no function call, like foo + bar.

1

or, technically, it’s on the right side of an assignment in a list…

Here’s a full rundown of times this applier:

PRQL

from employees
# Requires parentheses, because it's contains a pipe
derive is_proximate = (distance | in 0..20)
# Requires parentheses, because it's a function call
derive total_distance = (sum distance)
# `??` doesn't require parentheses, as it's not a function call
derive min_capped_distance = (min distance ?? 5)
# No parentheses needed, because no function call
derive travel_time = distance / 40
# No inner parentheses needed around `1+1` because no function call
derive distance_rounded_2_dp = (round 1+1 distance)
derive [
  # Requires parentheses, because it contains a pipe
  is_far = (distance | in 100..),
  # The left value of the range requires parentheses,
  # because of the minus sign
  is_negative = (distance | in (-100..0)),
  # ...this is equivalent
  is_negative = (distance | in (-100)..0),
  # Doesn't require parentheses, because it's in a list (confusing, see footnote)!
  average_distance = average distance,
]
# Requires parentheses because of the minus sign
sort (-distance)
# A list is fine too
sort [-distance]

SQL

SELECT
  *,
  distance BETWEEN 0 AND 20 AS is_proximate,
  SUM(distance) OVER () AS total_distance,
  MIN(COALESCE(distance, 5)) OVER () AS min_capped_distance,
  distance / 40 AS travel_time,
  ROUND(distance, 2) AS distance_rounded_2_dp,
  distance >= 100 AS is_far,
  distance BETWEEN -100 AND 0,
  distance BETWEEN -100 AND 0 AS is_negative,
  AVG(distance) OVER () AS average_distance
FROM
  employees
ORDER BY
  distance DESC

Inner Transforms

Parentheses are also used for transforms (such as group and window) that pass their result to an “inner transform”. The example below applies the aggregate pipeline to each group of unique title and country values:

PRQL

from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

Comments

Comments are represented by #.

PRQL

from employees  # Comment 1
# Comment 2
aggregate [average salary]

SQL

SELECT
  AVG(salary)
FROM
  employees

There’s no distinct multiline comment syntax.

Quoted identifiers

To use identifiers that are otherwise invalid, surround them with backticks. Depending on the dialect, these will remain as backticks or be converted to double-quotes.

PRQL

prql target:sql.mysql
from employees
select `first name`

SQL

SELECT
  `first name`
FROM
  employees

PRQL

prql target:sql.postgres
from employees
select `first name`

SQL

SELECT
  "first name"
FROM
  employees

PRQL

from `dir/*.parquet`

SQL

SELECT
  *
FROM
  "dir/*.parquet"

BigQuery also uses backticks to surround project & dataset names (even if valid identifiers) in the SELECT statement:

PRQL

prql target:sql.bigquery
from `project-foo.dataset.table`
join `project-bar.dataset.table` [==col_bax]

SQL

SELECT
  `project-foo.dataset.table`.*,
  `project-bar.dataset.table`.*
FROM
  `project-foo.dataset.table`
  JOIN `project-bar.dataset.table` ON `project-foo.dataset.table`.col_bax = `project-bar.dataset.table`.col_bax

Quoting schemas

This is currently not great and we are working on improving it; see
https://github.com/PRQL/prql/issues/1535 for progress.

If supplying a schema without a column — for example in a from or join transform, that also needs to be a quoted identifier:

PRQL

from `music.albums`

SQL

SELECT
  *
FROM
  music.albums

Parameters

PRQL will retain parameters like $1 in SQL output, which can then be supplied to the SQL query:

PRQL

from employees
filter id == $1

SQL

SELECT
  *
FROM
  employees
WHERE
  id = $1

Numbers

Numbers can contain underscores between numbers; which can make reading large numbers easier:

PRQL

from numbers
select [
    small = 1.000_000_1,
    big = 5_000_000,
]

SQL

SELECT
  1.0000001 AS small,
  5000000 AS big
FROM
  numbers