Syntax
Summary
A summary of PRQL syntax
Syntax | Usage | Example |
---|---|---|
| | Pipelines | from employees | select first_name |
= | Assigns & Aliases | from e = employees derive total = (sum salary) |
: | Named args & Parameters | interp lower:0 1600 sat_score |
[] | Lists | select [id, amount] |
() | Precedence & Parentheses | derive celsius = (fahrenheit - 32) / 1.8 |
'' & "" | Strings | derive name = 'Mary' |
` ` | Quoted identifiers | select `first name` |
# | Comments | # A comment |
@ | Dates & Times | @2021-01-01 |
== | Expressions | filter a == b and c != d and e > f |
== | Self-equality in join | join s=salaries [==id] |
-> | Function definitions | func add a b -> a + b |
+ /- | Sort order | sort [-amount, +date] |
?? | Coalesce | amount ?? 0 |
Pipes
Pipes — the connection between transforms that make up a
pipeline — can be either line breaks or a pipe character (|
).
In almost all situations, line-breaks pipe the result of a line’s transform into
the transform on the following line. For example, the filter
transform
operates on the result of from employees
(which is just the employees
table), and the select
transform operates on the result of the filter
transform.
PRQL
from employees
filter department == "Product"
select [first_name, last_name]
SQL
SELECT
first_name,
last_name
FROM
employees
WHERE
department = 'Product'
In the place of a line-break, it’s also possible to use the |
character to
pipe results, such that this is equivalent:
PRQL
from employees | filter department == "Product" | select [first_name, last_name]
SQL
SELECT
first_name,
last_name
FROM
employees
WHERE
department = 'Product'
A line-break doesn’t create a pipeline in a couple of cases:
- within a list (e.g. the
derive
examples below), - when the following line is a new statement, which starts with a keyword of
func
,table
orfrom
.
Lists
Lists are represented with []
, and can span multiple lines. A final trailing
comma is optional.
PRQL
from numbers
derive [x = 1, y = 2]
derive [
a = x,
b = y
]
derive [
c = a,
d = b,
]
SQL
SELECT
*,
1 AS x,
2 AS y,
1 AS a,
2 AS b,
1 AS c,
2 AS d
FROM
numbers
Most transforms can take either a list or a single item, so these are equivalent:
Expressions
PRQL is made up of expressions, like 2 + 3
or ((1 + x) * y)
. In the
example below, note the use of expressions to calculate the alias
circumference
and in the filter
transform.
PRQL
from foo
select [
circumference = diameter * 3.14159,
color,
]
filter circumference > 10 and color != "red"
SQL
WITH table_1 AS (
SELECT
diameter * 3.14159 AS circumference,
color
FROM
foo
)
SELECT
circumference,
color
FROM
table_1
WHERE
circumference > 10
AND color <> 'red'
Precedence and Parentheses
Parentheses — ()
— are used to give precedence to inner expressions.
We realize some of the finer points here are not intuitive. We are considering approaches to make this more intuitive — even at the cost of requiring more syntax in some circumstances. And we’re planning to make the error messages much better, so the compiler is there to help out.
Parentheses are required around:
- Any nested function call containing a pipe, either the
|
symbol or a new line. “Nested” means within a transform; i.e. not just the main pipeline. - Any function call that isn’t a single item in a list or a pipeline, like
sum distance
inround 0 (sum distance)
1. - A minus sign in a function argument, like in
add (-1) (-3)
- Inner transforms for
group
,window
, and other transforms.
Parentheses are not required around expressions which use operators but no
function call, like foo + bar
.
or, technically, it’s on the right side of an assignment in a list…
Here’s a full rundown of times this applier:
PRQL
from employees
# Requires parentheses, because it's contains a pipe
derive is_proximate = (distance | in 0..20)
# Requires parentheses, because it's a function call
derive total_distance = (sum distance)
# `??` doesn't require parentheses, as it's not a function call
derive min_capped_distance = (min distance ?? 5)
# No parentheses needed, because no function call
derive travel_time = distance / 40
# No inner parentheses needed around `1+1` because no function call
derive distance_rounded_2_dp = (round 1+1 distance)
derive [
# Requires parentheses, because it contains a pipe
is_far = (distance | in 100..),
# The left value of the range requires parentheses,
# because of the minus sign
is_negative = (distance | in (-100..0)),
# ...this is equivalent
is_negative = (distance | in (-100)..0),
# Doesn't require parentheses, because it's in a list (confusing, see footnote)!
average_distance = average distance,
]
# Requires parentheses because of the minus sign
sort (-distance)
# A list is fine too
sort [-distance]
SQL
SELECT
*,
distance BETWEEN 0 AND 20 AS is_proximate,
SUM(distance) OVER () AS total_distance,
MIN(COALESCE(distance, 5)) OVER () AS min_capped_distance,
distance / 40 AS travel_time,
ROUND(distance, 2) AS distance_rounded_2_dp,
distance >= 100 AS is_far,
distance BETWEEN -100 AND 0,
distance BETWEEN -100 AND 0 AS is_negative,
AVG(distance) OVER () AS average_distance
FROM
employees
ORDER BY
distance DESC
Inner Transforms
Parentheses are also used for transforms (such as group
and window
) that
pass their result to an “inner transform”. The example below applies the
aggregate
pipeline to each group of unique title
and country
values:
PRQL
from employees
group [title, country] (
aggregate [
average salary,
ct = count
]
)
SQL
SELECT
title,
country,
AVG(salary),
COUNT(*) AS ct
FROM
employees
GROUP BY
title,
country
Comments
Comments are represented by #
.
PRQL
from employees # Comment 1
# Comment 2
aggregate [average salary]
SQL
SELECT
AVG(salary)
FROM
employees
There’s no distinct multiline comment syntax.
Quoted identifiers
To use identifiers that are otherwise invalid, surround them with backticks. Depending on the dialect, these will remain as backticks or be converted to double-quotes.
PRQL
prql target:sql.mysql
from employees
select `first name`
SQL
SELECT
`first name`
FROM
employees
PRQL
prql target:sql.postgres
from employees
select `first name`
SQL
SELECT
"first name"
FROM
employees
BigQuery also uses backticks to surround project & dataset names (even if valid
identifiers) in the SELECT
statement:
PRQL
prql target:sql.bigquery
from `project-foo.dataset.table`
join `project-bar.dataset.table` [==col_bax]
SQL
SELECT
`project-foo.dataset.table`.*,
`project-bar.dataset.table`.*
FROM
`project-foo.dataset.table`
JOIN `project-bar.dataset.table` ON `project-foo.dataset.table`.col_bax = `project-bar.dataset.table`.col_bax
Quoting schemas
This is currently not great and we are working on improving it; see
https://github.com/PRQL/prql/issues/1535 for progress.
If supplying a schema without a column — for example in a from
or join
transform, that also needs to be a quoted identifier:
Parameters
PRQL will retain parameters like $1
in SQL output, which can then be supplied
to the SQL query:
Numbers
Numbers can contain underscores between numbers; which can make reading large numbers easier: