Introduction
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it transpiles to SQL.
Let’s get started with an example:
PRQL
from employees
filter start_date > @2021-01-01 # Clear date syntax
derive [ # `derive` adds columns / variables
gross_salary = salary + (tax ?? 0), # Terse coalesce
gross_cost = gross_salary + benefits_cost, # Variables can use other variables
]
filter gross_cost > 0
group [title, country] ( # `group` runs a pipeline over each group
aggregate [ # `aggregate` reduces each group to a value
average gross_salary,
sum_gross_cost = sum gross_cost, # `=` sets a column name
]
)
filter sum_gross_cost > 100_000 # `filter` replaces both of SQL's `WHERE` & `HAVING`
derive id = f"{title}_{country}" # F-strings like Python
derive country_code = s"LEFT(country, 2)" # S-strings allow using SQL as an escape hatch
sort [sum_gross_cost, -country] # `-country` means descending order
take 1..20 # Range expressions (also valid here as `take 20`)
SQL
WITH table_3 AS (
SELECT
title,
country,
salary + COALESCE(tax, 0) + benefits_cost AS _expr_1,
salary + COALESCE(tax, 0) AS _expr_2
FROM
employees
WHERE
start_date > DATE '2021-01-01'
),
table_1 AS (
SELECT
title,
country,
AVG(_expr_2) AS _expr_0,
SUM(_expr_1) AS sum_gross_cost
FROM
table_3 AS table_2
WHERE
_expr_1 > 0
GROUP BY
title,
country
)
SELECT
title,
country,
_expr_0,
sum_gross_cost,
CONCAT(title, '_', country) AS id,
LEFT(country, 2) AS country_code
FROM
table_1 AS table_0
WHERE
sum_gross_cost > 100000
ORDER BY
sum_gross_cost,
country DESC
LIMIT
20
As you can see, PRQL is a linear pipeline of transformations — each line of the query transforms the result of the previous line. In SQL, by contrast, operations do not follow one another, which makes it hard to compose larger queries.
Queries
PRQL queries are composed of a sequence of transforms that form a pipeline that modifies data as it is passed from one to the next.
The major items of a query are described in the sections that follow:
Pipelines
PRQL queries are a sequence of lines (or transforms) that form a pipeline. Each line transforms the data, and passes its result to the next.
The simplest pipeline is just:
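For example, a pipeline can consist of a single transform (a sketch, using the employees table that appears throughout these examples):

```prql
from employees
```

which compiles to roughly:

```sql
SELECT
  *
FROM
  employees
```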
Adding transforms
As we add additional lines, each one transforms the result:
PRQL
from employees
derive gross_salary = (salary + payroll_tax)
SQL
SELECT
*,
salary + payroll_tax AS gross_salary
FROM
employees
…and so on:
from employees
derive gross_salary = (salary + payroll_tax)
sort gross_salary
Compiling to SQL
PRQL compiles the query to SQL. The PRQL compiler tries to represent as many
transforms as possible with a single SELECT
statement. When necessary, the
compiler “overflows” and creates CTEs (common table expressions):
PRQL
from e = employees
derive gross_salary = (salary + payroll_tax)
sort gross_salary
take 10
join d = department [==dept_no]
select [e.name, gross_salary, d.name]
SQL
WITH table_1 AS (
SELECT
name,
salary + payroll_tax AS gross_salary,
dept_no
FROM
employees AS e
ORDER BY
gross_salary
LIMIT
10
)
SELECT
table_0.name,
table_0.gross_salary,
d.name
FROM
table_1 AS table_0
JOIN department AS d ON table_0.dept_no = d.dept_no
See also
- Transforms - PRQL Transforms
- Syntax - Notation for PRQL queries
Functions
Functions are a fundamental abstraction in PRQL — they allow us to run code in many places that we’ve written once. This reduces the number of errors in our code, makes our code more readable, and simplifies making changes.
Functions have two types of parameters:
- Positional parameters, which require an argument.
- Named parameters, which optionally take an argument, otherwise using their default value.
So this function is named fahrenheit_to_celsius and has one parameter temp:
PRQL
func fahrenheit_to_celsius temp -> (temp - 32) / 1.8
from cities
derive temp_c = (fahrenheit_to_celsius temp_f)
SQL
SELECT
*,
(temp_f - 32) / 1.8 AS temp_c
FROM
cities
This function is named interp, and has two positional parameters named high and x, and one named parameter named low which takes a default argument of 0. It calculates the proportion of the distance that x is between low and high.
PRQL
func interp low:0 high x -> (x - low) / (high - low)
from students
derive [
sat_proportion_1 = (interp 1600 sat_score),
sat_proportion_2 = (interp low:0 1600 sat_score),
]
SQL
SELECT
*,
(sat_score - 0) / 1600 AS sat_proportion_1,
(sat_score - 0) / 1600 AS sat_proportion_2
FROM
students
Piping
Consistent with the principles of PRQL, it’s possible to pipe values into functions, which makes composing many functions more readable. When piping a value into a function, the value is passed as an argument to the final positional parameter of the function. Here’s the same result as the examples above with an alternative construction:
PRQL
func interp low:0 high x -> (x - low) / (high - low)
from students
derive [
sat_proportion_1 = (sat_score | interp 1600),
sat_proportion_2 = (sat_score | interp low:0 1600),
]
SQL
SELECT
*,
(sat_score - 0) / 1600 AS sat_proportion_1,
(sat_score - 0) / 1600 AS sat_proportion_2
FROM
students
and
PRQL
func fahrenheit_to_celsius temp -> (temp - 32) / 1.8
from cities
derive temp_c = (temp_f | fahrenheit_to_celsius)
SQL
SELECT
*,
(temp_f - 32) / 1.8 AS temp_c
FROM
cities
We can combine a chain of functions, which makes logic more readable:
PRQL
func fahrenheit_to_celsius temp -> (temp - 32) / 1.8
func interp low:0 high x -> (x - low) / (high - low)
from kettles
derive boiling_proportion = (temp_c | fahrenheit_to_celsius | interp 100)
SQL
SELECT
*,
((temp_c - 32) / 1.8 - 0) / 100 AS boiling_proportion
FROM
kettles
Scope
Late binding
Functions can bind to any variable that is in scope when the function is executed. For example, here cost_total refers to the column that's introduced in the from.
PRQL
func cost_share cost -> cost / cost_total
from costs
select [materials, labor, overhead, cost_total]
derive [
materials_share = (cost_share materials),
labor_share = (cost_share labor),
overhead_share = (cost_share overhead),
]
SQL
SELECT
materials,
labor,
overhead,
cost_total,
materials / cost_total AS materials_share,
labor / cost_total AS labor_share,
overhead / cost_total AS overhead_share
FROM
costs
Variables
We can define a relation — similar to a CTE in SQL — as a variable with let:
PRQL
let top_50 = (
from employees
sort salary
take 50
aggregate [total_salary = sum salary]
)
from top_50 # Starts a new pipeline
SQL
WITH table_1 AS (
SELECT
salary
FROM
employees
ORDER BY
salary
LIMIT
50
), top_50 AS (
SELECT
SUM(salary) AS total_salary
FROM
table_1 AS table_0
)
SELECT
total_salary
FROM
top_50
We can even place a whole CTE in an s-string, enabling us to use features which PRQL doesn’t yet support.
PRQL
let grouping = s"""
SELECT SUM(a)
FROM tbl
GROUP BY
GROUPING SETS
((b, c, d), (d), (b, d))
"""
from grouping
SQL
WITH table_0 AS (
SELECT
SUM(a)
FROM
tbl
GROUP BY
GROUPING SETS ((b, c, d), (d), (b, d))
),
grouping AS (
SELECT
*
FROM
table_0 AS table_1
)
SELECT
*
FROM
grouping
In PRQL, tables are far less common than CTEs are in SQL, since a linear series of CTEs can be represented with a single pipeline.
Currently, defining variables with let is restricted to relations. We'd like to extend this to expressions that evaluate to scalars.
Transforms
PRQL queries are a pipeline of transformations (“transforms”), where each transform takes the previous result and adjusts it in some way, before passing it on to the next transform.
Because PRQL focuses on modularity, we have far fewer transforms than SQL, each one fulfilling a specific purpose. That’s often referred to as “orthogonality”.
These are the currently available transforms:
Transform | Purpose | SQL Equivalent |
---|---|---|
from | Start from a table | FROM |
derive | Compute new columns | SELECT *, ... AS ... |
select | Pick & compute columns | SELECT ... AS ... |
filter | Pick rows based on their values | WHERE, HAVING, QUALIFY |
sort | Order rows based on the values of columns | ORDER BY |
join | Add columns from another table, matching rows based on a condition | JOIN |
take | Pick rows based on their position | TOP, LIMIT, OFFSET |
group | Partition rows into groups and apply a pipeline to each of them | GROUP BY, PARTITION BY |
aggregate | Summarize many rows into one row | SELECT foo(...) |
window | Apply a pipeline to overlapping segments of rows | OVER , ROWS , RANGE |
Aggregate
Summarizes many rows into one row.
When applied:
- without group, it produces one row from the whole table,
- within a group pipeline, it produces one row from each group.
aggregate [{expression or assign operations}]
Currently, all declared aggregation functions are min, max, count, average, stddev, avg, sum and count_distinct. We are in the process of filling out the standard library.
Examples
PRQL
from employees
aggregate [
average salary,
ct = count
]
SQL
SELECT
AVG(salary),
COUNT(*) AS ct
FROM
employees
PRQL
from employees
group [title, country] (
aggregate [
average salary,
ct = count
]
)
SQL
SELECT
title,
country,
AVG(salary),
COUNT(*) AS ct
FROM
employees
GROUP BY
title,
country
Aggregate is required
Unlike in SQL, using an aggregation function in derive or select (or any transform other than aggregate) will not trigger aggregation. By default, PRQL will interpret such functions as window functions:
PRQL
from employees
derive [avg_sal = average salary]
SQL
SELECT
*,
AVG(salary) OVER () AS avg_sal
FROM
employees
This ensures that derive does not manipulate the number of rows, but only ever adds a column. For more information, see the window transform.
Append
Concatenates two tables together.
Equivalent to UNION ALL in SQL. The number of rows is always the sum of the number of rows from the two input tables. To replicate UNION DISTINCT, see set operations.
PRQL
from employees_1
append employees_2
SQL
SELECT
*
FROM
employees_1
UNION
ALL
SELECT
*
FROM
employees_2
Remove
experimental
Removes rows that appear in another relation, like EXCEPT ALL. Duplicate rows are removed one-for-one.
PRQL
from employees_1
remove employees_2
SQL
SELECT
*
FROM
employees_1 AS t
EXCEPT
ALL
SELECT
*
FROM
employees_2 AS b
Intersection
experimental
PRQL
from employees_1
intersect employees_2
SQL
SELECT
*
FROM
employees_1 AS t
INTERSECT
ALL
SELECT
*
FROM
employees_2 AS b
Set operations
experimental
To imitate set operations (i.e. UNION, EXCEPT and INTERSECT), you can use the following functions:
func distinct rel -> (from t = _param.rel | group [t.*] (take 1))
func union `default_db.bottom` top -> (top | append bottom | distinct)
func except `default_db.bottom` top -> (top | distinct | remove bottom)
func intersect_distinct `default_db.bottom` top -> (top | intersect bottom | distinct)
Don’t mind the default_db. and _param. prefixes; these are compiler implementation details for now.
Derive
Computes one or more new columns.
derive [
{name} = {expression},
# or
{column},
]
Examples
PRQL
from employees
derive gross_salary = salary + payroll_tax
SQL
SELECT
*,
salary + payroll_tax AS gross_salary
FROM
employees
PRQL
from employees
derive [
gross_salary = salary + payroll_tax,
gross_cost = gross_salary + benefits_cost
]
SQL
SELECT
*,
salary + payroll_tax AS gross_salary,
salary + payroll_tax + benefits_cost AS gross_cost
FROM
employees
Filter
Picks rows based on their values.
filter {boolean_expression}
Examples
PRQL
from employees
filter (age > 25 || department != "IT")
SQL
SELECT
*
FROM
employees
WHERE
age > 25
OR department <> 'IT'
PRQL
from employees
filter (age | in 25..40)
SQL
SELECT
*
FROM
employees
WHERE
age BETWEEN 25 AND 40
From
Specifies a data source.
from {table_reference}
Table names containing schemas or needing to be quoted for other reasons need to be contained within backticks.
Examples
To introduce an alias, use an assign expression:
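For instance, a sketch that aliases the employees table as e:

```prql
from e = employees
```

The compiled SQL would be along the lines of:

```sql
SELECT
  *
FROM
  employees AS e
```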
Group
Partitions the rows into groups and applies a pipeline to each of the groups.
group [{key_columns}] {pipeline}
The partitioning into groups is determined by the key_columns (first argument). The most conventional use of group is with aggregate:
PRQL
from employees
group [title, country] (
aggregate [
average salary,
ct = count
]
)
SQL
SELECT
title,
country,
AVG(salary),
COUNT(*) AS ct
FROM
employees
GROUP BY
title,
country
In concept, a transform in the context of a group does the same transformation to the group as it would to the table — for example, finding the employee who joined first across the whole table:
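A sketch of that whole-table pipeline, assuming a join_date column:

```prql
from employees
sort join_date
take 1
```

which compiles to roughly:

```sql
SELECT
  *
FROM
  employees
ORDER BY
  join_date
LIMIT
  1
```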
To find the employee who joined first in each department, it's exactly the same pipeline, but within a group expression:
PRQL
from employees
group role (
sort join_date # taken from above
take 1
)
SQL
WITH table_1 AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY role
ORDER BY
join_date
) AS _expr_0
FROM
employees
)
SELECT
*
FROM
table_1 AS table_0
WHERE
_expr_0 <= 1
Join
Adds columns from another table, matching rows based on a condition.
join side:{inner|left|right|full} {table} {[conditions]}
Parameters
- side decides which rows to include, defaulting to inner.
- Table reference
- List of conditions
  - The result of the join operation is a cartesian (cross) product of rows from both tables, which is then filtered to match all of these conditions.
  - If the name is the same in both tables, it can be expressed with only ==col.
Examples
PRQL
from employees
join side:left positions [employees.id==positions.employee_id]
SQL
SELECT
employees.*,
positions.*
FROM
employees
LEFT JOIN positions ON employees.id = positions.employee_id
PRQL
from employees
join side:left p=positions [employees.id==p.employee_id]
SQL
SELECT
employees.*,
p.*
FROM
employees
LEFT JOIN positions AS p ON employees.id = p.employee_id
Self equality operator
If the join conditions are of the form left.x == right.x, we can use the “self equality operator”:
PRQL
from employees
join positions [==emp_no]
SQL
SELECT
employees.*,
positions.*
FROM
employees
JOIN positions ON employees.emp_no = positions.emp_no
Select
Picks and computes columns.
select [
{name} = {expression},
# or
{column},
]
# or
select ![{column}]
Examples
PRQL
from employees
select name = f"{first_name} {last_name}"
SQL
SELECT
CONCAT(first_name, ' ', last_name) AS name
FROM
employees
PRQL
from employees
select [
name = f"{first_name} {last_name}",
age_eoy = dob - @2022-12-31,
]
SQL
SELECT
CONCAT(first_name, ' ', last_name) AS name,
dob - DATE '2022-12-31' AS age_eoy
FROM
employees
PRQL
from e=employees
select [e.first_name, e.last_name]
SQL
SELECT
first_name,
last_name
FROM
employees AS e
Excluding columns
We can use ! to exclude a list of columns. This can operate in two ways:
- We use SELECT * EXCLUDE / SELECT * EXCEPT for the columns supplied to select ![] in dialects which support it.
- Otherwise, the columns must have been defined prior in the query (unless all of a table's columns are excluded); for example in another select or a group transform. In this case, we evaluate and specify the columns that should be included in the output SQL.
Some examples:
PRQL
prql target:sql.bigquery
from tracks
select ![milliseconds,bytes]
SQL
SELECT
*
EXCEPT
(milliseconds, bytes)
FROM
tracks
PRQL
from tracks
select [track_id, title, composer, bytes]
select ![title, composer]
SQL
SELECT
track_id,
bytes
FROM
tracks
In the example above with from e=employees, the e representing the table / namespace is no longer available after the select statement. For example, this would raise an error:
from e=employees
select e.first_name
filter e.first_name == "Fred" # Can't find `e.first_name`
To refer to the e.first_name column in subsequent transforms, either refer to it using first_name, or if it requires a different name, assign one in the select statement:
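For instance, a sketch that assigns the illustrative name fname in the select, so later transforms can refer to it:

```prql
from e = employees
select fname = e.first_name
filter fname == "Fred"
```

Since fname is defined in the select, the subsequent filter can refer to it without the e namespace.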
Sort
Orders rows based on the values of one or more columns.
sort [{direction}{column}]
Parameters
- One column or a list of columns to sort by
- Each column can be prefixed with:
  - +, for ascending order, the default
  - -, for descending order
- When using prefixes, even a single column needs to be in a list or parentheses. (Otherwise, sort -foo is parsed as a subtraction between sort and foo.)
Examples
PRQL
from employees
sort [age, -tenure, +salary]
SQL
SELECT
*
FROM
employees
ORDER BY
age,
tenure DESC,
salary
We can also use expressions:
PRQL
from employees
sort [s"substr({first_name}, 2, 5)"]
SQL
WITH table_1 AS (
SELECT
*,
substr(first_name, 2, 5) AS _expr_0
FROM
employees
ORDER BY
_expr_0
)
SELECT
*
FROM
table_1 AS table_0
Notes
Ordering guarantees
Most DBs will persist ordering through most transforms; for example, you can expect this result to be ordered by tenure.
PRQL
from employees
sort tenure
derive name = f"{first_name} {last_name}"
SQL
SELECT
*,
CONCAT(first_name, ' ', last_name) AS name
FROM
employees
ORDER BY
tenure
But:
- This is an implementation detail of the DB. If there are instances where this doesn't hold, please open an issue, and we'll consider how to manage it.
- Some transforms which change the existence of rows, such as join or group, won't persist ordering; for example:
PRQL
from employees
sort tenure
join locations [==employee_id]
SQL
WITH table_1 AS (
SELECT
*
FROM
employees
ORDER BY
tenure
)
SELECT
table_0.*,
locations.*
FROM
table_1 AS table_0
JOIN locations ON table_0.employee_id = locations.employee_id
See Issue #1363 for more details.
Take
Picks rows based on their position.
take {n|range}
See Ranges for more details on how ranges work.
Examples
PRQL
from orders
sort [-value, created_at]
take 101..110
SQL
SELECT
*
FROM
orders
ORDER BY
value DESC,
created_at
LIMIT
10 OFFSET 100
Window
Applies a pipeline to segments of rows, producing one output value for every input value.
window rows:{range} range:{range} expanding:false rolling:0 {pipeline}
For each row, the segment over which the pipeline is applied is determined by one of:
- rows, which takes a range of rows relative to the current row position. 0 references the current row.
- range, which takes a range of values relative to the current row value.
The bounds of the range are inclusive. If a bound is omitted, the segment will extend until the edge of the table or group.
For ease of use, there are two flags that override rows or range:
- expanding:true is an alias for rows:..0. A sum using this window is also known as a “cumulative sum”.
- rolling:n is an alias for rows:(-n+1)..0, where n is an integer. This will include the n last values, including the current row. An average using this window is also known as a Simple Moving Average.
Some examples:
Expression | Meaning |
---|---|
rows:0..2 | current row plus two following |
rows:-2..0 | two preceding rows plus current row |
rolling:3 | (same as previous) |
rows:-2..4 | two preceding rows plus current row plus four following rows |
rows:..0 | all rows from the start of the table up to & including current row |
expanding:true | (same as previous) |
rows:0.. | current row and all following rows until the end of the table |
rows:.. | all rows, which is the same as not having a window at all |
Example
PRQL
from employees
group employee_id (
sort month
window rolling:12 (
derive [trail_12_m_comp = sum paycheck]
)
)
SQL
SELECT
*,
SUM(paycheck) OVER (
PARTITION BY employee_id
ORDER BY
month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
) AS trail_12_m_comp
FROM
employees
PRQL
from orders
sort day
window rows:-3..3 (
derive [centered_weekly_average = average value]
)
group [order_month] (
sort day
window expanding:true (
derive [monthly_running_total = sum value]
)
)
SQL
SELECT
*,
AVG(value) OVER (
ORDER BY
day ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
) AS centered_weekly_average,
SUM(value) OVER (
PARTITION BY order_month
ORDER BY
day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS monthly_running_total
FROM
orders
Windowing by default
If you use window functions without the window transform, they will be applied to the whole table. Unlike in SQL, they will remain window functions and will not trigger aggregation.
PRQL
from employees
sort age
derive rnk = rank
SQL
SELECT
*,
RANK() OVER (
ORDER BY
age ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS rnk
FROM
employees
ORDER BY
age
You can also apply only group:
PRQL
from employees
group department (
sort age
derive rnk = rank
)
SQL
SELECT
*,
RANK() OVER (
PARTITION BY department
ORDER BY
age ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS rnk
FROM
employees
Window functions as first class citizens
There are no limitations on where windowed expressions can be used:
PRQL
from employees
filter salary < (average salary)
SQL
WITH table_1 AS (
SELECT
*,
AVG(salary) OVER () AS _expr_0
FROM
employees
)
SELECT
*
FROM
table_1 AS table_0
WHERE
salary < _expr_0
Syntax
A summary of PRQL syntax:
Syntax | Usage | Example |
---|---|---|
| | Pipelines | from employees | select first_name |
= | Assigns & Aliases | from e = employees derive total = (sum salary) |
: | Named args & Parameters | interp low:0 1600 sat_score |
[] | Lists | select [id, amount] |
! && || == + , etc | Expressions & Operators | filter a == b + c || d >= e |
() | Parentheses | derive celsius = (fahrenheit - 32) / 1.8 |
'' , "" | Strings | derive name = 'Mary' |
` ` | Quoted identifiers | select `first name` |
# | Comments | # A comment |
@ | Dates & Times | @2021-01-01 |
== | Self-equality in join | join s=salaries [==id] |
-> | Function definitions | func add a b -> a + b |
=> | Case statement | case [a==1 => c, a==2 => d ] |
+ / - | Sort order | sort [-amount, +date] |
?? | Coalesce | amount ?? 0 |
Pipes
Pipes — the connection between transforms that make up a pipeline — can be either line breaks or a pipe character (|).
In almost all situations, line-breaks pipe the result of a line's transform into the transform on the following line. For example, the filter transform operates on the result of from employees (which is just the employees table), and the select transform operates on the result of the filter transform.
PRQL
from employees
filter department == "Product"
select [first_name, last_name]
SQL
SELECT
first_name,
last_name
FROM
employees
WHERE
department = 'Product'
In place of a line-break, it's also possible to use the | character to pipe results, such that this is equivalent:
PRQL
from employees | filter department == "Product" | select [first_name, last_name]
SQL
SELECT
first_name,
last_name
FROM
employees
WHERE
department = 'Product'
A line-break doesn't create a pipeline in a couple of cases:
- within a list (e.g. the derive examples below),
- when the following line is a new statement, which starts with a keyword of func, let or from.
Lists
Lists are represented with [], and can span multiple lines. A final trailing comma is optional.
PRQL
from numbers
derive [x = 1, y = 2]
derive [
a = x,
b = y
]
derive [
c = a,
d = b,
]
SQL
SELECT
*,
1 AS x,
2 AS y,
1 AS a,
2 AS b,
1 AS c,
2 AS d
FROM
numbers
Most transforms can take either a list or a single item, so these are equivalent:
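For instance, both of these sketches compile to the same SQL:

```prql
from employees
select [first_name]
```

```prql
from employees
select first_name
```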
Expressions and operators
PRQL allows expressions, like 2 + 3 or ((1 + x) * y), made up of various operators. In the example below, note the use of expressions to calculate the alias circumference and in the filter transform.
PRQL
from foo
select [
circumference = diameter * 3.14159,
color,
]
filter circumference > 10 && color != "red"
SQL
WITH table_1 AS (
SELECT
diameter * 3.14159 AS circumference,
color
FROM
foo
)
SELECT
circumference,
color
FROM
table_1 AS table_0
WHERE
circumference > 10
AND color <> 'red'
Operator precedence
This table shows operator precedence. Use parentheses () to prioritize operations and for function calls (see the discussion below).
Group | Operators | Precedence | Associativity |
---|---|---|---|
parentheses | () | 0 | see below |
identifier dot | . | 1 | |
unary | - + ! == | 2 | |
range | .. | 3 | |
mul | * / % | 4 | left-to-right |
add | + - | 5 | left-to-right |
compare | == != <= >= < > | 6 | left-to-right |
coalesce | ?? | 7 | left-to-right |
and | && | 8 | left-to-right |
or | || | 9 | left-to-right |
function call | | 10 | |
Parentheses
PRQL uses parentheses () for several purposes:
- Parentheses group operands to control the order of evaluation, for example: ((1 + x) * y)
- Parentheses delimit an inner transform for the group () and window () transforms.
- Parentheses delimit a minus sign of a function argument, for example: add (-1) (-3)
- Parentheses delimit nested function calls that contain a pipe, either the | symbol or a new line. “Nested” means within a transform; i.e. not just the main pipeline, for example: (column-name | in 0..20)
- Parentheses wrap a function call that is part of a larger expression on the right-hand side of an assignment, for example: round 0 (sum distance)
- Parentheses are not required for expressions that do not contain function calls, for example: foo + bar.
Here’s a set of examples of these rules:
PRQL
from employees
# Requires parentheses, because it contains a pipe
derive is_proximate = (distance | in 0..20)
# Requires parentheses, because it's a function call
derive total_distance = (sum distance)
# `??` doesn't require parentheses, as it's not a function call
derive min_capped_distance = (min distance ?? 5)
# No parentheses needed, because no function call
derive travel_time = distance / 40
# No inner parentheses needed around `1+1` because no function call
derive distance_rounded_2_dp = (round 1+1 distance)
derive [
# Requires parentheses, because it contains a pipe
is_far = (distance | in 100..),
# The left value of the range requires parentheses,
# because of the minus sign
is_negative = (distance | in (-100..0)),
# ...this is equivalent
is_negative = (distance | in (-100)..0),
# Doesn't require parentheses, because it's in a list (confusing, see footnote)!
average_distance = average distance,
]
# Requires parentheses because of the minus sign
sort (-distance)
# A list is fine too
sort [-distance]
SQL
SELECT
*,
distance BETWEEN 0 AND 20 AS is_proximate,
SUM(distance) OVER () AS total_distance,
MIN(COALESCE(distance, 5)) OVER () AS min_capped_distance,
distance / 40 AS travel_time,
ROUND(distance, 2) AS distance_rounded_2_dp,
distance >= 100 AS is_far,
distance BETWEEN -100 AND 0,
distance BETWEEN -100 AND 0 AS is_negative,
AVG(distance) OVER () AS average_distance
FROM
employees
ORDER BY
distance DESC
Note: The total_distance statement below generates an error because the function call is not in a list or parentheses. (The PRQL compiler should display a better error message.)
PRQL
from employees
derive total_distance = sum distance # generates the error shown below
derive other_distance = (sum distance) # works as expected
Error
Error:
╭─[:2:29]
│
2 │ derive total_distance = sum distance # generates the error shown below
│ ────┬───
│ ╰───── Unknown name distance
───╯
We’re continuing to think whether these rules can be more intuitive. We’re also planning to make the error messages much better, so the compiler can help out.
Inner Transforms
Parentheses are also used for transforms (such as group and window) that pass their result to an “inner transform”. The example below applies the aggregate pipeline to each group of unique title and country values:
PRQL
from employees
group [title, country] (
aggregate [
average salary,
ct = count
]
)
SQL
SELECT
title,
country,
AVG(salary),
COUNT(*) AS ct
FROM
employees
GROUP BY
title,
country
Comments
Comments are represented by #.
PRQL
from employees # Comment 1
# Comment 2
aggregate [average salary]
SQL
SELECT
AVG(salary)
FROM
employees
There’s no distinct multiline comment syntax.
Quoting identifiers
To use identifiers that are otherwise invalid, surround them with backticks. Depending on the dialect, these will remain as backticks or be converted to double-quotes.
PRQL
prql target:sql.mysql
from employees
select `first name`
SQL
SELECT
`first name`
FROM
employees
PRQL
prql target:sql.postgres
from employees
select `first name`
SQL
SELECT
"first name"
FROM
employees
BigQuery also uses backticks to surround project & dataset names (even if they are valid identifiers) in the SELECT statement:
PRQL
prql target:sql.bigquery
from `project-foo.dataset.table`
join `project-bar.dataset.table` [==col_bax]
SQL
SELECT
`project-foo.dataset.table`.*,
`project-bar.dataset.table`.*
FROM
`project-foo.dataset.table`
JOIN `project-bar.dataset.table` ON `project-foo.dataset.table`.col_bax = `project-bar.dataset.table`.col_bax
Quoting schemas
This is currently not great and we are working on improving it; see https://github.com/PRQL/prql/issues/1535 for progress.
If supplying a schema without a column — for example in a from or join transform — that also needs to be a quoted identifier:
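A sketch, assuming a hypothetical dev schema:

```prql
from `dev.employees`
```

which compiles to roughly:

```sql
SELECT
  *
FROM
  dev.employees
```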
Parameters
PRQL will retain parameters like $1 in SQL output, which can then be supplied to the SQL query as a prepared query:
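For instance, a sketch that filters on a parameter:

```prql
from employees
filter id == $1
```

The compiled SQL retains the parameter:

```sql
SELECT
  *
FROM
  employees
WHERE
  id = $1
```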
Numbers
Numbers can contain underscores between digits, which can make reading large numbers easier:
PRQL
from numbers
select [
small = 1.000_000_1,
big = 5_000_000,
]
SQL
SELECT
1.0000001 AS small,
5000000 AS big
FROM
numbers
Keywords
At the moment, PRQL uses only four keywords:
- prql
- let
- func
- case
To use these names as columns or relations, use backticks: `case`.
It may seem that transforms are also keywords, but they are normal functions within the std namespace:
PRQL
std.from my_table
std.select [from = my_table.a, take = my_table.b]
std.take 3
SQL
SELECT
a AS "from",
b AS take
FROM
my_table
LIMIT
3
Language features
The pages of this section describe how PRQL handles various aspects of the language.
Feature | Purpose |
---|---|
Coalesce | Handle potentially NULL values |
Dates & times | Handle dates and times |
Distinct | Equivalent of SQL DISTINCT |
Null handling | Handle NULL values |
Ranges | Syntax for all forms of ranges |
Regex | Handle regular expressions |
Strings | Rules for creating strings |
S-strings | Insert SQL directly into a query with an S-string |
F-strings | Combine several columns’ data with F-strings |
Case | Create a new column based on the contents of other columns |
Target & Version | Specify a target SQL engine and PRQL version |
Case
PRQL uses case for both SQL's CASE and IF statements. Here's an example:
PRQL
from employees
derive distance = case [
city == "Calgary" => 0,
city == "Edmonton" => 300,
]
SQL
SELECT
*,
CASE
WHEN city = 'Calgary' THEN 0
WHEN city = 'Edmonton' THEN 300
ELSE NULL
END AS distance
FROM
employees
If no condition is met, the value takes a null value. To set a default, use a true condition:
PRQL
from employees
derive distance = case [
city == "Calgary" => 0,
city == "Edmonton" => 300,
true => "Unknown",
]
SQL
SELECT
*,
CASE
WHEN city = 'Calgary' THEN 0
WHEN city = 'Edmonton' THEN 300
ELSE 'Unknown'
END AS distance
FROM
employees
Coalesce
We can coalesce values with the ?? operator. Coalescing takes either the first value or, if that value is null, the second value.
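A minimal sketch, using illustrative orders and amount names:

```prql
from orders
derive amount_or_zero = amount ?? 0
```

which compiles to roughly:

```sql
SELECT
  *,
  COALESCE(amount, 0) AS amount_or_zero
FROM
  orders
```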
Dates & times
PRQL uses @ followed by a string to represent dates & times. This is less verbose than SQL's approach of TIMESTAMP '2004-10-19 10:23:54' and more explicit than SQL's implicit option of just using a string '2004-10-19 10:23:54'.
Currently PRQL passes these strings straight through to the database, and so many compatible format strings may work, but we may refine this in the future to aid compatibility across databases. We'll always support the canonical ISO8601 format described below.
Dates
Dates are represented by @{yyyy-mm-dd} — a @ followed by the date format.
PRQL
from employees
derive age_at_year_end = (@2022-12-31 - dob)
SQL
SELECT
*,
DATE '2022-12-31' - dob AS age_at_year_end
FROM
employees
Times
Times are represented by @{HH:mm:ss.SSS±Z}, with any parts not supplied being rounded to zero, including the timezone, which is represented by +HH:mm, -HH:mm or Z. This is consistent with the ISO8601 time format.
PRQL
from orders
derive should_have_shipped_today = (order_time < @08:30)
SQL
SELECT
*,
order_time < TIME '08:30' AS should_have_shipped_today
FROM
orders
Timestamps
Timestamps are represented by @{yyyy-mm-ddTHH:mm:ss.SSS±Z} / @{date}T{time}, with any time parts not supplied being rounded to zero, including the timezone, which is represented by +HH:mm, -HH:mm or Z (the : is optional). This is @ followed by the ISO8601 datetime format, which uses T to separate date & time.
PRQL
from commits
derive first_prql_commit = @2020-01-01T13:19:55-08:00
SQL
SELECT
*,
TIMESTAMP '2020-01-01T13:19:55-08:00' AS first_prql_commit
FROM
commits
Intervals
Intervals are represented by {N}{periods}
, such as 2years
or 10minutes
,
without a space.
These aren't the same as ISO8601, because we judged `P3Y6M4DT12H30M5S` to
be difficult to understand, but we could support a simplified form if there's
demand for it. We don’t currently support compound expressions, for example
2years10months
, but most DBs will allow 2years + 10months
. Please raise an
issue if this is inconvenient.
PRQL
from projects
derive first_check_in = start + 10days
SQL
SELECT
*,
start + INTERVAL 10 DAY AS first_check_in
FROM
projects
Examples
Here’s a fuller list of examples:
- `@20221231` is invalid — it must contain full punctuation (`-` and `:`)
- `@2022-12-31` is a date
- `@2022-12` or `@2022` are invalid — SQL can't express a month, only a date
- `@16:54:32.123456` is a time
- `@16:54:32`, `@16:54`, `@16` are all allowed, expressing `@16:54:32.000000`, `@16:54:00.000000`, `@16:00:00.000000` respectively
- `@2022-12-31T16:54:32.123456` is a timestamp without timezone
- `@2022-12-31T16:54:32.123456Z` is a timestamp in UTC
- `@2022-12-31T16:54+02:00` and `@2022-12-31T16:54+02` are timestamps in UTC+2
- `@16:54+02` is invalid — time is always local, so it cannot have a timezone
- `@2022-12-31+02` is invalid — date is always local, so it cannot have a timezone
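Several of these rules line up with Python's ISO8601 parsers, which can serve as a rough sanity check (this demonstrates Python's behavior, not PRQL's compiler):

```python
from datetime import date, time

# Full punctuation is required, as with @2022-12-31
assert date.fromisoformat("2022-12-31") == date(2022, 12, 31)

# Omitted time parts are zero, as with @16:54 meaning @16:54:00.000000
assert time.fromisoformat("16:54") == time(16, 54, 0, 0)
```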
Roadmap
Datetimes
Datetimes are supported by some databases (e.g. MySql, BigQuery) in addition to timestamps. When we have type annotations, these will be represented by a timestamp annotated as a datetime:
derive pi_day = @2017-03-14T15:09:26.535898<datetime>
These are some examples we can then add:
- `@2022-12-31T16:54<datetime>` is a datetime without timezone
- `@2022-12-31<datetime>` is forbidden — datetime must specify time
- `@16:54<datetime>` is forbidden — datetime must specify date
Distinct
PRQL doesn’t have a specific distinct
keyword. Instead, use group
and
take 1
:
PRQL
from employees
select department
group department (
take 1
)
SQL
SELECT
DISTINCT department
FROM
employees
This also works without a linebreak:
PRQL
from employees
select department
group department (take 1)
SQL
SELECT
DISTINCT department
FROM
employees
Selecting from each group
We are able to select a single row from each group by combining `group` and `sort`:
PRQL
# youngest employee from each department
from employees
group department (
sort age
take 1
)
SQL
WITH table_1 AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY
age
) AS _expr_0
FROM
employees
)
SELECT
*
FROM
table_1 AS table_0
WHERE
_expr_0 <= 1
Note that we can’t always compile to DISTINCT
; when the columns in the group
aren’t all the available columns, we need to use a window function:
PRQL
from employees
group [first_name, last_name] (take 1)
SQL
WITH table_1 AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY first_name, last_name) AS _expr_0
FROM
employees
)
SELECT
*
FROM
table_1 AS table_0
WHERE
_expr_0 <= 1
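The semantics of `group ... (take 1)` can be sketched in Python: keep the first row per key, like the `ROW_NUMBER` filter above (a toy model, not the compiler's mechanism):

```python
rows = [
    {"first_name": "Ann", "last_name": "Lee", "age": 30},
    {"first_name": "Ann", "last_name": "Lee", "age": 41},
    {"first_name": "Bo", "last_name": "Kim", "age": 25},
]

seen, distinct = set(), []
for row in rows:
    key = (row["first_name"], row["last_name"])
    if key not in seen:  # the first row per group wins, like `take 1`
        seen.add(key)
        distinct.append(row)

assert [r["age"] for r in distinct] == [30, 25]
```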
Roadmap
When using Postgres dialect, we are planning to compile:
# youngest employee from each department
from employees
group department (
sort age
take 1
)
… to …
SELECT DISTINCT ON (department) *
FROM employees
ORDER BY department, age
Null handling
SQL has an unconventional way of handling NULL
values, since it treats them as
unknown values. As a result, in SQL:
- `NULL` is not a value indicating a missing entry, but a placeholder for anything possible,
- `NULL = NULL` evaluates to `NULL`, since one cannot know if one unknown is equal to another unknown,
- `NULL <> NULL` evaluates to `NULL`, using the same logic,
- to check if a value is `NULL`, SQL introduces `IS NULL` and `IS NOT NULL` operators,
- `DISTINCT column` may return multiple `NULL` values.
For more information, check out the Postgres documentation.
PRQL, on the other hand, treats null
as a value, which means that:
- `null == null` evaluates to `true`,
- `null != null` evaluates to `false`,
- a distinct column cannot contain multiple `null` values.
PRQL
from employees
filter first_name == null
filter null != last_name
SQL
SELECT
*
FROM
employees
WHERE
first_name IS NULL
AND last_name IS NOT NULL
Note that PRQL doesn’t change how NULL
is compared between columns, for
example in joins. (PRQL compiles to SQL and so can’t change the behavior of the
database).
For more context or to provide feedback check out the discussion on issue #99.
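Python's `None` behaves like PRQL's `null` in this respect, which makes for a convenient sketch of the difference from SQL:

```python
# PRQL treats null as a value, so comparing nulls yields a boolean,
# just as comparing None with None does in Python.
assert (None == None) is True   # like PRQL's `null == null`
assert (None != None) is False  # like PRQL's `null != null`

# SQL's `NULL = NULL` instead yields NULL (three-valued logic),
# which is why PRQL compiles `== null` to `IS NULL`.
```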
Ranges
PRQL has a concise range syntax `start..end`. If only one of `start` & `end` is
supplied, the range is open on the empty side.
Ranges can be used in filters with the in
function, with any type of literal,
including dates:
PRQL
from events
filter (created_at | in @1776-07-04..@1787-09-17)
filter (magnitude | in 50..100)
derive is_northern = (latitude | in 0..)
SQL
SELECT
*,
latitude >= 0 AS is_northern
FROM
events
WHERE
created_at BETWEEN DATE '1776-07-04' AND DATE '1787-09-17'
AND magnitude BETWEEN 50 AND 100
Like in SQL, ranges are inclusive.
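A hypothetical helper makes the inclusivity and the open-sided cases explicit (Python, with `None` for a missing bound):

```python
def in_range(value, start=None, end=None):
    # PRQL's `in start..end` compiles to BETWEEN: both ends inclusive.
    # An open side (e.g. `0..`) simply drops that bound.
    return (start is None or value >= start) and (end is None or value <= end)

assert in_range(50, start=50, end=100)   # lower bound is inclusive
assert in_range(100, start=50, end=100)  # upper bound is inclusive
assert in_range(3, start=0)              # open-ended, like `in 0..`
```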
As discussed in the take docs, ranges can also be used
in take
:
PRQL
from orders
sort [-value, created_at]
take 101..110
SQL
SELECT
*
FROM
orders
ORDER BY
value DESC,
created_at
LIMIT
10 OFFSET 100
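The arithmetic behind the generated `LIMIT`/`OFFSET` pair is straightforward; a hypothetical helper makes it explicit:

```python
def take_to_limit_offset(start, end):
    # `take start..end` keeps rows start..end (1-based, inclusive),
    # i.e. LIMIT (end - start + 1) OFFSET (start - 1) in SQL.
    return {"limit": end - start + 1, "offset": start - 1}

assert take_to_limit_offset(101, 110) == {"limit": 10, "offset": 100}
assert take_to_limit_offset(1, 20) == {"limit": 20, "offset": 0}  # like `take 1..20`
```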
Roadmap
We'd like to use ranges for other types, such as checking whether an object is in an array or list literal.
Regex expressions
At this time, PRQL does not have a facility for regular expression (regex) processing.
The Milestone for 0.5
talks about general plans for regexes.
The Discussion #1123
provides a workaround function that emulates %LIKE%
.
Standard library
The standard library currently contains commonly used functions that are used in SQL. It’s not yet as broad as we’d like, and we’re very open to expanding it.
Currently s-strings are an escape-hatch for any function that isn’t in our standard library. If we find ourselves using them for something frequently, raise an issue and we’ll add it to the stdlib.
Currently the stdlib implementation doesn’t support different DB implementations itself; those need to be built deeper into the compiler. We’ll resolve this at some point. Until then, we’ll only add functions here that are broadly supported by most DBs.
Here’s the source of the current
PRQL std
:
# The PRQL standard library defines the following functions and transforms.
# The definitions are whitespace insensitive, and have this form:
#
# func my_func <return_type> param1 param2 ... -> body_expr
# where:
# * my_func is the name of the function
# * <return_type> is the type of result wrapped in "< ... >"
# * param1 is the first parameter optionally followed by a type in "< ... >"
# * param2 etc. follow the same pattern as param1
# * -> body_expr defines the function body that creates the result.
# It can be PRQL code or "null" to indicate internal compiler code
# Aggregate functions
# These return either a scalar when used within `aggregate`, or a column when used anywhere else.
func min <scalar || column> column -> null
func max <scalar || column> column -> null
func sum <scalar || column> column -> null
func avg <scalar || column> column -> null
func stddev <scalar || column> column -> null
func average <scalar || column> column -> null
func count <scalar || column> non_null:s"*" -> null
# TODO: Possibly make this into `count distinct:true` (or like `distinct:` as an
# abbreviation of that?)
func count_distinct <scalar || column> column -> null
# Window functions
func lag<column> offset column -> null
func lead<column> offset column -> null
func first<column> offset column -> null
func last<column> offset column -> null
func rank<column> -> null
func rank_dense<column> -> null
func row_number<column> -> null
# Other functions
func round<scalar> n_digits column -> null
func as<scalar> `noresolve.type` column -> null
func in<bool> pattern value -> null
# Transform type definitions
func from<table> `default_db.source`<table> -> null
func select<table> columns<column> tbl<table> -> null
func filter<table> condition<bool> tbl<table> -> null
func derive<table> columns<column> tbl<table> -> null
func aggregate<table> a<column> tbl<table> -> null
func sort<table> by tbl<table> -> null
func take<table> expr tbl<table> -> null
func join<table> `default_db.with`<table> filter `noresolve.side`:inner tbl<table> -> null
func group<table> by pipeline tbl<table> -> null
func window<table> rows:0..0 range:0..0 expanding:false rolling:0 pipeline tbl<table> -> null
func append<table> `default_db.bottom`<table> top<table> -> null
func intersect<table> `default_db.bottom`<table> top<table> -> (
from t = _param.top
join b = _param.bottom (all (map _eq (zip t.* b.*)))
select t.*
)
func remove<table> `default_db.bottom`<table> top<table> -> (
from t = _param.top
join side:left b = _param.bottom (all (map _eq (zip t.* b.*)))
filter (all (map _is_null b.*))
select t.*
)
func loop<table> pipeline top<table> -> null
# List functions
func all<bool> list<list> -> null
func map<list> fn list<list> -> null
func zip<list> a<list> b<list> -> null
func _eq<list> a<list> -> null
func _is_null a -> _param.a == null
# Misc
func from_text<table> input<text> `noresolve.format`:csv -> null
# String functions
func lower <text> column -> null
func upper <text> column -> null
# type primitives
type int
type float
type bool
type text
type date
type time
type timestamp
type table
type column
type list
type scalar
# Source-reading functions, primarily for DuckDB
func read_parquet<table> source<text> -> s"SELECT * FROM read_parquet({source})"
func read_csv<table> source<text> -> s"SELECT * FROM read_csv_auto({source})"
And a couple of examples:
PRQL
from employees
derive [
gross_salary = (salary + payroll_tax | as int),
gross_salary_rounded = (gross_salary | round 0),
time = s"NOW()", # an s-string, given no `now` function exists in PRQL
]
SQL
SELECT
*,
CAST(salary + payroll_tax AS int) AS gross_salary,
ROUND(CAST(salary + payroll_tax AS int), 0) AS gross_salary_rounded,
NOW() AS time
FROM
employees
from_text
It’s often useful to make a small table inline, for example when exploring how a database will evaluate an expression, or to have a small lookup table inline. This can be quite verbose in SQL.
PRQL uses from_text
for this.
It accepts a few formats:
- `format:csv` parses CSV (default)
- `format:json` parses either:
  - an array of objects, each of which represents a row, or
  - an object with fields `columns` & `data`, where `columns` takes an array of column names and `data` takes an array of arrays.
PRQL
from_text """
a,b,c
1,2,3
4,5,6
"""
derive [
d = b + c,
answer = 20 * 2 + 2,
]
SQL
WITH table_0 AS (
SELECT
'1' AS a,
'2' AS b,
'3' AS c
UNION
ALL
SELECT
'4' AS a,
'5' AS b,
'6' AS c
)
SELECT
a,
b,
c,
b + c AS d,
42 AS answer
FROM
table_0 AS table_1
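Note that the generated SQL quotes every cell (`'1' AS a`): `from_text` passes CSV values through as strings, much as Python's `csv` module does. A small sketch:

```python
import csv
import io

text = """a,b,c
1,2,3
4,5,6"""

rows = list(csv.DictReader(io.StringIO(text)))
# Values are strings, matching the '1' AS a literals in the SQL above.
assert rows[0] == {"a": "1", "b": "2", "c": "3"}
assert len(rows) == 2
```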
An example of adding a small lookup table:
PRQL
let temp_format_lookup = from_text format:csv """
country_code,format
uk,C
us,F
lr,F
de,C
"""
from temperatures
join temp_format_lookup [==country_code]
SQL
WITH table_0 AS (
SELECT
'uk' AS country_code,
'C' AS format
UNION
ALL
SELECT
'us' AS country_code,
'F' AS format
UNION
ALL
SELECT
'lr' AS country_code,
'F' AS format
UNION
ALL
SELECT
'de' AS country_code,
'C' AS format
),
temp_format_lookup AS (
SELECT
country_code,
format
FROM
table_0 AS table_1
)
SELECT
temperatures.*,
temp_format_lookup.country_code,
temp_format_lookup.format
FROM
temperatures
JOIN temp_format_lookup ON temperatures.country_code = temp_format_lookup.country_code
And JSON:
PRQL
let x = from_text format:json """{
"columns": ["a", "b", "c"],
"data": [
[1, "x", false],
[4, "y", null]
]
}"""
let y = from_text format:json """
[
{"a": 1, "m": "5"},
{"a": 4, "n": "6"}
]
"""
from x | join y [==a]
SQL
WITH table_0 AS (
SELECT
1 AS a,
'x' AS b,
false AS c
UNION
ALL
SELECT
4 AS a,
'y' AS b,
NULL AS c
),
x AS (
SELECT
a,
b,
c
FROM
table_0 AS table_1
),
table_2 AS (
SELECT
1 AS a,
'5' AS m
UNION
ALL
SELECT
4 AS a,
NULL AS m
),
y AS (
SELECT
a,
m
FROM
table_2 AS table_3
)
SELECT
x.a,
x.b,
x.c,
y.a,
y.m
FROM
x
JOIN y ON x.a = y.a
Loop
Experimental
loop {step_function} {initial_relation}
Iteratively applies the `step` function to the `initial` relation until the step
returns an empty table. Returns a relation that contains the rows of the initial
relation and of all intermediate relations.
This behavior could be expressed with the following pseudo-code:
def loop(step, initial):
result = []
current = initial
while current is not empty:
result = append(result, current)
current = step(current)
return result
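The pseudo-code above can be made concrete in Python, modelling a relation as a list of row dicts (a toy model; the compiler actually emits a recursive CTE, as shown below):

```python
def loop(step, initial):
    # Accumulate the initial relation and every intermediate relation,
    # stopping once `step` returns an empty relation.
    result = []
    current = initial
    while current:
        result.extend(current)
        current = step(current)
    return result

# Mirrors the PRQL example below: filter n < 4, then select n = n + 1.
step = lambda rows: [{"n": r["n"] + 1} for r in rows if r["n"] < 4]
assert loop(step, [{"n": 1}]) == [{"n": 1}, {"n": 2}, {"n": 3}, {"n": 4}]
```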
Examples
PRQL
from_text format:json '[{"n": 1 }]'
loop (
filter n<4
select n = n+1
)
# returns [1, 2, 3, 4]
SQL
WITH table_0 AS (
SELECT
1 AS n
),
table_4 AS (
WITH RECURSIVE _loop AS (
SELECT
n
FROM
table_0 AS table_1
UNION
ALL
SELECT
n + 1
FROM
_loop AS table_2
WHERE
n < 4
)
SELECT
*
FROM
_loop
)
SELECT
n
FROM
table_4 AS table_3
The behavior of WITH RECURSIVE
may depend on the database configuration in MySQL.
The compiler assumes the behavior described by
the Postgres documentation
and will not produce correct results for
alternative configurations of MySQL.
Currently, loop
may produce references to the recursive CTE in sub-queries,
which is not supported by some database engines, e.g. SQLite. For now, we suggest step
functions are kept simple enough to fit into a single SELECT statement.
Reading files
We have a couple of functions named read_*
, which ask the DB to read files,
designed for DuckDB:
PRQL
from (read_parquet 'artists.parquet')
join (read_csv 'albums.csv') [==track_id]
SQL
WITH table_0 AS (
SELECT
*
FROM
read_parquet('artists.parquet')
),
table_1 AS (
SELECT
*
FROM
read_csv_auto('albums.csv')
)
SELECT
table_2.*,
table_3.*
FROM
table_0 AS table_2
JOIN table_1 AS table_3 ON table_2.track_id = table_3.track_id
These don’t currently have all the DuckDB options. If those would be helpful, please log an issue and it’s a fairly easy addition.
Strings
Strings in PRQL can use either single or double quotes.
To quote a string containing quotes, either use the “other” type of quote, or use 3, 4, 5 or 6 quotes, and close with the same number.
PRQL
from my_table
select x = """I said "hello world"!"""
SQL
SELECT
'I said "hello world"!' AS x
FROM
my_table
PRQL
from my_table
select x = """""I said """hello world"""!"""""
SQL
SELECT
'I said """hello world"""!' AS x
FROM
my_table
Strings can also contain any escape defined by JSON standard.
PRQL
from my_table
select x = "\t\tline ends here\n \\ "
SQL
SELECT
' line ends here
\ ' AS x
FROM
my_table
F-Strings and S-Strings
These special-case strings can be used to:
- F-strings - build up a new string from a set of columns or values
- S-strings - insert SQL statements directly into the query, when PRQL doesn't have an equivalent facility
Currently PRQL allows multiline strings with either a single character or multiple character quotes. This may change for strings using a single character quote in future versions.
S-strings
An s-string inserts SQL directly, as an escape hatch when there’s something that
PRQL doesn’t yet implement. For example, there’s no version()
function in SQL
that returns the Postgres version, so if we want to use that, we use an
s-string:
PRQL
from my_table
select db_version = s"version()"
SQL
SELECT
version() AS db_version
FROM
my_table
Embed a column name in an s-string using braces. For example, PRQL’s standard
library defines the average
function as:
func average column -> s"AVG({column})"
So `average salary` compiles to `AVG(salary)`.
Here's an example of a more involved use of an s-string:
PRQL
from de=dept_emp
join s=salaries side:left [
(s.emp_no == de.emp_no),
s"""({s.from_date}, {s.to_date})
OVERLAPS
({de.from_date}, {de.to_date})"""
]
SQL
SELECT
de.*,
s.*
FROM
dept_emp AS de
LEFT JOIN salaries AS s ON s.emp_no = de.emp_no
AND (s.from_date, s.to_date) OVERLAPS (de.from_date, de.to_date)
For those who have used Python, s-strings are similar to Python’s f-strings, but
the result is SQL code, rather than a string literal. For example, a Python
f-string of f"average({col})"
would produce "average(salary)"
, with quotes;
while in PRQL, s"average({col})"
produces average(salary)
, without quotes.
Note that interpolations can only contain plain variable names, not whole expressions as in Python.
We can also use s-strings to produce a full table:
PRQL
from s"SELECT DISTINCT ON first_name, id, age FROM employees ORDER BY age ASC"
join s = s"SELECT * FROM salaries" [==id]
SQL
WITH table_0 AS (
SELECT
DISTINCT ON first_name,
id,
age
FROM
employees
ORDER BY
age ASC
),
table_1 AS (
SELECT
*
FROM
salaries
)
SELECT
table_2.*,
table_3.*
FROM
table_0 AS table_2
JOIN table_1 AS table_3 ON table_2.id = table_3.id
S-strings in user code are intended as an escape-hatch for an unimplemented feature. If we often need s-strings to express something, that’s a sign we should implement it in PRQL or PRQL’s stdlib.
Braces
To output braces from an s-string, use double braces:
PRQL
from employees
derive [
has_valid_title = s"regexp_contains(title, '([a-z0-9]*-){{2,}}')"
]
SQL
SELECT
*,
regexp_contains(title, '([a-z0-9]*-){2,}') AS has_valid_title
FROM
employees
Precedence
The PRQL compiler simply places a literal copy of each variable into the resulting string, which means we may get surprising behavior when the variable has multiple terms and the s-string isn't parenthesized.
In this toy example, the salary + benefits / 365
gets precedence wrong:
PRQL
from employees
derive [
gross_salary = salary + benefits,
daily_rate = s"{gross_salary} / 365"
]
SQL
SELECT
*,
salary + benefits AS gross_salary,
salary + benefits / 365 AS daily_rate
FROM
employees
Instead, put the interpolated expression `{gross_salary}` in parentheses:
PRQL
from employees
derive [
gross_salary = salary + benefits,
daily_rate = s"({gross_salary}) / 365"
]
SQL
SELECT
*,
salary + benefits AS gross_salary,
(salary + benefits) / 365 AS daily_rate
FROM
employees
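What the compiler does here is plain textual substitution, which a hypothetical helper can demonstrate; this is why the unparenthesized version inherits SQL's operator precedence:

```python
def splice(template, **variables):
    # The compiler places a literal copy of each variable's SQL
    # into the s-string: no parentheses are added for you.
    out = template
    for name, sql in variables.items():
        out = out.replace("{" + name + "}", sql)
    return out

# Unparenthesized: `/` binds tighter than `+` in the resulting SQL.
assert splice("{g} / 365", g="salary + benefits") == "salary + benefits / 365"
# Parenthesized in the s-string: the intended grouping survives.
assert splice("({g}) / 365", g="salary + benefits") == "(salary + benefits) / 365"
```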
F-strings
F-strings are a readable approach to building new strings from existing strings. Currently PRQL supports this for concatenating strings:
PRQL
from employees
select full_name = f"{first_name} {last_name}"
SQL
SELECT
CONCAT(first_name, ' ', last_name) AS full_name
FROM
employees
This can be much easier to read for longer strings, relative to the SQL approach:
PRQL
from web
select url = f"http{tls}://www.{domain}.{tld}/{page}"
SQL
SELECT
CONCAT(
'http',
tls,
'://www.',
domain,
'.',
tld,
'/',
page
) AS url
FROM
web
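A rough sketch of the translation (a hypothetical helper; the real compiler works on the parsed query, not on raw strings):

```python
import re

def fstring_to_concat(template):
    # Split the f-string into literal runs and {column} references,
    # then emit a SQL CONCAT over the pieces.
    parts = []
    for literal, column in re.findall(r"([^{]*)(?:\{([^}]*)\})?", template):
        if literal:
            parts.append("'" + literal + "'")
        if column:
            parts.append(column)
    return "CONCAT(" + ", ".join(parts) + ")"

assert fstring_to_concat("{first_name} {last_name}") == "CONCAT(first_name, ' ', last_name)"
```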
Note that interpolations can only contain plain variable names, not whole expressions as in Python.
Roadmap
In the future, f-strings may incorporate string formatting such as datetimes, numbers, and padding. If there’s a feature that would be helpful, please post an issue.
Target & Version
Target dialect
PRQL allows specifying a target dialect at the top of the query, which allows PRQL to compile to a database-specific SQL flavor.
Examples
PRQL
prql target:sql.postgres
from employees
sort age
take 10
SQL
SELECT
*
FROM
employees
ORDER BY
age
LIMIT
10
PRQL
prql target:sql.mssql
from employees
sort age
take 10
SQL
SELECT
TOP (10) *
FROM
employees
ORDER BY
age
Supported dialects
Note that dialect support is early — most differences are not implemented, and
most dialects’ implementations are identical to generic
’s. Contributions are
very welcome.
sql.ansi
sql.bigquery
sql.clickhouse
sql.generic
sql.hive
sql.mssql
sql.mysql
sql.postgres
sql.sqlite
sql.snowflake
sql.duckdb
Version
PRQL allows specifying a version of the language in the PRQL header.
This has two roles, one of which is implemented:
- The compiler will raise an error if the compiler is older than the query version. This prevents confusing errors when queries use newer features of the language but the compiler hasn’t yet been upgraded.
- The compiler will compile for the major version of the query. This allows the language to evolve without breaking existing queries, or forcing multiple installations of the compiler. This isn’t yet implemented, but is a gating feature for PRQL 1.0.
Bindings
PRQL has bindings for many languages. These include:
We have three tiers of bindings:
- Supported
- Unsupported
- Nascent
Supported
Supported bindings require:
- A maintainer.
- Implementations of the core compile functions.
- Test coverage for these functions.
- A published package to the language’s standard package repository.
- A script in `Taskfile.yml` to bootstrap a development environment.
- Any dev tools, such as a linter & formatter, in pre-commit or MegaLinter.
The currently supported bindings are:
Most of these are in the main PRQL repo, and we gate any changes to the compiler’s API on compatible changes to the bindings.
Unsupported
Unsupported bindings work, but don’t fulfil all of the above criteria. We don’t gate changes to the compiler’s API. If they stop working, we’ll demote them to nascent.
Nascent
Nascent bindings are in development, and may not yet fully work.
prql-dotnet
prql-net
offers PRQL bindings for .NET bindings as a netstandard2.0
library.
It provides the PrqlCompiler
class which contains the ToJson
and ToSql
static methods.
It’s still at an early stage, and isn’t published to NuGet. Contributions are welcome.
Installation
Make sure that libprql_lib.so
(Linux), libprql_lib.dylib
(macOS) or
libprql_lib.dll
(Windows) is in your project’s bin
directory together with
PrqlCompiler.dll
and the rest of your project’s compiled files. I.e.
{your_project}/bin/Debug/net7.0/
.
The libprql_lib
library gets dynamically imported at runtime.
Usage
using Prql.Compiler;
var options = new PrqlCompilerOptions
{
Format = false,
SignatureComment = false,
};
var sql = PrqlCompiler.Compile("from employees", options);
Console.WriteLine(sql);
TODO
This is currently at 0.1.0 because we’re waiting to update prql-lib for the latest API. When we’ve done that, we can match the version here with the broader PRQL version.
PRQL
PRQL bindings for Elixir.
Installation
def deps do
[
{:prql, "~> 0.1.0"}
]
end
Basic Usage
iex> PRQL.compile("from customers")
{:ok, "SELECT\n *\nFROM\n customers\n\n-- Generated by PRQL compiler version 0.3.1 (https://prql-lang.org)\n"}
iex> PRQL.compile("from customers\ntake 10", dialect: :mssql)
{:ok, "SELECT\n TOP (10) *\nFROM\n customers\n\n-- Generated by PRQL compiler version 0.3.1 (https://prql-lang.org)\n"}
Development
We are in the early stages of developing Elixir bindings.
We’re using Rustler
to provide Rust bindings for prql-compiler
.
Currently using the bindings in an Elixir project requires compiling the Rust crate from this repo:
- Install dependencies with `mix deps.get`
- Compile the project with `mix compile`
- Run tests with `mix test`
Future work includes publishing pre-compiled artifacts, so Elixir projects can run PRQL without needing a Rust toolchain.
Mac
We currently don’t enable compilation for Mac. This is possible to enable, but
causes some issues with cargo’s compilation cache. Briefly: it requires
RUST_FLAGS
to be set, and because of
https://github.com/rust-lang/cargo/issues/8716 &
https://github.com/rust-lang/cargo/issues/8899, any compilation of a different
target will bust the cache.
The possible future workarounds include:
- Passing `--target=aarch64-apple-darwin` to every cargo call, which is inconvenient and can be difficult in some situations; e.g. Rust Analyzer. This disables passing `RUST_FLAGS` (I'm actually unclear why `prql-elixir` builds successfully in that case…)
- Directing other cargo calls to different paths, such as `/target-ra` for Rust Analyzer and `/target-book` for the book building. But one `cargo build` from the terminal without either the `target` or `target_dir` specified will bust the cache!
- Never compiling for other targets. But our standard tests run for `--target=wasm32-unknown-unknown`, so this requires refraining from using them.
- Removing `prql-elixir` from our workspace, so that `cargo` commands in the PRQL workspace don't require rust flags. This would work well, but means we need separate test coverage for this crate, which adds some weight to the tests.
If prql-elixir
becomes more used (for example, we start publishing to Hex, or
Mac developers want to work on it), then we can re-enable and deal with the
caching issues. We can also re-enable them if the cargo
issue is resolved.
To test on Mac temporarily — for example if there’s an error in GHA and we’re on
a Mac locally — apply a diff like this, and then run cargo build
from the
prql-elixir
path, which will enable the local
.cargo/config.toml
). (We could also make a
feature like elixir-mac
which enabled building on Mac).
diff --git a/bindings/prql-elixir/native/prql/Cargo.toml b/bindings/prql-elixir/native/prql/Cargo.toml
index 7194ca4f..9c7240ff 100644
--- a/bindings/prql-elixir/native/prql/Cargo.toml
+++ b/bindings/prql-elixir/native/prql/Cargo.toml
@@ -19,5 +19,5 @@ path = "src/lib.rs"
prql-compiler = {path = "../../../../prql-compiler", default-features = false, version = "0.6.1"}
# See Readme for details on Mac
-[target.'cfg(not(any(target_family="wasm", target_os = "macos")))'.dependencies]
+# [target.'cfg(not(any(target_family="wasm", target_os = "macos")))'.dependencies]
rustler = "0.27.0"
diff --git a/bindings/prql-elixir/native/prql/src/lib.rs b/bindings/prql-elixir/native/prql/src/lib.rs
index 2c5c8f27..68e77217 100644
--- a/bindings/prql-elixir/native/prql/src/lib.rs
+++ b/bindings/prql-elixir/native/prql/src/lib.rs
@@ -1,5 +1,5 @@
// See Readme for more information on Mac compiling
-#![cfg(not(target_os = "macos"))]
+// #![cfg(not(target_os = "macos"))]
// These bindings aren't relevant on wasm
#![cfg(not(target_family = "wasm"))]
// TODO: unclear why we need this `allow`; it's required in `CompileOptions`,
prql-java
prql-java
offers Java bindings to the prql-compiler
Rust library. It exposes
a Java native method public static native String toSql(String query)
.
It’s still at an early stage, and currently requires compiling locally, and isn’t published to Maven. Contributions are welcome.
Installation
<dependency>
<groupId>org.prqllang</groupId>
<artifactId>prql-java</artifactId>
<version>${PRQL_VERSION}</version>
</dependency>
Usage
import org.prqllang.prql4j.PrqlCompiler;
class Main {
public static void main(String[] args) {
String sql = PrqlCompiler.toSql("from table");
System.out.println(sql);
}
}
prql-js
JavaScript bindings for prql-compiler
.
Installation
npm install prql-js
Usage
Currently these functions are exposed
function compile(prql_query: string, options?: CompileOptions): string;
function prql_to_pl(prql_query: string): string;
function pl_to_rq(pl_json: string): string;
function rq_to_sql(rq_json: string): string;
From Node.js
Direct usage
const prqljs = require("prql-js");
const sql = prqljs.compile(`from employees | select first_name`);
console.log(sql);
Options
const opts = new prql.CompileOptions();
opts.target = "sql.mssql";
opts.format = false;
opts.signature_comment = false;
const sql = prqljs.compile(`from employees | take 10`, opts);
console.log(sql);
Template literal
const prqljs = require("prql-js");
const prql = (string) => prqljs.compile(string[0] || "");
const sql = prql`from employees | select first_name`;
console.log(sql);
Template literal with newlines
const prqljs = require("prql-js");
const prql = (string) => prqljs.compile(string[0] || "");
const sql = prql`
from employees
select first_name
`;
console.log(sql);
From a browser
<html>
<head>
<script src="./node_modules/prql-js/dist/web/prql_js.js"></script>
<script>
const { compile } = wasm_bindgen;
async function run() {
await wasm_bindgen("./node_modules/prql-js/dist/web/prql_js_bg.wasm");
const sql = compile("from employees | select first_name");
console.log(sql);
}
run();
</script>
</head>
<body></body>
</html>
From a framework or a bundler
import compile from "prql-js/dist/bundler";
const sql = compile(`from employees | select first_name`);
console.log(sql);
Errors
Errors are returned as the following object, serialized inside a JSON array:
interface ErrorMessage {
/// Plain text of the error
reason: string;
/// A list of suggestions of how to fix the error
hint: string | null;
/// Character offset of error origin within a source file
span: [number, number] | null;
/// Annotated code, containing cause and hints.
display: string | null;
/// Line and column number of error origin within a source file
location: SourceLocation | null;
}
/// Location within the source file.
/// Tuples contain:
/// - line number (0-based),
/// - column number within that line (0-based),
interface SourceLocation {
start: [number, number];
end: [number, number];
}
These errors can be caught as such:
try {
const sql = prqlJs.compile(`from employees | foo first_name`);
} catch (error) {
const errorMessages = JSON.parse(error.message).inner;
console.log(errorMessages[0].display);
console.log(errorMessages[0].location);
}
Development
Build:
npm run build
This builds Node, bundler and web packages in the dist
path.
Test:
npm test
Notes
- This uses `wasm-pack` to generate bindings.¹
- We've added an `npm` layer on top of the usual approach of just using `wasm-pack`, so we can distribute a single package with targets of `node`, `bundler` and `no-modules` — somewhat inverting the approach recommended by `wasm-pack`. The build instruction goes in a `build` script, rather than a `pack` script.

¹ Though we would be very open to other approaches, given wasm-pack does not seem maintained, and we're eliding many of its features to build for three targets. See https://github.com/PRQL/prql/issues/1836 for more details.
prql-python
prql-python
offers Rust bindings to the prql-compiler
Rust library. It
exposes a Python method compile(query: str) -> str
.
This is consumed by PyPrql & dbt-prql.
The crate is not published to crates.io; only to PyPI at https://pypi.org/project/prql-python/.
Installation
pip install prql-python
Usage
import prql_python as prql
prql_query = """
from employees
join salaries [==emp_id]
group [employees.dept_id, employees.gender] (
aggregate [
avg_salary = average salaries.salary
]
)
"""
options = prql.CompileOptions(
format=True, signature_comment=True, target="sql.postgres"
)
sql = prql.compile(prql_query)
sql_postgres = prql.compile(prql_query, options)
Relies on pyo3 for all the magic.
R (prqlr)
R bindings for prql-compiler
.
prqlr
also includes knitr
(R Markdown and Quarto) integration, which allows
us to easily create documents with the PRQL conversion results embedded in them.
Check out https://eitsupi.github.io/prqlr/ for more context.
prqlr
is generously maintained by @eitsupi in the
eitsupi/prqlr repo.
Installation
install.packages("prqlr")
Usage
library(prqlr)
"
from employees
join salaries [emp_id]
group [dept_id, gender] (
aggregate [
avg_salary = average salary
]
)
" |>
prql_compile()
PRQL compiler
prql-compiler
contains the implementation of PRQL’s compiler, written in Rust.
For more on PRQL, check out the PRQL website or the PRQL repo.
For more usage examples and the library documentation, check out the
prql-compiler
documentation.
Installation
cargo add prql-compiler
Examples
Compile a PRQL string to a SQLite dialect string.
src/main.rs
#![allow(unused)]
fn main() {
    use prql_compiler::{compile, Options, Target, sql::Dialect};

    let prql = "from employees | select [name, age]";
    let opts = &Options {
        format: false,
        target: Target::Sql(Some(Dialect::SQLite)),
        signature_comment: false,
        color: false,
    };
    let sql = compile(&prql, opts).unwrap();
    assert_eq!("SELECT name, age FROM employees", sql);
}
Terminology
Relation: Standard definition of a relation in the context of databases:
- An ordered set of tuples of form `(d_0, d_1, d_2, ...)`.
- The set of all `d_x` is called an attribute or a column. It has a name and a type domain `D_x`.
Frame: descriptor of a relation. Contains list of columns (with names and types). Does not contain data.
Table: persistently stored relation. Some uses of this term actually mean to say “relation”.
Integrations
PRQL is building integrations with lots of external tools, including:
dbt-prql
Original docs at https://github.com/prql/dbt-prql
As of Feb 2023, we’re again considering how to best integrate with dbt more closely. Ideally a file with a .prql extension will just work™.
If you’re interested in this, subscribe or 👍 to https://github.com/dbt-labs/dbt-core/pull/5982.
The original plugin is hosted at https://github.com/prql/dbt-prql, but only works with a minority of dialects, and isn’t a focus of development at the moment.
dbt-prql allows writing PRQL in dbt models. This combines the benefits of PRQL’s power & simplicity within queries, with dbt’s version control, lineage & testing across queries.
Once dbt-prql is installed, dbt commands compile PRQL between {% prql %} and {% endprql %} Jinja tags to SQL as part of dbt’s compilation. No additional config is required.
Examples
Simple example
{% prql %}
from employees
filter (age | in 20..30)
{% endprql %}
…would appear to dbt as:
SELECT
employees.*
FROM
employees
WHERE
age BETWEEN 20
AND 30
Less simple example
{% prql %}
from in_process = {{ source('salesforce', 'in_process') }}
derive expected_sales = probability * value
join {{ ref('team', 'team_sales') }} [name]
group name (
aggregate (sum expected_sales)
)
{% endprql %}
…would appear to dbt as:
SELECT
name,
sum(in_process.probability * in_process.value) AS expected_sales
FROM
{{ source('salesforce', 'in_process') }} AS in_process
JOIN {{ ref('team', 'team_sales') }} USING(name)
GROUP BY
name
…and then dbt will compile the source and ref calls into a full SQL query.
Replacing macros
dbt’s use of macros has saved many of us many lines of code, and even saved some people some time. But imperatively programming text generation with code like if not loop.last is not our highest calling. It’s the “necessary” part rather than the beautiful part of dbt.
Here’s the canonical example of macros in the dbt documentation:
{%- set payment_methods = ["bank_transfer", "credit_card", "gift_card"] -%}
select
order_id,
{%- for payment_method in payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount
{%- if not loop.last %},{% endif -%}
{% endfor %}
from {{ ref('raw_payments') }}
group by 1
Here’s that model using PRQL, including the prql Jinja tags.
{% prql %}
func filter_amount method -> s"sum(case when payment_method = '{method}' then amount end) as {method}_amount"
from {{ ref('raw_payments') }}
group order_id (
aggregate [
filter_amount bank_transfer,
filter_amount credit_card,
filter_amount gift_card,
]
)
{% endprql %}
As well as the query being simpler in its final form, writing in PRQL also gives us live feedback about any errors, on every keystroke. Though there’s much more to come, check out the current version on the PRQL Playground.
Note that when <https://github.com/prql/prql/issues/82> is implemented, we
can dispense with the s-string, and optionally dispense with the function.
```elm
from {{ ref('raw_payments') }}
group order_id (
aggregate [
bank_transfer_amount = amount | filter payment_method == 'bank' | sum,
credit_card_amount = amount | filter payment_method == 'credit_card' | sum,
gift_amount = amount | filter payment_method == 'gift_card' | sum,
]
)
```
or
```elm
func filter_amount method -> amount | filter payment_method == method | sum
from {{ ref('raw_payments') }}
group order_id (
aggregate [
bank_transfer_amount = filter_amount 'bank',
credit_card_amount = filter_amount 'credit_card',
gift_amount = filter_amount 'gift_card',
]
)
```
What it does
When dbt compiles models to SQL queries:
- Any text in a dbt model between {% prql %} and {% endprql %} tags is compiled from PRQL to SQL before being passed to dbt.
- The PRQL compiler passes text containing {{ & }} through to dbt without modification, which allows us to embed Jinja expressions in PRQL. (This was added to PRQL specifically for this use-case.)
- dbt will then compile the resulting model into its final form of raw SQL, and dispatch it to the database, as per usual.
There’s no config needed in the dbt project; this works automatically on any dbt command (e.g. dbt run) assuming dbt-prql is installed.
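As an illustration of the pass-through behavior, here is a minimal sketch (not the actual dbt-prql implementation) that shields `{{ … }}` spans behind placeholders before compilation and restores them afterwards:

```python
import re

JINJA = re.compile(r"\{\{.*?\}\}", re.DOTALL)

def shield_jinja(text):
    """Replace each {{ ... }} expression with an opaque placeholder."""
    spans = []
    def stash(m):
        spans.append(m.group(0))
        return f"__JINJA_{len(spans) - 1}__"
    return JINJA.sub(stash, text), spans

def restore_jinja(text, spans):
    """Put the original Jinja expressions back after compilation."""
    for i, span in enumerate(spans):
        text = text.replace(f"__JINJA_{i}__", span)
    return text

prql = "from {{ ref('raw_payments') }}\ntake 10"
shielded, spans = shield_jinja(prql)
# ...a compiler would translate `shielded` from PRQL to SQL here...
round_tripped = restore_jinja(shielded, spans)
assert round_tripped == prql
```

The real compiler handles `{{ … }}` natively rather than via placeholders; the sketch only shows why the Jinja expressions survive compilation untouched.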
Installation
pip install dbt-prql
Current state
Currently this is new, but fairly feature-complete. It’s enthusiastically supported — if there are any problems, please open an issue.
How does it work?
It’s some dark magic, unfortunately.
dbt doesn’t allow adding behavior beyond the database adapters (e.g. dbt-bigquery) or Jinja-only plugins (e.g. dbt-utils). So this library hacks the Python import system to monkeypatch dbt’s Jinja environment with an additional Jinja extension on Python’s startup.
Thanks to mtkennerly/poetry-dynamic-versioning for the technique.
This approach was discussed with the dbt team here and here.
This isn’t stable between dbt versions, since it relies on internal dbt APIs. The technique is also normatively bad — it runs a few lines of code every time the Python interpreter starts — whose errors could lead to very confusing bugs beyond the domain of the problem (though in the case of this library, it’s small and well-constructed™).
If there’s ever any concern that the library might be causing a problem, just set an environment variable DBT_PRQL_DISABLE=1, and this library won’t monkeypatch anything. It’s also fully uninstallable with pip uninstall dbt-prql.
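The pattern is easy to picture. Below is a hedged sketch of the startup-hook technique plus the env-var guard; the module name and function are hypothetical, not the plugin’s actual internals:

```python
import os

# A `.pth` file dropped into site-packages is executed at interpreter
# startup when its line starts with `import` — e.g. a hypothetical file
# `dbt_prql.pth` containing:
#
#   import dbt_prql_patch  # hypothetical module; runs on every startup
#
# The imported module can then monkeypatch dbt's Jinja environment,
# after checking the documented kill switch:

def should_monkeypatch(environ=None):
    """Return False when DBT_PRQL_DISABLE=1 is set, as documented above."""
    env = os.environ if environ is None else environ
    return env.get("DBT_PRQL_DISABLE") != "1"

# With no variable set, patching proceeds; with the kill switch, it doesn't.
assert should_monkeypatch({}) is True
assert should_monkeypatch({"DBT_PRQL_DISABLE": "1"}) is False
```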
Roadmap
Open to ideas; at the moment it’s fairly feature-complete. If we were unconstrained in dbt functionality:
- If dbt allowed for external plugins, we’d enthusiastically move to that.
- We’d love to have this work on .prql files without the {% prql %} tags; but with the current approach that would require quite invasive monkeypatching.
- If we could add the dialect in automatically (i.e. prql dialect:snowflake), that would save a line per model.
- If we could upstream this into dbt-core, that would be awesome. It may be on PRQL to demonstrate its staying power before that, though.
We may move this library to the https://github.com/prql/PyPrql or https://github.com/prql/prql repos. We’d prefer to keep it as its own package given the hackery above, but there’s no need for it to be its own repo.
Jupyter
pyprql contains pyprql.magic, a thin wrapper of JupySQL’s SQL IPython magics.
This allows us to run PRQL interactively on Jupyter/IPython.
Check out https://pyprql.readthedocs.io/ for more context.
Installation
pip install pyprql
Usage
When installing pyprql, the duckdb-engine package is also installed with it, so we can start using PRQL immediately to query CSV and Parquet files.
For example, running the example from the JupySQL documentation on IPython:
In [1]: %load_ext pyprql.magic
In [2]: !curl -sL https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv -o penguins.csv
In [3]: %prql duckdb://
In [4]: %prql from `penguins.csv` | take 3
Out[4]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181 3750 MALE
1 Adelie Torgersen 39.5 17.4 186 3800 FEMALE
2 Adelie Torgersen 40.3 18.0 195 3250 FEMALE
In [5]: %%prql
...: from `penguins.csv`
...: filter bill_length_mm > 40
...: take 3
...:
...:
Out[5]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 40.3 18.0 195 3250 FEMALE
1 Adelie Torgersen 42.0 20.2 190 4250 None
2 Adelie Torgersen 41.1 17.6 182 3200 FEMALE
DuckDB
There’s an experimental DuckDB extension from @ywelsch at ywelsch/duckdb-prql.
Check out the Readme for more details.
Prefect
Because Prefect is in native Python, it’s extremely easy to integrate with PRQL.
With a Postgres Task, replace:
PostgresExecute.run(..., query=sql)
…with…
PostgresExecute.run(..., query=prql_python.compile(prql))
We’re big fans of Prefect, and if there is anything that would make the integration easier, please open an issue.
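The substitution above is just function composition — compile first, then execute. A stdlib-only sketch, where `compile_prql` and `run_query` are stand-ins for `prql_python.compile` and the Prefect task’s `run` method (not the real APIs):

```python
# Stand-ins: `compile_prql` mimics prql_python.compile, and `run_query`
# mimics a Prefect task executing SQL. Neither is the real API.

def compile_prql(prql):
    # A real integration would return prql_python.compile(prql);
    # here we just tag the text so the composition is visible.
    return f"/* compiled */ {prql}"

def run_query(query):
    return f"executed: {query}"

def run_prql(prql):
    """Compose the two steps, mirroring
    PostgresExecute.run(..., query=prql_python.compile(prql))."""
    return run_query(compile_prql(prql))

result = run_prql("from employees | take 5")
```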
Visual Studio Code extension
PRQL has a Visual Studio Code extension that compiles a PRQL query in a VS Code editor and displays the resulting SQL code in a second pane on the side. This is very handy for editing, saving, and reusing PRQL queries in VS Code.
To install the VS Code extension, open VS Code, type Ctrl-Shift-P (Cmd-Shift-P on a Mac), and type PRQL. Install the extension as usual.
Repo for the PRQL VS Code extension
Rill
PRQL has had some work to integrate with Rill. See the Rill issues for more details.
Examples
These examples are rewritten from other languages such as SQL. They try to express real-world problems in PRQL, covering most of the language features. We are looking for different use-cases of data transformation, be it database queries, semantic business modeling or data cleaning.
If you want to help, translate some of your queries to PRQL and open a PR to add them here!
PRQL
from employees
filter country == "USA" # Each line transforms the previous result.
derive [ # This adds columns / variables.
gross_salary = salary + payroll_tax,
gross_cost = gross_salary + benefits_cost # Variables can use other variables.
]
filter gross_cost > 0
group [title, country] ( # For each group use a nested pipeline
aggregate [ # Aggregate each group to a single row
average salary,
average gross_salary,
sum salary,
sum gross_salary,
average gross_cost,
sum_gross_cost = sum gross_cost,
ct = count,
]
)
sort sum_gross_cost
filter ct > 200
take 20
SQL
WITH table_1 AS (
SELECT
title,
country,
salary + payroll_tax + benefits_cost AS _expr_0,
salary + payroll_tax AS _expr_1,
salary
FROM
employees
WHERE
country = 'USA'
)
SELECT
title,
country,
AVG(salary),
AVG(_expr_1),
SUM(salary),
SUM(_expr_1),
AVG(_expr_0),
SUM(_expr_0) AS sum_gross_cost,
COUNT(*) AS ct
FROM
table_1 AS table_0
WHERE
_expr_0 > 0
GROUP BY
title,
country
HAVING
COUNT(*) > 200
ORDER BY
sum_gross_cost
LIMIT
20
PRQL
from employees
group [emp_no] (
aggregate [
emp_salary = average salary # average salary resolves to "AVG(salary)" (from stdlib)
]
)
join titles [==emp_no]
group [title] (
aggregate [
avg_salary = average emp_salary
]
)
select salary_k = avg_salary / 1000 # avg_salary should resolve to "AVG(emp_salary)"
take 10 # induces new SELECT
derive salary = salary_k * 1000 # salary_k should not resolve to "avg_salary / 1000"
SQL
WITH table_3 AS (
SELECT
AVG(salary) AS _expr_1,
emp_no
FROM
employees
GROUP BY
emp_no
),
table_1 AS (
SELECT
AVG(table_2._expr_1) AS _expr_0
FROM
table_3 AS table_2
JOIN titles ON table_2.emp_no = titles.emp_no
GROUP BY
titles.title
)
SELECT
_expr_0 / 1000 AS salary_k,
_expr_0 / 1000 * 1000 AS salary
FROM
table_1 AS table_0
LIMIT
10
Single item is coerced into a list
Same as above but with salary in a list:
Multiple items
PRQL
from employees
derive [
gross_salary = salary + payroll_tax,
gross_cost = gross_salary + benefits_cost
]
SQL
SELECT
*,
salary + payroll_tax AS gross_salary,
salary + payroll_tax + benefits_cost AS gross_cost
FROM
employees
Same as above but split into two lines:
PRQL
from employees
derive gross_salary = salary + payroll_tax
derive gross_cost = gross_salary + benefits_cost
SQL
SELECT
*,
salary + payroll_tax AS gross_salary,
salary + payroll_tax + benefits_cost AS gross_cost
FROM
employees
PRQL
let newest_employees = (
from employees
sort tenure
take 50
)
let average_salaries = (
from salaries
group country (
aggregate average_country_salary = (average salary)
)
)
from newest_employees
join average_salaries [==country]
select [name, salary, average_country_salary]
SQL
WITH newest_employees AS (
SELECT
*
FROM
employees
ORDER BY
tenure
LIMIT
50
), average_salaries AS (
SELECT
country,
AVG(salary) AS average_country_salary
FROM
salaries
GROUP BY
country
)
SELECT
newest_employees.name,
newest_employees.salary,
average_salaries.average_country_salary
FROM
newest_employees
JOIN average_salaries ON newest_employees.country = average_salaries.country
Employees
These are homework tasks on the employees database.
Clone and init the database (requires a local PostgreSQL instance):
psql -U postgres -c 'CREATE DATABASE employees;'
git clone https://github.com/vrajmohan/pgsql-sample-data.git
psql -U postgres -d employees -f pgsql-sample-data/employee/employees.dump
Execute a PRQL query:
cd prql-compiler
cargo run compile examples/employees/average-title-salary.prql | psql -U postgres -d employees
Task 1
Rank the employee titles according to the average salary for each department.
My solution:
- for each employee, find their average salary,
- join employees with their departments and titles (duplicating employees for each of their titles and departments)
- group by department and title, aggregating average salary
- join with department to get department name
PRQL
from salaries
group [emp_no] (
aggregate [emp_salary = average salary]
)
join t=titles [==emp_no]
join dept_emp side:left [==emp_no]
group [dept_emp.dept_no, t.title] (
aggregate [avg_salary = average emp_salary]
)
join departments [==dept_no]
select [dept_name, title, avg_salary]
SQL
WITH table_3 AS (
SELECT
AVG(salary) AS _expr_0,
emp_no
FROM
salaries
GROUP BY
emp_no
),
table_1 AS (
SELECT
t.title,
AVG(table_2._expr_0) AS avg_salary,
dept_emp.dept_no
FROM
table_3 AS table_2
JOIN titles AS t ON table_2.emp_no = t.emp_no
LEFT JOIN dept_emp ON table_2.emp_no = dept_emp.emp_no
GROUP BY
dept_emp.dept_no,
t.title
)
SELECT
departments.dept_name,
table_0.title,
table_0.avg_salary
FROM
table_1 AS table_0
JOIN departments ON table_0.dept_no = departments.dept_no
Task 2
Estimate the distribution of salaries and gender for each department.
PRQL
from e=employees
join salaries [==emp_no]
group [e.emp_no, e.gender] (
aggregate [
emp_salary = average salaries.salary
]
)
join de=dept_emp [==emp_no] side:left
group [de.dept_no, gender] (
aggregate [
salary_avg = average emp_salary,
salary_sd = stddev emp_salary,
]
)
join departments [==dept_no]
select [dept_name, gender, salary_avg, salary_sd]
SQL
WITH table_3 AS (
SELECT
e.gender,
AVG(salaries.salary) AS _expr_0,
e.emp_no
FROM
employees AS e
JOIN salaries ON e.emp_no = salaries.emp_no
GROUP BY
e.emp_no,
e.gender
),
table_1 AS (
SELECT
table_2.gender,
AVG(table_2._expr_0) AS salary_avg,
STDDEV(table_2._expr_0) AS salary_sd,
de.dept_no
FROM
table_3 AS table_2
LEFT JOIN dept_emp AS de ON table_2.emp_no = de.emp_no
GROUP BY
de.dept_no,
table_2.gender
)
SELECT
departments.dept_name,
table_0.gender,
table_0.salary_avg,
table_0.salary_sd
FROM
table_1 AS table_0
JOIN departments ON table_0.dept_no = departments.dept_no
Task 3
Estimate distribution of salaries and gender for each manager.
PRQL
from e=employees
join salaries [==emp_no]
group [e.emp_no, e.gender] (
aggregate [
emp_salary = average salaries.salary
]
)
join de=dept_emp [==emp_no]
join dm=dept_manager [
(dm.dept_no == de.dept_no) && s"(de.from_date, de.to_date) OVERLAPS (dm.from_date, dm.to_date)"
]
group [dm.emp_no, gender] (
aggregate [
salary_avg = average emp_salary,
salary_sd = stddev emp_salary
]
)
derive mng_no = emp_no
join managers=employees [==emp_no]
derive mng_name = s"managers.first_name || ' ' || managers.last_name"
select [mng_name, managers.gender, salary_avg, salary_sd]
SQL
WITH table_5 AS (
SELECT
e.gender,
AVG(salaries.salary) AS _expr_0,
e.emp_no
FROM
employees AS e
JOIN salaries ON e.emp_no = salaries.emp_no
GROUP BY
e.emp_no,
e.gender
),
table_3 AS (
SELECT
AVG(table_4._expr_0) AS salary_avg,
STDDEV(table_4._expr_0) AS salary_sd,
dm.emp_no
FROM
table_5 AS table_4
JOIN dept_emp AS de ON table_4.emp_no = de.emp_no
JOIN dept_manager AS dm ON dm.dept_no = de.dept_no
AND (de.from_date, de.to_date) OVERLAPS (dm.from_date, dm.to_date)
GROUP BY
dm.emp_no,
table_4.gender
),
table_1 AS (
SELECT
salary_avg,
salary_sd,
emp_no
FROM
table_3 AS table_2
)
SELECT
managers.first_name || ' ' || managers.last_name AS mng_name,
managers.gender,
table_0.salary_avg,
table_0.salary_sd
FROM
table_1 AS table_0
JOIN employees AS managers ON table_0.emp_no = managers.emp_no
Task 4
Find distributions of titles, salaries and genders for each department.
PRQL
from de=dept_emp
join s=salaries side:left [
(s.emp_no == de.emp_no),
s"({s.from_date}, {s.to_date}) OVERLAPS ({de.from_date}, {de.to_date})"
]
group [de.emp_no, de.dept_no] (
aggregate salary = (average s.salary)
)
join employees [==emp_no]
join titles [==emp_no]
select [dept_no, salary, employees.gender, titles.title]
SQL
WITH table_1 AS (
SELECT
de.dept_no,
AVG(s.salary) AS salary,
de.emp_no
FROM
dept_emp AS de
LEFT JOIN salaries AS s ON s.emp_no = de.emp_no
AND (s.from_date, s.to_date) OVERLAPS (de.from_date, de.to_date)
GROUP BY
de.emp_no,
de.dept_no
)
SELECT
table_0.dept_no,
table_0.salary,
employees.gender,
titles.title
FROM
table_1 AS table_0
JOIN employees ON table_0.emp_no = employees.emp_no
JOIN titles ON table_0.emp_no = titles.emp_no
Contributing
If you’re interested in joining the community to build a better SQL, here are ways to start:
- Star the repo.
- Send a link to PRQL to a couple of people whose opinion you respect.
- Subscribe to new releases for updates.
- Follow us on Twitter.
- Join our Discord.
- Find an issue labeled Good First Issue and start contributing to the code.
- Join our fortnightly Developer Call (iCal file).
PRQL is evolving from a project with lots of excitement into a project that folks are using in their work and integrating into their tools. We’re actively looking for collaborators to lead that growth with us.
Areas for larger contributions
Compiler
The compiler is written in Rust, and there’s enough to do such that any level of experience with Rust is sufficient.
We try to keep a few onboarding issues on hand under the “good first issue” label. These have been screened to have sufficient context to get started (and we very much welcome questions where there’s some context missing).
To get started, check out the docs on Development and the Compiler architecture.
And if you have questions, there are lots of friendly people on the Discord who will patiently help you.
Bindings & integrations
For PRQL to be successful, it needs to be available for the languages & tools that people already use.
- We currently have bindings to the PRQL compiler in a few different languages; many of these can be improved, documented, and packaged in a better way.
- If you have experience with packaging in an ecosystem that doesn’t currently have bindings, then creating PRQL bindings for that language we don’t currently support would be valuable to the project.
- If there’s a tool that you use yourself to run data queries which you think would benefit from a PRQL integration, suggest one to us or the tool. If it’s open-source, build & share a prototype.
Relevant issues are labeled Integrations.
Language design
We decide on new language features in GitHub issues, usually under the “language design” label.
You can also contribute by:
- Finding instances where the compiler produces incorrect results, and posting a bug report — feel free to use the playground.
- Opening an issue / appending to an existing issue with examples of queries that are difficult to express in PRQL — especially if more difficult than SQL.
With sufficient examples, suggest a change to the language! (Though suggestions without examples are difficult to engage with, so please do anchor suggestions in examples.)
Marketing
- Improve our website. We have a few issues open on this front and are looking for anyone with at least some design skills.
- Contribute towards the docs. Anything from shaping a whole section of the docs, to simply improving a confusing paragraph or fixing a typo.
- Tell people about PRQL.
- Find a group of users who would be interested in PRQL, help them get up to speed, help the project understand what they need.
Core team
If you have any questions or feedback and don’t receive a response on one of the general channels such as GitHub or Discord, feel free to reach out to:
- @aljazerzen — Aljaž Mur Eržen
- @max-sixty — Maximilian Roos
- @eitsupi — SHIMA Tatsuya
- @snth — Tobias Brandt
Core team Emeritus
Thank you to those who have previously served on the core team:
- @charlie-sanders — Charlie Sanders
Development
Setting up an initial dev environment
We can set up a local development environment sufficient for navigating, editing, and testing PRQL’s compiler code in two minutes:
- Install rustup & cargo.
- [Optional but highly recommended] Install cargo-insta, our testing framework:
  cargo install cargo-insta
- That’s it! Running the unit tests for the prql-compiler crate after cloning the repo should complete successfully:
  cargo test -p prql-compiler --lib
…or, to run tests and update the test snapshots:
cargo insta test --accept -p prql-compiler --lib
There’s more context on our tests in How we test below.
That’s sufficient for making an initial contribution to the compiler.
Setting up a full dev environment
Note: We really care about this process being easy, both because the project benefits from more contributors like you, and to reciprocate your future contribution. If something isn’t easy, please let us know in a GitHub Issue. We’ll enthusiastically help you, and use your feedback to improve the scripts & instructions.
For more advanced development (for example, compiling for wasm or previewing the website), we have two options:
Option 1: Use the project’s task
Note: This is tested on MacOS, should work on Linux, but won’t work on Windows.
- Install Task; either brew install go-task/tap/go-task or as described on Task.
- Then run the setup-dev task. This runs commands from our Taskfile.yml, installing dependencies with cargo, brew, npm & pip, and suggests some VS Code extensions:
  task setup-dev
Option 2: Install tools individually
- We’ll need cargo-insta, to update snapshot tests:
  cargo install cargo-insta
- We’ll need a couple of additional components, which most systems will have already. The easiest way to check whether they’re installed is to try running the full tests:
  cargo test
  …and if that doesn’t complete successfully, check we have:
  - A clang compiler, to compile the DuckDB integration tests, since we use duckdb-rs. To install one:
    - On macOS, install Xcode with xcode-select --install
    - On Debian Linux, apt-get update && apt-get install clang
    - On Windows, duckdb-rs isn’t supported, so these tests are excluded
  - Python >= 3.7, to compile prql-python.
- For more involved contributions, such as building the website, playground, book, or some release artifacts, we’ll need some additional tools. But we won’t need those immediately, and the error messages on what’s missing should be clear when we attempt those things. When we hit them, the Taskfile.yml will be a good source to copy & paste instructions from.
Building & testing the full project
We have a couple of tasks which incorporate all building & testing. While they don’t need to be run as part of a standard dev loop — generally we’ll want to run a more specific test — they can be useful as a backstop to ensure everything works, and as a reference for how each part of the repo is built & tested. They should be consistent with the GitHub Actions workflows; please report any inconsistencies.
To build everything:
task build-all
To run all tests (which includes building everything):
task test-all
These require installing Task, either brew install go-task/tap/go-task or as described on Task.
Contribution workflow
We’re similar to most projects on GitHub — open a Pull Request with a suggested change!
Commits
- If a change is user-facing, please add a line in CHANGELOG.md, with {message}, ({@contributor, #X}) where X is the PR number.
  - If there’s a missing entry, a follow-up PR containing just the changelog entry is welcome.
- We’re using Conventional Commits message format, enforced through action-semantic-pull-request.
Merges
- We merge any code that makes PRQL better.
- A PR doesn’t need to be perfect to be merged; it doesn’t need to solve a big problem. It needs to:
  - be in the right direction,
  - make incremental progress,
  - be explicit on its current state, so others can continue the progress.
- If you have merge permissions, and are reasonably confident that a PR is suitable to merge (whether or not you’re the author), feel free to merge.
  - If you don’t have merge permissions and have authored a few PRs, ask and ye shall receive.
- The primary way we ratchet the code quality is through automated tests.
- This means PRs almost always need a test to demonstrate incremental progress.
- If a change breaks functionality without breaking tests, our tests were probably insufficient.
- If a change breaks existing tests (for example, changing an external API), that indicates we should be careful about merging a change, including soliciting others’ views.
- We use PR reviews to give general context, offer specific assistance, and
collaborate on larger decisions.
- Reviews around ‘nits’ like code formatting / idioms / etc are very welcome. But the norm is for them to be received as helpful advice, rather than as mandatory tasks to complete. Adding automated tests & lints to automate these suggestions is welcome.
- If you have merge permissions and would like a PR to be reviewed before it merges, that’s great — ask or assign a reviewer.
- If a PR hasn’t received attention after a day, please feel free to ping the pull request.
- People may review a PR after it’s merged. As part of the understanding that we can merge quickly, contributors are expected to incorporate substantive feedback into a future PR.
- We should revert quickly if the impact of a PR turns out not to be consistent with our expectations, or there isn’t as much consensus on a decision as we had hoped. It’s very easy to revert code and then re-revert when we’ve resolved the issue; it’s a sign of moving quickly. Other options which resolve issues immediately are also fine, such as commenting out an incorrect test or adding a quick fix for the underlying issue.
Docs
We’re very keen on contributions to improve our documentation.
This includes our docs in the book, on the website, in our code, or in a Readme. We also appreciate issues pointing out that our documentation was confusing, incorrect, or stale — if it’s confusing for you, it’s probably confusing for others.
Some principles for ensuring our docs remain maintainable:
- Docs should be as close as possible to the code. Doctests are ideal on this dimension — they’re literally very close to the code and they can’t drift apart since they’re tested on every commit. Or, for example, it’s better to add text to a --help message, rather than write a paragraph in the Readme explaining the CLI.
- We should have some visualization of how to maintain docs when we add them. Docs have a habit of falling out of date — the folks reading them are often different from those writing them, they’re separate from the code, generally not possible to test, and are rarely the by-product of other contributions. Docs that are concise & specific are easier to maintain.
- Docs should be specifically relevant to PRQL; anything else we can instead link to.
If something doesn’t fit into one of these categories, there are still lots of ways of getting the word out there — a blog post / gist / etc. Let us know and we’re happy to link to it / tweet it.
How we test
We use a pyramid of tests — we have fast, focused tests at the bottom of the pyramid, which give us low latency feedback when developing, and then slower, broader tests which ensure that we don’t miss anything as PRQL develops1.
Note
If you’re making your first contribution, you don’t need to engage with all this — it’s fine to just make a change and push the results; the tests that run in GitHub will point you towards any errors, which can be then be run locally if needed. We’re always around to help out.
Our tests, from the bottom of the pyramid to the top:
- Static checks — we run a few static checks to ensure the code stays healthy and consistent. They’re defined in .pre-commit-config.yaml, using pre-commit. They can be run locally with:
  pre-commit run -a
  The checks fix most of the issues they find themselves. Most of them also run on GitHub on every commit; any changes they make are added onto the branch automatically in an additional commit.
  - Checking by MegaLinter, which includes more linters, is also done automatically on GitHub. (experimental)
- Unit tests & inline insta snapshots — we rely on unit tests to rapidly check that our code basically works. We extensively use Insta, a snapshot testing tool which writes out the values generated by our code, making it fast & simple to write and modify tests2.
  These are the fastest tests which run our code; they’re designed to run on every save while you’re developing. We include a task which does this:
  task test-rust-fast
  # or
  cargo insta test --accept -p prql-compiler --lib
  # or, to run on every change:
  task -w test-rust-fast
- Examples — we compile all examples in the PRQL Book, to test that they produce the SQL we expect, and that changes to our code don’t cause any unexpected regressions.
- Integration tests — these run tests against real databases, to ensure we’re producing correct SQL.
- GitHub Actions on every commit — we run the tests described up to this point on every commit to a pull request. These are designed to run in under five minutes, and we should be reassessing their scope if they grow beyond that. Once these pass, a pull request can be merged.
  These can be run locally with:
  task test-rust
- GitHub Actions on specific changes — we run additional tests on pull requests when we identify changes to some paths, such as bindings to other languages.
- GitHub Actions on merge — we run many more tests on every merge to main. This includes testing across OSs, all our language bindings, our task tasks, a measure of test code coverage, and some performance benchmarks.
  We can run these tests before a merge by adding the label pr-test-all to the PR.
  If these tests fail after merging, we revert the merged commit before fixing the test and then re-reverting.
  Most of these will run locally with:
  task test-all
- GitHub Actions nightly — we run tests that take a long time or are unrelated to code changes, such as security checks, or expensive timing benchmarks, every night.
  We can run these tests before a merge by adding the label pr-cron to the PR.
The goal of our tests is to allow us to make changes quickly. If you find they’re making it more difficult for you to make changes, or there are missing tests that would give you the confidence to make changes faster, then please raise an issue.
Website
The website is published together with the book and the playground, and is automatically built and released on any push to the web branch.
The web branch points to the latest release plus any website-specific fixes. That way, the compiler behavior in the playground matches the latest release while allowing us to fix mistakes with a tighter loop than every release.
Fixes to the playground, book, or website should have a pr-backport-web label added to their PR — a bot will then open another PR onto the web branch once the initial branch merges.
Releasing
Currently we release in a semi-automated way:
- PR & merge an updated Changelog. GitHub will produce a draft version at https://github.com/PRQL/prql/releases/new, including “New Contributors”.
  We can use this script to generate the first line:
  echo "This release has $(git rev-list --count $(git rev-list --tags --max-count=1)..) commits from $(git shortlog --summary $(git rev-list --tags --max-count=1).. | wc -l | tr -d '[:space:]') contributors. Selected changes:"
- Run cargo release version patch -x && cargo release replace -x to bump the versions, then PR the resulting commit.
- After merging, go to Draft a new release3, copy the changelog entry into the release description4, enter the tag to be created, and hit “Publish”.
- From there, both the tag and release are created, and all packages are published automatically based on our release workflow.
- Add in the sections for a new Changelog:
  ## 0.8.X — [unreleased]

  **Features**:

  **Fixes**:

  **Documentation**:

  **Web**:

  **Integrations**:

  **Internal changes**:

  **New Contributors**:
- Check whether there are milestones that need to be pushed out.
We may make this more automated in future; e.g. automatic changelog creation.
1: Our approach is very consistent with @matklad’s advice, in his excellent blog post How to Test.
2: Here’s an example of an insta test — note that only the initial line of each test is written by us; the remainder is filled in by insta.
3: Only maintainers have access to this page.
4: Unfortunately GitHub’s markdown parser interprets linebreaks as newlines. I haven’t found a better way of editing the markdown to look reasonable than manually editing the text.
Developing with Docker
The Dockerfile in this repo builds a Docker image that has current versions of our Rust development tools. This can be the lowest-effort way of setting up a Rust environment for those that don’t have one already.
Development cycle
The developer loop when using Docker is substantially the same as if the tools had been installed directly.
All the source files live in the prql directory on the host. As the source changes, the tools (running in the Docker container) can watch those directories and re-run so results are instantly visible.
When the Docker container exits (say, at the end of the development session), the prql directory on the local machine contains the latest files. Use git to pull or to push the prql repo from the host as normal.
To do all this, build the Docker image and start a container as described in the Installation section.
Installation
Once Docker is installed, build the Docker image with the following commands.
Note: It will take some time while Docker pulls in all the necessary developer tools.
cd <top-level-PRQL-directory>
docker build -t prql .
Optional: Install pre-commit on the machine that hosts Docker. It runs several Static Checks to ensure code consistency. You can also configure git to run pre-commit automatically for each commit with the second (one-time) command below.
pre-commit run -a # Run checks manually
pre-commit install # (one time) install the git hooks
Finally, start up the Docker container with:
cd <top-level-PRQL-directory>
docker run --rm -it -v $(pwd)/:/src -p 3000:3000 prql
- There’ll be a root@xxxxxxxxx:/src/# prompt
- Enter a command to run or test code; for example cargo test
- Enter exit to stop the container
Running code with Docker
Currently our Docker image only supports running Rust dependencies. (Adding hugo & nodejs so that the playground can run would be a welcome contribution.)
Use the docker run... command above, then enter the relevant commands; for example cargo insta test --accept or task run book — more details of the commands are in each component’s README.md file or our Development docs.
Note: The first time you run a component, it may take some time to install additional files. Once they’re built, start up is quick.
Developing the Dockerfile
When making updates to the Dockerfile, we have automated testing that the Dockerfile builds on each merge in test-all.yaml, and automated testing that confirms all Rust tests pass in nightly.yaml.
Add a pr-test-all or pr-cron label to the PR to run these tests on a PR.
Developing with Dev Containers
Dev Containers are a way to package a number of developer tools (compilers, bundlers, package managers, loaders, etc.) into a single object. This is helpful when many people want to contribute to a project: each person only has to install the Dev Container on their own machine to start working. By definition, the Dev Container has a consistent set of tools that are known to work together. This avoids a fuss with finding the proper version of each of the build tools.
While there are a variety of tools that support Dev Containers, the focus here is on developing with VS Code in a container by GitHub Codespaces or VS Code Dev Containers extension.
To use a Dev Container on a local computer with VS Code, install the VS Code Dev Containers extension and its system requirements. Then refer to the links above to get started.
Internals
This chapter explains PRQL’s semantics: how expressions are interpreted and their meaning. It’s intended for advanced users and compiler contributors.
It’s also worth checking out the prql-compiler docs for more details on its API.
PRQL Compiler Architecture
The PRQL compiler operates in the following stages:
- Lexing & Parsing: PRQL source text is split into tokens by the Chumsky parser named the “lexer”. The stream of tokens is then parsed into an Abstract Syntax Tree (AST).
- Semantic Analysis: This stage resolves names (identifiers), extracts declarations, and determines frames (table columns in each step). A Context is declared containing the root module, which maps accessible names to their declarations.
  The resolving process involves the following operations:
  - Assign an ID to each node (Expr and Stmt).
  - Extract function declarations and variable definitions into the appropriate Module, accessible from Context::root_mod.
  - Look up identifiers in the module and find the associated declaration. The identifier is replaced with a fully qualified name that guarantees a unique name in root_mod. In some cases, Expr::target is also set.
  - Convert function calls to transforms (from, derive, filter) from FuncCall to TransformCall, which is more convenient for later processing.
  - Determine the type of expressions. If an expression is a reference to a table, use the frame of the table as the type. If it is a TransformCall, apply the transform to the input frame to obtain the resulting type. For simple expressions, try to infer from ExprKind.
- Lowering: This stage converts the PL into RQ, which is more strictly typed and contains less information, but is convenient for translating into SQL or other backends.
- SQL Backend: This stage converts RQ into SQL. Each relation is transformed into an SQL query. Pipelines are analyzed and split into “AtomicPipelines” at appropriate positions; each atomic pipeline can be represented by a single SELECT statement.
  Splitting is performed back-to-front. First, a list of all output columns is created. The pipeline is then traversed backwards, and a split occurs whenever a transform is encountered that is incompatible with those already collected in the pipeline. Splitting can also be triggered by an expression that cannot be materialized where it is used (e.g., a window function in a WHERE clause).
  This process is also called anchoring, as it anchors a column definition to a specific location in the output query.
  During this process, sql::context keeps track of:
  - table instances in the query (to prevent mixing up multiple instances of the same table)
  - column definitions, whether computed or a reference to a table column
  - column names, as defined in RQ or generated
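The back-to-front traversal can be sketched in a few lines of Python. This is an illustrative toy model only: the transform names and the compatibility table below are simplified assumptions, not the compiler’s actual rules or data structures.

```python
# Toy sketch of back-to-front pipeline splitting ("anchoring").
# INCOMPATIBLE lists (earlier, later) transform pairs that cannot share
# one SELECT statement; the table is an illustrative approximation.
INCOMPATIBLE = {
    ("take", "filter"),          # WHERE cannot follow LIMIT in one SELECT
    ("take", "sort"),            # ORDER BY cannot follow LIMIT either
    ("aggregate", "aggregate"),  # two aggregations need two SELECTs
}

def split_pipeline(pipeline):
    """Traverse backwards; start a new atomic pipeline on a conflict."""
    atomics, current = [], []
    for transform in reversed(pipeline):
        if any((transform, later) in INCOMPATIBLE for later in current):
            atomics.append(current)
            current = []
        current.append(transform)
    atomics.append(current)
    # restore front-to-back order at both levels
    return [list(reversed(a)) for a in reversed(atomics)]

# A `filter` after a `take` forces a split into two SELECTs:
print(split_pipeline(["filter", "take", "filter"]))
# [['filter', 'take'], ['filter']]
```

In the real compiler, compatibility also depends on where an expression can be materialized (e.g. window functions), not just on the transform kinds.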
Helpers
Cheat sheets for prqlc are available on various websites and with various tools:
- tldr (on the web)
- eg
Name resolving
Because PRQL primarily handles relational data, it has specialized scoping rules for referencing columns.
Scopes
In PRQL’s compiler, a scope is the collection of all names one can reference from a specific point in the program.
In PRQL, names in the scope are composed from namespace and variable name which are separated by a dot, similar to SQL. Namespaces can contain many dots, but variable names cannot.
The name my_table.some_column is a variable some_column from the namespace my_table.
The name foo.bar.baz is a variable baz from the namespace foo.bar.
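This split is mechanical: everything before the last dot is the namespace, and the final segment is the variable name. A minimal Python sketch (not compiler code):

```python
# Split a dotted name into (namespace, variable) at the LAST dot.
# Namespaces may contain many dots; variable names cannot.
def split_name(name: str) -> tuple[str, str]:
    namespace, _, variable = name.rpartition(".")
    return namespace, variable

print(split_name("my_table.some_column"))  # ('my_table', 'some_column')
print(split_name("foo.bar.baz"))           # ('foo.bar', 'baz')
```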
When processing a query, a scope is maintained and updated for each point in the query.
It starts with only the namespace std, which is the standard library. It contains common functions like sum or count, along with all transform functions such as derive and group.
In pipelines (or rather, in transform functions), the scope is also injected with the namespaces of tables which may have been referenced with from or join transforms. These namespaces simply contain all the columns of the table and possibly a wildcard variable, which matches any variable (see the algorithm below). Within transforms, there is also a special namespace that does not have a name. It is called the “frame”, and it contains the columns of the current table the transform is operating on.
Resolving
For each ident we want to resolve, we search the scope’s items in order. One of three things can happen:
- The scope contains an exact match, i.e. a name that matches in both the namespace and the variable name.
- The scope does not contain an exact match, but the ident did not specify a namespace, so we can match a namespace that contains a * wildcard. If there’s a single such namespace, the matched namespace is also updated to contain this new variable name.
- Otherwise, nothing is matched and an error is raised.
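The three-way lookup above can be modelled in Python. This is a simplified sketch, not the compiler’s data structures: a scope is assumed to be a dict mapping each namespace to its known variable names, with "*" marking a wildcard namespace.

```python
# Toy model of ident resolution. scope: {namespace: set of variable names};
# "*" marks a namespace that may contain yet-unknown columns.
def resolve(scope, namespace, name):
    # 1. Exact match on both namespace and variable name.
    if namespace:
        if name in scope.get(namespace, set()):
            return f"{namespace}.{name}"
    else:
        exact = [ns for ns, names in scope.items() if name in names]
        if len(exact) == 1:
            return f"{exact[0]}.{name}"
        # 2. No exact match and no namespace given: try wildcard namespaces.
        #    A unique wildcard match also learns the new variable name.
        wild = [ns for ns, names in scope.items() if "*" in names]
        if not exact and len(wild) == 1:
            scope[wild[0]].add(name)
            return f"{wild[0]}.{name}"
    # 3. Otherwise nothing matches: raise an error.
    raise KeyError(name)

scope = {"std": {"sum", "count"}, "employees": {"first_name", "*"}}
assert resolve(scope, "", "sum") == "std.sum"
assert resolve(scope, "", "salary") == "employees.salary"  # via wildcard
```

Note how the wildcard match mutates the scope: after resolving salary, the employees namespace now contains it explicitly.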
Translating to SQL
When translating into an SQL statement which references only one table, there is no need to prefix column names with the table name.
But when there are multiple tables and we don’t have complete knowledge of all table columns, a column without a prefix (e.g. first_name) may actually reside in multiple tables. Because of this, we have to use table prefixes for all column names.
PRQL
from employees
derive [first_name, dept_id]
join d=departments [==dept_id]
select [first_name, d.title]
SQL
SELECT
employees.first_name,
d.title
FROM
employees
JOIN departments AS d ON employees.dept_id = d.dept_id
As you can see, employees.first_name now needs a table prefix, to prevent conflicts with a potential column of the same name in the departments table. Similarly, d.title needs the table prefix.
Functions
Function call
The major distinction between PRQL and today’s conventional programming languages such as C or Python is the function call syntax. It consists of the function name followed by arguments separated by whitespace.
function_name arg1 arg2 arg3
If one of the arguments is also a function call, it must be encased in parentheses, so we know where the arguments of the inner function end and the arguments of the outer function start.
outer_func arg_1 (inner_func arg_a, arg_b) arg_2
Pipeline
There is an alternative way of calling functions: using a pipeline. Regardless of whether the pipeline is delimited by the pipe symbol | or a new line, the pipeline is equivalent to passing the result of each step as the last argument of the next function.
a | foo 3 | bar 'hello' 'world' | baz
… is equivalent to …
baz (bar 'hello' 'world' (foo 3 a))
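This desugaring rule can be sketched as a small Python function. The representation (strings for expressions, lists for call stages) is an illustrative simplification, not how the compiler works internally:

```python
# Sketch of the pipeline rule: each stage's result becomes the LAST
# argument of the next function call. Parentheses are added once the
# expression is no longer a bare identifier.
def desugar(value, stages):
    expr = value
    for func, *args in stages:
        last = f"({expr})" if " " in expr else expr
        expr = " ".join([func, *args, last])
    return expr

print(desugar("a", [["foo", "3"], ["bar", "'hello'", "'world'"], ["baz"]]))
# baz (bar 'hello' 'world' (foo 3 a))
```

The output matches the nested form shown above, which is exactly what the pipeline notation expands to.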
As you may have noticed, transforms are regular functions too!
PRQL
from employees
filter age > 50
sort name
SQL
SELECT
*
FROM
employees
WHERE
age > 50
ORDER BY
name
… is equivalent to …
PRQL
from employees | filter age > 50 | sort name
SQL
SELECT
*
FROM
employees
WHERE
age > 50
ORDER BY
name
… is equivalent to …
PRQL
filter age > 50 (from employees) | sort name
SQL
SELECT
*
FROM
employees
WHERE
age > 50
ORDER BY
name
… is equivalent to …
PRQL
sort name (filter age > 50 (from employees))
SQL
SELECT
*
FROM
employees
WHERE
age > 50
ORDER BY
name
As you can see, the first example, written with pipeline notation, is much easier to comprehend than the last one, written with regular function call notation. This is why pipelines are recommended for nested function calls that are 3 or more levels deep.
Currying and late binding
In PRQL, functions are first-class citizens. As cool as that sounds, we need simpler terms to explain it. In essence, it means that we can operate with functions as with any other value.
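The idea can be illustrated with a Python analogy (PRQL’s own syntax is not shown here; the names below are hypothetical): a first-class function can be stored in a variable, passed around, and partially applied to yield a new function.

```python
# Python analogy for first-class functions: a function is a value that
# can be stored, passed, and partially applied (currying-style).
from functools import partial

def add(a, b):
    return a + b

plus_one = partial(add, 1)   # a function value with `a` already bound
functions = [add, plus_one]  # functions stored like any other value

assert plus_one(41) == 42
assert functions[0](2, 3) == 5
```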
Grammars / syntax highlighting
PRQL contains multiple grammar definitions to enable tools to highlight PRQL code. These are all intended to provide as good an experience as the grammar supports. Please raise any shortcomings in a GitHub issue.
The definitions are somewhat scattered around the codebase; this page serves as an index.
- Lezer — used by CodeMirror editors. The PRQL file is at grammars/prql-lezer/README.md.
- Handlebars — currently duplicated:
  - The book: book/highlight-prql.js
  - The website (outside of the book & playground): website/themes/prql-theme/static/plugins/highlight/prql.js
- Textmate — used by the VS Code extension. It’s in the prql-vscode repo, in prql-vscode/syntaxes/prql.tmLanguage.json.
- Monarch — used by the Monaco editor, which we use for the Playground. The grammar is at playground/src/workbench/prql-syntax.js.
- Tree-Sitter — used by neovim and helix. The grammar can be found at https://github.com/PRQL/tree-sitter-prql.
Since the Elm language coincidentally provides syntax highlighting suitable for PRQL, it may look better to mark PRQL code as Elm when the above definition files are not available.
For example, the following Markdown code block will be nicely highlighted on GitHub, Pandoc, and other Markdown renderers:
```elm
from employees
filter start_date > @2021-01-01
```
We hope that in the future these renderers will recognize PRQL code blocks and have syntax highlighting applied, and we are tracking these with several issues.
- GitHub (Linguist): https://github.com/PRQL/prql/issues/1636
- Pandoc (Kate): https://github.com/PRQL/prql/issues/2213
PRQL Changelog
0.8.1 — 2023-04-29
0.8.1 is a small release with a new list-targets command in prqlc, some documentation improvements, and some internal improvements.
This release has 41 commits from 8 contributors.
From the broader perspective of the project, we’re increasing the relative prioritization of it being easy for folks to actually use PRQL — either with existing tools, or a tool we’d build. We’re thinking about & discussing the best way to do that over the next few weeks.
0.8.0 — 2023-04-14
0.8.0 renames the and & or operators to && & || respectively, reorganizes the Syntax section in the book, and introduces read_parquet & read_csv functions for reading files with DuckDB.
This release has 38 commits from 8 contributors. Selected changes:
Features:
- Rename and to && and or to ||. Operators which are symbols are now consistently infix, while “words” are now consistently functions (@aljazerzen, #2422).
- New functions read_parquet and read_csv, which mirror the DuckDB functions, instructing the database to read from files (@max-sixty, #2409).
0.7.1 — 2023-04-03
0.7.1 is a hotfix release to fix prql-js’s npm install behavior when being installed as a dependency.
This release has 17 commits from 4 contributors.
0.7.0 — 2023-04-01
0.7.0 is a fairly small release in terms of new features, with lots of internal improvements, such as integration tests with a whole range of DBs, a blog post on Pi day, RFCs for a type system, and more robust language bindings.
There’s a very small breaking change to the rust API, hence the minor version bump.
Here’s our April 2023 Update, from our Readme:
April 2023 update
PRQL is being actively developed by a growing community. It’s ready to use by the intrepid, either as part of one of our supported extensions, or within your own tools, using one of our supported language bindings.
PRQL still has some minor bugs and some missing features, and probably is only ready to be rolled out to non-technical teams for fairly simple queries.
Here’s our current Roadmap and our Milestones.
Our immediate focus for the code is on:
- Building out the next few big features, including types and modules.
- Ensuring our supported features feel extremely robust; resolving any priority bugs.
We’re also spending time thinking about:
- Making it really easy to start using PRQL. We’re doing that by building integrations with tools that folks already use; for example our VS Code extension & Jupyter integration. If there are tools you’re familiar with that you think would be open to integrating with PRQL, please let us know in an issue.
- Making it easier to contribute to the compiler. We have a wide group of contributors to the project, but contributions to the compiler itself are quite concentrated. We’re keen to expand this; #1840 for feedback.
The release has 131 commits from 10 contributors. Particular credit goes to @eitsupi & @jelenkee, who have made significant contributions, and @vanillajonathan, whose prolific contributions include our growing language bindings.
A small selection of the changes:
Features:
- prqlc compile adds --color & --include-signature-comment options. (@max-sixty, #2267)
Web:
- Added the PRQL snippets from the book to the Playground (@jelenkee, #2197)
Internal changes:
- Breaking: The compile function’s Options now includes a color member, which determines whether error messages use ANSI color codes. This is technically a breaking change to the API. (@max-sixty, #2251)
- The Error struct now exposes the MessageKind enum. (@vanillajonathan, #2307)
- Integration tests run in CI with DuckDB, SQLite, PostgreSQL, MySQL and SQL Server (@jelenkee, #2286)
New Contributors:
- @k-nut, with #2294
0.6.1 — 2023-03-12
0.6.1 is a small release containing an internal refactoring and improved bindings for C, PHP & .NET.
This release has 54 commits from 6 contributors. Selected changes:
Fixes:
- No longer incorrectly compile to DISTINCT when a take 1 refers to a different set of columns than are in the group. (@max-sixty, with thanks to @cottrell, #2109)
- The version specification of the dependency Chumsky was bumped from 0.9.0 to 0.9.2. 0.9.0 has a bug that causes an infinite loop. (@eitsupi, #2110)
Documentation:
- Add a policy for which bindings are supported / unsupported / nascent. See https://prql-lang.org/book/bindings/index.html for more details (@max-sixty, #2062)
Integrations:
- [prql-lib] Added C++ header file. (@vanillajonathan, #2126)
Internal changes:
- Many of the items that were in the root of the repo have been aggregated into web & bindings, simplifying the repo’s structure. There’s also grammars & packages (@max-sixty, #2135, #2117, #2121).
0.6.0 — 2023-03-08
0.6.0 introduces a rewritten parser, giving us the ability to dramatically improve error messages, renames switch to case, and includes lots of minor improvements and fixes. It also introduces loop, which compiles to WITH RECURSIVE, as a highly experimental feature.
There are a few cases of breaking changes, including switching switch to case, in case that’s confusing. There are also some minor parsing changes outlined below.
This release has 108 commits from 11 contributors. Selected changes:
Features:
- Add a (highly experimental) loop language feature, which translates to WITH RECURSIVE. We expect changes and refinements in upcoming releases. (#1642, @aljazerzen)
- Rename the experimental switch function to case given it more closely matches the traditional semantics of case. (@max-sixty, #2036)
- Change the case syntax to use => instead of -> to distinguish it from function syntax.
- Convert parser from pest to Chumsky (@aljazerzen, #1818)
  - Improved error messages, and the potential to make them even better in the future. Many of these improvements come from error recovery.
  - String escapes (\n \t).
  - Raw strings that don’t escape backslashes.
  - String interpolations can only contain identifiers and not any expression.
  - Operator associativity has been changed from right-to-left to left-to-right to be more similar to other conventional languages.
  - and now has a higher precedence than or (for the same reason as the previous point).
  - Dates, times and timestamps have stricter parsing rules.
  - let, func, prql, case are now treated as keywords.
  - Float literals without a fraction part are no longer allowed (1.).
- Add a --format option to prqlc parse which can return the AST in YAML (@max-sixty, #1962)
- Add a new subcommand prqlc jinja. (@aljazerzen, #1722)
- Breaking: prql-compiler no longer passes text containing {{ & }} through to the output. (@aljazerzen, #1722)
  For example, the following PRQL query

  from {{foo}}

  was compiled to the following SQL previously, but now it raises an error:

  SELECT * FROM {{ foo }}

  This pass-through feature existed for integration with dbt. We’re again considering how to best integrate with dbt, and this change is based on the idea that the jinja macro should run before the PRQL compiler. If you’re interested in dbt integration, subscribe or 👍 to https://github.com/dbt-labs/dbt-core/pull/5982.
- A new compile target "sql.any". When "sql.any" is used as the target of the compile function’s option, the target contained in the query header will be used. (@aljazerzen, #1995)
- Support for SQL parameters with similar syntax (#1957, @aljazerzen)
- Allow : to be elided in timezones, such as 0800 in @2020-01-01T13:19:55-0800 (@max-sixty, #1991).
- Add std.upper and std.lower functions for changing string casing (@Jelenkee, #2019).
Fixes:
- prqlc compile returns a non-zero exit code for invalid queries. (@max-sixty, #1924)
- Identifiers can contain any alphabetic unicode characters (@max-sixty, #2003)
Documentation:
- Operator precedence (@aljazerzen, #1818)
- Error messages for invalid queries are displayed in the book (@max-sixty, #2015)
Integrations:
- [prql-php] Added PHP bindings. (@vanillajonathan, #1860)
- [prql-dotnet] Added .NET bindings. (@vanillajonathan, #1917)
- [prql-lib] Added C header file. (@vanillajonathan, #1879)
- Added a workflow building a .deb on each release. (Note that it’s not yet published on each release.) (@vanillajonathan, #1883)
- Added a workflow building an .rpm on each release. (Note that it’s not yet published on each release.) (@vanillajonathan, #1918)
- Added a workflow building a Snap package on each release. (@vanillajonathan, #1881)
Internal changes:
- Test that the output of our nascent autoformatter can be successfully compiled into SQL. Failing examples are now clearly labeled. (@max-sixty, #2016)
- Definition files have been added to configure Dev Containers for Rust development environment. (@eitsupi, #1893, #2025, #2028)
New Contributors:
- @linux-china, with #1971
- @Jelenkee, with #2019
0.5.2 — 2023-02-18
0.5.2 is a tiny release to fix a build issue in yesterday’s prql-js 0.5.1 release.
This release has 7 commits from 2 contributors.
New Contributors:
- @matthias-Q, with #1873
0.5.1 — 2023-02-17
0.5.1 contains a few fixes, and another change to how bindings handle default target / dialects.
This release has 53 commits from 7 contributors. Selected changes:
Fixes:
- Delegate dividing literal integers to the DB. Previously integer division was executed during PRQL compilation, which could be confusing given that behavior is different across DBs. Other arithmetic operations are still executed during compilation. (@max-sixty, #1747)
Documentation:
- Add docs on the from_text transform (@max-sixty, #1756)
Integrations:
- [prql-js] Default compile target changed from Sql(Generic) to Sql(None). (@eitsupi, #1856)
- [prql-python] Compilation options can now be specified from Python. (@eitsupi, #1807)
- [prql-python] Default compile target changed from Sql(Generic) to Sql(None). (@eitsupi, #1861)
New Contributors:
- @vanillajonathan, with #1766
0.5.0 — 2023-02-08
0.5.0 contains a few fixes, some improvements to bindings, lots of docs improvements, and some work on forthcoming features. It contains one breaking change in the compiler’s Options interface.
This release has 74 commits from 12 contributors. Selected changes:
Features:
- Change public API to use target instead of dialect in preparation for feature work (@aljazerzen, #1684)
- prqlc watch command, which watches the filesystem for changes and compiles .prql files to .sql (@aljazerzen, #1708)
Fixes:
- Support double brackets in s-strings which aren’t symmetric (@max-sixty, #1650)
- Support Postgres’s Interval syntax (@max-sixty, #1649)
- Fixed tests for prql-elixir with MacOS (@kasvith, #1707)
Documentation:
- Add a documentation test for prql-compiler, update prql-compiler README, and include the README in the prql book section for Rust bindings. The code examples in the README are included and tested as doctests in the prql-compiler (@nkicg6, #1679)
Internal changes:
- Add tests for all PRQL website examples to prql-python to ensure compiled results match expected SQL (@nkicg6, #1719)
New Contributors:
- @ruslandoga, with #1628
- @RalfNorthman, with #1632
- @nicot, with #1662
0.4.2 — 2023-01-25
Features:
- New from_text format-arg string-arg function that supports JSON and CSV formats. format-arg can be format:csv or format:json. string-arg can be a string in any format. (@aljazerzen & @snth, #1514)

  from_text format:csv """
  a,b,c
  1,2,3
  4,5,6
  """

  from_text format:json '''
  [{"a": 1, "b": "x", "c": false }, {"a": 4, "b": "y", "c": null }]
  '''

  from_text format:json '''{
      "columns": ["a", "b", "c"],
      "data": [
          [1, "x", false],
          [4, "y", null]
      ]
  }'''

  For now, the argument is limited to string constants.
Fixes
- Export constructor for SQLCompileOptions (@bcho, #1621)
- Remove backticks in count_distinct (@aljazerzen, #1611)
New Contributors
- @1Kinoti, with #1596
- @veenaamb, with #1614
0.4.1 — 2023-01-18
0.4.1 comes a few days after 0.4.0, with a couple of features and the release of prqlc, the CLI crate.
0.4.1 has 35 commits from 6 contributors.
Features:
- Inferred column names include the relation name (@aljazerzen, #1550):

  from albums
  select title          # name used to be inferred as title only
  select albums.title   # so using albums was not possible here

- Quoted identifiers such as dir/*.parquet are passed through to SQL. (@max-sixty, #1516).
- The CLI is installed with cargo install prqlc. The binary was renamed in 0.4.0 but required an additional --features flag, which has been removed in favor of this new crate (@max-sixty & @aljazerzen, #1549).
New Contributors:
- @fool1280, with #1554
- @nkicg6, with #1567
0.4.0 — 2023-01-15
0.4.0 brings lots of new features including case, select ![] and numbers with underscores. We have initial (unpublished) bindings to Elixir. And there are the usual improvements to fixes & documentation (only a minority are listed below in this release).
0.4.0 also has some breaking changes: table is now let, dialect is renamed to target, and the compiler’s API has changed. Full details below.
Features:
- Defining a temporary table is now expressed as let rather than table (@aljazerzen, #1315). See the tables docs for details.
- Experimental: The case function sets a variable to a value based on one of several expressions (@aljazerzen, #1278).

  derive var = case [
    score <= 10 -> "low",
    score <= 30 -> "medium",
    score <= 70 -> "high",
    true -> "very high",
  ]

  …compiles to:

  SELECT
    *,
    CASE
      WHEN score <= 10 THEN 'low'
      WHEN score <= 30 THEN 'medium'
      WHEN score <= 70 THEN 'high'
      ELSE 'very high'
    END AS var
  FROM
    bar

  Check out the case docs for more details.
- Experimental: Columns can be excluded by name with select (@aljazerzen, #1329)

  from albums
  select ![title, composer]

- Experimental: append transform, equivalent to UNION ALL in SQL. (@aljazerzen, #894)

  from employees
  append managers

  Check out the append docs for more details.
- Numbers can contain underscores, which can make reading long numbers easier (@max-sixty, #1467):

  from numbers
  select [
    small = 1.000_000_1,
    big = 5_000_000,
  ]

- The SQL output contains a comment with the PRQL compiler version (@aljazerzen, #1322)
- dialect is renamed to target, and its values are prefixed with sql. (@max-sixty, #1388); for example:

  prql target:sql.bigquery  # previously was `dialect:bigquery`
  from employees

  This gives us the flexibility to target other languages than SQL in the long term.
- Table definitions can contain a bare s-string (@max-sixty, #1422), which enables us to include a full CTE of SQL, for example:

  let grouping = s"""
    SELECT SUM(a)
    FROM tbl
    GROUP BY
      GROUPING SETS ((b, c, d), (d), (b, d))
  """

- Ranges supplied to in can be half-open (@aljazerzen, #1330).
- The crate’s external API has changed to allow for compiling to intermediate representation. This also affects bindings. See prql_compiler docs for more details.
Fixes:
[This release, the changelog only contains a subset of fixes]
- Allow interpolations in table s-strings (@aljazerzen, #1337)
Documentation:
[This release, the changelog only contains a subset of documentation improvements]
- Add docs on aliases in Select
- Add JS template literal and multiline example (@BCsabaEngine, #1432)
- Improve prql-compiler docs & examples (@aljazerzen, #1515)
- Fix string highlighting in book (@max-sixty, #1264)
Web:
- The playground allows querying some sample data. As before, the result updates on every keystroke. (@aljazerzen, #1305)
Integrations:
[This release, the changelog only contains a subset of integration improvements]
- Added Elixir integration exposing PRQL functions as NIFs (#1500, @kasvith)
- Exposed Elixir flavor with exceptions (#1513, @kasvith)
- Rename prql-compiler binary to prqlc (@aljazerzen #1515)
Internal changes:
[This release, the changelog only contains a subset of internal changes]
- Add parsing for negative select (@max-sixty, #1317)
- Allow for additional builtin functions (@aljazerzen, #1325)
- Add an automated check for typos (@max-sixty, #1421)
- Add tasks for running playground & book (@max-sixty, #1265)
- Add tasks for running tests on every file change (@max-sixty, #1380)
New contributors:
- @EArazli, with #1359
- @boramalper, with #1362
- @allurefx, with #1377
- @bcho, with #1375
- @JettChenT, with #1385
- @BlurrechDev, with #1411
- @BCsabaEngine, with #1432
- @kasvith, with #1500
0.3.1 - 2022-12-03
0.3.1 brings a couple of small improvements and fixes.
Features:
- Support for using s-strings for from (#1197, @aljazerzen)

  from s"SELECT * FROM employees WHERE foo > 5"

- Helpful error message when referencing a table in an s-string (#1203, @aljazerzen)
Fixes:
- Multiple columns with same name created (#1211, @aljazerzen)
- Renaming via select breaks preceding sorting (#1204, @aljazerzen)
- Same column gets selected multiple times (#1186, @mklopets)
Internal:
- Update Github Actions and Workflows to current version numbers (and avoid using Node 12)
0.3.0 — 2022-11-29
🎉 0.3.0 is the biggest ever change in PRQL’s compiler, rewriting much of the internals: the compiler now has a semantic understanding of expressions, including resolving names & building a DAG of column lineage 🎉.
While the immediate changes to the language are modest — some long-running bugs are fixed — this unlocks the development of many of the project’s long-term priorities, such as type-checking & auto-complete. And it simplifies the building of our next language features, such as match-case expressions, unions & table expressions.
@aljazerzen has (mostly single-handedly) done this work over the past few months. The project owes him immense appreciation.
Breaking changes:
We’ve had to make some modest breaking changes for 0.3:
- Pipelines must start with from. For example, a pipeline with only derive foo = 5, with no from transform, is no longer valid. Depending on demand for this feature, it would be possible to add this back.
- Shared column names now require == in a join. The existing approach is ambiguous to the compiler — id in the following example could be a boolean column.

  from employees
  -join positions [id]
  +join positions [==id]

- Table references containing periods must be surrounded by backticks. For example, when referencing a schema name:

  -from public.sometable
  +from `public.sometable`
Features:
- Change self equality op to == (#1176, @aljazerzen)
- Add logging (@aljazerzen)
- Add clickhouse dialect (#1090, @max-sixty)
- Allow namespaces & tables to contain . (#1079, @aljazerzen)
Fixes:
- Deduplicate column appearing in SELECT multiple times (#1186, @aljazerzen)
- Fix uppercase table names (#1184, @aljazerzen)
- Omit table name when only one ident in SELECT (#1094, @aljazerzen)
Documentation:
- Add chapter on semantics’ internals (@aljazerzen, #1028)
- Add note about nesting variables in s-strings (@max-sixty, #1163)
Internal changes:
- Flatten group and window (#1120, @aljazerzen)
- Split ast into expr and stmt (@aljazerzen)
- Refactor associativity (#1156, @aljazerzen)
- Rename Ident constructor to from_name (#1084, @aljazerzen)
- Refactor rq folding (#1177, @aljazerzen)
- Add tests for reported bugs fixes in semantic (#1174, @aljazerzen)
- Bump duckdb from 0.5.0 to 0.6.0 (#1132)
- Bump once_cell from 1.15.0 to 1.16.0 (#1101)
- Bump pest from 2.4.0 to 2.5.0 (#1161)
- Bump pest_derive from 2.4.0 to 2.5.0 (#1179)
- Bump sqlparser from 0.25.0 to 0.27.0 (#1131)
- Bump trash from 2.1.5 to 3.0.0 (#1178)
0.2.11 — 2022-11-20
0.2.11 contains a few helpful fixes.
Work continues on our semantic
refactor — look out for 0.3.0 soon! Many thanks
to @aljazerzen for his continued contributions to this.
Note: 0.2.10 was skipped due to this maintainer’s inability to read his own docs on bumping versions…
Features:
- Detect when compiler version is behind query version (@MarinPostma, #1058)
- Add `__version__` to the prql-python package (@max-sixty, #1034)
Fixes:
- Fix nesting of expressions with equal binding strength and left associativity,
  such as `a - (b - c)` (@max-sixty, #1136)
- Retain floats without significant digits as floats (@max-sixty, #1141)
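As a sketch of what the associativity fix addresses (the table and column names here are hypothetical): with left-associative operators of equal binding strength, parentheses around the right-hand operand are semantically significant and must survive compilation.

```prql
from accounts
derive net = total - (fees - rebates)  # must not flatten to `total - fees - rebates`
```

Before the fix, the compiler could emit the subtraction without the inner parentheses, which changes the result whenever `rebates` is non-zero.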
Documentation:
- Add documentation of the `prqlr` bindings (@eitsupi, #1091)
- Add a ‘Why PRQL’ section to the website (@max-sixty, #1098)
- Add @snth to core-devs (@max-sixty, #1050)
Internal changes:
- Use workspace versioning (@max-sixty, #1065)
0.2.9 — 2022-10-14
0.2.9 is a small release containing a bug fix for empty strings.
Fixes:
- Fix parsing of empty strings (@aljazerzen, #1024)
0.2.8 — 2022-10-10
0.2.8 is another modest release with some fixes, doc improvements, bindings
improvements, and lots of internal changes. Note that one of the fixes causes
the behavior of `round` and `cast` to change slightly — though it’s handled as
a fix rather than a breaking change in semantic versioning.
Fixes:
- Change the order of the `round` & `cast` function parameters to have the
  column last; for example `round 2 foo_col` / `cast int foo`. This is
  consistent with other functions, and makes piping possible:

  ```prql
  derive [
    gross_salary = (salary + payroll_tax | as int),
    gross_salary_rounded = (gross_salary | round 0),
  ]
  ```
Documentation:
- Split `DEVELOPMENT.md` from `CONTRIBUTING.md` (@richb-hanover, #1010)
- Make s-strings more prominent in the website intro (@max-sixty, #982)
Web:
- Add GitHub star count to website (@max-sixty, #990)
Integrations:
- Expose a shortened error message, in particular for the VS Code extension (@aljazerzen, #1005)
Internal changes:
- Specify 1.60.0 as minimum Rust version (@max-sixty, #1011)
- Remove old `wee-alloc` code (@max-sixty, #1013)
- Upgrade clap to version 4 (@aj-bagwell, #1004)
- Improve book-building script in Taskfile (@max-sixty, #989)
- Publish website using an artifact rather than a long-lived branch (@max-sixty, #1009)
0.2.7 — 2022-09-17
0.2.7 is a fairly modest release, six weeks after 0.2.6. We have some more
significant features, including a `union` operator and an overhaul of our type
system, as open PRs which will follow in future releases.
We also have new features in the VS Code extension, courtesy of @jiripospisil, including a live output panel.
Fixes:
- `range_of_ranges` checks the `Range` end is smaller than its start (@shuozeli, #946)
Documentation:
- Improve various docs (@max-sixty, #974, #971, #972, #970, #925)
- Add reference to EdgeDB’s blog post in our FAQ (@max-sixty, #922)
- Fix typos (@kianmeng, #943)
Integrations:
- Add `prql-lib`, enabling language bindings with `go` (@sigxcpu76, #923)
- Fix line numbers in JS exceptions (@charlie-sanders, #929)
Internal changes:
- Lock the version of the rust-toolchain, with auto-updates (@max-sixty, #926, #927)
0.2.6 — 2022-08-05
Fixes:
- Adjust `fmt` to only escape names when needed (@aljazerzen, #907)
- Fix quoting of upper case `table` names (@max-sixty, #893)
- Fix scoping of identical column names from multiple tables (@max-sixty, #908)
- Fix parse error on newlines in a `table` (@sebastiantoh 🆕, #902)
Documentation:
- Add docs on Architecture (@aljazerzen, #904)
- Add Changelog (@max-sixty, #890 #891)
Internal changes:
- Start trial using Conventional Commits (@max-sixty, #889)
- Add crates.io release workflow, docs (@max-sixty, #887)
0.2.5 - 2022-07-29
0.2.5 is a very small release following 0.2.4 yesterday. It includes:
- Add the ability to represent single brackets in an s-string, with two brackets (#752, @max-sixty)
- Fix the “Copy to Clipboard” command in the Playground, for Firefox (#880, @mklopets)
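To illustrate the new s-string escape (the regex and the `REGEXP_CONTAINS` function here are illustrative, assuming a BigQuery-style dialect): a doubled bracket now passes a literal bracket through to the SQL instead of being treated as interpolation.

```prql
from employees
derive has_repeated_token = s"REGEXP_CONTAINS(title, '([a-z]+-){{2,}}')"
```

The `{{2,}}` arrives in the generated SQL as the regex quantifier `{2,}`, while a single-bracketed expression such as `{title}` would still be interpolated as a PRQL expression.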
0.2.4 - 2022-07-28
0.2.4 is a small release following 0.2.3 a few days ago. The 0.2.4 release includes:
- Enrich our CLI, adding commands to get different stages of the compilation process (@aljazerzen, #863)
- Fix multiple `take n` statements in a query, leading to duplicate proxy columns in generated SQL (@charlie-sanders)
- Fix BigQuery quoting of identifiers in `SELECT` statements (@max-sixty)
- Some internal changes — reorganize top-level functions (@aljazerzen), add a workflow to track our Rust compilation time (@max-sixty), simplify our simple prql-to-sql tests (@max-sixty)
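A sketch of the query shape the multiple-`take` fix covers (table and column names are hypothetical): two `take` transforms in one pipeline previously leaked duplicate proxy columns into the generated SQL.

```prql
from employees
sort [-salary]
take 100
take 10
```

Both limits now compile cleanly — the second `take` narrows the first — without repeating intermediate columns in the `SELECT` list.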
Thanks to @ankane, `prql-compiler` is now available from Homebrew core: `brew install prql-compiler`.[^1]

[^1]: We still need to update docs and add a release workflow for this: <https://github.com/PRQL/prql/issues/866>
0.2.3 - 2022-07-24
A couple of weeks since the 0.2.2 release: we’ve squashed a few bugs, added some mid-sized features to the language, and made a bunch of internal improvements.
The 0.2.3 release includes:
- Allow for escaping otherwise-invalid identifiers (@aljazerzen & @max-sixty)
- Fix a bug around operator precedence (@max-sixty)
- Add a section to the book on the language bindings (@charlie-sanders)
- Add tests for our `Display` representation while fixing some existing bugs. This is gradually becoming our code formatter (@arrizalamin)
- Add a “copy to clipboard” button in the Playground (@mklopets)
- Add lots of guidance to our `CONTRIBUTING.md` around our tests and process for merging (@max-sixty)
- Add a `prql!` macro for parsing a PRQL query at compile time (@aljazerzen)
- Add tests for `prql-js` (@charlie-sanders)
- Add a `from_json` method for transforming JSON to a PRQL string (@arrizalamin)
- Add a workflow to release `prql-java` to Maven (@doki23)
- Enable running all tests from a PR by adding a `pr-run-all-tests` label (@max-sixty)
- Have `cargo-release` bump all crate & npm versions (@max-sixty)
- Update `prql-js` to use the bundler build of `prql-js` (@mklopets)
As well as those contribution changes, thanks to those who’ve reported issues, such as @mklopets, @huw, @mm444 & @ajfriend.
From here, we’re planning to continue squashing bugs (albeit more minor than
those in this release), adding some features like `union`, while working on
bigger issues such as type inference.
We’re also going to document and modularize the compiler further. It’s important that we give more people an opportunity to contribute to the guts of PRQL, especially given the number and enthusiasm of contributions to the project in general — and it’s not that easy to do so at the moment. While this is ongoing, if anyone has something they’d like to work on in the more difficult parts of the compiler, let us know on GitHub or Discord, and we’d be happy to work together on it.
Thank you!
0.2.2 - 2022-07-10
We’re a couple of weeks since our 0.2.0 release. Thanks for the surge in interest and contributions! 0.2.2 has some fixes & some internal improvements:
- We now test against SQLite & DuckDB on every commit, to ensure we’re producing correct SQL. (@aljazerzen)
- We have the beginning of Java bindings! (@doki23)
- Idents surrounded by backticks are passed through to SQL (@max-sixty)
- More examples on the homepage, e.g. `join` & `window`, plus lots of small docs improvements
- Automated releases to Homebrew (@roG0d)
- `prql-js` is now a single package for Node, browsers & webpack (@charlie-sanders)
- Parsing has some fixes, including `>=` and leading underscores in idents (@mklopets)
- Ranges receive correct syntax highlighting (@max-sixty)
Thanks to Aljaž Mur Eržen (@aljazerzen), George Roldugin (@roldugin), Jasper McCulloch (@Jaspooky), Jie Han (@doki23), Marko Klopets (@mklopets), Maximilian Roos (@max-sixty), Rodrigo Garcia (@roG0d), Ryan Russell (@ryanrussell), Steven Maude (@StevenMaude), and Charlie Sanders (@charlie-sanders).
We’re planning to continue collecting bugs & feature requests from users, as well as working on some of the bigger features, like type-inference.
For those interested in joining, we also have a new Contributing page.
0.2.0 - 2022-06-27
🎉 🎉 After several months of building, PRQL is ready to use! 🎉 🎉
How we got here:
At the end of January, we published a proposal of a better language for data transformation: PRQL. The reception was better than I could have hoped for — we were no. 2 on Hacker News for a day, and gained 2.5K GitHub stars over the next few days.
But man cannot live on GitHub Stars alone — we had to do the work to build it. So over the next several months, during many evenings & weekends, a growing group of us gradually built the compiler, evolved the language, and wrote some integrations.
We want to double-down on the community and its roots in open source — it’s incredible that a few of us from all over the globe have collaborated on a project without ever having met. We decided early-on that PRQL would always be open-source and would never have a commercial product (despite lots of outside interest to fund a seed round!). Because languages are so deep in the stack, and the data stack has so many players, the best chance of building a great language is to build an open language.
We still have a long way to go. While PRQL is usable, it has lots of missing features, and an incredible amount of unfulfilled potential, including a language server, cohesion with databases, and type inference. Over the coming weeks, we’d like to grow the number of intrepid users experimenting with PRQL in their projects, prioritize features that will unblock them, and then start fulfilling PRQL’s potential by working through our roadmap.
The best way to experience PRQL is to try it. Check out our website and the Playground. Start using PRQL for your own projects in dbt, Jupyter notebooks and Prefect workflows.
Keep in touch with PRQL by following the project on Twitter, joining us on Discord, and starring the repo.
Contribute to the project — we’re a really friendly community, whether you’re a recent SQL user or an advanced Rust programmer. We need bug reports, documentation tweaks & feature requests — just as much as we need compiler improvements written in Rust.
I especially want to give Aljaž Mur Eržen (@aljazerzen) the credit he deserves, who has contributed the majority of the difficult work of building out the compiler. Much credit also goes to Charlie Sanders (@charlie-sanders), one of PRQL’s earliest supporters and the author of PyPrql, and Ryan Patterson-Cross (@rbpatt2019), who built the Jupyter integration among other Python contributions.
Other contributors who deserve a special mention include: @roG0d, @snth, @kwigley
Thank you, and we look forward to your feedback!