PRQL Language Book
Pipelined Relational Query Language, pronounced “Prequel”.
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it compiles to SQL.
This book serves as a tutorial and reference guide on the language and the broader project. It currently has three sections, navigated by links on the left:
- Tutorial — A friendly & accessible guide for learning PRQL. It has a gradual increase of difficulty and requires only basic understanding of programming languages. Knowledge of SQL is beneficial, because of many comparisons to SQL, but not required.
- Reference — In-depth information about the PRQL language. Includes justifications for language design decisions and formal specifications for parts of the language.
- Project — General information about the project, tooling and development.
Examples of PRQL with a comparison to the generated SQL. PRQL queries can be as simple as:
PRQL
from tracks
filter artist == "Bob Marley" # Each line transforms the previous result
aggregate { # `aggregate` reduces each column to a value
plays = sum plays,
longest = max length,
shortest = min length, # Trailing commas are allowed
}
SQL
SELECT
COALESCE(SUM(plays), 0) AS plays,
MAX(length) AS longest,
MIN(length) AS shortest
FROM
tracks
WHERE
artist = 'Bob Marley'
…and here’s a larger example:
PRQL
from employees
filter start_date > @2021-01-01 # Clear date syntax
derive { # `derive` adds columns / variables
gross_salary = salary + (tax ?? 0), # Terse coalesce
gross_cost = gross_salary + benefits, # Variables can use other variables
}
filter gross_cost > 0
group {title, country} ( # `group` runs a pipeline over each group
aggregate { # `aggregate` reduces each group to a value
average gross_salary,
sum_gross_cost = sum gross_cost, # `=` sets a column name
}
)
filter sum_gross_cost > 100_000 # `filter` replaces both of SQL's `WHERE` & `HAVING`
derive id = f"{title}_{country}" # F-strings like Python
derive country_code = s"LEFT(country, 2)" # S-strings permit SQL as an escape hatch
sort {sum_gross_cost, -country} # `-country` means descending order
take 1..20 # Range expressions (also valid as `take 20`)
SQL
WITH table_1 AS (
SELECT
title,
country,
salary + COALESCE(tax, 0) + benefits AS _expr_1,
salary + COALESCE(tax, 0) AS _expr_2
FROM
employees
WHERE
start_date > DATE '2021-01-01'
),
table_0 AS (
SELECT
title,
country,
AVG(_expr_2) AS _expr_0,
COALESCE(SUM(_expr_1), 0) AS sum_gross_cost
FROM
table_1
WHERE
_expr_1 > 0
GROUP BY
title,
country
)
SELECT
title,
country,
_expr_0,
sum_gross_cost,
CONCAT(title, '_', country) AS id,
LEFT(country, 2) AS country_code
FROM
table_0
WHERE
sum_gross_cost > 100000
ORDER BY
sum_gross_cost,
country DESC
LIMIT
20
Relations
PRQL is designed on top of relational algebra, which is the established data
model used by modern SQL databases. A relation has a rigid mathematical
definition, which can be simplified to “a table of data”. For example, the
invoices
table from the Chinook database
(https://github.com/lerocha/chinook-database)
looks like this:
invoice_id | customer_id | billing_city | other columns | total |
---|---|---|---|---|
1 | 2 | Stuttgart | … | 1.98 |
2 | 4 | Oslo | … | 3.96 |
3 | 8 | Brussels | … | 5.94 |
4 | 14 | Edmonton | … | 8.91 |
5 | 23 | Boston | … | 13.86 |
6 | 37 | Frankfurt | … | 0.99 |
A relation is composed of rows. Each row in a relation contains a value for each
of the relation’s columns. Each column in a relation has an unique name and a
designated data type. The table above is a relation, and has columns named
invoice_id
and customer_id
each with a data type of “integer number”, a
billing_city
column with a data type of “text”, several other columns, and a
total
column that contains floating-point numbers.
Queries
The main purpose of PRQL is to build queries that combine and transform data
from relations such as the invoices
table above. Here is the most basic query:
from invoices
Try each of these examples here in the Playground. Enter the query on the left-hand side, and click output.arrow in the right-hand side to see the result.
The result of the query above is not terribly interesting, it’s just the same relation as before.
select
transform
The select
function picks the columns to pass through based on a list and
discards all others. Formally, that list is a tuple of comma-separated
expressions wrapped in { ... }
.
Suppose we only need the order_id
and total
columns. Use select
to choose
the columns to pass through. (Try it in the
Playground.)
from invoices
select { order_id, total }
We can write the items in the tuple on one or several lines: trailing commas are ignored. In addition, we can assign any of the expressions to a variable that becomes the name of the resulting column in the SQL output.
from invoices
select {
OrderID = invoice_id,
Total = total,
}
This is the same query as above, rewritten on multiple lines, and assigning
OrderID
and Total
names to the columns.
Once we select
certain columns, subsequent transforms will have access only to
those columns named in the tuple.
derive
transform
To add columns to a relation, we can use derive
function. Let’s define a new
column for Value Added Tax, set at 19% of the invoice total.
from invoices
derive { VAT = total * 0.19 }
The value of the new column can be a constant (such as a number or a string), or
can be computed from the value of an existing column. Note that the value of the
new column is assigned the name VAT
.
join
transform
The join
transform also adds columns to the relation by combining the rows
from two relations “side by side”. To determine which rows from each relation
should be joined, join
has match criteria, written in ( ... )
.
from invoices
join customers ( ==customer_id )
This example “connects” the customer information from the customers
relation
with the information from the invoices
relation, using identical values of the
customer_id
column from each relation to match the rows.
It is frequently useful to assign an alias to both relations being joined together so that each relation’s columns can be referred to uniquely.
from inv=invoices
join cust=customers ( ==customer_id )
In the example above, the alias inv
represents the invoices
relation and
cust
represents the customers
relation. It then becomes possible to refer to
inv.billing_city
and cust.last_name
unambiguously.
Summary
PRQL manipulates relations (tables) of data. The derive
, select
, and join
transforms change the number of columns in a table. The first two never affect
the number of rows in a table. join
may change the number of rows, depending
on the variation chosen.
This final example combines the above into a single query. It illustrates a pipeline - the fundamental basis of PRQL. We simply add new lines (transforms) at the end of the query. Each transform modifies the relation produced by the statement above to produce the desired result.
from inv=invoices
join cust=customers (==customer_id)
derive { VAT = inv.total * 0.19 }
select {
OrderID = inv.invoice_id,
CustomerName = cust.last_name,
Total = inv.total,
VAT,
}
Filtering rows
In the previous page we learned how select
, derive
, and join
change the
columns of a table.
Now we will explore how to manipulate the rows of a table using filter
and
take
.
filter
transform
The filter
transform picks rows to pass through based on their values:
from invoices
filter billing_city == "Berlin"
The resulting table contains all the rows that came from Berlin.
PRQL converts the single filter
transform to use the appropriate SQL WHERE
or HAVING
command, depending on where it appears in the pipeline.
take
transform
The take
transform picks rows to pass through based on their position within
the table. The set of rows picked can be specified in two ways:
- a plain number
x
, which will pick the firstx
rows, or - an inclusive range of rows
start..end
.
from invoices
take 4
from invoices
take 4..7
Of course, it is possible combine all these transforms into a single pipeline:
from invoices
# retain only rows for orders from Berlin
filter billing_city == "Berlin"
# skip first 10 rows and take the next 10
take 11..20
# take only first 3 rows of *that* result
take 3
We did something a bit odd at the end: first we took rows 11..20
and then took
the first 3 rows from that result.
Note that a single
transform take 11..13
would have produced the same SQL. The example
serves an example of how PRQL allows fast data exploration by
“stacking” transforms in the pipeline, reducing the cognitive burden of how
a new transform with the previous query.
Aggregation
A key feature of analytics is reducing many values down to some summary. This
act is called “aggregation” and always includes a function — for example,
average
or sum
— that reduces values in the table to a single row.
aggregate
transform
The aggregate
transform takes a tuple to create one or more new columns that
“distill down” data from all the rows.
from invoices
aggregate { sum_of_orders = sum total }
The query above computes the sum of the total
column of all rows of the
invoices
table to produce a single value.
aggregate
can produce multiple summaries at once when one or more aggregation
expressions are contained in a tuple. aggregate
discards all columns that are
not present in the tuple.
from invoices
aggregate {
num_orders = count this,
sum_of_orders = sum total,
}
In the example above, the result is a single row with two columns. The count
function displays the number of rows in the table that was passed in; the sum
function adds up the values of the total
column of all rows.
Grouping
Suppose we want to produce summaries of invoices for each city in the table. We could create a query for each city, and aggregate its rows:
from albums
filter billing_city == "Oslo"
aggregate { sum_of_orders = sum total }
But we would need to do it for each city: London
, Frankfurt
, etc. Of course
this is repetitive (and boring) and error prone (because we would need to type
each billing_city
by hand). Moreover, we would need to create a list of each
billing_city
before we started.
group
transform
The group
transform separates the table into groups (say, those having the
same city) using information that’s already in the table. It then applies a
transform to each group, and combines the results back together:
from invoices
group billing_city (
aggregate {
num_orders = count this,
sum_of_orders = sum total,
}
)
Those familiar with SQL have probably noticed that we just decoupled aggregation from grouping.
Although these operations are connected in SQL, PRQL makes it straightforward to
use group
and aggregate
separate from each other, while combining with other
transform functions, such as:
from invoices
group billing_city (
take 2
)
This code collects the first two rows for each city’s group
.
Syntax
A summary of PRQL syntax:
Syntax | Usage | Example |
---|---|---|
| | Pipelines | from employees | select first_name |
= | Assigns | from e = employees derive total = (sum salary) |
: | Named args & parameters | interp low:0 1600 sat_score |
{} | Tuples | {id, false, total = 3} |
[] | Arrays | [1, 4, 3, 4] |
+ ,! ,&& ,== , etc | Operators | filter a == b + c || d >= e |
() | Parentheses | derive celsius = (fht - 32) / 1.8 |
\ | Line wrap | 1 + 2 + 3 + \ 4 + 5 |
1 ,100_000 ,5e10 | Numbers | derive { huge = 5e10 * 10_000 } |
'' ,"" | Strings | derive name = 'Mary' |
true ,false | Booleans | derive { Col1 = true } |
null | Null | filter ( name != null ) |
@ | Dates & times | @2021-01-01 |
` ` | Quoted identifiers | select `first name` |
# | Comments | # A comment |
== | Self-equality in join | join s=salaries (==id) |
-> | Function definitions | let add = a b -> a + b |
=> | Case statement | case [a==1 => c, a==2 => d] |
+ ,- | Sort order | sort {-amount, +date} |
?? | Coalesce | amount ?? 0 |
Literals
A literal is a constant value expression, with special syntax rules for each data type.
Numbers
Number literals can contain number characters as well as a period, underscores
and char e
.
If a number literal contains a dot or character e
, it is treated as floating
point number (or just float), otherwise it is treated as integer number.
Character e
denotes
“scientific notation”,
where the number after e
is the exponent in 10-base.
Underscores are ignored, so they can be placed at arbitrary positions, but it is advised to use them as thousand separators.
Integers can, alternatively, be expressed using hexadecimal, octal or binary
notation using these prefixes respectively: 0x
, 0o
or 0b
.
PRQL
from numbers
select {
small = 1.000_000_1,
big = 5_000_000,
huge = 5e9,
binary = 0x0011,
hex = 0x80,
octal = 0o777,
}
SQL
SELECT
1.0000001 AS small,
5000000 AS big,
5000000000.0 AS huge,
17 AS binary,
128 AS hex,
511 AS octal
FROM
numbers
Strings
PRQL supports string literals and several other formats of strings. See the Strings page for more information.
Booleans
Boolean values can be expressed with true
or false
keyword.
Null
The null value can be expressed with null
keyword. See also the discussion of
how PRQL handles nulls.
Date and time
Date and time literals are expressed with character @
, followed by a string
that encodes the date & time.
PRQL’s notation is designed to be less verbose than SQL’s
TIMESTAMP '2004-10-19 10:23:54'
and more explicit than SQL’s implicit option
that just uses a string '2004-10-19 10:23:54'
.
Dates
Dates are represented by @{yyyy-mm-dd}
— a @
followed by the date format.
PRQL
from employees
derive age_at_year_end = (@2022-12-31 - dob)
SQL
SELECT
*,
DATE '2022-12-31' - dob AS age_at_year_end
FROM
employees
Times
Times are represented by @{HH:mm:ss.SSS±Z}
with any parts not supplied
defaulting to zero. This includes the timezone, which is represented by
+HH:mm
, -HH:mm
or Z
. This is consistent with the ISO8601 time format.
PRQL
from orders
derive should_have_shipped_today = (order_time < @08:30)
SQL
SELECT
*,
order_time < TIME '08:30' AS should_have_shipped_today
FROM
orders
Timestamps
Timestamps are represented by @{yyyy-mm-ddTHH:mm:ss.SSS±Z}
/ @{date}T{time}
,
with any time parts not supplied being rounded to zero, including the timezone,
which is represented by +HH:mm
, -HH:mm
or Z
(:
is optional). This is @
followed by the ISO8601 datetime format, which uses T
to separate date & time.
PRQL
from commits
derive first_prql_commit = @2020-01-01T13:19:55-08:00
derive first_prql_commit_utc = @2020-01-02T21:19:55Z
SQL
SELECT
*,
TIMESTAMP '2020-01-01T13:19:55-0800' AS first_prql_commit,
TIMESTAMP '2020-01-02T21:19:55Z' AS first_prql_commit_utc
FROM
commits
Durations
Durations are represented by {N}{periods}
, such as 2years
or 10minutes
,
without a space.
These aren’t the same as ISO8601, because we evaluated P3Y6M4DT12H30M5S
to
be difficult to understand, but we could support a simplified form if there’s
demand for it. We don’t currently support compound expressions, for example
2years10months
, but most DBs will allow 2years + 10months
. Please raise an
issue if this is inconvenient.
PRQL
from projects
derive first_check_in = start + 10days
SQL
SELECT
*,
"start" + INTERVAL 10 DAY AS first_check_in
FROM
projects
Examples
Here’s a larger list of date and time examples:
@20221231
is invalid — it must contain full punctuation (-
and:
),@2022-12-31
is a date@2022-12
or@2022
are invalid — SQL can’t express a month, only a date@16:54:32.123456
is a time@16:54:32
,@16:54
,@16
are all allowed, expressing@16:54:32.000000
,@16:54:00.000000
,@16:00:00.000000
respectively@2022-12-31T16:54:32.123456
is a timestamp without timezone@2022-12-31T16:54:32.123456Z
is a timestamp in UTC@2022-12-31T16:54+02
is timestamp in UTC+2@2022-12-31T16:54+02:00
and@2022-12-31T16:54+02
are datetimes in UTC+2@16:54+02
is invalid — time is always local, so it cannot have a timezone@2022-12-31+02
is invalid — date is always local, so it cannot have a timezone
Currently prqlc does not parse or validate any of the datetime strings and will pass them to the database engine without adjustment. This might be refined in the future to aid in compatibility across databases. We’ll always support the canonical ISO8601 format described above.
Roadmap
Datetimes (as a distinct datatype from the timestamps) are supported by some databases (e.g. MySql, BigQuery). With the addition of type casts, these could be represented by a timestamp cast to a datetime:
derive pi_day = @2017-03-14T15:09:26.535898<datetime>
These are some examples we can then add:
@2022-12-31T16:54<datetime>
is datetime without timezone@2022-12-31<datetime>
is forbidden — datetime must specify time@16:54<datetime>
is forbidden — datetime must specify date
Strings
String literals can use any matching odd number of either single or double quotes:
PRQL
from artists
derive {
single = 'hello world',
double = "hello world",
double_triple = """hello world""",
}
SQL
SELECT
*,
'hello world' AS single,
'hello world' AS double,
'hello world' AS double_triple
FROM
artists
Quoting and escape characters
To quote a string containing quote characters, use the “other” type of quote, or
use the escape character \
, or use more quotes.
PRQL
from artists
select {
other = '"hello world"',
escaped = "\"hello world\"",
triple = """I said "hello world"!""",
}
SQL
SELECT
'"hello world"' AS other,
'"hello world"' AS escaped,
'I said "hello world"!' AS triple
FROM
artists
Strings can contain any escape character sequences defined by the JSON standard.
PRQL
from artists
derive escapes = "\tXYZ\n \\ " # tab (\t), "XYZ", newline (\n), " ", \, " "
derive world = "\u{0048}\u{0065}\u{006C}\u{006C}\u{006F}" # "Hello"
derive hex = "\x48\x65\x6C\x6C\x6F" # "Hello"
derive turtle = "\u{01F422}" # "🐢"
SQL
SELECT
*,
' XYZ
\ ' AS escapes,
'Hello' AS world,
'Hello' AS hex,
'🐢' AS turtle
FROM
artists
Other string formats
- F-strings - Build up a new string from a set of columns or values.
- R-strings - Include the raw characters of the string without any form of escaping.
- S-strings - Insert SQL statements directly into the query. Use when PRQL doesn’t have an equivalent facility.
Currently PRQL allows multiline strings with either a single character or multiple character quotes. This may change for strings using a single character quote in future versions.
These escape rules specify how PRQL interprets escape characters when compiling strings to SQL, not necessarily how the database will interpret the string. Dialects interpret escape characters differently, and PRQL doesn’t currently account for these differences. Please open issues with any difficulties in the current implementation.
Escape sequences
Unless an r
prefix is present, escape sequences in string literals are
interpreted according to rules similar to those used by Standard C. The
recognized escape sequences are:
Escape Sequence | Meaning |
---|---|
\\ | Backslash () |
\' | Single quote (’) |
\" | Double quote (“) |
\b | Backspace |
\f | Formfeed |
\n | ASCII Linefeed (LF) |
\r | ASCII Carriage Return (CR) |
\t | ASCII Horizontal Tab (TAB) |
\xhh | Character with hex value hh |
\u{xxxx} | Character with hex value xxxx |
F-strings
F-strings are a readable approach to building new strings from existing strings & variables.
PRQL
from employees
select full_name = f"{first_name} {last_name}"
SQL
SELECT
CONCAT(first_name, ' ', last_name) AS full_name
FROM
employees
This can be much easier to read for longer strings, relative to the SQL approach:
PRQL
from web
select url = f"http{tls}://www.{domain}.{tld}/{page}"
SQL
SELECT
CONCAT(
'http',
tls,
'://www.',
domain,
'.',
tld,
'/',
page
) AS url
FROM
web
Note that currently interpolations can only contain plain variable names and not whole expressions like Python, so this won’t work:
PRQL
from tracks
select length_str = f"{length_seconds / 60} minutes"
Error
Error:
╭─[:2:38]
│
2 │ select length_str = f"{length_seconds / 60} minutes"
│ ┬
│ ╰── interpolated string expected one of ".", ":" or "}", but found " "
───╯
Roadmap
In the future, f-strings may incorporate string formatting such as datetimes, numbers, and padding. If there’s a feature that would be helpful, please post an issue.
R-strings
R-strings handle escape characters without special treatment:
PRQL
from artists
derive normal_string = "\\\t" # two characters - \ and tab (\t)
derive raw_string = r"\\\t" # four characters - \, \, \, and t
SQL
SELECT
*,
'\ ' AS normal_string,
'\\\t' AS raw_string
FROM
artists
S-strings
An s-string inserts SQL directly, as an escape hatch when there’s something that
PRQL doesn’t yet implement. For example, there’s a version()
function in
PostgreSQL that returns the PostgreSQL version, so if we want to use that, we
use an s-string:
PRQL
from my_table
select db_version = s"version()"
SQL
SELECT
version() AS db_version
FROM
my_table
Embed a column name in an s-string using braces. For example, PRQL’s standard
library defines the average
function as:
let average = column -> s"AVG({column})"
So this compiles using the function:
Because S-string contents are SQL, double-quotes ("
) will denote a column name.
To avoid that, use single-quotes ('
) around the SQL string, and
adjust the quotes of the S-string. For example, instead of s'CONCAT("hello", "world")'
use s"CONCAT('hello', 'world')"
Here’s an example of a more involved use of an s-string:
PRQL
from de=dept_emp
join s=salaries side:left (s.emp_no == de.emp_no && s"""
({s.from_date}, {s.to_date})
OVERLAPS
({de.from_date}, {de.to_date})
""")
SQL
SELECT
de.*,
s.*
FROM
dept_emp AS de
LEFT JOIN salaries AS s ON s.emp_no = de.emp_no
AND (s.from_date, s.to_date) OVERLAPS (de.from_date, de.to_date)
For those who have used Python, s-strings are similar to Python’s f-strings, but
the result is SQL code, rather than a string literal. For example, a Python
f-string of f"average({col})"
would produce "average(salary)"
, with quotes;
while in PRQL, s"average({col})"
produces average(salary)
, without quotes.
Note that interpolations can only contain plain variable names and not whole expression like Python.
We can also use s-strings to produce a full table:
PRQL
from s"SELECT DISTINCT ON first_name, id, age FROM employees ORDER BY age ASC"
join s = s"SELECT * FROM salaries" (==id)
SQL
WITH table_0 AS (
SELECT
DISTINCT ON first_name,
id,
age
FROM
employees
ORDER BY
age ASC
),
table_1 AS (
SELECT
*
FROM
salaries
)
SELECT
table_0.*,
table_1.*
FROM
table_0
JOIN table_1 ON table_0.id = table_1.id
S-strings in user code are intended as an escape hatch for an unimplemented feature. If we often need s-strings to express something, that’s a sign we should implement it in PRQL or PRQL’s stdlib. If you often require an s-string, submit an issue with your use case.
Braces
To output braces from an s-string, use double braces:
PRQL
from employees
derive {
has_valid_title = s"regexp_contains(title, '([a-z0-9]*-){{2,}}')"
}
SQL
SELECT
*,
regexp_contains(title, '([a-z0-9]*-){2,}') AS has_valid_title
FROM
employees
Precedence within s-strings
Variables in s-strings are inserted into the SQL source as-is, which means we may get surprising behavior when the variable has multiple terms and the s-string isn’t parenthesized.
In this toy example, the expression salary + benefits / 365
gets precedence
wrong. The generated SQL code is as if we had written
salary + (benefits / 365)
.
PRQL
from employees
derive {
gross_salary = salary + benefits,
daily_rate = s"{gross_salary} / 365"
}
SQL
SELECT
*,
salary + benefits AS gross_salary,
salary + benefits / 365 AS daily_rate
FROM
employees
Instead, the numerator {gross_salary}
must be encased in parentheses:
PRQL
from employees
derive {
gross_salary = salary + benefits,
daily_rate = s"({gross_salary}) / 365"
}
SQL
SELECT
*,
salary + benefits AS gross_salary,
(salary + benefits) / 365 AS daily_rate
FROM
employees
Tuples
Tuple is a container type, composed of multiple fields. Each field can have a different type. Number of fields and their types must be known at compile time.
Tuple is represented by {}
. It can span multiple lines. Fields can be assigned
a name. Fields are separated by commas, trailing trailing comma is optional.
let var1 = {x = 1, y = 2}
let var2 = { # Span multiple lines
a = x,
b = y # Optional trailing comma
}
let var3 = {
c, # Individual item
d = b, # Assignment
}
Tuples are the type of a table row, which means that they are expected by many transforms. Most transforms can also take a single field, which will be converted into a tuple. These are equivalent:
Prior to 0.9.0
, tuples were previously named Lists, and represented with
[]
syntax. There may still be references to the old naming.
Arrays
Array is a container type, composed of multiple items. All items must be of the same type. Number of fields can be vary.
Identifiers & keywords
Identifiers can contain alphanumeric characters and _
and must not start with
a number. They can be chained together with the .
lookup operator, used to
retrieve a tuple from a field or a variable from a module.
hello
_h3llo
hello.world
this
& that
this
refers to the current relation:
Within a join
, that
refers to the other
table:
PRQL
from invoices
join tracks (this.track_id==that.id)
SQL
SELECT
invoices.*,
tracks.*
FROM
invoices
JOIN tracks ON invoices.track_id = tracks.id
this
can also be used to remove any column ambiguity. For example, currently
using a bare time
as a column name will fail, because it’s also a type:
PRQL
from invoices
derive t = time
Error
Error:
╭─[:2:12]
│
2 │ derive t = time
│ ──┬─
│ ╰─── expected a value, but found a type
───╯
But with this.time
, we can remove the ambiguity:
Quoting
To use characters that would be otherwise invalid, identifiers can be surrounded by with backticks.
When compiling to SQL, these identifiers will use dialect-specific quotes and quoting rules.
PRQL
prql target:sql.mysql
from employees
select `first name`
SQL
SELECT
`first name`
FROM
employees
PRQL
prql target:sql.postgres
from employees
select `first name`
SQL
SELECT
"first name"
FROM
employees
PRQL
prql target:sql.bigquery
from `project-foo.dataset.table`
join `project-bar.dataset.table` (==col_bax)
SQL
SELECT
`project-foo.dataset.table`.*,
`project-bar.dataset.table`.*
FROM
`project-foo.dataset.table`
JOIN `project-bar.dataset.table` ON `project-foo.dataset.table`.col_bax = `project-bar.dataset.table`.col_bax
Schemas & database names
Identifiers of database tables can be prefixed with schema and databases names.
Note that all of following identifiers will be treated as separate table
definitions: tracks
, public.tracks
, my_database.public.tracks
.
Keywords
PRQL uses following keywords:
prql
- query header more…let
- variable definition more…into
- variable definition more…case
- flow control more…type
- type declarationfunc
- explicit function declaration more…module
- used internallyinternal
- used internallytrue
- boolean more…false
- boolean more…null
- NULL more…
Keywords can be used as identifiers (of columns or variables) when encased in
backticks: `case`
.
Transforms are normal functions within the std
namespace, not keywords. That
is, std.from
is the same function as from
. In the example below, the
resulting query is the same as without the std.
namespace:
PRQL
std.from my_table
std.select {from = my_table.a, take = my_table.b}
std.take 3
SQL
SELECT
a AS "from",
b AS take
FROM
my_table
LIMIT
3
Function calls
Simple
A distinction between PRQL and most other programming languages is the function call syntax. It consists of the function name followed by arguments separated by whitespace.
function_name arg1 arg2 arg3
If one of the arguments is also a function call, it must be encased in parentheses, so we know where arguments of inner function end and the arguments of outer function start.
outer_func arg_1 (inner_func arg_a, arg_b) arg_2
The function name must refer to a function variable, which has either been declared in the standard library or some other module.
Function calls can also specify named parameters using :
notation:
function_name arg1 named_param:arg2 arg3
Pipeline
There is a alternative way of calling functions: using a pipeline. Regardless of
whether the pipeline is delimited by pipe symbol |
or a new line, the pipeline
is equivalent to applying each of functions as the last argument of the next
function.
a | foo 3 | bar 'hello' 'world' | baz
… is equivalent to …
baz (bar 'hello' 'world' (foo 3 a))
Pipes
Pipes are the connection between transforms that make
up a pipeline. The relation produced by a transform before the pipe is used as
the input for the transform following the pipe. A pipe can be represented with
either a line break or a pipe character (|
).
For example, here the filter
transform operates on the result of
from employees
(which is just the employees
table), and the select
transform operates on the result of the filter
transform.
PRQL
from employees
filter department == "Product"
select {first_name, last_name}
SQL
SELECT
first_name,
last_name
FROM
employees
WHERE
department = 'Product'
In the place of a line break, it’s also possible to use the |
character to
pipe results between transforms, such that this is equivalent:
PRQL
from employees | filter department == "Product" | select {first_name, last_name}
SQL
SELECT
first_name,
last_name
FROM
employees
WHERE
department = 'Product'
“C’est ne pas un pipe”
In almost all situations, a line break acts as a pipe. But there are a few cases where a line break doesn’t act as a pipe.
- before or after tuple items
- before or after list items
- before a new statement, which starts with
let
orfrom
(orfunc
) - within a line wrap
For example:
PRQL
[
{a=2} # No pipe from line break before & after this list item
]
derive {
c = 2 * a, # No pipe from line break before & after this tuple item
}
SQL
WITH table_0 AS (
SELECT
2 AS a
)
SELECT
a,
2 * a AS c
FROM
table_0
PRQL
let b =
\ 3 # No pipe from line break within this line wrap
# No pipe from line break before this `from` statement
from y
derive a = b
SQL
SELECT
*,
3 AS a
FROM
y
Inner Transforms
Parentheses are also used for transforms (such as group
and window
) that
pass their result to an “inner transform”. The example below applies the
aggregate
pipeline to each group of unique title
and country
values:
PRQL
from employees
group {title, country} (
aggregate {
average salary,
ct = count salary,
}
)
SQL
SELECT
title,
country,
AVG(salary),
COUNT(*) AS ct
FROM
employees
GROUP BY
title,
country
Operators
Expressions can be composed from function calls and operations, such as
2 + 3
or ((1 + x) * -y)
. In the example below, note the use of expressions
to calculate the alias circumference
and in the filter
transform.
PRQL
from foo
select {
circumference = diameter * 3.14159,
area = (diameter / 2) ** 2,
color,
}
filter circumference > 10 && color != "red"
SQL
WITH table_0 AS (
SELECT
diameter * 3.14159 AS circumference,
POW(diameter / 2, 2) AS area,
color
FROM
foo
)
SELECT
circumference,
area,
color
FROM
table_0
WHERE
circumference > 10
AND color <> 'red'
Operator precedence
This table shows operator precedence. Use parentheses ()
to prioritize
operations and for function calls (see the discussion below.)
Group | Operators | Precedence | Associativity |
---|---|---|---|
parentheses | () | 0 | see below |
identifier dot | . | 1 | |
unary | - + ! == | 2 | |
range | .. | 3 | |
pow | ** | 4 | right-to-left |
mul | * / // % | 5 | left-to-right |
add | + - | 6 | left-to-right |
compare | == != <= >= < > | 7 | left-to-right |
coalesce | ?? | 8 | left-to-right |
and | && | 9 | left-to-right |
or | || | 10 | left-to-right |
function call | 11 |
Division and integer division
The /
operator performs division that always returns a float value, while the
//
operator does integer division (truncated division) that always returns an
integer value.
PRQL
prql target:sql.sqlite
from [
{a = 5, b = 2},
{a = 5, b = -2},
]
select {
div_out = a / b,
int_div_out = a // b,
}
SQL
WITH table_0 AS (
SELECT
5 AS a,
2 AS b
UNION
ALL
SELECT
5 AS a,
-2 AS b
)
SELECT
(a * 1.0 / b) AS div_out,
ROUND(ABS(a / b) - 0.5) * SIGN(a) * SIGN(b) AS int_div_out
FROM
table_0
Coalesce
We can coalesce values with an ??
operator. Coalescing takes either the first
value or, if that value is null, the second value.
Regex expressions
To perform a case-sensitive regex search, use the ~=
operator. This generally
compiles to REGEXP
, though differs by dialect. A regex search means that to
match an exact value, the start and end need to be anchored with ^foo$
.
PRQL
prql target:sql.duckdb
from artists
filter (name ~= "Love.*You")
SQL
SELECT
*
FROM
artists
WHERE
REGEXP_MATCHES(name, 'Love.*You')
PRQL
prql target:sql.bigquery
from tracks
filter (name ~= "\\bLove\\b")
SQL
SELECT
*
FROM
tracks
WHERE
REGEXP_CONTAINS(name, '\bLove\b')
PRQL
prql target:sql.postgres
from tracks
filter (name ~= "\\(I Can't Help\\) Falling")
SQL
SELECT
*
FROM
tracks
WHERE
name ~ '\(I Can''t Help\) Falling'
PRQL
prql target:sql.mysql
from tracks
filter (name ~= "With You")
SQL
SELECT
*
FROM
tracks
WHERE
REGEXP_LIKE(name, 'With You', 'c')
PRQL
prql target:sql.sqlite
from tracks
filter (name ~= "But Why Isn't Your Syntax More Similar\\?")
SQL
SELECT
*
FROM
tracks
WHERE
name REGEXP 'But Why Isn''t Your Syntax More Similar\?'
Parentheses
PRQL uses parentheses ()
for several purposes:
-
Parentheses group operands to control the order of evaluation, for example:
((1 + x) * y)
-
Parentheses delimit a minus sign of a function argument, for example:
add (-1) (-3)
-
Parentheses delimit nested function calls that contain a pipe, either the
|
symbol or a new line. “Nested” means within a transform; i.e. not just the main pipeline, for example:(column-name | in 0..20)
-
Parentheses wrap a function call that is part of a larger expression, for example:
math.round 0 (sum distance)
Parentheses are not required for expressions that do not contain function
calls, for example: foo + bar
.
Here’s a set of examples of these rules:
PRQL
from employees
# Requires parentheses, because it contains a pipe
derive is_proximate = (distance | in 0..20)
# Requires parentheses, because it's a function call
derive total_distance = (sum distance)
# `??` doesn't require parentheses, as it's not a function call
derive min_capped_distance = (min distance ?? 5)
# No parentheses needed, because no function call
derive travel_time = distance / 40
# No inner parentheses needed around `1+1` because no function call
derive distance_rounded_2_dp = (math.round 1+1 distance)
derive {
# Requires parentheses, because it contains a pipe
is_far = (distance | in 100..),
# The left value of the range requires parentheses,
# because of the minus sign
is_negative = (distance | in (-100..0)),
# ...this is equivalent
is_negative = (distance | in (-100)..0),
# _Technically_, this doesn't require parentheses, because it's
# the RHS of an assignment in a tuple
# (this is especially confusing)
average_distance = average distance,
}
# Requires parentheses because of the minus sign
sort (-distance)
# A tuple is fine too
sort {-distance}
SQL
SELECT
*,
distance BETWEEN 0 AND 20 AS is_proximate,
SUM(distance) OVER () AS total_distance,
MIN(COALESCE(distance, 5)) OVER () AS min_capped_distance,
distance / 40 AS travel_time,
ROUND(distance, 1 + 1) AS distance_rounded_2_dp,
distance >= 100 AS is_far,
distance BETWEEN -100 AND 0,
distance BETWEEN -100 AND 0 AS is_negative,
AVG(distance) OVER () AS average_distance
FROM
employees
ORDER BY
distance DESC
For example, the snippet below produces an error because the sum
function call
is not in a tuple.
PRQL
from employees
derive total_distance = sum distance
Error
Error:
╭─[:2:29]
│
2 │ derive total_distance = sum distance
│ ────┬───
│ ╰───── Unknown name `distance`
───╯
…while with parentheses, it works at expected:
PRQL
from employees
derive other_distance = (sum distance)
SQL
SELECT
*,
SUM(distance) OVER () AS other_distance
FROM
employees
We’re continuing to think whether these rules can be more intuitive. We’re also planning to make the error messages much better, so the compiler can help out.
Wrapping lines
Line breaks in PRQL have semantic meaning, so to wrap a single logical line into
multiple physical lines, we can use \
at the beginning of subsequent physical
lines:
PRQL
from artists
select is_europe =
\ country == "DE"
\ || country == "FR"
\ || country == "ES"
SQL
SELECT
country = 'DE'
OR country = 'FR'
OR country = 'ES' AS is_europe
FROM
artists
Wrapping will “jump over” empty lines or lines with comments. For example, the
select
here is only one logical line:
PRQL
from tracks
# This would be a really long line without being able to split it:
select listening_time_years = (spotify_plays + apple_music_plays + pandora_plays)
# We can toggle between lines when developing:
# \ * length_seconds
\ * length_s
# min hour day year
\ / 60 / 60 / 24 / 365
SQL
SELECT
(
spotify_plays + apple_music_plays + pandora_plays
) * length_s / 60 / 60 / 24 / 365 AS listening_time_years
FROM
tracks
Note that PRQL differs from most languages, which use a \
at the end of the
preceding line. Because PRQL aims to be friendly for data exploration, we want
to make it possible to comment out any line, including the final line, without
breaking the query. This requires all lines after the first to be structured similarly,
and for the character to be at the start of each following line.
See Pipes for more details on line breaks.
Case
Search for the first condition that evaluates to true
and return its
associated value. If none of the conditions match, null
is returned.
PRQL
from employees
derive distance = case [
city == "Calgary" => 0,
city == "Edmonton" => 300,
]
SQL
SELECT
*,
CASE
WHEN city = 'Calgary' THEN 0
WHEN city = 'Edmonton' THEN 300
ELSE NULL
END AS distance
FROM
employees
To set a default, a true
condition can be used:
PRQL
from employees
derive distance = case [
city == "Calgary" => 0,
city == "Edmonton" => 300,
true => "Unknown",
]
SQL
SELECT
*,
CASE
WHEN city = 'Calgary' THEN 0
WHEN city = 'Edmonton' THEN 300
ELSE 'Unknown'
END AS distance
FROM
employees
Ranges
Range start..end
represents as set of values between start
and end
,
inclusive (greater of equal to start
and less than or equal to end
).
To express a range that is open on one side, either start
or end
can be
omitted.
Ranges can be used in filters with the in
function, with any type of literal,
including dates:
PRQL
from events
filter (created_at | in @1776-07-04..@1787-09-17)
filter (magnitude | in 50..100)
derive is_northern = (latitude | in 0..)
SQL
SELECT
*,
latitude >= 0 AS is_northern
FROM
events
WHERE
created_at BETWEEN DATE '1776-07-04' AND DATE '1787-09-17'
AND magnitude BETWEEN 50 AND 100
Ranges can also be used in take
:
PRQL
from orders
sort {-value, created_at}
take 101..110
SQL
SELECT
*
FROM
orders
ORDER BY
value DESC,
created_at
LIMIT
10 OFFSET 100
See also
Roadmap
We’d like to use ranges for other types, such as whether an object is in an array or list literal.
Comments
Character #
denotes a comment until the end of the line.
PRQL
from employees # Comment 1
# Comment 2
aggregate {average salary}
SQL
SELECT
AVG(salary)
FROM
employees
There’s no distinct multiline comment syntax.
Parameters
Parameter is a placeholder for a value provided after the compilation of the query.
It uses the following syntax: $id
, where id
is an arbitrary alpha numeric
string.
Most database engines only support numeric positional parameter ids (i.e $3
).
Importing data
From
Specifies a data source.
To introduce an alias, use an assign expression:
Table names containing spaces or special characters need to be contained within backticks:
default_db.tablename
can be used if the table name matches a function from the
standard library.
We realize this is an awkward workaround. Track & 👍 #3271 for resolving this.
Reading files
There are a couple of functions mainly designed for DuckDB to read from files:
PRQL
prql target:sql.duckdb
from (read_parquet "artists.parquet")
join (read_csv "albums.csv") (==track_id)
SQL
WITH table_0 AS (
SELECT
*
FROM
read_parquet('artists.parquet')
),
table_1 AS (
SELECT
*
FROM
read_csv_auto('albums.csv')
)
SELECT
table_0.*,
table_1.*
FROM
table_0
JOIN table_1 ON table_0.track_id = table_1.track_id
These don’t currently have all the DuckDB options. If those would be helpful, please log an issue and it’s a fairly easy addition.
When specifying file names directly in the FROM
clause without using
functions, which is allowed in DuckDB, enclose the file names in backticks
``
as follows:
See also
How do I: create ad-hoc relations?
It’s often useful to make a small inline relation, for example when exploring how a database will evaluate an expression, or for a small lookup table. This can be quite verbose in SQL.
PRQL offers two approaches — array literals, and a from_text
transform.
Array literals
Because relations (aka a table) in PRQL are just arrays of tuples, they can be expressed with array and tuple syntax:
PRQL
from [
{a=5, b=false},
{a=6, b=true},
]
filter b == true
select a
SQL
WITH table_0 AS (
SELECT
5 AS a,
false AS b
UNION
ALL
SELECT
6 AS a,
true AS b
)
SELECT
a
FROM
table_0
WHERE
b = true
PRQL
let my_artists = [
{artist="Miles Davis"},
{artist="Marvin Gaye"},
{artist="James Brown"},
]
from artists
join my_artists (==artist)
join albums (==artist_id)
select {artists.artist_id, albums.title}
SQL
WITH table_0 AS (
SELECT
'Miles Davis' AS artist
UNION
ALL
SELECT
'Marvin Gaye' AS artist
UNION
ALL
SELECT
'James Brown' AS artist
),
my_artists AS (
SELECT
artist
FROM
table_0
)
SELECT
artists.artist_id,
albums.title
FROM
artists
JOIN my_artists ON artists.artist = my_artists.artist
JOIN albums ON artists.artist_id = albums.artist_id
from_text
from_text
takes a string in a common format, and converts it to table. It
accepts a few formats:
-
format:csv
parses CSV (default), -
format:json
parses either:-
an array of objects each of which represents a row, or
-
an object with fields
columns
&data
, wherecolumns
take an array of column names anddata
takes an array of arrays.
-
PRQL
from_text """
a,b,c
1,2,3
4,5,6
"""
derive {
d = b + c,
answer = 20 * 2 + 2,
}
SQL
WITH table_0 AS (
SELECT
'1' AS a,
'2' AS b,
'3' AS c
UNION
ALL
SELECT
'4' AS a,
'5' AS b,
'6' AS c
)
SELECT
a,
b,
c,
b + c AS d,
20 * 2 + 2 AS answer
FROM
table_0
PRQL
from_text format:json """
[
{"a": 1, "m": "5"},
{"a": 4, "n": "6"}
]
"""
SQL
WITH table_0 AS (
SELECT
1 AS a,
'5' AS m
UNION
ALL
SELECT
4 AS a,
NULL AS m
)
SELECT
a,
m
FROM
table_0
PRQL
from_text format:json """
{
"columns": ["a", "b", "c"],
"data": [
[1, "x", false],
[4, "y", null]
]
}
"""
SQL
WITH table_0 AS (
SELECT
1 AS a,
'x' AS b,
false AS c
UNION
ALL
SELECT
4 AS a,
'y' AS b,
NULL AS c
)
SELECT
a,
b,
c
FROM
table_0
See also
Variables — let
& into
Variables assign a name — say x
— to an expression, like in most programming
languages. The name can then be used in any expression, acting as a substitute
for the expression x
.
Syntactically, variables can take 3 forms.
-
let
declares the name before the expression.let my_name = x
-
into
declares the name after the expression. This form is useful for quick pipeline splitting and conforms with the “flow from top to bottom” rule of pipelines.x into my_name
-
The final expression of a pipeline defaults to taking the name
main
.from x
… is equivalent to:
let main = x
When compiling to SQL, relational variables are compiled to Common Table Expressions (or sub-queries in some cases).
PRQL
let top_50 = (
from employees
sort salary
take 50
aggregate {total_salary = sum salary}
)
from top_50 # Starts a new pipeline
SQL
WITH table_0 AS (
SELECT
salary
FROM
employees
ORDER BY
salary
LIMIT
50
), top_50 AS (
SELECT
COALESCE(SUM(salary), 0) AS total_salary
FROM
table_0
)
SELECT
total_salary
FROM
top_50
PRQL
from employees
take 50
into first_50
from first_50
SQL
WITH first_50 AS (
SELECT
*
FROM
employees
LIMIT
50
)
SELECT
*
FROM
first_50
Variables can be assigned an s-string containing the whole SQL query s-string, enabling us to use features which PRQL doesn’t yet support.
PRQL
let grouping = s"""
SELECT SUM(a)
FROM tbl
GROUP BY
GROUPING SETS
((b, c, d), (d), (b, d))
"""
from grouping
SQL
WITH table_0 AS (
SELECT
SUM(a)
FROM
tbl
GROUP BY
GROUPING SETS ((b, c, d), (d), (b, d))
)
SELECT
*
FROM
table_0
Functions
Functions have two types of parameters:
- Positional parameters, which require an argument.
- Named parameters, which optionally take an argument, otherwise using their default value.
So this function is named fahrenheit_to_celsius
and has one parameter temp
:
PRQL
let fahrenheit_to_celsius = temp -> (temp - 32) / 1.8
from cities
derive temp_c = (fahrenheit_to_celsius temp_f)
SQL
SELECT
*,
(temp_f - 32) / 1.8 AS temp_c
FROM
cities
This function is named interp
, and has two positional parameters named high
and x
, and one named parameter named low
which takes a default argument of
0
. It calculates the proportion of the distance that x
is between low
and
high
.
PRQL
let interp = low:0 high x -> (x - low) / (high - low)
from students
derive {
sat_proportion_1 = (interp 1600 sat_score),
sat_proportion_2 = (interp low:0 1600 sat_score),
}
SQL
SELECT
*,
(sat_score - 0) / (1600 - 0) AS sat_proportion_1,
(sat_score - 0) / (1600 - 0) AS sat_proportion_2
FROM
students
Other examples
PRQL
let is_adult = col -> col >= 18
let writes_code = col -> (col | in ["PRQL", "Rust"])
let square = col -> (col | math.pow 2)
let starts_with_a = col -> (col | text.lower | text.starts_with("a"))
from employees
select {
first_name,
last_name,
hobby,
adult = is_adult age,
age_squared = square age,
}
filter ((starts_with_a last_name) && (writes_code hobby))
SQL
WITH table_0 AS (
SELECT
first_name,
last_name,
hobby,
age >= 18 AS adult,
POW(age, 2) AS age_squared
FROM
employees
)
SELECT
first_name,
last_name,
hobby,
adult,
age_squared
FROM
table_0
WHERE
LOWER(last_name) LIKE CONCAT('a', '%')
AND hobby IN ('PRQL', 'Rust')
Piping values into functions
Consistent with the principles of PRQL, it’s possible to pipe values into functions, which makes composing many functions more readable. When piping a value into a function, the value is passed as an argument to the final positional parameter of the function. Here’s the same result as the examples above with an alternative construction:
PRQL
let interp = low:0 high x -> (x - low) / (high - low)
from students
derive {
sat_proportion_1 = (sat_score | interp 1600),
sat_proportion_2 = (sat_score | interp low:0 1600),
}
SQL
SELECT
*,
(sat_score - 0) / (1600 - 0) AS sat_proportion_1,
(sat_score - 0) / (1600 - 0) AS sat_proportion_2
FROM
students
and
PRQL
let fahrenheit_to_celsius = temp -> (temp - 32) / 1.8
from cities
derive temp_c = (temp_f | fahrenheit_to_celsius)
SQL
SELECT
*,
(temp_f - 32) / 1.8 AS temp_c
FROM
cities
We can combine a chain of functions, which makes logic more readable:
PRQL
let fahrenheit_to_celsius = temp -> (temp - 32) / 1.8
let interp = low:0 high x -> (x - low) / (high - low)
from kettles
derive boiling_proportion = (temp_c | fahrenheit_to_celsius | interp 100)
SQL
SELECT
*,
((temp_c - 32) / 1.8 - 0) / (100 - 0) AS boiling_proportion
FROM
kettles
Late binding
Functions can bind to any variable that is in scope when the function is
executed. For example, here cost_total
refers to the column that’s introduced
in the from
.
PRQL
let cost_share = cost -> cost / cost_total
from costs
select {materials, labor, overhead, cost_total}
derive {
materials_share = (cost_share materials),
labor_share = (cost_share labor),
overhead_share = (cost_share overhead),
}
SQL
SELECT
materials,
labor,
overhead,
cost_total,
materials / cost_total AS materials_share,
labor / cost_total AS labor_share,
overhead / cost_total AS overhead_share
FROM
costs
Standard library
The standard library currently contains commonly used functions that are used in SQL. It’s not yet as broad as we’d like, and we’re very open to expanding it.
Currently s-strings are an escape-hatch for any function that isn’t in our standard library. If we find ourselves using them for something frequently, raise an issue and we’ll add it to the stdlib.
Here’s the source of the current
PRQL std
:
PRQL 0.9.0 has started supporting different DB implementations for standard library functions.
The source is the std.sql
.
# The PRQL standard library defines the following functions and transforms.
# The definitions are whitespace insensitive, and have this form:
#
# ```
# let my_func = param1 param2 ... -> <return_type> body_expr
# ```
#
# Where:
# * `my_func` is the name of the function
# * `param1` is the first parameter optionally followed by a type in "< ... >"
# * `param2` etc. follow the same pattern as param1
# * `<return_type>` is the type of result wrapped in "< ... >"
# * `body_expr` defines the function body that creates the result.
# It can be PRQL code or `internal ...` to indicate internal compiler code.
# Operators
let mul = left right -> <int || float> internal std.mul
let div_i = left right -> <int || float> internal std.div_i
let div_f = left right -> <int || float> internal std.div_f
let mod = left right -> <int || float> internal std.mod
let add = left right -> <int || float || timestamp || date> internal std.add
let sub = left right -> <int || float || timestamp || date> internal std.sub
let eq = left right -> <bool> internal std.eq
let ne = left right -> <bool> internal std.ne
let gt = left right -> <bool> internal std.gt
let lt = left right -> <bool> internal std.lt
let gte = left right -> <bool> internal std.gte
let lte = left right -> <bool> internal std.lte
let and = left<bool> right<bool> -> <bool> internal std.and
let or = left<bool> right<bool> -> <bool> internal std.or
let coalesce = left right -> internal std.coalesce
let regex_search = text pattern -> <bool> internal std.regex_search
let neg = expr<int || float> -> <int || float> internal std.neg
let not = expr<bool> -> <bool> internal std.not
# Types
## Type primitives
type int = int
type float = float
type bool = bool
type text = text
type date = date
type time = time
type timestamp = timestamp
type `func` = func
type anytype = anytype
## Generic array
# TODO: an array of anything, not just nulls
type array = [anytype]
## Scalar
type scalar = int || float || bool || text || date || time || timestamp || null
type tuple = {..anytype}
## Range
type range = {start = scalar, end = scalar}
## Relation (an array of tuples)
type relation = [tuple]
## Transform
type transform = func relation -> relation
# Functions
## Relational transforms
let from = func
`default_db.source` <relation>
-> <relation> source
let select = func
columns <anytype>
tbl <relation>
-> <relation> internal select
let filter = func
condition <bool>
tbl <relation>
-> <relation> internal filter
let derive = func
columns <anytype>
tbl <relation>
-> <relation> internal derive
let aggregate = func
columns <anytype>
tbl <relation>
-> <relation> internal aggregate
let sort = func
by <anytype>
tbl <relation>
-> <relation> internal sort
let take = func
expr <anytype>
tbl <relation>
-> <relation> internal take
let join = func
`default_db.with` <relation>
condition <bool>
`noresolve.side`:inner
tbl <relation>
-> <relation> internal join
let group = func
by<scalar || tuple>
pipeline <transform>
tbl <relation>
-> <relation> internal group
let window = func
rows:0..-1
range:0..-1
expanding <bool>:false
rolling <int>:0
pipeline <transform>
tbl <relation>
-> <relation> internal window
let append = `default_db.bottom`<relation> top<relation> -> <relation> internal append
let intersect = `default_db.bottom`<relation> top<relation> -> <relation> (
t = top
join (b = bottom) (tuple_every (tuple_map _eq (tuple_zip t.* b.*)))
select t.*
)
let remove = `default_db.bottom`<relation> top<relation> -> <relation> (
t = top
join side:left (b = bottom) (tuple_every (tuple_map _eq (tuple_zip t.* b.*)))
filter (tuple_every (tuple_map _is_null b.*))
select t.*
)
let loop = func
pipeline <transform>
top <relation>
-> <relation> internal loop
## Aggregate functions
# These return either a scalar when used within `aggregate`, or a column when used anywhere else.
let min = column <array> -> <int || float || null> internal std.min
let max = column <array> -> <int || float || null> internal std.max
let sum = column <array> -> <int || float> internal std.sum
let average = column <array> -> <float || null> internal std.average
let stddev = column <array> -> <float || null> internal std.stddev
let all = column <array> -> <bool> internal std.all
let any = column <array> -> <bool> internal std.any
let concat_array = column <array> -> <text> internal std.concat_array
# Counts number of items in the column.
# Note that the count will include null values.
let count = column<array> -> <int> internal count
# Deprecated in favour of filterning input to the [std.count] function (not yet implemented).
@{deprecated}
let count_distinct = column <array> -> internal std.count_distinct
## Window functions
let lag = offset <int> column <array> -> internal std.lag
let lead = offset <int> column <array> -> internal std.lead
let first = column <array> -> internal std.first
let last = column <array> -> internal std.last
let rank = column <array> -> internal std.rank
let rank_dense = column <array> -> internal std.rank_dense
let row_number = column <array> -> internal row_number
# Mathematical functions
module math {
let abs = column -> <int || float> internal std.math.abs
let floor = column -> <int> internal std.math.floor
let ceil = column -> <int> internal std.math.ceil
let pi = -> <float> internal std.math.pi
let exp = column -> <int || float> internal std.math.exp
let ln = column -> <int || float> internal std.math.ln
let log10 = column -> <int || float> internal std.math.log10
let log = base<int || float> column -> <int || float> internal std.math.log
let sqrt = column -> <int || float> internal std.math.sqrt
let degrees = column -> <int || float> internal std.math.degrees
let radians = column -> <int || float> internal std.math.radians
let cos = column -> <int || float> internal std.math.cos
let acos = column -> <int || float> internal std.math.acos
let sin = column -> <int || float> internal std.math.sin
let asin = column -> <int || float> internal std.math.asin
let tan = column -> <int || float> internal std.math.tan
let atan = column -> <int || float> internal std.math.atan
let pow = exponent<int || float> column -> <int || float> internal std.math.pow
let round = n_digits column -> <scalar> internal std.math.round
}
## Misc functions
let as = `noresolve.type` column -> <scalar> internal std.as
let in = pattern value -> <bool> internal in
## Tuple functions
let tuple_every = func list -> <bool> internal tuple_every
let tuple_map = func fn <func> list -> internal tuple_map
let tuple_zip = func a b -> internal tuple_zip
let _eq = func a -> internal _eq
let _is_null = func a -> _param.a == null
## Misc
let from_text = input<text> `noresolve.format`:csv -> <relation> internal from_text
## Text functions
module text {
let lower = column -> <text> internal std.text.lower
let upper = column -> <text> internal std.text.upper
let ltrim = column -> <text> internal std.text.ltrim
let rtrim = column -> <text> internal std.text.rtrim
let trim = column -> <text> internal std.text.trim
let length = column -> <int> internal std.text.length
let extract = offset<int> length<int> column -> <text> internal std.text.extract
let replace = pattern<text> replacement<text> column -> <text> internal std.text.replace
let starts_with = prefix<text> column -> <bool> internal std.text.starts_with
let contains = substr<text> column -> <bool> internal std.text.contains
let ends_with = suffix<text> column -> <bool> internal std.text.ends_with
}
## Date functions
module date {
let to_text = format<text> column -> <text> internal std.date.to_text
}
## File-reading functions, primarily for DuckDB
let read_parquet = source<text> -> <relation> internal std.read_parquet
let read_csv = source<text> -> <relation> internal std.read_csv
## PRQL compiler functions
module `prql` {
let version = -> <text> internal prql_version
}
# Deprecated, will be removed in 0.12.0
let prql_version = -> <text> internal prql_version
And a couple of examples:
PRQL
from employees
derive {
gross_salary = (salary + payroll_tax | as int),
gross_salary_rounded = (gross_salary | math.round 0),
time = s"NOW()", # an s-string, given no `now` function exists in PRQL
}
SQL
SELECT
*,
CAST(salary + payroll_tax AS int) AS gross_salary,
ROUND(CAST(salary + payroll_tax AS int), 0) AS gross_salary_rounded,
NOW() AS time
FROM
employees
Example of different implementations of division and integer division:
PRQL
prql target:sql.sqlite
from [{x = 13, y = 5}]
select {
quotient = x / y,
int_quotient = x // y,
}
SQL
WITH table_0 AS (
SELECT
13 AS x,
5 AS y
)
SELECT
(x * 1.0 / y) AS quotient,
ROUND(ABS(x / y) - 0.5) * SIGN(x) * SIGN(y) AS int_quotient
FROM
table_0
PRQL
prql target:sql.mysql
from [{x = 13, y = 5}]
select {
quotient = x / y,
int_quotient = x // y,
}
SQL
WITH table_0 AS (
SELECT
13 AS x,
5 AS y
)
SELECT
(x / y) AS quotient,
(x DIV y) AS int_quotient
FROM
table_0
Transforms
Transforms are functions that take a relation and produce a relation.
Usually they are chained together into a pipeline, which resembles an SQL query.
Transforms were designed with a focus on modularity, so each of them is fulfilling a specific purpose and has defined invariants (properties of the relation that are left unaffected). That’s often referred to as “orthogonality” and its goal is to keep transform functions composable by minimizing interference of their effects. Additionally, it also keeps the number of transforms low.
For example, select
and derive
will not change the number of rows, while
filter
and take
will not change the number of columns.
In SQL, we can see this lack of invariant when an aggregation function is used
in the SELECT
clause. Before, the number of rows was kept constant, but
introduction of an aggregation function caused the whole statement to produce
only one row (per group).
These are the currently available transforms:
Transform | Purpose | SQL Equivalent |
---|---|---|
derive | Compute new columns | SELECT *, ... AS ... |
select | Pick & compute columns | SELECT ... AS ... |
filter | Pick rows based on their values | WHERE , HAVING ,QUALIFY |
sort | Order rows based on the values of columns | ORDER BY |
join | Add columns from another table, matching rows based on a condition | JOIN |
take | Pick rows based on their position | TOP , LIMIT , OFFSET |
group | Partition rows into groups and applies a pipeline to each of them | GROUP BY , PARTITION BY |
aggregate | Summarize many rows into one row | SELECT foo(...) |
window | Apply a pipeline to overlapping segments of rows | OVER , ROWS , RANGE |
loop | Iteratively apply a function to a relation until it’s empty | WITH RECURSIVE ... |
See also
from
—from
is the main way of getting data into a pipeline (it’s not listed above since it’s not technically a transform, since it doesn’t receive an input).
Aggregate
Summarizes many rows into one row.
When applied:
- without
group
, it produces one row from the whole table, - within a
group
pipeline, it produces one row from each group.
aggregate {expression or assign operations}
Currently, all declared aggregation functions are min
, max
, count
,
average
, stddev
, avg
, sum
and count_distinct
. We are in the
process of filling out std lib.
Examples
PRQL
from employees
aggregate {
average salary,
ct = count salary
}
SQL
SELECT
AVG(salary),
COUNT(*) AS ct
FROM
employees
PRQL
from employees
group {title, country} (
aggregate {
average salary,
ct = count salary,
}
)
SQL
SELECT
title,
country,
AVG(salary),
COUNT(*) AS ct
FROM
employees
GROUP BY
title,
country
Aggregate is required
Unlike in SQL, using an aggregation function in derive
or select
(or any
other transform except aggregate
) will not trigger aggregation. By default,
PRQL will interpret such attempts functions as window functions:
PRQL
from employees
derive {avg_sal = average salary}
SQL
SELECT
*,
AVG(salary) OVER () AS avg_sal
FROM
employees
This ensures that derive
does not manipulate the number of rows, but only ever
adds a column. For more information, see window transform.
Append
Concatenates two tables together.
Equivalent to UNION ALL
in SQL. The number of rows is always the sum of the
number of rows from the two input tables. To replicate UNION DISTINCT
, see
set operations.
PRQL
from employees_1
append employees_2
SQL
SELECT
*
FROM
employees_1
UNION
ALL
SELECT
*
FROM
employees_2
Remove
experimental
Removes rows that appear in another relation, like EXCEPT ALL
. Duplicate rows
are removed one-for-one.
PRQL
from employees_1
remove employees_2
SQL
SELECT
*
FROM
employees_1 AS t
EXCEPT
ALL
SELECT
*
FROM
employees_2 AS b
Intersection
experimental
PRQL
from employees_1
intersect employees_2
SQL
SELECT
*
FROM
employees_1 AS t
INTERSECT
ALL
SELECT
*
FROM
employees_2 AS b
Set operations
experimental
To imitate set operations i.e. (UNION
, EXCEPT
and INTERSECT
), you can use
the following functions:
let distinct = rel -> (from t = _param.rel | group {t.*} (take 1))
let union = `default_db.bottom` top -> (top | append bottom | distinct)
let except = `default_db.bottom` top -> (top | distinct | remove bottom)
let intersect_distinct = `default_db.bottom` top -> (top | intersect bottom | distinct)
Don’t mind the default_db.
and noop
, these are compiler implementation
detail for now.
Derive
Computes one or more new columns.
derive {
name = expression,
# or
column,
}
Examples
PRQL
from employees
derive gross_salary = salary + payroll_tax
SQL
SELECT
*,
salary + payroll_tax AS gross_salary
FROM
employees
PRQL
from employees
derive {
gross_salary = salary + payroll_tax,
gross_cost = gross_salary + benefits_cost
}
SQL
SELECT
*,
salary + payroll_tax AS gross_salary,
salary + payroll_tax + benefits_cost AS gross_cost
FROM
employees
Filter
Picks rows based on their values.
filter boolean_expression
Examples
PRQL
from employees
filter (age > 25 || department != "IT")
SQL
SELECT
*
FROM
employees
WHERE
age > 25
OR department <> 'IT'
PRQL
from employees
filter (department | in ["IT", "HR"])
SQL
SELECT
*
FROM
employees
WHERE
department IN ('IT', 'HR')
PRQL
from employees
filter (age | in 25..40)
SQL
SELECT
*
FROM
employees
WHERE
age BETWEEN 25 AND 40
Group
Partitions the rows into groups and applies a pipeline to each of the groups.
group {key_columns} (pipeline)
The partitioning of groups are determined by the key_column
s (first argument).
The most conventional use of group
is with aggregate
:
PRQL
from employees
group {title, country} (
aggregate {
average salary,
ct = count salary
}
)
SQL
SELECT
title,
country,
AVG(salary),
COUNT(*) AS ct
FROM
employees
GROUP BY
title,
country
In concept, a transform in context of a group
does the same transformation to
the group as it would to the table — for example finding the employee who joined
first across the whole table:
To find the employee who joined first in each department, it’s exactly the same
pipeline, but within a group
expression:
PRQL
from employees
group role (
sort join_date # taken from above
take 1
)
SQL
WITH table_0 AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY role
ORDER BY
join_date
) AS _expr_0
FROM
employees
)
SELECT
*
FROM
table_0
WHERE
_expr_0 <= 1
Join
Adds columns from another relation, matching rows based on a condition.
join side:{inner|left|right|full} rel (condition)
Parameters
side
specifies which rows to include, defaulting toinner
.rel
- the relation to join with, possibly including an alias, e.g.a=artists
.condition
- the criteria on which to match the rows from the two relations. Theoretically,join
will produce a cartesian product of the two input relations and then filter the result by the condition. It supports two additional features:- Names
this
&that
: Along namethis
, which refers to the first input relation,condition
can use namethat
, which refers to the second input relation. - Self equality operator: If the condition is an equality comparison between
two columns with the same name (i.e.
(this.col == that.col)
), it can be expressed with only(==col)
.
- Names
Examples
PRQL
from employees
join side:left positions (employees.id==positions.employee_id)
SQL
SELECT
employees.*,
positions.*
FROM
employees
LEFT JOIN positions ON employees.id = positions.employee_id
PRQL
from employees
join side:left p=positions (employees.id==p.employee_id)
SQL
SELECT
employees.*,
p.*
FROM
employees
LEFT JOIN positions AS p ON employees.id = p.employee_id
PRQL
from tracks
join side:left artists (
# This adds a `country` condition, as an alternative to filtering
artists.id==tracks.artist_id && artists.country=='UK'
)
SQL
SELECT
tracks.*,
artists.*
FROM
tracks
LEFT JOIN artists ON artists.id = tracks.artist_id
AND artists.country = 'UK'
In SQL, CROSS JOIN is a join that returns each row from first relation matched
with all rows from the second relation. To accomplish this, we can use condition
true
, which will return all rows of the cartesian product of the input
relations:
from shirts
join hats true
this
& that
can be used to refer to
the current & other table respectively:
PRQL
from tracks
join side:inner artists (
this.id==that.artist_id
)
SQL
SELECT
tracks.*,
artists.*
FROM
tracks
JOIN artists ON tracks.id = artists.artist_id
If the join conditions are of form left.x == right.x
, we can use “self
equality operator”:
PRQL
from employees
join positions (==emp_no)
SQL
SELECT
employees.*,
positions.*
FROM
employees
JOIN positions ON employees.emp_no = positions.emp_no
Loop
Experimental
loop {step_function} {initial_relation}
Iteratively applies step
function to initial
relation until the step
returns an empty table. Returns a relation that contains rows of initial
relation and all intermediate relations.
This behavior could be expressed with following pseudo-code:
def loop(step, initial):
result = []
current = initial
while current is not empty:
result = append(result, current)
current = step(current)
return result
Examples
PRQL
from [{n = 1}]
loop (
filter n<4
select n = n+1
)
# returns [1, 2, 3, 4]
SQL
WITH RECURSIVE table_0 AS (
SELECT
1 AS n
),
table_1 AS (
SELECT
n
FROM
table_0
UNION
ALL
SELECT
n + 1
FROM
table_1
WHERE
n < 4
)
SELECT
n
FROM
table_1 AS table_2
The behavior of WITH RECURSIVE
may depend on the database configuration in MySQL.
The compiler assumes the behavior described by
the Postgres documentation
and will not produce correct results for
alternative configurations of MySQL.
Currently, loop
may produce references to the recursive CTE in sub-queries,
which is not supported by some database engines, e.g. SQLite. For now, we suggest step
functions are kept simple enough to fit into a single SELECT statement.
Select
Picks and computes columns.
select {
name = expression,
# or
column,
}
# or
select !{column}
Examples
PRQL
from employees
select name = f"{first_name} {last_name}"
SQL
SELECT
CONCAT(first_name, ' ', last_name) AS name
FROM
employees
PRQL
from employees
select {
name = f"{first_name} {last_name}",
age_eoy = dob - @2022-12-31,
}
SQL
SELECT
CONCAT(first_name, ' ', last_name) AS name,
dob - DATE '2022-12-31' AS age_eoy
FROM
employees
PRQL
from e=employees
select {e.first_name, e.last_name}
SQL
SELECT
first_name,
last_name
FROM
employees AS e
Excluding columns
We can use !
to exclude a list of columns. This can operate in two ways:
- We use
SELECT * EXCLUDE
/SELECT * EXCEPT
for the columns supplied toselect !{}
in dialects which support it. - Otherwise, the columns must have been defined prior in the query (unless all
of a table’s columns are excluded); for example in another
select
or agroup
transform. In this case, we evaluate and specify the columns that should be included in the output SQL.
Some examples:
PRQL
prql target:sql.bigquery
from tracks
select !{milliseconds, bytes}
SQL
SELECT
*
EXCEPT
(milliseconds, bytes)
FROM
tracks
PRQL
from tracks
select {track_id, title, composer, bytes}
select !{title, composer}
SQL
SELECT
track_id,
bytes
FROM
tracks
Note that !
is also the NOT
operator, so without the tuple it has a
different meaning:
PRQL
prql target:sql.bigquery
from tracks
select !is_compilation
SQL
SELECT
NOT is_compilation
FROM
tracks
Sort
Order rows based on the values of one or more expressions (generally columns).
sort {(+|-) column}
Parameters
- One expression or a tuple of expressions to sort by
- Each expression can be prefixed with:
+
, for ascending order, the default-
, for descending order
- When using prefixes, even a single expression needs to be in a tuple or
parentheses. (Otherwise,
sort -foo
is parsed as a subtraction betweensort
andfoo
.)
Examples
PRQL
from employees
sort {age, -tenure, +salary}
SQL
SELECT
*
FROM
employees
ORDER BY
age,
tenure DESC,
salary
We can also use expressions:
PRQL
from employees
sort {s"substr({first_name}, 2, 5)"}
SQL
WITH table_0 AS (
SELECT
*,
substr(first_name, 2, 5) AS _expr_0
FROM
employees
)
SELECT
*
FROM
table_0
ORDER BY
_expr_0
Ordering guarantees
Ordering is persistent through a pipeline in PRQL. For example:
PRQL
from employees
sort tenure
join locations (==employee_id)
SQL
SELECT
employees.*,
locations.*
FROM
employees
JOIN locations ON employees.employee_id = locations.employee_id
ORDER BY
employees.tenure
Here, PRQL pushes the sort
down the pipeline, compiling the ORDER BY
to the
end of the query. Consequently, most relation transforms retain the row order.
The explicit semantics are:
sort
introduces a new order,group
resets the order,join
retains the order of the left relation,- database tables don’t have a known order.
Comparatively, in SQL, relations possess no order, being orderable solely within
the context of the query result, LIMIT
statement, or window function. The lack
of inherent order can result in an unexpected reshuffling of a previously
ordered relation from a JOIN
or windowing operation.
To be precise — in PRQL, a relation is an array of tuples and not a set or a bag. The persistent nature of this order remains intact through sub-queries and intermediate table definitions.
For instance, an SQL query such as:
WITH albums_sorted AS (
SELECT *
FROM albums
ORDER BY title
)
SELECT *
FROM albums_sorted
JOIN artists USING (artist_id)
…doesn’t guarantee any row order (indeed — even without the JOIN
, the SQL
standard doesn’t guarantee an order, although most implementations will respect
it).
Specification
This chapter explains PRQL’s semantics: how expressions are interpreted and their meaning. It’s intended for advanced users and compiler contributors.
Null handling
SQL has an unconventional way of handling NULL
values, since it treats them as
unknown values. As a result, in SQL:
NULL
is not a value indicating a missing entry, but a placeholder for anything possible,NULL = NULL
evaluates toNULL
, since one cannot know if one unknown is equal to another unknown,NULL <> NULL
evaluates toNULL
, using same logic,- to check if a value is
NULL
, SQL introducesIS NULL
andIS NOT NULL
operators, DISTINCT column
may return multipleNULL
values.
For more information, check out the Postgres documentation.
PRQL, on the other hand, treats null
as a value, which means that:
null == null
evaluates totrue
,null != null
evaluates tofalse
,- distinct column cannot contain multiple
null
values.
PRQL
from employees
filter first_name == null
filter null != last_name
SQL
SELECT
*
FROM
employees
WHERE
first_name IS NULL
AND last_name IS NOT NULL
Note that PRQL doesn’t change how NULL
is compared between columns, for
example in joins. (PRQL compiles to SQL and so can’t change the behavior of the
database).
For more context or to provide feedback check out the discussion on issue #99.
Name resolution
Because PRQL primarily handles relational data, it has specialized scoping rules for referencing columns.
Scopes
In PRQL’s compiler, a scope is the collection of all names one can reference from a specific point in the program.
In PRQL, names in the scope are composed from namespace and variable name which are separated by a dot, similar to SQL. Namespaces can contain many dots, but variable names cannot.
Name my_table.some_column
is a variable some_column
from namespace my_table
.
Name foo.bar.baz
is a variable baz
from namespace foo.bar
.
When processing a query, a scope is maintained and updated for each point in the query.
It start with only namespace std
, which is the standard library. It contains
common functions like sum
or count
, along with all transform functions such
as derive
and group
.
In pipelines (or rather in transform functions), scope is also injected with
namespaces of tables which may have been referenced with from
or join
transforms. These namespaces contain simply all the columns of the table and
possibly a wildcard variable, which matches any variable (see the algorithm
below). Within transforms, there is also a special namespace that does not have
a name. It is called a “frame” and it contains columns of the current table
the transform is operating on.
Resolving
For each ident we want to resolve, we search the scope’s items in order. One of three things can happen:
-
Scope contains an exact match, e.g. a name that matches in namespace and the variable name.
-
Scope does not contain an exact match, but the ident did not specify a namespace, so we can match a namespace that contains a
*
wildcard. If there’s a single namespace, the matched namespace is also updated to contain this new variable name. -
Otherwise, the nothing is matched and an error is raised.
Translating to SQL
When translating into an SQL statement which references only one table, there is no need to reference column names with table prefix.
But when there are multiple tables and we don’t have complete knowledge of all
table columns, a column without a prefix (i.e. first_name
) may actually reside
in multiple tables. Because of this, we have to use table prefixes for all
column names.
PRQL
from employees
derive {first_name, dept_id}
join d=departments (==dept_id)
select {first_name, d.title}
SQL
SELECT
employees.first_name,
d.title
FROM
employees
JOIN departments AS d ON employees.dept_id = d.dept_id
As you can see, employees.first_name
now needs table prefix, to prevent
conflicts with potential column with the same name in departments
table.
Similarly, d.title
needs the table prefix.
Modules
The module
facility is in discussion.
This page documents our understanding of the way
the final PRQL compiler will likely work.
The PRQL compiler currently uses these techniques to compile
the std
, date
, text
, and math
modules into the language.
However, at this time (Spring 2024), the module
facility does not work
within a PRQL query itself.
That is, a module
statement in a query cannot import files
from the local file system.
Design goals for modules:
-
Allow importing declarations from other files.
-
Have namespaces for things like
std
. -
Have a hierarchical structure so we can represent files in directories.
-
Have an unambiguous module structure within a project.
Definition
A module is a namespace that contains declarations. A module is itself a declaration, which means that it can contain nested child modules.
This means that modules form a tree graph, which we call “the module structure”.
For the sake of this document, we will express the module structure with
module
keyword and a code block encased in curly braces:
module my_playlists {
let bangers = ... # a declaration
module soundtracks {
let movie_albums = ... # another declaration
}
}
The syntax
module name { ...decls... }
is not part of PRQL language, with the objection that it is unnecessary as it only adds more ways of defining modules. If a significant upside of this syntax is found, it may be added in the future.
Name resolution
Any declarations within a module can be referenced from the outside of the module:
# using module structure declared above
module my_playlists
let great_tracks = my_playlists.bangers
let movie_scores = my_playlists.soundtracks.movie_albums
Identifiers are resolved relative to current module.
module my_playlists {
module soundtracks {
let movie_albums = (from albums | filter id == 3)
}
from soundtracks.movie_albums
}
from my_playlists.soundtracks.movie_albums
If an identifier cannot be resolved relative to the current module, it tries to resolve relative to the parent module. This is repeated, stepping up the module hierarchy until a match is found or root of the module structure is reached.
module my_playlists {
let decl_1 = ...
module soundtracks {
let decl_2 = ...
}
module upbeat_rock {
let decl_3 = ...
from decl_1 | join soundtracks.decl2 | join decl_3
}
}
Main var declaration
The final variable declaration in a module can omit the leading let main =
and
acquire an implicit name main.
module my_playlists {
let bangers = (from tracks | take 10)
from playlists | take 10
}
let album_titles = my_playlists.main
When a module is referenced as a value, the main
variable is used instead.
This is especially useful when referring to a module which is to be compiled to
RQ (and later SQL).
# last line from previous example could thus be shortened to:
let album_titles = my_playlists
File importing
The examples below do not work.
At this time (Spring 2024), the module
facility does not work
within a PRQL query itself.
That is, a module
statement in a query cannot import files
from the local file system.
To include PRQL source code from other files, we can use the following syntax:
module my_playlists
This loads either ./my_playlists.prql
(a leaf module) or
./my_playlists/_my_playlists.prql
(a directory module) and uses its contents
as module my_playlists
. If none or both of the files are present, a
compilation error is raised.
Only directory modules can contain module declarations. If a leaf module contains a module declaration, a compilation error is raised, suggesting the leaf module to be converted into a directory module. This is a step toward any module structure having a single “normalized” representation in the file system. Such normalization is desired because it restrains the possible file system layouts to a comprehensible and predictable layout, while not sacrificing any functionality.
Described importing rules don’t achieve this “single normalized representation” in full, since any leaf modules could be replaced by a directory module with zero submodules, without any semantic changes. Restricting directory modules to have at least one sub-module would not improve approachability enough to justify adding this restriction.
For example, the following module structure is annotated with files names in which the modules would reside:
module my_project {
# _my_project.prql
module sales {
# sales.prql
}
module projections {
# projections/_projections.prql
module year_2023 {
# projections/year_2023.prql
}
module year_2024 {
# projections/year_2024.prql
}
}
}
If module my_project.sales
wants to add a submodule util
, it has to be
converted to a directory modules. This means that it has to be moved to
sales/_sales.prql
. The new module would reside in sales/util.prql
.
The annotated layout is not the only possible layout for this module structure,
since any of the modules sales
, year_2023
or year_2024
could be converted
into a directory module with zero sub-modules.
Point 4 of design goals means that each declaration within a project has a single fully-qualified name within this project. This is ensured by strict rules regarding importing files and the fact that the module structure is a tree.
Declaration order
The order of declarations in a module holds no semantic value, except the “last
main
variable”.
References between modules can be cyclic.
module mod_a {
let decl_a_1 = ...
let decl_a_2 = (from mod_b.decl_b | take 10)
}
module mod_b {
let decl_b = (from mod_a.decl_a | take 10)
}
References between variable declarations cannot be cyclic.
let decl_a = (from decl_b)
let decl_b = (from decl_a) # error: cyclic reference
module mod_a {
let decl_a = (from mod_b.decl_b)
}
module mod_b {
let decl_b = (from mod_a.decl_a) # error: cyclic reference
}
Compiler interface
prqlc
provides two interfaces for compiling files.
Multi-file interface requires three arguments:
- path to the file containing the module which is the root of the module structure,
- identifier of the pipeline that should be compiled to RQ (this can also be an
identifier of a module that has a
main
pipeline) and, - a “file loader”, which can load files on-demand.
The path to the root module can be automatically detected by searching for
.prql
files starting with _
in the current working directory.
Example prqlc usage:
$ prqlc compile _project.prql sales.projections.orders_2024
$ prqlc compile sales.projections.orders_2024
Single-file interface requires a single argument; the PRQL source. Any attempts to load modules in this mode result in compilation errors. This interface is needed, for example, when integrating the compiler with a database connector (i.e. JDBC) where no other files can be loaded.
Built-in module structure
As noted above, this facility is in discussion.
# root module of every project
module project {
module std {
let sum = a -> ...
let mean = a -> ...
}
module default_db {
# all inferred tables and defined CTEs
}
let main = (
from t = tracks
select [track_id, title]
)
}
Example
The examples below do not work.
At this time (Spring 2024), the module
facility does not work
within a PRQL query itself.
That is, a module
statement in a query cannot import files
from the local file system.
This is an example project, where each of code block is a separate file.
# _project.prql
module employees
module sales
module util
# employees.prql
let employees = (...)
let salaries = (...)
let departments = (...)
# sales/_sales.prql
module orders
module projections
let revenue_by_source = (...)
# sales/orders.prql
let current_year = (...)
let archived = (...)
let by_employee = (from orders | join employees.employees ...)
# sales/projections.prql
let orders_2023 = (from orders.current_year | append orders.archived)
let orders_2024 = (...)
# util.prql
let pretty_print_num = col -> (...)
Sources:
- Notes On Module System, by @matklad.
Type system
Status: under development
The type system determines the allowed values of a term.
Purpose
Each of the SQL DBMSs has their own type system. Thanks to the SQL standard, they are very similar, but have key differences regardless. For example, SQLite does not have a type for date or time or timestamps, but it has functions for handling date and time that take ISO 8601 strings or integers that represent Unix timestamps. So it does support most of what is possible to do with dates in other dialects, even though it stores data with a different physical layout and uses different functions to achieve that.
PRQL’s task is to define it’s own description of data formats, just as how it already defines common data transformations.
This is done in two steps:
-
Define PRQL’s Type System (PTS), following principles we think a relational language should have (and not fixate on what existing SQL DBMSs have).
-
Define a mapping between SQL Type System (STS) and PTS, for each of the DBMSs. Ideally we’d want that to be a bijection, so each type in PTS would be represented by a single type in STS and vice-versa. Unfortunately this is not entirely possible, as shown below.
In practical terms, we want for a user to be able to:
-
… express types of their database with PRQL (map their STS into PTS). In some cases, we can allow to say “your database is not representable with PRQL, change it or use only a subset of it”. An example of what we don’t want to support are arrays with arbitrary indexes in Postgres (i.e. 2-based index for arrays).
This task of mapping to PTS could be automated by LSP server, by introspecting user’s SQL database and generating PRQL source.
-
… express their SQL queries in PRQL. Again, using mapping from STS to PTS, one should be able to express any SQL operation in PRQL.
For example, translate MSSQL
DATEDIFF
to subtraction operator-
in PRQL.For now, this mapping is manual, but should be documented and may be automated.
-
… use any PRQL feature in their database. Here we are mapping from PTS into an arbitrary STS.
For example, translate PRQL’s datetime operations to use TEXT in SQLite.
As of now, prqlc already does a good job of automatically doing this mapping.
Example of the mapping between PTS and two STSs:
PTS | STS Postgres | STS SQLite |
---|---|---|
int32 | integer | INTEGER |
int64 | bigint | INTEGER |
timestamp | timestamp | TEXT |
Principles
Algebraic types - have a way of expressing sum and product types. In Rust, sum would be an enum and product would be tuple or a struct. In SQL, product would be a row, since it can contain different types, all at once. Sum would be harder to express, see this post.
The value proposition here is that algebraic types give a lot modeling flexibility, all while being conceptually simple.
Composable - as with transformation, we’d want types to compose together.
Using Python, JavaScript, C++ or Rust, one could define many different data structures that would correspond to our idea of “relation”. Most of them would be an object/struct that has column names and types and then a generic array of arrays for rows.
PRQL’s type system should also be able to express relations as composed from primitive types, but have only one idiomatic way of doing so.
In practice, this means that builtin types include only primitives (int, text, bool, float), tuple (for product), enum (for sum) and array (for repeating). An SQL row translates to a tuple, and a relation translates to an array of tuples.
Composability also leads to a minimal type system, which does not differentiate between tuples, objects and structs. A single product type is enough.
No subtyping - avoid super types and inheritance.
Subtyping is a natural extension to a type system, where a type can be a super
type of some other type. This is base mechanism for Object Oriented Programming,
but is also present in most dynamically types languages. For example, a type
number
might be super type of int
and float
.
PTS does not have subtyping, because it requires dynamic dispatch and because it adds unnecessary complexity to generic type arguments.
Dynamic dispatch, is a mechanism that would be able, for example, to call
appropriate to_string
function for each element of an array of number
. This
array contains both elements of type int
and type float
, with different
to_string
implementations.
Definition
For any undefined terms used in this section, refer to set theory and mathematical definitions in general.
A “type of a variable” is a “set of all possible values of that variable”.
Primitives
At the moment of writing, PRQL defines following primitive types: int
,
float
, bool
, text
, date
, time
and timestamp
. New primitive types
will be added in the future and some of existing types might be split into
smaller subsets (see section “Splitting primitives”).
Tuples
Tuple type is a product type.
It contains n ordered fields, where n is known at compile-time. Each field has a type itself and an optional name. Fields are not necessarily of the same type.
In other languages, similar constructs are named record, struct, tuple, named tuple or (data)class.
type my_row = {id = int, bool, name = str}
Arrays
Array is a container type that contains n ordered fields, where n is not known at compile-time. All fields are of the same type and cannot be named.
type array_of_int = [int]
Functions
type floor_signature = func float -> int
Union
type status = (
paid = () ||
unpaid = float ||
{reason = text, cancelled_at = timestamp} ||
)
This is “a sum type”.
Type annotations
Variable annotations and function parameters may specify type annotations:
let a <t> = x
The value of x
(and thus a
) must be an element of t
.
let my_func = func x <t> -> y
The value of argument supplied to x
must be an element of t
.
let my_func = func x -> <t> y
The value of function body y
must be an element of t
.
Physical layout
Logical type is user-facing the notion of a type that is the building block of the type system.
Physical layout is the underlying memory layout of the data represented by a variable.
In many programming languages, physical layout of a logical type is dependent on the target platform. Similarly, physical layout of a PRQL logical type is dependent on representation of that type in the target STS.
PTS logical type ---> STS logical type ---> STS physical layout
Note that not all STS types do not have a single physical layout. Postgres has a
logical (pseudo)type anyelement
, which is a super type of any data type. It
can be used as a function parameter type, but does not have a single physical
layout so it cannot be used in a column declaration.
For now, PRQL does not define physical layouts of any type. It is not needed since PRQL is not used for DDL (see section “Splitting primitives”) or does not support raw access to underlying memory.
As a consequence, results of a PRQL query cannot be robustly compared across DBMSs, since the physical layout of the result will vary.
In the future, PRQL may define a common physical layout of types, probably using Apache Arrow.
Examples
type my_relation = [{
id = int,
title = text,
age = int
}]
type invoices = [{
invoice_id = int64,
issued_at = timestamp,
labels = [text]
#[repr(json)]
items = [{
article_id = int64,
count = int16 where x -> x >= 1,
}],
paid_by_user_id = (int64 || null),
}]
Appendix
Splitting primitives
This document mentions int32
and int64
as distinct types, but there is no
need for that in the initial implementation. The built-in int
can associate
with all operations on integers and translate PRQL to valid SQL regardless of
the size of the integer. Later, int
cam be replaced by int8
, int16
,
int32
, int64
.
The general rule for “when to make a distinction between types” would be “as
soon as the types carry different information and we find an operation that
would be expressed differently”. In this example, that would require some
operation on int32
to have different syntax than same operation over int64
.
We can have such relaxed rule because PRQL is not aiming to be a Data Definition Language and does not have to bother with exact physical layout of types.
Type representations
There are cases where a PTS type has multiple possible and valid representations in some STSs.
For such cases, we’d want to support the use of alternative representations for storing data, but also application of any function that is defined for the original type.
Using SQLite as an example again, users may have some temporal data stored as
INTEGER unix timestamp and some as TEXT that contains ISO 8601 without timezone.
From the user’s perspective, both of these types are timestamp
s and should be
declared as such. But when compiling operations over these types to SQL, the
compiler should consider their different representations in STS. For example a
difference between two timestamps timestamp - timestamp
can be translated to a
normal int subtraction for INTEGER repr, but must apply SQLite’s function
unixepoch
when dealing with TEXT repr.
Table declarations should therefore support annotations that give hints about which representation is used:
table foo {
#[repr(text)]
created_at: timestamp,
}
A similar example is an “array of strings type” in PTS that could be represented
by a text[]
(if DBMS supports arrays) or json
or it’s variant jsonb
in
Postgres. Again, the representation would affect operators: in Postgres, arrays
would be accessed with my_array[1]
and json arrays would use
my_json_array -> 1
. This example may not be applicable, if we decide that we
want a separate JSON type in PST.
RQ functions, targets and reprs
This part is talks about technical implementations, not the language itself
Idea
RQ contains a single node kind for expressing operations and functions: BuiltInFunction (may be renamed in the future).
It is a bottleneck that we can leverage when trying to affect how an operator or a function interacts with different type representations on different targets.
Idea is to implement the BuiltInFunction multiple times and annotate it with it intended target and parameter representation. Then we can teach the compiler to pick the appropriate function implementation that suit current repr and compilation target.
Specifics
RQ specification is an interface that contains functions, identified by name
(i.e. std.int8.add
). These functions have typed parameters and a return value.
If an RQ function call does not match the function declaration in number or in
types of the parameters, this is considered an invalid RQ AST.
We provide multiple implementations for each RQ function. They are annotated
with a target (i.e. #[target(sql.sqlite)]
) and have their params annotated
with type reprs (i.e. #[repr(int)]
).
# using a made-up syntax
#[target(sql.sqlite)]
func std.int8.add
#[repr(int8)] x
#[repr(int8)] y
-> s"{x} + {y}"
Each RQ type has one canonical repr that serves as the reference implementation for other reprs and indicates the amount of contained data (i.e. 1 bit, 8 bits, 64 bits).
Example
Let’s say for example, that we’d want to support 8bit integer arithmetic, and
that we’d want the result of 127 + 1
to be -128
(ideally we’d handle this
better, but bear with me for the sake of the example). Because some RDBMSs don’t
support 8bit numbers and do all their integer computation with 64bit numbers
(SQLite), we need to implement an alternative type representation for that
target.
The logical type int8
could have the following two reprs:
- canonical
repr_int8
that contains 8 bits in two’s complement, covering integer values in range -128 to 127 (inclusive), repr_int64
that contains 64 bits of data, but is using only the values that are also covered byrepr_int8
.
Now we’d implement function std.int8.add
for each of the reprs. Let’s assume
that the int8
implementation is straightforward and that databases don’t just
change the data type when a number overflows. The impl for int64
requires a
CASE statement that checks if the value would overflow and subtact 256 in that
case.
The goal here is that the results of the two impls are equivalent. To validate
that, we also need a way to convert between the reprs, or another to_string
function, implemented for both reprs.
PRQL Changelog
0.13.0 — 2024-07-25
0.13.0 brings a new debug logging framework, a big refactor of the parser, a new highlighter, a few bug fixes, and lots of other changes. It has 153 commits from 11 contributors.
Our work continues on rewriting the resolver and completing prqlc fmt
.
Selected changes:
Language:
-
Parentheses are always required around pipelines, even within tuples. For example:
from artists # These parentheses are now required derive {a=(b | math.abs)} # No change — doesn't affect expressions or function calls without pipelines derive {x = 3 + 4}
This is a small breaking change. The new behavior matches the existing documentation. (@max-sixty, #4775)
Features:
prqlc compile --debug-log=log.html
will generate an HTML file with a detailed log of the compilation process. (@aljazerzen, #4646)- Added
prqlc debug json-schema
command to auto-generate JSON Schema representations of commonly exposed IR types such as PL and RQ. (@kgutwin, #4698) - Add documentation comments to the output of the documentation generator. (@vanillajonathan, #4729)
- Add CLI syntax highlighting to
prqlc
. You can try it asprqlc experimental highlight example.prql
. (@vanillajonathan, #4755)
Fixes:
- Using
in
with an empty array pattern (e.g.expr | in []
) will now output a constantfalse
condition instead of anexpr IN ()
, which is syntactically invalid in some SQL dialects (@Globidev, #4598)
Integrations:
- The Snap package previously released on the edge channel is now released on the stable channel. (@vanillajonathan, #4784)
Internal changes:
-
Major reorganization of
prqlc-parser
—prqlc-ast
is merged intoprqlc-parser
, andprqlc-parser
’s files are rearranged, including its exports. This is part of an effort to modularize the compiler by stage, reducing the amount of context that’s required to understand a single stage. There will likely be some further changes (more detail in the PR description). (@m-span, #4634)- This is a breaking change for any libraries that depend on
prqlc-parser
(which should be fairly rare).
- This is a breaking change for any libraries that depend on
-
Renamed
prql-compiler-macros
toprqlc-macros
for consistency with other crates (@max-sixty, #4565) -
prql-compiler
, the old name forprqlc
, is removed as a facade toprqlc
. It had been deprecated for a few versions and will no longer be updated. (@max-sixty) -
New benchmarks (@max-sixty, #4654)
New Contributors:
0.12.2 — 2024-06-10
0.12.2 is a very small release which renames prql-js
to prqlc-js
to match
our standard naming scheme. Within node the package is imported as prqlc
.
It also fixes a mistake in the prqlc-python
release pipeline.
0.12.1 — 2024-06-09
0.12.1 is a tiny hotfix release which fixes how intra-prql crate dependencies were specified.
0.12.0 — 2024-06-08
0.12.0 contains a few months of smaller features. Our focus has been on rewriting the resolver, an effort that is still ongoing.
It has 239 commits from 12 contributors. Selected changes (most are not listed here, possibly we should be more conscientious about adding them…):
Features:
- Add
prqlc lex
command to the CLI (@max-sixty) - Add
prqlc debug lineage
command to the CLI, creating an expression lineage graph from a query (@kgutwin, #4533) - Initial implementation of an experimental documentation generator that
generates Markdown documentation from
.prql
files. (@vanillajonathan, #4152). - Join’s
side
parameter can take a reference that resolves to a literal (note: this is an experimental feature which may change in the future) (@kgutwin, #4499)
Fixes:
- Support expressions on left hand side of
std.in
operator. (@kgutwin, #4498) - Prevent panic for
from {}
andstd
(@m-span, #4538)
Web:
- The
browser
dist files are now built withwasm-pack
’sweb
target. As a result, they should be usable as ES Modules, through JS CDNs, and for example with Observable Framework (@srenatus, #4274).
Integrations:
- The syntax highlighter package for Sublime Text is now published (@vanillajonathan).
- The
VSCode Great Icons
icon pack extension shows a database icon for
.prql
files. (@EmmanuelBeziat) - Tokei, a source lines of code counter
now has support for
.prql
files. (@vanillajonathan) - Add syntax highlight file for the micro text editor. (@vanillajonathan)
New Contributors:
- @srenatus, with #4274
- @jacquayj, with #4332
- @pdelewski, with #4337
- @m-span, with #4422
- @kgutwin, with #4498
0.11.4 — 2024-02-25
0.11.4 is a hotfix release, fixing a CI issue that caused the CLI binaries to be
built without the cli
feature.
0.11.3 — 2024-02-10
0.11.3 is a very small release, mostly a rename of the Python bindings.
The release has 13 commits from 4 contributors.
Internal changes:
- As part of making our names more consistent, the Python bindings are renamed.
prql-python
becomes a package published and importable asprqlc
. The internal Rust crate is namedprqlc-python
.
0.11.2 — 2024-02-07
0.11.2 contains lots of internal changes, lots of syntax highlighting, and the
beginning of lutra
, a query runner.
This release has 122 commits from 9 contributors. Selected changes:
Features:
- Initial implementation of
lutra
, a query runner. (@aljazerzen, #4182, #4174, #4134) prqlc fmt
works on projects with multiple files. (@max-sixty, #4028)
Fixes:
- Reduce stack memory usage (@aljazerzen, #4103)
Integrations:
- Add syntax highlight file for GtkSourceView. (@vanillajonathan, #4062)
- Add syntax highlight file for CotEditor. (@vanillajonathan)
- Add syntax highlight file for Sublime Text. (@vanillajonathan, #4127)
- sloc, a source lines of code counter now has
support for
.prql
files. (@vanillajonathan)
Internal changes:
prql-compiler
has been renamed toprqlc
, and we’ve established a more consistent naming scheme. The existing crate will still be published, re-exportingprqlc
, so no dependencies will break. A future version will add a deprecation warning.- The
prqlc-clib
crate was renamed toprqlc-c
, and associated artifacts were renamed. We’re trying to make names consistent (ideally for the final time!), and have a plan to rename some other bindings. (@max-sixty, #4077) - Add lots of whitespace items to the lexer, in preparation for the completion
of
prqlc fmt
(@max-sixty, #4109, #4105) - Table declarations (@aljazerzen, #4126)
New Contributors:
- @kaspermarstal, with #4124
0.11.1 — 2023-12-26
0.11.1 fixes a couple of small bugs; it comes a few days after 0.11.
This release has 16 commits from 6 contributors. Selected changes:
Features:
- Infer the type of array literals to be the union of types of its items. (@aljazerzen, #3989)
prql
module is added and theprql_version
function is renamed to theprql.version
function. The oldprql_version
function is deprecated and will be removed in the future release. (@eitsupi, #4006)
Fixes:
- Do not compile to
DISTINCT ON
whentake n
is used withgroup
for the targetsclickhouse
,duckdb
andpostgres
. (@PrettyWood, #3988) - Fix
take
n rows formssql
dialect by switching from TOP to FETCH (@PrettyWood, #3994)
0.11.0 — 2023-12-19
0.11.0 introduces new date
, text
& math
modules with lots of standard
functions, including a new date.to_text
function. It contains a few bugs
fixes, and lots of internal improvements to the compiler.
This release has 119 commits from 9 contributors. Selected changes:
Language:
- Breaking:
group
’sby
columns are now excluded from the partition. (#3490) - Breaking:
round
is now in themath
module and needs to be called viamath.round
. (#3928) - Breaking:
lower
andupper
are now in thetext
module and need to be called viatext.lower
andtext.upper
. (#3913, #3973)
Features:
- The
std.in
function now supports a list of values (@PrettyWood, #3883) - Most standard mathematical functions are now supported:
abs
,floor
,ceil
,pi
,exp
,ln
,log10
,log
,sqrt
,degrees
,radians
,cos
,acos
,sin
,asin
,tan
,atan
,pow
andround
.
Those functions are in themath
module (@PrettyWood, #3909, #3916 & 3928) - Most standard string functions are now supported:
ltrim
,rtrim
,trim
,length
,extract
,replace
. Utility functionsstarts_with
,contains
andends_with
are also available.
Those functions are in thetext
module (@PrettyWood, #3913, #3973) - Formatting a date to a text is now available for Clickhouse, DuckDB, MySQL,
MSSQL and Postgres. A new
date
module has been added with theto_text
function (@PrettyWood, #3951, #3954 & #3955)
Fixes:
- Fix an issue with arithmetic precedence (@max-sixty, #3846)
+
and-
can be used after a cast (@PrettyWood, #3923)- The Lezer grammar had plenty of improvements and fixes. (@vanillajonathan)
Web:
- The Playground now uses Vite. (@vanillajonathan)
Internal changes:
- Bump
prql-compiler
’s MSRV to 1.70.0 (@eitsupi, #3876)
New Contributors:
- @PrettyWood, with #3883
0.10.1 — 2023-11-14
0.10.1 is a small release containing some internal fixes of the compiler.
This release has 36 commits from 7 contributors. Selected changes:
Features:
- The
std.sql.read_csv
function and thestd.sql.read_parquet
function supports thesql.glaredb
target. (@eitsupi, #3749)
Fixes:
- Fix the bug of compiling to
DISTINCT ON
whentake 1
is used withgroup by
for the targetssql.clickhouse
,sql.duckdb
andsql.postgres
. (@aljazerzen, #3792)
Integrations:
- Enable integration tests for GlareDB. (@eitsupi, #3749)
- trapd00r/LS_COLORS, a collection of
LS_COLORS definitions colorizes
.prql
files. (@vanillajonathan) - vivid, a themeable LS_COLORS generator
colorizes
.prql
files. (@vanillajonathan) - colorls, displays
.prql
files with a database icon. (@vanillajonathan) - Emoji File Icons,
a VS Code extension displays
.prql
files with a database emoji icon. (@vanillajonathan) - eza, a modern ls replacement colorizes
.prql
files. (@vanillajonathan) - lsd, next gen ls command displays
.prql
files with a database icon. (@vanillajonathan)
0.10.0 — 2023-10-26
0.10.0 contains lots of small improvements, including support for new types of
literal notation, support for read_*
functions in more dialects, playground
improvements, and a better Lezer grammar (which we’re planning on using for a
Jupyter extension).
This release has 155 commits from 9 contributors. Selected changes:
Language:
- Breaking: Case syntax now uses brackets
[]
rather than braces{}
. To convert previous PRQL queries to this new syntax simply changecase { ... }
tocase [ ... ]
. (@AaronMoat, #3517)
Features:
- Breaking: The
std.sql.read_csv
function is now compiled toread_csv
by default. Please set the targetsql.duckdb
to use the DuckDB’sread_csv_auto
function as previously. (@eitsupi, #3599) - Breaking: The
std.every
function is renamed tostd.all
(@aljazerzen, #3703) - The
std.sql.read_csv
function and thestd.sql.read_parquet
function supports thesql.clickhouse
target. (@eitsupi, #1533) - Add
std.prql_version
function to return PRQL version (@hulxv, #3533) - A new type
anytype
is added. (@aljazerzen, #3703) - Add support for hex escape sequences in strings. Example
"Hello \x51"
. (@vanillajonathan, #3568) - Add support for long Unicode escape sequences. Example
"Hello \u{01F422}"
. (@vanillajonathan, #3569) - Add support for binary numerical notation. Example
filter status == 0b1111000011110000
. (@vanillajonathan, #3661) - Add support for hexadecimal numerical notation. Example
filter status == 0xff
. (@vanillajonathan, #3654) - Add support for octal numerical notation. Example
filter status == 0o777
. (@vanillajonathan, #3672) - New compile target
sql.glaredb
for GlareDB and integration tests for it (However, there is a bug in the test and it is currently not running). (@universalmind303, @scsmithr, @eitsupi, #3669)
Web:
-
Allow cmd-/ (Mac) or ctrl-/ (Windows) to toggle comments in the playground editor (@AaronMoat, #3522)
-
Limit maximum height of the playground editor’s error panel to avoid taking over whole screen (@AaronMoat, #3524)
-
The playground now uses Vite (@vanillajonathan).
Integrations:
- Add a CLI command
prqlc collect
to collect a project’s modules into a single file (@aljazerzen, #3739) - Add a CLI command
prqlc debug expand-pl
to parse & and expand into PL without resolving (@aljazerzen, #3739) - Bump
prqlc
’s MSRV to 1.70.0 (@eitsupi, #3521) - Pygments, a syntax highlighting library now has syntax highlighting for PRQL. (@vanillajonathan, #3564)
- chroma, a syntax highlighting library written in Go and used by the static website generator Hugo. (@vanillajonathan, #3597)
- scc, a source lines of code counter now has
support for
.prql
files. (@vanillajonathan) - gcloc a source lines of code
counter now has support for
.prql
files. (@vanillajonathan) - cloc a source lines of code counter now
has support for
.prql
files. (@AlDanial) - gocloc a source lines of code counter now
has support for
.prql
files. (@vanillajonathan) - The Quarto VS Code extension
supports editing PRQL code blocks
(
prqlr
is required to render Quarto Markdown with PRQL code blocks). (@jjallaire)
Internal:
- Rename some of the internal crates, and refactored their paths in the repo. (@aljazerzen, #3683).
- Add a
justfile
for developers who prefer that above ourTaskfile.yaml
(@aljazerzen, #3681)
New Contributors:
- @hulxv, with #3533
- @AaronMoat, with #3522
- @jangorecki, with #3634
0.9.5 — 2023-09-16
0.9.5 adds a line-wrapping character, fixes a few bugs, and improves our CI. The release has 77 commits from 8 contributors. Selected changes are below.
Look out for some conference talks coming up over the next few weeks, including QCon SF on Oct 2 and date2day on Oct 12.
Language:
-
A new line-wrapping character, for lines that are long and we want to break up into multiple physical lines. This is slightly different from from many languages — it’s on the subsequent line:
from artists select is_europe = \ country == "DE" \ || country == "FR" \ || country == "ES"
This allows for easily commenting out physical lines while maintaining a correct logical line; for example:
from artists select is_europe = \ country == "DE" \ || country == "FR" \ || country == "FR" -\ || country == "ES" +#\ || country == "ES"
(@max-sixty, #3408)
Fixes:
-
Fix stack overflow on very long queries in Windows debug builds (@max-sixty, #2908)
-
Fix panic when unresolved lineage appears in group or window (@davidot, #3266)
-
Fix a corner-case in handling precedence, and remove unneeded parentheses in some outputs (@max-sixty, #3472)
Web:
- Compiler panics are now printed to the console (@max-sixty, #3446)
Integrations:
- Ace, the JavaScript code editor now has syntax highlighting for PRQL. (@vanillajonathan, #3493)
Internal changes:
- Simplify & speed up lexer (@max-sixty, #3426, #3418)
New Contributors:
- @davidot, with #3450
0.9.4 — 2023-08-24
0.9.4 is a small release with some improvements and bug fixes in the compiler
and prqlc
. And, the documentation and CI are continually being improved.
This release has 110 commits from 9 contributors. Selected changes:
Features:
- Strings can be delimited with any odd number of quote characters. The logic for lexing quotes is now simpler and slightly faster. Escapes in single-quote-delimited strings escape single-quotes rather than double-quotes. (@max-sixty, #3274)
Fixes:
- S-strings within double braces now parse correctly (@max-sixty, #3265)
Documentation:
- New docs for strings (@max-sixty, #3281)
Web:
- Improve syntax highlighting for numbers in the book & website (@max-sixty, #3261)
- Add ClickHouse integration to docs (@max-sixty, #3251)
Integrations:
prqlc
no longer displays a prompt when piping a query into its stdin (@max-sixty, #3248).- Add a minimal example for use
prql-lib
with Zig (@vanillajonathan, #3372)
Internal changes:
-
Overhaul our CI to run a cohesive set of tests depending on the specific changes in the PR, and elide all others. This cuts CI latency to less than three minutes for most changes, and enables GitHub’s auto-merge to wait for all relevant tests. It also reduces the CI time on merging to main, by moving some tests to only run on specific path changes or on our nightly run.
We now have one label we can add to PRs to run more tests —
pr-nightly
. (@max-sixty, #3317 & others). -
Auto-merge PRs for backports or pre-commit updates (@max-sixty, #3246)
-
Add a workflow to create an issue when the scheduled nightly workflow fails (@max-sixty, #3304)
New Contributors:
- @FinnRG, with #3292
- @sitiom, with #3353
0.9.3 — 2023-08-02
0.9.3 is a small release, with mostly documentation, internal, and CI changes.
This release has 85 commits from 10 contributors.
We’d like to welcome @not-my-profile as someone who has helped with lots of internal refactoring in the past couple of weeks.
New Contributors:
- @vthriller, with #3171
- @postmeback, with #3216
0.9.2 — 2023-07-25
0.9.2 is a hotfix release to fix an issue in the 0.9.0 & 0.9.1 release pipelines.
0.9.1 — 2023-07-25
0.9.1 is a hotfix release to fix an issue in the 0.9.0 release pipeline.
0.9.0 — 2023-07-24
0.9.0 is probably PRQL’s biggest ever release. We have dialect-specific standard-libraries, a regex operator, an initial implementation of multiple-file projects & modules, lots of bug fixes, and many many internal changes.
We’ve made a few backward incompatible syntax changes. Most queries will work with a simple find/replace; see below for details.
The release has 421 commits from 12 contributors.
A small selection of the changes:
Language:
-
The major breaking change is a new syntax for lists, which have been renamed to tuples, and are now represented with braces
{}
rather than brackets[]
.To convert previous PRQL queries to this new syntax simply change
[ ... ]
to{ ... }
.We made the syntax change to incorporate arrays. Almost every major language uses
[]
for arrays. We are adopting that convention — arrays use[]
, tuples will use{}
. (Though we recognize that{}
for tuples is also rare (Hi, Erlang!), but didn’t want to further load parentheses with meaning.)Arrays are conceptually similar to columns — their elements have a single type. Array syntax can’t contain assignments.
As part of this, we’ve also formalized tuples as containing both individual items (
select {foo, baz}
), and assignments (select {foo=bar, baz=fuz}
). -
Some significant changes regarding SQL dialects:
- Operators and functions can be defined on per-dialect basis. (@aljazerzen, #2681)
- Breaking: The
sql.duckdb
target supports DuckDB 0.8 (@eitsupi, #2810). - Breaking: The
sql.hive
target is removed (@eitsupi, #2837).
-
New arithmetic operators. These compile to different function or operator depending on the target.
-
Breaking: Operator
/
now always performs floating division (@aljazerzen, #2684). See the Division docs for details. -
Truncated integer division operator
//
(@aljazerzen, #2684). See the Division docs for details. -
Regex search operator
~=
(@max-sixty, #2458). An example:from tracks filter (name ~= "Love")
…compiles to;
SELECT * FROM tracks WHERE REGEXP(name, 'Love')
…though the exact form differs by dialect; see the Regex docs for more details.
-
-
New aggregation functions:
every
,any
,average
, andconcat_array
. Breaking: Removeavg
in favor ofaverage
. -
Breaking: We’ve changed our function declaration syntax to match other declarations. Functions were one of the first language constructs in PRQL, and since then we’ve added normal declarations there’s no compelling reason for functions to be different.
let add = a b -> a + b
Previously, this was:
func add a b -> a + b
-
Experimental modules, which allow importing declarations from other files. Docs are forthcoming.
-
Relation literals create a relation (a “table”) as an array of tuples. This example demonstrates the new syntax for arrays
[]
and tuples{}
. (@aljazerzen, #2605)from [{a=5, b=false}, {a=6, b=true}] filter b == true select a
-
this
can be used to refer to the current pipeline, for situations where plain column name would be ambiguous:from x derive sum = my_column select this.sum # does not conflict with `std.sum`
Within a
join
transform, there is also a reference to the right relation:that
. -
Breaking: functions
count
,rank
androw_number
now require an argument of the array to operate on. In most cases you can directly replacecount
withcount this
. Thenon_null
argument ofcount
has been removed.
Features:
-
We’ve changed how we handle colors.
Options::color
is deprecated and has no effect. Code which consumesprql_compiler::compile
should instead accept the output with colors and use a library such asanstream
to handle the presentation of colors. To ensure minimal disruption,prql_compiler
will currently strip color codes when a standard environment variable such asCLI_COLOR=0
is set or when it detectsstderr
is not a TTY.We now use the
anstream
library inprqlc
&prql-compiler
.(@max-sixty, #2773)
-
prqlc
can now show backtraces when the standard backtrace env var (RUST_BACKTRACE
) is active. (@max-sixty, #2751)
Fixes:
- Numbers expressed with scientific notation —
1e9
— are now handled correctly by the compiler (@max-sixty, #2865).
Integrations:
- prql-python now provides type hints (@philpep, #2912)
Internal changes:
-
Annotations in PRQL. These have limited support but are currently used to specify binding strengths. They’re modeled after Rust’s annotations, but with
@
syntax, more similar to traditional decorators. (#2729)@{binding_strength=11} let mod = l r -> s"{l} % {r}"
-
Remove BigQuery’s special handling of quoted identifiers, now that our module system handles its semantics (@max-sixty, #2609).
-
ClickHouse is tested in CI (@eitsupi, #2815).
New Contributors:
- @maxmcd, with #2533
- @khoa165, with #2876
- @philpep, with #2912
- @not-my-profile, with #2971
0.8.1 — 2023-04-29
0.8.1 is a small release with a new list-targets
command in prqlc
, some
documentation improvements, and some internal improvements.
This release has 41 commits from 8 contributors.
From the broader perspective of the project, we’re increasing the relative prioritization of it being easy for folks to actually use PRQL — either with existing tools, or a tool we’d build. We’ll be thinking about & discussing the best way to do that over the next few weeks.
0.8.0 — 2023-04-14
0.8.0 renames the and
& or
operators to &&
& ||
respectively,
reorganizes the Syntax section in the book, and introduces read_parquet
&
read_csv
functions for reading files with DuckDB.
This release has 38 commits from 8 contributors. Selected changes:
Features:
-
Rename
and
to&&
andor
to||
. Operators which are symbols are now consistently infix, while “words” are now consistently functions (@aljazerzen, #2422). -
New functions
read_parquet
andread_csv
, which mirror the DuckDB functions, instructing the database to read from files (@max-sixty, #2409).
0.7.1 — 2023-04-03
0.7.1 is a hotfix release to fix prql-js
’s npm install
behavior when being
installed as a dependency.
This release has 17 commits from 4 contributors.
0.7.0 — 2023-04-01
0.7.0 is a fairly small release in terms of new features, with lots of internal improvements, such as integration tests with a whole range of DBs, a blog post on Pi day, RFCs for a type system, and more robust language bindings.
There’s a very small breaking change to the Rust API, hence the minor version bump.
Here’s our April 2023 Update, from our Readme:
April 2023 update
PRQL is being actively developed by a growing community. It’s ready to use by the intrepid, either as part of one of our supported extensions, or within your own tools, using one of our supported language bindings.
PRQL still has some minor bugs and some missing features, and probably is only ready to be rolled out to non-technical teams for fairly simple queries.
Here’s our current Roadmap and our Milestones.
Our immediate focus for the code is on:
- Building out the next few big features, including types and modules.
- Ensuring our supported features feel extremely robust; resolving any priority bugs.
We’re also spending time thinking about:
- Making it really easy to start using PRQL. We’re doing that by building integrations with tools that folks already use; for example our VS Code extension & Jupyter integration. If there are tools you’re familiar with that you think would be open to integrating with PRQL, please let us know in an issue.
- Making it easier to contribute to the compiler. We have a wide group of contributors to the project, but contributions to the compiler itself are quite concentrated. We’re keen to expand this; #1840 for feedback.
The release has 131 commits from 10 contributors. Particular credit goes to to @eitsupi & @jelenkee, who have made significant contributions, and @vanillajonathan, whose prolific contribution include our growing language bindings.
A small selection of the changes:
Features:
prqlc compile
adds--color
&--include-signature-comment
options. (@max-sixty, #2267)
Web:
- Added the PRQL snippets from the book to the Playground (@jelenkee, #2197)
Internal changes:
- Breaking: The
compile
function’sOptions
now includes acolor
member, which determines whether error messages use ANSI color codes. This is technically a breaking change to the API. (@max-sixty, #2251) - The
Error
struct now exposes theMessageKind
enum. (@vanillajonathan, #2307) - Integration tests run in CI with DuckDB, SQLite, PostgreSQL, MySQL and SQL Server (@jelenkee, #2286)
New Contributors:
- @k-nut, with #2294
0.6.1 — 2023-03-12
0.6.1 is a small release containing an internal refactoring and improved bindings for C, PHP & .NET.
This release has 54 commits from 6 contributors. Selected changes:
Fixes:
- No longer incorrectly compile to
DISTINCT
when atake 1
refers to a different set of columns than are in thegroup
. (@max-sixty, with thanks to @cottrell, #2109) - The version specification of the dependency Chumsky was bumped from
0.9.0
to0.9.2
.0.9.0
has a bug that causes an infinite loop. (@eitsupi, #2110)
Documentation:
- Add a policy for which bindings are supported / unsupported / nascent. See https://prql-lang.org/book/project/bindings/index.html for more details (@max-sixty, #2062) (@max-sixty, #2062)
Integrations:
- [prql-lib] Added C++ header file. (@vanillajonathan, #2126)
Internal changes:
- Many of the items that were in the root of the repo have been aggregated into
web
&bindings
, simplifying the repo’s structure. There’s alsogrammars
&packages
(@max-sixty, #2135, #2117, #2121).
0.6.0 — 2023-03-08
0.6.0 introduces a rewritten parser, giving us the ability to dramatically
improve error messages, renames switch
to case
and includes lots of minor
improvements and fixes. It also introduces loop
, which compiles to
WITH RECURSIVE
, as a highly experimental feature.
There are a few cases of breaking changes, including switching switch
to
case
, in case that’s confusing. There are also some minor parsing changes
outlined below.
This release has 108 commits from 11 contributors. Selected changes:
Features:
-
Add a (highly experimental)
loop
language feature, which translates toWITH RECURSIVE
. We expect changes and refinements in upcoming releases. (#1642, @aljazerzen) -
Rename the experimental
switch
function tocase
given it more closely matches the traditional semantics ofcase
. (@max-sixty, #2036) -
Change the
case
syntax to use=>
instead of->
to distinguish it from function syntax. -
Convert parser from pest to Chumsky (@aljazerzen, #1818)
- Improved error messages, and the potential to make even better in the future. Many of these improvements come from error recovery.
- String escapes (
\n \t
). - Raw strings that don’t escape backslashes.
- String interpolations can only contain identifiers and not any expression.
- Operator associativity has been changed from right-to-left to left-to-right to be more similar to other conventional languages.
and
now has a higher precedence thanor
(of same reason as the previous point).- Dates, times and timestamps have stricter parsing rules.
let
,func
,prql
,case
are now treated as keywords.- Float literals without fraction part are not allowed anymore (
1.
).
-
Add a
--format
option toprqlc parse
which can return the AST in YAML (@max-sixty, #1962) -
Add a new subcommand
prqlc jinja
. (@aljazerzen, #1722) -
Breaking: prql-compiler no longer passes text containing
{{
&}}
through to the output. (@aljazerzen, #1722)For example, the following PRQL query
from {{foo}}
was compiled to the following SQL previously, but now it raises an error.
SELECT * FROM {{ foo }}
This pass-through feature existed for integration with dbt.
We’re again considering how to best integrate with dbt, and this change is based on the idea that the jinja macro should run before the PRQL compiler.
If you’re interested in dbt integration, subscribe or 👍 to https://github.com/dbt-labs/dbt-core/pull/5982.
-
A new compile target
"sql.any"
. When"sql.any"
is used as the target of the compile function’s option, the target contained in the query header will be used. (@aljazerzen, #1995) -
Support for SQL parameters with similar syntax (#1957, @aljazerzen)
-
Allow
:
to be elided in timezones, such as0800
in@2020-01-01T13:19:55-0800
(@max-sixty, #1991). -
Add
std.upper
andstd.lower
functions for changing string casing (@Jelenkee, #2019).
Fixes:
prqlc compile
returns a non-zero exit code for invalid queries. (@max-sixty, #1924)- Identifiers can contain any alphabetic unicode characters (@max-sixty, #2003)
Documentation:
- Operator precedence (@aljazerzen, #1818)
- Error messages for invalid queries are displayed in the book (@max-sixty, #2015)
Integrations:
- [prql-php] Added PHP bindings. (@vanillajonathan, #1860)
- [prql-dotnet] Added .NET bindings. (@vanillajonathan, #1917)
- [prql-lib] Added C header file. (@vanillajonathan, #1879)
- Added a workflow building a
.deb
on each release. (Note that it’s not yet published on each release). (@vanillajonathan, #1883) - Added a workflow building a
.rpm
on each release. (Note that it’s not yet published on each release). (@vanillajonathan, #1918) - Added a workflow building a Snap package on each release. (@vanillajonathan, #1881)
Internal changes:
- Test that the output of our nascent autoformatter can be successfully compiled into SQL. Failing examples are now clearly labeled. (@max-sixty, #2016)
- Definition files have been added to configure Dev Containers for Rust development environment. (@eitsupi, #1893, #2025, #2028)
New Contributors:
- @linux-china, with #1971
- @Jelenkee, with #2019
0.5.2 — 2023-02-18
0.5.2 is a tiny release to fix an build issue in yesterday’s prql-js
0.5.1
release.
This release has 7 commits from 2 contributors.
New Contributors:
- @matthias-Q, with #1873
0.5.1 — 2023-02-17
0.5.1 contains a few fixes, and another change to how bindings handle default target / dialects.
This release has 53 commits from 7 contributors. Selected changes:
Fixes:
- Delegate dividing literal integers to the DB. Previously integer division was executed during PRQL compilation, which could be confusing given that behavior is different across DBs. Other arithmetic operations are still executed during compilation. (@max-sixty, #1747)
Documentation:
- Add docs on the
from_text
transform (@max-sixty, #1756)
Integrations:
- [prql-js] Default compile target changed from
Sql(Generic)
toSql(None)
. (@eitsupi, #1856) - [prql-python] Compilation options can now be specified from Python. (@eitsupi, #1807)
- [prql-python] Default compile target changed from
Sql(Generic)
toSql(None)
. (@eitsupi, #1861)
New Contributors:
- @vanillajonathan, with #1766
0.5.0 — 2023-02-08
0.5.0 contains a few fixes, some improvements to bindings, lots of docs
improvements, and some work on forthcoming features. It contains one breaking
change in the compiler’s Options
interface.
This release has 74 commits from 12 contributors. Selected changes:
Features:
-
Change public API to use target instead of dialect in preparation for feature work (@aljazerzen, #1684)
-
prqlc watch
command which watches filesystem for changes and compiles .prql files to .sql (@aljazerzen, #1708)
Fixes:
- Support double brackets in s-strings which aren’t symmetric (@max-sixty, #1650)
- Support Postgres’s Interval syntax (@max-sixty, #1649)
- Fixed tests for
prql-elixir
with macOS (@kasvith, #1707)
Documentation:
- Add a documentation test for prql-compiler, update prql-compiler README, and include the README in the prql book section for Rust bindings. The code examples in the README are included and tested as doctests in the prql-compiler (@nkicg6, #1679)
Internal changes:
- Add tests for all PRQL website examples to prql-python to ensure compiled results match expected SQL (@nkicg6, #1719)
New Contributors:
- @ruslandoga, with #1628
- @RalfNorthman, with #1632
- @nicot, with #1662
0.4.2 — 2023-01-25
Features:
-
New
from_text format-arg string-arg
function that supports JSON and CSV formats. format-arg can beformat:csv
orformat:json
. string-arg can be a string in any format. (@aljazerzen & @snth, #1514)from_text format:csv """ a,b,c 1,2,3 4,5,6 """
from_text format:json ''' [{"a": 1, "b": "x", "c": false }, {"a": 4, "b": "y", "c": null }] '''
from_text format:json '''{ "columns": ["a", "b", "c"], "data": [ [1, "x", false], [4, "y", null] ] }'''
For now, the argument is limited to string constants.
Fixes
- Export constructor for SQLCompileOptions (@bcho, #1621)
- Remove backticks in count_distinct (@aljazerzen, #1611)
New Contributors
- @1Kinoti, with #1596
- @veenaamb, with #1614
0.4.1 — 2023-01-18
0.4.1 comes a few days after 0.4.0, with a couple of features and the release of
prqlc
, the CLI crate.
0.4.1 has 35 commits from 6 contributors.
Features:
-
Inferred column names include the relation name (@aljazerzen, #1550):
from albums select title # name used to be inferred as title only select albums.title # so using albums was not possible here
-
Quoted identifiers such as
dir/*.parquet
are passed through to SQL. (@max-sixty, #1516). -
The CLI is installed with
cargo install prqlc
. The binary was renamed in 0.4.0 but required an additional--features
flag, which has been removed in favor of this new crate (@max-sixty & @aljazerzen, #1549).
New Contributors:
- @fool1280, with #1554
- @nkicg6, with #1567
0.4.0 — 2023-01-15
0.4.0 brings lots of new features including case
, select ![]
and numbers
with underscores. We have initial (unpublished) bindings to Elixir. And there’s
the usual improvements to fixes & documentation (only a minority are listed
below in this release).
0.4.0 also has some breaking changes: table
is let
, dialect
is renamed to
target
, and the compiler’s API has changed. Full details below.
Features:
-
Defining a temporary table is now expressed as
let
rather thantable
(@aljazerzen, #1315). See the tables docs for details. -
Experimental: The
case
function sets a variable to a value based on one of several expressions (@aljazerzen, #1278).derive var = case [ score <= 10 -> "low", score <= 30 -> "medium", score <= 70 -> "high", true -> "very high", ]
…compiles to:
SELECT *, CASE WHEN score <= 10 THEN 'low' WHEN score <= 30 THEN 'medium' WHEN score <= 70 THEN 'high' ELSE 'very high' END AS var FROM bar
Check out the
case
docs for more details. -
Experimental: Columns can be excluded by name with
select
(@aljazerzen, #1329)from albums select ![title, composer]
-
Experimental:
append
transform, equivalent toUNION ALL
in SQL. (@aljazerzen, #894)from employees append managers
Check out the
append
docs for more details. -
Numbers can contain underscores, which can make reading long numbers easier (@max-sixty, #1467):
from numbers select { small = 1.000_000_1, big = 5_000_000, }
-
The SQL output contains a comment with the PRQL compiler version (@aljazerzen, #1322)
-
dialect
is renamed totarget
, and its values are prefixed withsql.
(@max-sixty, #1388); for example:prql target:sql.bigquery # previously was `dialect:bigquery` from employees
This gives us the flexibility to target other languages than SQL in the long term.
-
Tables definitions can contain a bare s-string (@max-sixty, #1422), which enables us to include a full CTE of SQL, for example:
let grouping = s""" SELECT SUM(a) FROM tbl GROUP BY GROUPING SETS ((b, c, d), (d), (b, d)) """
-
Ranges supplied to
in
can be half-open (@aljazerzen, #1330). -
The crate’s external API has changed to allow for compiling to intermediate representation. This also affects bindings. See
prql-compiler
docs for more details.
Fixes:
[This release, the changelog only contains a subset of fixes]
- Allow interpolations in table s-strings (@aljazerzen, #1337)
Documentation:
[This release, the changelog only contains a subset of documentation improvements]
- Add docs on aliases in Select
- Add JS template literal and multiline example (@BCsabaEngine, #1432)
- JS template literal and multiline example (@BCsabaEngine, #1432)
- Improve prql-compiler docs & examples (@aljazerzen, #1515)
- Fix string highlighting in book (@max-sixty, #1264)
Web:
- The playground allows querying some sample data. As before, the result updates on every keystroke. (@aljazerzen, #1305)
Integrations:
[This release, the changelog only contains a subset of integration improvements]
- Added Elixir integration exposing PRQL functions as NIFs (#1500, @kasvith)
- Exposed Elixir flavor with exceptions (#1513, @kasvith)
- Rename
prql-compiler
binary toprqlc
(@aljazerzen #1515)
Internal changes:
[This release, the changelog only contains a subset of internal changes]
- Add parsing for negative select (@max-sixty, #1317)
- Allow for additional builtin functions (@aljazerzen, #1325)
- Add an automated check for typos (@max-sixty, #1421)
- Add tasks for running playground & book (@max-sixty, #1265)
- Add tasks for running tests on every file change (@max-sixty, #1380)
New contributors:
- @EArazli, with #1359
- @boramalper, with #1362
- @allurefx, with #1377
- @bcho, with #1375
- @JettChenT, with #1385
- @BlurrechDev, with #1411
- @BCsabaEngine, with #1432
- @kasvith, with #1500
0.3.1 - 2022-12-03
0.3.1 brings a couple of small improvements and fixes.
Features:
-
Support for using s-strings for
from
(#1197, @aljazerzen)from s"SELECT * FROM employees WHERE foo > 5"
-
Helpful error message when referencing a table in an s-string (#1203, @aljazerzen)
Fixes:
- Multiple columns with same name created (#1211, @aljazerzen)
- Renaming via select breaks preceding sorting (#1204, @aljazerzen)
- Same column gets selected multiple times (#1186, @mklopets)
Internal:
- Update Github Actions and Workflows to current version numbers (and avoid using Node 12)
0.3.0 — 2022-11-29
🎉 0.3.0 is the biggest ever change in PRQL’s compiler, rewriting much of the internals: the compiler now has a semantic understanding of expressions, including resolving names & building a DAG of column lineage 🎉.
While the immediate changes to the language are modest — some long-running bugs are fixed — this unlocks the development of many of the project’s long-term priorities, such as type-checking & auto-complete. And it simplifies the building of our next language features, such as match-case expressions, unions & table expressions.
@aljazerzen has (mostly single-handedly) done this work over the past few months. The project owes him immense appreciation.
Breaking changes:
We’ve had to make some modest breaking changes for 0.3:
-
Pipelines must start with
from
. For example, a pipeline with onlyderive foo = 5
, with nofrom
transform, is no longer valid. Depending on demand for this feature, it would be possible to add this back. -
Shared column names now require
==
in a join. The existing approach is ambiguous to the compiler —id
in the following example could be a boolean column.from employees -join positions [id] +join positions [==id]
-
Table references containing periods must be surrounded by backticks. For example, when referencing a schema name:
-from public.sometable +from `public.sometable`
Features:
- Change self equality op to
==
(#1176, @aljazerzen) - Add logging (@aljazerzen)
- Add clickhouse dialect (#1090, @max-sixty)
- Allow namespaces & tables to contain
.
(#1079, @aljazerzen)
Fixes:
- Deduplicate column appearing in
SELECT
multiple times (#1186, @aljazerzen) - Fix uppercase table names (#1184, @aljazerzen)
- Omit table name when only one ident in SELECT (#1094, @aljazerzen)
Documentation:
- Add chapter on semantics’ internals (@aljazerzen, #1028)
- Add note about nesting variables in s-strings (@max-sixty, #1163)
Internal changes:
- Flatten group and window (#1120, @aljazerzen)
- Split ast into expr and stmt (@aljazerzen)
- Refactor associativity (#1156, @aljazerzen)
- Rename Ident constructor to
from_name
(#1084, @aljazerzen) - Refactor rq folding (#1177, @aljazerzen)
- Add tests for reported bugs fixes in semantic (#1174, @aljazerzen)
- Bump duckdb from 0.5.0 to 0.6.0 (#1132)
- Bump once_cell from 1.15.0 to 1.16.0 (#1101)
- Bump pest from 2.4.0 to 2.5.0 (#1161)
- Bump pest_derive from 2.4.0 to 2.5.0 (#1179)
- Bump sqlparser from 0.25.0 to 0.27.0 (#1131)
- Bump trash from 2.1.5 to 3.0.0 (#1178)
0.2.11 — 2022-11-20
0.2.11 contains a few helpful fixes.
Work continues on our semantic
refactor — look out for 0.3.0 soon! Many thanks
to @aljazerzen for his continued contributions to this.
Note: 0.2.10 was skipped due to this maintainer’s inability to read his own docs on bumping versions…
Features:
- Detect when compiler version is behind query version (@MarinPostma, #1058)
- Add
__version__
to prql-python package (@max-sixty, #1034)
Fixes:
- Fix nesting of expressions with equal binding strength and left associativity,
such as
a - (b - c)
(@max-sixty, #1136) - Retain floats without significant digits as floats (@max-sixty, #1141)
Documentation:
- Add documentation of
prqlr
bindings (@eitsupi, #1091) - Add a ‘Why PRQL’ section to the website (@max-sixty, #1098)
- Add @snth to core-devs (@max-sixty, #1050)
Internal changes:
- Use workspace versioning (@max-sixty, #1065)
0.2.9 — 2022-10-14
0.2.9 is a small release containing a bug fix for empty strings.
Fixes:
- Fix parsing of empty strings (@aljazerzen, #1024)
0.2.8 — 2022-10-10
0.2.8 is another modest release with some fixes, doc improvements, bindings
improvements, and lots of internal changes. Note that one of the fixes causes
the behavior of round
and cast
to change slightly — though it’s handled as a
fix rather than a breaking change in semantic versioning.
Fixes:
-
Change order of the
round
&cast
function parameters to have the column last; for exampleround 2 foo_col
/cast int foo
. This is consistent with other functions, and makes piping possible:derive [ gross_salary = (salary + payroll_tax | as int), gross_salary_rounded = (gross_salary | round 0), ]
Documentation:
- Split
DEVELOPMENT.md
fromCONTRIBUTING.md
(@richb-hanover, #1010) - Make s-strings more prominent in website intro (@max-sixty, #982)
Web:
- Add GitHub star count to website (@max-sixty, #990)
Integrations:
- Expose a shortened error message, in particular for the VS Code extension (@aljazerzen, #1005)
Internal changes:
- Specify 1.60.0 as minimum Rust version (@max-sixty, #1011)
- Remove old
wee-alloc
code (@max-sixty, #1013) - Upgrade clap to version 4 (@aj-bagwell, #1004)
- Improve book-building script in Taskfile (@max-sixty, #989)
- Publish website using an artifact rather than a long-lived branch (@max-sixty, #1009)
0.2.7 — 2022-09-17
0.2.7 is a fairly modest release, six weeks after 0.2.6. We have some more
significant features, including a union
operator and an overhaul of our type
system, as open PRs which will follow in future releases.
We also have new features in the VS Code extension, courtesy of @jiripospisil, including a live output panel.
Fixes:
range_of_ranges
checks the Range end is smaller than its start (@shuozeli, #946)
Documentation:
- Improve various docs (@max-sixty, #974, #971, #972, #970, #925)
- Add reference to EdgeDB’s blog post in our FAQ (@max-sixty, #922)
- Fix typos (@kianmeng, #943)
Integrations:
- Add
prql-lib
, enabling language bindings withgo
(@sigxcpu76, #923) - Fix line numbers in JS exceptions (@charlie-sanders, #929)
Internal changes:
- Lock the version of the rust-toolchain, with auto-updates (@max-sixty, #926, #927)
0.2.6 — 2022-08-05
Fixes:
- Adjust
fmt
to only escape names when needed (@aljazerzen, #907) - Fix quoting on upper case
table
names (@max-sixty, #893) - Fix scoping of identical column names from multiple tables (@max-sixty, #908)
- Fix parse error on newlines in a
table
(@sebastiantoh 🆕, #902) - Fix quoting of upper case table names (@max-sixty, #893)
Documentation:
- Add docs on Architecture (@aljazerzen, #904)
- Add Changelog (@max-sixty, #890 #891)
Internal changes:
- Start trial using Conventional Commits (@max-sixty, #889)
- Add crates.io release workflow, docs (@max-sixty, #887)
0.2.5 - 2022-07-29
0.2.5 is a very small release following 0.2.4 yesterday. It includes:
- Add the ability to represent single brackets in an s-string, with two brackets (#752, @max-sixty)
- Fix the “Copy to Clipboard” command in the Playground, for Firefox (#880, @mklopets)
0.2.4 - 2022-07-28
0.2.4 is a small release following 0.2.3 a few days ago. The 0.2.4 release includes:
- Enrich our CLI, adding commands to get different stages of the compilation process (@aljazerzen , #863)
- Fix multiple
take n
statements in a query, leading to duplicate proxy columns in generated SQL (@charlie-sanders) - Fix BigQuery quoting of identifiers in
SELECT
statements (@max-sixty) - Some internal changes — reorganize top-level functions (@aljazerzen), add a workflow to track our Rust compilation time (@max-sixty), simplify our simple prql-to-sql tests (@max-sixty)
Thanks to @ankane, prql-compiler
is now available from homebrew core;
brew install prql-compiler
1.
we still need to update docs and add a release workflow for this:
<https://github.com/PRQL/prql/issues/866>
0.2.3 - 2022-07-24
A couple of weeks since the 0.2.2 release: we’ve squashed a few bugs, added some mid-sized features to the language, and made a bunch of internal improvements.
The 0.2.3 release includes:
- Allow for escaping otherwise-invalid identifiers (@aljazerzen & @max-sixty)
- Fix a bug around operator precedence (@max-sixty)
- Add a section the book on the language bindings (@charlie-sanders)
- Add tests for our
Display
representation while fixing some existing bugs. This is gradually becoming our code formatter (@arrizalamin) - Add a “copy to clipboard” button in the Playground (@mklopets)
- Add lots of guidance to our
CONTRIBUTING.md
around our tests and process for merging (@max-sixty) - Add a
prql!
macro for parsing a prql query at compile time (@aljazerzen) - Add tests for
prql-js
(@charlie-sanders) - Add a
from_json
method for transforming json to a PRQL string (@arrizalamin) - Add a workflow to release
prql-java
to Maven (@doki23) - Enable running all tests from a PR by adding a
pr-run-all-tests
label (@max-sixty) - Have
cargo-release
to bump all crate & npm versions (@max-sixty) - Update
prql-js
to use the bundler build ofprql-js
(@mklopets)
As well as those contribution changes, thanks to those who’ve reported issues, such as @mklopets @huw @mm444 @ajfriend.
From here, we’re planning to continue squashing bugs (albeit more minor than
those in this release), adding some features like union
, while working on
bigger issues such as type-inference.
We’re also going to document and modularize the compiler further. It’s important that we give more people an opportunity to contribute to the guts of PRQL, especially given the number and enthusiasm of contributions to project in general — and it’s not that easy to do so at the moment. While this is ongoing if anyone has something they’d like to work on in the more difficult parts of the compiler, let us know on GitHub or Discord, and we’d be happy to work together on it.
Thank you!
0.2.2 - 2022-07-10
We’re a couple of weeks since our 0.2.0 release. Thanks for the surge in interest and contributions! 0.2.2 has some fixes & some internal improvements:
- We now test against SQLite & DuckDB on every commit, to ensure we’re producing correct SQL. (@aljazerzen)
- We have the beginning of Java bindings! (@doki23)
- Idents surrounded by backticks are passed through to SQL (@max-sixty)
- More examples on homepage; e.g.
join
&window
, lots of small docs improvements - Automated releases to homebrew (@roG0d)
- prql-js is now a single package for Node, browsers & webpack (@charlie-sanders)
- Parsing has some fixes, including
>=
and leading underscores in idents (@mklopets) - Ranges receive correct syntax highlighting (@max-sixty)
Thanks to Aljaž Mur Eržen @aljazerzen , George Roldugin @roldugin , Jasper McCulloch @Jaspooky , Jie Han @doki23 , Marko Klopets @mklopets , Maximilian Roos @max-sixty , Rodrigo Garcia @roG0d , Ryan Russell @ryanrussell , Steven Maude @StevenMaude , Charlie Sanders @charlie-sanders .
We’re planning to continue collecting bugs & feature requests from users, as well as working on some of the bigger features, like type-inference.
For those interesting in joining, we also have a new Contributing page.
0.2.0 - 2022-06-27
🎉 🎉 After several months of building, PRQL is ready to use! 🎉 🎉
How we got here:
At the end of January, we published a proposal of a better language for data transformation: PRQL. The reception was better than I could have hoped for — we were no. 2 on HackerNews for a day, and gained 2.5K GitHub stars over the next few days.
But man cannot live on GitHub Stars alone — we had to do the work to build it. So over the next several months, during many evenings & weekends, a growing group of us gradually built the compiler, evolved the language, and wrote some integrations.
We want to double-down on the community and its roots in open source — it’s incredible that a few of us from all over the globe have collaborated on a project without ever having met. We decided early-on that PRQL would always be open-source and would never have a commercial product (despite lots of outside interest to fund a seed round!). Because languages are so deep in the stack, and the data stack has so many players, the best chance of building a great language is to build an open language.
We still have a long way to go. While PRQL is usable, it has lots of missing features, and an incredible amount of unfulfilled potential, including a language server, cohesion with databases, and type inference. Over the coming weeks, we’d like to grow the number of intrepid users experimenting PRQL in their projects, prioritize features that will unblock them, and then start fulfilling PRQL’s potential by working through our roadmap.
The best way to experience PRQL is to try it. Check out our website and the Playground. Start using PRQL for your own projects in dbt, Jupyter notebooks and Prefect workflows.
Keep in touch with PRQL by following the project on Twitter, joining us on Discord, starring the repo.
Contribute to the project — we’re a really friendly community, whether you’re a recent SQL user or an advanced Rust programmer. We need bug reports, documentation tweaks & feature requests — just as much as we need compiler improvements written in Rust.
I especially want to give Aljaž Mur Eržen (@aljazerzen) the credit he deserves, who has contributed the majority of the difficult work of building out the compiler. Much credit also goes to @charlie-sanders, one of PRQL’s earliest supporters and the author of pyprql, and Ryan Patterson-Cross (@rbpatt2019), who built the Jupyter integration among other Python contributions.
Other contributors who deserve a special mention include: @roG0d, @snth, @kwigley
Thank you, and we look forward to your feedback!
Target & Version
Target dialect
PRQL allows specifying a target dialect at the top of the query, which allows PRQL to compile to a database-specific SQL flavor.
Examples
PRQL
prql target:sql.postgres
from employees
sort age
take 10
SQL
SELECT
*
FROM
employees
ORDER BY
age
LIMIT
10
PRQL
prql target:sql.mssql
from employees
sort age
take 10
SQL
SELECT
*
FROM
employees
ORDER BY
age OFFSET 0 ROWS
FETCH FIRST
10 ROWS ONLY
Dialects
Supported
Supported dialects support all PRQL language features where possible, are tested on every commit, and we’ll endeavor to fix bugs.
sql.clickhouse
sql.duckdb
sql.generic
1sql.glaredb
sql.mysql
sql.postgres
sql.sqlite
Unsupported
Unsupported dialects have implementations in the compiler, but are tested minimally or not at all, and may have gaps for some features.
We’re open to contributions to improve our coverage of these, and to adding additional dialects.
sql.mssql
sql.ansi
sql.bigquery
sql.snowflake
Priority of targets
The compile target of a query is defined in the query’s header or as an argument to the compiler. option. The argument to the compiler takes precedence.
For example, the following shell example specifies sql.generic
in the query
and sql.duckdb
in the --target
option of the prqlc compile
command. In
this case, sql.duckdb
takes precedence and the SQL output is based on the
DuckDB dialect.
echo 'prql target:sql.generic
from foo' | prqlc compile --target sql.duckdb
To use the target described in the query, a special target sql.any
can be
specified in the compiler option.
echo 'prql target:sql.generic
from foo' | prqlc compile --target sql.any
Version
PRQL allows specifying a version of the language in the PRQL header, like:
This has two roles, one of which is implemented:
- The compiler will raise an error if the compiler is older than the query version. This prevents confusing errors when queries use newer features of the language but the compiler hasn’t yet been upgraded.
- The compiler will compile for the major version of the query. This allows the language to evolve without breaking existing queries, or forcing multiple installations of the compiler. This isn’t yet implemented, but is a gating feature for PRQL 1.0.
The version of the compiler currently in use can be called using the special
function std.prql.version
in PRQL.
PRQL
[{version = prql.version}]
SQL
WITH table_0 AS (
SELECT
'0.13.0' AS version
)
SELECT
version
FROM
table_0
This function was renamed from std.prql_version
to prql.version
in PRQL 0.11.1.
std.prql_version
will be removed in PRQL 0.12.0.
1: while there’s no “generic” DB to test sql.generic
against, we still count it as supported.
Bindings
PRQL has bindings for many languages. These include:
We have three tiers of bindings:
- Supported
- Unsupported
- Nascent
Supported
Supported bindings require:
- A maintainer.
- Implementations of the core compile functions.
- Test coverage for these functions.
- A published package to the language’s standard package repository.
- A script in
Taskfile.yaml
to bootstrap a development environment. - Any dev tools, such as a linter & formatter, in pre-commit or MegaLinter.
The currently supported bindings are:
Most of these are in the main PRQL repo, and we gate any changes to the compiler’s API on compatible changes to the bindings.
Unsupported
Unsupported bindings work, but don’t fulfil all of the above criteria. We don’t gate changes to the compiler’s API. If they stop working, we’ll demote them to nascent.
Nascent
Nascent bindings are in development, and may not yet fully work.
Naming
Over time, we’re trying to move to a consistent naming scheme:
- Crates are named
prqlc-$lang
. - Where possible, packages are published to each language’s package repository
as
prqlc
.
prql-dotnet
prql-net
offers PRQL bindings for .NET bindings as a netstandard2.0
library.
It provides the PrqlCompiler
class which contains the ToJson
and ToSql
static methods.
It’s still at an early stage, and isn’t published to NuGet. Contributions are welcome.
Installation
Make sure that libprqlc_c.so
(Linux), libprqlc_c.dylib
(macOS) or
libprqlc_c.dll
(Windows) is in the project’s bin
directory together with
PrqlCompiler.dll
and the rest of the project’s compiled files. I.e.
{your_project}/bin/Debug/net7.0/
.
The libprqlc_c
library gets dynamically imported at runtime.
Usage
using Prql.Compiler;
var options = new PrqlCompilerOptions
{
Format = false,
SignatureComment = false,
};
var sql = PrqlCompiler.Compile("from employees", options);
Console.WriteLine(sql);
TODO
This is currently at 0.1.0 because we’re waiting to update prqlc-c for the latest API. When we’ve done that, we can match the version here with the broader PRQL version.
PRQL
PRQL bindings for Elixir.
Installation
def deps do
[
{:prql, "~> 0.1.0"}
]
end
Basic Usage
iex> PRQL.compile("from customers")
{:ok, "SELECT\n *\nFROM\n customers\n\n-- Generated by PRQL compiler version 0.3.1 (https://prql-lang.org)\n"}
iex> PRQL.compile("from customers\ntake 10", dialect: :mssql)
{:ok, "SELECT\n *\nFROM\n customers\nORDER BY\n (\n SELECT\n NULL\n ) OFFSET 0 ROWS\nFETCH FIRST\n 10 ROWS ONLY\n\n-- Generated by PRQL compiler version 0.3.1 (https://prql-lang.org)\n"}
Development
We are in the early stages of developing Elixir bindings.
We’re using Rustler
to provide Rust bindings for prqlc
.
Currently using the bindings in an Elixir project requires compiling the Rust crate from this repo:
- Install dependencies with
mix deps.get
- Compile project
mix compile
- Run tests
mix test
Future work includes publishing pre-compiled artifacts, so Elixir projects can run PRQL without needing a Rust toolchain.
Mac
We currently don’t enable compilation for Mac. This is possible to enable, but
causes some issues with cargo’s compilation cache. Briefly: it requires
RUSTFLAGS
to be set, and because of
https://github.com/rust-lang/cargo/issues/8716 &
https://github.com/rust-lang/cargo/issues/8899, any compilation of a different
target will bust the cache.
The possible future workarounds include:
- Passing
--target=aarch64-apple-darwin
to every cargo call, which is inconvenient and can be difficult in some situations; e.g. Rust Analyzer. This disables passingRUSTFLAGS
(I’m actually unclear whyprql-elixir
builds successfully in that case…) - Directing other cargo calls to different paths, such as
/target-ra
for Rust Analyzer and/target-book
for the book building. But onecargo build
from the terminal without either thetarget
ortarget_dir
specified will bust the cache! - Never compiling for other targets. But our standard tests run for
--target=wasm32-unknown-unknown
, so this requires refraining from using them. - Removing
prql-elixir
from our workspace, so thatcargo
commands in the PRQL workspace don’t require rust flags. This would work well, but means we need separate test coverage for this crate, which adds some weight to the tests.
If prql-elixir
becomes more used (for example, we start publishing to Hex, or
Mac developers want to work on it), then we can re-enable and deal with the
caching issues. We can also re-enable them if the cargo
issue is resolved.
To test on Mac temporarily — for example if there’s an error in GHA and we’re on
a Mac locally — apply a diff like this (though this specific diff is now
out-of-date), and then run cargo build
from the prql-elixir
path, which will
enable the local .cargo/config.toml
). (We could also make a feature like
elixir-mac
which enabled building on Mac).
diff --git a/prqlc/bindings/elixir/native/prql/Cargo.toml b/prqlc/bindings/elixir/native/prql/Cargo.toml
index 7194ca4f..9c7240ff 100644
--- a/prqlc/bindings/elixir/native/prql/Cargo.toml
+++ b/prqlc/bindings/elixir/native/prql/Cargo.toml
@@ -19,5 +19,5 @@ path = "src/lib.rs"
prql-compiler = {path = "../../../../prql-compiler", default-features = false, version = "0.6.1"}
# See Readme for details on Mac
-[target.'cfg(not(any(target_family="wasm", target_os = "macos")))'.dependencies]
+# [target.'cfg(not(any(target_family="wasm", target_os = "macos")))'.dependencies]
rustler = "0.27.0"
diff --git a/prqlc/bindings/elixir/native/prql/src/lib.rs b/prqlc/bindings/elixir/native/prql/src/lib.rs
index 2c5c8f27..68e77217 100644
--- a/prqlc/bindings/elixir/native/prql/src/lib.rs
+++ b/prqlc/bindings/elixir/native/prql/src/lib.rs
@@ -1,5 +1,5 @@
// See Readme for more information on Mac compiling
-#![cfg(not(target_os = "macos"))]
+// #![cfg(not(target_os = "macos"))]
// These bindings aren't relevant on wasm
#![cfg(not(target_family = "wasm"))]
// TODO: unclear why we need this `allow`; it's required in `CompileOptions`,
prql-java
prql-java
offers Java bindings to the prqlc
Rust library. It exposes a Java
native method public static native String toSql(String query)
.
It’s still at an early stage, and currently requires compiling locally, and isn’t published to Maven. Contributions are welcome.
Installation
<dependency>
<groupId>org.prqllang</groupId>
<artifactId>prql-java</artifactId>
<version>${PRQL_VERSION}</version>
</dependency>
Usage
import org.prqllang.prql4j.PrqlCompiler;
class Main {
public static void main(String[] args) {
String sql = PrqlCompiler.toSql("from table");
System.out.println(sql);
}
}
prqlc-js
JavaScript bindings for prqlc
.
Installation
npm install prqlc
Usage
Currently these functions are exposed
function compile(prql_query: string, options?: CompileOptions): string;
function prql_to_pl(prql_query: string): string;
function pl_to_prql(pl_json: string): string;
function pl_to_rq(pl_json: string): string;
function rq_to_sql(rq_json: string): string;
From Node.js
Direct usage
const prqlc = require("prqlc");
const sql = prqlc.compile(`from employees | select first_name`);
console.log(sql);
Options
const opts = new prqlc.CompileOptions();
opts.target = "sql.mssql";
opts.format = false;
opts.signature_comment = false;
const sql = prqlc.compile(`from employees | take 10`, opts);
console.log(sql);
Template literal
const prqlc = require("prqlc");
const prql = (string) => prqlc.compile(string[0] || "");
const sql = prql`from employees | select first_name`;
console.log(sql);
Template literal with newlines
const prqlc = require("prqlc");
const prql = (string) => prqlc.compile(string[0] || "");
const sql = prql`
from employees
select first_name
`;
console.log(sql);
From a browser
<html>
<head>
<script type="module">
import init, { compile } from "./dist/web/prql_js.js";
await init();
const sql = compile("from employees | select first_name");
console.log(sql);
</script>
</head>
<body></body>
</html>
From a framework or a bundler
import compile from "prqlc/dist/bundler";
const sql = compile(`from employees | select first_name`);
console.log(sql);
Errors
Errors are returned as following object, serialized as a JSON array:
interface ErrorMessage {
/// Message kind. Currently only Error is implemented.
kind: "Error" | "Warning" | "Lint";
/// Machine-readable identifier of the error
code: string | null;
/// Plain text of the error
reason: string;
/// A list of suggestions of how to fix the error
hint: string | null;
/// Character offset of error origin within a source file
span: [number, number] | null;
/// Annotated code, containing cause and hints.
display: string | null;
/// Line and column number of error origin within a source file
location: SourceLocation | null;
}
/// Location within the source file.
/// Tuples contain:
/// - line number (0-based),
/// - column number within that line (0-based),
interface SourceLocation {
start: [number, number];
end: [number, number];
}
These errors can be caught as such:
try {
const sql = prqlJs.compile(`from employees | foo first_name`);
} catch (error) {
const errorMessages = JSON.parse(error.message).inner;
console.log(errorMessages[0].display);
console.log(errorMessages[0].location);
}
Development
Build:
npm run build
This builds Node, bundler and web packages in the dist
path.
Test:
npm test
By default the wasm
binaries are optimized on each run, even if the underlying
code hasn’t changed, which can be slow. For a lower-latency dev loop, pass
--profile=dev
to npm install
for a faster, less optimized build.
npm install prqlc --profile=dev
Notes
- This uses
wasm-pack
to generate bindings1. - We’ve added an
npm
layer on top of the usual approach of just usingwasm-pack
, so we can distribute a single package with targets ofnode
,bundler
andno-modules
— somewhat inverting the approach recommended bywasm-pack
. The build instruction goes in abuild
script, rather than apack
script.
Though we would be very open to other approaches, given wasm-pack does not
seem maintained, and we're eliding many of its features to build for three
targets. See <https://github.com/PRQL/prql/issues/1836> for more details.
prql-php
prql-php
offers PHP bindings to prqlc
crate through FFI.
It provides the Compiler
class which contains compile
, prqlToPL
, plToRQ
and rqToSQL
functions.
It’s still at an early stage, and isn’t published to Composer. Contributions are welcome.
Installation
The PHP FFI extension needs to be
enabled. Set ffi.enable
in your php.ini configuration file to "true"
.
Usage
<?php
use Prql\Compiler\Compiler;
$prql = new Compiler();
$result = $prql->compile("from employees");
echo $result->output;
Development
Environment
A way to establish a dev environment with PHP, the ext-ffi extension and Composer is to use a nix flake. After installing nix, enable experimental flakes feature:
mkdir -p ~/.config/nix
echo "experimental-features = nix-command flakes" >> ~/.config/nix/nix.conf
Now you can spawn a shell from prql-php/
:
nix shell github:loophp/nix-shell#env-php81 --impure
This will pull-in ext-ffi extension, because it’s declared in composer.json
.
Building
There is a task build-php
script that:
- runs cargo to build
libprqlc_c
, - copies
libprqlc_c.*
intolib
, - copies
prqlc.h
intolib
.
Tests
task build-php
task test-php
Code style
./vendor/bin/phpcs --standard=PSR12 src tests
PRQL - Python Bindings
Python bindings for PRQL, the Pipelined Relational Query Language.
PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it compiles to SQL.
PRQL can be as simple as:
from tracks
filter artist == "Bob Marley" # Each line transforms the previous result
aggregate { # `aggregate` reduces each column to a value
plays = sum plays,
longest = max length,
shortest = min length, # Trailing commas are allowed
}
Installation
pip install prqlc
Usage
Basic usage:
import prqlc
prql_query = """
from employees
join salaries (==emp_id)
group {employees.dept_id, employees.gender} (
aggregate {
avg_salary = average salaries.salary
}
)
"""
options = prqlc.CompileOptions(
format=True, signature_comment=True, target="sql.postgres"
)
sql = prqlc.compile(prql_query)
sql_postgres = prqlc.compile(prql_query, options)
The following functions and classes are exposed:
def compile(prql_query: str, options: Optional[CompileOptions] = None) -> str:
"""Compiles a PRQL query into SQL."""
...
def prql_to_pl(prql_query: str) -> str:
"""Converts a PRQL query to PL AST in JSON format."""
...
def pl_to_prql(pl_json: str) -> str:
"""Converts PL AST as a JSON string into a formatted PRQL string."""
...
def pl_to_rq(pl_json: str) -> str:
"""Resolves and lowers PL AST (JSON) into RQ AST (JSON)."""
...
def rq_to_sql(rq_json: str, options: Optional[CompileOptions] = None) -> str:
"""Converts RQ AST (JSON) into a SQL query."""
...
class CompileOptions:
def __init__(
self,
*,
format: bool = True,
target: str = "sql.any",
signature_comment: bool = True,
) -> None:
"""Compilation options for SQL backend of the compiler.
Args:
format (bool): Pass generated SQL string through a formatter that splits
it into multiple lines and prettifies indentation and spacing.
Defaults to True.
target (str): Target dialect to compile to. Defaults to "sql.any", which
uses the 'target' argument from the query header to determine the
SQL dialect. Other targets are available by calling the `get_targets`
function.
signature_comment (bool): Emits the compiler signature as a comment after
the generated SQL. Defaults to True.
"""
...
def get_targets() -> list[str]:
"""List available target dialects for compilation."""
...
Debugging functions
The following functions are available within the prqlc.debug
module. They are
for experimental purposes and may be unstable.
def prql_lineage(prql_query: str) -> str:
"""Computes a column-level lineage graph from a PRQL query.
Returns JSON-formatted string. See the docs for the `prqlc debug lineage`
CLI command for more details.
"""
...
def pl_to_lineage(pl_json: str) -> str:
"""Computes a column-level lineage graph from PL AST (JSON)."""
...
Notes
These bindings are in a crate named prqlc-python
and published to a Python
package on PyPI named prqlc
, available at https://pypi.org/project/prqlc.
This crate is not published to crates.io.
The package is consumed by pyprql & dbt-prql.
Relies on pyo3 for all the magic.
R (prqlr)
R bindings for prqlc
.
prqlr
also includes knitr
(R Markdown and Quarto) integration, which allows
us to easily create documents with the PRQL conversion results embedded in.
Check out https://prql.github.io/prqlc-r/ for more context.
prqlr
is generously maintained by @eitsupi in the
PRQL/prqlc-r repo.
Installation
install.packages("prqlr")
Please check the documentation of the prqlc crate.
Integrations
PRQL is building integrations with lots of external tools, including:
prqlc
CLI - Rust compiler for the command line- ClickHouse
- Jupyter
- DuckDB
- qStudio
- Prefect
- VS Code
- PostgreSQL
- Databend
- Rill
- Syntax highlighting for many popular tools.
PRQL compiler
prqlc
is the reference implementation of a compiler from PRQL to SQL, written
in Rust. It also serves as the CLI.
For more on PRQL, check out the PRQL website or the PRQL repo.
CLI
prqlc
serves as a CLI for the PRQL compiler. It is a single, dependency-free
binary that compiles PRQL into SQL.
Usage
prqlc compile
This command works as a filter that compiles a PRQL string into an SQL string.
$ echo 'from employees | filter has_dog | select salary' | prqlc compile
SELECT
salary
FROM
employees
WHERE
has_dog
A PRQL query can be executed with CLI tools compatible with SQL, such as DuckDB CLI.
$ curl -fsL https://raw.githubusercontent.com/PRQL/prql/0.12.2/prqlc/prqlc/tests/integration/data/chinook/albums.csv -o albums.csv
$ echo 'from `albums.csv` | take 3' | prqlc compile | duckdb
┌──────────┬───────────────────────────────────────┬───────────┐
│ album_id │ title │ artist_id │
│ int64 │ varchar │ int64 │
├──────────┼───────────────────────────────────────┼───────────┤
│ 1 │ For Those About To Rock We Salute You │ 1 │
│ 2 │ Balls to the Wall │ 2 │
│ 3 │ Restless and Wild │ 2 │
└──────────┴───────────────────────────────────────┴───────────┘
Executing this command without any argument will start interactive mode,
allowing a PRQL query to be written interactively. In this mode, after writing
PRQL and press Ctrl-d
(Linux, macOS) or Ctrl-z
(Windows) to display the
compiled SQL.
prqlc compile
Just like when using it as a filter, SQL string output can be passed to the DuckDB CLI and similar tools.
$ prqlc compile | duckdb
Enter PRQL, then press ctrl-d to compile:
from `albums.csv`
take 3
┌──────────┬───────────────────────────────────────┬───────────┐
│ album_id │ title │ artist_id │
│ int64 │ varchar │ int64 │
├──────────┼───────────────────────────────────────┼───────────┤
│ 1 │ For Those About To Rock We Salute You │ 1 │
│ 2 │ Balls to the Wall │ 2 │
│ 3 │ Restless and Wild │ 2 │
└──────────┴───────────────────────────────────────┴───────────┘
Installation
via Homebrew (macOS, Linux)
brew install prqlc
via winget (Windows)
winget install prqlc
From GitHub release page
Precompiled binaries are available for Linux, macOS, and Windows on the PRQL release page.
From source
# From crates.io
cargo install prqlc
# From a local PRQL repository
cargo install --path prqlc/prqlc
Shell completions
The prqlc shell-completion
command prints a shell completion script for
supported shells, and saving the printed scripts to files makes for shells to
load completions for each session.
Bash
For Linux:
prqlc shell-completion bash >/etc/bash_completion.d/prqlc
For macOS:
prqlc shell-completion bash >/usr/local/etc/bash_completion.d/prqlc
fish
prqlc shell-completion fish >~/.config/fish/completions/prqlc.fish
PowerShell
mkdir -Path (Split-Path -Parent $profile) -ErrorAction SilentlyContinue
prqlc shell-completion powershell >path/to/prqlc.ps1
echo 'Invoke-Expression -Command path/to/prqlc.ps1' >>$profile
zsh
prqlc shell-completion zsh >"${fpath[1]}/_prqlc"
Ensure that the following lines are present in ~/.zshrc
:
autoload -U compinit
compinit -i
Helpers
Cheat sheets for prqlc
are available on various websites and with various
tools.
tldr
(on the web)eg
Library
For more usage examples and the library documentation, check out the
prqlc
documentation.
Library installation
cargo add prqlc
Examples
Compile a PRQL string to a SQLite dialect string:
#![allow(unused)] fn main() { // In a file src/main.rs use prqlc::{compile, Options, DisplayOptions, Target, sql::Dialect}; let prql = "from employees | select {name, age}"; let opts = &Options { format: false, target: Target::Sql(Some(Dialect::SQLite)), signature_comment: false, display: DisplayOptions::Plain, ..Default::default() }; let sql = compile(&prql, opts).unwrap(); assert_eq!("SELECT name, age FROM employees", sql); }
ClickHouse
PRQL works natively in ClickHouse. Check out the ClickHouse docs for more details.
Jupyter
pyprql contains pyprql.magic
, a thin
wrapper of JupySQL
’s SQL IPython magics.
This allows us to run PRQL interactively on Jupyter/IPython.
Check out https://pyprql.readthedocs.io/ for more context.
Installation
pip install pyprql
Usage
When installing pyprql, the duckdb-engine package is also installed with it, so we can start using PRQL immediately to query CSV and Parquet files.
For example, running the example from the JupySQL documentation on IPython:
In [1]: %load_ext pyprql.magic
In [2]: !curl -sL https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv -o penguins.csv
In [3]: %prql duckdb://
In [4]: %prql from `penguins.csv` | take 3
Out[4]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181 3750 MALE
1 Adelie Torgersen 39.5 17.4 186 3800 FEMALE
2 Adelie Torgersen 40.3 18.0 195 3250 FEMALE
In [5]: %%prql
...: from `penguins.csv`
...: filter bill_length_mm > 40
...: take 3
...:
...:
Out[5]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 40.3 18.0 195 3250 FEMALE
1 Adelie Torgersen 42.0 20.2 190 4250 None
2 Adelie Torgersen 41.1 17.6 182 3200 FEMALE
DuckDB
There’s a DuckDB community extension by @ywelsch at the DuckDB Community Extension Repository.
INSTALL prql FROM community;
LOAD prql;
-- Once the extension is loaded, you can write PRQL queries
from (read_csv 'https://raw.githubusercontent.com/PRQL/prql/0.8.0/prql-compiler/tests/integration/data/chinook/invoices.csv')
filter invoice_date >= @2009-02-01
take 5;
Check out the extension’s documentation for more details.
qStudio IDE
qStudio is a SQL GUI that lets you browse tables, run SQL scripts, and chart and export the results. qStudio runs on Windows, macOS and Linux, and works with every popular database including mysql, postgresql, mssql, kdb….
qStudio relies on the PRQL compiler. You must ensure that prqlc
is in your path. See the installation instructions for prqlc
in the PRQL reference guide for details.
qStudio calls prqlc
(the PRQL compiler) to generate SQL code from PRQL queries
(.prql files) then runs the SQL against the selected database to display the
results. For more details, check out the
qStudio site.
Prefect
Because Prefect is in native Python, it’s extremely easy to integrate with PRQL.
With a Postgres Task, replace:
PostgresExecute.run(..., query=sql)
…with…
PostgresExecute.run(..., query=prql_python.compile(prql))
We’re big fans of Prefect, and if there is anything that would make the integration easier, please open an issue.
Visual Studio Code extension
PRQL has a Visual Studio Code extension that compiles a PRQL query in a VS Code editor and displays the resulting SQL code in a second pane on the side. This is very handy for editing, saving, and reusing PRQL queries in VS Code.
To install the VS Code extension, open VS Code and type
Ctrl-Shift-P
(Cmd-Shift-P on a Mac) and type PRQL
. Install
the extension as usual.
Repo for the PRQL VS Code extension
PostgreSQL
PL/PRQL is a PostgreSQL extension that lets you write functions with PRQL.
PL/PRQL functions serve as intermediaries, compiling the user’s PRQL code into SQL statements that PostgreSQL executes. The extension is based on the pgrx framework for developing PostgreSQL extensions in Rust. This framework manages the interaction with PostgreSQL’s internal APIs, type conversions, and other function hooks necessary to integrate PRQL with PostgreSQL.
Examples
PL/PRQL functions are defined using the plprql
language specifier:
create function match_stats(int) returns table(player text, kd_ratio float) as $$
from matches
filter match_id == $1
group player (
aggregate {
total_kills = sum kills,
total_deaths = sum deaths
}
)
filter total_deaths > 0
derive kd_ratio = total_kills / total_deaths
select { player, kd_ratio }
$$ language plprql;
select * from match_stats(1001)
player | kd_ratio
---------+----------
Player1 | 0.625
Player2 | 1.6
(2 rows)
You can also run PRQL code directly with the prql
function which is useful for
custom SQL in ORMs:
select prql('from matches | filter player == ''Player1''')
as (id int, match_id int, round int, player text, kills int, deaths int)
limit 2;
id | match_id | round | player | kills | deaths
----+----------+-------+---------+-------+--------
1 | 1001 | 1 | Player1 | 4 | 1
3 | 1001 | 2 | Player1 | 1 | 7
(2 rows)
-- Same as above without the need for the static types, but returns cursor
select prql('from matches | filter player == ''Player1''', 'player1_cursor');
fetch 2 from player1_cursor;
Getting Started
For installation instructions and more information on the extension, see the PL/PRQL repository.
Databend
Databend natively supports PRQL. For more details see the databend docs.
Rill
PRQL has had some work to integrate with Rill. See the Rill issues for more details.
Grammars / syntax highlighting
PRQL contains multiple grammar definitions to enable tools to highlight PRQL code. These are all intended to provide as good an experience as the grammar supports. Please raise any shortcomings in a GitHub issue.
The definitions are somewhat scattered around the codebase; this page serves as an index.
-
Ace — supported. The grammar is upstream (prql_highlight_rules.js). See the demo.
-
chroma — Go library used by the static website generator Hugo. The grammar is upstream (prql.xml). See the demo.
-
Lezer — used by CodeMirror editors. The PRQL file is at
grammars/prql-lezer/README.md
. -
Handlebars — currently duplicated:
- The book:
book/highlight-prql.js
- The website (outside of the book & playground):
website/themes/prql-theme/static/plugins/highlight/prql.js
- The book:
-
Sublime Text — in the
sublime-prql
repository. -
TextMate — used by the VS Code extension; in the
prql-vscode
repo inprql-vscode/syntaxes/prql.tmLanguage.json
. -
Monarch — used by the Monaco editor, which we use for the Playground. The grammar is at
playground/src/workbench/prql-syntax.js
. -
Pygments — Python library used by Wikipedia, Bitbucket, Sphinx and more. The grammar is upstream (prql.py). See the demo.
-
Tree-Sitter — used by the neovim and helix. The grammar can be found at https://github.com/PRQL/tree-sitter-prql.
Since the Elm language coincidentally provides syntax highlighting suitable for PRQL, it may look better to mark PRQL code as Elm when the above definition files are not available.
For example, the following Markdown code block will be nicely highlighted on GitHub, Pandoc, and other Markdown renderers:
```elm
from employees
filter start_date > @2021-01-01
```
We hope that in the future these renderers will recognize PRQL code blocks and have syntax highlighting applied, and we are tracking these with several issues.
- GitHub (Linguist): https://github.com/PRQL/prql/issues/1636
- Pandoc (Kate): https://github.com/PRQL/prql/issues/2213
Contributing
If you’re interested in joining the community to build a better SQL, here are ways to start:
- Star the repo.
- Send a link to PRQL to a couple of people whose opinion you respect.
- Subscribe to new releases for updates.
- Follow us on Twitter.
- Join our Discord.
- Find an issue labeled Good First Issue and start contributing to the code.
- Join our fortnightly Developer Call; (iCal file).
PRQL is evolving from a project with lots of excitement into a project that folks are using in their work and integrating into their tools. We’re actively looking for collaborators to lead that growth with us.
Areas for larger contributions
Compiler
The compiler is written in Rust, and there’s enough to do such that any level of experience with Rust is sufficient.
We try to keep a few onboarding issues on hand under the “good first issue” label. These have been screened to have sufficient context to get started (and we very much welcome questions where there’s some context missing).
To get started, check out the docs on Development and the Compiler architecture
And if you have questions, there are lots of friendly people on the Discord who will patiently help you.
Bindings & integrations
For PRQL to be successful, it needs to be available for the languages & tools that people already use.
- We currently have bindings to the PRQL compiler in a few different languages; many of these can be improved, documented, and packaged in a better way.
- If you have experience with packaging in an ecosystem that doesn’t currently have bindings, then creating PRQL bindings for that language we don’t currently support would be valuable to the project.
- If there’s a tool that you use yourself to run data queries which you think would benefit from a PRQL integration, suggest one to us or the tool. If it’s open-source, build & share a prototype.
Relevant issues are labeled Integrations.
Language design
We decide on new language features in GitHub issues, usually under “language design” label.
You can also contribute by:
- Finding instances where the compiler produces incorrect results, and post a bug report — feel free to use the playground.
- Opening an issue / append to an existing issue with examples of queries that are difficult to express in PRQL — especially if more difficult than SQL.
With sufficient examples, suggest a change to the language! (Though suggestions without examples are difficult to engage with, so please do anchor suggestions in examples.)
Marketing
- Improve our website. We have a few issues open on this front and are looking for anyone with at least some design skills.
- Contribute towards the docs. Anything from shaping a whole section of the docs, to simply improving a confusing paragraph or fixing a typo.
- Tell people about PRQL.
- Find a group of users who would be interested in PRQL, help them get up to speed, help the project understand what they need.
Core team
If you have any questions or feedback and don’t receive a response on one of the general channels such as GitHub or Discord, feel free to reach out to:
- @aljazerzen — Aljaž Mur Eržen
- @max-sixty — Maximilian Roos
- @eitsupi — SHIMA Tatsuya
- @snth — Tobias Brandt
Core team Emeritus
Thank you to those who have previously served on the core team:
- @charlie-sanders — Charlie Sanders
Development
Setting up an initial dev environment
We can set up a local development environment sufficient for navigating, editing, and testing PRQL’s compiler code in two minutes:
-
Install
rustup
&cargo
. -
[Optional but highly recommended] Install
cargo-insta
, our testing framework:cargo install cargo-insta
-
That’s it! Running the unit tests for the
prqlc
crate after cloning the repo should complete successfully:cargo test --package prqlc --lib
…or, to run tests and update the test snapshots:
cargo insta test --accept --package prqlc --lib
There’s more context on our tests in How we test below.
That’s sufficient for making an initial contribution to the compiler.
Setting up a full dev environment
We really care about this process being easy, both because the project benefits from more contributors like you, and to reciprocate your future contribution. If something isn’t easy, please let us know in a GitHub Issue. We’ll enthusiastically help you, and use your feedback to improve the scripts & instructions.
For more advanced development; for example compiling for wasm or previewing the website, we have two options:
Option 1: Use the project’s task
This is tested on macOS, should work on amd64 Linux, but won’t work on others (include Windows),
since it relies on brew
.
-
Then run the
setup-dev
task. This runs commands from our Taskfile.yaml, installing dependencies withcargo
,brew
,npm
&pip
, and suggests some VS Code extensions.task setup-dev
Option 2: Install tools individually
-
We’ll need
cargo-insta
, to update snapshot tests:cargo install cargo-insta
-
We’ll need Python, which most systems will have already. The easiest way to check is to try running the full tests:
cargo test
…and if that doesn’t complete successfully, ensure we have Python >= 3.7, to compile
prqlc-python
. -
For more involved contributions, such as building the website, playground, book, or some release artifacts, we’ll need some additional tools. But we won’t need those immediately, and the error messages on what’s missing should be clear when we attempt those things. When we hit them, the Taskfile.yaml will be a good source to copy & paste instructions from.
Option 3: Use a Dev Container
This project has a devcontainer.json file and a pre-built dev container base Docker image. Learn more about Dev Containers at https://containers.dev/
Currently, the tools for Rust are already installed in the pre-built image, and, Node.js, Python and others are configured to be installed when build the container.
While there are a variety of tools that support Dev Containers, the focus here is on developing with VS Code in a container by GitHub Codespaces or VS Code Dev Containers extension.
To use a Dev Container on a local computer with VS Code, install the VS Code Dev Containers extension and its system requirements. Then refer to the links above to get started.
Option 4: Use nix development environment
This is used by a member of the core team on Linux, but doesn’t currently work on Mac. We’re open to contributions to improve support.
A nix flake flake.nix
provides 3 development
environments:
- default, for building the compiler
- web, for the compiler and the website,
- full, for the compiler, the website and the compiler bindings.
To load the shell:
-
Install nix (the package manager). (only first time)
-
Enable flakes, which are a (pretty stable) experimental feature of nix. (only first time)
For non-NixOS users:
mkdir -p ~/.config/nix/ tee 'experimental-features = nix-command flakes' >> ~/.config/nix/nix.conf
For NixOS users, follow instructions here.
-
Run:
nix develop
To use the “web” or “full” shell, run:
nix develop .#web
Optionally, you can install direnv, to automatically load
the shell when you enter this repo. The easiest way is to also install
direnv-nix and configure your
.envrc
with:
# .envrc
use flake .#full
Contribution workflow
We’re similar to most projects on GitHub — open a Pull Request with a suggested change!
Commits
- If a change is user-facing, please add a line in
CHANGELOG.md
, with{message}, ({@contributor, #X})
whereX
is the PR number.- If there’s a missing entry, a follow-up PR containing just the changelog entry is welcome.
- We’re using Conventional Commits message format, enforced through action-semantic-pull-request.
Merges
- We merge any code that makes PRQL better
- A PR doesn’t need to be perfect to be merged; it doesn’t need to solve a big
problem. It needs to:
- be in the right direction,
- make incremental progress,
- be explicit on its current state, so others can continue the progress.
- That said, there are a few instances when we need to ensure we have some consensus before merging code — for example non-trivial changes to the language, or large refactorings to the library.
- If you have merge permissions, and are reasonably confident that a PR is
suitable to merge (whether or not you’re the author), feel free to merge.
- If you don’t have merge permissions and have authored a few PRs, ask and ye shall receive.
- The primary way we ratchet the code quality is through automated tests.
- This means PRs almost always need a test to demonstrate incremental progress.
- If a change breaks functionality without breaking tests, our tests were probably insufficient.
- If a change breaks existing tests (for example, changing an external API), that indicates we should be careful about merging a change, including soliciting others’ views.
- We use PR reviews to give general context, offer specific assistance, and
collaborate on larger decisions.
- Reviews around ‘nits’ like code formatting / idioms / etc are very welcome. But the norm is for them to be received as helpful advice, rather than as mandatory tasks to complete. Adding automated tests & lints to automate these suggestions is welcome.
- If you have merge permissions and would like a PR to be reviewed before it merges, that’s great — ask or assign a reviewer.
- If a PR hasn’t received attention after a day, please feel free to ping the pull request.
- People may review a PR after it’s merged. As part of the understanding that we can merge quickly, contributors are expected to incorporate substantive feedback into a future PR.
- We should revert quickly if the impact of a PR turns out not to be consistent with our expectations, or there isn’t as much consensus on a decision as we had hoped. It’s very easy to revert code and then re-revert when we’ve resolved the issue; it’s a sign of moving quickly. Other options which resolve issues immediately are also fine, such as commenting out an incorrect test or adding a quick fix for the underlying issue.
Docs
We’re very keen on contributions to improve our documentation.
This includes our docs in the book, on the website, in our code, or in a Readme. We also appreciate issues pointing out that our documentation was confusing, incorrect, or stale — if it’s confusing for you, it’s probably confusing for others.
Some principles for ensuring our docs remain maintainable:
- Docs should be as close as possible to the code. Doctests are ideal on this
dimension — they’re literally very close to the code and they can’t drift
apart since they’re tested on every commit. Or, for example, it’s better to
add text to a
--help
message, rather than write a paragraph in the Readme explaining the CLI. - We should have some visualization of how to maintain docs when we add them. Docs have a habit of falling out of date — the folks reading them are often different from those writing them, they’re sparse from the code, generally not possible to test, and are rarely the by-product of other contributions. Docs that are concise & specific are easier to maintain.
- Docs should be specifically relevant to PRQL; anything else we can instead link to.
If something doesn’t fit into one of these categories, there are still lots of ways of getting the word out there — a blog post / gist / etc. Let us know and we’re happy to link to it / tweet it.
How we test
We use a pyramid of tests — we have fast, focused tests at the bottom of the pyramid, which give us low latency feedback when developing, and then slower, broader tests which ensure that we don’t miss anything as PRQL develops1.
If you’re making your first contribution, you don’t need to engage with all this — it’s fine to just make a change and push the results; the tests that run in GitHub will point you towards any errors, which can be then be run locally if needed. We’re always around to help out.
Our tests, from the bottom of the pyramid to the top:
-
Static checks — we run a few static checks to ensure the code stays healthy and consistent. They’re defined in
.pre-commit-config.yaml
, using pre-commit. They can be run locally withtask test-lint # or pre-commit run -a
The tests fix most of the issues they find themselves. Most of them also run on GitHub on every commit; any changes they make are added onto the branch automatically in an additional commit.
- Checking by MegaLinter, which includes more Linters, is also done automatically on GitHub. (experimental)
-
Unit tests & inline insta snapshots — we rely on unit tests to rapidly check that our code basically works. We extensively use Insta, a snapshot testing tool which writes out the values generated by our code, making it fast & simple to write and modify tests2
These are the fastest tests which run our code; they’re designed to run on every save while you’re developing. We include a
task
which does this:task test-rust-fast # or cargo insta test --accept --package prqlc --lib # or, to run on every change: task -w test-rust-fast
-
Documentation — we compile all examples from our documentation in the Website, README, and PRQL Book, to test that they produce the SQL we expect, and that changes to our code don’t cause any unexpected regressions. These are included in:
cargo insta test --accept
-
Database integration tests — we run tests with example queries against databases with actual data to ensure we’re producing correct SQL across our supported dialects. The in-process tests can be run locally with:
task test-rust # or cargo insta test --accept --features=default,test-dbs
More details on running with external databases are in the Readme.
Integration tests use DuckDB, and so require a clang compiler to compile
duckdb-rs
. Most development
systems will have one, but if the test command fails, install a clang compiler with:
- On macOS, install xcode with
xcode-select --install
- On Debian Linux,
apt-get update && apt-get install clang
- On Windows,
duckdb-rs
isn’t supported, so these tests are excluded
-
GitHub Actions on every commit — we run tests relevant to a PR’s changes in CI — for example changes to docs will attempt to build docs, changes to a binding will run that binding’s tests. The vast majority of changes trigger tests which run in less than five minutes, and we should be reassessing their scope if they take longer than that. Once these pass, a pull request can be merged.
-
GitHub Actions on merge — we run a wider set tests on every merge to main. This includes testing across OSs, all our language bindings, a measure of test code coverage, and some performance benchmarks.
If these tests fail after merging, we should revert the commit before fixing the test and then re-reverting.
Most of these will run locally with:
task test-all
-
GitHub Actions nightly — every night, we run tests that take longer, are less likely to fail, or are unrelated to code changes — such as security checks, bindings’ tests on multiple OSs, or expensive timing benchmarks.
We can run these tests before a merge by adding a label
pr-nightly
to the PR.
The goal of our tests is to allow us to make changes quickly. If they’re making it more difficult to make changes, or there are missing tests that would offer the confidence to make changes faster, please raise an issue.
Website
The website is published together with the book and the playground, and is
automatically built and released on any push to the web
branch.
The web
branch points to the latest release plus any website-specific fixes.
That way, the compiler behavior in the playground matches the latest release
while allowing us to fix mistakes in the docs with a tighter loop than every
release.
Fixes to the playground, book, or website should have a pr-backport-web
label
added to their PR — a bot will then open & merge another PR onto the web
branch once the initial branch merges.
The website components will run locally with:
# Run the main website
task web:run-website
# Run the PRQL online book
task web:run-book
# Run the PRQL playground
task web:run-playground
Bindings
We have a number of language bindings, as documented at https://prql-lang.org/book/project/bindings/index.html. Some of these are within our monorepo, some are in separate repos. Here’s a provisional framework for when we use the main prql repo vs separate repos for bindings:
Factor | Rationale | Example |
---|---|---|
Does someone want to sign up to maintain a repo? | A different repo is harder for the core team to maintain | tree-sitter-prql is well maintained |
Can it change independently from the compiler? | If it’s in a different repo, it can’t be changed in lockstep with the compiler | prql-vscode is fine to change “behind” the language |
Would a separate repo invite new contributors? | A monorepo with all the rust code can be less inviting for those familiar with other langs | prql-vscode had some JS-only contributors |
Is there an convention for a stand-alone repo? | A small number of ecosystems require a separate repo | homebrew-prql needs to be named that way for a Homebrew tap |
Releasing
Currently we release in a semi-automated way:
-
PR & merge an updated Changelog. GitHub will produce a draft version at https://github.com/PRQL/prql/releases/new, including “New Contributors”.
Use this script to generate a line introducing the enumerated changes:
echo "It has $(git rev-list --count $(git rev-list --tags --max-count=1)..) commits from $(git shortlog --summary $(git rev-list --tags --max-count=1).. | wc -l | tr -d '[:space:]') contributors. Selected changes:"
-
If the current version is correct, then skip ahead. But if the version needs to be changed — for example, we had planned on a patch release, but instead require a minor release — then run
cargo release version $version -x && cargo release replace -x && task test-rust
to bump the version, and PR the resulting commit. -
After merging, go to Draft a new release3, copy the changelog entry into the release description4, enter the tag to be created, and hit “Publish”.
-
From there, both the tag and release is created and all packages are published automatically based on our release workflow.
-
Run
cargo release version patch -x --no-confirm && cargo release replace -x --no-confirm && task test-rust
to bump the versions and add a new Changelog section; then PR the resulting commit. Note this currently containstask test-rust
to update snapshot tests which contain the version. -
Check whether there are milestones that need to be pushed out.
-
Review the Current Status on the README.md to ensure it reflects the project state.
We may make this more automated in future; e.g. automatic changelog creation.
1: Our approach is very consistent with @matklad’s advice, in his excellent blog post How to Test.
2: Here’s an example of an insta test — note that only the initial line of each test is written by us; the remainder is filled in by insta.
3: Only maintainers have access to this page.
4: Unfortunately GitHub’s markdown parser interprets linebreaks as newlines. I haven’t found a better way of editing the markdown to look reasonable than manually editing the text or asking LLM to help.
Language design
In a way PRQL is just a transpiler to SQL. This can cause its language design to gravitate toward thinking about PRQL features in terms of how they translate to SQL.
PRQL feature -> SQL feature -> relational result
This is flawed because:
- it does not model interactions between features well,
- SQL behavior can sometimes be misleading (the order of a subquery will not persist in the parent query) or even differs between dialects (set operations).
Instead, we should think of PRQL features in terms of how they affect PRQL expressions, which in most cases means how they affect relations.
PRQL feature -> relation
|
v
PRQL feature -> relation
|
v
PRQL feature -> relation
|
v
relational result
Thinking about SQL comes in only at the last step when relation (or rather relational expression) is translated to an SQL expression.