PRQL Language Book

Pipelined Relational Query Language, pronounced “Prequel”.

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it compiles to SQL.

This book serves as a tutorial and reference guide on the language and the broader project. It currently has three sections, navigated by links on the left:

  • Tutorial — A friendly & accessible guide for learning PRQL. It has a gradual increase of difficulty and requires only basic understanding of programming languages. Knowledge of SQL is beneficial, because of many comparisons to SQL, but not required.
  • Reference — In-depth information about the PRQL language. Includes justifications for language design decisions and formal specifications for parts of the language.
  • Project — General information about the project, tooling and development.

Examples of PRQL with a comparison to the generated SQL. PRQL queries can be as simple as:

PRQL

from tracks
filter artist == "Bob Marley"  # Each line transforms the previous result
aggregate {                    # `aggregate` reduces each column to a value
  plays    = sum plays,
  longest  = max length,
  shortest = min length,       # Trailing commas are allowed
}

SQL

SELECT
  COALESCE(SUM(plays), 0) AS plays,
  MAX(length) AS longest,
  MIN(length) AS shortest
FROM
  tracks
WHERE
  artist = 'Bob Marley'

…and here’s a larger example:

PRQL

from employees
filter start_date > @2021-01-01            # Clear date syntax
derive {                                   # `derive` adds columns / variables
  gross_salary = salary + (tax ?? 0),      # Terse coalesce
  gross_cost = gross_salary + benefits,    # Variables can use other variables
}
filter gross_cost > 0
group {title, country} (                   # `group` runs a pipeline over each group
  aggregate {                              # `aggregate` reduces each group to a value
    average gross_salary,
    sum_gross_cost = sum gross_cost,       # `=` sets a column name
  }
)
filter sum_gross_cost > 100_000            # `filter` replaces both of SQL's `WHERE` & `HAVING`
derive id = f"{title}_{country}"           # F-strings like Python
derive country_code = s"LEFT(country, 2)"  # S-strings permit SQL as an escape hatch
sort {sum_gross_cost, -country}            # `-country` means descending order
take 1..20                                 # Range expressions (also valid as `take 20`)

SQL

WITH table_1 AS (
  SELECT
    title,
    country,
    salary + COALESCE(tax, 0) + benefits AS _expr_1,
    salary + COALESCE(tax, 0) AS _expr_2
  FROM
    employees
  WHERE
    start_date > DATE '2021-01-01'
),
table_0 AS (
  SELECT
    title,
    country,
    AVG(_expr_2) AS _expr_0,
    COALESCE(SUM(_expr_1), 0) AS sum_gross_cost
  FROM
    table_1
  WHERE
    _expr_1 > 0
  GROUP BY
    title,
    country
)
SELECT
  title,
  country,
  _expr_0,
  sum_gross_cost,
  CONCAT(title, '_', country) AS id,
  LEFT(country, 2) AS country_code
FROM
  table_0
WHERE
  sum_gross_cost > 100000
ORDER BY
  sum_gross_cost,
  country DESC
LIMIT
  20

Relations

PRQL is designed on top of relational algebra, which is the established data model used by modern SQL databases. A relation has a rigid mathematical definition, which can be simplified to “a table of data”. For example, the invoices table from the Chinook database (https://github.com/lerocha/chinook-database) looks like this:

invoice_id  customer_id  billing_city  other columns  total
1           2            Stuttgart     …              1.98
2           4            Oslo          …              3.96
3           8            Brussels      …              5.94
4           14           Edmonton      …              8.91
5           23           Boston        …              13.86
6           37           Frankfurt     …              0.99

A relation is composed of rows. Each row in a relation contains a value for each of the relation’s columns. Each column in a relation has a unique name and a designated data type. The table above is a relation: it has columns named invoice_id and customer_id, each with a data type of “integer number”, a billing_city column with a data type of “text”, several other columns, and a total column that contains floating-point numbers.

Queries

The main purpose of PRQL is to build queries that combine and transform data from relations such as the invoices table above. Here is the most basic query:

from invoices

Note

Try each of these examples here in the Playground. Enter the query on the left-hand side, and the result appears on the right-hand side.

The result of the query above is not terribly interesting: it’s just the same relation as before.

select transform

The select function picks the columns to pass through based on a list and discards all others. Formally, that list is a tuple of comma-separated expressions wrapped in { ... }.

Suppose we only need the order_id and total columns. Use select to choose the columns to pass through. (Try it in the Playground.)

from invoices
select { order_id, total }

We can write the items in the tuple on one or several lines: trailing commas are ignored. In addition, we can assign any of the expressions to a variable that becomes the name of the resulting column in the SQL output.

from invoices
select {
  OrderID = invoice_id,
  Total = total,
}

This is the same query as above, rewritten on multiple lines, and assigning OrderID and Total names to the columns.

Once we select certain columns, subsequent transforms will have access only to those columns named in the tuple.

derive transform

To add columns to a relation, we can use the derive function. Let’s define a new column for Value Added Tax, set at 19% of the invoice total.

from invoices
derive { VAT = total * 0.19 }

The value of the new column can be a constant (such as a number or a string), or can be computed from the value of an existing column. Note that the value of the new column is assigned the name VAT.
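For instance, constant-valued columns can be derived alongside computed ones. A minimal sketch (the column names tax_rate and currency are illustrative, not part of the Chinook schema):

```prql
from invoices
derive {
  tax_rate = 0.19,          # a constant number
  currency = "EUR",         # a constant string
  VAT = total * 0.19,       # computed from an existing column
}
```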

join transform

The join transform also adds columns to the relation by combining the rows from two relations “side by side”. To determine which rows from each relation should be joined, join has match criteria, written in ( ... ).

from invoices
join customers ( ==customer_id )

This example “connects” the customer information from the customers relation with the information from the invoices relation, using identical values of the customer_id column from each relation to match the rows.

It is frequently useful to assign an alias to both relations being joined together so that each relation’s columns can be referred to uniquely.

from inv=invoices
join cust=customers ( ==customer_id )

In the example above, the alias inv represents the invoices relation and cust represents the customers relation. It then becomes possible to refer to inv.billing_city and cust.last_name unambiguously.

Summary

PRQL manipulates relations (tables) of data. The derive, select, and join transforms change the number of columns in a table. The first two never affect the number of rows in a table. join may change the number of rows, depending on the chosen type of join.

This final example combines the above into a single query. It illustrates a pipeline - the fundamental basis of PRQL. We simply add new lines (transforms) at the end of the query. Each transform modifies the relation produced by the statement above to produce the desired result.

from inv=invoices
join cust=customers (==customer_id)
derive { VAT = inv.total * 0.19 }
select {
  OrderID = inv.invoice_id,
  CustomerName = cust.last_name,
  Total = inv.total,
  VAT,
}

Filtering rows

In the previous page we learned how select, derive, and join change the columns of a table.

Now we will explore how to manipulate the rows of a table using filter and take.

filter transform

The filter transform picks rows to pass through based on their values:

from invoices
filter billing_city == "Berlin"

The resulting table contains all the rows that came from Berlin.

PRQL converts the single filter transform to use the appropriate SQL WHERE or HAVING command, depending on where it appears in the pipeline.
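As a sketch of this behavior: a filter placed before aggregation compiles to WHERE, while a filter applied to an aggregated column compiles to HAVING (the exact SQL may vary by compiler version):

```prql
from invoices
filter total > 1                         # before aggregation → WHERE
group billing_city (
  aggregate { city_total = sum total }
)
filter city_total > 10                   # after aggregation → HAVING
```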

take transform

The take transform picks rows to pass through based on their position within the table. The set of rows picked can be specified in two ways:

  • a plain number x, which will pick the first x rows, or
  • an inclusive range of rows start..end.

from invoices
take 4

from invoices
take 4..7

Of course, it is possible to combine all these transforms into a single pipeline:

from invoices

# retain only rows for orders from Berlin
filter billing_city == "Berlin"

# skip first 10 rows and take the next 10
take 11..20

# take only first 3 rows of *that* result
take 3

We did something a bit odd at the end: first we took rows 11..20 and then took the first 3 rows from that result.

Note

Note that a single transform take 11..13 would have produced the same SQL. The example serves to show how PRQL allows fast data exploration by “stacking” transforms in the pipeline, reducing the cognitive burden of working out how a new transform combines with the previous query.

Aggregation

A key feature of analytics is reducing many values down to some summary. This act is called “aggregation” and always includes a function — for example, average or sum — that reduces values in the table to a single row.

aggregate transform

The aggregate transform takes a tuple to create one or more new columns that “distill down” data from all the rows.

from invoices
aggregate { sum_of_orders = sum total }

The query above computes the sum of the total column of all rows of the invoices table to produce a single value.

aggregate can produce multiple summaries at once when one or more aggregation expressions are contained in a tuple. aggregate discards all columns that are not present in the tuple.

from invoices
aggregate {
    num_orders = count this,
    sum_of_orders = sum total,
}

In the example above, the result is a single row with two columns. The count function returns the number of rows in the table that was passed in; the sum function adds up the values of the total column of all rows.

Grouping

Suppose we want to produce summaries of invoices for each city in the table. We could create a query for each city, and aggregate its rows:

from invoices
filter billing_city == "Oslo"
aggregate { sum_of_orders = sum total }

But we would need to do it for each city: London, Frankfurt, etc. Of course this is repetitive (and boring) and error prone (because we would need to type each billing_city by hand). Moreover, we would need to create a list of each billing_city before we started.

group transform

The group transform separates the table into groups (say, those having the same city) using information that’s already in the table. It then applies a transform to each group, and combines the results back together:

from invoices
group billing_city (
    aggregate {
        num_orders = count this,
        sum_of_orders = sum total,
    }
)

Those familiar with SQL have probably noticed that we just decoupled aggregation from grouping.

Although these operations are coupled in SQL, PRQL makes it straightforward to use group and aggregate separately from each other, and to combine them with other transforms, such as:

from invoices
group billing_city (
    take 2
)

This code collects the first two rows for each city’s group.

Syntax

A summary of PRQL syntax:

Syntax              Usage                     Example
|                   Pipelines                 from employees | select first_name
=                   Assigns                   from e = employees
                                              derive total = (sum salary)
:                   Named args & parameters   interp low:0 1600 sat_score
{}                  Tuples                    {id, false, total = 3}
[]                  Arrays                    [1, 4, 3, 4]
+, !, &&, ==, etc.  Operators                 filter a == b + c || d >= e
()                  Parentheses               derive celsius = (fht - 32) / 1.8
\                   Line wrap                 1 + 2 + 3 +
                                              \ 4 + 5
1, 100_000, 5e10    Numbers                   derive { huge = 5e10 * 10_000 }
'', ""              Strings                   derive name = 'Mary'
true, false         Booleans                  derive { Col1 = true }
null                Null                      filter ( name != null )
@                   Dates & times             @2021-01-01
` `                 Quoted identifiers        select `first name`
#                   Comments                  # A comment
==                  Self-equality in join     join s=salaries (==id)
->                  Function definitions      let add = a b -> a + b
=>                  Case statement            case [a==1 => c, a==2 => d]
+, -                Sort order                sort {-amount, +date}
??                  Coalesce                  amount ?? 0

Literals

A literal is a constant value expression, with special syntax rules for each data type.

Numbers

Number literals can contain digits as well as a period, underscores, and the character e.

If a number literal contains a dot or the character e, it is treated as a floating-point number (or just float); otherwise it is treated as an integer.

The character e denotes “scientific notation”, where the number after e is a base-10 exponent.

Underscores are ignored, so they can be placed at arbitrary positions, but it is advisable to use them as thousands separators.

Integers can, alternatively, be expressed in hexadecimal, octal or binary notation using the prefixes 0x, 0o or 0b respectively.

PRQL

from numbers
select {
    small = 1.000_000_1,
    big = 5_000_000,
    huge = 5e9,
    binary = 0x0011,
    hex = 0x80,
    octal = 0o777,
}

SQL

SELECT
  1.0000001 AS small,
  5000000 AS big,
  5000000000.0 AS huge,
  17 AS "binary",
  128 AS hex,
  511 AS octal
FROM
  numbers

Strings

PRQL supports string literals and several other formats of strings. See the Strings page for more information.

Booleans

Boolean values can be expressed with the true or false keyword.

Null

The null value can be expressed with the null keyword. See also the discussion of how PRQL handles nulls.

Date and time

Date and time literals are expressed with the character @, followed by a string that encodes the date & time.

Note

PRQL’s notation is designed to be less verbose than SQL’s TIMESTAMP '2004-10-19 10:23:54' and more explicit than SQL’s implicit option that just uses a string '2004-10-19 10:23:54'.

Dates

Dates are represented by @{yyyy-mm-dd} — a @ followed by the date format.

PRQL

from employees
derive age_at_year_end = (@2022-12-31 - dob)

SQL

SELECT
  *,
  DATE '2022-12-31' - dob AS age_at_year_end
FROM
  employees

Times

Times are represented by @{HH:mm:ss.SSS±Z} with any parts not supplied defaulting to zero. This includes the timezone, which is represented by +HH:mm, -HH:mm or Z. This is consistent with the ISO8601 time format.

PRQL

from orders
derive should_have_shipped_today = (order_time < @08:30)

SQL

SELECT
  *,
  order_time < TIME '08:30' AS should_have_shipped_today
FROM
  orders

Timestamps

Timestamps are represented by @{yyyy-mm-ddTHH:mm:ss.SSS±Z} / @{date}T{time}, with any time parts not supplied defaulting to zero. This includes the timezone, which is represented by +HH:mm, -HH:mm or Z (the : is optional). This is @ followed by the ISO8601 datetime format, which uses T to separate date & time.

PRQL

from commits
derive first_prql_commit = @2020-01-01T13:19:55-08:00
derive first_prql_commit_utc = @2020-01-02T21:19:55Z

SQL

SELECT
  *,
  TIMESTAMP '2020-01-01T13:19:55-0800' AS first_prql_commit,
  TIMESTAMP '2020-01-02T21:19:55Z' AS first_prql_commit_utc
FROM
  commits

Durations

Durations are represented by {N}{periods}, such as 2years or 10minutes, without a space.

Note

These aren’t the same as ISO8601, because we evaluated P3Y6M4DT12H30M5S to be difficult to understand, but we could support a simplified form if there’s demand for it. We don’t currently support compound expressions, for example 2years10months, but most DBs will allow 2years + 10months. Please raise an issue if this is inconvenient.

PRQL

from projects
derive first_check_in = start + 10days

SQL

SELECT
  *,
  "start" + INTERVAL 10 DAY AS first_check_in
FROM
  projects

Examples

Here’s a larger list of date and time examples:

  • @20221231 is invalid — it must contain full punctuation (- and :),
  • @2022-12-31 is a date
  • @2022-12 or @2022 are invalid — SQL can’t express a month, only a date
  • @16:54:32.123456 is a time
  • @16:54:32, @16:54, @16 are all allowed, expressing @16:54:32.000000, @16:54:00.000000, @16:00:00.000000 respectively
  • @2022-12-31T16:54:32.123456 is a timestamp without timezone
  • @2022-12-31T16:54:32.123456Z is a timestamp in UTC
  • @2022-12-31T16:54+02:00 and @2022-12-31T16:54+02 are timestamps in UTC+2
  • @16:54+02 is invalid — time is always local, so it cannot have a timezone
  • @2022-12-31+02 is invalid — date is always local, so it cannot have a timezone

Note

Currently prqlc does not parse or validate any of the datetime strings and will pass them to the database engine without adjustment. This might be refined in the future to aid in compatibility across databases. We’ll always support the canonical ISO8601 format described above.

Roadmap

Datetimes (as a datatype distinct from timestamps) are supported by some databases (e.g. MySQL, BigQuery). With the addition of type casts, these could be represented by a timestamp cast to a datetime:

derive pi_day = @2017-03-14T15:09:26.535898<datetime>

These are some examples we can then add:

  • @2022-12-31T16:54<datetime> is datetime without timezone
  • @2022-12-31<datetime> is forbidden — datetime must specify time
  • @16:54<datetime> is forbidden — datetime must specify date

Strings

String literals can use any matching odd number of either single or double quotes:

PRQL

from artists
derive {
  single        =   'hello world',
  double        =   "hello world",
  double_triple = """hello world""",
}

SQL

SELECT
  *,
  'hello world' AS single,
  'hello world' AS double,
  'hello world' AS double_triple
FROM
  artists

Quoting and escape characters

To quote a string containing quote characters, use the “other” type of quote, or use the escape character \, or use more quotes.

PRQL

from artists
select {
  other   = '"hello world"',
  escaped = "\"hello world\"",
  triple  = """I said "hello world"!""",
}

SQL

SELECT
  '"hello world"' AS other,
  '"hello world"' AS escaped,
  'I said "hello world"!' AS triple
FROM
  artists

Strings can contain any escape character sequences defined by the JSON standard.

PRQL

from artists
derive escapes = "\tXYZ\n \\ "                            # tab (\t), "XYZ", newline (\n), " ", \, " "
derive world = "\u{0048}\u{0065}\u{006C}\u{006C}\u{006F}" # "Hello"
derive hex = "\x48\x65\x6C\x6C\x6F"                       # "Hello"
derive turtle = "\u{01F422}"                              # "🐢"

SQL

SELECT
  *,
  '	XYZ
 \ ' AS escapes,
  'Hello' AS world,
  'Hello' AS hex,
  '🐢' AS turtle
FROM
  artists

Other string formats

  • F-strings - Build up a new string from a set of columns or values.
  • R-strings - Include the raw characters of the string without any form of escaping.
  • S-strings - Insert SQL statements directly into the query. Use when PRQL doesn’t have an equivalent facility.

Warning

Currently PRQL allows multiline strings with either single-character or multiple-character quotes. This may change for strings using a single-character quote in future versions.

Note

These escape rules specify how PRQL interprets escape characters when compiling strings to SQL, not necessarily how the database will interpret the string. Dialects interpret escape characters differently, and PRQL doesn’t currently account for these differences. Please open issues with any difficulties in the current implementation.

Escape sequences

Unless an r prefix is present, escape sequences in string literals are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:

Escape Sequence  Meaning
\\               Backslash (\)
\'               Single quote (')
\"               Double quote (")
\b               Backspace
\f               Formfeed
\n               ASCII Linefeed (LF)
\r               ASCII Carriage Return (CR)
\t               ASCII Horizontal Tab (TAB)
\xhh             Character with hex value hh
\u{xxxx}         Character with hex value xxxx

F-strings

F-strings are a readable approach to building new strings from existing strings & variables.

PRQL

from employees
select full_name = f"{first_name} {last_name}"

SQL

SELECT
  CONCAT(first_name, ' ', last_name) AS full_name
FROM
  employees

This can be much easier to read for longer strings, relative to the SQL approach:

PRQL

from web
select url = f"http{tls}://www.{domain}.{tld}/{page}"

SQL

SELECT
  CONCAT(
    'http',
    tls,
    '://www.',
    domain,
    '.',
    tld,
    '/',
    page
  ) AS url
FROM
  web

Note that currently interpolations can only contain plain variable names, not whole expressions as in Python, so this won’t work:

PRQL

from tracks
select length_str = f"{length_seconds / 60} minutes"

Error

Error:
   ╭─[:2:38]
   
 2 │ select length_str = f"{length_seconds / 60} minutes"
                                       ╰── interpolated string expected one of ".", ":" or "}", but found " "
───╯

Roadmap

In the future, f-strings may incorporate string formatting such as datetimes, numbers, and padding. If there’s a feature that would be helpful, please post an issue.

R-strings

R-strings handle escape characters without special treatment:

PRQL

from artists
derive normal_string =  "\\\t"   #  two characters - \ and tab (\t)
derive raw_string    = r"\\\t"   # four characters - \, \, \, and t

SQL

SELECT
  *,
  '\	' AS normal_string,
  '\\\t' AS raw_string
FROM
  artists

S-strings

An s-string inserts SQL directly, as an escape hatch when there’s something that PRQL doesn’t yet implement. For example, there’s a version() function in PostgreSQL that returns the PostgreSQL version, so if we want to use that, we use an s-string:

PRQL

from my_table
select db_version = s"version()"

SQL

SELECT
  version() AS db_version
FROM
  my_table

Embed a column name in an s-string using braces. For example, PRQL’s standard library defines the average function as:

let average = column -> s"AVG({column})"

So this compiles using the function:

PRQL

from employees
aggregate {average salary}

SQL

SELECT
  AVG(salary)
FROM
  employees

Note

Because S-string contents are SQL, double-quotes (") will denote a column name. To avoid that, use single-quotes (') around the SQL string, and adjust the quotes of the S-string. For example, instead of s'CONCAT("hello", "world")' use s"CONCAT('hello', 'world')"

Here’s an example of a more involved use of an s-string:

PRQL

from de=dept_emp
join s=salaries side:left (s.emp_no == de.emp_no && s"""
  ({s.from_date}, {s.to_date})
  OVERLAPS
  ({de.from_date}, {de.to_date})
""")

SQL

SELECT
  de.*,
  s.*
FROM
  dept_emp AS de
  LEFT JOIN salaries AS s ON s.emp_no = de.emp_no
  AND (s.from_date, s.to_date) OVERLAPS (de.from_date, de.to_date)

For those who have used Python, s-strings are similar to Python’s f-strings, but the result is SQL code, rather than a string literal. For example, a Python f-string of f"average({col})" would produce "average(salary)", with quotes; while in PRQL, s"average({col})" produces average(salary), without quotes.

Note that interpolations can only contain plain variable names, not whole expressions as in Python.

We can also use s-strings to produce a full table:

PRQL

from s"SELECT DISTINCT ON first_name, id, age FROM employees ORDER BY age ASC"
join s = s"SELECT * FROM salaries" (==id)

SQL

WITH table_0 AS (
  SELECT
    DISTINCT ON first_name,
    id,
    age
  FROM
    employees
  ORDER BY
    age ASC
),
table_1 AS (
  SELECT
    *
  FROM
    salaries
)
SELECT
  table_0.*,
  table_1.*
FROM
  table_0
  JOIN table_1 ON table_0.id = table_1.id

Note

S-strings in user code are intended as an escape hatch for an unimplemented feature. If we often need s-strings to express something, that’s a sign we should implement it in PRQL or PRQL’s stdlib. If you often require an s-string, submit an issue with your use case.

Braces

To output braces from an s-string, use double braces:

PRQL

from employees
derive {
  has_valid_title = s"regexp_contains(title, '([a-z0-9]*-){{2,}}')"
}

SQL

SELECT
  *,
  regexp_contains(title, '([a-z0-9]*-){2,}') AS has_valid_title
FROM
  employees

Precedence within s-strings

Variables in s-strings are inserted into the SQL source as-is, which means we may get surprising behavior when the variable has multiple terms and the s-string isn’t parenthesized.

In this toy example, the expression salary + benefits / 365 gets precedence wrong. The generated SQL code is as if we had written salary + (benefits / 365).

PRQL

from employees
derive {
  gross_salary = salary + benefits,
  daily_rate = s"{gross_salary} / 365"
}

SQL

SELECT
  *,
  salary + benefits AS gross_salary,
  salary + benefits / 365 AS daily_rate
FROM
  employees

Instead, the numerator {gross_salary} must be encased in parentheses:

PRQL

from employees
derive {
  gross_salary = salary + benefits,
  daily_rate = s"({gross_salary}) / 365"
}

SQL

SELECT
  *,
  salary + benefits AS gross_salary,
  (salary + benefits) / 365 AS daily_rate
FROM
  employees

Tuples

A tuple is a container type composed of multiple fields. Each field can have a different type. The number of fields and their types must be known at compile time.

A tuple is represented by {}. It can span multiple lines. Fields can be assigned a name. Fields are separated by commas; a trailing comma is optional.

let var1 = {x = 1, y = 2}

let var2 = {           # Span multiple lines
  a = x,
  b = y                # Optional trailing comma
}

let var3 = {
  c,                   # Individual item
  d = b,               # Assignment
}

Tuples are the type of a table row, which means that they are expected by many transforms. Most transforms can also take a single field, which will be converted into a tuple. These are equivalent:

PRQL

from employees
select {first_name}

SQL

SELECT
  first_name
FROM
  employees

PRQL

from employees
select first_name

SQL

SELECT
  first_name
FROM
  employees

Note

Prior to 0.9.0, tuples were named Lists and represented with [] syntax. There may still be references to the old naming.

Arrays

An array is a container type composed of multiple items. All items must be of the same type. The number of items can vary.

Warning

This page is a stub.
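Until this page is expanded, here is a minimal sketch of array syntax. Note that an array of tuples can serve as a relation literal, as used elsewhere in this book:

```prql
let odds = [1, 3, 5, 7]   # an array of integers

from [                    # an array of tuples forms a relation
  {a = 1, b = "x"},
  {a = 2, b = "y"},
]
```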

Identifiers & keywords

Identifiers can contain alphanumeric characters and _, and must not start with a number. They can be chained together with the . lookup operator, used to retrieve a field from a tuple or a variable from a module.

hello

_h3llo

hello.world

this & that

this refers to the current relation:

PRQL

from invoices
aggregate (
    count this
)

SQL

SELECT
  COUNT(*)
FROM
  invoices

Within a join, that refers to the other table:

PRQL

from invoices
join tracks (this.track_id==that.id)

SQL

SELECT
  invoices.*,
  tracks.*
FROM
  invoices
  JOIN tracks ON invoices.track_id = tracks.id

this can also be used to remove any column ambiguity. For example, currently using a bare time as a column name will fail, because it’s also a type:

PRQL

from invoices
derive t = time

Error

Error:
   ╭─[:2:12]
   
 2 │ derive t = time
             ──┬─
               ╰─── expected a value, but found a type
───╯

But with this.time, we can remove the ambiguity:

PRQL

from invoices
derive t = this.time

SQL

SELECT
  *,
  time AS t
FROM
  invoices

Quoting

To use characters that would otherwise be invalid, identifiers can be surrounded with backticks.

When compiling to SQL, these identifiers will use dialect-specific quotes and quoting rules.

PRQL

prql target:sql.mysql
from employees
select `first name`

SQL

SELECT
  `first name`
FROM
  employees

PRQL

prql target:sql.postgres
from employees
select `first name`

SQL

SELECT
  "first name"
FROM
  employees

PRQL

prql target:sql.bigquery

from `project-foo.dataset.table`
join `project-bar.dataset.table` (==col_bax)

SQL

SELECT
  `project-foo.dataset.table`.*,
  `project-bar.dataset.table`.*
FROM
  `project-foo.dataset.table`
  JOIN `project-bar.dataset.table` ON `project-foo.dataset.table`.col_bax = `project-bar.dataset.table`.col_bax

Schemas & database names

Identifiers of database tables can be prefixed with schema and database names.

PRQL

from my_database.chinook.albums

SQL

SELECT
  *
FROM
  my_database.chinook.albums

Note that all of the following identifiers will be treated as separate table definitions: tracks, public.tracks, my_database.public.tracks.

Keywords

PRQL uses the following keywords:

  • prql - query header more…
  • let - variable definition more…
  • into - variable definition more…
  • case - flow control more…
  • type - type declaration
  • func - explicit function declaration more…
  • module - used internally
  • internal - used internally
  • true - boolean more…
  • false - boolean more…
  • null - NULL more…

Keywords can be used as identifiers (of columns or variables) when encased in backticks: `case`.
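For example (a sketch; the legal_cases table and its columns are hypothetical):

```prql
from legal_cases
select {`case`, `type`}   # keywords used as column names
```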

Transforms are normal functions within the std namespace, not keywords. That is, std.from is the same function as from. In the example below, the resulting query is the same as without the std. namespace:

PRQL

std.from my_table
std.select {from = my_table.a, take = my_table.b}
std.take 3

SQL

SELECT
  a AS "from",
  b AS take
FROM
  my_table
LIMIT
  3

Function calls

Simple

A distinction between PRQL and most other programming languages is the function call syntax. It consists of the function name followed by arguments separated by whitespace.

function_name arg1 arg2 arg3

If one of the arguments is also a function call, it must be enclosed in parentheses, so we know where the arguments of the inner function end and the arguments of the outer function start.

outer_func arg_1 (inner_func arg_a, arg_b) arg_2

The function name must refer to a function variable, which has either been declared in the standard library or some other module.

Function calls can also specify named parameters using : notation:

function_name arg1 named_param:arg2 arg3
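For instance, the join transform accepts a side: named parameter, as seen elsewhere in this book:

```prql
from invoices
join customers side:left (==customer_id)   # side: is a named parameter
```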

Pipeline

There is an alternative way of calling functions: using a pipeline. Regardless of whether the pipeline is delimited by the pipe symbol | or a new line, the pipeline is equivalent to passing the result of each expression as the last argument of the next function.

a | foo 3 | bar 'hello' 'world' | baz

… is equivalent to …

baz (bar 'hello' 'world' (foo 3 a))

Pipes

Pipes are the connection between transforms that make up a pipeline. The relation produced by a transform before the pipe is used as the input for the transform following the pipe. A pipe can be represented with either a line break or a pipe character (|).

For example, here the filter transform operates on the result of from employees (which is just the employees table), and the select transform operates on the result of the filter transform.

PRQL

from employees
filter department == "Product"
select {first_name, last_name}

SQL

SELECT
  first_name,
  last_name
FROM
  employees
WHERE
  department = 'Product'

In the place of a line break, it’s also possible to use the | character to pipe results between transforms, such that this is equivalent:

PRQL

from employees | filter department == "Product" | select {first_name, last_name}

SQL

SELECT
  first_name,
  last_name
FROM
  employees
WHERE
  department = 'Product'

“Ceci n’est pas une pipe”

In almost all situations, a line break acts as a pipe. But there are a few cases where a line break doesn’t act as a pipe.

  • before or after tuple items
  • before or after list items
  • before a new statement, which starts with let or from (or func)
  • within a line wrap

For example:

PRQL

[
  {a=2}      # No pipe from line break before & after this list item
]
derive {
  c = 2 * a, # No pipe from line break before & after this tuple item
}

SQL

WITH table_0 AS (
  SELECT
    2 AS a
)
SELECT
  a,
  2 * a AS c
FROM
  table_0

PRQL

let b =
  \ 3        # No pipe from line break within this line wrap

# No pipe from line break before this `from` statement

from y
derive a = b

SQL

SELECT
  *,
  3 AS a
FROM
  y

Inner Transforms

Parentheses are also used for transforms (such as group and window) that pass their result to an “inner transform”. The example below applies the aggregate pipeline to each group of unique title and country values:

PRQL

from employees
group {title, country} (
  aggregate {
    average salary,
    ct = count salary,
  }
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

Operators

Expressions can be composed from function calls and operations, such as 2 + 3 or ((1 + x) * -y). In the example below, note the use of expressions to calculate the alias circumference and in the filter transform.

PRQL

from foo
select {
  circumference = diameter * 3.14159,
  area = (diameter / 2) ** 2,
  color,
}
filter circumference > 10 && color != "red"

SQL

WITH table_0 AS (
  SELECT
    diameter * 3.14159 AS circumference,
    POW(diameter / 2, 2) AS area,
    color
  FROM
    foo
)
SELECT
  circumference,
  area,
  color
FROM
  table_0
WHERE
  circumference > 10
  AND color <> 'red'

Operator precedence

This table shows operator precedence, from the tightest-binding (lowest number) to the loosest. Use parentheses () to prioritize operations and for function calls (see the discussion below).

Group           Operators          Precedence   Associativity
parentheses     ()                 0            see below
identifier dot  .                  1
unary           - + ! ==           2
range           ..                 3
pow             **                 4            right-to-left
mul             * / // %           5            left-to-right
add             + -                6            left-to-right
compare         == != <= >= < >    7            left-to-right
coalesce        ??                 8            left-to-right
and             &&                 9            left-to-right
or              ||                 10           left-to-right
function call                      11

Division and integer division

The / operator performs division that always returns a float value, while the // operator does integer division (truncated division) that always returns an integer value.
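The distinction matters for negative operands: `//` truncates toward zero, which differs from a floor division. A Python sketch of the semantics (illustrative only, not PRQL internals; `prql_int_div` is a hypothetical helper):

```python
import math

def prql_int_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, like PRQL's `//`."""
    return math.trunc(a / b)

print(prql_int_div(5, 2))    # 2
print(prql_int_div(5, -2))   # -2 (Python's own 5 // -2 floors to -3)
```

This matches the SQL that the SQLite example below generates, which emulates truncation with ROUND, ABS and SIGN.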

PRQL

prql target:sql.sqlite

from [
  {a = 5, b = 2},
  {a = 5, b = -2},
]
select {
  div_out = a / b,
  int_div_out = a // b,
}

SQL

WITH table_0 AS (
  SELECT
    5 AS a,
    2 AS b
  UNION
  ALL
  SELECT
    5 AS a,
    -2 AS b
)
SELECT
  (a * 1.0 / b) AS div_out,
  ROUND(ABS(a / b) - 0.5) * SIGN(a) * SIGN(b) AS int_div_out
FROM
  table_0

Coalesce

We can coalesce values with the ?? operator. Coalescing takes the first value or, if that value is null, the second value.
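The semantics can be sketched in Python, treating None as null (an illustrative helper, not the compiler’s implementation):

```python
def coalesce(first, second):
    """Return `first` unless it is null (None), otherwise `second`."""
    return second if first is None else first

print(coalesce(None, 0))  # 0
print(coalesce(7, 0))     # 7
```

Note that only null triggers the fallback; falsy values such as 0 are kept.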

PRQL

from orders
derive amount ?? 0

SQL

SELECT
  *,
  COALESCE(amount, 0)
FROM
  orders

Regex expressions

Note

This is currently experimental

To perform a case-sensitive regex search, use the ~= operator. This generally compiles to REGEXP, though it differs by dialect. Because this is a regex search, matching an exact value requires anchoring the start and end of the pattern, as in ^foo$.

PRQL

from tracks
filter (name ~= "Love")

SQL

SELECT
  *
FROM
  tracks
WHERE
  REGEXP(name, 'Love')

PRQL

prql target:sql.duckdb

from artists
filter (name ~= "Love.*You")

SQL

SELECT
  *
FROM
  artists
WHERE
  REGEXP_MATCHES(name, 'Love.*You')

PRQL

prql target:sql.bigquery

from tracks
filter (name ~= "\\bLove\\b")

SQL

SELECT
  *
FROM
  tracks
WHERE
  REGEXP_CONTAINS(name, '\bLove\b')

PRQL

prql target:sql.postgres

from tracks
filter (name ~= "\\(I Can't Help\\) Falling")

SQL

SELECT
  *
FROM
  tracks
WHERE
  name ~ '\(I Can''t Help\) Falling'

PRQL

prql target:sql.mysql

from tracks
filter (name ~= "With You")

SQL

SELECT
  *
FROM
  tracks
WHERE
  REGEXP_LIKE(name, 'With You', 'c')

PRQL

prql target:sql.sqlite

from tracks
filter (name ~= "But Why Isn't Your Syntax More Similar\\?")

SQL

SELECT
  *
FROM
  tracks
WHERE
  name REGEXP 'But Why Isn''t Your Syntax More Similar\?'

Parentheses

PRQL uses parentheses () for several purposes:

  • Parentheses group operands to control the order of evaluation, for example: ((1 + x) * y)

  • Parentheses delimit a minus sign of a function argument, for example: add (-1) (-3)

  • Parentheses delimit nested function calls that contain a pipe, either the | symbol or a new line. “Nested” means within a transform; i.e. not just the main pipeline, for example: (column-name | in 0..20)

  • Parentheses wrap a function call that is part of a larger expression, for example: math.round 0 (sum distance)

Parentheses are not required for expressions that do not contain function calls, for example: foo + bar.

Here’s a set of examples of these rules:

PRQL

from employees
# Requires parentheses, because it contains a pipe
derive is_proximate = (distance | in 0..20)
# Requires parentheses, because it's a function call
derive total_distance = (sum distance)
# `??` doesn't require parentheses, as it's not a function call
derive min_capped_distance = (min distance ?? 5)
# No parentheses needed, because no function call
derive travel_time = distance / 40
# No inner parentheses needed around `1+1` because no function call
derive distance_rounded_2_dp = (math.round 1+1 distance)
derive {
  # Requires parentheses, because it contains a pipe
  is_far = (distance | in 100..),
  # The left value of the range requires parentheses,
  # because of the minus sign
  is_negative = (distance | in (-100..0)),
  # ...this is equivalent
  is_negative = (distance | in (-100)..0),
  # _Technically_, this doesn't require parentheses, because it's
  # the RHS of an assignment in a tuple
  # (this is especially confusing)
  average_distance = average distance,
}
# Requires parentheses because of the minus sign
sort (-distance)
# A tuple is fine too
sort {-distance}

SQL

SELECT
  *,
  distance BETWEEN 0 AND 20 AS is_proximate,
  SUM(distance) OVER () AS total_distance,
  MIN(COALESCE(distance, 5)) OVER () AS min_capped_distance,
  distance / 40 AS travel_time,
  ROUND(distance, 1 + 1) AS distance_rounded_2_dp,
  distance >= 100 AS is_far,
  distance BETWEEN -100 AND 0,
  distance BETWEEN -100 AND 0 AS is_negative,
  AVG(distance) OVER () AS average_distance
FROM
  employees
ORDER BY
  distance DESC

For example, the snippet below produces an error, because the sum function call is neither wrapped in parentheses nor the RHS of an assignment in a tuple.

PRQL

from employees
derive total_distance = sum distance

Error

Error:
   ╭─[:2:29]
   
 2 │ derive total_distance = sum distance
                              ────┬───
                                  ╰───── Unknown name `distance`
───╯

…while with parentheses, it works as expected:

PRQL

from employees
derive other_distance = (sum distance)

SQL

SELECT
  *,
  SUM(distance) OVER () AS other_distance
FROM
  employees

Note

We’re continuing to think whether these rules can be more intuitive. We’re also planning to make the error messages much better, so the compiler can help out.

Wrapping lines

Line breaks in PRQL have semantic meaning, so to wrap a single logical line into multiple physical lines, we can use \ at the beginning of subsequent physical lines:

PRQL

from artists
select is_europe =
\ country == "DE"
\ || country == "FR"
\ || country == "ES"

SQL

SELECT
  country = 'DE'
  OR country = 'FR'
  OR country = 'ES' AS is_europe
FROM
  artists

Wrapping will “jump over” empty lines or lines with comments. For example, the select here is only one logical line:

PRQL

from tracks
# This would be a really long line without being able to split it:
select listening_time_years = (spotify_plays + apple_music_plays + pandora_plays)
# We can toggle between lines when developing:
# \ * length_seconds
\ * length_s
#   min  hour day  year
\ / 60 / 60 / 24 / 365

SQL

SELECT
  (
    spotify_plays + apple_music_plays + pandora_plays
  ) * length_s / 60 / 60 / 24 / 365 AS listening_time_years
FROM
  tracks

Info

Note that PRQL differs from most languages, which use a \ at the end of the preceding line. Because PRQL aims to be friendly for data exploration, we want to make it possible to comment out any line, including the final line, without breaking the query. This requires all lines after the first to be structured similarly, and for the character to be at the start of each following line.

See Pipes for more details on line breaks.

Case

case searches for the first condition that evaluates to true and returns its associated value. If no condition matches, null is returned.
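The first-match-wins semantics can be sketched in Python (illustrative only — this hypothetical case helper is not how the compiler implements it):

```python
def case(pairs, row):
    """Return the value paired with the first condition that is true,
    or None (null) when no condition matches."""
    for condition, value in pairs:
        if condition(row):
            return value
    return None

distance = case(
    [
        (lambda r: r["city"] == "Calgary", 0),
        (lambda r: r["city"] == "Edmonton", 300),
    ],
    {"city": "Edmonton"},
)
print(distance)  # 300
```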

PRQL

from employees
derive distance = case [
  city == "Calgary" => 0,
  city == "Edmonton" => 300,
]

SQL

SELECT
  *,
  CASE
    WHEN city = 'Calgary' THEN 0
    WHEN city = 'Edmonton' THEN 300
    ELSE NULL
  END AS distance
FROM
  employees

To set a default, a true condition can be used:

PRQL

from employees
derive distance = case [
  city == "Calgary" => 0,
  city == "Edmonton" => 300,
  true => "Unknown",
]

SQL

SELECT
  *,
  CASE
    WHEN city = 'Calgary' THEN 0
    WHEN city = 'Edmonton' THEN 300
    ELSE 'Unknown'
  END AS distance
FROM
  employees

Ranges

The range start..end represents a set of values between start and end, inclusive (greater than or equal to start and less than or equal to end).

To express a range that is open on one side, either start or end can be omitted.
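A Python sketch of range membership, with None standing in for an omitted bound (an illustrative helper, not PRQL internals):

```python
def in_range(x, start=None, end=None):
    """Inclusive range membership; omit a bound for a half-open range."""
    return (start is None or x >= start) and (end is None or x <= end)

print(in_range(25, 25, 40))   # True: both ends are inclusive
print(in_range(5, 0, None))   # True: `0..` is open on the right
```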

Ranges can be used in filters with the in function, with any type of literal, including dates:

PRQL

from events
filter (created_at | in @1776-07-04..@1787-09-17)
filter (magnitude | in 50..100)
derive is_northern = (latitude | in 0..)

SQL

SELECT
  *,
  latitude >= 0 AS is_northern
FROM
  events
WHERE
  created_at BETWEEN DATE '1776-07-04' AND DATE '1787-09-17'
  AND magnitude BETWEEN 50 AND 100

Ranges can also be used in take:

PRQL

from orders
sort {-value, created_at}
take 101..110

SQL

SELECT
  *
FROM
  orders
ORDER BY
  value DESC,
  created_at
LIMIT
  10 OFFSET 100
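The mapping from an inclusive, 1-based take range to LIMIT and OFFSET is mechanical; this Python sketch (a hypothetical helper, not part of PRQL) shows the arithmetic:

```python
def take_range_to_sql(start: int, end: int) -> tuple[int, int]:
    """Map an inclusive, 1-based `take start..end` to (LIMIT, OFFSET)."""
    return end - start + 1, start - 1

print(take_range_to_sql(101, 110))  # (10, 100)
```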

Note

Half-open ranges are generally less intuitive to read than a simple >= or <= operator.

See also

Roadmap

We’d like to use ranges for other types, such as whether an object is in an array or list literal.

Comments

The # character denotes a comment until the end of the line.

PRQL

from employees  # Comment 1
# Comment 2
aggregate {average salary}

SQL

SELECT
  AVG(salary)
FROM
  employees

There’s no distinct multiline comment syntax.

Parameters

A parameter is a placeholder for a value provided after compilation of the query.

It uses the syntax $id, where id is an arbitrary alphanumeric string.

Note that most database engines only support numeric positional parameter ids (e.g. $3).

PRQL

from employees
filter id == $1

SQL

SELECT
  *
FROM
  employees
WHERE
  id = $1

Importing data

From

Specifies a data source.

PRQL

from artists

SQL

SELECT
  *
FROM
  artists

To introduce an alias, use an assign expression:

PRQL

from e = employees
select e.first_name

SQL

SELECT
  first_name
FROM
  employees AS e

Table names containing spaces or special characters need to be contained within backticks:

PRQL

from `artist tracks`

SQL

SELECT
  *
FROM
  "artist tracks"

default_db.tablename can be used if the table name matches a function from the standard library.

Note

We realize this is an awkward workaround. Track & 👍 #3271 for resolving this.

PRQL

default_db.group  # in place of `from group`
take 1

SQL

SELECT
  *
FROM
  "group"
LIMIT
  1

Reading files

There are a couple of functions, designed mainly for DuckDB, for reading from files:

PRQL

prql target:sql.duckdb

from (read_parquet "artists.parquet")
join (read_csv "albums.csv") (==track_id)

SQL

WITH table_0 AS (
  SELECT
    *
  FROM
    read_parquet('artists.parquet')
),
table_1 AS (
  SELECT
    *
  FROM
    read_csv_auto('albums.csv')
)
SELECT
  table_0.*,
  table_1.*
FROM
  table_0
  JOIN table_1 ON table_0.track_id = table_1.track_id

Note

These don’t currently have all the DuckDB options. If those would be helpful, please log an issue and it’s a fairly easy addition.

Info

We may be able to reduce the boilerplate WITH table_x AS SELECT * FROM... in future versions.

When specifying file names directly in the FROM clause without using functions, which DuckDB allows, enclose the file names in backticks as follows:

PRQL

from `artists.parquet`

SQL

SELECT
  *
FROM
  "artists.parquet"

See also

How do I: create ad-hoc relations?

It’s often useful to make a small inline relation, for example when exploring how a database will evaluate an expression, or for a small lookup table. This can be quite verbose in SQL.

PRQL offers two approaches — array literals, and a from_text transform.

Array literals

Because relations (aka tables) in PRQL are just arrays of tuples, they can be expressed with array and tuple syntax:

PRQL

from [
  {a=5, b=false},
  {a=6, b=true},
]
filter b == true
select a

SQL

WITH table_0 AS (
  SELECT
    5 AS a,
    false AS b
  UNION
  ALL
  SELECT
    6 AS a,
    true AS b
)
SELECT
  a
FROM
  table_0
WHERE
  b = true

PRQL

let my_artists = [
  {artist="Miles Davis"},
  {artist="Marvin Gaye"},
  {artist="James Brown"},
]

from artists
join my_artists (==artist)
join albums (==artist_id)
select {artists.artist_id, albums.title}

SQL

WITH table_0 AS (
  SELECT
    'Miles Davis' AS artist
  UNION
  ALL
  SELECT
    'Marvin Gaye' AS artist
  UNION
  ALL
  SELECT
    'James Brown' AS artist
),
my_artists AS (
  SELECT
    artist
  FROM
    table_0
)
SELECT
  artists.artist_id,
  albums.title
FROM
  artists
  JOIN my_artists ON artists.artist = my_artists.artist
  JOIN albums ON artists.artist_id = albums.artist_id

from_text

from_text takes a string in a common format and converts it to a relation. It accepts a few formats:

  • format:csv parses CSV (default),

  • format:json parses either:

    • an array of objects each of which represents a row, or

    • an object with fields columns & data, where columns take an array of column names and data takes an array of arrays.
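The columns & data shape can be sketched as a conversion into rows (an illustrative Python helper, not the compiler’s implementation):

```python
import json

def columns_data_to_rows(payload: str) -> list[dict]:
    """Convert the {"columns": [...], "data": [[...], ...]} shape to row dicts."""
    parsed = json.loads(payload)
    return [dict(zip(parsed["columns"], row)) for row in parsed["data"]]

rows = columns_data_to_rows('{"columns": ["a", "b"], "data": [[1, "x"], [4, "y"]]}')
print(rows)  # [{'a': 1, 'b': 'x'}, {'a': 4, 'b': 'y'}]
```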

PRQL

from_text """
a,b,c
1,2,3
4,5,6
"""
derive {
    d = b + c,
    answer = 20 * 2 + 2,
}

SQL

WITH table_0 AS (
  SELECT
    '1' AS a,
    '2' AS b,
    '3' AS c
  UNION
  ALL
  SELECT
    '4' AS a,
    '5' AS b,
    '6' AS c
)
SELECT
  a,
  b,
  c,
  b + c AS d,
  20 * 2 + 2 AS answer
FROM
  table_0

PRQL

from_text format:json """
[
    {"a": 1, "m": "5"},
    {"a": 4, "n": "6"}
]
"""

SQL

WITH table_0 AS (
  SELECT
    1 AS a,
    '5' AS m
  UNION
  ALL
  SELECT
    4 AS a,
    NULL AS m
)
SELECT
  a,
  m
FROM
  table_0

PRQL

from_text format:json """
{
    "columns": ["a", "b", "c"],
    "data": [
        [1, "x", false],
        [4, "y", null]
    ]
}
"""

SQL

WITH table_0 AS (
  SELECT
    1 AS a,
    'x' AS b,
    false AS c
  UNION
  ALL
  SELECT
    4 AS a,
    'y' AS b,
    NULL AS c
)
SELECT
  a,
  b,
  c
FROM
  table_0

See also

Variables — let & into

Variables assign a name to an expression, like in most programming languages. The name can then be used in any later expression, acting as a substitute for that expression.

Syntactically, variables can take three forms.

  • let declares the name before the expression.

    let my_name = x
    
  • into declares the name after the expression. This form is useful for quick pipeline splitting and conforms with the “flow from top to bottom” rule of pipelines.

    x
    into my_name
    
  • The final expression of a pipeline defaults to taking the name main.

    from x
    

    … is equivalent to:

    let main = x
    

When compiling to SQL, relational variables are compiled to Common Table Expressions (or sub-queries in some cases).

PRQL

let top_50 = (
  from employees
  sort salary
  take 50
  aggregate {total_salary = sum salary}
)

from top_50      # Starts a new pipeline

SQL

WITH table_0 AS (
  SELECT
    salary
  FROM
    employees
  ORDER BY
    salary
  LIMIT
    50
), top_50 AS (
  SELECT
    COALESCE(SUM(salary), 0) AS total_salary
  FROM
    table_0
)
SELECT
  total_salary
FROM
  top_50

PRQL

from employees
take 50
into first_50

from first_50

SQL

WITH first_50 AS (
  SELECT
    *
  FROM
    employees
  LIMIT
    50
)
SELECT
  *
FROM
  first_50

Variables can be assigned an s-string containing a whole SQL query, enabling us to use features which PRQL doesn’t yet support.

PRQL

let grouping = s"""
  SELECT SUM(a)
  FROM tbl
  GROUP BY
    GROUPING SETS
    ((b, c, d), (d), (b, d))
"""

from grouping

SQL

WITH table_0 AS (
  SELECT
    SUM(a)
  FROM
    tbl
  GROUP BY
    GROUPING SETS ((b, c, d), (d), (b, d))
)
SELECT
  *
FROM
  table_0

Functions

Functions have two types of parameters:

  1. Positional parameters, which require an argument.
  2. Named parameters, which optionally take an argument, otherwise using their default value.

So this function is named fahrenheit_to_celsius and has one parameter temp:

PRQL

let fahrenheit_to_celsius = temp -> (temp - 32) / 1.8

from cities
derive temp_c = (fahrenheit_to_celsius temp_f)

SQL

SELECT
  *,
  (temp_f - 32) / 1.8 AS temp_c
FROM
  cities

This function is named interp, and has two positional parameters named high and x, and one named parameter named low which takes a default argument of 0. It calculates the proportion of the distance that x is between low and high.

PRQL

let interp = low:0 high x -> (x - low) / (high - low)

from students
derive {
  sat_proportion_1 = (interp 1600 sat_score),
  sat_proportion_2 = (interp low:0 1600 sat_score),
}

SQL

SELECT
  *,
  (sat_score - 0) / (1600 - 0) AS sat_proportion_1,
  (sat_score - 0) / (1600 - 0) AS sat_proportion_2
FROM
  students
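The same function transcribed to Python, as a quick check of the arithmetic (the named parameter low becomes a keyword default):

```python
def interp(high, x, low=0):
    """Proportion of the distance that x lies between low and high."""
    return (x - low) / (high - low)

print(interp(1600, 800))   # 0.5
print(interp(1600, 400))   # 0.25
```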

Other examples

PRQL

let is_adult = col -> col >= 18
let writes_code = col -> (col | in ["PRQL", "Rust"])
let square = col -> (col | math.pow 2)
let starts_with_a = col -> (col | text.lower | text.starts_with("a"))

from employees
select {
    first_name,
    last_name,
    hobby,
    adult = is_adult age,
    age_squared = square age,
}
filter ((starts_with_a last_name) && (writes_code hobby))

SQL

WITH table_0 AS (
  SELECT
    first_name,
    last_name,
    hobby,
    age >= 18 AS adult,
    POW(age, 2) AS age_squared
  FROM
    employees
)
SELECT
  first_name,
  last_name,
  hobby,
  adult,
  age_squared
FROM
  table_0
WHERE
  LOWER(last_name) LIKE CONCAT('a', '%')
  AND hobby IN ('PRQL', 'Rust')

Piping values into functions

Consistent with the principles of PRQL, it’s possible to pipe values into functions, which makes composing many functions more readable. When piping a value into a function, the value is passed as an argument to the final positional parameter of the function. Here’s the same result as the examples above with an alternative construction:
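The rule — the piped value becomes the final positional argument — can be sketched in Python with partial application (illustrative only; pipe is a hypothetical helper):

```python
from functools import partial

def pipe(value, fn):
    """Pass `value` as the final positional argument of `fn`, like PRQL's `|`."""
    return fn(value)

def interp(high, x, low=0):
    return (x - low) / (high - low)

# `sat_score | interp 1600`: partial application binds `high`,
# and the piped value fills the last positional parameter, `x`.
result = pipe(800, partial(interp, 1600))
print(result)  # 0.5
```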

PRQL

let interp = low:0 high x -> (x - low) / (high - low)

from students
derive {
  sat_proportion_1 = (sat_score | interp 1600),
  sat_proportion_2 = (sat_score | interp low:0 1600),
}

SQL

SELECT
  *,
  (sat_score - 0) / (1600 - 0) AS sat_proportion_1,
  (sat_score - 0) / (1600 - 0) AS sat_proportion_2
FROM
  students

and

PRQL

let fahrenheit_to_celsius = temp -> (temp - 32) / 1.8

from cities
derive temp_c = (temp_f | fahrenheit_to_celsius)

SQL

SELECT
  *,
  (temp_f - 32) / 1.8 AS temp_c
FROM
  cities

We can combine a chain of functions, which makes logic more readable:

PRQL

let fahrenheit_to_celsius = temp -> (temp - 32) / 1.8
let interp = low:0 high x -> (x - low) / (high - low)

from kettles
derive boiling_proportion = (temp_c | fahrenheit_to_celsius | interp 100)

SQL

SELECT
  *,
  ((temp_c - 32) / 1.8 - 0) / (100 - 0) AS boiling_proportion
FROM
  kettles

Late binding

Functions can bind to any variable that is in scope when the function is executed. For example, here cost_total refers to the column that’s introduced in the from.

PRQL

let cost_share = cost -> cost / cost_total

from costs
select {materials, labor, overhead, cost_total}
derive {
  materials_share = (cost_share materials),
  labor_share = (cost_share labor),
  overhead_share = (cost_share overhead),
}

SQL

SELECT
  materials,
  labor,
  overhead,
  cost_total,
  materials / cost_total AS materials_share,
  labor / cost_total AS labor_share,
  overhead / cost_total AS overhead_share
FROM
  costs

Standard library

The standard library currently contains functions commonly used in SQL. It’s not yet as broad as we’d like, and we’re very open to expanding it.

Currently, s-strings are an escape hatch for any function that isn’t in our standard library. If you find yourself using them for something frequently, raise an issue and we’ll add it to the stdlib.

Here’s the source of the current PRQL std:

Note

PRQL 0.9.0 has started supporting different DB implementations for standard library functions. Their source is in std.sql.

# The PRQL standard library defines the following functions and transforms.
# The definitions are whitespace insensitive, and have this form:
#
# ```
# let my_func = param1 param2 ...  -> <return_type> body_expr
# ```
#
# Where:
#   * `my_func` is the name of the function
#   * `param1` is the first parameter optionally followed by a type in "< ... >"
#   * `param2` etc. follow the same pattern as param1
#   * `<return_type>` is the type of result wrapped in "< ... >"
#   * `body_expr` defines the function body that creates the result.
#     It can be PRQL code or `internal ...` to indicate internal compiler code.

# Operators

let mul = left right -> <int || float> internal std.mul
let div_i = left right -> <int || float> internal std.div_i
let div_f = left right -> <int || float> internal std.div_f
let mod = left right -> <int || float> internal std.mod
let add = left right -> <int || float || timestamp || date> internal std.add
let sub = left right -> <int || float || timestamp || date> internal std.sub
let eq = left right -> <bool> internal std.eq
let ne = left right -> <bool> internal std.ne
let gt = left right -> <bool> internal std.gt
let lt = left right -> <bool> internal std.lt
let gte = left right -> <bool> internal std.gte
let lte = left right -> <bool> internal std.lte
let and = left<bool> right<bool> -> <bool> internal std.and
let or = left<bool> right<bool> -> <bool> internal std.or
let coalesce = left right -> internal std.coalesce
let regex_search = text pattern -> <bool> internal std.regex_search

let neg = expr<int || float> -> <int || float> internal std.neg
let not = expr<bool> -> <bool> internal std.not

# Types

## Type primitives
type int = int
type float = float
type bool = bool
type text = text
type date = date
type time = time
type timestamp = timestamp
type `func` = func
type anytype = anytype

## Generic array
# TODO: an array of anything, not just nulls
type array = [anytype]

## Scalar
type scalar = int || float || bool || text || date || time || timestamp || null
type tuple = {..anytype}

## Range
type range = {start = scalar, end = scalar}

## Relation (an array of tuples)
type relation = [tuple]

## Transform
type transform = func relation -> relation

# Functions

## Relational transforms
let from = func
  `default_db.source` <relation>
  -> <relation> source

let select = func
  columns <anytype>
  tbl <relation>
  -> <relation> internal select

let filter = func
  condition <bool>
  tbl <relation>
  -> <relation> internal filter

let derive = func
  columns <anytype>
  tbl <relation>
  -> <relation> internal derive

let aggregate = func
  columns <anytype>
  tbl <relation>
  -> <relation> internal aggregate

let sort = func
  by <anytype>
  tbl <relation>
  -> <relation> internal sort

let take = func
  expr <anytype>
  tbl <relation>
  -> <relation> internal take

let join = func
  `default_db.with` <relation>
  condition <bool>
  `noresolve.side`:inner
  tbl <relation>
  -> <relation> internal join

let group = func
  by<scalar || tuple>
  pipeline <transform>
  tbl <relation>
  -> <relation> internal group

let window = func
  rows:0..-1
  range:0..-1
  expanding <bool>:false
  rolling <int>:0
  pipeline <transform>
  tbl <relation>
  -> <relation> internal window

let append = `default_db.bottom`<relation> top<relation> -> <relation> internal append
let intersect = `default_db.bottom`<relation> top<relation> -> <relation> (
  t = top
  join (b = bottom) (tuple_every (tuple_map _eq (tuple_zip t.* b.*)))
  select t.*
)
let remove = `default_db.bottom`<relation> top<relation> -> <relation> (
  t = top
  join side:left (b = bottom) (tuple_every (tuple_map _eq (tuple_zip t.* b.*)))
  filter (tuple_every (tuple_map _is_null b.*))
  select t.*
)
let loop = func
  pipeline <transform>
  top <relation>
  -> <relation> internal loop

## Aggregate functions
# These return either a scalar when used within `aggregate`, or a column when used anywhere else.

let min = column <array> -> <int || float || null> internal std.min

let max = column <array> -> <int || float || null> internal std.max

let sum = column <array> -> <int || float> internal std.sum

let average = column <array> -> <float || null> internal std.average

let stddev = column <array> -> <float || null> internal std.stddev

let all = column <array> -> <bool> internal std.all

let any = column <array> -> <bool> internal std.any

let concat_array = column <array> -> <text> internal std.concat_array

# Counts number of items in the column.
# Note that the count will include null values.
let count = column<array> -> <int> internal count

# Deprecated in favour of filtering input to the [std.count] function (not yet implemented).
@{deprecated}
let count_distinct = column <array> -> internal std.count_distinct

## Window functions
let lag =   offset <int>    column <array> -> internal std.lag
let lead =  offset <int>    column <array> -> internal std.lead
let first      = column <array> -> internal std.first
let last       = column <array> -> internal std.last
let rank       = column <array> -> internal std.rank
let rank_dense = column <array> -> internal std.rank_dense
let row_number = column <array> -> internal row_number

# Mathematical functions
module math {
  let abs = column -> <int || float> internal std.math.abs
  let floor = column -> <int> internal std.math.floor
  let ceil = column -> <int> internal std.math.ceil
  let pi = -> <float> internal std.math.pi
  let exp = column -> <int || float> internal std.math.exp
  let ln = column -> <int || float> internal std.math.ln
  let log10 = column -> <int || float> internal std.math.log10
  let log = base<int || float> column -> <int || float> internal std.math.log
  let sqrt = column -> <int || float> internal std.math.sqrt
  let degrees = column -> <int || float> internal std.math.degrees
  let radians = column -> <int || float> internal std.math.radians
  let cos = column -> <int || float> internal std.math.cos
  let acos = column -> <int || float> internal std.math.acos
  let sin = column -> <int || float> internal std.math.sin
  let asin = column -> <int || float> internal std.math.asin
  let tan = column -> <int || float> internal std.math.tan
  let atan = column -> <int || float> internal std.math.atan
  let pow = exponent<int || float> column -> <int || float> internal std.math.pow
  let round = n_digits column -> <scalar> internal std.math.round
}

## Misc functions
let as = `noresolve.type` column -> <scalar> internal std.as
let in = pattern value -> <bool> internal in

## Tuple functions
let tuple_every = func list -> <bool> internal tuple_every
let tuple_map = func fn <func> list -> internal tuple_map
let tuple_zip = func a b -> internal tuple_zip
let _eq = func a -> internal _eq
let _is_null = func a -> _param.a == null

## Misc
let from_text = input<text> `noresolve.format`:csv -> <relation> internal from_text

## Text functions
module text {
  let lower = column -> <text> internal std.text.lower
  let upper = column -> <text> internal std.text.upper
  let ltrim = column -> <text> internal std.text.ltrim
  let rtrim = column -> <text> internal std.text.rtrim
  let trim = column -> <text> internal std.text.trim
  let length = column -> <int> internal std.text.length
  let extract = offset<int> length<int> column -> <text> internal std.text.extract
  let replace = pattern<text> replacement<text> column -> <text> internal std.text.replace
  let starts_with = prefix<text> column -> <bool> internal std.text.starts_with
  let contains = substr<text> column -> <bool> internal std.text.contains
  let ends_with = suffix<text> column -> <bool> internal std.text.ends_with
}

## Date functions
module date {
  let to_text = format<text> column -> <text> internal std.date.to_text
}

## File-reading functions, primarily for DuckDB
let read_parquet = source<text> -> <relation> internal std.read_parquet
let read_csv = source<text> -> <relation> internal std.read_csv


## PRQL compiler functions
module `prql` {
  let version = -> <text> internal prql_version
}

# Deprecated, will be removed in 0.12.0
let prql_version = -> <text> internal prql_version

And a couple of examples:

PRQL

from employees
derive {
  gross_salary = (salary + payroll_tax | as int),
  gross_salary_rounded = (gross_salary | math.round 0),
  time = s"NOW()",  # an s-string, given no `now` function exists in PRQL
}

SQL

SELECT
  *,
  CAST(salary + payroll_tax AS int) AS gross_salary,
  ROUND(CAST(salary + payroll_tax AS int), 0) AS gross_salary_rounded,
  NOW() AS time
FROM
  employees

Example of different implementations of division and integer division:

PRQL

prql target:sql.sqlite

from [{x = 13, y = 5}]
select {
  quotient = x / y,
  int_quotient = x // y,
}

SQL

WITH table_0 AS (
  SELECT
    13 AS x,
    5 AS y
)
SELECT
  (x * 1.0 / y) AS quotient,
  ROUND(ABS(x / y) - 0.5) * SIGN(x) * SIGN(y) AS int_quotient
FROM
  table_0

PRQL

prql target:sql.mysql

from [{x = 13, y = 5}]
select {
  quotient = x / y,
  int_quotient = x // y,
}

SQL

WITH table_0 AS (
  SELECT
    13 AS x,
    5 AS y
)
SELECT
  (x / y) AS quotient,
  (x DIV y) AS int_quotient
FROM
  table_0

Transforms

Transforms are functions that take a relation and produce a relation.

Usually they are chained together into a pipeline, which resembles an SQL query.

Transforms were designed with a focus on modularity, so each fulfills a specific purpose and has defined invariants (properties of the relation that are left unaffected). That’s often referred to as “orthogonality”; its goal is to keep transform functions composable by minimizing the interference of their effects. It also keeps the number of transforms low.

For example, select and derive will not change the number of rows, while filter and take will not change the number of columns.

In SQL, we can see this lack of invariants when an aggregation function is used in the SELECT clause: the number of rows was previously constant, but introducing an aggregation function causes the whole statement to produce only one row (per group).

These are the currently available transforms:

See also

  • from — from is the main way of getting data into a pipeline. It’s not listed above because it’s not technically a transform: it doesn’t receive an input.

Aggregate

Summarizes many rows into one row.

When applied:

  • without group, it produces one row from the whole table,
  • within a group pipeline, it produces one row from each group.

aggregate {expression or assign operations}

Note

Currently, the declared aggregation functions are min, max, count, average, stddev, avg, sum and count_distinct. We are in the process of filling out the standard library.

Examples

PRQL

from employees
aggregate {
  average salary,
  ct = count salary
}

SQL

SELECT
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees

PRQL

from employees
group {title, country} (
  aggregate {
    average salary,
    ct = count salary,
  }
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

Aggregate is required

Unlike in SQL, using an aggregation function in derive or select (or any transform other than aggregate) will not trigger aggregation. By default, PRQL will interpret such functions as window functions:

PRQL

from employees
derive {avg_sal = average salary}

SQL

SELECT
  *,
  AVG(salary) OVER () AS avg_sal
FROM
  employees

This ensures that derive does not manipulate the number of rows, but only ever adds a column. For more information, see window transform.

Append

Concatenates two tables together.

Equivalent to UNION ALL in SQL. The number of rows is always the sum of the number of rows from the two input tables. To replicate UNION DISTINCT, see set operations.

PRQL

from employees_1
append employees_2

SQL

SELECT
  *
FROM
  employees_1
UNION
ALL
SELECT
  *
FROM
  employees_2

Remove

experimental

Removes rows that appear in another relation, like EXCEPT ALL. Duplicate rows are removed one-for-one.
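The one-for-one semantics can be sketched in Python (an illustrative helper, not the compiler’s implementation):

```python
from collections import Counter

def remove_all(top: list, bottom: list) -> list:
    """EXCEPT ALL semantics: each row in `bottom` cancels at most one
    matching row in `top` (duplicates are removed one-for-one)."""
    remaining = Counter(bottom)
    result = []
    for row in top:
        if remaining[row] > 0:
            remaining[row] -= 1
        else:
            result.append(row)
    return result

print(remove_all(["a", "a", "b"], ["a"]))  # ['a', 'b']
```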

PRQL

from employees_1
remove employees_2

SQL

SELECT
  *
FROM
  employees_1 AS t
EXCEPT
  ALL
SELECT
  *
FROM
  employees_2 AS b

Intersection

experimental

PRQL

from employees_1
intersect employees_2

SQL

SELECT
  *
FROM
  employees_1 AS t
INTERSECT
ALL
SELECT
  *
FROM
  employees_2 AS b

Set operations

experimental

To imitate set operations (i.e. UNION, EXCEPT and INTERSECT), you can use the following functions:

let distinct = rel -> (from t = _param.rel | group {t.*} (take 1))
let union = `default_db.bottom` top -> (top | append bottom | distinct)
let except = `default_db.bottom` top -> (top | distinct | remove bottom)
let intersect_distinct = `default_db.bottom` top -> (top | intersect bottom | distinct)

Don’t mind the default_db. prefix; it is a compiler implementation detail for now.
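The semantics of these functions can be sketched over plain Python lists of rows (purely illustrative; rows are modeled as hashable tuples and none of these helpers exist in the PRQL codebase). Note how remove drops duplicates one-for-one, matching EXCEPT ALL:

```python
from collections import Counter

def append(top, bottom):
    """`append`: UNION ALL; keeps every row from both relations."""
    return top + bottom

def distinct(rel):
    """Keep the first occurrence of each row."""
    seen, out = set(), []
    for row in rel:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

def remove(top, bottom):
    """`remove`: EXCEPT ALL; drops duplicate rows one-for-one."""
    counts = Counter(bottom)
    out = []
    for row in top:
        if counts[row] > 0:
            counts[row] -= 1  # this occurrence is cancelled by `bottom`
        else:
            out.append(row)
    return out

def union(top, bottom):
    """UNION DISTINCT, as the `union` function above defines it."""
    return distinct(append(top, bottom))

def except_(top, bottom):
    """EXCEPT DISTINCT, as the `except` function above defines it."""
    return remove(distinct(top), bottom)
```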

Derive

Computes one or more new columns.

derive {
  name = expression,
  # or
  column,
}

Examples

PRQL

from employees
derive gross_salary = salary + payroll_tax

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary
FROM
  employees

PRQL

from employees
derive {
  gross_salary = salary + payroll_tax,
  gross_cost = gross_salary + benefits_cost
}

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary,
  salary + payroll_tax + benefits_cost AS gross_cost
FROM
  employees

Filter

Picks rows based on their values.

filter boolean_expression

Examples

PRQL

from employees
filter age > 25

SQL

SELECT
  *
FROM
  employees
WHERE
  age > 25

PRQL

from employees
filter (age > 25 || department != "IT")

SQL

SELECT
  *
FROM
  employees
WHERE
  age > 25
  OR department <> 'IT'

PRQL

from employees
filter (department | in ["IT", "HR"])

SQL

SELECT
  *
FROM
  employees
WHERE
  department IN ('IT', 'HR')

PRQL

from employees
filter (age | in 25..40)

SQL

SELECT
  *
FROM
  employees
WHERE
  age BETWEEN 25 AND 40

Group

Partitions the rows into groups and applies a pipeline to each of the groups.

group {key_columns} (pipeline)

The partitioning into groups is determined by the key_columns (the first argument).

The most conventional use of group is with aggregate:

PRQL

from employees
group {title, country} (
  aggregate {
    average salary,
    ct = count salary
  }
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

In concept, a transform in the context of a group applies the same transformation to each group as it would to the whole table. For example, finding the employee who joined first across the whole table:

PRQL

from employees
sort join_date
take 1

SQL

SELECT
  *
FROM
  employees
ORDER BY
  join_date
LIMIT
  1

To find the employee who joined first in each department, it’s exactly the same pipeline, but within a group expression:

PRQL

from employees
group role (
  sort join_date  # taken from above
  take 1
)

SQL

WITH table_0 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_0
WHERE
  _expr_0 <= 1
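Conceptually, group partitions the rows by the key columns, runs the pipeline on each partition, and concatenates the results. A minimal Python sketch of these semantics (illustrative only; the `group` function and the lambda pipeline are hypothetical stand-ins for the real transforms):

```python
def group(rows, key, pipeline):
    """Partition rows by the key columns, apply the pipeline to each
    partition, and concatenate the per-group results."""
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[k] for k in key), []).append(r)
    out = []
    for g in groups.values():
        out.extend(pipeline(g))
    return out
```

With this sketch, `group role (sort join_date | take 1)` corresponds to a pipeline that sorts each partition and keeps its first row.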

Join

Adds columns from another relation, matching rows based on a condition.

join side:{inner|left|right|full} rel (condition)

Parameters

  • side specifies which rows to include, defaulting to inner.
  • rel - the relation to join with, possibly including an alias, e.g. a=artists.
  • condition - the criteria on which to match the rows from the two relations. Theoretically, join will produce a cartesian product of the two input relations and then filter the result by the condition. It supports two additional features:
    • Names this & that: alongside the name this, which refers to the first input relation, the condition can use the name that, which refers to the second input relation.
    • Self equality operator: If the condition is an equality comparison between two columns with the same name (i.e. (this.col == that.col)), it can be expressed with only (==col).

Examples

PRQL

from employees
join side:left positions (employees.id==positions.employee_id)

SQL

SELECT
  employees.*,
  positions.*
FROM
  employees
  LEFT JOIN positions ON employees.id = positions.employee_id


PRQL

from employees
join side:left p=positions (employees.id==p.employee_id)

SQL

SELECT
  employees.*,
  p.*
FROM
  employees
  LEFT JOIN positions AS p ON employees.id = p.employee_id


PRQL

from tracks
join side:left artists (
  # This adds a `country` condition, as an alternative to filtering
  artists.id==tracks.artist_id && artists.country=='UK'
)

SQL

SELECT
  tracks.*,
  artists.*
FROM
  tracks
  LEFT JOIN artists ON artists.id = tracks.artist_id
  AND artists.country = 'UK'


In SQL, CROSS JOIN is a join that returns each row from the first relation matched with all rows from the second relation. To accomplish this, we can use the condition true, which will return all rows of the cartesian product of the input relations:

from shirts
join hats true

this & that can be used to refer to the current & other table respectively:

PRQL

from tracks
join side:inner artists (
  this.id==that.artist_id
)

SQL

SELECT
  tracks.*,
  artists.*
FROM
  tracks
  JOIN artists ON tracks.id = artists.artist_id


If the join condition is of the form left.x == right.x, we can use the “self equality operator”:

PRQL

from employees
join positions (==emp_no)

SQL

SELECT
  employees.*,
  positions.*
FROM
  employees
  JOIN positions ON employees.emp_no = positions.emp_no

Loop

Experimental

loop {step_function} {initial_relation}

Iteratively applies the step function to the initial relation until the step returns an empty relation. The result contains the rows of the initial relation and of all intermediate relations.

This behavior can be expressed with the following Python-like pseudo-code:

def loop(step, initial):
    result = []
    current = initial
    while current:  # i.e. while `current` is not empty
        result = result + current  # collect the rows of every intermediate relation
        current = step(current)

    return result

Examples

PRQL

from [{n = 1}]
loop (
    filter n<4
    select n = n+1
)

# returns [1, 2, 3, 4]

SQL

WITH RECURSIVE table_0 AS (
  SELECT
    1 AS n
),
table_1 AS (
  SELECT
    n
  FROM
    table_0
  UNION
  ALL
  SELECT
    n + 1
  FROM
    table_1
  WHERE
    n < 4
)
SELECT
  n
FROM
  table_1 AS table_2

Note

The behavior of WITH RECURSIVE may depend on the database configuration in MySQL. The compiler assumes the behavior described by the Postgres documentation and will not produce correct results for alternative configurations of MySQL.

Note

Currently, loop may produce references to the recursive CTE in sub-queries, which is not supported by some database engines, e.g. SQLite. For now, we suggest step functions are kept simple enough to fit into a single SELECT statement.

Select

Picks and computes columns.

select {
  name = expression,
  # or
  column,
}
# or
select !{column}

Examples

PRQL

from employees
select name = f"{first_name} {last_name}"

SQL

SELECT
  CONCAT(first_name, ' ', last_name) AS name
FROM
  employees

PRQL

from employees
select {
  name = f"{first_name} {last_name}",
  age_eoy = dob - @2022-12-31,
}

SQL

SELECT
  CONCAT(first_name, ' ', last_name) AS name,
  dob - DATE '2022-12-31' AS age_eoy
FROM
  employees

PRQL

from employees
select first_name

SQL

SELECT
  first_name
FROM
  employees

PRQL

from e=employees
select {e.first_name, e.last_name}

SQL

SELECT
  first_name,
  last_name
FROM
  employees AS e

Excluding columns

We can use ! to exclude a list of columns. This can operate in two ways:

  • We use SELECT * EXCLUDE / SELECT * EXCEPT for the columns supplied to select !{} in dialects which support it.
  • Otherwise, the columns must have been defined earlier in the query (unless all of a table’s columns are excluded); for example, in another select or a group transform. In this case, we evaluate and specify the columns that should be included in the output SQL.

Some examples:

PRQL

prql target:sql.bigquery
from tracks
select !{milliseconds, bytes}

SQL

SELECT
  * EXCEPT (milliseconds, bytes)
FROM
  tracks

PRQL

from tracks
select {track_id, title, composer, bytes}
select !{title, composer}

SQL

SELECT
  track_id,
  bytes
FROM
  tracks

PRQL

from artists
derive nick = name
select !{artists.*}

SQL

SELECT
  name AS nick
FROM
  artists

Note that ! is also the NOT operator, so without the tuple it has a different meaning:

PRQL

prql target:sql.bigquery
from tracks
select !is_compilation

SQL

SELECT
  NOT is_compilation
FROM
  tracks

Sort

Orders rows based on the values of one or more expressions (generally columns).

sort {(+|-) column}

Parameters

  • One expression or a tuple of expressions to sort by
  • Each expression can be prefixed with:
    • +, for ascending order, the default
    • -, for descending order
  • When using prefixes, even a single expression needs to be in a tuple or parentheses. (Otherwise, sort -foo is parsed as a subtraction between sort and foo.)

Examples

PRQL

from employees
sort age

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age

PRQL

from employees
sort {-age}

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age DESC

PRQL

from employees
sort {age, -tenure, +salary}

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age,
  tenure DESC,
  salary

We can also use expressions:

PRQL

from employees
sort {s"substr({first_name}, 2, 5)"}

SQL

WITH table_0 AS (
  SELECT
    *,
    substr(first_name, 2, 5) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_0
ORDER BY
  _expr_0

Ordering guarantees

Ordering is persistent through a pipeline in PRQL. For example:

PRQL

from employees
sort tenure
join locations (==employee_id)

SQL

SELECT
  employees.*,
  locations.*
FROM
  employees
  JOIN locations ON employees.employee_id = locations.employee_id
ORDER BY
  employees.tenure

Here, PRQL pushes the sort down the pipeline, compiling the ORDER BY to the end of the query. Consequently, most relation transforms retain the row order.

The explicit semantics are:

  • sort introduces a new order,
  • group resets the order,
  • join retains the order of the left relation,
  • database tables don’t have a known order.

By contrast, relations in SQL have no order; ordering exists only in the context of the query result, a LIMIT clause, or a window function. This lack of inherent order means that a previously ordered relation can be unexpectedly reshuffled by a JOIN or a windowing operation.

Info

To be precise — in PRQL, a relation is an array of tuples and not a set or a bag. The persistent nature of this order remains intact through sub-queries and intermediate table definitions.

For instance, an SQL query such as:

WITH albums_sorted AS (
  SELECT *
  FROM albums
  ORDER BY title
)
SELECT *
FROM albums_sorted
JOIN artists USING (artist_id)

…doesn’t guarantee any row order (indeed — even without the JOIN, the SQL standard doesn’t guarantee an order, although most implementations will respect it).

Specification

This chapter explains PRQL’s semantics: how expressions are interpreted and their meaning. It’s intended for advanced users and compiler contributors.

Null handling

SQL has an unconventional way of handling NULL values, since it treats them as unknown values. As a result, in SQL:

  • NULL is not a value indicating a missing entry, but a placeholder for anything possible,
  • NULL = NULL evaluates to NULL, since one cannot know if one unknown is equal to another unknown,
  • NULL <> NULL evaluates to NULL, by the same logic,
  • to check if a value is NULL, SQL introduces IS NULL and IS NOT NULL operators,
  • DISTINCT column may return multiple NULL values.

For more information, check out the Postgres documentation.

PRQL, on the other hand, treats null as a value, which means that:

  • null == null evaluates to true,
  • null != null evaluates to false,
  • distinct column cannot contain multiple null values.

PRQL

from employees
filter first_name == null
filter null != last_name

SQL

SELECT
  *
FROM
  employees
WHERE
  first_name IS NULL
  AND last_name IS NOT NULL

Note that PRQL doesn’t change how NULL is compared between columns, for example in joins. (PRQL compiles to SQL and so can’t change the behavior of the database).
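As a rough Python analogy (purely illustrative; sql_eq and prql_eq are made-up names), SQL’s comparison propagates the unknown value, while PRQL compares null as an ordinary value:

```python
def sql_eq(a, b):
    """SQL-style equality: comparing with NULL (None here) yields NULL."""
    if a is None or b is None:
        return None
    return a == b

def prql_eq(a, b):
    """PRQL-style equality: null is a value, so null == null is true."""
    return a == b
```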

For more context or to provide feedback check out the discussion on issue #99.

Name resolution

Because PRQL primarily handles relational data, it has specialized scoping rules for referencing columns.

Scopes

In PRQL’s compiler, a scope is the collection of all names one can reference from a specific point in the program.

In PRQL, names in the scope are composed of a namespace and a variable name, separated by a dot, similar to SQL. Namespaces can contain many dots, but variable names cannot.

Example

Name my_table.some_column is a variable some_column from namespace my_table.

Name foo.bar.baz is a variable baz from namespace foo.bar.

When processing a query, a scope is maintained and updated for each point in the query.

It starts with only the namespace std, which is the standard library. It contains common functions like sum or count, along with all transform functions such as derive and group.

In pipelines (or rather, in transform functions), the scope is also injected with the namespaces of tables that may have been referenced with from or join transforms. These namespaces simply contain all the columns of the table and possibly a wildcard variable, which matches any variable (see the algorithm below). Within transforms, there is also a special namespace that does not have a name. It is called the “frame” and it contains the columns of the current table the transform is operating on.

Resolving

For each ident we want to resolve, we search the scope’s items in order. One of three things can happen:

  • Scope contains an exact match, i.e. a name that matches both the namespace and the variable name.

  • Scope does not contain an exact match, but the ident did not specify a namespace, so we can match a namespace that contains a * wildcard. If there’s a single such namespace, it is also updated to contain this new variable name.

  • Otherwise, nothing is matched and an error is raised.
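The three cases can be sketched as a small resolver over a dict mapping namespaces to sets of variable names (a simplified illustration, not the compiler’s actual code):

```python
def resolve(scope, ident):
    """Resolve `ident` against `scope`, a dict of namespace -> set of names."""
    ns, _, name = ident.rpartition(".")
    if ns:
        # Case 1: the ident is qualified, so require an exact match.
        if name in scope.get(ns, set()):
            return f"{ns}.{name}"
        raise LookupError(ident)
    # Unqualified ident: first look for an exact match in some namespace.
    exact = [n for n, names in scope.items() if name in names]
    if len(exact) == 1:
        return f"{exact[0]}.{name}"
    if not exact:
        # Case 2: fall back to a single namespace containing a `*` wildcard;
        # that namespace then learns the new variable name.
        wild = [n for n, names in scope.items() if "*" in names]
        if len(wild) == 1:
            scope[wild[0]].add(name)
            return f"{wild[0]}.{name}"
    # Case 3: nothing matched (or the match was ambiguous).
    raise LookupError(ident)
```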

Translating to SQL

When translating to an SQL statement that references only one table, there is no need to prefix column names with the table name.

PRQL

from employees
select first_name

SQL

SELECT
  first_name
FROM
  employees

But when there are multiple tables and we don’t have complete knowledge of all table columns, a column without a prefix (e.g. first_name) may actually reside in any of several tables. Because of this, we have to use table prefixes for all column names.

PRQL

from employees
derive {first_name, dept_id}
join d=departments (==dept_id)
select {first_name, d.title}

SQL

SELECT
  employees.first_name,
  d.title
FROM
  employees
  JOIN departments AS d ON employees.dept_id = d.dept_id

As you can see, employees.first_name now needs a table prefix, to prevent conflicts with a potential column of the same name in the departments table. Similarly, d.title needs the table prefix.

Modules

Warning

The module facility is in discussion. This page documents our understanding of the way the final PRQL compiler will likely work. The PRQL compiler currently uses these techniques to compile the std, date, text, and math modules into the language.

However, at this time (Spring 2024), the module facility does not work within a PRQL query itself. That is, a module statement in a query cannot import files from the local file system.

Design goals for modules:

  1. Allow importing declarations from other files.

  2. Have namespaces for things like std.

  3. Have a hierarchical structure so we can represent files in directories.

  4. Have an unambiguous module structure within a project.

Definition

A module is a namespace that contains declarations. A module is itself a declaration, which means that it can contain nested child modules.

This means that modules form a tree graph, which we call “the module structure”.

For the sake of this document, we will express the module structure with module keyword and a code block encased in curly braces:

module my_playlists {
    let bangers = ... # a declaration

    module soundtracks {
        let movie_albums = ... # another declaration
    }
}

The syntax module name { ...decls... } is not part of the PRQL language; it was rejected as unnecessary, since it only adds more ways of defining modules. If a significant upside of this syntax is found, it may be added in the future.

Name resolution

Any declarations within a module can be referenced from the outside of the module:

# using module structure declared above
module my_playlists

let great_tracks = my_playlists.bangers

let movie_scores = my_playlists.soundtracks.movie_albums

Identifiers are resolved relative to the current module.

module my_playlists {
    module soundtracks {
        let movie_albums = (from albums | filter id == 3)
    }

    from soundtracks.movie_albums
}
from my_playlists.soundtracks.movie_albums

If an identifier cannot be resolved relative to the current module, it tries to resolve relative to the parent module. This is repeated, stepping up the module hierarchy until a match is found or the root of the module structure is reached.

module my_playlists {
    let decl_1 = ...

    module soundtracks {
        let decl_2 = ...
    }

    module upbeat_rock {
        let decl_3 = ...

        from decl_1 | join soundtracks.decl_2 | join decl_3
    }
}
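This climbing behavior can be sketched in Python (a hypothetical illustration; the module structure is modeled as nested dicts, with declarations as None leaves):

```python
def resolve_in_modules(module_path, tree, ident):
    """Resolve `ident` relative to `module_path`, stepping up to parent
    modules until the walk succeeds or the root is exhausted.

    Returns the fully-qualified name as a list of path segments.
    """
    parts = ident.split(".")
    for depth in range(len(module_path), -1, -1):
        # Navigate to the candidate base module...
        node = tree
        for m in module_path[:depth]:
            node = node[m]
        # ...then try to walk the ident from there.
        try:
            for p in parts:
                node = node[p]
        except (KeyError, TypeError):
            continue  # not found here; step up to the parent module
        return module_path[:depth] + parts
    raise LookupError(ident)
```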

Main var declaration

The final variable declaration in a module can omit the leading let main = and acquire an implicit name main.

module my_playlists {
    let bangers = (from tracks | take 10)

    from playlists | take 10
}

let album_titles = my_playlists.main

When a module is referenced as a value, the main variable is used instead. This is especially useful when referring to a module which is to be compiled to RQ (and later SQL).

# last line from previous example could thus be shortened to:
let album_titles = my_playlists

File importing

Warning

The examples below do not work. At this time (Spring 2024), the module facility does not work within a PRQL query itself. That is, a module statement in a query cannot import files from the local file system.

To include PRQL source code from other files, we can use the following syntax:

module my_playlists

This loads either ./my_playlists.prql (a leaf module) or ./my_playlists/_my_playlists.prql (a directory module) and uses its contents as module my_playlists. If none or both of the files are present, a compilation error is raised.
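The loading rule can be sketched as follows (an illustration only; locate_module is a made-up helper, not part of prqlc):

```python
import os

def locate_module(base_dir, name):
    """Map `module name` to a file: either a leaf module or a directory
    module must exist, but never both and never neither."""
    leaf = os.path.join(base_dir, f"{name}.prql")
    directory = os.path.join(base_dir, name, f"_{name}.prql")
    candidates = [p for p in (leaf, directory) if os.path.exists(p)]
    if len(candidates) != 1:
        # None or both present: compilation error.
        raise FileNotFoundError(f"expected exactly one of {leaf!r} or {directory!r}")
    return candidates[0]
```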

Only directory modules can contain module declarations. If a leaf module contains a module declaration, a compilation error is raised, suggesting that the leaf module be converted into a directory module. This is a step toward any module structure having a single “normalized” representation in the file system. Such normalization is desirable because it restrains the possible file system layouts to a comprehensible and predictable layout, without sacrificing any functionality.

The importing rules described above don’t achieve this “single normalized representation” in full, since any leaf module could be replaced by a directory module with zero submodules, without any semantic changes. Restricting directory modules to have at least one sub-module would not improve approachability enough to justify adding this restriction.

For example, the following module structure is annotated with the file names in which the modules would reside:


module my_project {
    # _my_project.prql

    module sales {
        # sales.prql
    }

    module projections {
        # projections/_projections.prql

        module year_2023 {
            # projections/year_2023.prql
        }

        module year_2024 {
            # projections/year_2024.prql
        }
    }
}

If module my_project.sales wants to add a submodule util, it has to be converted to a directory module. This means that it has to be moved to sales/_sales.prql. The new module would reside in sales/util.prql.

The annotated layout is not the only possible layout for this module structure, since any of the modules sales, year_2023 or year_2024 could be converted into a directory module with zero sub-modules.

Point 4 of design goals means that each declaration within a project has a single fully-qualified name within this project. This is ensured by strict rules regarding importing files and the fact that the module structure is a tree.

Declaration order

The order of declarations in a module holds no semantic value, except the “last main variable”.

References between modules can be cyclic.

module mod_a {
    let decl_a_1 = ...
    let decl_a_2 = (from mod_b.decl_b | take 10)
}
module mod_b {
    let decl_b = (from mod_a.decl_a | take 10)
}

References between variable declarations cannot be cyclic.

let decl_a = (from decl_b)
let decl_b = (from decl_a) # error: cyclic reference
module mod_a {
    let decl_a = (from mod_b.decl_b)
}
module mod_b {
    let decl_b = (from mod_a.decl_a) # error: cyclic reference
}

Compiler interface

prqlc provides two interfaces for compiling files.

Multi-file interface requires three arguments:

  • path to the file containing the module which is the root of the module structure,
  • identifier of the pipeline that should be compiled to RQ (this can also be an identifier of a module that has a main pipeline) and,
  • a “file loader”, which can load files on-demand.

The path to the root module can be automatically detected by searching for .prql files starting with _ in the current working directory.

Example prqlc usage:

$ prqlc compile _project.prql sales.projections.orders_2024
$ prqlc compile sales.projections.orders_2024

Single-file interface requires a single argument: the PRQL source. Any attempts to load modules in this mode result in compilation errors. This interface is needed, for example, when integrating the compiler with a database connector (e.g. JDBC) where no other files can be loaded.

Built-in module structure

As noted above, this facility is in discussion.

# root module of every project
module project {
	module std {
		let sum = a -> ...
		let mean = a -> ...
	}

	module default_db {
		# all inferred tables and defined CTEs
	}

	let main = (
		from t = tracks
		select [track_id, title]
	)
}

Example

Warning

The examples below do not work. At this time (Spring 2024), the module facility does not work within a PRQL query itself. That is, a module statement in a query cannot import files from the local file system.

This is an example project, where each of code block is a separate file.

# _project.prql

module employees
module sales
module util
# employees.prql

let employees = (...)

let salaries = (...)

let departments = (...)
# sales/_sales.prql

module orders
module projections

let revenue_by_source = (...)
# sales/orders.prql

let current_year = (...)

let archived = (...)

let by_employee = (from orders | join employees.employees ...)
# sales/projections.prql

let orders_2023 = (from orders.current_year | append orders.archived)

let orders_2024 = (...)
# util.prql

let pretty_print_num = col -> (...)


Type system

Status: under development

The type system determines the allowed values of a term.

Purpose

Each SQL DBMS has its own type system. Thanks to the SQL standard, they are very similar, but they have key differences regardless. For example, SQLite does not have a type for dates, times or timestamps, but it has functions for handling date and time that take ISO 8601 strings or integers that represent Unix timestamps. So it does support most of what is possible to do with dates in other dialects, even though it stores data with a different physical layout and uses different functions to achieve that.

PRQL’s task is to define its own description of data formats, just as it already defines common data transformations.

This is done in two steps:

  1. Define PRQL’s Type System (PTS), following principles we think a relational language should have (and not fixate on what existing SQL DBMSs have).

  2. Define a mapping between SQL Type System (STS) and PTS, for each of the DBMSs. Ideally we’d want that to be a bijection, so each type in PTS would be represented by a single type in STS and vice-versa. Unfortunately this is not entirely possible, as shown below.

In practical terms, we want for a user to be able to:

  • … express types of their database with PRQL (map their STS into PTS). In some cases, we may have to say “your database is not representable with PRQL, change it or use only a subset of it”. An example of what we don’t want to support are arrays with arbitrary indexes in Postgres (e.g. 2-based indexes for arrays).

    This task of mapping to PTS could be automated by an LSP server, by introspecting the user’s SQL database and generating PRQL source.

  • … express their SQL queries in PRQL. Again, using the mapping from STS to PTS, one should be able to express any SQL operation in PRQL.

    For example, translate MSSQL DATEDIFF to subtraction operator - in PRQL.

    For now, this mapping is manual, but should be documented and may be automated.

  • … use any PRQL feature in their database. Here we are mapping from PTS into an arbitrary STS.

    For example, translate PRQL’s datetime operations to use TEXT in SQLite.

    As of now, prqlc already does a good job of automatically doing this mapping.

Example of the mapping between PTS and two STSs:

PTS         STS Postgres   STS SQLite
int32       integer        INTEGER
int64       bigint         INTEGER
timestamp   timestamp      TEXT

Principles

Algebraic types - have a way of expressing sum and product types. In Rust, a sum would be an enum and a product would be a tuple or a struct. In SQL, a product would be a row, since it can contain different types, all at once. A sum would be harder to express, see this post.

The value proposition here is that algebraic types give a lot of modeling flexibility while being conceptually simple.

Composable - as with transformations, we’d want types to compose together.

Using Python, JavaScript, C++ or Rust, one could define many different data structures that would correspond to our idea of “relation”. Most of them would be an object/struct that has column names and types and then a generic array of arrays for rows.

PRQL’s type system should also be able to express relations as composed from primitive types, but have only one idiomatic way of doing so.

In practice, this means that builtin types include only primitives (int, text, bool, float), tuple (for product), enum (for sum) and array (for repeating). An SQL row translates to a tuple, and a relation translates to an array of tuples.

Composability also leads to a minimal type system, which does not differentiate between tuples, objects and structs. A single product type is enough.

No subtyping - avoid super types and inheritance.

Subtyping is a natural extension to a type system, where a type can be a super type of some other type. This is the base mechanism of Object-Oriented Programming, and is also present in most dynamically typed languages. For example, a type number might be a super type of int and float.

PTS does not have subtyping, because it requires dynamic dispatch and because it adds unnecessary complexity to generic type arguments.

Dynamic dispatch is a mechanism that would be able, for example, to call the appropriate to_string function for each element of an array of number, since such an array contains elements of both type int and type float, with different to_string implementations.

Definition

For any undefined terms used in this section, refer to set theory and mathematical definitions in general.

A “type of a variable” is a “set of all possible values of that variable”.

Primitives

At the moment of writing, PRQL defines the following primitive types: int, float, bool, text, date, time and timestamp. New primitive types will be added in the future and some existing types might be split into smaller subsets (see the section “Splitting primitives”).

Tuples

Tuple type is a product type.

It contains n ordered fields, where n is known at compile-time. Each field has a type itself and an optional name. Fields are not necessarily of the same type.

In other languages, similar constructs are named record, struct, tuple, named tuple or (data)class.

type my_row = {id = int, bool, name = text}

Arrays

Array is a container type that contains n ordered fields, where n is not known at compile-time. All fields are of the same type and cannot be named.

type array_of_int = [int]

Functions

type floor_signature = func float -> int

Union

type status = (
  paid = () ||
  unpaid = float ||
  {reason = text, cancelled_at = timestamp} ||
)

This is “a sum type”.

Type annotations

Variable annotations and function parameters may specify type annotations:

let a <t> = x

The value of x (and thus a) must be an element of t.

let my_func = func x <t> -> y

The value of argument supplied to x must be an element of t.

let my_func = func x -> <t> y

The value of function body y must be an element of t.

Physical layout

A logical type is the user-facing notion of a type; it is the building block of the type system.

Physical layout is the underlying memory layout of the data represented by a variable.

In many programming languages, physical layout of a logical type is dependent on the target platform. Similarly, physical layout of a PRQL logical type is dependent on representation of that type in the target STS.

PTS logical type  --->  STS logical type  ---> STS physical layout

Note that not all STS types have a single physical layout. For example, Postgres has a logical (pseudo)type anyelement, which is a super type of any data type. It can be used as a function parameter type, but it does not have a single physical layout, so it cannot be used in a column declaration.

For now, PRQL does not define physical layouts of any type. This is not needed, since PRQL is not used for DDL (see the section “Splitting primitives”) and does not support raw access to the underlying memory.

As a consequence, results of a PRQL query cannot be robustly compared across DBMSs, since the physical layout of the result will vary.

In the future, PRQL may define a common physical layout of types, probably using Apache Arrow.

Examples

type my_relation = [{
	id = int,
	title = text,
	age = int
}]

type invoices = [{
    invoice_id = int64,
    issued_at = timestamp,
    labels = [text]

    #[repr(json)]
    items = [{
        article_id = int64,
        count = int16 where x -> x >= 1,
    }],
    paid_by_user_id = (int64 || null),
}]

Appendix

Splitting primitives

This document mentions int32 and int64 as distinct types, but there is no need for that in the initial implementation. The built-in int can associate with all operations on integers and translate PRQL to valid SQL regardless of the size of the integer. Later, int can be replaced by int8, int16, int32, int64.

The general rule for “when to make a distinction between types” would be “as soon as the types carry different information and we find an operation that would be expressed differently”. In this example, that would require some operation on int32 to have different syntax than the same operation over int64.

We can have such a relaxed rule because PRQL is not aiming to be a Data Definition Language and does not have to bother with the exact physical layout of types.

Type representations

There are cases where a PTS type has multiple possible and valid representations in some STSs.

For such cases, we’d want to support the use of alternative representations for storing data, but also application of any function that is defined for the original type.

Using SQLite as an example again, users may have some temporal data stored as INTEGER unix timestamp and some as TEXT that contains ISO 8601 without timezone. From the user’s perspective, both of these types are timestamps and should be declared as such. But when compiling operations over these types to SQL, the compiler should consider their different representations in STS. For example a difference between two timestamps timestamp - timestamp can be translated to a normal int subtraction for INTEGER repr, but must apply SQLite’s function unixepoch when dealing with TEXT repr.

Table declarations should therefore support annotations that give hints about which representation is used:

table foo {
    #[repr(text)]
    created_at: timestamp,
}

A similar example is an “array of strings” type in PTS that could be represented by text[] (if the DBMS supports arrays), by json, or by its variant jsonb in Postgres. Again, the representation would affect operators: in Postgres, arrays would be accessed with my_array[1] while json arrays would use my_json_array -> 1. This example may not be applicable if we decide that we want a separate JSON type in PTS.
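
The same repr-dependent lowering applies here; a minimal sketch, assuming hypothetical repr names and a 1-based index at the PRQL level (as in the my_array[1] example above):

```python
def pg_array_get(column: str, index: int, repr: str) -> str:
    """Illustrative sketch: Postgres SQL for reading element `index`
    (1-based) of a logical array column, by storage repr."""
    if repr == "array":
        # Native arrays (e.g. text[]) are 1-indexed in Postgres.
        return f"{column}[{index}]"
    if repr in ("json", "jsonb"):
        # json/jsonb arrays use the -> operator and are 0-indexed.
        return f"{column} -> {index - 1}"
    raise ValueError(f"unknown repr: {repr}")
```

Note that the two representations don’t just differ in syntax: native arrays and json arrays even disagree on index origin, which is exactly the kind of detail a repr-aware compiler has to absorb.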

RQ functions, targets and reprs

This part talks about the technical implementation, not the language itself.

Idea

RQ contains a single node kind for expressing operations and functions: BuiltInFunction (may be renamed in the future).

It is a bottleneck that we can leverage when trying to affect how an operator or a function interacts with different type representations on different targets.

The idea is to implement BuiltInFunction multiple times and annotate each implementation with its intended target and parameter representations. Then we can teach the compiler to pick the implementation that suits the current repr and compilation target.

Specifics

The RQ specification is an interface that contains functions, identified by name (e.g. std.int8.add). These functions have typed parameters and a return value. If an RQ function call does not match the function declaration in the number or types of its parameters, this is considered an invalid RQ AST.

We provide multiple implementations for each RQ function. They are annotated with a target (e.g. #[target(sql.sqlite)]) and have their params annotated with type reprs (e.g. #[repr(int)]).

# using a made-up syntax

#[target(sql.sqlite)]
func std.int8.add
    #[repr(int8)] x
    #[repr(int8)] y
    -> s"{x} + {y}"
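
In compiler terms, picking the right implementation amounts to a lookup keyed by function name, target, and parameter repr. A minimal sketch of that dispatch, with a hypothetical registry and SQL templates (not the actual compiler data structures):

```python
# Hypothetical registry: (function name, target, param repr) -> SQL template.
IMPLS = {
    ("std.int8.add", "sql.sqlite", "int8"): "{x} + {y}",
    ("std.int8.add", "sql.postgres", "int8"): "{x} + {y}",
}

def pick_impl(name: str, target: str, repr: str) -> str:
    """Pick the implementation that suits the current repr and target."""
    impl = IMPLS.get((name, target, repr))
    if impl is None:
        raise NotImplementedError(f"no impl of {name} for {target}/{repr}")
    return impl
```

A call like `pick_impl("std.int8.add", "sql.sqlite", "int8")` returns the template to splice the argument SQL into, mirroring the s-string in the made-up syntax above.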

Each RQ type has one canonical repr that serves as the reference implementation for other reprs and indicates the amount of contained data (e.g. 1 bit, 8 bits, 64 bits).

Example

Let’s say, for example, that we’d want to support 8-bit integer arithmetic, and that we’d want the result of 127 + 1 to be -128 (ideally we’d handle this better, but bear with me for the sake of the example). Because some RDBMSs (such as SQLite) don’t support 8-bit numbers and do all their integer computation with 64-bit numbers, we need to implement an alternative type representation for that target.

The logical type int8 could have the following two reprs:

  • canonical repr_int8 that contains 8 bits in two’s complement, covering integer values in range -128 to 127 (inclusive),
  • repr_int64 that contains 64 bits of data, but is using only the values that are also covered by repr_int8.

Now we’d implement the function std.int8.add for each of the reprs. Let’s assume that the int8 implementation is straightforward and that databases don’t just change the data type when a number overflows. The impl for int64 requires a CASE statement that checks whether the value would overflow and subtracts 256 in that case.

The goal here is that the results of the two impls are equivalent. To validate that, we also need a way to convert between the reprs, or, alternatively, a to_string function implemented for both reprs.
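
The wrap-around behavior the int64 repr must emulate can be checked in ordinary code. A sketch of the logic the CASE statement would encode (the function name is illustrative):

```python
def int8_add_via_int64(x: int, y: int) -> int:
    """Emulate std.int8.add on the repr_int64 representation:
    ordinary addition, then a wrap-around adjustment mirroring the
    CASE expression a backend without 8-bit integers would emit."""
    s = x + y
    if s > 127:        # overflow past the top of int8's range
        s -= 256
    elif s < -128:     # underflow past the bottom
        s += 256
    return s

# 127 + 1 wraps to -128, matching the canonical two's-complement repr.
assert int8_add_via_int64(127, 1) == -128
```

The assertion is exactly the equivalence we’d want to validate between the two impls: for all int8 inputs, the int64 emulation and the canonical two’s-complement repr agree.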

PRQL Changelog

0.13.2

0.13.2 is a tiny release to fix an issue publishing 0.13.1 to crates.io.

0.13.1

0.13.1 is a small release containing a few bug fixes and improvements. Velocity has slowed down a bit in recent months, but we’re still hoping to finish the new resolver and the new formatter in the near future.

It has 97 commits from 10 contributors. Selected changes:

Features:

  • Add an option to the experimental documentation generator to output the docs in HTML format, passed as --format=html. (@vanillajonathan, #4791)

  • The version of the library is now read from git describe. This doesn’t affect libraries built on git tags (such as our releases), only those built when developing. When reporting bugs, this helps identify the exact version. (@max-sixty & @m-span, #4804)

Fixes:

  • Raw strings (r"...") are retained through prqlc fmt (@max-sixty, #4848)

  • Strings containing an odd contiguous number of quotes are now delimited by an odd number of quotes when being formatted. The previous implementation would use an even number, which is invalid PRQL. (@max-sixty, #4850)

  • A few more keywords are quoted, such as user, which is a reserved keyword in PostgreSQL. (@max-sixty)

0.13.0 — 2024-07-25

0.13.0 brings a new debug logging framework, a big refactor of the parser, a new highlighter, an ** operator for exponentiation, a few bug fixes, and lots of other changes. It has 153 commits from 11 contributors.

Our work continues on rewriting the resolver and completing prqlc fmt.

Selected changes:

Language:

  • Parentheses are always required around pipelines, even within tuples. For example:

    from artists
    # These parentheses are now required
    derive {a=(b | math.abs)}
    # No change — doesn't affect expressions or function calls without pipelines
    derive {x = 3 + 4}
    

    This is a small breaking change. The new behavior matches the existing documentation. (@max-sixty, #4775)

  • A new ** operator for exponentiation. (@aljazerzen & @max-sixty, #4125)

Features:

  • prqlc compile --debug-log=log.html will generate an HTML file with a detailed log of the compilation process. (@aljazerzen, #4646)
  • Added prqlc debug json-schema command to auto-generate JSON Schema representations of commonly exposed IR types such as PL and RQ. (@kgutwin, #4698)
  • Add documentation comments to the output of the documentation generator. (@vanillajonathan, #4729)
  • Add CLI syntax highlighting to prqlc. You can try it as prqlc experimental highlight example.prql. (@vanillajonathan, #4755)

Fixes:

  • Using in with an empty array pattern (e.g. expr | in []) will now output a constant false condition instead of an expr IN (), which is syntactically invalid in some SQL dialects (@Globidev, #4598)

Integrations:

  • The Snap package previously released on the edge channel is now released on the stable channel. (@vanillajonathan, #4784)

Internal changes:

  • Major reorganization of prqlc-parser: prqlc-ast is merged into prqlc-parser, and prqlc-parser’s files are rearranged, including its exports. This is part of an effort to modularize the compiler by stage, reducing the amount of context that’s required to understand a single stage. There will likely be some further changes (more detail in the PR description). (@m-span, #4634)

    • This is a breaking change for any libraries that depend on prqlc-parser (which should be fairly rare).
  • Renamed prql-compiler-macros to prqlc-macros for consistency with other crates (@max-sixty, #4565)

  • prql-compiler, the old name for prqlc, is removed as a facade to prqlc. It had been deprecated for a few versions and will no longer be updated. (@max-sixty)

  • New benchmarks (@max-sixty, #4654)

New Contributors:

  • @Globidev, with #4598

0.12.2 — 2024-06-10

0.12.2 is a very small release which renames prql-js to prqlc-js to match our standard naming scheme. Within Node, the package is imported as prqlc.

It also fixes a mistake in the prqlc-python release pipeline.

0.12.1 — 2024-06-09

0.12.1 is a tiny hotfix release which fixes how intra-prql crate dependencies were specified.

0.12.0 — 2024-06-08

0.12.0 contains a few months of smaller features. Our focus has been on rewriting the resolver, an effort that is still ongoing.

It has 239 commits from 12 contributors. Selected changes (most are not listed here, possibly we should be more conscientious about adding them…):

Features:

  • Add prqlc lex command to the CLI (@max-sixty)
  • Add prqlc debug lineage command to the CLI, creating an expression lineage graph from a query (@kgutwin, #4533)
  • Initial implementation of an experimental documentation generator that generates Markdown documentation from .prql files. (@vanillajonathan, #4152).
  • Join’s side parameter can take a reference that resolves to a literal (note: this is an experimental feature which may change in the future) (@kgutwin, #4499)

Fixes:

  • Support expressions on left hand side of std.in operator. (@kgutwin, #4498)
  • Prevent panic for from {} and std (@m-span, #4538)

Web:

  • The browser dist files are now built with wasm-pack’s web target. As a result, they should be usable as ES Modules, through JS CDNs, and for example with Observable Framework (@srenatus, #4274).

Integrations:

  • The syntax highlighter package for Sublime Text is now published (@vanillajonathan).
  • The VSCode Great Icons icon pack extension shows a database icon for .prql files. (@EmmanuelBeziat)
  • Tokei, a source lines of code counter, now has support for .prql files. (@vanillajonathan)
  • Add syntax highlight file for the micro text editor. (@vanillajonathan)

New Contributors:

  • @srenatus, with #4274
  • @jacquayj, with #4332
  • @pdelewski, with #4337
  • @m-span, with #4422
  • @kgutwin, with #4498

0.11.4 — 2024-02-25

0.11.4 is a hotfix release, fixing a CI issue that caused the CLI binaries to be built without the cli feature.

0.11.3 — 2024-02-10

0.11.3 is a very small release, mostly a rename of the Python bindings.

The release has 13 commits from 4 contributors.

Internal changes:

  • As part of making our names more consistent, the Python bindings are renamed. prql-python becomes a package published and importable as prqlc. The internal Rust crate is named prqlc-python.

0.11.2 — 2024-02-07

0.11.2 contains lots of internal changes, lots of syntax highlighting, and the beginning of lutra, a query runner.

This release has 122 commits from 9 contributors. Selected changes:

Features:

  • Initial implementation of lutra, a query runner. (@aljazerzen, #4182, #4174, #4134)
  • prqlc fmt works on projects with multiple files. (@max-sixty, #4028)

Fixes:

  • Reduce stack memory usage (@aljazerzen, #4103)

Integrations:

  • Add syntax highlight file for GtkSourceView. (@vanillajonathan, #4062)
  • Add syntax highlight file for CotEditor. (@vanillajonathan)
  • Add syntax highlight file for Sublime Text. (@vanillajonathan, #4127)
  • sloc, a source lines of code counter, now has support for .prql files. (@vanillajonathan)

Internal changes:

  • prql-compiler has been renamed to prqlc, and we’ve established a more consistent naming scheme. The existing crate will still be published, re-exporting prqlc, so no dependencies will break. A future version will add a deprecation warning.
  • The prqlc-clib crate was renamed to prqlc-c, and associated artifacts were renamed. We’re trying to make names consistent (ideally for the final time!), and have a plan to rename some other bindings. (@max-sixty, #4077)
  • Add lots of whitespace items to the lexer, in preparation for the completion of prqlc fmt (@max-sixty, #4109, #4105)
  • Table declarations (@aljazerzen, #4126)

New Contributors:

  • @kaspermarstal, with #4124

0.11.1 — 2023-12-26

0.11.1 fixes a couple of small bugs; it comes a few days after 0.11.

This release has 16 commits from 6 contributors. Selected changes:

Features:

  • Infer the type of array literals to be the union of types of its items. (@aljazerzen, #3989)
  • A prql module is added and the prql_version function is renamed to prql.version. The old prql_version function is deprecated and will be removed in a future release. (@eitsupi, #4006)

Fixes:

  • Do not compile to DISTINCT ON when take n is used with group for the targets clickhouse, duckdb and postgres. (@PrettyWood, #3988)
  • Fix take n rows for mssql dialect by switching from TOP to FETCH (@PrettyWood, #3994)

0.11.0 — 2023-12-19

0.11.0 introduces new date, text & math modules with lots of standard functions, including a new date.to_text function. It contains a few bug fixes, and lots of internal improvements to the compiler.

This release has 119 commits from 9 contributors. Selected changes:

Language:

  • Breaking: group’s by columns are now excluded from the partition. (#3490)
  • Breaking: round is now in the math module and needs to be called via math.round. (#3928)
  • Breaking: lower and upper are now in the text module and need to be called via text.lower and text.upper. (#3913, #3973)

Features:

  • The std.in function now supports a list of values (@PrettyWood, #3883)
  • Most standard mathematical functions are now supported: abs, floor, ceil, pi, exp, ln, log10, log, sqrt, degrees, radians, cos, acos, sin, asin, tan, atan, pow and round.
    Those functions are in the math module (@PrettyWood, #3909, #3916 & #3928)
  • Most standard string functions are now supported: ltrim, rtrim, trim, length, extract, replace. Utility functions starts_with, contains and ends_with are also available.
    Those functions are in the text module (@PrettyWood, #3913, #3973)
  • Formatting a date to a text is now available for Clickhouse, DuckDB, MySQL, MSSQL and Postgres. A new date module has been added with the to_text function (@PrettyWood, #3951, #3954 & #3955)

Fixes:

  • Fix an issue with arithmetic precedence (@max-sixty, #3846)
  • + and - can be used after a cast (@PrettyWood, #3923)
  • The Lezer grammar had plenty of improvements and fixes. (@vanillajonathan)

Web:

  • The Playground now uses Vite. (@vanillajonathan)

Internal changes:

  • Bump prql-compiler’s MSRV to 1.70.0 (@eitsupi, #3876)

New Contributors:

  • @PrettyWood, with #3883

0.10.1 — 2023-11-14

0.10.1 is a small release containing some internal fixes of the compiler.

This release has 36 commits from 7 contributors. Selected changes:

Features:

  • The std.sql.read_csv function and the std.sql.read_parquet function support the sql.glaredb target. (@eitsupi, #3749)

Fixes:

  • Fix the bug of compiling to DISTINCT ON when take 1 is used with group by for the targets sql.clickhouse, sql.duckdb and sql.postgres. (@aljazerzen, #3792)

Integrations:

  • Enable integration tests for GlareDB. (@eitsupi, #3749)
  • trapd00r/LS_COLORS, a collection of LS_COLORS definitions, colorizes .prql files. (@vanillajonathan)
  • vivid, a themeable LS_COLORS generator, colorizes .prql files. (@vanillajonathan)
  • colorls displays .prql files with a database icon. (@vanillajonathan)
  • Emoji File Icons, a VS Code extension, displays .prql files with a database emoji icon. (@vanillajonathan)
  • eza, a modern ls replacement, colorizes .prql files. (@vanillajonathan)
  • lsd, a next-gen ls command, displays .prql files with a database icon. (@vanillajonathan)

0.10.0 — 2023-10-26

0.10.0 contains lots of small improvements, including support for new types of literal notation, support for read_* functions in more dialects, playground improvements, and a better Lezer grammar (which we’re planning on using for a Jupyter extension).

This release has 155 commits from 9 contributors. Selected changes:

Language:

  • Breaking: Case syntax now uses brackets [] rather than braces {}. To convert previous PRQL queries to this new syntax simply change case { ... } to case [ ... ]. (@AaronMoat, #3517)

Features:

  • Breaking: The std.sql.read_csv function is now compiled to read_csv by default. Please set the target sql.duckdb to use the DuckDB’s read_csv_auto function as previously. (@eitsupi, #3599)
  • Breaking: The std.every function is renamed to std.all (@aljazerzen, #3703)
  • The std.sql.read_csv function and the std.sql.read_parquet function support the sql.clickhouse target. (@eitsupi, #1533)
  • Add std.prql_version function to return PRQL version (@hulxv, #3533)
  • A new type anytype is added. (@aljazerzen, #3703)
  • Add support for hex escape sequences in strings. Example "Hello \x51". (@vanillajonathan, #3568)
  • Add support for long Unicode escape sequences. Example "Hello \u{01F422}". (@vanillajonathan, #3569)
  • Add support for binary numerical notation. Example filter status == 0b1111000011110000. (@vanillajonathan, #3661)
  • Add support for hexadecimal numerical notation. Example filter status == 0xff. (@vanillajonathan, #3654)
  • Add support for octal numerical notation. Example filter status == 0o777. (@vanillajonathan, #3672)
  • New compile target sql.glaredb for GlareDB and integration tests for it (However, there is a bug in the test and it is currently not running). (@universalmind303, @scsmithr, @eitsupi, #3669)

Web:

  • Allow cmd-/ (Mac) or ctrl-/ (Windows) to toggle comments in the playground editor (@AaronMoat, #3522)

  • Limit maximum height of the playground editor’s error panel to avoid taking over whole screen (@AaronMoat, #3524)

  • The playground now uses Vite (@vanillajonathan).

Integrations:

  • Add a CLI command prqlc collect to collect a project’s modules into a single file (@aljazerzen, #3739)
  • Add a CLI command prqlc debug expand-pl to parse and expand into PL without resolving (@aljazerzen, #3739)
  • Bump prqlc’s MSRV to 1.70.0 (@eitsupi, #3521)
  • Pygments, a syntax highlighting library, now has syntax highlighting for PRQL. (@vanillajonathan, #3564)
  • chroma, a syntax highlighting library written in Go and used by the static website generator Hugo, now has syntax highlighting for PRQL. (@vanillajonathan, #3597)
  • scc, a source lines of code counter, now has support for .prql files. (@vanillajonathan)
  • gcloc, a source lines of code counter, now has support for .prql files. (@vanillajonathan)
  • cloc, a source lines of code counter, now has support for .prql files. (@AlDanial)
  • gocloc, a source lines of code counter, now has support for .prql files. (@vanillajonathan)
  • The Quarto VS Code extension supports editing PRQL code blocks (prqlr is required to render Quarto Markdown with PRQL code blocks). (@jjallaire)

Internal:

  • Rename some of the internal crates, and refactored their paths in the repo. (@aljazerzen, #3683).
  • Add a justfile for developers who prefer that above our Taskfile.yaml (@aljazerzen, #3681)

New Contributors:

  • @hulxv, with #3533
  • @AaronMoat, with #3522
  • @jangorecki, with #3634

0.9.5 — 2023-09-16

0.9.5 adds a line-wrapping character, fixes a few bugs, and improves our CI. The release has 77 commits from 8 contributors. Selected changes are below.

Look out for some conference talks coming up over the next few weeks, including QCon SF on Oct 2 and data2day on Oct 12.

Language:

  • A new line-wrapping character, for long lines that we want to break up into multiple physical lines. This is slightly different from many languages — it goes on the subsequent line:

    from artists
    select is_europe =
    \ country == "DE"
    \ || country == "FR"
    \ || country == "ES"
    

    This allows for easily commenting out physical lines while maintaining a correct logical line; for example:

    from artists
    select is_europe =
    \ country == "DE"
    \ || country == "FR"
    \ || country == "FR"
    -\ || country == "ES"
    +#\ || country == "ES"
    

    (@max-sixty, #3408)

Fixes:

  • Fix stack overflow on very long queries in Windows debug builds (@max-sixty, #2908)

  • Fix panic when unresolved lineage appears in group or window (@davidot, #3266)

  • Fix a corner-case in handling precedence, and remove unneeded parentheses in some outputs (@max-sixty, #3472)

Web:

  • Compiler panics are now printed to the console (@max-sixty, #3446)

Integrations:

  • Ace, the JavaScript code editor, now has syntax highlighting for PRQL. (@vanillajonathan, #3493)

Internal changes:

  • Simplify & speed up lexer (@max-sixty, #3426, #3418)

New Contributors:

  • @davidot, with #3450

0.9.4 — 2023-08-24

0.9.4 is a small release with some improvements and bug fixes in the compiler and prqlc. And, the documentation and CI are continually being improved.

This release has 110 commits from 9 contributors. Selected changes:

Features:

  • Strings can be delimited with any odd number of quote characters. The logic for lexing quotes is now simpler and slightly faster. Escapes in single-quote-delimited strings escape single-quotes rather than double-quotes. (@max-sixty, #3274)

Fixes:

  • S-strings within double braces now parse correctly (@max-sixty, #3265)

Documentation:

  • New docs for strings (@max-sixty, #3281)

Web:

  • Improve syntax highlighting for numbers in the book & website (@max-sixty, #3261)
  • Add ClickHouse integration to docs (@max-sixty, #3251)

Integrations:

  • prqlc no longer displays a prompt when piping a query into its stdin (@max-sixty, #3248).
  • Add a minimal example for using prql-lib with Zig (@vanillajonathan, #3372)

Internal changes:

  • Overhaul our CI to run a cohesive set of tests depending on the specific changes in the PR, and elide all others. This cuts CI latency to less than three minutes for most changes, and enables GitHub’s auto-merge to wait for all relevant tests. It also reduces the CI time on merging to main, by moving some tests to only run on specific path changes or on our nightly run.

    We now have one label we can add to PRs to run more tests — pr-nightly. (@max-sixty, #3317 & others).

  • Auto-merge PRs for backports or pre-commit updates (@max-sixty, #3246)

  • Add a workflow to create an issue when the scheduled nightly workflow fails (@max-sixty, #3304)

New Contributors:

  • @FinnRG, with #3292
  • @sitiom, with #3353

0.9.3 — 2023-08-02

0.9.3 is a small release, with mostly documentation, internal, and CI changes.

This release has 85 commits from 10 contributors.

We’d like to welcome @not-my-profile as someone who has helped with lots of internal refactoring in the past couple of weeks.

New Contributors:

  • @vthriller, with #3171
  • @postmeback, with #3216

0.9.2 — 2023-07-25

0.9.2 is a hotfix release to fix an issue in the 0.9.0 & 0.9.1 release pipelines.

0.9.1 — 2023-07-25

0.9.1 is a hotfix release to fix an issue in the 0.9.0 release pipeline.

0.9.0 — 2023-07-24

0.9.0 is probably PRQL’s biggest ever release. We have dialect-specific standard-libraries, a regex operator, an initial implementation of multiple-file projects & modules, lots of bug fixes, and many many internal changes.

We’ve made a few backward incompatible syntax changes. Most queries will work with a simple find/replace; see below for details.

The release has 421 commits from 12 contributors.

A small selection of the changes:

Language:

  • The major breaking change is a new syntax for lists, which have been renamed to tuples, and are now represented with braces {} rather than brackets [].

    To convert previous PRQL queries to this new syntax simply change [ ... ] to { ... }.

    We made the syntax change to incorporate arrays. Almost every major language uses [] for arrays, and we are adopting that convention — arrays use [], tuples use {}. (We recognize that {} for tuples is also rare (hi, Erlang!), but we didn’t want to further load parentheses with meaning.)

    Arrays are conceptually similar to columns — their elements have a single type. Array syntax can’t contain assignments.

    As part of this, we’ve also formalized tuples as containing both individual items (select {foo, baz}), and assignments (select {foo=bar, baz=fuz}).

  • Some significant changes regarding SQL dialects:

    • Operators and functions can be defined on per-dialect basis. (@aljazerzen, #2681)
    • Breaking: The sql.duckdb target supports DuckDB 0.8 (@eitsupi, #2810).
    • Breaking: The sql.hive target is removed (@eitsupi, #2837).
  • New arithmetic operators. These compile to different function or operator depending on the target.

    • Breaking: Operator / now always performs floating division (@aljazerzen, #2684). See the Division docs for details.

    • Truncated integer division operator // (@aljazerzen, #2684). See the Division docs for details.

    • Regex search operator ~= (@max-sixty, #2458). An example:

      from tracks
      filter (name ~= "Love")
      

      …compiles to:

      SELECT
        *
      FROM
        tracks
      WHERE
        REGEXP(name, 'Love')
      

      …though the exact form differs by dialect; see the Regex docs for more details.

  • New aggregation functions: every, any, average, and concat_array. Breaking: Remove avg in favor of average.

  • Breaking: We’ve changed our function declaration syntax to match other declarations. Functions were one of the first language constructs in PRQL, and since then we’ve added normal declarations, so there’s no compelling reason for functions to be different.

    let add = a b -> a + b
    

    Previously, this was:

    func add a b -> a + b
    
  • Experimental modules, which allow importing declarations from other files. Docs are forthcoming.

  • Relation literals create a relation (a “table”) as an array of tuples. This example demonstrates the new syntax for arrays [] and tuples {}. (@aljazerzen, #2605)

    from [{a=5, b=false}, {a=6, b=true}]
    filter b == true
    select a
    
  • this can be used to refer to the current pipeline, for situations where plain column name would be ambiguous:

    from x
    derive sum = my_column
    select this.sum   # does not conflict with `std.sum`
    

    Within a join transform, there is also a reference to the right relation: that.

  • Breaking: functions count, rank and row_number now require an argument of the array to operate on. In most cases you can directly replace count with count this. The non_null argument of count has been removed.

Features:

  • We’ve changed how we handle colors.

    Options::color is deprecated and has no effect. Code which consumes prql_compiler::compile should instead accept the output with colors and use a library such as anstream to handle the presentation of colors. To ensure minimal disruption, prql_compiler will currently strip color codes when a standard environment variable such as CLI_COLOR=0 is set or when it detects stderr is not a TTY.

    We now use the anstream library in prqlc & prql-compiler.

    (@max-sixty, #2773)

  • prqlc can now show backtraces when the standard backtrace env var (RUST_BACKTRACE) is active. (@max-sixty, #2751)

Fixes:

  • Numbers expressed with scientific notation — 1e9 — are now handled correctly by the compiler (@max-sixty, #2865).

Integrations:

  • prql-python now provides type hints (@philpep, #2912)

Internal changes:

  • Annotations in PRQL. These have limited support but are currently used to specify binding strengths. They’re modeled after Rust’s annotations, but with @ syntax, more similar to traditional decorators. (#2729)

    @{binding_strength=11}
    let mod = l r -> s"{l} % {r}"
    
  • Remove BigQuery’s special handling of quoted identifiers, now that our module system handles its semantics (@max-sixty, #2609).

  • ClickHouse is tested in CI (@eitsupi, #2815).

New Contributors:

  • @maxmcd, with #2533
  • @khoa165, with #2876
  • @philpep, with #2912
  • @not-my-profile, with #2971

0.8.1 — 2023-04-29

0.8.1 is a small release with a new list-targets command in prqlc, some documentation improvements, and some internal improvements.

This release has 41 commits from 8 contributors.

From the broader perspective of the project, we’re increasing the relative prioritization of it being easy for folks to actually use PRQL — either with existing tools, or a tool we’d build. We’ll be thinking about & discussing the best way to do that over the next few weeks.

0.8.0 — 2023-04-14

0.8.0 renames the and & or operators to && & || respectively, reorganizes the Syntax section in the book, and introduces read_parquet & read_csv functions for reading files with DuckDB.

This release has 38 commits from 8 contributors. Selected changes:

Features:

  • Rename and to && and or to ||. Operators which are symbols are now consistently infix, while “words” are now consistently functions (@aljazerzen, #2422).

  • New functions read_parquet and read_csv, which mirror the DuckDB functions, instructing the database to read from files (@max-sixty, #2409).

0.7.1 — 2023-04-03

0.7.1 is a hotfix release to fix prql-js’s npm install behavior when being installed as a dependency.

This release has 17 commits from 4 contributors.

0.7.0 — 2023-04-01

0.7.0 is a fairly small release in terms of new features, with lots of internal improvements, such as integration tests with a whole range of DBs, a blog post on Pi day, RFCs for a type system, and more robust language bindings.

There’s a very small breaking change to the Rust API, hence the minor version bump.

Here’s our April 2023 Update, from our Readme:

April 2023 update

PRQL is being actively developed by a growing community. It’s ready to use by the intrepid, either as part of one of our supported extensions, or within your own tools, using one of our supported language bindings.

PRQL still has some minor bugs and some missing features, and probably is only ready to be rolled out to non-technical teams for fairly simple queries.

Here’s our current Roadmap and our Milestones.

Our immediate focus for the code is on:

  • Building out the next few big features, including types and modules.
  • Ensuring our supported features feel extremely robust; resolving any priority bugs.

We’re also spending time thinking about:

  • Making it really easy to start using PRQL. We’re doing that by building integrations with tools that folks already use; for example our VS Code extension & Jupyter integration. If there are tools you’re familiar with that you think would be open to integrating with PRQL, please let us know in an issue.
  • Making it easier to contribute to the compiler. We have a wide group of contributors to the project, but contributions to the compiler itself are quite concentrated. We’re keen to expand this; #1840 for feedback.

The release has 131 commits from 10 contributors. Particular credit goes to @eitsupi & @jelenkee, who have made significant contributions, and @vanillajonathan, whose prolific contributions include our growing language bindings.

A small selection of the changes:

Features:

  • prqlc compile adds --color & --include-signature-comment options. (@max-sixty, #2267)

Web:

  • Added the PRQL snippets from the book to the Playground (@jelenkee, #2197)

Internal changes:

  • Breaking: The compile function’s Options now includes a color member, which determines whether error messages use ANSI color codes. This is technically a breaking change to the API. (@max-sixty, #2251)
  • The Error struct now exposes the MessageKind enum. (@vanillajonathan, #2307)
  • Integration tests run in CI with DuckDB, SQLite, PostgreSQL, MySQL and SQL Server (@jelenkee, #2286)

New Contributors:

  • @k-nut, with #2294

0.6.1 — 2023-03-12

0.6.1 is a small release containing an internal refactoring and improved bindings for C, PHP & .NET.

This release has 54 commits from 6 contributors. Selected changes:

Fixes:

  • No longer incorrectly compile to DISTINCT when a take 1 refers to a different set of columns than are in the group. (@max-sixty, with thanks to @cottrell, #2109)
  • The version specification of the dependency Chumsky was bumped from 0.9.0 to 0.9.2. 0.9.0 has a bug that causes an infinite loop. (@eitsupi, #2110)

Integrations:

  • [prql-lib] Added C++ header file. (@vanillajonathan, #2126)

Internal changes:

  • Many of the items that were in the root of the repo have been aggregated into web & bindings, simplifying the repo’s structure. There’s also grammars & packages (@max-sixty, #2135, #2117, #2121).

0.6.0 — 2023-03-08

0.6.0 introduces a rewritten parser, giving us the ability to dramatically improve error messages, renames switch to case and includes lots of minor improvements and fixes. It also introduces loop, which compiles to WITH RECURSIVE, as a highly experimental feature.

There are a few breaking changes, including the switch from switch to case, in case that’s confusing. There are also some minor parsing changes outlined below.

This release has 108 commits from 11 contributors. Selected changes:

Features:

  • Add a (highly experimental) loop language feature, which translates to WITH RECURSIVE. We expect changes and refinements in upcoming releases. (#1642, @aljazerzen)

  • Rename the experimental switch function to case, since it more closely matches the traditional semantics of CASE. (@max-sixty, #2036)

  • Change the case syntax to use => instead of -> to distinguish it from function syntax.
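
    With the new syntax, a case expression reads as follows (a sketch — the table and column names are hypothetical):

    from products
    derive size = case [
      weight < 10 => "light",
      weight < 100 => "medium",
      true => "heavy",
    ]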

  • Convert parser from pest to Chumsky (@aljazerzen, #1818)

    • Improved error messages, and the potential to make them even better in the future. Many of these improvements come from error recovery.
    • String escapes (\n \t).
    • Raw strings that don’t escape backslashes.
    • String interpolations can contain only identifiers, not arbitrary expressions.
    • Operator associativity has been changed from right-to-left to left-to-right to be more similar to other conventional languages.
    • and now has a higher precedence than or (for the same reason as the previous point).
    • Dates, times and timestamps have stricter parsing rules.
    • let, func, prql, case are now treated as keywords.
    • Float literals without a fraction part (e.g. 1.) are no longer allowed.
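
    A quick sketch of the new string behaviors (hypothetical pipeline; the table name t is assumed):

    from t
    derive {
      two_lines = "first\nsecond",   # `\n` and `\t` escapes are now supported
      win_path  = r"C:\temp\new",    # raw strings don't process backslashes
    }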
  • Add a --format option to prqlc parse which can return the AST in YAML (@max-sixty, #1962)

  • Add a new subcommand prqlc jinja. (@aljazerzen, #1722)

  • Breaking: prql-compiler no longer passes text containing {{ & }} through to the output. (@aljazerzen, #1722)

    For example, the following PRQL query

    from {{foo}}
    

    was compiled to the following SQL previously, but now it raises an error.

    SELECT
      *
    FROM
      {{ foo }}
    

    This pass-through feature existed for integration with dbt.

    We’re again considering how to best integrate with dbt, and this change is based on the idea that the jinja macro should run before the PRQL compiler.

    If you’re interested in dbt integration, subscribe or 👍 to https://github.com/dbt-labs/dbt-core/pull/5982.

  • A new compile target "sql.any". When "sql.any" is used as the target of the compile function’s option, the target contained in the query header will be used. (@aljazerzen, #1995)

  • Support for SQL parameters with similar syntax (#1957, @aljazerzen)

  • Allow : to be elided in timezones, such as 0800 in @2020-01-01T13:19:55-0800 (@max-sixty, #1991).

  • Add std.upper and std.lower functions for changing string casing (@Jelenkee, #2019).
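
    For example (a sketch — the table and column names are hypothetical):

    from artists
    select {loud = std.upper name, quiet = std.lower name}

    These compile to the SQL UPPER and LOWER functions respectively.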

Fixes:

  • prqlc compile returns a non-zero exit code for invalid queries. (@max-sixty, #1924)
  • Identifiers can contain any alphabetic unicode characters (@max-sixty, #2003)

Documentation:

  • Operator precedence (@aljazerzen, #1818)
  • Error messages for invalid queries are displayed in the book (@max-sixty, #2015)

Integrations:

  • [prql-php] Added PHP bindings. (@vanillajonathan, #1860)
  • [prql-dotnet] Added .NET bindings. (@vanillajonathan, #1917)
  • [prql-lib] Added C header file. (@vanillajonathan, #1879)
  • Added a workflow building a .deb on each release. (Note that it’s not yet published on each release). (@vanillajonathan, #1883)
  • Added a workflow building a .rpm on each release. (Note that it’s not yet published on each release). (@vanillajonathan, #1918)
  • Added a workflow building a Snap package on each release. (@vanillajonathan, #1881)

Internal changes:

  • Test that the output of our nascent autoformatter can be successfully compiled into SQL. Failing examples are now clearly labeled. (@max-sixty, #2016)
  • Definition files have been added to configure Dev Containers for Rust development environment. (@eitsupi, #1893, #2025, #2028)

New Contributors:

  • @linux-china, with #1971
  • @Jelenkee, with #2019

0.5.2 — 2023-02-18

0.5.2 is a tiny release to fix a build issue in yesterday’s prql-js 0.5.1 release.

This release has 7 commits from 2 contributors.

New Contributors:

  • @matthias-Q, with #1873

0.5.1 — 2023-02-17

0.5.1 contains a few fixes, and another change to how bindings handle default target / dialects.

This release has 53 commits from 7 contributors. Selected changes:

Fixes:

  • Delegate dividing literal integers to the DB. Previously integer division was executed during PRQL compilation, which could be confusing given that behavior is different across DBs. Other arithmetic operations are still executed during compilation. (@max-sixty, #1747)

Documentation:

  • Add docs on the from_text transform (@max-sixty, #1756)

Integrations:

  • [prql-js] Default compile target changed from Sql(Generic) to Sql(None). (@eitsupi, #1856)
  • [prql-python] Compilation options can now be specified from Python. (@eitsupi, #1807)
  • [prql-python] Default compile target changed from Sql(Generic) to Sql(None). (@eitsupi, #1861)

New Contributors:

  • @vanillajonathan, with #1766

0.5.0 — 2023-02-08

0.5.0 contains a few fixes, some improvements to bindings, lots of docs improvements, and some work on forthcoming features. It contains one breaking change in the compiler’s Options interface.

This release has 74 commits from 12 contributors. Selected changes:

Features:

  • Change public API to use target instead of dialect in preparation for feature work (@aljazerzen, #1684)

  • prqlc watch command which watches filesystem for changes and compiles .prql files to .sql (@aljazerzen, #1708)

Fixes:

  • Support double brackets in s-strings which aren’t symmetric (@max-sixty, #1650)
  • Support Postgres’s Interval syntax (@max-sixty, #1649)
  • Fixed tests for prql-elixir with macOS (@kasvith, #1707)

Documentation:

  • Add a documentation test for prql-compiler, update prql-compiler README, and include the README in the prql book section for Rust bindings. The code examples in the README are included and tested as doctests in the prql-compiler (@nkicg6, #1679)

Internal changes:

  • Add tests for all PRQL website examples to prql-python to ensure compiled results match expected SQL (@nkicg6, #1719)

New Contributors:

  • @ruslandoga, with #1628
  • @RalfNorthman, with #1632
  • @nicot, with #1662

0.4.2 — 2023-01-25

Features:

  • New from_text format-arg string-arg function that supports JSON and CSV formats. format-arg can be format:csv or format:json. string-arg can be a string in any format. (@aljazerzen & @snth, #1514)

    from_text format:csv """
    a,b,c
    1,2,3
    4,5,6
    """
    
    from_text format:json '''
        [{"a": 1, "b": "x", "c": false }, {"a": 4, "b": "y", "c": null }]
    '''
    
    from_text format:json '''{
        "columns": ["a", "b", "c"],
        "data": [
            [1, "x", false],
            [4, "y", null]
        ]
    }'''
    

    For now, the argument is limited to string constants.

Fixes

  • Export constructor for SQLCompileOptions (@bcho, #1621)
  • Remove backticks in count_distinct (@aljazerzen, #1611)

New Contributors

  • @1Kinoti, with #1596
  • @veenaamb, with #1614

0.4.1 — 2023-01-18

0.4.1 comes a few days after 0.4.0, with a couple of features and the release of prqlc, the CLI crate.

0.4.1 has 35 commits from 6 contributors.

Features:

  • Inferred column names include the relation name (@aljazerzen, #1550):

    from albums
    select title # name used to be inferred as title only
    select albums.title # so using albums was not possible here
    
  • Quoted identifiers such as dir/*.parquet are passed through to SQL. (@max-sixty, #1516).

  • The CLI is installed with cargo install prqlc. The binary was renamed in 0.4.0 but required an additional --features flag, which has been removed in favor of this new crate (@max-sixty & @aljazerzen, #1549).
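
    The quoted-identifier pass-through mentioned above can be sketched as (hypothetical file path):

    from `dir/*.parquet`

    This compiles to roughly FROM "dir/*.parquet", which lets engines such as DuckDB read parquet globs directly.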

New Contributors:

  • @fool1280, with #1554
  • @nkicg6, with #1567

0.4.0 — 2023-01-15

0.4.0 brings lots of new features including case, select ![] and numbers with underscores. We have initial (unpublished) bindings to Elixir. And there are the usual fixes & documentation improvements (only a minority are listed below in this release).

0.4.0 also has some breaking changes: table is let, dialect is renamed to target, and the compiler’s API has changed. Full details below.

Features:

  • Defining a temporary table is now expressed as let rather than table (@aljazerzen, #1315). See the tables docs for details.

  • Experimental: The case function sets a variable to a value based on one of several expressions (@aljazerzen, #1278).

    derive var = case [
      score <= 10 -> "low",
      score <= 30 -> "medium",
      score <= 70 -> "high",
      true -> "very high",
    ]
    

    …compiles to:

    SELECT
      *,
      CASE
        WHEN score <= 10 THEN 'low'
        WHEN score <= 30 THEN 'medium'
        WHEN score <= 70 THEN 'high'
        ELSE 'very high'
      END AS var
    FROM
      bar
    

    Check out the case docs for more details.

  • Experimental: Columns can be excluded by name with select (@aljazerzen, #1329)

    from albums
    select ![title, composer]
    
  • Experimental: append transform, equivalent to UNION ALL in SQL. (@aljazerzen, #894)

    from employees
    append managers
    

    Check out the append docs for more details.

  • Numbers can contain underscores, which can make reading long numbers easier (@max-sixty, #1467):

    from numbers
    select {
        small = 1.000_000_1,
        big = 5_000_000,
    }
    
  • The SQL output contains a comment with the PRQL compiler version (@aljazerzen, #1322)

  • dialect is renamed to target, and its values are prefixed with sql. (@max-sixty, #1388); for example:

    prql target:sql.bigquery  # previously was `dialect:bigquery`
    
    from employees
    

    This gives us the flexibility to target other languages than SQL in the long term.

  • Table definitions can contain a bare s-string (@max-sixty, #1422), which enables us to include a full CTE of SQL, for example:

    let grouping = s"""
      SELECT SUM(a)
      FROM tbl
      GROUP BY
        GROUPING SETS
        ((b, c, d), (d), (b, d))
    """
    
  • Ranges supplied to in can be half-open (@aljazerzen, #1330).
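
    For example (a sketch — hypothetical table, column and bounds):

    from employees
    filter (age | in 18..)   # half-open range: lower bound only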

  • The crate’s external API has changed to allow for compiling to intermediate representation. This also affects bindings. See prql-compiler docs for more details.

Fixes:

[This release, the changelog only contains a subset of fixes]

  • Allow interpolations in table s-strings (@aljazerzen, #1337)

Documentation:

[This release, the changelog only contains a subset of documentation improvements]

  • Add docs on aliases in Select
  • Add JS template literal and multiline example (@BCsabaEngine, #1432)
  • Improve prql-compiler docs & examples (@aljazerzen, #1515)
  • Fix string highlighting in book (@max-sixty, #1264)

Web:

  • The playground allows querying some sample data. As before, the result updates on every keystroke. (@aljazerzen, #1305)

Integrations:

[This release, the changelog only contains a subset of integration improvements]

  • Added Elixir integration exposing PRQL functions as NIFs (#1500, @kasvith)
  • Exposed Elixir flavor with exceptions (#1513, @kasvith)
  • Rename prql-compiler binary to prqlc (@aljazerzen #1515)

Internal changes:

[This release, the changelog only contains a subset of internal changes]

  • Add parsing for negative select (@max-sixty, #1317)
  • Allow for additional builtin functions (@aljazerzen, #1325)
  • Add an automated check for typos (@max-sixty, #1421)
  • Add tasks for running playground & book (@max-sixty, #1265)
  • Add tasks for running tests on every file change (@max-sixty, #1380)

New contributors:

  • @EArazli, with #1359
  • @boramalper, with #1362
  • @allurefx, with #1377
  • @bcho, with #1375
  • @JettChenT, with #1385
  • @BlurrechDev, with #1411
  • @BCsabaEngine, with #1432
  • @kasvith, with #1500

0.3.1 - 2022-12-03

0.3.1 brings a couple of small improvements and fixes.

Features:

  • Support for using s-strings for from (#1197, @aljazerzen)

    from s"SELECT * FROM employees WHERE foo > 5"
    
  • Helpful error message when referencing a table in an s-string (#1203, @aljazerzen)

Fixes:

  • Multiple columns with same name created (#1211, @aljazerzen)
  • Renaming via select breaks preceding sorting (#1204, @aljazerzen)
  • Same column gets selected multiple times (#1186, @mklopets)

Internal:

  • Update Github Actions and Workflows to current version numbers (and avoid using Node 12)

0.3.0 — 2022-11-29

🎉 0.3.0 is the biggest ever change in PRQL’s compiler, rewriting much of the internals: the compiler now has a semantic understanding of expressions, including resolving names & building a DAG of column lineage 🎉.

While the immediate changes to the language are modest — some long-running bugs are fixed — this unlocks the development of many of the project’s long-term priorities, such as type-checking & auto-complete. And it simplifies the building of our next language features, such as match-case expressions, unions & table expressions.

@aljazerzen has (mostly single-handedly) done this work over the past few months. The project owes him immense appreciation.

Breaking changes:

We’ve had to make some modest breaking changes for 0.3:

  • Pipelines must start with from. For example, a pipeline with only derive foo = 5, with no from transform, is no longer valid. Depending on demand for this feature, it would be possible to add this back.

  • Shared column names now require == in a join. The existing approach is ambiguous to the compiler — id in the following example could be a boolean column.

    from employees
    -join positions [id]
    +join positions [==id]
    
  • Table references containing periods must be surrounded by backticks. For example, when referencing a schema name:

    -from public.sometable
    +from `public.sometable`
    

Features:

  • Change self equality op to == (#1176, @aljazerzen)
  • Add logging (@aljazerzen)
  • Add clickhouse dialect (#1090, @max-sixty)
  • Allow namespaces & tables to contain . (#1079, @aljazerzen)

Fixes:

  • Deduplicate column appearing in SELECT multiple times (#1186, @aljazerzen)
  • Fix uppercase table names (#1184, @aljazerzen)
  • Omit table name when only one ident in SELECT (#1094, @aljazerzen)

Documentation:

  • Add chapter on semantics’ internals (@aljazerzen, #1028)
  • Add note about nesting variables in s-strings (@max-sixty, #1163)

Internal changes:

  • Flatten group and window (#1120, @aljazerzen)
  • Split ast into expr and stmt (@aljazerzen)
  • Refactor associativity (#1156, @aljazerzen)
  • Rename Ident constructor to from_name (#1084, @aljazerzen)
  • Refactor rq folding (#1177, @aljazerzen)
  • Add tests for reported bugs fixes in semantic (#1174, @aljazerzen)
  • Bump duckdb from 0.5.0 to 0.6.0 (#1132)
  • Bump once_cell from 1.15.0 to 1.16.0 (#1101)
  • Bump pest from 2.4.0 to 2.5.0 (#1161)
  • Bump pest_derive from 2.4.0 to 2.5.0 (#1179)
  • Bump sqlparser from 0.25.0 to 0.27.0 (#1131)
  • Bump trash from 2.1.5 to 3.0.0 (#1178)

0.2.11 — 2022-11-20

0.2.11 contains a few helpful fixes.

Work continues on our semantic refactor — look out for 0.3.0 soon! Many thanks to @aljazerzen for his continued contributions to this.

Note: 0.2.10 was skipped due to this maintainer’s inability to read his own docs on bumping versions…

Features:

  • Detect when compiler version is behind query version (@MarinPostma, #1058)
  • Add __version__ to prql-python package (@max-sixty, #1034)

Fixes:

  • Fix nesting of expressions with equal binding strength and left associativity, such as a - (b - c) (@max-sixty, #1136)
  • Retain floats without significant digits as floats (@max-sixty, #1141)

Documentation:

  • Add documentation of prqlr bindings (@eitsupi, #1091)
  • Add a ‘Why PRQL’ section to the website (@max-sixty, #1098)
  • Add @snth to core-devs (@max-sixty, #1050)

Internal changes:

  • Use workspace versioning (@max-sixty, #1065)

0.2.9 — 2022-10-14

0.2.9 is a small release containing a bug fix for empty strings.

Fixes:

  • Fix parsing of empty strings (@aljazerzen, #1024)

0.2.8 — 2022-10-10

0.2.8 is another modest release with some fixes, doc improvements, bindings improvements, and lots of internal changes. Note that one of the fixes causes the behavior of round and cast to change slightly — though it’s handled as a fix rather than a breaking change in semantic versioning.

Fixes:

  • Change order of the round & cast function parameters to have the column last; for example round 2 foo_col / cast int foo. This is consistent with other functions, and makes piping possible:

    derive [
      gross_salary = (salary + payroll_tax | as int),
      gross_salary_rounded = (gross_salary | round 0),
    ]
    

Documentation:

  • Split DEVELOPMENT.md from CONTRIBUTING.md (@richb-hanover, #1010)
  • Make s-strings more prominent in website intro (@max-sixty, #982)

Web:

  • Add GitHub star count to website (@max-sixty, #990)

Integrations:

  • Expose a shortened error message, in particular for the VS Code extension (@aljazerzen, #1005)

Internal changes:

  • Specify 1.60.0 as minimum Rust version (@max-sixty, #1011)
  • Remove old wee-alloc code (@max-sixty, #1013)
  • Upgrade clap to version 4 (@aj-bagwell, #1004)
  • Improve book-building script in Taskfile (@max-sixty, #989)
  • Publish website using an artifact rather than a long-lived branch (@max-sixty, #1009)

0.2.7 — 2022-09-17

0.2.7 is a fairly modest release, six weeks after 0.2.6. We have some more significant features, including a union operator and an overhaul of our type system, as open PRs which will follow in future releases.

We also have new features in the VS Code extension, courtesy of @jiripospisil, including a live output panel.

Fixes:

  • range_of_ranges checks the Range end is smaller than its start (@shuozeli, #946)

Documentation:

  • Improve various docs (@max-sixty, #974, #971, #972, #970, #925)
  • Add reference to EdgeDB’s blog post in our FAQ (@max-sixty, #922)
  • Fix typos (@kianmeng, #943)

Integrations:

  • Add prql-lib, enabling language bindings with go (@sigxcpu76, #923)
  • Fix line numbers in JS exceptions (@charlie-sanders, #929)

Internal changes:

  • Lock the version of the rust-toolchain, with auto-updates (@max-sixty, #926, #927)

0.2.6 — 2022-08-05

Fixes:

  • Adjust fmt to only escape names when needed (@aljazerzen, #907)
  • Fix quoting on upper case table names (@max-sixty, #893)
  • Fix scoping of identical column names from multiple tables (@max-sixty, #908)
  • Fix parse error on newlines in a table (@sebastiantoh 🆕, #902)

Documentation:

  • Add docs on Architecture (@aljazerzen, #904)
  • Add Changelog (@max-sixty, #890 #891)

Internal changes:

  • Start trial using Conventional Commits (@max-sixty, #889)
  • Add crates.io release workflow, docs (@max-sixty, #887)

0.2.5 - 2022-07-29

0.2.5 is a very small release following 0.2.4 yesterday. It includes:

  • Add the ability to represent single brackets in an s-string, with two brackets (#752, @max-sixty)
  • Fix the “Copy to Clipboard” command in the Playground, for Firefox (#880, @mklopets)

0.2.4 - 2022-07-28

0.2.4 is a small release following 0.2.3 a few days ago. The 0.2.4 release includes:

  • Enrich our CLI, adding commands to get different stages of the compilation process (@aljazerzen , #863)
  • Fix multiple take n statements in a query, leading to duplicate proxy columns in generated SQL (@charlie-sanders)
  • Fix BigQuery quoting of identifiers in SELECT statements (@max-sixty)
  • Some internal changes — reorganize top-level functions (@aljazerzen), add a workflow to track our Rust compilation time (@max-sixty), simplify our simple prql-to-sql tests (@max-sixty)

Thanks to @ankane, prql-compiler is now available from homebrew core: brew install prql-compiler.¹

1
we still need to update docs and add a release workflow for this:
<https://github.com/PRQL/prql/issues/866>

0.2.3 - 2022-07-24

A couple of weeks since the 0.2.2 release: we’ve squashed a few bugs, added some mid-sized features to the language, and made a bunch of internal improvements.

The 0.2.3 release includes:

  • Allow for escaping otherwise-invalid identifiers (@aljazerzen & @max-sixty)
  • Fix a bug around operator precedence (@max-sixty)
  • Add a section to the book on the language bindings (@charlie-sanders)
  • Add tests for our Display representation while fixing some existing bugs. This is gradually becoming our code formatter (@arrizalamin)
  • Add a “copy to clipboard” button in the Playground (@mklopets)
  • Add lots of guidance to our CONTRIBUTING.md around our tests and process for merging (@max-sixty)
  • Add a prql! macro for parsing a prql query at compile time (@aljazerzen)
  • Add tests for prql-js (@charlie-sanders)
  • Add a from_json method for transforming json to a PRQL string (@arrizalamin)
  • Add a workflow to release prql-java to Maven (@doki23)
  • Enable running all tests from a PR by adding a pr-run-all-tests label (@max-sixty)
  • Have cargo-release bump all crate & npm versions (@max-sixty)
  • Update prql-js to use the bundler build of prql-js (@mklopets)

As well as those contribution changes, thanks to those who’ve reported issues, such as @mklopets @huw @mm444 @ajfriend.

From here, we’re planning to continue squashing bugs (albeit more minor than those in this release), adding some features like union, while working on bigger issues such as type-inference.

We’re also going to document and modularize the compiler further. It’s important that we give more people an opportunity to contribute to the guts of PRQL, especially given the number and enthusiasm of contributions to the project in general — and it’s not that easy to do so at the moment. While this is ongoing, if anyone has something they’d like to work on in the more difficult parts of the compiler, let us know on GitHub or Discord, and we’d be happy to work together on it.

Thank you!

0.2.2 - 2022-07-10

We’re a couple of weeks since our 0.2.0 release. Thanks for the surge in interest and contributions! 0.2.2 has some fixes & some internal improvements:

  • We now test against SQLite & DuckDB on every commit, to ensure we’re producing correct SQL. (@aljazerzen)
  • We have the beginning of Java bindings! (@doki23)
  • Idents surrounded by backticks are passed through to SQL (@max-sixty)
  • More examples on homepage; e.g. join & window, lots of small docs improvements
  • Automated releases to homebrew (@roG0d)
  • prql-js is now a single package for Node, browsers & webpack (@charlie-sanders)
  • Parsing has some fixes, including >= and leading underscores in idents (@mklopets)
  • Ranges receive correct syntax highlighting (@max-sixty)

Thanks to Aljaž Mur Eržen @aljazerzen, George Roldugin @roldugin, Jasper McCulloch @Jaspooky, Jie Han @doki23, Marko Klopets @mklopets, Maximilian Roos @max-sixty, Rodrigo Garcia @roG0d, Ryan Russell @ryanrussell, Steven Maude @StevenMaude, Charlie Sanders @charlie-sanders.

We’re planning to continue collecting bugs & feature requests from users, as well as working on some of the bigger features, like type-inference.

For those interested in joining, we also have a new Contributing page.

0.2.0 - 2022-06-27

🎉 🎉 After several months of building, PRQL is ready to use! 🎉 🎉


How we got here:

At the end of January, we published a proposal of a better language for data transformation: PRQL. The reception was better than I could have hoped for — we were no. 2 on HackerNews for a day, and gained 2.5K GitHub stars over the next few days.

But man cannot live on GitHub Stars alone — we had to do the work to build it. So over the next several months, during many evenings & weekends, a growing group of us gradually built the compiler, evolved the language, and wrote some integrations.

We want to double-down on the community and its roots in open source — it’s incredible that a few of us from all over the globe have collaborated on a project without ever having met. We decided early-on that PRQL would always be open-source and would never have a commercial product (despite lots of outside interest to fund a seed round!). Because languages are so deep in the stack, and the data stack has so many players, the best chance of building a great language is to build an open language.


We still have a long way to go. While PRQL is usable, it has lots of missing features, and an incredible amount of unfulfilled potential, including a language server, cohesion with databases, and type inference. Over the coming weeks, we’d like to grow the number of intrepid users experimenting with PRQL in their projects, prioritize features that will unblock them, and then start fulfilling PRQL’s potential by working through our roadmap.

The best way to experience PRQL is to try it. Check out our website and the Playground. Start using PRQL for your own projects in dbt, Jupyter notebooks and Prefect workflows.

Keep in touch with PRQL by following the project on Twitter, joining us on Discord, and starring the repo.

Contribute to the project — we’re a really friendly community, whether you’re a recent SQL user or an advanced Rust programmer. We need bug reports, documentation tweaks & feature requests — just as much as we need compiler improvements written in Rust.


I especially want to give Aljaž Mur Eržen (@aljazerzen) the credit he deserves, who has contributed the majority of the difficult work of building out the compiler. Much credit also goes to @charlie-sanders, one of PRQL’s earliest supporters and the author of pyprql, and Ryan Patterson-Cross (@rbpatt2019), who built the Jupyter integration among other Python contributions.

Other contributors who deserve a special mention include: @roG0d, @snth, @kwigley


Thank you, and we look forward to your feedback!

Target & Version

Target dialect

PRQL allows specifying a target dialect at the top of the query, which allows PRQL to compile to a database-specific SQL flavor.

Examples

PRQL

prql target:sql.postgres

from employees
sort age
take 10

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age
LIMIT
  10

PRQL

prql target:sql.mssql

from employees
sort age
take 10

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age OFFSET 0 ROWS
FETCH FIRST
  10 ROWS ONLY

Dialects

Supported

Supported dialects support all PRQL language features where possible, are tested on every commit, and we’ll endeavor to fix bugs.

  • sql.clickhouse
  • sql.duckdb
  • sql.generic 1
  • sql.glaredb
  • sql.mysql
  • sql.postgres
  • sql.sqlite

Unsupported

Unsupported dialects have implementations in the compiler, but are tested minimally or not at all, and may have gaps for some features.

We’re open to contributions to improve our coverage of these, and to adding additional dialects.

  • sql.mssql
  • sql.ansi
  • sql.bigquery
  • sql.snowflake

Priority of targets

The compile target of a query is defined in the query’s header or as an option passed to the compiler. The compiler option takes precedence.

For example, the following shell example specifies sql.generic in the query and sql.duckdb in the --target option of the prqlc compile command. In this case, sql.duckdb takes precedence and the SQL output is based on the DuckDB dialect.

echo 'prql target:sql.generic
      from foo' | prqlc compile --target sql.duckdb

To use the target described in the query, a special target sql.any can be specified in the compiler option.

echo 'prql target:sql.generic
      from foo' | prqlc compile --target sql.any

Version

PRQL allows specifying a version of the language in the PRQL header, like:

PRQL

prql version:"0.13.1"

from employees

SQL

SELECT
  *
FROM
  employees

This has two roles, one of which is implemented:

  • The compiler will raise an error if the compiler is older than the query version. This prevents confusing errors when queries use newer features of the language but the compiler hasn’t yet been upgraded.
  • The compiler will compile for the major version of the query. This allows the language to evolve without breaking existing queries, or forcing multiple installations of the compiler. This isn’t yet implemented, but is a gating feature for PRQL 1.0.

The version of the compiler currently in use can be retrieved using the special function std.prql.version in PRQL.

PRQL

[{version = prql.version}]

SQL

WITH table_0 AS (
  SELECT
    '0.13.2' AS version
)
SELECT
  version
FROM
  table_0

Note

This function was renamed from std.prql_version to prql.version in PRQL 0.11.1. std.prql_version will be removed in PRQL 0.12.0.


1: while there’s no “generic” DB to test sql.generic against, we still count it as supported.

Bindings

PRQL has bindings for many languages.

We have three tiers of bindings:

  • Supported
  • Unsupported
  • Nascent

Supported

Supported bindings require:

  • A maintainer.
  • Implementations of the core compile functions.
  • Test coverage for these functions.
  • A published package to the language’s standard package repository.
  • A script in Taskfile.yaml to bootstrap a development environment.
  • Any dev tools, such as a linter & formatter, in pre-commit or MegaLinter.

Most of the currently supported bindings live in the main PRQL repo, and we gate any changes to the compiler’s API on compatible changes to the bindings.

Unsupported

Unsupported bindings work, but don’t fulfil all of the above criteria. We don’t gate changes to the compiler’s API. If they stop working, we’ll demote them to nascent.

Nascent

Nascent bindings are in development, and may not yet fully work.

Naming

Over time, we’re trying to move to a consistent naming scheme:

  • Crates are named prqlc-$lang.
  • Where possible, packages are published to each language’s package repository as prqlc.

prql-dotnet

prql-dotnet offers PRQL bindings for .NET as a netstandard2.0 library.

It provides the PrqlCompiler class which contains the ToJson and ToSql static methods.

It’s still at an early stage, and isn’t published to NuGet. Contributions are welcome.

Installation

Make sure that libprqlc_c.so (Linux), libprqlc_c.dylib (macOS) or libprqlc_c.dll (Windows) is in the project’s bin directory together with PrqlCompiler.dll and the rest of the project’s compiled files, i.e. {your_project}/bin/Debug/net7.0/.

The libprqlc_c library gets dynamically imported at runtime.

Usage

using Prql.Compiler;

var options = new PrqlCompilerOptions
{
    Format = false,
    SignatureComment = false,
};
var sql = PrqlCompiler.Compile("from employees", options);
Console.WriteLine(sql);

TODO

This is currently at 0.1.0 because we’re waiting to update prqlc-c for the latest API. When we’ve done that, we can match the version here with the broader PRQL version.

PRQL

PRQL bindings for Elixir.

Installation

def deps do
  [
    {:prql, "~> 0.1.0"}
  ]
end

Basic Usage

  iex> PRQL.compile("from customers")
  {:ok, "SELECT\n  *\nFROM\n  customers\n\n-- Generated by PRQL compiler version 0.3.1 (https://prql-lang.org)\n"}

  iex> PRQL.compile("from customers\ntake 10", dialect: :mssql)
  {:ok, "SELECT\n  *\nFROM\n  customers\nORDER BY\n  (\n    SELECT\n      NULL\n  ) OFFSET 0 ROWS\nFETCH FIRST\n  10 ROWS ONLY\n\n-- Generated by PRQL compiler version 0.3.1 (https://prql-lang.org)\n"}

Development

We are in the early stages of developing Elixir bindings.

We’re using Rustler to provide Rust bindings for prqlc.

Currently using the bindings in an Elixir project requires compiling the Rust crate from this repo:

  • Install dependencies with mix deps.get
  • Compile project mix compile
  • Run tests mix test

Future work includes publishing pre-compiled artifacts, so Elixir projects can run PRQL without needing a Rust toolchain.

Mac

We currently don’t enable compilation for Mac. This is possible to enable, but causes some issues with cargo’s compilation cache. Briefly: it requires RUSTFLAGS to be set, and because of https://github.com/rust-lang/cargo/issues/8716 & https://github.com/rust-lang/cargo/issues/8899, any compilation of a different target will bust the cache.

The possible future workarounds include:

  • Passing --target=aarch64-apple-darwin to every cargo call, which is inconvenient and can be difficult in some situations; e.g. Rust Analyzer. This disables passing RUSTFLAGS (I’m actually unclear why prql-elixir builds successfully in that case…)
  • Directing other cargo calls to different paths, such as /target-ra for Rust Analyzer and /target-book for the book building. But one cargo build from the terminal without either the target or target_dir specified will bust the cache!
  • Never compiling for other targets. But our standard tests run for --target=wasm32-unknown-unknown, so this would require refraining from running them.
  • Removing prql-elixir from our workspace, so that cargo commands in the PRQL workspace don’t require rust flags. This would work well, but means we need separate test coverage for this crate, which adds some weight to the tests.

If prql-elixir becomes more used (for example, we start publishing to Hex, or Mac developers want to work on it), then we can re-enable and deal with the caching issues. We can also re-enable them if the cargo issue is resolved.

To test on Mac temporarily — for example if there’s an error in GHA and we’re on a Mac locally — apply a diff like the one below (though this specific diff is now out-of-date), and then run cargo build from the prql-elixir path, which will enable the local .cargo/config.toml. (We could also add a feature like elixir-mac which enables building on Mac.)

diff --git a/prqlc/bindings/elixir/native/prql/Cargo.toml b/prqlc/bindings/elixir/native/prql/Cargo.toml
index 7194ca4f..9c7240ff 100644
--- a/prqlc/bindings/elixir/native/prql/Cargo.toml
+++ b/prqlc/bindings/elixir/native/prql/Cargo.toml
@@ -19,5 +19,5 @@ path = "src/lib.rs"
 prql-compiler = {path = "../../../../prql-compiler", default-features = false, version = "0.6.1"}

 # See Readme for details on Mac
-[target.'cfg(not(any(target_family="wasm", target_os = "macos")))'.dependencies]
+# [target.'cfg(not(any(target_family="wasm", target_os = "macos")))'.dependencies]
 rustler = "0.27.0"
diff --git a/prqlc/bindings/elixir/native/prql/src/lib.rs b/prqlc/bindings/elixir/native/prql/src/lib.rs
index 2c5c8f27..68e77217 100644
--- a/prqlc/bindings/elixir/native/prql/src/lib.rs
+++ b/prqlc/bindings/elixir/native/prql/src/lib.rs
@@ -1,5 +1,5 @@
 // See Readme for more information on Mac compiling
-#![cfg(not(target_os = "macos"))]
+// #![cfg(not(target_os = "macos"))]
 // These bindings aren't relevant on wasm
 #![cfg(not(target_family = "wasm"))]
 // TODO: unclear why we need this `allow`; it's required in `CompileOptions`,

prql-java

prql-java offers Java bindings to the prqlc Rust library. It exposes a Java native method public static native String toSql(String query).

It’s still at an early stage; it currently requires compiling locally and isn’t published to Maven. Contributions are welcome.

Installation

<dependency>
    <groupId>org.prqllang</groupId>
    <artifactId>prql-java</artifactId>
    <version>${PRQL_VERSION}</version>
</dependency>

Usage

import org.prqllang.prql4j.PrqlCompiler;

class Main {
    public static void main(String[] args) {
        String sql = PrqlCompiler.toSql("from table");
        System.out.println(sql);
    }
}

prqlc-js

JavaScript bindings for prqlc.

Installation

npm install prqlc

Usage

Currently these functions are exposed:

function compile(prql_query: string, options?: CompileOptions): string;

function prql_to_pl(prql_query: string): string;

function pl_to_prql(pl_json: string): string;

function pl_to_rq(pl_json: string): string;

function rq_to_sql(rq_json: string): string;

From Node.js

Direct usage

const prqlc = require("prqlc");

const sql = prqlc.compile(`from employees | select first_name`);
console.log(sql);

Options

const opts = new prqlc.CompileOptions();
opts.target = "sql.mssql";
opts.format = false;
opts.signature_comment = false;

const sql = prqlc.compile(`from employees | take 10`, opts);
console.log(sql);

Template literal

const prqlc = require("prqlc");
// Note: this helper compiles only the literal text (`string[0]`);
// `${...}` interpolation isn't supported.
const prql = (string) => prqlc.compile(string[0] || "");

const sql = prql`from employees | select first_name`;
console.log(sql);

Template literal with newlines

const prqlc = require("prqlc");
const prql = (string) => prqlc.compile(string[0] || "");

const sql = prql`
    from employees
    select first_name
`;
console.log(sql);

From a browser

<html>
  <head>
    <script type="module">
      import init, { compile } from "./dist/web/prql_js.js";
      await init();

      const sql = compile("from employees | select first_name");
      console.log(sql);
    </script>
  </head>

  <body></body>
</html>

From a framework or a bundler

import compile from "prqlc/dist/bundler";

const sql = compile(`from employees | select first_name`);
console.log(sql);

Errors

Errors are returned as the following object, serialized as a JSON array:

interface ErrorMessage {
  /// Message kind. Currently only Error is implemented.
  kind: "Error" | "Warning" | "Lint";
  /// Machine-readable identifier of the error
  code: string | null;
  /// Plain text of the error
  reason: string;
  /// A list of suggestions of how to fix the error
  hint: string | null;
  /// Character offset of error origin within a source file
  span: [number, number] | null;

  /// Annotated code, containing cause and hints.
  display: string | null;
  /// Line and column number of error origin within a source file
  location: SourceLocation | null;
}

/// Location within the source file.
/// Tuples contain:
/// - line number (0-based),
/// - column number within that line (0-based),
interface SourceLocation {
  start: [number, number];

  end: [number, number];
}
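Since the error payload is plain JSON, it can be inspected from any host language. As an illustrative sketch (here in Python; the error values below are invented, not real compiler output), this decodes a payload shaped like the interfaces above and converts the 0-based location to the usual 1-based display:

```python
import json

# A hypothetical serialized error payload, matching the
# ErrorMessage / SourceLocation shapes above (values are made up).
payload = json.dumps({
    "inner": [{
        "kind": "Error",
        "code": None,
        "reason": "Unknown name `foo`",
        "hint": None,
        "span": [17, 20],
        "display": "Error: Unknown name `foo`",
        "location": {"start": [0, 17], "end": [0, 20]},
    }]
})

messages = json.loads(payload)["inner"]
first = messages[0]
# Line and column numbers are 0-based; add 1 for human-readable output.
line, col = first["location"]["start"]
print(f"{first['kind']}: {first['reason']} at line {line + 1}, column {col + 1}")
# → Error: Unknown name `foo` at line 1, column 18
```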

These errors can be caught as such:

try {
  const sql = prqlJs.compile(`from employees | foo first_name`);
} catch (error) {
  const errorMessages = JSON.parse(error.message).inner;

  console.log(errorMessages[0].display);
  console.log(errorMessages[0].location);
}

Development

Build:

npm run build

This builds Node, bundler and web packages in the dist path.

Test:

npm test

By default the wasm binaries are optimized on each run, even if the underlying code hasn’t changed, which can be slow. For a lower-latency dev loop, pass --profile=dev to npm install for a faster, less optimized build.

npm install prqlc --profile=dev

Notes

  • This uses wasm-pack to generate bindings.¹
  • We’ve added an npm layer on top of the usual approach of just using wasm-pack, so we can distribute a single package with targets of node, bundler and no-modules — somewhat inverting the approach recommended by wasm-pack. The build instruction goes in a build script, rather than a pack script.

¹ Though we would be very open to other approaches, given wasm-pack does not seem maintained, and we’re eliding many of its features to build for three targets. See <https://github.com/PRQL/prql/issues/1836> for more details.

prql-php

prql-php offers PHP bindings to prqlc crate through FFI.

It provides the Compiler class which contains compile, prqlToPL, plToRQ and rqToSQL functions.

It’s still at an early stage, and isn’t published to Composer. Contributions are welcome.

Installation

The PHP FFI extension needs to be enabled. Set ffi.enable in your php.ini configuration file to "true".
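A minimal php.ini fragment might look like this (the file’s location varies by system; this is an illustrative sketch):

```ini
; Enable the FFI extension so prql-php can load the prqlc C library
ffi.enable = true
```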

Usage

<?php

use Prql\Compiler\Compiler;

$prql = new Compiler();
$result = $prql->compile("from employees");

echo $result->output;

Development

Environment

A way to establish a dev environment with PHP, the ext-ffi extension and Composer is to use a nix flake. After installing nix, enable the experimental flakes feature:

mkdir -p ~/.config/nix
echo "experimental-features = nix-command flakes" >> ~/.config/nix/nix.conf

Now you can spawn a shell from prql-php/:

nix shell github:loophp/nix-shell#env-php81 --impure

This will pull in the ext-ffi extension, because it’s declared in composer.json.

Building

There is a build-php task that:

  • runs cargo to build libprqlc_c,
  • copies libprqlc_c.* into lib,
  • copies prqlc.h into lib.

Tests

task build-php
task test-php

Code style

./vendor/bin/phpcs --standard=PSR12 src tests

PRQL - Python Bindings

Python bindings for PRQL, the Pipelined Relational Query Language.

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it compiles to SQL.

PRQL can be as simple as:

from tracks
filter artist == "Bob Marley"     # Each line transforms the previous result
aggregate {                       # `aggregate` reduces each column to a value
  plays    = sum plays,
  longest  = max length,
  shortest = min length,          # Trailing commas are allowed
}

Installation

pip install prqlc

Usage

Basic usage:

import prqlc

prql_query = """
    from employees
    join salaries (==emp_id)
    group {employees.dept_id, employees.gender} (
      aggregate {
        avg_salary = average salaries.salary
      }
    )
"""

options = prqlc.CompileOptions(
    format=True, signature_comment=True, target="sql.postgres"
)

sql = prqlc.compile(prql_query)
sql_postgres = prqlc.compile(prql_query, options)

The following functions and classes are exposed:

def compile(prql_query: str, options: Optional[CompileOptions] = None) -> str:
    """Compiles a PRQL query into SQL."""
    ...

def prql_to_pl(prql_query: str) -> str:
    """Converts a PRQL query to PL AST in JSON format."""
    ...

def pl_to_prql(pl_json: str) -> str:
    """Converts PL AST as a JSON string into a formatted PRQL string."""
    ...

def pl_to_rq(pl_json: str) -> str:
    """Resolves and lowers PL AST (JSON) into RQ AST (JSON)."""
    ...

def rq_to_sql(rq_json: str, options: Optional[CompileOptions] = None) -> str:
    """Converts RQ AST (JSON) into a SQL query."""
    ...

class CompileOptions:
    def __init__(
        self,
        *,
        format: bool = True,
        target: str = "sql.any",
        signature_comment: bool = True,
    ) -> None:
    """Compilation options for SQL backend of the compiler.

    Args:
        format (bool): Pass generated SQL string through a formatter that splits
            it into multiple lines and prettifies indentation and spacing.
            Defaults to True.
        target (str): Target dialect to compile to. Defaults to "sql.any", which
            uses the 'target' argument from the query header to determine the
            SQL dialect. Other targets are available by calling the `get_targets`
            function.
        signature_comment (bool): Emits the compiler signature as a comment after
            the generated SQL. Defaults to True.

    """
    ...

def get_targets() -> list[str]:
    """List available target dialects for compilation."""
    ...

Debugging functions

The following functions are available within the prqlc.debug module. They are for experimental purposes and may be unstable.

def prql_lineage(prql_query: str) -> str:
    """Computes a column-level lineage graph from a PRQL query.

    Returns JSON-formatted string. See the docs for the `prqlc debug lineage`
    CLI command for more details.
    """
    ...

def pl_to_lineage(pl_json: str) -> str:
    """Computes a column-level lineage graph from PL AST (JSON)."""
    ...

Notes

These bindings are in a crate named prqlc-python and published to a Python package on PyPI named prqlc, available at https://pypi.org/project/prqlc. This crate is not published to crates.io.

The package is consumed by pyprql & dbt-prql.

Relies on pyo3 for all the magic.

R (prqlr)

R bindings for prqlc.

prqlr also includes knitr (R Markdown and Quarto) integration, which allows us to easily create documents with the PRQL conversion results embedded in them.

Check out https://prql.github.io/prqlc-r/ for more context.

Note

prqlr is generously maintained by @eitsupi in the PRQL/prqlc-r repo.

Installation

install.packages("prqlr")

Please check the documentation of the prqlc crate.

Integrations

PRQL is building integrations with lots of external tools, covered in the pages that follow.

PRQL compiler

prqlc is the reference implementation of a compiler from PRQL to SQL, written in Rust. It also serves as the CLI.

For more on PRQL, check out the PRQL website or the PRQL repo.

CLI

prqlc serves as a CLI for the PRQL compiler. It is a single, dependency-free binary that compiles PRQL into SQL.

Usage

prqlc compile

This command works as a filter that compiles a PRQL string into an SQL string.

$ echo 'from employees | filter has_dog | select salary' | prqlc compile

SELECT
  salary
FROM
  employees
WHERE
  has_dog

A PRQL query can be executed with CLI tools compatible with SQL, such as DuckDB CLI.

$ curl -fsL https://raw.githubusercontent.com/PRQL/prql/0.12.2/prqlc/prqlc/tests/integration/data/chinook/albums.csv -o albums.csv
$ echo 'from `albums.csv` | take 3' | prqlc compile | duckdb
┌──────────┬───────────────────────────────────────┬───────────┐
│ album_id │                 title                 │ artist_id │
│  int64   │                varchar                │   int64   │
├──────────┼───────────────────────────────────────┼───────────┤
│        1 │ For Those About To Rock We Salute You │         1 │
│        2 │ Balls to the Wall                     │         2 │
│        3 │ Restless and Wild                     │         2 │
└──────────┴───────────────────────────────────────┴───────────┘

Executing this command without any argument starts interactive mode, allowing a PRQL query to be written interactively. In this mode, write the PRQL query, then press Ctrl-d (Linux, macOS) or Ctrl-z (Windows) to display the compiled SQL.

prqlc compile

Just like when using it as a filter, SQL string output can be passed to the DuckDB CLI and similar tools.

$ prqlc compile | duckdb
Enter PRQL, then press ctrl-d to compile:

from `albums.csv`
take 3
┌──────────┬───────────────────────────────────────┬───────────┐
│ album_id │                 title                 │ artist_id │
│  int64   │                varchar                │   int64   │
├──────────┼───────────────────────────────────────┼───────────┤
│        1 │ For Those About To Rock We Salute You │         1 │
│        2 │ Balls to the Wall                     │         2 │
│        3 │ Restless and Wild                     │         2 │
└──────────┴───────────────────────────────────────┴───────────┘

Installation

Packaging status

via Homebrew (macOS, Linux)

brew install prqlc

via winget (Windows)

winget install prqlc

From GitHub release page

Precompiled binaries are available for Linux, macOS, and Windows on the PRQL release page.

From source

# From crates.io
cargo install prqlc
# From a local PRQL repository
cargo install --path prqlc/prqlc

Shell completions

The prqlc shell-completion command prints a shell completion script for supported shells; saving the printed script to the appropriate file lets the shell load the completions in each session.

Bash

For Linux:

prqlc shell-completion bash >/etc/bash_completion.d/prqlc

For macOS:

prqlc shell-completion bash >/usr/local/etc/bash_completion.d/prqlc

fish

prqlc shell-completion fish >~/.config/fish/completions/prqlc.fish

PowerShell

mkdir -Path (Split-Path -Parent $profile) -ErrorAction SilentlyContinue
prqlc shell-completion powershell >path/to/prqlc.ps1
echo 'Invoke-Expression -Command path/to/prqlc.ps1' >>$profile

zsh

prqlc shell-completion zsh >"${fpath[1]}/_prqlc"

Ensure that the following lines are present in ~/.zshrc:

autoload -U compinit
compinit -i

Helpers

Cheat sheets for prqlc are available on various websites and with various tools.

Library

For more usage examples and the library documentation, check out the prqlc documentation.

Library installation

cargo add prqlc

Examples

Compile a PRQL string to a SQLite dialect string:

// In a file src/main.rs

use prqlc::{compile, Options, DisplayOptions, Target, sql::Dialect};

fn main() {
    let prql = "from employees | select {name, age}";
    let opts = Options {
        format: false,
        target: Target::Sql(Some(Dialect::SQLite)),
        signature_comment: false,
        display: DisplayOptions::Plain,
        ..Default::default()
    };
    let sql = compile(prql, &opts).unwrap();
    assert_eq!("SELECT name, age FROM employees", sql);
}

ClickHouse

PRQL works natively in ClickHouse. Check out the ClickHouse docs for more details.

Jupyter

pyprql contains pyprql.magic, a thin wrapper of JupySQL’s SQL IPython magics. This allows us to run PRQL interactively on Jupyter/IPython.

Check out https://pyprql.readthedocs.io/ for more context.

Installation

pip install pyprql

Usage

Installing pyprql also installs the duckdb-engine package, so we can start using PRQL immediately to query CSV and Parquet files.

For example, running the example from the JupySQL documentation on IPython:

In [1]: %load_ext pyprql.magic

In [2]: !curl -sL https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv -o penguins.csv

In [3]: %prql duckdb://

In [4]: %prql from `penguins.csv` | take 3
Out[4]:
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1           18.7                181         3750    MALE
1  Adelie  Torgersen            39.5           17.4                186         3800  FEMALE
2  Adelie  Torgersen            40.3           18.0                195         3250  FEMALE

In [5]: %%prql
   ...: from `penguins.csv`
   ...: filter bill_length_mm > 40
   ...: take 3
   ...:
   ...:
Out[5]:
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            40.3           18.0                195         3250  FEMALE
1  Adelie  Torgersen            42.0           20.2                190         4250    None
2  Adelie  Torgersen            41.1           17.6                182         3200  FEMALE

DuckDB

There’s a DuckDB community extension by @ywelsch at the DuckDB Community Extension Repository.

INSTALL prql FROM community;
LOAD prql;
-- Once the extension is loaded, you can write PRQL queries
from (read_csv 'https://raw.githubusercontent.com/PRQL/prql/0.8.0/prql-compiler/tests/integration/data/chinook/invoices.csv')
filter invoice_date >= @2009-02-01
take 5;

Check out the extension’s documentation for more details.

qStudio IDE

qStudio is a SQL GUI that lets you browse tables, run SQL scripts, and chart and export the results. qStudio runs on Windows, macOS and Linux, and works with every popular database, including MySQL, PostgreSQL, MS SQL, kdb…

Note

qStudio relies on the PRQL compiler. You must ensure that prqlc is in your path. See the installation instructions for prqlc in the PRQL reference guide for details.

qStudio calls prqlc (the PRQL compiler) to generate SQL code from PRQL queries (.prql files), then runs the SQL against the selected database to display the results. For more details, check out the qStudio documentation.

Prefect

Because Prefect is native Python, it’s extremely easy to integrate with PRQL.

With a Postgres Task, replace:

PostgresExecute.run(..., query=sql)

…with…

PostgresExecute.run(..., query=prql_python.compile(prql))

We’re big fans of Prefect, and if there is anything that would make the integration easier, please open an issue.

Visual Studio Code extension

PRQL has a Visual Studio Code extension that compiles a PRQL query in a VS Code editor and displays the resulting SQL code in a second pane on the side. This is very handy for editing, saving, and reusing PRQL queries in VS Code.

To install the VS Code extension, open VS Code, press Ctrl-Shift-P (Cmd-Shift-P on a Mac), and type PRQL. Install the extension as usual.

Repo for the PRQL VS Code extension

Extension on VS Marketplace

PostgreSQL

PL/PRQL is a PostgreSQL extension that lets you write functions with PRQL.

PL/PRQL functions serve as intermediaries, compiling the user’s PRQL code into SQL statements that PostgreSQL executes. The extension is based on the pgrx framework for developing PostgreSQL extensions in Rust. This framework manages the interaction with PostgreSQL’s internal APIs, type conversions, and other function hooks necessary to integrate PRQL with PostgreSQL.

Examples

PL/PRQL functions are defined using the plprql language specifier:

create function match_stats(int) returns table(player text, kd_ratio float) as $$
  from matches
  filter match_id == $1
  group player (
    aggregate {
      total_kills = sum kills,
      total_deaths = sum deaths
    }
  )
  filter total_deaths > 0
  derive kd_ratio = total_kills / total_deaths
  select { player, kd_ratio }
$$ language plprql;

select * from match_stats(1001)

 player  | kd_ratio
---------+----------
 Player1 |    0.625
 Player2 |      1.6
(2 rows)

You can also run PRQL code directly with the prql function which is useful for custom SQL in ORMs:

select prql('from matches | filter player == ''Player1''')
as (id int, match_id int, round int, player text, kills int, deaths int)
limit 2;

 id | match_id | round | player  | kills | deaths
----+----------+-------+---------+-------+--------
  1 |     1001 |     1 | Player1 |     4 |      1
  3 |     1001 |     2 | Player1 |     1 |      7
(2 rows)

-- Same as above without the need for the static types, but returns cursor
select prql('from matches | filter player == ''Player1''', 'player1_cursor');
fetch 2 from player1_cursor;

Getting Started

For installation instructions and more information on the extension, see the PL/PRQL repository.

Databend

Databend natively supports PRQL. For more details see the databend docs.

Rill

PRQL has had some work to integrate with Rill. See the Rill issues for more details.

Grammars / syntax highlighting

PRQL contains multiple grammar definitions to enable tools to highlight PRQL code. These are all intended to provide as good an experience as the grammar supports. Please raise any shortcomings in a GitHub issue.

The definitions are somewhat scattered around the codebase; this page serves as an index.


Since the Elm language coincidentally provides syntax highlighting suitable for PRQL, it may look better to mark PRQL code as Elm when the above definition files are not available.

For example, the following Markdown code block will be nicely highlighted on GitHub, Pandoc, and other Markdown renderers:

```elm
from employees
filter start_date > @2021-01-01
```

We hope that in the future these renderers will recognize PRQL code blocks and have syntax highlighting applied, and we are tracking these with several issues.

Contributing

If you’re interested in joining the community to build a better SQL, here are ways to start:

PRQL is evolving from a project with lots of excitement into a project that folks are using in their work and integrating into their tools. We’re actively looking for collaborators to lead that growth with us.

Areas for larger contributions

Compiler

The compiler is written in Rust, and there’s enough to do such that any level of experience with Rust is sufficient.

We try to keep a few onboarding issues on hand under the “good first issue” label. These have been screened to have sufficient context to get started (and we very much welcome questions where there’s some context missing).

To get started, check out the docs on Development and the Compiler architecture.

And if you have questions, there are lots of friendly people on the Discord who will patiently help you.

Bindings & integrations

For PRQL to be successful, it needs to be available for the languages & tools that people already use.

  • We currently have bindings to the PRQL compiler in a few different languages; many of these can be improved, documented, and packaged in a better way.
  • If you have experience with packaging in an ecosystem that doesn’t currently have bindings, creating PRQL bindings for that language would be valuable to the project.
  • If there’s a tool that you use yourself to run data queries which you think would benefit from a PRQL integration, suggest one to us or the tool. If it’s open-source, build & share a prototype.

Relevant issues are labeled Integrations.

Language design

We decide on new language features in GitHub issues, usually under the “language design” label.

You can also contribute by:

  • Finding instances where the compiler produces incorrect results, and posting a bug report — feel free to use the playground.
  • Opening an issue / appending to an existing issue with examples of queries that are difficult to express in PRQL — especially if more difficult than in SQL.

With sufficient examples, suggest a change to the language! (Though suggestions without examples are difficult to engage with, so please do anchor suggestions in examples.)

Marketing

  • Improve our website. We have a few issues open on this front and are looking for anyone with at least some design skills.
  • Contribute towards the docs. Anything from shaping a whole section of the docs, to simply improving a confusing paragraph or fixing a typo.
  • Tell people about PRQL.
  • Find a group of users who would be interested in PRQL, help them get up to speed, help the project understand what they need.

Core team

If you have any questions or feedback and don’t receive a response on one of the general channels such as GitHub or Discord, feel free to reach out to:

Core team Emeritus

Thank you to those who have previously served on the core team:

  • @charlie-sanders — Charlie Sanders

Development

Setting up an initial dev environment

We can set up a local development environment sufficient for navigating, editing, and testing PRQL’s compiler code in two minutes:

  • Install rustup & cargo.

  • [Optional but highly recommended] Install cargo-insta, our testing framework:

    cargo install cargo-insta
    
  • That’s it! Running the unit tests for the prqlc crate after cloning the repo should complete successfully:

    cargo test --package prqlc --lib
    

    …or, to run tests and update the test snapshots:

    cargo insta test --accept --package prqlc --lib
    

    There’s more context on our tests in How we test below.

That’s sufficient for making an initial contribution to the compiler.


Setting up a full dev environment

Info

We really care about this process being easy, both because the project benefits from more contributors like you, and to reciprocate your future contribution. If something isn’t easy, please let us know in a GitHub Issue. We’ll enthusiastically help you, and use your feedback to improve the scripts & instructions.

For more advanced development; for example compiling for wasm or previewing the website, we have two options:

Option 1: Use the project’s task

Note

This is tested on macOS, should work on amd64 Linux, but won’t work on others (including Windows), since it relies on brew.

  • Install Task.

  • Then run the setup-dev task. This runs commands from our Taskfile.yaml, installing dependencies with cargo, brew, npm & pip, and suggests some VS Code extensions.

    task setup-dev
    

Option 2: Install tools individually

  • We’ll need cargo-insta, to update snapshot tests:

    cargo install cargo-insta
    
  • We’ll need Python, which most systems will have already. The easiest way to check is to try running the full tests:

    cargo test
    

    …and if that doesn’t complete successfully, ensure we have Python >= 3.7, to compile prqlc-python.

  • For more involved contributions, such as building the website, playground, book, or some release artifacts, we’ll need some additional tools. But we won’t need those immediately, and the error messages on what’s missing should be clear when we attempt those things. When we hit them, the Taskfile.yaml will be a good source to copy & paste instructions from.

Option 3: Use a Dev Container

This project has a devcontainer.json file and a pre-built dev container base Docker image. Learn more about Dev Containers at https://containers.dev/

Currently, the tools for Rust are already installed in the pre-built image, and Node.js, Python and others are configured to be installed when the container is built.

While there are a variety of tools that support Dev Containers, the focus here is on developing with VS Code in a container, using GitHub Codespaces or the VS Code Dev Containers extension.

To use a Dev Container on a local computer with VS Code, install the VS Code Dev Containers extension and its system requirements. Then refer to the links above to get started.

Option 4: Use nix development environment

Note

This is used by a member of the core team on Linux, but doesn’t currently work on Mac. We’re open to contributions to improve support.

A nix flake (flake.nix) provides three development environments:

  • default, for building the compiler
  • web, for the compiler and the website,
  • full, for the compiler, the website and the compiler bindings.

To load the shell:

  1. Install nix (the package manager). (only first time)

  2. Enable flakes, which are a (pretty stable) experimental feature of nix. (only first time)

    For non-NixOS users:

    mkdir -p ~/.config/nix/
    echo 'experimental-features = nix-command flakes' >> ~/.config/nix/nix.conf
    

    For NixOS users, follow instructions here.

  3. Run:

    nix develop
    

    To use the “web” or “full” shell, run:

    nix develop .#web
    

Optionally, you can install direnv, to automatically load the shell when you enter this repo. The easiest way is to also install direnv-nix and configure your .envrc with:

# .envrc
use flake .#full

Contribution workflow

We’re similar to most projects on GitHub — open a Pull Request with a suggested change!

Commits

  • If a change is user-facing, please add a line in CHANGELOG.md, with {message}, ({@contributor, #X}) where X is the PR number.
    • If there’s a missing entry, a follow-up PR containing just the changelog entry is welcome.
  • We’re using Conventional Commits message format, enforced through action-semantic-pull-request.

Merges

  • We merge any code that makes PRQL better
  • A PR doesn’t need to be perfect to be merged; it doesn’t need to solve a big problem. It needs to:
    • be in the right direction,
    • make incremental progress,
    • be explicit on its current state, so others can continue the progress.
  • That said, there are a few instances when we need to ensure we have some consensus before merging code — for example non-trivial changes to the language, or large refactorings to the library.
  • If you have merge permissions, and are reasonably confident that a PR is suitable to merge (whether or not you’re the author), feel free to merge.
    • If you don’t have merge permissions and have authored a few PRs, ask and ye shall receive.
  • The primary way we ratchet the code quality is through automated tests.
    • This means PRs almost always need a test to demonstrate incremental progress.
    • If a change breaks functionality without breaking tests, our tests were probably insufficient.
    • If a change breaks existing tests (for example, changing an external API), that indicates we should be careful about merging a change, including soliciting others’ views.
  • We use PR reviews to give general context, offer specific assistance, and collaborate on larger decisions.
    • Reviews around ‘nits’ like code formatting / idioms / etc are very welcome. But the norm is for them to be received as helpful advice, rather than as mandatory tasks to complete. Adding automated tests & lints to automate these suggestions is welcome.
    • If you have merge permissions and would like a PR to be reviewed before it merges, that’s great — ask or assign a reviewer.
    • If a PR hasn’t received attention after a day, please feel free to ping the pull request.
  • People may review a PR after it’s merged. As part of the understanding that we can merge quickly, contributors are expected to incorporate substantive feedback into a future PR.
  • We should revert quickly if the impact of a PR turns out not to be consistent with our expectations, or there isn’t as much consensus on a decision as we had hoped. It’s very easy to revert code and then re-revert when we’ve resolved the issue; it’s a sign of moving quickly. Other options which resolve issues immediately are also fine, such as commenting out an incorrect test or adding a quick fix for the underlying issue.

Docs

We’re very keen on contributions to improve our documentation.

This includes our docs in the book, on the website, in our code, or in a Readme. We also appreciate issues pointing out that our documentation was confusing, incorrect, or stale — if it’s confusing for you, it’s probably confusing for others.

Some principles for ensuring our docs remain maintainable:

  • Docs should be as close as possible to the code. Doctests are ideal on this dimension — they’re literally very close to the code and they can’t drift apart since they’re tested on every commit. Or, for example, it’s better to add text to a --help message, rather than write a paragraph in the Readme explaining the CLI.
  • We should have a view of how docs will be maintained when we add them. Docs have a habit of falling out of date — the folks reading them are often different from those writing them, they sit apart from the code, they’re generally not possible to test, and they’re rarely the by-product of other contributions. Docs that are concise & specific are easier to maintain.
  • Docs should be specifically relevant to PRQL; anything else we can instead link to.
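As a concrete illustration of the first principle, here’s a minimal sketch of a Rust doctest — the function and the `mycrate` crate name are hypothetical, for illustration only, not part of prqlc’s API. The fenced example inside the doc comment is compiled and run by `cargo test`, so the documentation can’t silently drift from the code:

```rust
/// Render a SQL column alias.
///
/// The example below is a doctest: `cargo test` compiles and runs it
/// on every commit, so this documentation stays in sync with the code.
///
/// ```
/// # use mycrate::alias; // `mycrate` is a hypothetical crate name
/// assert_eq!(alias("SUM(plays)", "plays"), "SUM(plays) AS plays");
/// ```
pub fn alias(expr: &str, name: &str) -> String {
    format!("{expr} AS {name}")
}

fn main() {
    // The same check the doctest performs.
    assert_eq!(alias("SUM(plays)", "plays"), "SUM(plays) AS plays");
}
```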

If something doesn’t fit into one of these categories, there are still lots of ways of getting the word out there — a blog post / gist / etc. Let us know and we’re happy to link to it / tweet it.

How we test

We use a pyramid of tests — we have fast, focused tests at the bottom of the pyramid, which give us low latency feedback when developing, and then slower, broader tests which ensure that we don’t miss anything as PRQL develops1.

Info

If you’re making your first contribution, you don’t need to engage with all this — it’s fine to just make a change and push the results; the tests that run on GitHub will point you towards any errors, and those tests can then be run locally if needed. We’re always around to help out.

Our tests, from the bottom of the pyramid to the top:

  • Static checks — we run a few static checks to ensure the code stays healthy and consistent. They’re defined in .pre-commit-config.yaml, using pre-commit. They can be run locally with

    task test-lint
    # or
    pre-commit run -a
    

    Most of these checks fix the issues they find themselves. Most also run on GitHub on every commit; any changes they make are committed back onto the branch automatically.

    • MegaLinter, which bundles additional linters, also runs automatically on GitHub (experimental).
  • Unit tests & inline insta snapshots — we rely on unit tests to rapidly check that our code basically works. We extensively use Insta, a snapshot testing tool which writes out the values generated by our code, making it fast & simple to write and modify tests.2

    These are the fastest tests which run our code; they’re designed to run on every save while you’re developing. We include a task which does this:

    task test-rust-fast
    # or
    cargo insta test --accept --package prqlc --lib
    # or, to run on every change:
    task -w test-rust-fast
    
  • Documentation — we compile all examples from our documentation in the Website, README, and PRQL Book, to test that they produce the SQL we expect, and that changes to our code don’t cause any unexpected regressions. These are included in:

    cargo insta test --accept
    
  • Database integration tests — we run tests with example queries against databases with actual data to ensure we’re producing correct SQL across our supported dialects. The in-process tests can be run locally with:

    task test-rust
    # or
    cargo insta test --accept --features=default,test-dbs
    

    More details on running with external databases are in the Readme.

Note

Integration tests use DuckDB, and so require a clang compiler to compile duckdb-rs. Most development systems will have one, but if the test command fails, install a clang compiler with:

  • On macOS, install the Xcode command line tools with xcode-select --install
  • On Debian Linux, apt-get update && apt-get install clang
  • On Windows, duckdb-rs isn’t supported, so these tests are excluded
  • GitHub Actions on every commit — we run tests relevant to a PR’s changes in CI — for example changes to docs will attempt to build docs, changes to a binding will run that binding’s tests. The vast majority of changes trigger tests which run in less than five minutes, and we should be reassessing their scope if they take longer than that. Once these pass, a pull request can be merged.

  • GitHub Actions on merge — we run a wider set of tests on every merge to main. This includes testing across OSs, all our language bindings, a measure of test code coverage, and some performance benchmarks.

    If these tests fail after merging, we should revert the commit before fixing the test and then re-reverting.

    Most of these will run locally with:

    task test-all
    
  • GitHub Actions nightly — every night, we run tests that take longer, are less likely to fail, or are unrelated to code changes — such as security checks, bindings’ tests on multiple OSs, or expensive timing benchmarks.

    We can run these tests before a merge by adding a label pr-nightly to the PR.

The goal of our tests is to allow us to make changes quickly. If they’re making it more difficult to make changes, or there are missing tests that would offer the confidence to make changes faster, please raise an issue.


Website

The website is published together with the book and the playground, and is automatically built and released on any push to the web branch.

The web branch points to the latest release plus any website-specific fixes. That way, the compiler behavior in the playground matches the latest release while allowing us to fix mistakes in the docs with a tighter loop than every release.

Fixes to the playground, book, or website should have a pr-backport-web label added to their PR — a bot will then open & merge another PR onto the web branch once the initial branch merges.

The website components will run locally with:

# Run the main website
task web:run-website
# Run the PRQL online book
task web:run-book
# Run the PRQL playground
task web:run-playground

Bindings

We have a number of language bindings, as documented at https://prql-lang.org/book/project/bindings/index.html. Some of these are within our monorepo, some are in separate repos. Here’s a provisional framework for when we use the main prql repo vs separate repos for bindings:

| Factor | Rationale | Example |
| --- | --- | --- |
| Does someone want to sign up to maintain a repo? | A different repo is harder for the core team to maintain | tree-sitter-prql is well maintained |
| Can it change independently from the compiler? | If it’s in a different repo, it can’t be changed in lockstep with the compiler | prql-vscode is fine to change “behind” the language |
| Would a separate repo invite new contributors? | A monorepo with all the Rust code can be less inviting for those familiar with other langs | prql-vscode had some JS-only contributors |
| Is there a convention for a stand-alone repo? | A small number of ecosystems require a separate repo | homebrew-prql needs to be named that way for a Homebrew tap |

Releasing

Currently we release in a semi-automated way:

  1. PR & merge an updated Changelog. GitHub will produce a draft version at https://github.com/PRQL/prql/releases/new, including “New Contributors”.

    Use this script to generate a line introducing the enumerated changes:

    echo "It has $(git rev-list --count $(git rev-list --tags --max-count=1)..) commits from $(git shortlog --summary $(git rev-list --tags --max-count=1).. | wc -l | tr -d '[:space:]') contributors. Selected changes:"
    
  2. If the current version is correct, then skip ahead. But if the version needs to be changed — for example, we had planned on a patch release, but instead require a minor release — then run cargo release version $version -x && cargo release replace -x && task test-rust to bump the version, and PR the resulting commit.

  3. After merging, go to Draft a new release3, copy the changelog entry into the release description4, enter the tag to be created, and hit “Publish”.

  4. From there, both the tag and release are created, and all packages are published automatically based on our release workflow.

  5. Run cargo release version patch -x --no-confirm && cargo release replace -x --no-confirm && task test-rust to bump the versions and add a new Changelog section; then PR the resulting commit. Note this currently contains task test-rust to update snapshot tests which contain the version.

  6. Check whether there are milestones that need to be pushed out.

  7. Review the Current Status on the README.md to ensure it reflects the project state.

We may make this more automated in future; e.g. automatic changelog creation.


1: Our approach is very consistent with @matklad’s advice, in his excellent blog post How to Test.

2: Here’s an example of an insta test — note that only the initial line of each test is written by us; the remainder is filled in by insta.

3: Only maintainers have access to this page.

4: Unfortunately GitHub’s markdown renderer treats single linebreaks as hard line breaks. I haven’t found a better way of making the markdown look reasonable than manually editing the text or asking an LLM to help.

Language design

In a way, PRQL is just a transpiler to SQL. This can cause its language design to gravitate toward thinking about PRQL features in terms of how they translate to SQL.

PRQL feature -> SQL feature -> relational result

This is flawed because:

  • it does not model interactions between features well,
  • SQL behavior can sometimes be misleading (the ordering of a subquery will not persist in the parent query) or even differ between dialects (set operations).

Instead, we should think of PRQL features in terms of how they affect PRQL expressions, which in most cases means how they affect relations.

PRQL feature -> relation
                   |
                   v
PRQL feature -> relation
                   |
                   v
PRQL feature -> relation
                   |
                   v
            relational result

Thinking about SQL comes in only at the last step, when the relation (or rather, the relational expression) is translated to an SQL expression.
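For example, reading this pipeline as a chain of relations — rather than as fragments of a SQL query — makes each feature’s effect unambiguous (the table and column names are illustrative):

```prql
from tracks
derive plays_per_day = plays / 365  # relation: tracks plus one column
filter plays_per_day > 10           # relation: a subset of its rows
sort {-plays_per_day}               # relation: same rows, now ordered
take 3                              # relation: just the first three rows
```

Whether the compiler renders this as a single SELECT or as nested subqueries is a detail of that final translation step, not of the language’s semantics.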