Introduction

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement. Like SQL, it’s readable, explicit and declarative. Unlike SQL, it forms a logical pipeline of transformations, and supports abstractions such as variables and functions. It can be used with any database that uses SQL, since it transpiles to SQL.

Let’s get started with an example:

PRQL

from employees
filter start_date > @2021-01-01               # Clear date syntax
derive [                                      # `derive` adds columns / variables
  gross_salary = salary + (tax ?? 0),         # Terse coalesce
  gross_cost = gross_salary + benefits_cost,  # Variables can use other variables
]
filter gross_cost > 0
group [title, country] (                      # `group` runs a pipeline over each group
  aggregate [                                 # `aggregate` reduces each group to a value
    average gross_salary,
    sum_gross_cost = sum gross_cost,          # `=` sets a column name
  ]
)
filter sum_gross_cost > 100_000               # `filter` replaces both of SQL's `WHERE` & `HAVING`
derive id = f"{title}_{country}"              # F-strings like Python
derive country_code = s"LEFT(country, 2)"     # S-strings allow using SQL as an escape hatch
sort [sum_gross_cost, -country]               # `-country` means descending order
take 1..20                                    # Range expressions (also valid here as `take 20`)

SQL

WITH table_3 AS (
  SELECT
    title,
    country,
    salary + COALESCE(tax, 0) + benefits_cost AS _expr_1,
    salary + COALESCE(tax, 0) AS _expr_2
  FROM
    employees
  WHERE
    start_date > DATE '2021-01-01'
),
table_1 AS (
  SELECT
    title,
    country,
    AVG(_expr_2) AS _expr_0,
    SUM(_expr_1) AS sum_gross_cost
  FROM
    table_3 AS table_2
  WHERE
    _expr_1 > 0
  GROUP BY
    title,
    country
)
SELECT
  title,
  country,
  _expr_0,
  sum_gross_cost,
  CONCAT(title, '_', country) AS id,
  LEFT(country, 2) AS country_code
FROM
  table_1 AS table_0
WHERE
  sum_gross_cost > 100000
ORDER BY
  sum_gross_cost,
  country DESC
LIMIT
  20

As you can see, PRQL is a linear pipeline of transformations — each line of the query is a transformation of the previous line’s result.

In SQL, by contrast, operations do not follow one another linearly, which makes it hard to compose larger queries.

Queries

PRQL queries are composed of a sequence of transforms that form a pipeline that modifies data as it is passed from one to the next.

The following sections describe the major components of a query: pipelines, functions, variables, and transforms.

Pipelines

PRQL queries are a sequence of lines (or transforms) that form a pipeline. Each line transforms the data, and passes its result to the next.

The simplest pipeline is just:

PRQL

from employees

SQL

SELECT
  *
FROM
  employees

Adding transforms

As we add additional lines, each one transforms the result:

PRQL

from employees
derive gross_salary = (salary + payroll_tax)

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary
FROM
  employees

…and so on:

from employees
derive gross_salary = (salary + payroll_tax)
sort gross_salary

Compiling to SQL

PRQL compiles the query to SQL. The PRQL compiler tries to represent as many transforms as possible with a single SELECT statement. When necessary, the compiler “overflows” and creates CTEs (common table expressions):

PRQL

from e = employees
derive gross_salary = (salary + payroll_tax)
sort gross_salary
take 10
join d = department [==dept_no]
select [e.name, gross_salary, d.name]

SQL

WITH table_1 AS (
  SELECT
    name,
    salary + payroll_tax AS gross_salary,
    dept_no
  FROM
    employees AS e
  ORDER BY
    gross_salary
  LIMIT
    10
)
SELECT
  table_0.name,
  table_0.gross_salary,
  d.name
FROM
  table_1 AS table_0
  JOIN department AS d ON table_0.dept_no = d.dept_no

Functions

Functions are a fundamental abstraction in PRQL — they allow us to run code in many places that we’ve written once. This reduces the number of errors in our code, makes our code more readable, and simplifies making changes.

Functions have two types of parameters:

  1. Positional parameters, which require an argument.
  2. Named parameters, which optionally take an argument, otherwise using their default value.
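As a rough analogy (this is Python, not PRQL), the two parameter kinds map onto required positional parameters and keyword parameters with defaults; the sketch below mirrors the interp function defined later in this section:

```python
# A rough Python analogue of PRQL's `interp` function: `high` and `x` are
# positional parameters that require arguments, while `low` is a named
# parameter with a default value of 0.
def interp(high, x, low=0):
    """Proportion of the distance that x lies between low and high."""
    return (x - low) / (high - low)

print(interp(1600, 1200))           # positional only; low defaults to 0 -> 0.75
print(interp(1600, 1200, low=400))  # overriding the named parameter
```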

So this function is named fahrenheit_to_celsius and has one parameter temp:

PRQL

func fahrenheit_to_celsius temp -> (temp - 32) / 1.8

from cities
derive temp_c = (fahrenheit_to_celsius temp_f)

SQL

SELECT
  *,
  (temp_f - 32) / 1.8 AS temp_c
FROM
  cities

This function, named interp, has two positional parameters, high and x, and one named parameter, low, with a default value of 0. It calculates the proportion of the distance that x is between low and high.

PRQL

func interp low:0 high x -> (x - low) / (high - low)

from students
derive [
  sat_proportion_1 = (interp 1600 sat_score),
  sat_proportion_2 = (interp low:0 1600 sat_score),
]

SQL

SELECT
  *,
  (sat_score - 0) / 1600 AS sat_proportion_1,
  (sat_score - 0) / 1600 AS sat_proportion_2
FROM
  students

Piping

Consistent with the principles of PRQL, it’s possible to pipe values into functions, which makes composing many functions more readable. When piping a value into a function, the value is passed as an argument to the final positional parameter of the function. Here’s the same result as the examples above with an alternative construction:

PRQL

func interp low:0 high x -> (x - low) / (high - low)

from students
derive [
  sat_proportion_1 = (sat_score | interp 1600),
  sat_proportion_2 = (sat_score | interp low:0 1600),
]

SQL

SELECT
  *,
  (sat_score - 0) / 1600 AS sat_proportion_1,
  (sat_score - 0) / 1600 AS sat_proportion_2
FROM
  students

and

PRQL

func fahrenheit_to_celsius temp -> (temp - 32) / 1.8

from cities
derive temp_c = (temp_f | fahrenheit_to_celsius)

SQL

SELECT
  *,
  (temp_f - 32) / 1.8 AS temp_c
FROM
  cities

We can combine a chain of functions, which makes logic more readable:

PRQL

func fahrenheit_to_celsius temp -> (temp - 32) / 1.8
func interp low:0 high x -> (x - low) / (high - low)

from kettles
derive boiling_proportion = (temp_c | fahrenheit_to_celsius | interp 100)

SQL

SELECT
  *,
  ((temp_c - 32) / 1.8 - 0) / 100 AS boiling_proportion
FROM
  kettles

Scope

Late binding

Functions can bind to any variable that is in scope when the function is executed. For example, here cost_total refers to the column that’s introduced in the from.

PRQL

func cost_share cost -> cost / cost_total

from costs
select [materials, labor, overhead, cost_total]
derive [
  materials_share = (cost_share materials),
  labor_share = (cost_share labor),
  overhead_share = (cost_share overhead),
]

SQL

SELECT
  materials,
  labor,
  overhead,
  cost_total,
  materials / cost_total AS materials_share,
  labor / cost_total AS labor_share,
  overhead / cost_total AS overhead_share
FROM
  costs

Variables

We can define a relation — similar to a CTE in SQL — as a variable with let:

PRQL

let top_50 = (
  from employees
  sort salary
  take 50
  aggregate [total_salary = sum salary]
)

from top_50      # Starts a new pipeline

SQL

WITH table_1 AS (
  SELECT
    salary
  FROM
    employees
  ORDER BY
    salary
  LIMIT
    50
), top_50 AS (
  SELECT
    SUM(salary) AS total_salary
  FROM
    table_1 AS table_0
)
SELECT
  total_salary
FROM
  top_50

We can even place a whole CTE in an s-string, enabling us to use features which PRQL doesn’t yet support.

PRQL

let grouping = s"""
  SELECT SUM(a)
  FROM tbl
  GROUP BY
    GROUPING SETS
    ((b, c, d), (d), (b, d))
"""

from grouping

SQL

WITH table_0 AS (
  SELECT
    SUM(a)
  FROM
    tbl
  GROUP BY
    GROUPING SETS ((b, c, d), (d), (b, d))
),
grouping AS (
  SELECT
    *
  FROM
    table_0 AS table_1
)
SELECT
  *
FROM
  grouping

Info

In PRQL, named relations are far less common than CTEs are in SQL, since a linear series of CTEs can be represented with a single pipeline.

Currently defining variables with let is restricted to relations. We’d like to extend this to expressions that evaluate to scalars.

Transforms

PRQL queries are a pipeline of transformations (“transforms”), where each transform takes the previous result and adjusts it in some way, before passing it onto to the next transform.

Because PRQL focuses on modularity, we have far fewer transforms than SQL, each one fulfilling a specific purpose. That’s often referred to as “orthogonality”.

These are the currently available transforms:

Aggregate

Summarizes many rows into one row.

When applied:

  • without group, it produces one row from the whole table,
  • within a group pipeline, it produces one row from each group.

aggregate [{expression or assign operations}]

Note

Currently, the declared aggregation functions are min, max, count, average, stddev, avg, sum and count_distinct. We are in the process of filling out the standard library.

Examples

PRQL

from employees
aggregate [
  average salary,
  ct = count
]

SQL

SELECT
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees

PRQL

from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

Aggregate is required

Unlike in SQL, using an aggregation function in derive or select (or any other transform except aggregate) will not trigger aggregation. By default, PRQL will interpret such calls as window functions:

PRQL

from employees
derive [avg_sal = average salary]

SQL

SELECT
  *,
  AVG(salary) OVER () AS avg_sal
FROM
  employees

This ensures that derive does not manipulate the number of rows, but only ever adds a column. For more information, see window transform.

Append

Concatenates two tables together.

Equivalent to UNION ALL in SQL. The number of rows is always the sum of the number of rows from the two input tables. To replicate UNION DISTINCT, see set operations.

PRQL

from employees_1
append employees_2

SQL

SELECT
  *
FROM
  employees_1
UNION
ALL
SELECT
  *
FROM
  employees_2

Remove

experimental

Removes rows that appear in another relation, like EXCEPT ALL. Duplicate rows are removed one-for-one.

PRQL

from employees_1
remove employees_2

SQL

SELECT
  *
FROM
  employees_1 AS t
EXCEPT
  ALL
SELECT
  *
FROM
  employees_2 AS b

Intersection

experimental

Returns rows that appear in both relations, like INTERSECT ALL.

PRQL

from employees_1
intersect employees_2

SQL

SELECT
  *
FROM
  employees_1 AS t
INTERSECT
ALL
SELECT
  *
FROM
  employees_2 AS b

Set operations

experimental

To imitate set operations (i.e. UNION, EXCEPT and INTERSECT), you can use the following functions:

func distinct rel -> (from t = _param.rel | group [t.*] (take 1))
func union `default_db.bottom` top -> (top | append bottom | distinct)
func except `default_db.bottom` top -> (top | distinct | remove bottom)
func intersect_distinct `default_db.bottom` top -> (top | intersect bottom | distinct)

Don’t mind the default_db. and _param. prefixes; these are compiler implementation details for now.

Derive

Computes one or more new columns.

derive [
  {name} = {expression},
  # or
  {column},
]

Examples

PRQL

from employees
derive gross_salary = salary + payroll_tax

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary
FROM
  employees

PRQL

from employees
derive [
  gross_salary = salary + payroll_tax,
  gross_cost = gross_salary + benefits_cost
]

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary,
  salary + payroll_tax + benefits_cost AS gross_cost
FROM
  employees

Filter

Picks rows based on their values.

filter {boolean_expression}

Examples

PRQL

from employees
filter age > 25

SQL

SELECT
  *
FROM
  employees
WHERE
  age > 25

PRQL

from employees
filter (age > 25 || department != "IT")

SQL

SELECT
  *
FROM
  employees
WHERE
  age > 25
  OR department <> 'IT'

PRQL

from employees
filter (age | in 25..40)

SQL

SELECT
  *
FROM
  employees
WHERE
  age BETWEEN 25 AND 40

From

Specifies a data source.

from {table_reference}

Table names that contain schemas, or that need quoting for other reasons, must be surrounded with backticks.

Examples

PRQL

from employees

SQL

SELECT
  *
FROM
  employees

To introduce an alias, use an assign expression:

PRQL

from e = employees
select e.first_name

SQL

SELECT
  first_name
FROM
  employees AS e

Group

Partitions the rows into groups and applies a pipeline to each of the groups.

group [{key_columns}] {pipeline}

The partitioning into groups is determined by the key_columns (the first argument).

The most conventional use of group is with aggregate:

PRQL

from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

Conceptually, a transform in the context of a group applies the same transformation to each group as it would to the whole table. For example, finding the employee who joined first across the whole table:

PRQL

from employees
sort join_date
take 1

SQL

SELECT
  *
FROM
  employees
ORDER BY
  join_date
LIMIT
  1

To find the employee who joined first in each department, it’s exactly the same pipeline, but within a group expression:

PRQL

from employees
group role (
  sort join_date  # taken from above
  take 1
)

SQL

WITH table_1 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_1 AS table_0
WHERE
  _expr_0 <= 1

Join

Adds columns from another table, matching rows based on a condition.

join side:{inner|left|right|full} {table} {[conditions]}

Parameters

  • side decides which rows to include, defaulting to inner.
  • Table reference
  • List of conditions
    • The result of the join operation is the Cartesian (cross) product of rows from both tables, filtered to the rows that match all of the conditions.
    • If the column name is the same in both tables, the condition can be expressed as just ==col.
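The “cross product, then filter” semantics can be sketched in Python (an illustrative model of the result, not how a database actually executes a join):

```python
# Model of an inner join: take the Cartesian product of the two row sets,
# then keep only the pairs that satisfy every join condition.
from itertools import product

employees = [{"id": 1, "name": "Ava"}, {"id": 2, "name": "Bo"}]
positions = [{"employee_id": 1, "title": "Engineer"},
             {"employee_id": 1, "title": "Lead"}]

joined = [
    {**e, **p}
    for e, p in product(employees, positions)
    if e["id"] == p["employee_id"]   # the join condition
]
print(joined)
# Ava matches both positions; Bo matches none, so an inner join drops him.
```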

Examples

PRQL

from employees
join side:left positions [employees.id==positions.employee_id]

SQL

SELECT
  employees.*,
  positions.*
FROM
  employees
  LEFT JOIN positions ON employees.id = positions.employee_id

PRQL

from employees
join side:left p=positions [employees.id==p.employee_id]

SQL

SELECT
  employees.*,
  p.*
FROM
  employees
  LEFT JOIN positions AS p ON employees.id = p.employee_id

Self equality operator

If the join conditions are of the form left.x == right.x, we can use the “self-equality operator”:

PRQL

from employees
join positions [==emp_no]

SQL

SELECT
  employees.*,
  positions.*
FROM
  employees
  JOIN positions ON employees.emp_no = positions.emp_no

Select

Picks and computes columns.

select [
  {name} = {expression},
  # or
  {column},
]
# or
select ![{column}]

Examples

PRQL

from employees
select name = f"{first_name} {last_name}"

SQL

SELECT
  CONCAT(first_name, ' ', last_name) AS name
FROM
  employees

PRQL

from employees
select [
  name = f"{first_name} {last_name}",
  age_eoy = dob - @2022-12-31,
]

SQL

SELECT
  CONCAT(first_name, ' ', last_name) AS name,
  dob - DATE '2022-12-31' AS age_eoy
FROM
  employees

PRQL

from employees
select first_name

SQL

SELECT
  first_name
FROM
  employees

PRQL

from e=employees
select [e.first_name, e.last_name]

SQL

SELECT
  first_name,
  last_name
FROM
  employees AS e

Excluding columns

We can use ! to exclude a list of columns. This can operate in two ways:

  • In dialects that support it, we compile select ![] to SELECT * EXCLUDE / SELECT * EXCEPT with the supplied columns.
  • Otherwise, the columns must have been defined earlier in the query (unless all of a table’s columns are excluded), for example in another select or a group transform. In this case, we evaluate and list the columns that should be included in the output SQL.

Some examples:

PRQL

prql target:sql.bigquery
from tracks
select ![milliseconds,bytes]

SQL

SELECT
  *
EXCEPT
  (milliseconds, bytes)
FROM
  tracks

PRQL

from tracks
select [track_id, title, composer, bytes]
select ![title, composer]

SQL

SELECT
  track_id,
  bytes
FROM
  tracks

PRQL

from artists
derive nick = name
select ![artists.*]

SQL

SELECT
  name AS nick
FROM
  artists

Note

After a select statement, the table alias or namespace (such as e in the examples above) is no longer available. For example, this would raise an error:

from e=employees
select e.first_name
filter e.first_name == "Fred" # Can't find `e.first_name`

To refer to the e.first_name column in subsequent transforms, either refer to it using first_name, or if it requires a different name, assign one in the select statement:

PRQL

from e=employees
select fname = e.first_name
filter fname == "Fred"

SQL

WITH table_1 AS (
  SELECT
    first_name AS fname
  FROM
    employees AS e
)
SELECT
  fname
FROM
  table_1 AS table_0
WHERE
  fname = 'Fred'

Sort

Orders rows based on the values of one or more columns.

sort [{direction}{column}]

Parameters

  • One column or a list of columns to sort by
  • Each column can be prefixed with:
    • +, for ascending order, the default
    • -, for descending order
  • When using prefixes, even a single column needs to be in a list or parentheses. (Otherwise, sort -foo is parsed as a subtraction between sort and foo.)

Examples

PRQL

from employees
sort age

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age

PRQL

from employees
sort [-age]

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age DESC

PRQL

from employees
sort [age, -tenure, +salary]

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age,
  tenure DESC,
  salary

We can also use expressions:

PRQL

from employees
sort [s"substr({first_name}, 2, 5)"]

SQL

WITH table_1 AS (
  SELECT
    *,
    substr(first_name, 2, 5) AS _expr_0
  FROM
    employees
  ORDER BY
    _expr_0
)
SELECT
  *
FROM
  table_1 AS table_0

Notes

Ordering guarantees

Most DBs will persist ordering through most transforms; for example, you can expect this result to be ordered by tenure.

PRQL

from employees
sort tenure
derive name = f"{first_name} {last_name}"

SQL

SELECT
  *,
  CONCAT(first_name, ' ', last_name) AS name
FROM
  employees
ORDER BY
  tenure

But:

  • This is an implementation detail of the DB. If there are instances where this doesn’t hold, please open an issue, and we’ll consider how to manage it.
  • Some transforms which change the existence of rows, such as join or group, won’t persist ordering; for example:

PRQL

from employees
sort tenure
join locations [==employee_id]

SQL

WITH table_1 AS (
  SELECT
    *
  FROM
    employees
  ORDER BY
    tenure
)
SELECT
  table_0.*,
  locations.*
FROM
  table_1 AS table_0
  JOIN locations ON table_0.employee_id = locations.employee_id

See Issue #1363 for more details.

Take

Picks rows based on their position.

take {n|range}

See Ranges for more details on how ranges work.

Examples

PRQL

from employees
take 10

SQL

SELECT
  *
FROM
  employees
LIMIT
  10

PRQL

from orders
sort [-value, created_at]
take 101..110

SQL

SELECT
  *
FROM
  orders
ORDER BY
  value DESC,
  created_at
LIMIT
  10 OFFSET 100
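The translation from an inclusive, 1-based range to LIMIT/OFFSET is simple arithmetic. This Python sketch shows the mapping (not the compiler’s actual code):

```python
def range_to_limit_offset(start, end):
    """Map an inclusive 1-based `take start..end` to (LIMIT, OFFSET)."""
    return end - start + 1, start - 1

print(range_to_limit_offset(101, 110))  # (10, 100) -> LIMIT 10 OFFSET 100
print(range_to_limit_offset(1, 20))     # (20, 0) -> a plain LIMIT 20
```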

Window

Applies a pipeline to segments of rows, producing one output value for every input value.

window rows:{range} range:{range} expanding:false rolling:0 {pipeline}

For each row, the segment over which the pipeline is applied is determined by one of:

  • rows, which takes a range of rows relative to the current row position.
    • 0 references the current row.
  • range, which takes a range of values relative to the current row’s value.

The bounds of the range are inclusive. If a bound is omitted, the segment will extend until the edge of the table or group.

For ease of use, there are two flags that override rows or range:

  • expanding:true is an alias for rows:..0. A sum using this window is also known as a “cumulative sum”.
  • rolling:n is an alias for rows:(-n+1)..0, where n is an integer. This includes the last n values, including the current row. An average using this window is also known as a simple moving average.

Some examples:

| Expression | Meaning |
| --- | --- |
| rows:0..2 | current row plus two following |
| rows:-2..0 | two preceding rows plus current row |
| rolling:3 | (same as previous) |
| rows:-2..4 | two preceding rows plus current row plus four following rows |
| rows:..0 | all rows from the start of the table up to & including the current row |
| expanding:true | (same as previous) |
| rows:0.. | current row and all following rows until the end of the table |
| rows:.. | all rows; the same as not having a window at all |
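The frame arithmetic for the two flags can be sketched in Python (a model of the semantics, not how the database computes window frames):

```python
# rolling:n covers rows (-n+1)..0 — the current row and the n-1 rows
# before it, with fewer rows available near the start of the table.
def rolling_sum(values, n):
    out = []
    for i in range(len(values)):
        lo = max(0, i - (n - 1))  # frame start, clipped at the table edge
        out.append(sum(values[lo:i + 1]))
    return out

# expanding:true is rows:..0 — everything up to and including the current row.
def expanding_sum(values):
    return [sum(values[:i + 1]) for i in range(len(values))]

print(rolling_sum([1, 2, 3, 4], 3))   # [1, 3, 6, 9]
print(expanding_sum([1, 2, 3, 4]))    # [1, 3, 6, 10]
```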

Example

PRQL

from employees
group employee_id (
  sort month
  window rolling:12 (
    derive [trail_12_m_comp = sum paycheck]
  )
)

SQL

SELECT
  *,
  SUM(paycheck) OVER (
    PARTITION BY employee_id
    ORDER BY
      month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  ) AS trail_12_m_comp
FROM
  employees

PRQL

from orders
sort day
window rows:-3..3 (
  derive [centered_weekly_average = average value]
)
group [order_month] (
  sort day
  window expanding:true (
    derive [monthly_running_total = sum value]
  )
)

SQL

SELECT
  *,
  AVG(value) OVER (
    ORDER BY
      day ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
  ) AS centered_weekly_average,
  SUM(value) OVER (
    PARTITION BY order_month
    ORDER BY
      day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS monthly_running_total
FROM
  orders

Windowing by default

If you use window functions without window transform, they will be applied to the whole table. Unlike in SQL, they will remain window functions and will not trigger aggregation.

PRQL

from employees
sort age
derive rnk = rank

SQL

SELECT
  *,
  RANK() OVER (
    ORDER BY
      age ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS rnk
FROM
  employees
ORDER BY
  age

You can also only apply group:

PRQL

from employees
group department (
  sort age
  derive rnk = rank
)

SQL

SELECT
  *,
  RANK() OVER (
    PARTITION BY department
    ORDER BY
      age ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS rnk
FROM
  employees

Window functions as first class citizens

There are no limitations on where windowed expressions can be used:

PRQL

from employees
filter salary < (average salary)

SQL

WITH table_1 AS (
  SELECT
    *,
    AVG(salary) OVER () AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_1 AS table_0
WHERE
  salary < _expr_0

Syntax

A summary of PRQL syntax:

| Syntax | Usage | Example |
| --- | --- | --- |
| \| | Pipelines | from employees \| select first_name |
| = | Assigns & Aliases | from e = employees; derive total = (sum salary) |
| : | Named args & Parameters | interp low:0 1600 sat_score |
| [] | Lists | select [id, amount] |
| ! && \|\| == + etc | Expressions & Operators | filter a == b + c \|\| d >= e |
| () | Parentheses | derive celsius = (fahrenheit - 32) / 1.8 |
| '' "" | Strings | derive name = 'Mary' |
| ` ` | Quoted identifiers | select `first name` |
| # | Comments | # A comment |
| @ | Dates & Times | @2021-01-01 |
| == | Self-equality in join | join s=salaries [==id] |
| -> | Function definitions | func add a b -> a + b |
| => | Case statement | case [a==1 => c, a==2 => d] |
| +/- | Sort order | sort [-amount, +date] |
| ?? | Coalesce | amount ?? 0 |

Pipes

Pipes — the connection between transforms that make up a pipeline — can be either line breaks or a pipe character (|).

In almost all situations, line-breaks pipe the result of a line’s transform into the transform on the following line. For example, the filter transform operates on the result of from employees (which is just the employees table), and the select transform operates on the result of the filter transform.

PRQL

from employees
filter department == "Product"
select [first_name, last_name]

SQL

SELECT
  first_name,
  last_name
FROM
  employees
WHERE
  department = 'Product'

In the place of a line-break, it’s also possible to use the | character to pipe results, such that this is equivalent:

PRQL

from employees | filter department == "Product" | select [first_name, last_name]

SQL

SELECT
  first_name,
  last_name
FROM
  employees
WHERE
  department = 'Product'

A line-break doesn’t create a pipeline in a couple of cases:

  • within a list (e.g. the derive examples below),
  • when the following line is a new statement, which starts with one of the keywords func, let or from.

Lists

Lists are represented with [], and can span multiple lines. A final trailing comma is optional.

PRQL

from numbers
derive [x = 1, y = 2]
derive [
  a = x,
  b = y
]
derive [
  c = a,
  d = b,
]

SQL

SELECT
  *,
  1 AS x,
  2 AS y,
  1 AS a,
  2 AS b,
  1 AS c,
  2 AS d
FROM
  numbers

Most transforms can take either a list or a single item, so these are equivalent:

PRQL

from employees
select [first_name]

SQL

SELECT
  first_name
FROM
  employees

PRQL

from employees
select first_name

SQL

SELECT
  first_name
FROM
  employees

Expressions and operators

PRQL allows expressions, like 2 + 3 or ((1 + x) * y) made up of various operators. In the example below, note the use of expressions to calculate the alias circumference and in the filter transform.

PRQL

from foo
select [
  circumference = diameter * 3.14159,
  color,
]
filter circumference > 10 && color != "red"

SQL

WITH table_1 AS (
  SELECT
    diameter * 3.14159 AS circumference,
    color
  FROM
    foo
)
SELECT
  circumference,
  color
FROM
  table_1 AS table_0
WHERE
  circumference > 10
  AND color <> 'red'

Operator precedence

This table shows operator precedence. Use parentheses () to prioritize operations and for function calls (see the discussion below.)

| Group | Operators | Precedence | Associativity |
| --- | --- | --- | --- |
| parentheses | () | 0 | see below |
| identifier dot | . | 1 | |
| unary | - + ! == | 2 | |
| range | .. | 3 | |
| mul | * / % | 4 | left-to-right |
| add | + - | 5 | left-to-right |
| compare | == != <= >= < > | 6 | left-to-right |
| coalesce | ?? | 7 | left-to-right |
| and | && | 8 | left-to-right |
| or | \|\| | 9 | left-to-right |
| function call | | 10 | |

Parentheses

PRQL uses parentheses () for several purposes:

  • Parentheses group operands to control the order of evaluation, for example: ((1 + x) * y)

  • Parentheses delimit an inner transform for the group () and window () transforms.

  • Parentheses delimit a minus sign of a function argument, for example: add (-1) (-3)

  • Parentheses delimit nested function calls that contain a pipe, either the | symbol or a new line. “Nested” means within a transform; i.e. not just the main pipeline, for example: (column-name | in 0..20)

  • Parentheses wrap a function call that is part of a larger expression on the right-hand side of an assignment, for example: round 0 (sum distance)

  • Parentheses are not required for expressions that do not contain function calls, for example: foo + bar.

Here’s a set of examples of these rules:

PRQL

from employees
# Requires parentheses, because it contains a pipe
derive is_proximate = (distance | in 0..20)
# Requires parentheses, because it's a function call
derive total_distance = (sum distance)
# `??` doesn't require parentheses, as it's not a function call
derive min_capped_distance = (min distance ?? 5)
# No parentheses needed, because no function call
derive travel_time = distance / 40
# No inner parentheses needed around `1+1` because no function call
derive distance_rounded_2_dp = (round 1+1 distance)
derive [
  # Requires parentheses, because it contains a pipe
  is_far = (distance | in 100..),
  # The left value of the range requires parentheses,
  # because of the minus sign
  is_negative = (distance | in (-100..0)),
  # ...this is equivalent
  is_negative = (distance | in (-100)..0),
  # Doesn't require parentheses, because it's in a list (confusing; see the note below)!
  average_distance = average distance,
]
# Requires parentheses because of the minus sign
sort (-distance)
# A list is fine too
sort [-distance]

SQL

SELECT
  *,
  distance BETWEEN 0 AND 20 AS is_proximate,
  SUM(distance) OVER () AS total_distance,
  MIN(COALESCE(distance, 5)) OVER () AS min_capped_distance,
  distance / 40 AS travel_time,
  ROUND(distance, 2) AS distance_rounded_2_dp,
  distance >= 100 AS is_far,
  distance BETWEEN -100 AND 0,
  distance BETWEEN -100 AND 0 AS is_negative,
  AVG(distance) OVER () AS average_distance
FROM
  employees
ORDER BY
  distance DESC

Note: The total_distance statement below generates an error because the function is not in a list. (The PRQL compiler should display a better error message.)

PRQL

from employees
derive total_distance = sum distance # generates the error shown below
derive other_distance = (sum distance) # works as expected

Error

Error:
   ╭─[:2:29]
   │
 2 │ derive total_distance = sum distance # generates the error shown below
   │                             ────┬───
   │                                 ╰───── Unknown name distance
───╯

Note

We’re continuing to think whether these rules can be more intuitive. We’re also planning to make the error messages much better, so the compiler can help out.

Inner Transforms

Parentheses are also used for transforms (such as group and window) that pass their result to an “inner transform”. The example below applies the aggregate pipeline to each group of unique title and country values:

PRQL

from employees
group [title, country] (
  aggregate [
    average salary,
    ct = count
  ]
)

SQL

SELECT
  title,
  country,
  AVG(salary),
  COUNT(*) AS ct
FROM
  employees
GROUP BY
  title,
  country

Comments

Comments are represented by #.

PRQL

from employees  # Comment 1
# Comment 2
aggregate [average salary]

SQL

SELECT
  AVG(salary)
FROM
  employees

There’s no distinct multiline comment syntax.

Quoting identifiers

To use identifiers that are otherwise invalid, surround them with backticks. Depending on the dialect, these will remain as backticks or be converted to double-quotes.

PRQL

prql target:sql.mysql
from employees
select `first name`

SQL

SELECT
  `first name`
FROM
  employees

PRQL

prql target:sql.postgres
from employees
select `first name`

SQL

SELECT
  "first name"
FROM
  employees

PRQL

from `dir/*.parquet`

SQL

SELECT
  *
FROM
  "dir/*.parquet"

BigQuery also uses backticks to surround project & dataset names (even if valid identifiers) in the SELECT statement:

PRQL

prql target:sql.bigquery
from `project-foo.dataset.table`
join `project-bar.dataset.table` [==col_bax]

SQL

SELECT
  `project-foo.dataset.table`.*,
  `project-bar.dataset.table`.*
FROM
  `project-foo.dataset.table`
  JOIN `project-bar.dataset.table` ON `project-foo.dataset.table`.col_bax = `project-bar.dataset.table`.col_bax

Quoting schemas

Note

This is currently not great and we are working on improving it; see https://github.com/PRQL/prql/issues/1535 for progress.

If supplying a schema without a column — for example in a from or join transform, that also needs to be a quoted identifier:

PRQL

from `music.albums`

SQL

SELECT
  *
FROM
  music.albums

Parameters

PRQL will retain parameters like $1 in the SQL output, which can then be supplied when the query is executed as a prepared statement:

PRQL

from employees
filter id == $1

SQL

SELECT
  *
FROM
  employees
WHERE
  id = $1

Numbers

Numbers can contain underscores between digits, which can make large numbers easier to read:

PRQL

from numbers
select [
    small = 1.000_000_1,
    big = 5_000_000,
]

SQL

SELECT
  1.0000001 AS small,
  5000000 AS big
FROM
  numbers
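Python’s numeric literals happen to follow the same rule, so the equivalence is easy to check:

```python
# Underscores are purely visual separators; the value is unchanged.
print(5_000_000 == 5000000)        # True
print(1.000_000_1 == 1.0000001)    # True
```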

Keywords

At the moment, PRQL uses only four keywords:

  • prql
  • let
  • func
  • case

To use these names as columns or relations, use backticks: `case`.

It may seem that transforms are also keywords, but they are normal functions within the std namespace:

PRQL

std.from my_table
std.select [from = my_table.a, take = my_table.b]
std.take 3

SQL

SELECT
  a AS "from",
  b AS take
FROM
  my_table
LIMIT
  3

Language features

The pages of this section describe how PRQL handles various aspects of the language.

Case

Note

case is currently experimental and may change behavior in the near future

Info

case was previously (PRQL 0.4 to 0.5) called switch and renamed to case in PRQL 0.6.0.

PRQL uses case for both SQL’s CASE and IF statements. Here’s an example:

PRQL

from employees
derive distance = case [
  city == "Calgary" => 0,
  city == "Edmonton" => 300,
]

SQL

SELECT
  *,
  CASE
    WHEN city = 'Calgary' THEN 0
    WHEN city = 'Edmonton' THEN 300
    ELSE NULL
  END AS distance
FROM
  employees

If no condition is met, the result is null. To set a default, use a true condition:

PRQL

from employees
derive distance = case [
  city == "Calgary" => 0,
  city == "Edmonton" => 300,
  true => "Unknown",
]

SQL

SELECT
  *,
  CASE
    WHEN city = 'Calgary' THEN 0
    WHEN city = 'Edmonton' THEN 300
    ELSE 'Unknown'
  END AS distance
FROM
  employees

Coalesce

We can coalesce values with the ?? operator. Coalescing takes the first value or, if that value is null, the second value.

PRQL

from orders
derive amount ?? 0

SQL

SELECT
  *,
  COALESCE(amount, 0)
FROM
  orders

Dates & times

PRQL uses @ followed by a string to represent dates & times. This is less verbose than SQL’s approach of TIMESTAMP '2004-10-19 10:23:54' and more explicit than SQL’s implicit option of just using a string '2004-10-19 10:23:54'.

Note

Currently PRQL passes these strings straight through to the database, so many compatible format strings may work; we may refine this in the future to aid compatibility across databases. We’ll always support the canonical ISO8601 format described below.

Dates

Dates are represented by @{yyyy-mm-dd} — a @ followed by the date format.

PRQL

from employees
derive age_at_year_end = (@2022-12-31 - dob)

SQL

SELECT
  *,
  DATE '2022-12-31' - dob AS age_at_year_end
FROM
  employees

Times

Times are represented by @{HH:mm:ss.SSS±Z} with any parts not supplied being rounded to zero, including the timezone, which is represented by +HH:mm, -HH:mm or Z. This is consistent with the ISO8601 time format.

PRQL

from orders
derive should_have_shipped_today = (order_time < @08:30)

SQL

SELECT
  *,
  order_time < TIME '08:30' AS should_have_shipped_today
FROM
  orders

Timestamps

Timestamps are represented by @{yyyy-mm-ddTHH:mm:ss.SSS±Z} / @{date}T{time}, with any time parts not supplied being rounded to zero, including the timezone, which is represented by +HH:mm, -HH:mm or Z (: is optional). This is @ followed by the ISO8601 datetime format, which uses T to separate date & time.

PRQL

from commits
derive first_prql_commit = @2020-01-01T13:19:55-08:00

SQL

SELECT
  *,
  TIMESTAMP '2020-01-01T13:19:55-08:00' AS first_prql_commit
FROM
  commits

Intervals

Intervals are represented by {N}{periods}, such as 2years or 10minutes, without a space.

Note

These aren’t the same as ISO8601, because we evaluated P3Y6M4DT12H30M5S to be difficult to understand, but we could support a simplified form if there’s demand for it. We don’t currently support compound expressions, for example 2years10months, but most DBs will allow 2years + 10months. Please raise an issue if this is inconvenient.

PRQL

from projects
derive first_check_in = start + 10days

SQL

SELECT
  *,
  start + INTERVAL 10 DAY AS first_check_in
FROM
  projects

Examples

Here’s a fuller list of examples:

  • @20221231 is invalid — it must contain full punctuation (- and :),
  • @2022-12-31 is a date
  • @2022-12 or @2022 are invalid — SQL can’t express a month, only a date
  • @16:54:32.123456 is a time
  • @16:54:32, @16:54, @16 are all allowed, expressing @16:54:32.000000, @16:54:00.000000, @16:00:00.000000 respectively
  • @2022-12-31T16:54:32.123456 is a timestamp without timezone
  • @2022-12-31T16:54:32.123456Z is a timestamp in UTC
  • @2022-12-31T16:54+02:00 and @2022-12-31T16:54+02 are timestamps in UTC+2
  • @16:54+02 is invalid — time is always local, so it cannot have a timezone
  • @2022-12-31+02 is invalid — date is always local, so it cannot have a timezone

Roadmap

Datetimes

Datetimes are supported by some databases (e.g. MySQL, BigQuery) in addition to timestamps. When we have type annotations, these will be represented by a timestamp annotated as a datetime:

derive pi_day = @2017-03-14T15:09:26.535898<datetime>

These are some examples we can then add:

  • @2022-12-31T16:54<datetime> is a datetime without timezone
  • @2022-12-31<datetime> is forbidden — datetime must specify time
  • @16:54<datetime> is forbidden — datetime must specify date

Distinct

PRQL doesn’t have a specific distinct keyword. Instead, use group and take 1:

PRQL

from employees
select department
group department (
  take 1
)

SQL

SELECT
  DISTINCT department
FROM
  employees

This also works without a linebreak:

PRQL

from employees
select department
group department (take 1)

SQL

SELECT
  DISTINCT department
FROM
  employees

Selecting from each group

We are able to select a single row from each group by combining group and sort:

PRQL

# youngest employee from each department
from employees
group department (
  sort age
  take 1
)

SQL

WITH table_1 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY department
      ORDER BY
        age
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_1 AS table_0
WHERE
  _expr_0 <= 1

Note that we can’t always compile to DISTINCT; when the columns in the group aren’t all the available columns, we need to use a window function:

PRQL

from employees
group [first_name, last_name] (take 1)

SQL

WITH table_1 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY first_name, last_name) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_1 AS table_0
WHERE
  _expr_0 <= 1

Roadmap

When using the Postgres dialect, we are planning to compile:

# youngest employee from each department
from employees
group department (
  sort age
  take 1
)

… to …

SELECT DISTINCT ON (department) *
FROM employees
ORDER BY department, age

Null handling

SQL has an unconventional way of handling NULL values, since it treats them as unknown values. As a result, in SQL:

  • NULL is not a value indicating a missing entry, but a placeholder for anything possible,
  • NULL = NULL evaluates to NULL, since one cannot know if one unknown is equal to another unknown,
  • NULL <> NULL evaluates to NULL, by the same logic,
  • to check if a value is NULL, SQL introduces IS NULL and IS NOT NULL operators,
  • DISTINCT column may return multiple NULL values.

For more information, check out the Postgres documentation.

PRQL, on the other hand, treats null as a value, which means that:

  • null == null evaluates to true,
  • null != null evaluates to false,
  • distinct column cannot contain multiple null values.
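
For readers coming from Python, PRQL’s null behaves much like Python’s None — a concrete value that compares equal to itself. A sketch of the contrast (plain Python, for illustration only):

```python
# PRQL-style semantics: null is a value, so equality is well-defined.
# Python's None behaves the same way.
assert (None == None) is True    # PRQL: null == null  ->  true
assert (None != None) is False   # PRQL: null != null  ->  false

# SQL's three-valued logic differs: NULL = NULL evaluates to NULL
# ("unknown"), which is why SQL needs IS NULL / IS NOT NULL.
```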

PRQL

from employees
filter first_name == null
filter null != last_name

SQL

SELECT
  *
FROM
  employees
WHERE
  first_name IS NULL
  AND last_name IS NOT NULL

Note that PRQL doesn’t change how NULL is compared between columns, for example in joins. (PRQL compiles to SQL and so can’t change the behavior of the database).

For more context or to provide feedback, check out the discussion on issue #99.

Ranges

PRQL has a concise range syntax start..end. If only one of start & end is supplied, the range is open on the empty side.

Ranges can be used in filters with the in function, with any type of literal, including dates:

PRQL

from events
filter (created_at | in @1776-07-04..@1787-09-17)
filter (magnitude | in 50..100)
derive is_northern = (latitude | in 0..)

SQL

SELECT
  *,
  latitude >= 0 AS is_northern
FROM
  events
WHERE
  created_at BETWEEN DATE '1776-07-04' AND DATE '1787-09-17'
  AND magnitude BETWEEN 50 AND 100

Like in SQL, ranges are inclusive.

As discussed in the take docs, ranges can also be used in take:

PRQL

from orders
sort [-value, created_at]
take 101..110

SQL

SELECT
  *
FROM
  orders
ORDER BY
  value DESC,
  created_at
LIMIT
  10 OFFSET 100
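
The arithmetic behind this translation — a 1-based inclusive range into SQL’s LIMIT/OFFSET — can be sketched as follows (illustrative Python; the helper name is ours, not part of PRQL):

```python
def take_range_to_limit_offset(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based inclusive PRQL range `take start..end`
    into SQL's LIMIT / OFFSET pair."""
    limit = end - start + 1   # number of rows kept
    offset = start - 1        # rows skipped before the first kept row
    return limit, offset

# `take 101..110` compiles to LIMIT 10 OFFSET 100
assert take_range_to_limit_offset(101, 110) == (10, 100)
```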

Note

Half-open ranges are generally less intuitive to read than a simple >= or <= operator.

Roadmap

We’d like to use ranges for other types, such as whether an object is in an array or list literal.

Regex expressions

Note

At this time, PRQL does not have a facility for regular expression (regex) processing. The Milestone for 0.5 talks about general plans for regexes. Discussion #1123 provides a workaround function that emulates %LIKE%.

Standard library

The standard library currently contains functions that are commonly used in SQL. It’s not yet as broad as we’d like, and we’re very open to expanding it.

Currently s-strings are an escape hatch for any function that isn’t in our standard library. If you find yourself using them frequently for something, raise an issue and we’ll add it to the stdlib.

Note

Currently the stdlib implementation doesn’t support different DB implementations itself; those need to be built deeper into the compiler. We’ll resolve this at some point. Until then, we’ll only add functions here that are broadly supported by most DBs.

Here’s the source of the current PRQL std:

# The PRQL standard library defines the following functions and transforms.
# The definitions are whitespace insensitive, and have this form:
#
# func my_func <return_type> param1 param2 ...  -> body_expr
# where:
#   * my_func is the name of the function
#   * <return_type> is the type of result wrapped in "< ... >"
#   * param1 is the first parameter optionally followed by a type in "< ... >"
#   * param2 etc. follow the same pattern as param1
#   * -> body_expr defines the function body that creates the result.
#     It can be PRQL code or "null" to indicate internal compiler code

# Aggregate functions
# These return either a scalar when used within `aggregate`, or a column when used anywhere else.
func min <scalar || column> column -> null
func max <scalar || column> column -> null
func sum <scalar || column> column -> null
func avg <scalar || column> column -> null
func stddev <scalar || column> column -> null
func average <scalar || column> column -> null
func count <scalar || column> non_null:s"*" -> null
# TODO: Possibly make this into `count distinct:true` (or like `distinct:` as an
# abbreviation of that?)
func count_distinct <scalar || column> column -> null

# Window functions
func lag<column> offset column -> null
func lead<column> offset column -> null
func first<column> offset column -> null
func last<column> offset column -> null
func rank<column> -> null
func rank_dense<column> -> null
func row_number<column> -> null

# Other functions
func round<scalar> n_digits column -> null
func as<scalar> `noresolve.type` column -> null
func in<bool> pattern value -> null

# Transform type definitions
func from<table> `default_db.source`<table> -> null
func select<table> columns<column> tbl<table> -> null
func filter<table> condition<bool> tbl<table> -> null
func derive<table> columns<column> tbl<table> -> null
func aggregate<table> a<column> tbl<table> -> null
func sort<table> by tbl<table> -> null
func take<table> expr tbl<table> -> null
func join<table> `default_db.with`<table> filter `noresolve.side`:inner tbl<table> -> null
func group<table> by pipeline tbl<table> -> null
func window<table> rows:0..0 range:0..0 expanding:false rolling:0 pipeline tbl<table> -> null

func append<table> `default_db.bottom`<table> top<table> -> null
func intersect<table> `default_db.bottom`<table> top<table> -> (
    from t = _param.top
    join b = _param.bottom (all (map _eq (zip t.* b.*)))
    select t.*
)
func remove<table> `default_db.bottom`<table> top<table> -> (
    from t = _param.top
    join side:left b = _param.bottom (all (map _eq (zip t.* b.*)))
    filter (all (map _is_null b.*))
    select t.*
)
func loop<table> pipeline top<table> -> null

# List functions
func all<bool> list<list> -> null
func map<list> fn list<list> -> null
func zip<list> a<list> b<list> -> null
func _eq<list> a<list> -> null
func _is_null a -> _param.a == null

# Misc
func from_text<table> input<text> `noresolve.format`:csv -> null

# String functions
func lower <text> column -> null
func upper <text> column -> null

# type primitives
type int
type float
type bool
type text
type date
type time
type timestamp
type table
type column
type list
type scalar

# Source-reading functions, primarily for DuckDB
func read_parquet<table> source<text> -> s"SELECT * FROM read_parquet({source})"
func read_csv<table> source<text> -> s"SELECT * FROM read_csv_auto({source})"

And a couple of examples:

PRQL

from employees
derive [
  gross_salary = (salary + payroll_tax | as int),
  gross_salary_rounded = (gross_salary | round 0),
  time = s"NOW()",  # an s-string, given no `now` function exists in PRQL
]

SQL

SELECT
  *,
  CAST(salary + payroll_tax AS int) AS gross_salary,
  ROUND(CAST(salary + payroll_tax AS int), 0) AS gross_salary_rounded,
  NOW() AS time
FROM
  employees

from_text

It’s often useful to make a small table inline — for example, when exploring how a database will evaluate an expression, or for a small lookup table. This can be quite verbose in SQL.

PRQL uses from_text for this.

It accepts a few formats:

  • format:csv parses CSV (default),
  • format:json parses either:
    • an array of objects each of which represents a row, or
    • an object with fields columns & data, where columns take an array of column names and data takes an array of arrays.
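
Both JSON shapes carry the same information; a sketch of normalizing them into the same rows (illustrative Python; the helper is ours, not part of PRQL):

```python
import json

def rows_from_json(text: str) -> list[dict]:
    """Normalize both from_text JSON shapes into a list of row dicts."""
    parsed = json.loads(text)
    if isinstance(parsed, list):   # array-of-objects shape
        return parsed
    # columns-and-data shape
    return [dict(zip(parsed["columns"], row)) for row in parsed["data"]]

a = rows_from_json('{"columns": ["a", "b"], "data": [[1, 2], [3, 4]]}')
b = rows_from_json('[{"a": 1, "b": 2}, {"a": 3, "b": 4}]')
assert a == b == [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
```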

PRQL

from_text """
a,b,c
1,2,3
4,5,6
"""
derive [
    d = b + c,
    answer = 20 * 2 + 2,
]

SQL

WITH table_0 AS (
  SELECT
    '1' AS a,
    '2' AS b,
    '3' AS c
  UNION
  ALL
  SELECT
    '4' AS a,
    '5' AS b,
    '6' AS c
)
SELECT
  a,
  b,
  c,
  b + c AS d,
  42 AS answer
FROM
  table_0 AS table_1

An example of adding a small lookup table:

PRQL

let temp_format_lookup = from_text format:csv """
country_code,format
uk,C
us,F
lr,F
de,C
"""

from temperatures
join temp_format_lookup [==country_code]

SQL

WITH table_0 AS (
  SELECT
    'uk' AS country_code,
    'C' AS format
  UNION
  ALL
  SELECT
    'us' AS country_code,
    'F' AS format
  UNION
  ALL
  SELECT
    'lr' AS country_code,
    'F' AS format
  UNION
  ALL
  SELECT
    'de' AS country_code,
    'C' AS format
),
temp_format_lookup AS (
  SELECT
    country_code,
    format
  FROM
    table_0 AS table_1
)
SELECT
  temperatures.*,
  temp_format_lookup.country_code,
  temp_format_lookup.format
FROM
  temperatures
  JOIN temp_format_lookup ON temperatures.country_code = temp_format_lookup.country_code

And JSON:

PRQL

let x = from_text format:json """{
    "columns": ["a", "b", "c"],
    "data": [
        [1, "x", false],
        [4, "y", null]
    ]
}"""

let y = from_text format:json """
    [
        {"a": 1, "m": "5"},
        {"a": 4, "n": "6"}
    ]
"""

from x | join y [==a]

SQL

WITH table_0 AS (
  SELECT
    1 AS a,
    'x' AS b,
    false AS c
  UNION
  ALL
  SELECT
    4 AS a,
    'y' AS b,
    NULL AS c
),
x AS (
  SELECT
    a,
    b,
    c
  FROM
    table_0 AS table_1
),
table_2 AS (
  SELECT
    1 AS a,
    '5' AS m
  UNION
  ALL
  SELECT
    4 AS a,
    NULL AS m
),
y AS (
  SELECT
    a,
    m
  FROM
    table_2 AS table_3
)
SELECT
  x.a,
  x.b,
  x.c,
  y.a,
  y.m
FROM
  x
  JOIN y ON x.a = y.a

Loop

Experimental

loop {step_function} {initial_relation}

Iteratively applies the step function to the initial relation until the step returns an empty table. Returns a relation that contains the rows of the initial relation and of all intermediate relations.

This behavior could be expressed with the following pseudo-code:

def loop(step, initial):
    result = []
    current = initial
    while current is not empty:
        result = append(result, current)
        current = step(current)

    return result

Examples

PRQL

from_text format:json '[{"n": 1 }]'
loop (
    filter n<4
    select n = n+1
)

# returns [1, 2, 3, 4]

SQL

WITH table_0 AS (
  SELECT
    1 AS n
),
table_4 AS (
  WITH RECURSIVE _loop AS (
    SELECT
      n
    FROM
      table_0 AS table_1
    UNION
    ALL
    SELECT
      n + 1
    FROM
      _loop AS table_2
    WHERE
      n < 4
  )
  SELECT
    *
  FROM
    _loop
)
SELECT
  n
FROM
  table_4 AS table_3
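
The example can be simulated in plain Python, treating the relation as a list of rows (the loop helper below is our illustration of the semantics, not PRQL’s implementation):

```python
def loop(step, initial):
    """Collect the initial rows plus every intermediate relation,
    applying `step` until it returns an empty relation."""
    result = []
    current = initial
    while current:
        result += current
        current = step(current)
    return result

# Mirrors: loop (filter n<4 | select n = n+1) over [{"n": 1}]
step = lambda rows: [{"n": r["n"] + 1} for r in rows if r["n"] < 4]
out = loop(step, [{"n": 1}])
assert [r["n"] for r in out] == [1, 2, 3, 4]
```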

Note

The behavior of WITH RECURSIVE may depend on the database configuration in MySQL. The compiler assumes the behavior described by the Postgres documentation and will not produce correct results for alternative configurations of MySQL.

Note

Currently, loop may produce references to the recursive CTE in sub-queries, which is not supported by some database engines, e.g. SQLite. For now, we suggest step functions are kept simple enough to fit into a single SELECT statement.

Reading files

We have a couple of functions named read_*, which ask the DB to read files, designed for DuckDB:

PRQL

from (read_parquet 'artists.parquet')
join (read_csv 'albums.csv') [==track_id]

SQL

WITH table_0 AS (
  SELECT
    *
  FROM
    read_parquet('artists.parquet')
),
table_1 AS (
  SELECT
    *
  FROM
    read_csv_auto('albums.csv')
)
SELECT
  table_2.*,
  table_3.*
FROM
  table_0 AS table_2
  JOIN table_1 AS table_3 ON table_2.track_id = table_3.track_id

Note

These don’t currently have all the DuckDB options. If those would be helpful, please log an issue and it’s a fairly easy addition.

Info

We may be able to reduce the boilerplate WITH table_x AS SELECT * FROM... in future versions.

Strings

Strings in PRQL can use either single or double quotes:

PRQL

from my_table
select x = "hello world"

SQL

SELECT
  'hello world' AS x
FROM
  my_table

PRQL

from my_table
select x = 'hello world'

SQL

SELECT
  'hello world' AS x
FROM
  my_table

To quote a string containing quotes, either use the “other” type of quote, or use 3, 4, 5 or 6 quotes, and close with the same number.

PRQL

from my_table
select x = '"hello world"'

SQL

SELECT
  '"hello world"' AS x
FROM
  my_table

PRQL

from my_table
select x = """I said "hello world"!"""

SQL

SELECT
  'I said "hello world"!' AS x
FROM
  my_table

PRQL

from my_table
select x = """""I said """hello world"""!"""""

SQL

SELECT
  'I said """hello world"""!' AS x
FROM
  my_table

Strings can also contain any escape defined by the JSON standard.

PRQL

from my_table
select x = "\t\tline ends here\n \\ "

SQL

SELECT
  '		line ends here
 \ ' AS x
FROM
  my_table

F-Strings and S-Strings

These special-case strings can be used to:

  • F-strings — build up a new string from a set of columns or values.
  • S-strings — insert SQL statements directly into the query, for when PRQL doesn’t have an equivalent facility.

Warning

Currently PRQL allows multiline strings with either single-character or multi-character quotes. This may change for strings using a single-character quote in future versions.

S-strings

An s-string inserts SQL directly, as an escape hatch when there’s something that PRQL doesn’t yet implement. For example, there’s no version() function in PRQL that returns the Postgres version, so if we want to use that, we use an s-string:

PRQL

from my_table
select db_version = s"version()"

SQL

SELECT
  version() AS db_version
FROM
  my_table

Embed a column name in an s-string using braces. For example, PRQL’s standard library defines the average function as:

func average column -> s"AVG({column})"

So this compiles using the function:

PRQL

from employees
aggregate [average salary]

SQL

SELECT
  AVG(salary)
FROM
  employees

Here’s an example of a more involved use of an s-string:

PRQL

from de=dept_emp
join s=salaries side:left [
  (s.emp_no == de.emp_no),
  s"""({s.from_date}, {s.to_date})
  OVERLAPS
  ({de.from_date}, {de.to_date})"""
]

SQL

SELECT
  de.*,
  s.*
FROM
  dept_emp AS de
  LEFT JOIN salaries AS s ON s.emp_no = de.emp_no
  AND (s.from_date, s.to_date) OVERLAPS (de.from_date, de.to_date)

For those who have used Python, s-strings are similar to Python’s f-strings, but the result is SQL code, rather than a string literal. For example, a Python f-string of f"average({col})" would produce "average(salary)", with quotes; while in PRQL, s"average({col})" produces average(salary), without quotes.

Note that interpolations can only contain plain variable names, not whole expressions as in Python.
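
The Python side of that comparison, for contrast (plain Python, not PRQL):

```python
col = "salary"

# A Python f-string yields a string *value*; embedded in SQL it would
# appear as a quoted literal, not as an expression to evaluate.
py_result = f"average({col})"
assert py_result == "average(salary)"

# PRQL's s"average({col})" instead splices the bare text average(salary)
# into the generated SQL -- code, not a string literal.
```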

We can also use s-strings to produce a full table:

PRQL

from s"SELECT DISTINCT ON first_name, id, age FROM employees ORDER BY age ASC"
join s = s"SELECT * FROM salaries" [==id]

SQL

WITH table_0 AS (
  SELECT
    DISTINCT ON first_name,
    id,
    age
  FROM
    employees
  ORDER BY
    age ASC
),
table_1 AS (
  SELECT
    *
  FROM
    salaries
)
SELECT
  table_2.*,
  table_3.*
FROM
  table_0 AS table_2
  JOIN table_1 AS table_3 ON table_2.id = table_3.id

Note

S-strings in user code are intended as an escape-hatch for an unimplemented feature. If we often need s-strings to express something, that’s a sign we should implement it in PRQL or PRQL’s stdlib.

Braces

To output braces from an s-string, use double braces:

PRQL

from employees
derive [
  has_valid_title = s"regexp_contains(title, '([a-z0-9]*-){{2,}}')"
]

SQL

SELECT
  *,
  regexp_contains(title, '([a-z0-9]*-){2,}') AS has_valid_title
FROM
  employees

Precedence

The PRQL compiler simply places a literal copy of each variable into the resulting string, which means we may get surprising behavior when the variable has multiple terms and the s-string isn’t parenthesized.

In this toy example, the resulting salary + benefits / 365 gets the precedence wrong:

PRQL

from employees
derive [
  gross_salary = salary + benefits,
  daily_rate = s"{gross_salary} / 365"
]

SQL

SELECT
  *,
  salary + benefits AS gross_salary,
  salary + benefits / 365 AS daily_rate
FROM
  employees

Instead, put the interpolated {gross_salary} in parentheses:

PRQL

from employees
derive [
  gross_salary = salary + benefits,
  daily_rate = s"({gross_salary}) / 365"
]

SQL

SELECT
  *,
  salary + benefits AS gross_salary,
  (salary + benefits) / 365 AS daily_rate
FROM
  employees

F-strings

F-strings are a readable approach to building new strings from existing strings. Currently PRQL supports this for concatenating strings:

PRQL

from employees
select full_name = f"{first_name} {last_name}"

SQL

SELECT
  CONCAT(first_name, ' ', last_name) AS full_name
FROM
  employees

This can be much easier to read for longer strings, relative to the SQL approach:

PRQL

from web
select url = f"http{tls}://www.{domain}.{tld}/{page}"

SQL

SELECT
  CONCAT(
    'http',
    tls,
    '://www.',
    domain,
    '.',
    tld,
    '/',
    page
  ) AS url
FROM
  web

Note that interpolations can only contain plain variable names, not whole expressions as in Python.

Roadmap

In the future, f-strings may incorporate string formatting such as datetimes, numbers, and padding. If there’s a feature that would be helpful, please post an issue.

Target & Version

Target dialect

PRQL allows specifying a target dialect at the top of the query, which allows PRQL to compile to a database-specific SQL flavor.

Examples

PRQL

prql target:sql.postgres

from employees
sort age
take 10

SQL

SELECT
  *
FROM
  employees
ORDER BY
  age
LIMIT
  10

PRQL

prql target:sql.mssql

from employees
sort age
take 10

SQL

SELECT
  TOP (10) *
FROM
  employees
ORDER BY
  age

Supported dialects

Note

Note that dialect support is early — most differences are not implemented, and most dialects’ implementations are identical to generic’s. Contributions are very welcome.

  • sql.ansi
  • sql.bigquery
  • sql.clickhouse
  • sql.generic
  • sql.hive
  • sql.mssql
  • sql.mysql
  • sql.postgres
  • sql.sqlite
  • sql.snowflake
  • sql.duckdb

Version

PRQL allows specifying a version of the language in the PRQL header, like:

PRQL

prql version:"0.8.1"

from employees

SQL

SELECT
  *
FROM
  employees

This has two roles, one of which is implemented:

  • The compiler will raise an error if the compiler is older than the query version. This prevents confusing errors when queries use newer features of the language but the compiler hasn’t yet been upgraded.
  • The compiler will compile for the major version of the query. This allows the language to evolve without breaking existing queries, or forcing multiple installations of the compiler. This isn’t yet implemented, but is a gating feature for PRQL 1.0.

Bindings

PRQL has bindings for many languages.

We have three tiers of bindings:

  • Supported
  • Unsupported
  • Nascent

Supported

Supported bindings require:

  • A maintainer.
  • Implementations of the core compile functions.
  • Test coverage for these functions.
  • A published package to the language’s standard package repository.
  • A script in Taskfile.yml to bootstrap a development environment.
  • Any dev tools, such as a linter & formatter, in pre-commit or MegaLinter.

The currently supported bindings are:

Most of these are in the main PRQL repo, and we gate any changes to the compiler’s API on compatible changes to the bindings.

Unsupported

Unsupported bindings work, but don’t fulfil all of the above criteria. We don’t gate changes to the compiler’s API. If they stop working, we’ll demote them to nascent.

Nascent

Nascent bindings are in development, and may not yet fully work.

prql-dotnet

prql-dotnet offers PRQL bindings for .NET as a netstandard2.0 library.

It provides the PrqlCompiler class which contains the ToJson and ToSql static methods.

It’s still at an early stage, and isn’t published to NuGet. Contributions are welcome.

Installation

Make sure that libprql_lib.so (Linux), libprql_lib.dylib (macOS) or libprql_lib.dll (Windows) is in your project’s bin directory together with PrqlCompiler.dll and the rest of your project’s compiled files, e.g. {your_project}/bin/Debug/net7.0/.

The libprql_lib library gets dynamically imported at runtime.

Usage

using Prql.Compiler;

var options = new PrqlCompilerOptions
{
    Format = false,
    SignatureComment = false,
};
var sql = PrqlCompiler.Compile("from employees", options);
Console.WriteLine(sql);

TODO

This is currently at 0.1.0 because we’re waiting to update prql-lib for the latest API. When we’ve done that, we can match the version here with the broader PRQL version.

PRQL

PRQL bindings for Elixir.

Installation

def deps do
  [
    {:prql, "~> 0.1.0"}
  ]
end

Basic Usage

iex> PRQL.compile("from customers")
{:ok, "SELECT\n  *\nFROM\n  customers\n\n-- Generated by PRQL compiler version 0.3.1 (https://prql-lang.org)\n"}

iex> PRQL.compile("from customers\ntake 10", dialect: :mssql)
{:ok, "SELECT\n  TOP (10) *\nFROM\n  customers\n\n-- Generated by PRQL compiler version 0.3.1 (https://prql-lang.org)\n"}

Development

We are in the early stages of developing Elixir bindings.

We’re using Rustler to provide Rust bindings for prql-compiler.

Currently using the bindings in an Elixir project requires compiling the Rust crate from this repo:

  • Install dependencies with mix deps.get
  • Compile the project with mix compile
  • Run tests with mix test

Future work includes publishing pre-compiled artifacts, so Elixir projects can run PRQL without needing a Rust toolchain.

Mac

We currently don’t enable compilation for Mac. This is possible to enable, but causes some issues with cargo’s compilation cache. Briefly: it requires RUSTFLAGS to be set, and because of https://github.com/rust-lang/cargo/issues/8716 & https://github.com/rust-lang/cargo/issues/8899, any compilation of a different target will bust the cache.

The possible future workarounds include:

  • Passing --target=aarch64-apple-darwin to every cargo call, which is inconvenient and can be difficult in some situations; e.g. Rust Analyzer. This disables passing RUSTFLAGS (I’m actually unclear why prql-elixir builds successfully in that case…)
  • Directing other cargo calls to different paths, such as /target-ra for Rust Analyzer and /target-book for the book building. But one cargo build from the terminal without either the target or target_dir specified will bust the cache!
  • Never compiling for other targets. But our standard tests run for --target=wasm32-unknown-unknown, so this requires refraining from using them.
  • Removing prql-elixir from our workspace, so that cargo commands in the PRQL workspace don’t require rust flags. This would work well, but means we need separate test coverage for this crate, which adds some weight to the tests.

If prql-elixir becomes more used (for example, we start publishing to Hex, or Mac developers want to work on it), then we can re-enable and deal with the caching issues. We can also re-enable them if the cargo issue is resolved.

To test on Mac temporarily — for example if there’s an error in GHA and we’re on a Mac locally — apply a diff like this, and then run cargo build from the prql-elixir path, which will enable the local .cargo/config.toml. (We could also make a feature like elixir-mac which enables building on Mac.)

diff --git a/bindings/prql-elixir/native/prql/Cargo.toml b/bindings/prql-elixir/native/prql/Cargo.toml
index 7194ca4f..9c7240ff 100644
--- a/bindings/prql-elixir/native/prql/Cargo.toml
+++ b/bindings/prql-elixir/native/prql/Cargo.toml
@@ -19,5 +19,5 @@ path = "src/lib.rs"
 prql-compiler = {path = "../../../../prql-compiler", default-features = false, version = "0.6.1"}

 # See Readme for details on Mac
-[target.'cfg(not(any(target_family="wasm", target_os = "macos")))'.dependencies]
+# [target.'cfg(not(any(target_family="wasm", target_os = "macos")))'.dependencies]
 rustler = "0.27.0"
diff --git a/bindings/prql-elixir/native/prql/src/lib.rs b/bindings/prql-elixir/native/prql/src/lib.rs
index 2c5c8f27..68e77217 100644
--- a/bindings/prql-elixir/native/prql/src/lib.rs
+++ b/bindings/prql-elixir/native/prql/src/lib.rs
@@ -1,5 +1,5 @@
 // See Readme for more information on Mac compiling
-#![cfg(not(target_os = "macos"))]
+// #![cfg(not(target_os = "macos"))]
 // These bindings aren't relevant on wasm
 #![cfg(not(target_family = "wasm"))]
 // TODO: unclear why we need this `allow`; it's required in `CompileOptions`,

prql-java

prql-java offers Java bindings to the prql-compiler Rust library. It exposes a Java native method public static native String toSql(String query).

It’s still at an early stage, and currently requires compiling locally, and isn’t published to Maven. Contributions are welcome.

Installation

<dependency>
    <groupId>org.prqllang</groupId>
    <artifactId>prql-java</artifactId>
    <version>${PRQL_VERSION}</version>
</dependency>

Usage

import org.prqllang.prql4j.PrqlCompiler;

class Main {
    public static void main(String[] args) {
        String sql = PrqlCompiler.toSql("from table");
        System.out.println(sql);
    }
}

prql-js

JavaScript bindings for prql-compiler.

Installation

npm install prql-js

Usage

Currently these functions are exposed:

function compile(prql_query: string, options?: CompileOptions): string;

function prql_to_pl(prql_query: string): string;

function pl_to_rq(pl_json: string): string;

function rq_to_sql(rq_json: string): string;

From Node.js

Direct usage

const prqljs = require("prql-js");

const sql = prqljs.compile(`from employees | select first_name`);
console.log(sql);

Options

const opts = new prql.CompileOptions();
opts.target = "sql.mssql";
opts.format = false;
opts.signature_comment = false;

const sql = prqljs.compile(`from employees | take 10`, opts);
console.log(sql);

Template literal

const prqljs = require("prql-js");
const prql = (string) => prqljs.compile(string[0] || "");

const sql = prql`from employees | select first_name`;
console.log(sql);

Template literal with newlines

const prqljs = require("prql-js");
const prql = (string) => prqljs.compile(string[0] || "");

const sql = prql`
    from employees
    select first_name
`;
console.log(sql);

From a browser

<html>
  <head>
    <script src="./node_modules/prql-js/dist/web/prql_js.js"></script>
    <script>
      const { compile } = wasm_bindgen;

      async function run() {
        await wasm_bindgen("./node_modules/prql-js/dist/web/prql_js_bg.wasm");
        const sql = compile("from employees | select first_name");

        console.log(sql);
      }

      run();
    </script>
  </head>

  <body></body>
</html>

From a framework or a bundler

import compile from "prql-js/dist/bundler";

const sql = compile(`from employees | select first_name`);
console.log(sql);

Errors

Errors are returned as the following object, serialized as a JSON array:

interface ErrorMessage {
  /// Plain text of the error
  reason: string;
  /// A hint on how to fix the error
  hint: string | null;
  /// Character offset of error origin within a source file
  span: [number, number] | null;

  /// Annotated code, containing cause and hints.
  display: string | null;
  /// Line and column number of error origin within a source file
  location: SourceLocation | null;
}

/// Location within the source file.
/// Tuples contain:
/// - line number (0-based),
/// - column number within that line (0-based),
interface SourceLocation {
  start: [number, number];

  end: [number, number];
}
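To make the relationship between span (character offsets) and location (0-based line/column pairs) concrete, here is a small Python sketch, independent of prql-js, that converts an offset into the line/column form used by SourceLocation. The query and span values are illustrative only.

```python
# A sketch (not part of prql-js) of how a character-offset span maps to the
# 0-based line/column pairs used by SourceLocation.
def offset_to_line_col(source: str, offset: int) -> tuple[int, int]:
    """Convert a character offset into a (line, column) pair, both 0-based."""
    line = source.count("\n", 0, offset)          # newlines before the offset
    last_newline = source.rfind("\n", 0, offset)  # -1 when on the first line
    col = offset - (last_newline + 1)
    return (line, col)

query = "from employees\nselect first_name"
span = (15, 21)  # hypothetical span covering the `select` keyword
start = offset_to_line_col(query, span[0])
end = offset_to_line_col(query, span[1])
print(start, end)  # (1, 0) (1, 6)
```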

These errors can be caught as follows:

try {
  const sql = prqlJs.compile(`from employees | foo first_name`);
} catch (error) {
  const errorMessages = JSON.parse(error.message).inner;

  console.log(errorMessages[0].display);
  console.log(errorMessages[0].location);
}

Development

Build:

npm run build

This builds Node, bundler and web packages in the dist path.

Test:

npm test

Notes

  • This uses wasm-pack to generate bindings1.
  • We’ve added an npm layer on top of the usual approach of just using wasm-pack, so we can distribute a single package with targets of node, bundler and no-modules — somewhat inverting the approach recommended by wasm-pack. The build instruction goes in a build script, rather than a pack script.
1
Though we would be very open to other approaches, given wasm-pack does not
seem maintained, and we're eliding many of its features to build for three
targets. See <https://github.com/PRQL/prql/issues/1836> for more details.

prql-python

prql-python offers Python bindings to the prql-compiler Rust library. It exposes a Python method compile(query: str) -> str.

This is consumed by PyPrql & dbt-prql.

The crate is not published to crates.io; only to PyPI at https://pypi.org/project/prql-python/.

Installation

pip install prql-python

Usage

import prql_python as prql

prql_query = """
    from employees
    join salaries [==emp_id]
    group [employees.dept_id, employees.gender] (
      aggregate [
        avg_salary = average salaries.salary
      ]
    )
"""

options = prql.CompileOptions(
    format=True, signature_comment=True, target="sql.postgres"
)

sql = prql.compile(prql_query)
sql_postgres = prql.compile(prql_query, options)

Relies on pyo3 for all the magic.

R (prqlr)

R bindings for prql-compiler.

prqlr also includes knitr (R Markdown and Quarto) integration, which allows us to easily create documents with PRQL compilation results embedded in them.

Check out https://eitsupi.github.io/prqlr/ for more context.

Note

prqlr is generously maintained by @eitsupi in the eitsupi/prqlr repo.

Installation

install.packages("prqlr")

Usage

library(prqlr)

"
from employees
join salaries [==emp_id]
group [dept_id, gender] (
  aggregate [
    avg_salary = average salary
  ]
)
" |>
  prql_compile()

PRQL compiler

prql-compiler contains the implementation of PRQL’s compiler, written in Rust.

For more on PRQL, check out the PRQL website or the PRQL repo.

For more usage examples and the library documentation, check out the prql-compiler documentation.

Installation

cargo add prql-compiler

Examples

Compile a PRQL string to SQL for the SQLite dialect.

src/main.rs

use prql_compiler::{compile, Options, Target, sql::Dialect};

fn main() {
    let prql = "from employees | select [name, age]";
    let opts = Options {
        format: false,
        target: Target::Sql(Some(Dialect::SQLite)),
        signature_comment: false,
        color: false,
    };
    let sql = compile(prql, &opts).unwrap();
    assert_eq!("SELECT name, age FROM employees", sql);
}

Terminology

Relation: the standard definition of a relation in the context of databases:

  • An ordered set of tuples of the form (d_0, d_1, d_2, ...).
  • The set of all values d_x is called an attribute or a column. It has a name and a type domain D_x.

Frame: a descriptor of a relation. It contains the list of columns (with names and types), but no data.

Table: a persistently stored relation. Some uses of this term actually mean “relation”.
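As a toy illustration of these terms (hypothetical types, not the compiler's internal representation), a Frame can be modeled as column descriptors without data, and a Relation as a frame paired with its tuples:

```python
# Illustrative only — NOT prql-compiler's actual types.
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    type_domain: str  # the type domain D_x, e.g. "int" or "text"

@dataclass
class Frame:
    columns: list[Column]  # names and types only; no data

@dataclass
class Relation:
    frame: Frame
    tuples: list[tuple]  # ordered rows of the form (d_0, d_1, ...)

frame = Frame([Column("name", "text"), Column("age", "int")])
employees = Relation(frame, [("ada", 36), ("grace", 45)])
print(len(employees.tuples))  # 2
```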

Integrations

PRQL is building integrations with lots of external tools, including:

dbt-prql

Original docs at https://github.com/prql/dbt-prql

Note

As of Feb 2023, we’re again considering how to best integrate with dbt more closely. Ideally a file with a .prql extension will just work™.

If you’re interested in this, subscribe or 👍 to https://github.com/dbt-labs/dbt-core/pull/5982.

The original plugin is hosted at https://github.com/prql/dbt-prql, but only works with a minority of dialects, and isn’t a focus of development at the moment.

dbt-prql allows writing PRQL in dbt models. This combines the benefits of PRQL’s power & simplicity within queries, with dbt’s version control, lineage & testing across queries.

Once dbt-prql is installed, dbt commands compile PRQL between {% prql %} & {% endprql %} Jinja tags to SQL as part of dbt’s compilation. No additional config is required.

Examples

Simple example

{% prql %}
from employees
filter (age | in 20..30)
{% endprql %}

…would appear to dbt as:

SELECT
  employees.*
FROM
  employees
WHERE
  age BETWEEN 20
  AND 30

Less simple example

{% prql %}
from in_process = {{ source('salesforce', 'in_process') }}
derive expected_sales = probability * value
join {{ ref('team', 'team_sales') }} [name]
group name (
  aggregate (sum expected_sales)
)
{% endprql %}

…would appear to dbt as:

SELECT
  name,
  sum(in_process.probability * in_process.value) AS expected_sales
FROM
  {{ source('salesforce', 'in_process') }} AS in_process
  JOIN {{ ref('team', 'team_sales') }} USING(name)
GROUP BY
  name

…and then dbt will compile the source and refs to a full SQL query.

Replacing macros

dbt’s use of macros has saved many of us many lines of code, and even saved some people some time. But imperatively programming text generation with code like if not loop.last is not our highest calling. It’s the “necessary” part rather than the beautiful part of dbt.

Here’s the canonical example of macros in the dbt documentation:

{%- set payment_methods = ["bank_transfer", "credit_card", "gift_card"] -%}

select
order_id,
{%- for payment_method in payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount
{%- if not loop.last %},{% endif -%}
{% endfor %}
from {{ ref('raw_payments') }}
group by 1

Here’s that model using PRQL1, including the prql Jinja tags.

{% prql %}
func filter_amount method -> s"sum(case when payment_method = '{method}' then amount end) as {method}_amount"

from {{ ref('raw_payments') }}
group order_id (
  aggregate [
    filter_amount bank_transfer,
    filter_amount credit_card,
    filter_amount gift_card,
  ]
)
{% endprql %}

As well as the query being simpler in its final form, writing in PRQL also gives us live feedback around any errors, on every keystroke. Though there’s much more to come, check out the current version on PRQL Playground.

1
Note that when <https://github.com/prql/prql/issues/82> is implemented, we
can dispense with the s-string, and optionally dispense with the function.

```elm
from {{ ref('raw_payments') }}
group order_id (
 aggregate [
   bank_transfer_amount = amount | filter payment_method == 'bank'        | sum,
   credit_card_amount = amount   | filter payment_method == 'credit_card' | sum,
   gift_amount = amount          | filter payment_method == 'gift_card'   | sum,
 ]
)
```

or

```elm
func filter_amount method -> amount | filter payment_method == method | sum

from {{ ref('raw_payments') }}
group order_id (
 aggregate [
   bank_transfer_amount = filter_amount 'bank'
   credit_card_amount   = filter_amount 'credit_card'
   gift_amount          = filter_amount 'gift_card'
 ]
)
```

What it does

When dbt compiles models to SQL queries:

  • Any text in a dbt model between {% prql %} and {% endprql %} tags is compiled from PRQL to SQL before being passed to dbt.
  • The PRQL compiler passes text containing {{ & }} through to dbt without modification, which allows us to embed Jinja expressions in PRQL. (This was added to PRQL specifically for this use case.)
  • dbt will then compile the resulting model into its final form of raw SQL, and dispatch it to the database, as per usual.

There’s no config needed in the dbt project; this works automatically on any dbt command (e.g. dbt run) assuming dbt-prql is installed.
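The tag-extraction step can be pictured with a toy sketch. This is not the real dbt-prql implementation, and fake_compile is a hypothetical stand-in for the PRQL compiler; the point is only that text between the prql tags gets compiled, while {{ ... }} Jinja expressions pass through untouched for dbt to resolve afterwards.

```python
import re

# Toy sketch of the mechanism, NOT dbt-prql's actual code.
PRQL_BLOCK = re.compile(r"{%\s*prql\s*%}(.*?){%\s*endprql\s*%}", re.DOTALL)

def fake_compile(prql: str) -> str:
    # Hypothetical stand-in for the PRQL compiler; any {{ ... }} Jinja
    # expressions inside the query are left untouched for dbt.
    return f"/* compiled from: {prql.strip()} */"

def compile_model(model_source: str) -> str:
    # Replace each prql block with its "compiled" SQL.
    return PRQL_BLOCK.sub(lambda m: fake_compile(m.group(1)), model_source)

model = "{% prql %}\nfrom {{ ref('raw_payments') }}\ntake 10\n{% endprql %}"
print(compile_model(model))
```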

Installation

pip install dbt-prql

Current state

Currently this is new, but fairly feature-complete. It’s enthusiastically supported — if there are any problems, please open an issue.

How does it work?

It’s some dark magic, unfortunately.

dbt doesn’t allow adding behavior beyond the database adapters (e.g. dbt-bigquery) or Jinja-only plugins (e.g. dbt-utils). So this library hacks the Python import system to monkeypatch dbt’s Jinja environment with an additional Jinja extension on Python’s startup2.

2
Thanks to
[mtkennerly/poetry-dynamic-versioning](https://github.com/mtkennerly/poetry-dynamic-versioning)
for the technique.

This approach was discussed with the dbt team here and here.

This isn’t stable between dbt versions, since it relies on internal dbt APIs. The technique is also normatively bad — it runs a few lines of code every time the Python interpreter starts — and errors there could lead to very confusing bugs beyond the domain of the problem (though in the case of this library, it’s small and well-constructed™).

If there’s ever any concern that the library might be causing a problem, just set an environment variable DBT_PRQL_DISABLE=1, and this library won’t monkeypatch anything. It’s also fully uninstallable with pip uninstall dbt-prql.
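The kill switch can be pictured as a simple environment-variable guard; this is an illustrative sketch, not dbt-prql’s actual code:

```python
import os

# Minimal sketch of an environment-variable kill switch in the style of
# DBT_PRQL_DISABLE (illustrative only, not dbt-prql's implementation).
def should_monkeypatch(environ=os.environ) -> bool:
    """Return False when the user has set DBT_PRQL_DISABLE=1."""
    return environ.get("DBT_PRQL_DISABLE") != "1"

print(should_monkeypatch({"DBT_PRQL_DISABLE": "1"}))  # False
```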

Roadmap

Open to ideas; at the moment it’s fairly feature-complete. If we were unconstrained in dbt functionality:

  • If dbt allowed for external plugins, we’d enthusiastically move to that.
  • We’d love to have this work on .prql files without the {% prql %} tags; but with the current approach that would require quite invasive monkeypatching.
  • If we could add the dialect in automatically (i.e. prql dialect:snowflake), that would save a line per model.
  • If we could upstream this into dbt-core, that would be awesome. It may be on PRQL to demonstrate its staying power before that, though.

We may move this library to the https://github.com/prql/PyPrql or https://github.com/prql/prql repos. We’d prefer to keep it as its own package given the hackery above, but there’s no need for it to be its own repo.

Jupyter

pyprql contains pyprql.magic, a thin wrapper around JupySQL’s SQL IPython magics. This allows us to run PRQL interactively in Jupyter/IPython.

Check out https://pyprql.readthedocs.io/ for more context.

Installation

pip install pyprql

Usage

When installing pyprql, the duckdb-engine package is installed alongside it, so we can start using PRQL immediately to query CSV and Parquet files.

For example, running the example from the JupySQL documentation on IPython:

In [1]: %load_ext pyprql.magic

In [2]: !curl -sL https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv -o penguins.csv

In [3]: %prql duckdb://

In [4]: %prql from `penguins.csv` | take 3
Out[4]:
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1           18.7                181         3750    MALE
1  Adelie  Torgersen            39.5           17.4                186         3800  FEMALE
2  Adelie  Torgersen            40.3           18.0                195         3250  FEMALE

In [5]: %%prql
   ...: from `penguins.csv`
   ...: filter bill_length_mm > 40
   ...: take 3
   ...:
   ...:
Out[5]:
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            40.3           18.0                195         3250  FEMALE
1  Adelie  Torgersen            42.0           20.2                190         4250    None
2  Adelie  Torgersen            41.1           17.6                182         3200  FEMALE

DuckDB

There’s an experimental DuckDB extension from @ywelsch at ywelsch/duckdb-prql.

Check out the Readme for more details.

Prefect

Because Prefect is native Python, it’s extremely easy to integrate with PRQL.

With a Postgres Task, replace:

PostgresExecute.run(..., query=sql)

…with…

PostgresExecute.run(..., query=prql_python.compile(prql))

We’re big fans of Prefect, and if there is anything that would make the integration easier, please open an issue.

Visual Studio Code extension

PRQL has a Visual Studio Code extension that compiles a PRQL query in a VS Code editor and displays the resulting SQL code in a second pane on the side. This is very handy for editing, saving, and reusing PRQL queries in VS Code.

To install the VS Code extension, open VS Code, type Ctrl-Shift-P (Cmd-Shift-P on a Mac), and type PRQL. Install the extension as usual.

Repo for the PRQL VS Code extension

Extension on VS Marketplace

Rill

PRQL has had some work to integrate with Rill. See the Rill issues for more details.

Examples

These examples are rewritten from other languages such as SQL. They try to express real-world problems in PRQL, covering most of the language features. We are looking for different use-cases of data transformation, be it database queries, semantic business modeling or data cleaning.

If you want to help, translate some of your queries to PRQL and open a PR to add them here!

PRQL

from employees
filter country == "USA"                       # Each line transforms the previous result.
derive [                                     # This adds columns / variables.
  gross_salary = salary + payroll_tax,
  gross_cost = gross_salary + benefits_cost  # Variables can use other variables.
]
filter gross_cost > 0
group [title, country] (                     # For each group use a nested pipeline
  aggregate [                                # Aggregate each group to a single row
    average salary,
    average gross_salary,
    sum salary,
    sum gross_salary,
    average gross_cost,
    sum_gross_cost = sum gross_cost,
    ct = count,
  ]
)
sort sum_gross_cost
filter ct > 200
take 20

SQL

WITH table_1 AS (
  SELECT
    title,
    country,
    salary + payroll_tax + benefits_cost AS _expr_0,
    salary + payroll_tax AS _expr_1,
    salary
  FROM
    employees
  WHERE
    country = 'USA'
)
SELECT
  title,
  country,
  AVG(salary),
  AVG(_expr_1),
  SUM(salary),
  SUM(_expr_1),
  AVG(_expr_0),
  SUM(_expr_0) AS sum_gross_cost,
  COUNT(*) AS ct
FROM
  table_1 AS table_0
WHERE
  _expr_0 > 0
GROUP BY
  title,
  country
HAVING
  COUNT(*) > 200
ORDER BY
  sum_gross_cost
LIMIT
  20

PRQL

from employees
group [emp_no] (
  aggregate [
    emp_salary = average salary     # average salary resolves to "AVG(salary)" (from stdlib)
  ]
)
join titles [==emp_no]
group [title] (
  aggregate [
    avg_salary = average emp_salary
  ]
)
select salary_k = avg_salary / 1000 # avg_salary should resolve to "AVG(emp_salary)"
take 10                             # induces new SELECT
derive salary = salary_k * 1000     # salary_k should not resolve to "avg_salary / 1000"

SQL

WITH table_3 AS (
  SELECT
    AVG(salary) AS _expr_1,
    emp_no
  FROM
    employees
  GROUP BY
    emp_no
),
table_1 AS (
  SELECT
    AVG(table_2._expr_1) AS _expr_0
  FROM
    table_3 AS table_2
    JOIN titles ON table_2.emp_no = titles.emp_no
  GROUP BY
    titles.title
)
SELECT
  _expr_0 / 1000 AS salary_k,
  _expr_0 / 1000 * 1000 AS salary
FROM
  table_1 AS table_0
LIMIT
  10

A single item is coerced into a list

PRQL

from employees
select salary

SQL

SELECT
  salary
FROM
  employees

Same as above but with salary in a list:

PRQL

from employees
select [salary]

SQL

SELECT
  salary
FROM
  employees

Multiple items

PRQL

from employees
derive [
  gross_salary = salary + payroll_tax,
  gross_cost = gross_salary + benefits_cost
]

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary,
  salary + payroll_tax + benefits_cost AS gross_cost
FROM
  employees

Same as above but split into two lines:

PRQL

from employees
derive gross_salary = salary + payroll_tax
derive gross_cost = gross_salary + benefits_cost

SQL

SELECT
  *,
  salary + payroll_tax AS gross_salary,
  salary + payroll_tax + benefits_cost AS gross_cost
FROM
  employees

PRQL

let newest_employees = (
  from employees
  sort tenure
  take 50
)

let average_salaries = (
  from salaries
  group country (
    aggregate average_country_salary = (average salary)
  )
)

from newest_employees
join average_salaries [==country]
select [name, salary, average_country_salary]

SQL

WITH newest_employees AS (
  SELECT
    *
  FROM
    employees
  ORDER BY
    tenure
  LIMIT
    50
), average_salaries AS (
  SELECT
    country,
    AVG(salary) AS average_country_salary
  FROM
    salaries
  GROUP BY
    country
)
SELECT
  newest_employees.name,
  newest_employees.salary,
  average_salaries.average_country_salary
FROM
  newest_employees
  JOIN average_salaries ON newest_employees.country = average_salaries.country

Employees

These are homework tasks on the employees database.

Clone and init the database (requires a local PostgreSQL instance):

psql -U postgres -c 'CREATE DATABASE employees;'
git clone https://github.com/vrajmohan/pgsql-sample-data.git
psql -U postgres -d employees -f pgsql-sample-data/employee/employees.dump

Execute a PRQL query:

cd prql-compiler
cargo run compile examples/employees/average-title-salary.prql | psql -U postgres -d employees

Task 1

Rank the employee titles according to the average salary for each department.

My solution:

  • for each employee, find their average salary,
  • join employees with their departments and titles (duplicating employees for each of their titles and departments),
  • group by department and title, aggregating average salary,
  • join with departments to get the department name.

PRQL

from salaries
group [emp_no] (
  aggregate [emp_salary = average salary]
)
join t=titles [==emp_no]
join dept_emp side:left [==emp_no]
group [dept_emp.dept_no, t.title] (
  aggregate [avg_salary = average emp_salary]
)
join departments [==dept_no]
select [dept_name, title, avg_salary]

SQL

WITH table_3 AS (
  SELECT
    AVG(salary) AS _expr_0,
    emp_no
  FROM
    salaries
  GROUP BY
    emp_no
),
table_1 AS (
  SELECT
    t.title,
    AVG(table_2._expr_0) AS avg_salary,
    dept_emp.dept_no
  FROM
    table_3 AS table_2
    JOIN titles AS t ON table_2.emp_no = t.emp_no
    LEFT JOIN dept_emp ON table_2.emp_no = dept_emp.emp_no
  GROUP BY
    dept_emp.dept_no,
    t.title
)
SELECT
  departments.dept_name,
  table_0.title,
  table_0.avg_salary
FROM
  table_1 AS table_0
  JOIN departments ON table_0.dept_no = departments.dept_no

Task 2

Estimate the distribution of salaries and genders for each department.

PRQL

from e=employees
join salaries [==emp_no]
group [e.emp_no, e.gender] (
  aggregate [
    emp_salary = average salaries.salary
  ]
)
join de=dept_emp [==emp_no] side:left
group [de.dept_no, gender] (
  aggregate [
    salary_avg = average emp_salary,
    salary_sd = stddev emp_salary,
  ]
)
join departments [==dept_no]
select [dept_name, gender, salary_avg, salary_sd]

SQL

WITH table_3 AS (
  SELECT
    e.gender,
    AVG(salaries.salary) AS _expr_0,
    e.emp_no
  FROM
    employees AS e
    JOIN salaries ON e.emp_no = salaries.emp_no
  GROUP BY
    e.emp_no,
    e.gender
),
table_1 AS (
  SELECT
    table_2.gender,
    AVG(table_2._expr_0) AS salary_avg,
    STDDEV(table_2._expr_0) AS salary_sd,
    de.dept_no
  FROM
    table_3 AS table_2
    LEFT JOIN dept_emp AS de ON table_2.emp_no = de.emp_no
  GROUP BY
    de.dept_no,
    table_2.gender
)
SELECT
  departments.dept_name,
  table_0.gender,
  table_0.salary_avg,
  table_0.salary_sd
FROM
  table_1 AS table_0
  JOIN departments ON table_0.dept_no = departments.dept_no

Task 3

Estimate the distribution of salaries and genders for each manager.

PRQL

from e=employees
join salaries [==emp_no]
group [e.emp_no, e.gender] (
  aggregate [
    emp_salary = average salaries.salary
  ]
)
join de=dept_emp [==emp_no]
join dm=dept_manager [
  (dm.dept_no == de.dept_no) && s"(de.from_date, de.to_date) OVERLAPS (dm.from_date, dm.to_date)"
]
group [dm.emp_no, gender] (
  aggregate [
    salary_avg = average emp_salary,
    salary_sd = stddev emp_salary
  ]
)
derive mng_no = emp_no
join managers=employees [==emp_no]
derive mng_name = s"managers.first_name || ' ' || managers.last_name"
select [mng_name, managers.gender, salary_avg, salary_sd]

SQL

WITH table_5 AS (
  SELECT
    e.gender,
    AVG(salaries.salary) AS _expr_0,
    e.emp_no
  FROM
    employees AS e
    JOIN salaries ON e.emp_no = salaries.emp_no
  GROUP BY
    e.emp_no,
    e.gender
),
table_3 AS (
  SELECT
    AVG(table_4._expr_0) AS salary_avg,
    STDDEV(table_4._expr_0) AS salary_sd,
    dm.emp_no
  FROM
    table_5 AS table_4
    JOIN dept_emp AS de ON table_4.emp_no = de.emp_no
    JOIN dept_manager AS dm ON dm.dept_no = de.dept_no
    AND (de.from_date, de.to_date) OVERLAPS (dm.from_date, dm.to_date)
  GROUP BY
    dm.emp_no,
    table_4.gender
),
table_1 AS (
  SELECT
    salary_avg,
    salary_sd,
    emp_no
  FROM
    table_3 AS table_2
)
SELECT
  managers.first_name || ' ' || managers.last_name AS mng_name,
  managers.gender,
  table_0.salary_avg,
  table_0.salary_sd
FROM
  table_1 AS table_0
  JOIN employees AS managers ON table_0.emp_no = managers.emp_no

Task 4

Find distributions of titles, salaries and genders for each department.

PRQL

from de=dept_emp
join s=salaries side:left [
  (s.emp_no == de.emp_no),
  s"({s.from_date}, {s.to_date}) OVERLAPS ({de.from_date}, {de.to_date})"
]
group [de.emp_no, de.dept_no] (
  aggregate salary = (average s.salary)
)
join employees [==emp_no]
join titles [==emp_no]
select [dept_no, salary, employees.gender, titles.title]

SQL

WITH table_1 AS (
  SELECT
    de.dept_no,
    AVG(s.salary) AS salary,
    de.emp_no
  FROM
    dept_emp AS de
    LEFT JOIN salaries AS s ON s.emp_no = de.emp_no
    AND (s.from_date, s.to_date) OVERLAPS (de.from_date, de.to_date)
  GROUP BY
    de.emp_no,
    de.dept_no
)
SELECT
  table_0.dept_no,
  table_0.salary,
  employees.gender,
  titles.title
FROM
  table_1 AS table_0
  JOIN employees ON table_0.emp_no = employees.emp_no
  JOIN titles ON table_0.emp_no = titles.emp_no

Contributing

If you’re interested in joining the community to build a better SQL, here are ways to start:

PRQL is evolving from a project with lots of excitement into a project that folks are using in their work and integrating into their tools. We’re actively looking for collaborators to lead that growth with us.

Areas for larger contributions

Compiler

The compiler is written in Rust, and there’s enough to do such that any level of experience with Rust is sufficient.

We try to keep a few onboarding issues on hand under the “good first issue” label. These have been screened to have sufficient context to get started (and we very much welcome questions where there’s some context missing).

To get started, check out the docs on Development and the Compiler architecture.

And if you have questions, there are lots of friendly people on the Discord who will patiently help you.

Bindings & integrations

For PRQL to be successful, it needs to be available for the languages & tools that people already use.

  • We currently have bindings to the PRQL compiler in a few different languages; many of these can be improved, documented, and packaged in a better way.
  • If you have experience with packaging in an ecosystem that doesn’t currently have bindings, then creating PRQL bindings for that language we don’t currently support would be valuable to the project.
  • If there’s a tool that you use yourself to run data queries which you think would benefit from a PRQL integration, suggest one to us or the tool. If it’s open-source, build & share a prototype.

Relevant issues are labeled Integrations.

Language design

We decide on new language features in GitHub issues, usually under the “language design” label.

You can also contribute by:

  • Finding instances where the compiler produces incorrect results, and posting a bug report — feel free to use the playground.
  • Opening an issue / appending to an existing issue with examples of queries that are difficult to express in PRQL — especially if they’re more difficult than in SQL.

With sufficient examples, suggest a change to the language! (Though suggestions without examples are difficult to engage with, so please do anchor suggestions in examples.)

Marketing

  • Improve our website. We have a few issues open on this front and are looking for anyone with at least some design skills.
  • Contribute towards the docs. Anything from shaping a whole section of the docs, to simply improving a confusing paragraph or fixing a typo.
  • Tell people about PRQL.
  • Find a group of users who would be interested in PRQL, help them get up to speed, help the project understand what they need.

Core team

If you have any questions or feedback and don’t receive a response on one of the general channels such as GitHub or Discord, feel free to reach out to:

Core team Emeritus

Thank you to those who have previously served on the core team:

Development

Setting up an initial dev environment

We can set up a local development environment sufficient for navigating, editing, and testing PRQL’s compiler code in two minutes:

  • Install rustup & cargo.

  • [Optional but highly recommended] Install cargo-insta, our testing framework:

    cargo install cargo-insta
    
  • That’s it! Running the unit tests for the prql-compiler crate after cloning the repo should complete successfully:

    cargo test -p prql-compiler --lib
    

    …or, to run tests and update the test snapshots:

    cargo insta test --accept -p prql-compiler --lib
    

    There’s more context on our tests in How we test below.

That’s sufficient for making an initial contribution to the compiler.


Setting up a full dev environment

Note: We really care about this process being easy, both because the project benefits from more contributors like you, and to reciprocate your future contribution. If something isn’t easy, please let us know in a GitHub Issue. We’ll enthusiastically help you, and use your feedback to improve the scripts & instructions.

For more advanced development; for example compiling for wasm or previewing the website, we have two options:

Option 1: Use the project’s task

Note: This is tested on macOS and should work on Linux, but won’t work on Windows.

  • Install Task; either brew install go-task/tap/go-task or as described on Task.

  • Then run the setup-dev task. This runs commands from our Taskfile.yml, installing dependencies with cargo, brew, npm & pip, and suggests some VS Code extensions.

    task setup-dev
    

Option 2: Install tools individually

  • We’ll need cargo-insta, to update snapshot tests:

    cargo install cargo-insta
    
  • We’ll need a couple of additional components, which most systems will have already. The easiest way to check whether they’re installed is to try running the full tests:

    cargo test
    

    …and if that doesn’t complete successfully, check we have:

    • A clang compiler, to compile the DuckDB integration tests, since we use duckdb-rs. To install one:

      • On macOS, install xcode with xcode-select --install
      • On Debian Linux, apt-get update && apt-get install clang
      • On Windows, duckdb-rs isn’t supported, so these tests are excluded
    • Python >= 3.7, to compile prql-python.

  • For more involved contributions, such as building the website, playground, book, or some release artifacts, we’ll need some additional tools. But we won’t need those immediately, and the error messages on what’s missing should be clear when we attempt those things. When we hit them, the Taskfile.yml will be a good source to copy & paste instructions from.

Building & testing the full project

We have a couple of tasks which incorporate all building & testing. While they don’t need to be run as part of a standard dev loop — generally we’ll want to run a more specific test — they can be useful as a backstop to ensure everything works, and as a reference for how each part of the repo is built & tested. They should be consistent with the GitHub Actions workflows; please report any inconsistencies.

To build everything:

task build-all

To run all tests (which includes building everything):

task test-all

These require installing Task, either brew install go-task/tap/go-task or as described on Task.

Contribution workflow

We’re similar to most projects on GitHub — open a Pull Request with a suggested change!

Commits

  • If a change is user-facing, please add a line in CHANGELOG.md, with {message}, ({@contributor, #X}) where X is the PR number.
    • If there’s a missing entry, a follow-up PR containing just the changelog entry is welcome.
  • We’re using Conventional Commits message format, enforced through action-semantic-pull-request.

Merges

  • We merge any code that makes PRQL better
  • A PR doesn’t need to be perfect to be merged; it doesn’t need to solve a big problem. It needs to:
    • be in the right direction,
    • make incremental progress,
    • be explicit on its current state, so others can continue the progress.
  • If you have merge permissions, and are reasonably confident that a PR is suitable to merge (whether or not you’re the author), feel free to merge.
    • If you don’t have merge permissions and have authored a few PRs, ask and ye shall receive.
  • The primary way we ratchet the code quality is through automated tests.
    • This means PRs almost always need a test to demonstrate incremental progress.
    • If a change breaks functionality without breaking tests, our tests were probably insufficient.
    • If a change breaks existing tests (for example, changing an external API), that indicates we should be careful about merging a change, including soliciting others’ views.
  • We use PR reviews to give general context, offer specific assistance, and collaborate on larger decisions.
    • Reviews around ‘nits’ like code formatting / idioms / etc are very welcome. But the norm is for them to be received as helpful advice, rather than as mandatory tasks to complete. Adding automated tests & lints to automate these suggestions is welcome.
    • If you have merge permissions and would like a PR to be reviewed before it merges, that’s great — ask or assign a reviewer.
    • If a PR hasn’t received attention after a day, please feel free to ping the pull request.
  • People may review a PR after it’s merged. As part of the understanding that we can merge quickly, contributors are expected to incorporate substantive feedback into a future PR.
  • We should revert quickly if the impact of a PR turns out not to be consistent with our expectations, or there isn’t as much consensus on a decision as we had hoped. It’s very easy to revert code and then re-revert when we’ve resolved the issue; it’s a sign of moving quickly. Other options which resolve issues immediately are also fine, such as commenting out an incorrect test or adding a quick fix for the underlying issue.

Docs

We’re very keen on contributions to improve our documentation.

This includes our docs in the book, on the website, in our code, or in a Readme. We also appreciate issues pointing out that our documentation was confusing, incorrect, or stale — if it’s confusing for you, it’s probably confusing for others.

Some principles for ensuring our docs remain maintainable:

  • Docs should be as close as possible to the code. Doctests are ideal on this dimension — they’re literally very close to the code and they can’t drift apart since they’re tested on every commit. Or, for example, it’s better to add text to a --help message, rather than write a paragraph in the Readme explaining the CLI.
  • We should have some vision of how to maintain docs when we add them. Docs have a habit of falling out of date — the folks reading them are often different from those writing them, they’re separate from the code, generally not possible to test, and rarely the by-product of other contributions. Docs that are concise & specific are easier to maintain.
  • Docs should be specifically relevant to PRQL; anything else we can instead link to.
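
The doctest principle above can be illustrated with a generic Rust doctest (not taken from the PRQL codebase): the example inside the doc comment is compiled and run by cargo test, so it can't silently drift from the code it documents.

```rust
/// Converts a snake_case identifier to kebab-case.
///
/// # Examples
///
/// ```
/// assert_eq!(to_kebab_case("gross_salary"), "gross-salary");
/// ```
pub fn to_kebab_case(name: &str) -> String {
    // Replace every underscore with a hyphen.
    name.replace('_', "-")
}
```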

If something doesn’t fit into one of these categories, there are still lots of ways of getting the word out there — a blog post / gist / etc. Let us know and we’re happy to link to it / tweet it.

How we test

We use a pyramid of tests — we have fast, focused tests at the bottom of the pyramid, which give us low latency feedback when developing, and then slower, broader tests which ensure that we don’t miss anything as PRQL develops1.

Note

If you’re making your first contribution, you don’t need to engage with all this — it’s fine to just make a change and push the results; the tests that run on GitHub will point you towards any errors, which can then be run locally if needed. We’re always around to help out.

Our tests, from the bottom of the pyramid to the top:

  • Static checks — we run a few static checks to ensure the code stays healthy and consistent. They’re defined in .pre-commit-config.yaml, using pre-commit. They can be run locally with

    pre-commit run -a
    

    These checks fix most of the issues they find automatically. Most of them also run on GitHub on every commit; any changes they make are added onto the branch automatically in an additional commit.

    • MegaLinter, which bundles additional linters, also runs automatically on GitHub (experimental).
  • Unit tests & inline insta snapshots — we rely on unit tests to rapidly check that our code basically works. We extensively use Insta, a snapshot testing tool which writes out the values generated by our code, making it fast & simple to write and modify tests2.

    These are the fastest tests which run our code; they’re designed to run on every save while you’re developing. We include a task which does this:

    task test-rust-fast
    # or
    cargo insta test --accept -p prql-compiler --lib
    # or, to run on every change:
    task -w test-rust-fast
    
  • Examples — we compile all examples in the PRQL Book, to test that they produce the SQL we expect, and that changes to our code don’t cause any unexpected regressions.

  • Integration tests — these run tests against real databases, to ensure we’re producing correct SQL.

  • GitHub Actions on every commit — we run the tests described up to this point on every commit to a pull request. These are designed to run in under five minutes, and we should be reassessing their scope if they grow beyond that. Once these pass, a pull request can be merged.

    These can be run locally with:

    task test-rust
    
  • GitHub Actions on specific changes — we run additional tests on pull requests when we identify changes to some paths, such as bindings to other languages.

  • GitHub Actions on merge — we run many more tests on every merge to main. This includes testing across OSs, all our language bindings, our task tasks, a measure of test code coverage, and some performance benchmarks.

    We can run these tests before a merge by adding a label pr-test-all to the PR.

    If these tests fail after merging, we revert the merged commit before fixing the test and then re-reverting.

    Most of these will run locally with:

    task test-all
    
  • GitHub Actions nightly — we run tests that take a long time or are unrelated to code changes, such as security checks, or expensive timing benchmarks, every night.

    We can run these tests before a merge by adding a label pr-cron to the PR.

The goal of our tests is to allow us to make changes quickly. If you find they’re making it more difficult for you to make changes, or there are missing tests that would give you the confidence to make changes faster, then please raise an issue.


Website

The website is published together with the book and the playground, and is automatically built and released on any push to the web branch.

The web branch points to the latest release plus any website-specific fixes. That way, the compiler behavior in the playground matches the latest release while allowing us to fix mistakes with a tighter loop than every release.

Fixes to the playground, book, or website should have a pr-backport-web label added to their PR — a bot will then open another PR onto the web branch once the initial branch merges.


Releasing

Currently we release in a semi-automated way:

  1. PR & merge an updated Changelog. GitHub will produce a draft version at https://github.com/PRQL/prql/releases/new, including “New Contributors”.

    We can use this script to generate the first line:

    echo "This release has $(git rev-list --count $(git rev-list --tags --max-count=1)..) commits from $(git shortlog --summary $(git rev-list --tags --max-count=1).. | wc -l | tr -d '[:space:]') contributors. Selected changes:"
    
  2. Run cargo release version patch -x && cargo release replace -x to bump the versions, then PR the resulting commit.

  3. After merging, go to Draft a new release3, copy the changelog entry into the release description4, enter the tag to be created, and hit “Publish”.

  4. From there, the tag and release are created, and all packages are published automatically based on our release workflow.

  5. Add in the sections for a new Changelog:

    ## 0.8.X — [unreleased]
    
    **Features**:
    
    **Fixes**:
    
    **Documentation**:
    
    **Web**:
    
    **Integrations**:
    
    **Internal changes**:
    
    **New Contributors**:
    
  6. Check whether there are milestones that need to be pushed out.

We may make this more automated in future; e.g. automatic changelog creation.


1: Our approach is very consistent with @matklad’s advice, in his excellent blog post How to Test.

2: Here’s an example of an insta test — note that only the initial line of each test is written by us; the remainder is filled in by insta.

3: Only maintainers have access to this page.

4: Unfortunately GitHub’s markdown parser interprets linebreaks as newlines. I haven’t found a better way of editing the markdown to look reasonable than manually editing the text.

Developing with Docker

The Dockerfile in this repo builds a Docker image that has current versions of our Rust development tools. This can be the lowest-effort way of setting up a Rust environment for those that don’t have one already.

Development cycle

The developer loop when using Docker is substantially the same as if the tools had been installed directly.

All the source files live in the prql directory on the host. As the source changes, the tools (running in the Docker container) can watch those directories and re-run so results are instantly visible.

When the Docker container exits (say, at the end of the development session), the prql directory on the local machine contains the latest files. Use git to pull or to push the prql repo from the host as normal.

To do all this, build the Docker image and start a container as described in the Installation section.

Installation

Once Docker is installed, build the Docker image with the following commands.

Note: It will take some time while Docker pulls in all the necessary developer tools.

cd <top-level-PRQL-directory>
docker build -t prql .

Optional: Install pre-commit on the machine that hosts Docker. It runs several Static Checks to ensure code consistency. You can also configure git to run pre-commit automatically for each commit with the second (one-time) command below.

pre-commit run -a   # Run checks manually
pre-commit install  # (one time) install the git hooks

Finally, start up the Docker container with:

cd <top-level-PRQL-directory>
docker run --rm -it -v $(pwd)/:/src -p 3000:3000 prql
  • There’ll be a root@xxxxxxxxx:/src/# prompt
  • Enter a command to run or test code; for example cargo test
  • Enter exit to stop the container

Running code with Docker

Currently our Docker image only supports running Rust dependencies. (Adding hugo & nodejs so that the playground can run would be a welcome contribution.)

Use the docker run... command above, then enter the relevant commands; for example cargo insta test --accept or task run book — more details of the commands are in each component’s README.md file or our Development docs.

Note: The first time you run a component, it may take some time to install additional files. Once they’re built, start up is quick.

Developing the Dockerfile

When making updates to the Dockerfile, we have automated testing that the Dockerfile builds on each merge in test-all.yaml, and automated testing that confirms all Rust tests pass in nightly.yaml.

Add a pr-test-all or pr-cron label to run these tests on a PR.

Developing with Dev Containers

Note

Currently the Dev Container included in this repository only supports the amd64 platform.

Dev Containers are a way to package a number of developer tools (compilers, bundlers, package managers, loaders, etc.) into a single object. This is helpful when many people want to contribute to a project: each person only has to install the Dev Container on their own machine to start working. By definition, the Dev Container has a consistent set of tools that are known to work together. This avoids the fuss of finding the proper version of each of the build tools.

While there are a variety of tools that support Dev Containers, the focus here is on developing with VS Code in a container by GitHub Codespaces or VS Code Dev Containers extension.

To use a Dev Container on a local computer with VS Code, install the VS Code Dev Containers extension and its system requirements. Then refer to the links above to get started.
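
For orientation, a Dev Container is described by a devcontainer.json file. A minimal sketch (illustrative only; the actual file in this repository differs) might look like:

    {
      "name": "prql-dev",
      "image": "mcr.microsoft.com/devcontainers/rust",
      "customizations": {
        "vscode": {
          "extensions": ["rust-lang.rust-analyzer"]
        }
      }
    }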

Internals

This chapter explains PRQL’s semantics: how expressions are interpreted and their meaning. It’s intended for advanced users and compiler contributors.

It’s also worth checking out the prql-compiler docs for more details on its API.

PRQL Compiler Architecture

The PRQL compiler operates in the following stages:

  1. Lexing & Parsing: PRQL source text is split into tokens by the Chumsky-based lexer. The stream of tokens is then parsed into an Abstract Syntax Tree (AST).

  2. Semantic Analysis: This stage resolves names (identifiers), extracts declarations, and determines frames (table columns in each step). A Context is declared containing the root module, which maps accessible names to their declarations.

    The resolving process involves the following operations:

    • Assign an ID to each node (Expr and Stmt).
    • Extract function declarations and variable definitions into the appropriate Module, accessible from Context::root_mod.
    • Look up identifiers in the module and find the associated declaration. The identifier is replaced with a fully qualified name that guarantees a unique name in root_mod. In some cases, Expr::target is also set.
    • Convert function calls to transforms (from, derive, filter) from FuncCall to TransformCall, which is more convenient for later processing.
    • Determine the type of expressions. If an expression is a reference to a table, use the frame of the table as the type. If it is a TransformCall, apply the transform to the input frame to obtain the resulting type. For simple expressions, try to infer from ExprKind.
  3. Lowering: This stage converts PL (the representation produced by semantic analysis) into RQ, which is more strictly typed and contains less information, but is convenient for translating into SQL or other backends.

  4. SQL Backend: This stage converts RQ into SQL. Each relation is transformed into an SQL query. Pipelines are analyzed and split into “AtomicPipelines” at appropriate positions, which can be represented by a single SELECT statement.

    Splitting is performed back-to-front. First, a list of all output columns is created. The pipeline is then traversed backwards, and a split occurs when a transform is encountered that is incompatible with those already in the pipeline. Splitting can also be triggered by an expression that cannot be materialized where it is used (e.g., a window function in a WHERE clause).

    This process is also called anchoring, as it anchors a column definition to a specific location in the output query.

    During this process, sql::context keeps track of:

    • Table instances in the query (to prevent mixing up multiple instances of the same table)
    • Column definitions, whether computed or a reference to a table column
    • Column names, as defined in RQ or generated

Helpers

Cheat sheets for prqlc are available on various websites and with various tools.

Name resolving

Because PRQL primarily handles relational data, it has specialized scoping rules for referencing columns.

Scopes

In PRQL’s compiler, a scope is the collection of all names one can reference from a specific point in the program.

In PRQL, names in the scope are composed of a namespace and a variable name, separated by a dot, similar to SQL. Namespaces can contain many dots, but variable names cannot.

Example

Name my_table.some_column is a variable some_column from namespace my_table.

Name foo.bar.baz is a variable baz from namespace foo.bar.

When processing a query, a scope is maintained and updated for each point in the query.

It starts with only the namespace std, which is the standard library. It contains common functions like sum or count, along with all transform functions such as derive and group.

In pipelines (or rather in transform functions), the scope is also injected with the namespaces of tables that may have been referenced with from or join transforms. These namespaces simply contain all the columns of the table and possibly a wildcard variable, which matches any variable (see the algorithm below). Within transforms, there is also a special namespace that does not have a name. It is called the “frame”, and it contains the columns of the current table the transform is operating on.
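
For example, with a hypothetical employees table, from injects the table’s namespace into the scope, so later transforms can reference its columns directly:

from employees                                  # injects namespace `employees` into the scope
derive full_name = f"{first_name} {last_name}"  # `first_name` resolves via that namespace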

Resolving

For each ident we want to resolve, we search the scope’s items in order. One of three things can happen:

  • The scope contains an exact match, i.e. a name that matches both the namespace and the variable name.

  • The scope does not contain an exact match, but the ident did not specify a namespace, so we can match a namespace that contains a * wildcard. If there is a single such namespace, it is also updated to contain this new variable name.

  • Otherwise, nothing is matched and an error is raised.
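
To sketch the wildcard case (using a hypothetical employees table whose columns the compiler doesn’t know):

from employees     # column names unknown, so the scope gains the wildcard `employees.*`
select first_name  # no exact match; the wildcard matches, and the name is inferred as `employees.first_name`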

Translating to SQL

When translating into an SQL statement that references only one table, there is no need to prefix column names with the table name.

PRQL

from employees
select first_name

SQL

SELECT
  first_name
FROM
  employees

But when there are multiple tables and we don’t have complete knowledge of all table columns, a column without a prefix (i.e. first_name) may actually reside in multiple tables. Because of this, we have to use table prefixes for all column names.

PRQL

from employees
derive [first_name, dept_id]
join d=departments [==dept_id]
select [first_name, d.title]

SQL

SELECT
  employees.first_name,
  d.title
FROM
  employees
  JOIN departments AS d ON employees.dept_id = d.dept_id

As you can see, employees.first_name now needs a table prefix, to prevent a conflict with a potential column of the same name in the departments table. Similarly, d.title needs its table prefix.

Functions

Function call

The major distinction between PRQL and today’s conventional programming languages such as C or Python is the function call syntax. It consists of the function name followed by arguments separated by whitespace.

function_name arg1 arg2 arg3

If one of the arguments is itself a function call, it must be enclosed in parentheses, so we know where the arguments of the inner function end and the arguments of the outer function start.

outer_func arg_1 (inner_func arg_a, arg_b) arg_2

Pipeline

There is an alternative way of calling functions: using a pipeline. Regardless of whether the pipeline is delimited by the pipe symbol | or a new line, the pipeline is equivalent to applying each function as the last argument of the next function.

a | foo 3 | bar 'hello' 'world' | baz

… is equivalent to …

baz (bar 'hello' 'world' (foo 3 a))

As you may have noticed, transforms are regular functions too!

PRQL

from employees
filter age > 50
sort name

SQL

SELECT
  *
FROM
  employees
WHERE
  age > 50
ORDER BY
  name

… is equivalent to …

PRQL

from employees | filter age > 50 | sort name

SQL

SELECT
  *
FROM
  employees
WHERE
  age > 50
ORDER BY
  name

… is equivalent to …

PRQL

filter age > 50 (from employees) | sort name

SQL

SELECT
  *
FROM
  employees
WHERE
  age > 50
ORDER BY
  name

… is equivalent to …

PRQL

sort name (filter age > 50 (from employees))

SQL

SELECT
  *
FROM
  employees
WHERE
  age > 50
ORDER BY
  name

As you can see, the first example, in pipeline notation, is much easier to comprehend than the last one, in regular function call notation. This is why we recommend using pipelines for nested function calls that are three or more levels deep.

Currying and late binding

In PRQL, functions are first-class citizens. As cool as that sounds, we need simpler terms to explain it. In essence, it means that we can operate with functions as with any other value.
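
As a sketch of what this enables (add and add_five are hypothetical names): applying a function to fewer arguments than it declares yields a new function, which can be bound to a name and used later:

let add = a b -> a + b
let add_five = add 5       # partial application: a function still awaiting `b`

from numbers
derive total = add_five n  # equivalent to `5 + n`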

Grammars / syntax highlighting

PRQL contains multiple grammar definitions to enable tools to highlight PRQL code. These are all intended to provide as good an experience as the grammar supports. Please raise any shortcomings in a GitHub issue.

The definitions are somewhat scattered around the codebase; this page serves as an index.


Since the Elm language coincidentally provides syntax highlighting suitable for PRQL, it may look better to mark PRQL code as Elm when the above definition files are not available.

For example, the following Markdown code block will be nicely highlighted on GitHub, Pandoc, and other Markdown renderers:

```elm
from employees
filter start_date > @2021-01-01
```

We hope that in the future these renderers will recognize PRQL code blocks and have syntax highlighting applied, and we are tracking these with several issues.

PRQL Changelog

0.8.1 — 2023-04-29

0.8.1 is a small release with a new list-targets command in prqlc, some documentation improvements, and some internal improvements.

This release has 41 commits from 8 contributors.

From the broader perspective of the project, we’re increasing the relative prioritization of it being easy for folks to actually use PRQL — either with existing tools, or a tool we’d build. We’re thinking about & discussing the best way to do that over the next few weeks.

0.8.0 — 2023-04-14

0.8.0 renames the and & or operators to && & || respectively, reorganizes the Syntax section in the book, and introduces read_parquet & read_csv functions for reading files with DuckDB.

This release has 38 commits from 8 contributors. Selected changes:

Features:

  • Rename and to && and or to ||. Operators which are symbols are now consistently infix, while “words” are now consistently functions (@aljazerzen, #2422).

  • New functions read_parquet and read_csv, which mirror the DuckDB functions, instructing the database to read from files (@max-sixty, #2409).

0.7.1 — 2023-04-03

0.7.1 is a hotfix release to fix prql-js’s npm install behavior when being installed as a dependency.

This release has 17 commits from 4 contributors.

0.7.0 — 2023-04-01

0.7.0 is a fairly small release in terms of new features, with lots of internal improvements, such as integration tests with a whole range of DBs, a blog post on Pi day, RFCs for a type system, and more robust language bindings.

There’s a very small breaking change to the rust API, hence the minor version bump.

Here’s our April 2023 Update, from our Readme:

April 2023 update

PRQL is being actively developed by a growing community. It’s ready to use by the intrepid, either as part of one of our supported extensions, or within your own tools, using one of our supported language bindings.

PRQL still has some minor bugs and some missing features, and probably is only ready to be rolled out to non-technical teams for fairly simple queries.

Here’s our current Roadmap and our Milestones.

Our immediate focus for the code is on:

  • Building out the next few big features, including types and modules.
  • Ensuring our supported features feel extremely robust; resolving any priority bugs.

We’re also spending time thinking about:

  • Making it really easy to start using PRQL. We’re doing that by building integrations with tools that folks already use; for example our VS Code extension & Jupyter integration. If there are tools you’re familiar with that you think would be open to integrating with PRQL, please let us know in an issue.
  • Making it easier to contribute to the compiler. We have a wide group of contributors to the project, but contributions to the compiler itself are quite concentrated. We’re keen to expand this; #1840 for feedback.

The release has 131 commits from 10 contributors. Particular credit goes to @eitsupi & @jelenkee, who have made significant contributions, and @vanillajonathan, whose prolific contributions include our growing language bindings.

A small selection of the changes:

Features:

  • prqlc compile adds --color & --include-signature-comment options. (@max-sixty, #2267)

Web:

  • Added the PRQL snippets from the book to the Playground (@jelenkee, #2197)

Internal changes:

  • Breaking: The compile function’s Options now includes a color member, which determines whether error messages use ANSI color codes. This is technically a breaking change to the API. (@max-sixty, #2251)
  • The Error struct now exposes the MessageKind enum. (@vanillajonathan, #2307)
  • Integration tests run in CI with DuckDB, SQLite, PostgreSQL, MySQL and SQL Server (@jelenkee, #2286)

New Contributors:

  • @k-nut, with #2294

0.6.1 — 2023-03-12

0.6.1 is a small release containing an internal refactoring and improved bindings for C, PHP & .NET.

This release has 54 commits from 6 contributors. Selected changes:

Fixes:

  • No longer incorrectly compile to DISTINCT when a take 1 refers to a different set of columns than are in the group. (@max-sixty, with thanks to @cottrell, #2109)
  • The version specification of the dependency Chumsky was bumped from 0.9.0 to 0.9.2. 0.9.0 has a bug that causes an infinite loop. (@eitsupi, #2110)

Documentation:

Integrations:

  • [prql-lib] Added C++ header file. (@vanillajonathan, #2126)

Internal changes:

  • Many of the items that were in the root of the repo have been aggregated into web & bindings, simplifying the repo’s structure. There’s also grammars & packages (@max-sixty, #2135, #2117, #2121).

0.6.0 — 2023-03-08

0.6.0 introduces a rewritten parser, giving us the ability to dramatically improve error messages, renames switch to case and includes lots of minor improvements and fixes. It also introduces loop, which compiles to WITH RECURSIVE, as a highly experimental feature.

There are a few cases of breaking changes, including switching switch to case, in case that’s confusing. There are also some minor parsing changes outlined below.

This release has 108 commits from 11 contributors. Selected changes:

Features:

  • Add a (highly experimental) loop language feature, which translates to WITH RECURSIVE. We expect changes and refinements in upcoming releases. (#1642, @aljazerzen)

  • Rename the experimental switch function to case given it more closely matches the traditional semantics of case. (@max-sixty, #2036)

  • Change the case syntax to use => instead of -> to distinguish it from function syntax.

  • Convert parser from pest to Chumsky (@aljazerzen, #1818)

    • Improved error messages, and the potential to make them even better in the future. Many of these improvements come from error recovery.
    • String escapes (\n \t).
    • Raw strings that don’t escape backslashes.
    • String interpolations can only contain identifiers and not any expression.
    • Operator associativity has been changed from right-to-left to left-to-right to be more similar to other conventional languages.
    • and now has a higher precedence than or (for the same reason as the previous point).
    • Dates, times and timestamps have stricter parsing rules.
    • let, func, prql, case are now treated as keywords.
    • Float literals without a fraction part are no longer allowed (1.).
  • Add a --format option to prqlc parse which can return the AST in YAML (@max-sixty, #1962)

  • Add a new subcommand prqlc jinja. (@aljazerzen, #1722)

  • Breaking: prql-compiler no longer passes text containing {{ & }} through to the output. (@aljazerzen, #1722)

    For example, the following PRQL query

    from {{foo}}
    

    was compiled to the following SQL previously, but now it raises an error.

    SELECT
      *
    FROM
      {{ foo }}
    

    This pass-through feature existed for integration with dbt.

    We’re again considering how to best integrate with dbt, and this change is based on the idea that the jinja macro should run before the PRQL compiler.

    If you’re interested in dbt integration, subscribe or 👍 to https://github.com/dbt-labs/dbt-core/pull/5982.

  • A new compile target "sql.any". When "sql.any" is used as the target of the compile function’s option, the target contained in the query header will be used. (@aljazerzen, #1995)

  • Support for SQL parameters with similar syntax (#1957, @aljazerzen)

  • Allow : to be elided in timezones, such as 0800 in @2020-01-01T13:19:55-0800 (@max-sixty, #1991).

  • Add std.upper and std.lower functions for changing string casing (@Jelenkee, #2019).

Fixes:

  • prqlc compile returns a non-zero exit code for invalid queries. (@max-sixty, #1924)
  • Identifiers can contain any alphabetic unicode characters (@max-sixty, #2003)

Documentation:

  • Operator precedence (@aljazerzen, #1818)
  • Error messages for invalid queries are displayed in the book (@max-sixty, #2015)

Integrations:

  • [prql-php] Added PHP bindings. (@vanillajonathan, #1860)
  • [prql-dotnet] Added .NET bindings. (@vanillajonathan, #1917)
  • [prql-lib] Added C header file. (@vanillajonathan, #1879)
  • Added a workflow building a .deb on each release. (Note that it’s not yet published on each release). (@vanillajonathan, #1883)
  • Added a workflow building a .rpm on each release. (Note that it’s not yet published on each release). (@vanillajonathan, #1918)
  • Added a workflow building a Snap package on each release. (@vanillajonathan, #1881)

Internal changes:

  • Test that the output of our nascent autoformatter can be successfully compiled into SQL. Failing examples are now clearly labeled. (@max-sixty, #2016)
  • Definition files have been added to configure Dev Containers for Rust development environment. (@eitsupi, #1893, #2025, #2028)

New Contributors:

  • @linux-china, with #1971
  • @Jelenkee, with #2019

0.5.2 — 2023-02-18

0.5.2 is a tiny release to fix a build issue in yesterday’s prql-js 0.5.1 release.

This release has 7 commits from 2 contributors.

New Contributors:

  • @matthias-Q, with #1873

0.5.1 — 2023-02-17

0.5.1 contains a few fixes, and another change to how bindings handle default target / dialects.

This release has 53 commits from 7 contributors. Selected changes:

Fixes:

  • Delegate dividing literal integers to the DB. Previously integer division was executed during PRQL compilation, which could be confusing given that behavior is different across DBs. Other arithmetic operations are still executed during compilation. (@max-sixty, #1747)

Documentation:

  • Add docs on the from_text transform (@max-sixty, #1756)

Integrations:

  • [prql-js] Default compile target changed from Sql(Generic) to Sql(None). (@eitsupi, #1856)
  • [prql-python] Compilation options can now be specified from Python. (@eitsupi, #1807)
  • [prql-python] Default compile target changed from Sql(Generic) to Sql(None). (@eitsupi, #1861)

New Contributors:

  • @vanillajonathan, with #1766

0.5.0 — 2023-02-08

0.5.0 contains a few fixes, some improvements to bindings, lots of docs improvements, and some work on forthcoming features. It contains one breaking change in the compiler’s Options interface.

This release has 74 commits from 12 contributors. Selected changes:

Features:

  • Change public API to use target instead of dialect in preparation for feature work (@aljazerzen, #1684)

  • prqlc watch command which watches filesystem for changes and compiles .prql files to .sql (@aljazerzen, #1708)

Fixes:

  • Support double brackets in s-strings which aren’t symmetric (@max-sixty, #1650)
  • Support Postgres’s Interval syntax (@max-sixty, #1649)
  • Fixed tests for prql-elixir with MacOS (@kasvith, #1707)

Documentation:

  • Add a documentation test for prql-compiler, update prql-compiler README, and include the README in the prql book section for Rust bindings. The code examples in the README are included and tested as doctests in the prql-compiler (@nkicg6, #1679)

Internal changes:

  • Add tests for all PRQL website examples to prql-python to ensure compiled results match expected SQL (@nkicg6, #1719)

New Contributors:

  • @ruslandoga, with #1628
  • @RalfNorthman, with #1632
  • @nicot, with #1662

0.4.2 — 2023-01-25

Features:

  • New from_text format-arg string-arg function that supports JSON and CSV formats. format-arg can be format:csv or format:json. string-arg can be a string in any format. (@aljazerzen & @snth, #1514)

    from_text format:csv """
    a,b,c
    1,2,3
    4,5,6
    """
    
    from_text format:json '''
        [{"a": 1, "b": "x", "c": false }, {"a": 4, "b": "y", "c": null }]
    '''
    
    from_text format:json '''{
        "columns": ["a", "b", "c"],
        "data": [
            [1, "x", false],
            [4, "y", null]
        ]
    }'''
    

    For now, the argument is limited to string constants.

Fixes

  • Export constructor for SQLCompileOptions (@bcho, #1621)
  • Remove backticks in count_distinct (@aljazerzen, #1611)

New Contributors

  • @1Kinoti, with #1596
  • @veenaamb, with #1614

0.4.1 — 2023-01-18

0.4.1 comes a few days after 0.4.0, with a couple of features and the release of prqlc, the CLI crate.

0.4.1 has 35 commits from 6 contributors.

Features:

  • Inferred column names include the relation name (@aljazerzen, #1550):

    from albums
    select title # name used to be inferred as title only
    select albums.title # so using albums was not possible here
    
  • Quoted identifiers such as dir/*.parquet are passed through to SQL. (@max-sixty, #1516).

  • The CLI is installed with cargo install prqlc. The binary was renamed in 0.4.0 but required an additional --features flag, which has been removed in favor of this new crate (@max-sixty & @aljazerzen, #1549).

New Contributors:

  • @fool1280, with #1554
  • @nkicg6, with #1567

0.4.0 — 2023-01-15

0.4.0 brings lots of new features, including case, select ![] and numbers with underscores. We have initial (unpublished) bindings to Elixir. And there are the usual fixes & documentation improvements (only a minority are listed below in this release).

0.4.0 also has some breaking changes: table is let, dialect is renamed to target, and the compiler’s API has changed. Full details below.

Features:

  • Defining a temporary table is now expressed as let rather than table (@aljazerzen, #1315). See the tables docs for details.

  • Experimental: The case function sets a variable to a value based on one of several expressions (@aljazerzen, #1278).

    derive var = case [
      score <= 10 -> "low",
      score <= 30 -> "medium",
      score <= 70 -> "high",
      true -> "very high",
    ]
    

    …compiles to:

    SELECT
      *,
      CASE
        WHEN score <= 10 THEN 'low'
        WHEN score <= 30 THEN 'medium'
        WHEN score <= 70 THEN 'high'
        ELSE 'very high'
      END AS var
    FROM
      bar
    

    Check out the case docs for more details.

  • Experimental: Columns can be excluded by name with select (@aljazerzen, #1329)

    from albums
    select ![title, composer]
    
  • Experimental: append transform, equivalent to UNION ALL in SQL. (@aljazerzen, #894)

    from employees
    append managers
    

    Check out the append docs for more details.

  • Numbers can contain underscores, which can make reading long numbers easier (@max-sixty, #1467):

    from numbers
    select [
        small = 1.000_000_1,
        big = 5_000_000,
    ]
    
  • The SQL output contains a comment with the PRQL compiler version (@aljazerzen, #1322)

  • dialect is renamed to target, and its values are prefixed with sql. (@max-sixty, #1388); for example:

    prql target:sql.bigquery  # previously was `dialect:bigquery`
    
    from employees
    

    This gives us the flexibility to target languages other than SQL in the long term.

  • Table definitions can contain a bare s-string (@max-sixty, #1422), which enables us to include a full CTE of SQL, for example:

    let grouping = s"""
      SELECT SUM(a)
      FROM tbl
      GROUP BY
        GROUPING SETS
        ((b, c, d), (d), (b, d))
    """
    
  • Ranges supplied to in can be half-open (@aljazerzen, #1330).
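
    A sketch of what half-open ranges enable (column names are illustrative):

    ```prql
    from employees
    filter (age | in 18..)          # lower bound only: age >= 18
    filter (salary | in ..100_000)  # upper bound only: salary <= 100_000
    ```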

  • The crate’s external API has changed to allow for compiling to intermediate representation. This also affects bindings. See prql_compiler docs for more details.

Fixes:

[This release, the changelog only contains a subset of fixes]

  • Allow interpolations in table s-strings (@aljazerzen, #1337)

Documentation:

[This release, the changelog only contains a subset of documentation improvements]

  • Add docs on aliases in Select
  • Add JS template literal and multiline example (@BCsabaEngine, #1432)
  • Improve prql-compiler docs & examples (@aljazerzen, #1515)
  • Fix string highlighting in book (@max-sixty, #1264)

Web:

  • The playground allows querying some sample data. As before, the result updates on every keystroke. (@aljazerzen, #1305)

Integrations:

[This release, the changelog only contains a subset of integration improvements]

  • Added Elixir integration exposing PRQL functions as NIFs (#1500, @kasvith)
  • Exposed Elixir flavor with exceptions (#1513, @kasvith)
  • Rename prql-compiler binary to prqlc (@aljazerzen, #1515)

Internal changes:

[This release, the changelog only contains a subset of internal changes]

  • Add parsing for negative select (@max-sixty, #1317)
  • Allow for additional builtin functions (@aljazerzen, #1325)
  • Add an automated check for typos (@max-sixty, #1421)
  • Add tasks for running playground & book (@max-sixty, #1265)
  • Add tasks for running tests on every file change (@max-sixty, #1380)

New contributors:

  • @EArazli, with #1359
  • @boramalper, with #1362
  • @allurefx, with #1377
  • @bcho, with #1375
  • @JettChenT, with #1385
  • @BlurrechDev, with #1411
  • @BCsabaEngine, with #1432
  • @kasvith, with #1500

0.3.1 - 2022-12-03

0.3.1 brings a couple of small improvements and fixes.

Features:

  • Support for using s-strings for from (#1197, @aljazerzen)

    from s"SELECT * FROM employees WHERE foo > 5"
    
  • Helpful error message when referencing a table in an s-string (#1203, @aljazerzen)

Fixes:

  • Multiple columns with same name created (#1211, @aljazerzen)
  • Renaming via select breaks preceding sorting (#1204, @aljazerzen)
  • Same column gets selected multiple times (#1186, @mklopets)

Internal:

  • Update GitHub Actions and Workflows to current version numbers (and avoid using Node 12)

0.3.0 — 2022-11-29

🎉 0.3.0 is the biggest ever change in PRQL’s compiler, rewriting much of the internals: the compiler now has a semantic understanding of expressions, including resolving names & building a DAG of column lineage 🎉.

While the immediate changes to the language are modest — some long-running bugs are fixed — this unlocks the development of many of the project’s long-term priorities, such as type-checking & auto-complete. And it simplifies the building of our next language features, such as match-case expressions, unions & table expressions.

@aljazerzen has (mostly single-handedly) done this work over the past few months. The project owes him immense appreciation.

Breaking changes:

We’ve had to make some modest breaking changes for 0.3:

  • Pipelines must start with from. For example, a pipeline with only derive foo = 5, with no from transform, is no longer valid. Depending on demand for this feature, it would be possible to add this back.

  • Shared column names now require == in a join. The existing approach is ambiguous to the compiler — id in the following example could be a boolean column.

    from employees
    -join positions [id]
    +join positions [==id]
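
    With ==, the compiled SQL is an explicit equality join, along the lines of:

    ```sql
    SELECT
      *
    FROM
      employees
      JOIN positions ON employees.id = positions.id
    ```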
    
  • Table references containing periods must be surrounded by backticks. For example, when referencing a schema name:

    -from public.sometable
    +from `public.sometable`
    

Features:

  • Change self equality op to == (#1176, @aljazerzen)
  • Add logging (@aljazerzen)
  • Add clickhouse dialect (#1090, @max-sixty)
  • Allow namespaces & tables to contain . (#1079, @aljazerzen)

Fixes:

  • Deduplicate column appearing in SELECT multiple times (#1186, @aljazerzen)
  • Fix uppercase table names (#1184, @aljazerzen)
  • Omit table name when only one ident in SELECT (#1094, @aljazerzen)

Documentation:

  • Add chapter on semantics’ internals (@aljazerzen, #1028)
  • Add note about nesting variables in s-strings (@max-sixty, #1163)

Internal changes:

  • Flatten group and window (#1120, @aljazerzen)
  • Split ast into expr and stmt (@aljazerzen)
  • Refactor associativity (#1156, @aljazerzen)
  • Rename Ident constructor to from_name (#1084, @aljazerzen)
  • Refactor rq folding (#1177, @aljazerzen)
  • Add tests for reported bug fixes in semantic (#1174, @aljazerzen)
  • Bump duckdb from 0.5.0 to 0.6.0 (#1132)
  • Bump once_cell from 1.15.0 to 1.16.0 (#1101)
  • Bump pest from 2.4.0 to 2.5.0 (#1161)
  • Bump pest_derive from 2.4.0 to 2.5.0 (#1179)
  • Bump sqlparser from 0.25.0 to 0.27.0 (#1131)
  • Bump trash from 2.1.5 to 3.0.0 (#1178)

0.2.11 — 2022-11-20

0.2.11 contains a few helpful fixes.

Work continues on our semantic refactor — look out for 0.3.0 soon! Many thanks to @aljazerzen for his continued contributions to this.

Note: 0.2.10 was skipped due to this maintainer’s inability to read his own docs on bumping versions…

Features:

  • Detect when compiler version is behind query version (@MarinPostma, #1058)
  • Add __version__ to prql-python package (@max-sixty, #1034)

Fixes:

  • Fix nesting of expressions with equal binding strength and left associativity, such as a - (b - c) (@max-sixty, #1136)
  • Retain floats without significant digits as floats (@max-sixty, #1141)

Documentation:

  • Add documentation of prqlr bindings (@eitsupi, #1091)
  • Add a ‘Why PRQL’ section to the website (@max-sixty, #1098)
  • Add @snth to core-devs (@max-sixty, #1050)

Internal changes:

  • Use workspace versioning (@max-sixty, #1065)

0.2.9 — 2022-10-14

0.2.9 is a small release containing a bug fix for empty strings.

Fixes:

  • Fix parsing of empty strings (@aljazerzen, #1024)

0.2.8 — 2022-10-10

0.2.8 is another modest release with some fixes, doc improvements, bindings improvements, and lots of internal changes. Note that one of the fixes causes the behavior of round and cast to change slightly — though it’s handled as a fix rather than a breaking change in semantic versioning.

Fixes:

  • Change order of the round & cast function parameters to have the column last; for example round 2 foo_col / cast int foo. This is consistent with other functions, and makes piping possible:

    derive [
      gross_salary = (salary + payroll_tax | as int),
      gross_salary_rounded = (gross_salary | round 0),
    ]
    

Documentation:

  • Split DEVELOPMENT.md from CONTRIBUTING.md (@richb-hanover, #1010)
  • Make s-strings more prominent in website intro (@max-sixty, #982)

Web:

  • Add GitHub star count to website (@max-sixty, #990)

Integrations:

  • Expose a shortened error message, in particular for the VS Code extension (@aljazerzen, #1005)

Internal changes:

  • Specify 1.60.0 as minimum Rust version (@max-sixty, #1011)
  • Remove old wee-alloc code (@max-sixty, #1013)
  • Upgrade clap to version 4 (@aj-bagwell, #1004)
  • Improve book-building script in Taskfile (@max-sixty, #989)
  • Publish website using an artifact rather than a long-lived branch (@max-sixty, #1009)

0.2.7 — 2022-09-17

0.2.7 is a fairly modest release, six weeks after 0.2.6. We have some more significant features, including a union operator and an overhaul of our type system, as open PRs which will follow in future releases.

We also have new features in the VS Code extension, courtesy of @jiripospisil, including a live output panel.

Fixes:

  • range_of_ranges checks the Range end is smaller than its start (@shuozeli, #946)

Documentation:

  • Improve various docs (@max-sixty, #974, #971, #972, #970, #925)
  • Add reference to EdgeDB’s blog post in our FAQ (@max-sixty, #922)
  • Fix typos (@kianmeng, #943)

Integrations:

  • Add prql-lib, enabling language bindings with go (@sigxcpu76, #923)
  • Fix line numbers in JS exceptions (@charlie-sanders, #929)

Internal changes:

  • Lock the version of the rust-toolchain, with auto-updates (@max-sixty, #926, #927)

0.2.6 — 2022-08-05

Fixes:

  • Adjust fmt to only escape names when needed (@aljazerzen, #907)
  • Fix quoting on upper case table names (@max-sixty, #893)
  • Fix scoping of identical column names from multiple tables (@max-sixty, #908)
  • Fix parse error on newlines in a table (@sebastiantoh 🆕, #902)

Documentation:

  • Add docs on Architecture (@aljazerzen, #904)
  • Add Changelog (@max-sixty, #890 #891)

Internal changes:

  • Start trial using Conventional Commits (@max-sixty, #889)
  • Add crates.io release workflow, docs (@max-sixty, #887)

0.2.5 - 2022-07-29

0.2.5 is a very small release following 0.2.4 yesterday. It includes:

  • Add the ability to represent single brackets in an s-string, with two brackets (#752, @max-sixty)
  • Fix the “Copy to Clipboard” command in the Playground, for Firefox (#880, @mklopets)
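
To illustrate the first item, doubled brackets inside an s-string are emitted as literal single brackets (a hedged sketch; the literal is illustrative):

    ```prql
    from tracks
    derive label = s"'{{tag}}'"   # compiles to '{tag}' in SQL
    ```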

0.2.4 - 2022-07-28

0.2.4 is a small release following 0.2.3 a few days ago. The 0.2.4 release includes:

  • Enrich our CLI, adding commands to get different stages of the compilation process (@aljazerzen, #863)
  • Fix multiple take n statements in a query, leading to duplicate proxy columns in generated SQL (@charlie-sanders)
  • Fix BigQuery quoting of identifiers in SELECT statements (@max-sixty)
  • Some internal changes — reorganize top-level functions (@aljazerzen), add a workflow to track our Rust compilation time (@max-sixty), simplify our simple prql-to-sql tests (@max-sixty)

Thanks to @ankane, prql-compiler is now available from homebrew core; brew install prql-compiler.¹

¹ We still need to update docs and add a release workflow for this:
<https://github.com/PRQL/prql/issues/866>

0.2.3 - 2022-07-24

A couple of weeks since the 0.2.2 release: we’ve squashed a few bugs, added some mid-sized features to the language, and made a bunch of internal improvements.

The 0.2.3 release includes:

  • Allow for escaping otherwise-invalid identifiers (@aljazerzen & @max-sixty)
  • Fix a bug around operator precedence (@max-sixty)
  • Add a section to the book on the language bindings (@charlie-sanders)
  • Add tests for our Display representation while fixing some existing bugs. This is gradually becoming our code formatter (@arrizalamin)
  • Add a “copy to clipboard” button in the Playground (@mklopets)
  • Add lots of guidance to our CONTRIBUTING.md around our tests and process for merging (@max-sixty)
  • Add a prql! macro for parsing a prql query at compile time (@aljazerzen)
  • Add tests for prql-js (@charlie-sanders)
  • Add a from_json method for transforming json to a PRQL string (@arrizalamin)
  • Add a workflow to release prql-java to Maven (@doki23)
  • Enable running all tests from a PR by adding a pr-run-all-tests label (@max-sixty)
  • Have cargo-release bump all crate & npm versions (@max-sixty)
  • Update prql-js to use the bundler build of prql-js (@mklopets)

As well as those contribution changes, thanks to those who’ve reported issues, such as @mklopets @huw @mm444 @ajfriend.

From here, we’re planning to continue squashing bugs (albeit more minor than those in this release), adding some features like union, while working on bigger issues such as type-inference.

We’re also going to document and modularize the compiler further. It’s important that we give more people an opportunity to contribute to the guts of PRQL, especially given the number and enthusiasm of contributions to the project in general — and it’s not that easy to do so at the moment. While this is ongoing, if anyone has something they’d like to work on in the more difficult parts of the compiler, let us know on GitHub or Discord, and we’d be happy to work together on it.

Thank you!

0.2.2 - 2022-07-10

We’re a couple of weeks since our 0.2.0 release. Thanks for the surge in interest and contributions! 0.2.2 has some fixes & some internal improvements:

  • We now test against SQLite & DuckDB on every commit, to ensure we’re producing correct SQL. (@aljazerzen)
  • We have the beginning of Java bindings! (@doki23)
  • Idents surrounded by backticks are passed through to SQL (@max-sixty)
  • More examples on the homepage, e.g. join & window; lots of small docs improvements
  • Automated releases to homebrew (@roG0d)
  • prql-js is now a single package for Node, browsers & webpack (@charlie-sanders)
  • Parsing has some fixes, including >= and leading underscores in idents (@mklopets)
  • Ranges receive correct syntax highlighting (@max-sixty)

Thanks to Aljaž Mur Eržen @aljazerzen, George Roldugin @roldugin, Jasper McCulloch @Jaspooky, Jie Han @doki23, Marko Klopets @mklopets, Maximilian Roos @max-sixty, Rodrigo Garcia @roG0d, Ryan Russell @ryanrussell, Steven Maude @StevenMaude, Charlie Sanders @charlie-sanders.

We’re planning to continue collecting bugs & feature requests from users, as well as working on some of the bigger features, like type-inference.

For those interested in joining, we also have a new Contributing page.

0.2.0 - 2022-06-27

🎉 🎉 After several months of building, PRQL is ready to use! 🎉 🎉


How we got here:

At the end of January, we published a proposal of a better language for data transformation: PRQL. The reception was better than I could have hoped for — we were no. 2 on Hacker News for a day, and gained 2.5K GitHub stars over the next few days.

But man cannot live on GitHub Stars alone — we had to do the work to build it. So over the next several months, during many evenings & weekends, a growing group of us gradually built the compiler, evolved the language, and wrote some integrations.

We want to double-down on the community and its roots in open source — it’s incredible that a few of us from all over the globe have collaborated on a project without ever having met. We decided early-on that PRQL would always be open-source and would never have a commercial product (despite lots of outside interest to fund a seed round!). Because languages are so deep in the stack, and the data stack has so many players, the best chance of building a great language is to build an open language.


We still have a long way to go. While PRQL is usable, it has lots of missing features, and an incredible amount of unfulfilled potential, including a language server, cohesion with databases, and type inference. Over the coming weeks, we’d like to grow the number of intrepid users experimenting with PRQL in their projects, prioritize features that will unblock them, and then start fulfilling PRQL’s potential by working through our roadmap.

The best way to experience PRQL is to try it. Check out our website and the Playground. Start using PRQL for your own projects in dbt, Jupyter notebooks and Prefect workflows.

Keep in touch with PRQL by following the project on Twitter, joining us on Discord, and starring the repo.

Contribute to the project — we’re a really friendly community, whether you’re a recent SQL user or an advanced Rust programmer. We need bug reports, documentation tweaks & feature requests — just as much as we need compiler improvements written in Rust.


I especially want to give Aljaž Mur Eržen (@aljazerzen) the credit he deserves, who has contributed the majority of the difficult work of building out the compiler. Much credit also goes to Charlie Sanders (@charlie-sanders), one of PRQL’s earliest supporters and the author of PyPrql, and Ryan Patterson-Cross (@rbpatt2019), who built the Jupyter integration among other Python contributions.

Other contributors who deserve a special mention include: @roG0d, @snth, @kwigley.


Thank you, and we look forward to your feedback!