Relations

PRQL is designed on top of relational algebra, which is the established data model used by modern SQL databases. A relation has a rigid mathematical definition, which can be simplified to “a table of data”. For example, the invoices table from the Chinook database (https://github.com/lerocha/chinook-database) looks like this:

invoice_idcustomer_idbilling_cityother columnstotal
12Stuttgart1.98
24Oslo3.96
38Brussels5.94
414Edmonton8.91
523Boston13.86
637Frankfurt0.99

A relation is composed of rows. Each row in a relation contains a value for each of the relation’s columns. Each column in a relation has a unique name and a designated data type. The table above is a relation, and has columns named invoice_idand customer_id each with a data type of “integer number”, a billing_city column with a data type of “text”, several other columns, and a total column that contains floating-point numbers.

Queries

The main purpose of PRQL is to build queries that combine and transform data from relations such as the invoices table above. Here is the most basic query:

from invoices

Note

Try each of these examples here in the Playground. Enter the query on the left-hand side, and click output.arrow in the right-hand side to see the result.

The result of the query above is not terribly interesting, it’s just the same relation as before.

select transform

The select function picks the columns to pass through based on a list and discards all others. Formally, that list is a tuple of comma-separated expressions wrapped in { ... }.

Suppose we only need the order_id and total columns. Use select to choose the columns to pass through. (Try it in the Playground.)

from invoices
select { order_id, total }

We can write the items in the tuple on one or several lines: trailing commas are ignored. In addition, we can assign any of the expressions to a variable that becomes the name of the resulting column in the SQL output.

from invoices
select {
  OrderID = invoice_id,
  Total = total,
}

This is the same query as above, rewritten on multiple lines, and assigning OrderID and Total names to the columns.

Once we select certain columns, subsequent transforms will have access only to those columns named in the tuple.

derive transform

To add columns to a relation, we can use the derive function. Let’s define a new column for Value Added Tax, set at 19% of the invoice total.

from invoices
derive { VAT = total * 0.19 }

The value of the new column can be a constant (such as a number or a string), or can be computed from the value of an existing column. Note that the value of the new column is assigned the name VAT.

join transform

The join transform also adds columns to the relation by combining the rows from two relations “side by side”. To determine which rows from each relation should be joined, join has match criteria, written in ( ... ).

from invoices
join customers ( ==customer_id )

This example “connects” the customer information from the customers relation with the information from the invoices relation, using identical values of the customer_id column from each relation to match the rows.

It is frequently useful to assign an alias to both relations being joined together so that each relation’s columns can be referred to uniquely.

from inv=invoices
join cust=customers ( ==customer_id )

In the example above, the alias inv represents the invoices relation and cust represents the customers relation. It then becomes possible to refer to inv.billing_city and cust.last_name unambiguously.

Summary

PRQL manipulates relations (tables) of data. The derive, select, and join transforms change the number of columns in a table. The first two never affect the number of rows in a table. join may change the number of rows, depending on the chosen type of join.

This final example combines the above into a single query. It illustrates a pipeline - the fundamental basis of PRQL. We simply add new lines (transforms) at the end of the query. Each transform modifies the relation produced by the statement above to produce the desired result.

from inv=invoices
join cust=customers (==customer_id)
derive { VAT = inv.total * 0.19 }
select {
  OrderID = inv.invoice_id,
  CustomerName = cust.last_name,
  Total = inv.total,
  VAT,
}