Pipelined Relational Query Language, pronounced “Prequel”

PRQL is a modern language for transforming data

— a simple, powerful, pipelined SQL replacement

          from invoices
filter invoice_date >= @1970-01-16
derive {
  transaction_fees = 0.8,
  income = total - transaction_fees
}
filter income > 1
group customer_id (
  aggregate {
    average total,
    sum_income = sum income,
    ct = count total,
  }
)
sort {-sum_income}
take 10
join c=customers (==customer_id)
derive name = f"{c.last_name}, {c.first_name}"
select {
  c.customer_id, name, sum_income
}
derive db_version = s"version()"

        

Why PRQL?

For data engineers

  • PRQL is concise, with abstractions such as variables & functions
  • PRQL is database agnostic, compiling to many dialects of SQL
  • PRQL isn’t limiting — it can contain embedded SQL where necessary
  • PRQL has bindings to most major languages (and more are in progress)
  • PRQL allows for column lineage and type inspection (in progress)

For analysts

  • PRQL is ergonomic for data exploration — for example, commenting out a filter, or a column in a list, maintains a valid query
  • PRQL is simple, and easy to understand, with a small number of powerful concepts
  • PRQL allows for powerful autocomplete, type-checking, and helpful error messages (in progress)

For tools

  • PRQL’s vision is a foundation to build on; we’re open-source and will never have a commercial product
  • PRQL is growing into a single secular standard which tools can target
  • PRQL is easy for machines to read & write

For HackerNews enthusiasts

  • The PRQL compiler is written in Rust
  • We talk about “orthogonal language features” a lot

Showcase

PRQL consists of a curated set of orthogonal transformations, which are combined together to form a pipeline. That makes it easy to compose and extend queries. The language also benefits from modern features, such syntax for dates, ranges and f-strings as well as functions, type checking and better null handling.

Playground Book
from employees
select {id, first_name, age}
sort age
take 10
SELECT
  id,
  first_name,
  age
FROM
  employees
ORDER BY
  age
LIMIT
  10
from track_plays
filter plays > 10_000                # Readable numbers
filter (length | in 60..240)         # Ranges with `..`
filter recorded > @2008-01-01        # Simple date literals
filter released - recorded < 180days # Nice interval literals
sort {-length}                       # Concise order direction
SELECT
  *
FROM
  track_plays
WHERE
  plays > 10000
  AND length BETWEEN 60 AND 240
  AND recorded > DATE '2008-01-01'
  AND released - recorded < INTERVAL 180 DAY
ORDER BY
  length DESC
from employees
# `filter` before aggregations...
filter start_date > @2021-01-01
group country (
  aggregate {max_salary = max salary}
)
# ...and `filter` after aggregations!
filter max_salary > 100_000
SELECT
  country,
  MAX(salary) AS max_salary
FROM
  employees
WHERE
  start_date > DATE '2021-01-01'
GROUP BY
  country
HAVING
  MAX(salary) > 100000
from track_plays
derive {
  finished = started + unfinished,
  fin_share = finished / started,        # Use previous definitions
  fin_ratio = fin_share / (1-fin_share), # BTW, hanging commas are optional!
}
SELECT
  *,
  started + unfinished AS finished,
  (started + unfinished) / started AS fin_share,
  (started + unfinished) / started / (1 - (started + unfinished) / started)
   AS fin_ratio
FROM
  track_plays
from web
# Just like Python
select url = f"https://www.{domain}.{tld}/{page}"
SELECT
  CONCAT('https://www.', domain, '.', tld, '/', page) AS url
FROM
  web
from employees
group employee_id (
  sort month
  window rolling:12 (
    derive {trail_12_m_comp = sum paycheck}
  )
)
SELECT
  *,
  SUM(paycheck) OVER (
    PARTITION BY employee_id
    ORDER BY
      month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  ) AS trail_12_m_comp
FROM
  employees
let celsius_to_fahrenheit = temp -> temp * 9/5 + 32

from weather
select temp_f = (celsius_to_fahrenheit temp_c)
SELECT
  temp_c * 9 / 5 + 32 AS temp_f
FROM
  weather
# Most recent employee in each role
# Quite difficult in SQL...
from employees
group role (
  sort join_date
  take 1
)
WITH table_0 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_0
WHERE
  _expr_0 <= 1
# There's no `version` in PRQL, but s-strings
# let us embed SQL as an escape hatch:
from x
derive db_version = s"version()"
SELECT
  *,
  version() AS db_version
FROM x
from employees
join b=benefits (==employee_id)
join side:left p=positions (p.id==employees.employee_id)
select {employees.employee_id, p.role, b.vision_coverage}
SELECT
  employees.employee_id,
  p.role,
  b.vision_coverage
FROM
  employees
  JOIN benefits AS b ON employees.employee_id = b.employee_id
  LEFT JOIN positions AS p ON p.id = employees.employee_id
from users
filter last_login != null
filter deleted_at == null
derive channel = channel ?? "unknown"
SELECT
  *,
  COALESCE(channel, 'unknown') AS channel
FROM
  users
WHERE
  last_login IS NOT NULL
  AND deleted_at IS NULL
prql target:sql.mssql  # Will generate TOP rather than LIMIT

from employees
sort age
take 10
SELECT
  *
FROM
  employees
ORDER BY
  age OFFSET 0 ROWS
FETCH
  FIRST 10 ROWS ONLY

Principles

Pipelined

A PRQL query is a linear pipeline of transformations

Each line of the query is a transformation of the previous line’s result. This makes it easy to read, and simple to write.

Simple

PRQL serves both sophisticated engineers and analysts without coding experience.

By providing a small number of powerful & orthogonal primitives, queries are simple and composable — there’s only one way of expressing each operation. We can eschew the debt that SQL has built up.

Open

PRQL is open-source, with an open community

PRQL will always be fully open-source and will never have a commercial product. By compiling to SQL, PRQL is compatible with most databases, existing tools, and programming languages that manage SQL. We’re a welcoming community for users, contributors, and other projects.

Extensible

PRQL is designed to be extended, from functions to language bindings

PRQL has abstractions which make it a great platform to build on. Its explicit versioning allows changes without breaking backward-compatibility. And in the cases where PRQL doesn’t yet have an implementation, it allows embedding SQL with s-strings.

Analytical

PRQL’s focus is analytical queries

PRQL was originally designed to serve the growing need of writing analytical queries, emphasizing data transformations, development speed, and readability. We de-emphasize other SQL features such as inserting data or transactions.

Pipelines in action

Integrations

ClickHouse

ClickHouse natively supports PRQL with

SET dialect = 'prql'

Jupyter/IPython

pyprql contains a Jupyter extension, which executes a PRQL cell against a database. It can also set up an in-memory DuckDB instance, populated with a pandas DataFrame.

DuckDB

A DuckDB extension to execute PRQL

qStudio

qStudio is a SQL GUI that lets you browse tables, run SQL scripts, and chart and export the results. qStudio runs on Windows, macOS and Linux, and works with every popular database including mysql, postgresql, mssql, kdb…

Prefect

Add PRQL models to your Prefect workflows with a single function.

Visual Studio Code

Extension with syntax highlighting and live SQL compilation.

PostgreSQL

Write PRQL functions in PostgreSQL

Databend

Databend natively supports PRQL

Tools

Playground

Online in-browser playground that compiles PRQL to SQL as you type.

pyprql

Provides Jupyter/IPython cell magic and Pandas accessor.

pip install pyprql

prqlc

A CLI for PRQL compiler, written in Rust.

cargo install prqlc

brew install prqlc

winget install prqlc

Bindings

prqlc-python

Python bindings for prqlc.

prqlc-js

JavaScript bindings for prqlc.

prqlr

R bindings for prqlc.

prqlc

Compiler implementation, written in Rust. Compile, format & annotate PRQL queries.

Others

Java, C, C++, Elixir, .NET, and PHP have unsupported or nascent bindings.

What people are saying

It starts with FROM, it fixes trailing commas, and it's called PRQL?? If this is a dream, don't wake me up. — Jeremiah Lowin, Founder & CEO, Prefect.

Hamilton Ulmer

@hamiltonulmer
very excited for prql!
2
13:03 · May 6, 2022
Twitter

Armin Ronacher

@mitsuhiko
Oh wow I missed this. Clickhouse now supports PRQL: https://github.com/ClickHouse/ClickHouse/pull/50686
49
13:03 · Jul 25, 2022
Twitter

Swanand.

@_swanand
A few years ago, I started working on a language, called "dsql", short for declarative SQL, and a pun on "the sequel (to SQL)". I kinda chickened out of it then, the amount of study and research I needed was massive. prql here is better than I imagined: github.com/max-sixty/prql
20
13:18 · Jan 25, 2022
Twitter

Rishabh Software

@RishabhSoft
SQL's hold on data retrieval is slipping! 8 new databases are emerging, and some speak entirely new languages for data querying. Know more infoworld.com/article/365490… #SQL #DataQuery #GraphQL #PRQL #WebAssembly
0
16:32 · Apr 13, 2022
Twitter

Burak Emir

@burakemir
I want to give the PRQL a little boost here, "pipeline of transformations" is IMHO a good choice for readable query languages that need to deal with SQL-like aggregations, group by and count and sum all: github.com/max-sixty/prql
2
12:52 · Jan 25, 2022
Twitter