PRQL

Why PRQL?

For data engineers

PRQL is concise, with abstractions such as variables & functions
PRQL is database agnostic, compiling to many dialects of SQL
PRQL isn’t limiting — it can contain embedded SQL where necessary
PRQL has bindings to most major languages (and more are in progress)
PRQL allows for column lineage and type inspection (in progress)

For analysts

PRQL is ergonomic for data exploration — for example, commenting out a filter, or a column in a list, maintains a valid query
PRQL is simple, and easy to understand, with a small number of powerful concepts
PRQL allows for powerful autocomplete, type-checking, and helpful error messages (in progress)

For tools

PRQL’s vision is a foundation to build on; we’re open-source and will never have a commercial product
PRQL is growing into a single secular standard which tools can target
PRQL is easy for machines to read & write

For HackerNews enthusiasts

The PRQL compiler is written in Rust
We talk about “orthogonal language features” a lot

Showcase

PRQL consists of a curated set of orthogonal transformations, which are combined together to form a pipeline. That makes it easy to compose and extend queries. The language also benefits from modern features, such syntax for dates, ranges and f-strings as well as functions, type checking and better null handling.

Playground Book

from employees
select {id, first_name, age}
sort age
take 10

SELECT
  id,
  first_name,
  age
FROM
  employees
ORDER BY
  age
LIMIT
  10

from track_plays
filter plays > 10_000                # Readable numbers
filter (length | in 60..240)         # Ranges with `..`
filter recorded > @2008-01-01        # Simple date literals
filter released - recorded < 180days # Nice interval literals
sort {-length}                       # Concise order direction

SELECT
  *
FROM
  track_plays
WHERE
  plays > 10000
  AND length BETWEEN 60 AND 240
  AND recorded > DATE '2008-01-01'
  AND released - recorded < INTERVAL 180 DAY
ORDER BY
  length DESC

from employees
# `filter` before aggregations...
filter start_date > @2021-01-01
group country (
  aggregate {max_salary = max salary}
)
# ...and `filter` after aggregations!
filter max_salary > 100_000

SELECT
  country,
  MAX(salary) AS max_salary
FROM
  employees
WHERE
  start_date > DATE '2021-01-01'
GROUP BY
  country
HAVING
  MAX(salary) > 100000

from track_plays
derive {
  finished = started - unfinished,
  fin_share = finished / started,        # Use previous definitions
  fin_ratio = fin_share / (1-fin_share), # BTW, hanging commas are optional!
}

SELECT
  *,
  started - unfinished AS finished,
  (started - unfinished) / started AS fin_share,
  (started - unfinished) / started / (1 - (started - unfinished) / started)
   AS fin_ratio
FROM
  track_plays

from web
# Just like Python
select url = f"https://www.{domain}.{tld}/{page}"

SELECT
  CONCAT('https://www.', domain, '.', tld, '/', page) AS url
FROM
  web

from employees
group employee_id (
  sort month
  window rolling:12 (
    derive {trail_12_m_comp = sum paycheck}
  )
)

SELECT
  *,
  SUM(paycheck) OVER (
    PARTITION BY employee_id
    ORDER BY
      month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
  ) AS trail_12_m_comp
FROM
  employees

let celsius_to_fahrenheit = temp -> temp * 9/5 + 32

from weather
select temp_f = (celsius_to_fahrenheit temp_c)

SELECT
  temp_c * 9 / 5 + 32 AS temp_f
FROM
  weather

# Most recent employee in each role
# Quite difficult in SQL...
from employees
group role (
  sort join_date
  take 1
)

WITH table_0 AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY role
      ORDER BY
        join_date
    ) AS _expr_0
  FROM
    employees
)
SELECT
  *
FROM
  table_0
WHERE
  _expr_0 <= 1

# There's no `version` in PRQL, but s-strings
# let us embed SQL as an escape hatch:
from x
derive db_version = s"version()"

SELECT
  *,
  version() AS db_version
FROM x

from employees
join b=benefits (==employee_id)
join side:left p=positions (p.id==employees.employee_id)
select {employees.employee_id, p.role, b.vision_coverage}

SELECT
  employees.employee_id,
  p.role,
  b.vision_coverage
FROM
  employees
  JOIN benefits AS b ON employees.employee_id = b.employee_id
  LEFT JOIN positions AS p ON p.id = employees.employee_id

from users
filter last_login != null
filter deleted_at == null
derive channel = channel ?? "unknown"

SELECT
  *,
  COALESCE(channel, 'unknown') AS channel
FROM
  users
WHERE
  last_login IS NOT NULL
  AND deleted_at IS NULL

prql target:sql.mssql  # Will generate TOP rather than LIMIT

from employees
sort age
take 10

SELECT
  *
FROM
  employees
ORDER BY
  age OFFSET 0 ROWS
FETCH
  FIRST 10 ROWS ONLY

Principles

Pipelined

A PRQL query is a linear pipeline of transformations

Each line of the query is a transformation of the previous line’s result. This makes it easy to read, and simple to write.

Simple

PRQL serves both sophisticated engineers and analysts without coding experience.

By providing a small number of powerful & orthogonal primitives, queries are simple and composable — there’s only one way of expressing each operation. We can eschew the debt that SQL has built up.

Open

PRQL is open-source, with an open community

PRQL will always be fully open-source and will never have a commercial product. By compiling to SQL, PRQL is compatible with most databases, existing tools, and programming languages that manage SQL. We’re a welcoming community for users, contributors, and other projects.

Extensible

PRQL is designed to be extended, from functions to language bindings

PRQL has abstractions which make it a great platform to build on. Its explicit versioning allows changes without breaking backward-compatibility. And in the cases where PRQL doesn’t yet have an implementation, it allows embedding SQL with s-strings.

Analytical

PRQL’s focus is analytical queries

PRQL was originally designed to serve the growing need of writing analytical queries, emphasizing data transformations, development speed, and readability. We de-emphasize other SQL features such as inserting data or transactions.

Pipelines in action

Integrations

ClickHouse

ClickHouse natively supports PRQL with

SET dialect = 'prql'

Jupyter/IPython

pyprql contains a Jupyter extension, which executes a PRQL cell against a database. It can also set up an in-memory DuckDB instance, populated with a pandas DataFrame.

DuckDB

A DuckDB extension to execute PRQL

QStudio

QStudio is a SQL GUI that lets you browse tables, run SQL scripts, and chart and export the results. QStudio runs on Windows, macOS and Linux, and works with every popular database including mysql, postgresql, mssql, kdb…

Prefect

Add PRQL models to your Prefect workflows with a single function.

Visual Studio Code

Extension with syntax highlighting and live SQL compilation.

PostgreSQL

Write PRQL functions in PostgreSQL

Databend

Databend natively supports PRQL

Tools

Playground

Online in-browser playground that compiles PRQL to SQL as you type.

pyprql

Provides Jupyter/IPython cell magic and Pandas accessor.

pip install pyprql

prqlc

A CLI for PRQL compiler, written in Rust.

cargo install prqlc

brew install prqlc

winget install prqlc

Bindings

prqlc-python

Python bindings for prqlc.

prqlc-js

JavaScript bindings for prqlc.

prqlr

R bindings for prqlc.

prqlc

Compiler implementation, written in Rust. Compile, format & annotate PRQL queries.

Others

Java, C, C++, Elixir, .NET, and PHP have unsupported or nascent bindings.

What people are saying

It starts with FROM, it fixes trailing commas, and it's called PRQL?? If this is a dream, don't wake me up. — Jeremiah Lowin, Founder & CEO, Prefect.

Hamilton Ulmer

@hamiltonulmer

very excited for prql!

13:03 · May 6, 2022

Twitter

Armin Ronacher

@mitsuhiko

Oh wow I missed this. Clickhouse now supports PRQL: https://github.com/ClickHouse/ClickHouse/pull/50686

13:03 · Jul 25, 2022

Twitter

I'm also really excited about efforts to create entire new query languages that compile to SQL, like Malloy and PRQL. — Wes McKinney, Creator of Pandas

Swanand.

@_swanand

A few years ago, I started working on a language, called "dsql", short for declarative SQL, and a pun on "the sequel (to SQL)". I kinda chickened out of it then, the amount of study and research I needed was massive. prql here is better than I imagined: github.com/max-sixty/prql

13:18 · Jan 25, 2022

Twitter

Column aliases would have saved me hundreds of hours over the course of my career. — @dvasdekis

Rishabh Software

@RishabhSoft

SQL's hold on data retrieval is slipping! 8 new databases are emerging, and some speak entirely new languages for data querying. Know more infoworld.com/article/365490… #SQL #DataQuery #GraphQL #PRQL #WebAssembly

16:32 · Apr 13, 2022

Twitter

Burak Emir

@burakemir

I want to give the PRQL a little boost here, "pipeline of transformations" is IMHO a good choice for readable query languages that need to deal with SQL-like aggregations, group by and count and sum all: github.com/max-sixty/prql

12:52 · Jan 25, 2022

Twitter

Having written some complex dbt projects...the first thing...it gets right is to start with the table and work down. This is an enormous readability boost in large projects and leads to great intellisense. — Ruben Slabbert

PRQL

Pipelined Relational Query Language, pronounced “Prequel”

PRQL is a modern language for transforming data

Why PRQL?

For data engineers

For analysts

For tools

For HackerNews enthusiasts

Showcase

Principles

Pipelined

Simple

Open

Extensible

Analytical

Pipelines in action

Integrations

ClickHouse

Jupyter/IPython

DuckDB

QStudio

Prefect

Visual Studio Code

PostgreSQL

Databend

Tools

Playground

pyprql

prqlc

Bindings

prqlc-python

prqlc-js

prqlr

prqlc

Others

What people are saying