Why PRQL?
For data engineers
- PRQL is concise, with abstractions such as variables & functions
- PRQL is database agnostic, compiling to many dialects of SQL
- PRQL isn’t limiting — it can contain embedded SQL where necessary
- PRQL has bindings to most major languages (and more are in progress)
- PRQL allows for column lineage and type inspection (in progress)
For analysts
- PRQL is ergonomic for data exploration — for example, commenting out a filter, or a column in a list, maintains a valid query
- PRQL is simple, and easy to understand, with a small number of powerful concepts
- PRQL allows for powerful autocomplete, type-checking, and helpful error messages (in progress)
For tools
- PRQL’s vision is a foundation to build on; we’re open-source and will never have a commercial product
- PRQL is growing into a single secular standard which tools can target
- PRQL is easy for machines to read & write
For HackerNews enthusiasts
- The PRQL compiler is written in Rust
- We talk about “orthogonal language features” a lot
Showcase
PRQL consists of a curated set of orthogonal transformations, which are combined together to form a pipeline. That makes it easy to compose and extend queries. The language also benefits from modern features, such syntax for dates, ranges and f-strings as well as functions, type checking and better null handling.
Playground Bookfrom employees
select {id, first_name, age}
sort age
take 10
SELECT
id,
first_name,
age
FROM
employees
ORDER BY
age
LIMIT
10
from track_plays
filter plays > 10_000 # Readable numbers
filter (length | in 60..240) # Ranges with `..`
filter recorded > @2008-01-01 # Simple date literals
filter released - recorded < 180days # Nice interval literals
sort {-length} # Concise order direction
SELECT
*
FROM
track_plays
WHERE
plays > 10000
AND length BETWEEN 60 AND 240
AND recorded > DATE '2008-01-01'
AND released - recorded < INTERVAL 180 DAY
ORDER BY
length DESC
from employees
# `filter` before aggregations...
filter start_date > @2021-01-01
group country (
aggregate {max_salary = max salary}
)
# ...and `filter` after aggregations!
filter max_salary > 100_000
SELECT
country,
MAX(salary) AS max_salary
FROM
employees
WHERE
start_date > DATE '2021-01-01'
GROUP BY
country
HAVING
MAX(salary) > 100000
from track_plays
derive {
finished = started - unfinished,
fin_share = finished / started, # Use previous definitions
fin_ratio = fin_share / (1-fin_share), # BTW, hanging commas are optional!
}
SELECT
*,
started - unfinished AS finished,
(started - unfinished) / started AS fin_share,
(started - unfinished) / started / (1 - (started - unfinished) / started)
AS fin_ratio
FROM
track_plays
from web
# Just like Python
select url = f"https://www.{domain}.{tld}/{page}"
SELECT
CONCAT('https://www.', domain, '.', tld, '/', page) AS url
FROM
web
from employees
group employee_id (
sort month
window rolling:12 (
derive {trail_12_m_comp = sum paycheck}
)
)
SELECT
*,
SUM(paycheck) OVER (
PARTITION BY employee_id
ORDER BY
month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW
) AS trail_12_m_comp
FROM
employees
let celsius_to_fahrenheit = temp -> temp * 9/5 + 32
from weather
select temp_f = (celsius_to_fahrenheit temp_c)
SELECT
temp_c * 9 / 5 + 32 AS temp_f
FROM
weather
# Most recent employee in each role
# Quite difficult in SQL...
from employees
group role (
sort join_date
take 1
)
WITH table_0 AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY role
ORDER BY
join_date
) AS _expr_0
FROM
employees
)
SELECT
*
FROM
table_0
WHERE
_expr_0 <= 1
# There's no `version` in PRQL, but s-strings
# let us embed SQL as an escape hatch:
from x
derive db_version = s"version()"
SELECT
*,
version() AS db_version
FROM x
from employees
join b=benefits (==employee_id)
join side:left p=positions (p.id==employees.employee_id)
select {employees.employee_id, p.role, b.vision_coverage}
SELECT
employees.employee_id,
p.role,
b.vision_coverage
FROM
employees
JOIN benefits AS b ON employees.employee_id = b.employee_id
LEFT JOIN positions AS p ON p.id = employees.employee_id
from users
filter last_login != null
filter deleted_at == null
derive channel = channel ?? "unknown"
SELECT
*,
COALESCE(channel, 'unknown') AS channel
FROM
users
WHERE
last_login IS NOT NULL
AND deleted_at IS NULL
prql target:sql.mssql # Will generate TOP rather than LIMIT
from employees
sort age
take 10
SELECT
*
FROM
employees
ORDER BY
age OFFSET 0 ROWS
FETCH
FIRST 10 ROWS ONLY
Principles
Pipelined
A PRQL query is a linear pipeline of transformations
Each line of the query is a transformation of the previous line’s result. This makes it easy to read, and simple to write.
Simple
PRQL serves both sophisticated engineers and analysts without coding experience.
By providing a small number of powerful & orthogonal primitives, queries are simple and composable — there’s only one way of expressing each operation. We can eschew the debt that SQL has built up.
Open
PRQL is open-source, with an open community
PRQL will always be fully open-source and will never have a commercial product. By compiling to SQL, PRQL is compatible with most databases, existing tools, and programming languages that manage SQL. We’re a welcoming community for users, contributors, and other projects.
Extensible
PRQL is designed to be extended, from functions to language bindings
PRQL has abstractions which make it a great platform to build on. Its explicit versioning allows changes without breaking backward-compatibility. And in the cases where PRQL doesn’t yet have an implementation, it allows embedding SQL with s-strings.
Analytical
PRQL’s focus is analytical queries
PRQL was originally designed to serve the growing need of writing analytical queries, emphasizing data transformations, development speed, and readability. We de-emphasize other SQL features such as inserting data or transactions.
Pipelines in action
Integrations
Jupyter/IPython
pyprql contains a Jupyter extension, which executes a PRQL cell against a database. It can also set up an in-memory DuckDB instance, populated with a pandas DataFrame.DuckDB
A DuckDB extension to execute PRQLqStudio
qStudio is a SQL GUI that lets you browse tables, run SQL scripts, and chart and export the results. qStudio runs on Windows, macOS and Linux, and works with every popular database including mysql, postgresql, mssql, kdb…Prefect
Add PRQL models to your Prefect workflows with a single function.Visual Studio Code
Extension with syntax highlighting and live SQL compilation.PostgreSQL
Write PRQL functions in PostgreSQLDatabend
Databend natively supports PRQLTools
Playground
Online in-browser playground that compiles PRQL to SQL as you type.prqlc
A CLI for PRQL compiler, written in Rust.
cargo install prqlc
brew install prqlc
winget install prqlc
Bindings
prqlc-python
Python bindings for prqlc.prqlc-js
JavaScript bindings for prqlc.prqlr
R bindings for prqlc.prqlc
Compiler implementation, written in Rust. Compile, format & annotate PRQL queries.Others
Java, C, C++, Elixir, .NET, and PHP have unsupported or nascent bindings.What people are saying
It starts with FROM, it fixes trailing commas, and it's called PRQL?? If this is a dream, don't wake me up. — Jeremiah Lowin, Founder & CEO, Prefect.