# Calculate the digits of Pi with DuckDB and PRQL

Tobias Brandt· ·
Updated

TL;DR: PRQL recently added a `loop` construct which makes it Turing Complete and allows doing cool things like calculating Pi right in your database.

## Background

Last week saw the 0.6 release of PRQL which brought with it the capability to express Recursive CTEs in PRQL. “Recursive” CTEs aren’t actually truly recursive in the sense that that term is usually used, rather they use a “recursive” (i.e. self-referential) syntax to provide a looping construct in SQL.

PRQL is a modern, functional query language for transforming data. One of its goals is to simplify working with data wherever you can currently use SQL. As such it compiles to SQL while making available modern ergonomics such as f-strings, as well as not so modern features such as functions.

Given that the underlying semantics of Recursive CTEs are really about iteration or “looping”, we have called this feature `loop` in PRQL.

## Introducing `loop`

Recursive CTEs in SQL consist of two parts, an `initial_query` and an `update_query`. First the `initial_query` is executed and then the rows produced are fed to `update_query` which is applied to the result set. The `update_query` is then iteratively applied to the rows produced in the last iteration until no more rows are produced, at which point iteration stops. (For a great review of this as well as some interesting proposals to extend the semantics of Recursive CTEs in SQL, see the paper “A Fix for the Fixation on Fixpoints” by Denis Hirn and Torsten Grust.)

This behavior can be expressed with following pseudo-code:

``````def loop(step, initial_query):
result = []
current = initial_query()
while current is not empty:
result = append(result, current)
current = update_query(current)

return result
``````

The minimal `loop` example from the documentation in the PRQL book is:

``````from_text format:json '[{"n": 1 }]'
loop (
select n = n+1
filter n<=3
)
``````

Here we use a PRQL utility function `from_text` to conveniently turn a JSON representation of some example data into a SQL table (`from_text` currently also accepts CSV input).

Initially a row with `{"n":1}` is fed in. Then the update query is applied to this which in this case just increments `n` by one and filters the result set to only the rows where `n` is less than or equal to three. This produces `{"n":2}` in the next step and `{"n":3}` after that. On the next iteration `{"n":4}` is produced but that is eliminated by the filter condition. Since that leaves no new rows, iteration stops.

If you try the query above in the PRQL Playground, the result set you get (in the “output.arrow” tab) is:

n
1
2
3

## Fibonacci Numbers

Inspired by this, let’s try to calculate Fibonacci numbers which are often one of the first examples when recursion is introduced:

``````from_text format:json '[{"a":1, "b":1}]'
loop (
derive b_new = a + b
select [a=b, b=b_new]
)
take 7
``````

which produces the first 7 Fibonacci numbers.

a
1
1
2
3
5
8
13

You might have noticed that we didn’t actually include a `filter` in our loop this time. Instead we relied on the fact that DuckDB produces results lazily and since we only took 7 numbers it only produced what we needed.

Now, let’s set ourselves a bigger challenge.

## Calculating the digits of pi

March 14th is written on American calendars as 3/14 which reminds us of the first three digits of Pi=3.1415926535… . Therefore this day is commonly known as Pi day. In order to celebrate Pi Day 2023 and the recent release of `loop` in PRQL 0.6, why don’t we try to calculate the digits of Pi in PRQL!

For our query engine we will use DuckDB because it is really fast and has many modern features that make it ideally suited for the kinds of analytical queries that PRQL is targeting. There is even a duckdb-prql extension which allows you to write PRQL queries right inside DuckDB!

We follow the algorithms in the paper “Unbounded Spigot Algorithms for the Digits of Pi” by Jeremy Gibbons, in particular Rabinowitz and Wagon’s Spigot Algorithm. While it is not the most efficient algorithm presented in the paper, it has the advantage that it uses only standard data types while the more efficient algorithms rely on unbounded integer types which are not available in many database systems.

For my implementation, I adapted the Python implementation by John Burkardt found here. A bit of care had to be taken in porting the implementation to PRQL, as PRQL like SQL, is stateless so I introduced another state variable `k` to track the position inside the body of the loop. In order to ensure that all values of `k` are handled in the query we define a PRQL function called `loop_steps`. The ability to define functions in PRQL is one of its key features and I feel the query below really shows the power that the composability of functions brings to PRQL. PRQL is a functional language with features such as currying. For more details see our previous post A functional approach to relational queries.

The resulting PRQL query is the following:

``````prql target:sql.duckdb

let config = (
from_text format:json '[{"num_digits":50}]'
derive {
array_len = (10*num_digits)/3,
calc_len = 1+4,
loop_len = array_len + calc_len,
}
)

let loop_steps = step_0 step_i step_1 step_2 step_3 other -> case [
k==0 => step_0,
1 <= k and k <= array_len => step_i,
k==array_len+1 => step_1,
k==array_len+2 => step_2,
k==array_len+3 => step_3,
true => other,
]

let q_steps = step_q9 step_q10 step_j2 step_jg2 other -> case [
q==9 => step_q9,
q==10 => step_q10,
j==2 => step_j2,
j>2 => step_jg2,
true => other,
]

from config
select [
num_digits,
array_len,
loop_len,
j = 0,
k = 0,
q = 0,
a = s"[2 for i in range({array_len})]",
nines = 0,
predigit = 0,
output = '',
]
loop (
filter j < num_digits + 1
derive [
j_new = case [k==0 => j+1, true => j],
k_new = (k+1) % loop_len,
q_step_i = (10*s"{a}[{k}]"+q*(array_len-k+1))/(2*(array_len-k)+1),
q_new = loop_steps 0 q_step_i (q/10) q q q,

a_step_i = s"[CASE WHEN i=={k} THEN (10*{a}[i]+{q}*({array_len}-i+1))%(2*({array_len}-i)+1) ELSE {a}[i] END for i in generate_series(1,{array_len})]",
a_step_1 = s"[CASE WHEN i=={array_len} THEN {q}%10 ELSE {a}[i] END for i in generate_series(1,{array_len})]",
a_new = loop_steps a a_step_i a_step_1 a a a,

nines_new = loop_steps nines nines nines (q_steps (nines+1) 0 nines nines nines) (case [q!=9 and q!=10 and nines!=0 => 0, true => nines]) nines,
predigit_new = loop_steps predigit predigit predigit (q_steps predigit 0 q q q) predigit predigit,

output_step_2 = (q_steps '' s"({predigit}+1)::string || repeat('0', {nines})" s"{predigit}::string || '.'" s"{predigit}::string" ''),
output_step_3 = (case [q!=9 and q!=10 and nines!=0 => s"repeat('9', {nines})", true => '']),
output_new = loop_steps '' '' '' output_step_2 output_step_3 '',
]
select {
num_digits,
array_len,
loop_len,
j = j_new,
k = k_new,
q = q_new,
a = a_new,
nines = nines_new,
predigit = predigit_new,
output = output_new,
}
)
aggregate [pi=s"string_agg({output}, '')"]
``````

Go ahead and run this query right now in your browser with our Online PRQL Playground! There you can also see the SQL that is produced in the `output.sql` tab (for you convenience reproduced in the Appendix below) as well as the result of the calculation in the `output.arrow` tab.

The output you see will be something like the following:

``````┌────────────────────────────────────────────────────┐
│                         pi                         │
│                      varchar                       │
├────────────────────────────────────────────────────┤
│ 3.141592653589793238462643383279502884197169399375 │
└────────────────────────────────────────────────────┘
``````

Why don’t you play around with the `num_digits` parameter to try and see how many digits you can get on your laptop? (Unfortunately this algorithm is quite a bit more inefficient than the equivalent Python implementation as Recursive CTEs currently don’t allow for state to be kept outside of the result set.)

Something you might have noticed is that there are some expressions surrounded by `s""`. This is called an s-string (s for SQL) and allows us to include raw SQL inside our PRQL queries. We use this to include features from DuckDB that haven’t made it into PRQL yet, such as the Pythonesque list comprehensions.

The `loop` functionality in PRQL is brand new and is marked as experimental and we will be working to stabilise the feature and iterate on the design to make it as easy and useful as possible. We felt that it was exciting enough to share it at this stage (and in time for Pi Day) and make it available for you to try out and play around with. PRQL is developed completely in the open so let us know your use cases so that we can make PRQL the best tool for the data challenges that you face.

## Conclusion

Recursive CTEs in SQL make SQL Turing Complete and the same should hold for PRQL now that `loop` is included. As we saw above, not all algorithms are easily expressed in this paradigm, so while theoretically they could be, whether they should be is another question. However I hope this example demonstrates that `loop` brings great power to PRQL and in follow up posts I will demonstrate how we can use this to do tree and graph traversals which do come up in practice in the kind of analytical data work that PRQL is made for. After that I will also look at online algorithms such as moving averages an online gradient descent, so be sure to come back for those!

## Appendix

The following is the SQL query that was produced and run in DuckDB:

``````WITH table_0 AS (
SELECT
50 AS num_digits
),
config AS (
SELECT
num_digits,
10 * num_digits / 3 AS array_len,
5 AS calc_len,
10 * num_digits / 3 + 5 AS loop_len
FROM
table_0 AS table_1
),
table_6 AS (
WITH RECURSIVE loop AS (
SELECT
num_digits,
array_len,
loop_len,
0 AS _expr_0,
0 AS _expr_1,
0 AS _expr_2,
[2 for i in range(array_len)] AS _expr_3,
0 AS _expr_4,
0 AS _expr_5,
'' AS _expr_6
FROM
config
UNION
ALL
SELECT
num_digits,
array_len,
loop_len,
_expr_12 AS _expr_15,
_expr_11 AS _expr_16,
_expr_10 AS _expr_17,
_expr_9 AS _expr_18,
_expr_8 AS _expr_19,
_expr_7 AS _expr_20,
CASE
WHEN _expr_1 = 0 THEN ''
WHEN 1 <= _expr_1
AND _expr_1 <= array_len THEN ''
WHEN _expr_1 = array_len + 1 THEN ''
WHEN _expr_1 = array_len + 2 THEN _expr_13
WHEN _expr_1 = array_len + 3 THEN _expr_14
ELSE ''
END
FROM
(
SELECT
num_digits,
array_len,
loop_len,
CASE
WHEN _expr_1 = 0 THEN _expr_5
WHEN 1 <= _expr_1
AND _expr_1 <= array_len THEN _expr_5
WHEN _expr_1 = array_len + 1 THEN _expr_5
WHEN _expr_1 = array_len + 2 THEN CASE
WHEN _expr_2 = 9 THEN _expr_5
WHEN _expr_2 = 10 THEN 0
WHEN _expr_0 = 2 THEN _expr_2
WHEN _expr_0 > 2 THEN _expr_2
ELSE _expr_2
END
WHEN _expr_1 = array_len + 3 THEN _expr_5
ELSE _expr_5
END AS _expr_7,
CASE
WHEN _expr_1 = 0 THEN _expr_4
WHEN 1 <= _expr_1
AND _expr_1 <= array_len THEN _expr_4
WHEN _expr_1 = array_len + 1 THEN _expr_4
WHEN _expr_1 = array_len + 2 THEN CASE
WHEN _expr_2 = 9 THEN _expr_4 + 1
WHEN _expr_2 = 10 THEN 0
WHEN _expr_0 = 2 THEN _expr_4
WHEN _expr_0 > 2 THEN _expr_4
ELSE _expr_4
END
WHEN _expr_1 = array_len + 3 THEN CASE
WHEN _expr_2 <> 9
AND _expr_2 <> 10
AND _expr_4 <> 0 THEN 0
ELSE _expr_4
END
ELSE _expr_4
END AS _expr_8,
CASE
WHEN _expr_1 = 0 THEN _expr_3
WHEN 1 <= _expr_1
AND _expr_1 <= array_len THEN [CASE WHEN i==_expr_1 THEN (10*_expr_3[i] + _expr_2 *(array_len - i + 1)
) %(2 *(array_len - i) + 1)
ELSE _expr_3 [i]
END for i in generate_series(1, array_len) ]
WHEN _expr_1 = array_len + 1 THEN [CASE WHEN i==array_len THEN _expr_2%10 ELSE _expr_3[i]
END for i in generate_series(1, array_len) ]
WHEN _expr_1 = array_len + 2 THEN _expr_3
WHEN _expr_1 = array_len + 3 THEN _expr_3
ELSE _expr_3
END AS _expr_9,
CASE
WHEN _expr_1 = 0 THEN 0
WHEN 1 <= _expr_1
AND _expr_1 <= array_len THEN (
10 * _expr_3 [_expr_1] + _expr_2 * (array_len - _expr_1 + 1)
) / (2 * (array_len - _expr_1) + 1)
WHEN _expr_1 = array_len + 1 THEN _expr_2 / 10
WHEN _expr_1 = array_len + 2 THEN _expr_2
WHEN _expr_1 = array_len + 3 THEN _expr_2
ELSE _expr_2
END AS _expr_10,
(_expr_1 + 1) % loop_len AS _expr_11,
CASE
WHEN _expr_1 = 0 THEN _expr_0 + 1
ELSE _expr_0
END AS _expr_12,
_expr_1,
CASE
WHEN _expr_2 = 9 THEN ''
WHEN _expr_2 = 10 THEN (_expr_5 + 1) :: string || repeat('0', _expr_4)
WHEN _expr_0 = 2 THEN _expr_5 :: string || '.'
WHEN _expr_0 > 2 THEN _expr_5 :: string
ELSE ''
END AS _expr_13,
CASE
WHEN _expr_2 <> 9
AND _expr_2 <> 10
AND _expr_4 <> 0 THEN repeat('9', _expr_4)
ELSE ''
END AS _expr_14,
_expr_2,
_expr_4
FROM
loop AS table_2
WHERE
_expr_0 < num_digits + 1
) AS table_3
)
SELECT
*
FROM
loop
)
SELECT
string_agg(_expr_6, '') AS pi
FROM
table_6 AS table_5

-- Generated by PRQL compiler version:0.6.1 (https://prql-lang.org)
``````