Name resolution
Because PRQL primarily handles relational data, it has specialized scoping rules for referencing columns.
Scopes
In PRQL’s compiler, a scope is the collection of all names one can reference from a specific point in the program.
In PRQL, names in the scope are composed from namespace and variable name which are separated by a dot, similar to SQL. Namespaces can contain many dots, but variable names cannot.
Name my_table.some_column
is a variable some_column
from namespace my_table
.
Name foo.bar.baz
is a variable baz
from namespace foo.bar
.
When processing a query, a scope is maintained and updated for each point in the query.
It start with only namespace std
, which is the standard library. It contains
common functions like sum
or count
, along with all transform functions such
as derive
and group
.
In pipelines (or rather in transform functions), scope is also injected with
namespaces of tables which may have been referenced with from
or join
transforms. These namespaces contain simply all the columns of the table and
possibly a wildcard variable, which matches any variable (see the algorithm
below). Within transforms, there is also a special namespace that does not have
a name. It is called a “frame” and it contains columns of the current table
the transform is operating on.
Resolving
For each ident we want to resolve, we search the scope’s items in order. One of three things can happen:
-
Scope contains an exact match, e.g. a name that matches in namespace and the variable name.
-
Scope does not contain an exact match, but the ident did not specify a namespace, so we can match a namespace that contains a
*
wildcard. If there’s a single namespace, the matched namespace is also updated to contain this new variable name. -
Otherwise, the nothing is matched and an error is raised.
Translating to SQL
When translating into an SQL statement which references only one table, there is no need to reference column names with table prefix.
But when there are multiple tables and we don’t have complete knowledge of all
table columns, a column without a prefix (i.e. first_name
) may actually reside
in multiple tables. Because of this, we have to use table prefixes for all
column names.
PRQL
from employees
derive {first_name, dept_id}
join d=departments (==dept_id)
select {first_name, d.title}
SQL
SELECT
employees.first_name,
d.title
FROM
employees
JOIN departments AS d ON employees.dept_id = d.dept_id
As you can see, employees.first_name
now needs table prefix, to prevent
conflicts with potential column with the same name in departments
table.
Similarly, d.title
needs the table prefix.