Modules
The module
facility is in discussion.
This page documents our understanding of the way
the final PRQL compiler will likely work.
The PRQL compiler currently uses these techniques to compile
the std
, date
, text
, and math
modules into the language.
However, at this time (Spring 2024), the module
facility does not work
within a PRQL query itself.
That is, a module
statement in a query cannot import files
from the local file system.
Design goals for modules:
-
Allow importing declarations from other files.
-
Have namespaces for things like
std
. -
Have a hierarchical structure so we can represent files in directories.
-
Have an unambiguous module structure within a project.
Definition
A module is a namespace that contains declarations. A module is itself a declaration, which means that it can contain nested child modules.
This means that modules form a tree graph, which we call “the module structure”.
For the sake of this document, we will express the module structure with
module
keyword and a code block encased in curly braces:
module my_playlists {
let bangers = ... # a declaration
module soundtracks {
let movie_albums = ... # another declaration
}
}
The syntax
module name { ...decls... }
is not part of PRQL language, with the objection that it is unnecessary as it only adds more ways of defining modules. If a significant upside of this syntax is found, it may be added in the future.
Name resolution
Any declarations within a module can be referenced from the outside of the module:
# using module structure declared above
module my_playlists
let great_tracks = my_playlists.bangers
let movie_scores = my_playlists.soundtracks.movie_albums
Identifiers are resolved relative to current module.
module my_playlists {
module soundtracks {
let movie_albums = (from albums | filter id == 3)
}
from soundtracks.movie_albums
}
from my_playlists.soundtracks.movie_albums
If an identifier cannot be resolved relative to the current module, it tries to resolve relative to the parent module. This is repeated, stepping up the module hierarchy until a match is found or root of the module structure is reached.
module my_playlists {
let decl_1 = ...
module soundtracks {
let decl_2 = ...
}
module upbeat_rock {
let decl_3 = ...
from decl_1 | join soundtracks.decl2 | join decl_3
}
}
Main var declaration
The final variable declaration in a module can omit the leading let main =
and
acquire an implicit name main.
module my_playlists {
let bangers = (from tracks | take 10)
from playlists | take 10
}
let album_titles = my_playlists.main
When a module is referenced as a value, the main
variable is used instead.
This is especially useful when referring to a module which is to be compiled to
RQ (and later SQL).
# last line from previous example could thus be shortened to:
let album_titles = my_playlists
File importing
The examples below do not work.
At this time (Spring 2024), the module
facility does not work
within a PRQL query itself.
That is, a module
statement in a query cannot import files
from the local file system.
To include PRQL source code from other files, we can use the following syntax:
module my_playlists
This loads either ./my_playlists.prql
(a leaf module) or
./my_playlists/_my_playlists.prql
(a directory module) and uses its contents
as module my_playlists
. If none or both of the files are present, a
compilation error is raised.
Only directory modules can contain module declarations. If a leaf module contains a module declaration, a compilation error is raised, suggesting the leaf module to be converted into a directory module. This is a step toward any module structure having a single “normalized” representation in the file system. Such normalization is desired because it restrains the possible file system layouts to a comprehensible and predictable layout, while not sacrificing any functionality.
Described importing rules don’t achieve this “single normalized representation” in full, since any leaf modules could be replaced by a directory module with zero submodules, without any semantic changes. Restricting directory modules to have at least one sub-module would not improve approachability enough to justify adding this restriction.
For example, the following module structure is annotated with files names in which the modules would reside:
module my_project {
# _my_project.prql
module sales {
# sales.prql
}
module projections {
# projections/_projections.prql
module year_2023 {
# projections/year_2023.prql
}
module year_2024 {
# projections/year_2024.prql
}
}
}
If module my_project.sales
wants to add a submodule util
, it has to be
converted to a directory modules. This means that it has to be moved to
sales/_sales.prql
. The new module would reside in sales/util.prql
.
The annotated layout is not the only possible layout for this module structure,
since any of the modules sales
, year_2023
or year_2024
could be converted
into a directory module with zero sub-modules.
Point 4 of design goals means that each declaration within a project has a single fully-qualified name within this project. This is ensured by strict rules regarding importing files and the fact that the module structure is a tree.
Declaration order
The order of declarations in a module holds no semantic value, except the “last
main
variable”.
References between modules can be cyclic.
module mod_a {
let decl_a_1 = ...
let decl_a_2 = (from mod_b.decl_b | take 10)
}
module mod_b {
let decl_b = (from mod_a.decl_a | take 10)
}
References between variable declarations cannot be cyclic.
let decl_a = (from decl_b)
let decl_b = (from decl_a) # error: cyclic reference
module mod_a {
let decl_a = (from mod_b.decl_b)
}
module mod_b {
let decl_b = (from mod_a.decl_a) # error: cyclic reference
}
Compiler interface
prqlc
provides two interfaces for compiling files.
Multi-file interface requires three arguments:
- path to the file containing the module which is the root of the module structure,
- identifier of the pipeline that should be compiled to RQ (this can also be an
identifier of a module that has a
main
pipeline) and, - a “file loader”, which can load files on-demand.
The path to the root module can be automatically detected by searching for
.prql
files starting with _
in the current working directory.
Example prqlc usage:
$ prqlc compile _project.prql sales.projections.orders_2024
$ prqlc compile sales.projections.orders_2024
Single-file interface requires a single argument; the PRQL source. Any attempts to load modules in this mode result in compilation errors. This interface is needed, for example, when integrating the compiler with a database connector (i.e. JDBC) where no other files can be loaded.
Built-in module structure
As noted above, this facility is in discussion.
# root module of every project
module project {
module std {
let sum = a -> ...
let mean = a -> ...
}
module default_db {
# all inferred tables and defined CTEs
}
let main = (
from t = tracks
select [track_id, title]
)
}
Example
The examples below do not work.
At this time (Spring 2024), the module
facility does not work
within a PRQL query itself.
That is, a module
statement in a query cannot import files
from the local file system.
This is an example project, where each of code block is a separate file.
# _project.prql
module employees
module sales
module util
# employees.prql
let employees = (...)
let salaries = (...)
let departments = (...)
# sales/_sales.prql
module orders
module projections
let revenue_by_source = (...)
# sales/orders.prql
let current_year = (...)
let archived = (...)
let by_employee = (from orders | join employees.employees ...)
# sales/projections.prql
let orders_2023 = (from orders.current_year | append orders.archived)
let orders_2024 = (...)
# util.prql
let pretty_print_num = col -> (...)
Sources:
- Notes On Module System, by @matklad.