Transforms are functions that take a relation and produce a relation.
Usually they are chained together into a pipeline, which resembles an SQL query.
Transforms were designed with a focus on modularity: each one fulfills a specific purpose and has defined invariants (properties of the relation that it leaves unaffected). This is often referred to as “orthogonality”, and its goal is to keep transforms composable by minimizing interference between their effects. It also keeps the number of transforms low.
For example, `derive` will not change the number of rows, while
`take` will not change the number of columns.
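The idea of transforms as relation-to-relation functions with fixed invariants can be sketched in Python. This is only an illustrative model, not how the language is implemented: a relation is assumed to be a list of dicts, and the `derive`, `take` and `pipeline` helpers below are hypothetical stand-ins that mimic the invariants described above.

```python
from typing import Callable

# Assumption: a relation is modelled as a list of rows, each row a dict
# mapping column name -> value.
Relation = list[dict]
Transform = Callable[[Relation], Relation]


def derive(**new_columns: Callable[[dict], object]) -> Transform:
    """Add computed columns to every row; the row count is left unchanged."""
    def apply(rel: Relation) -> Relation:
        return [
            {**row, **{name: fn(row) for name, fn in new_columns.items()}}
            for row in rel
        ]
    return apply


def take(n: int) -> Transform:
    """Keep the first n rows; the set of columns is left unchanged."""
    def apply(rel: Relation) -> Relation:
        return rel[:n]
    return apply


def pipeline(rel: Relation, *transforms: Transform) -> Relation:
    """Feed the relation through each transform in order."""
    for transform in transforms:
        rel = transform(rel)
    return rel


employees = [
    {"name": "Ada", "salary": 100},
    {"name": "Bob", "salary": 80},
    {"name": "Cy", "salary": 60},
]

with_bonus = derive(bonus=lambda r: r["salary"] // 10)(employees)
first_two = take(2)(employees)
result = pipeline(employees, derive(bonus=lambda r: r["salary"] // 10), take(2))

# Invariants: derive keeps the number of rows, take keeps the columns.
assert len(with_bonus) == len(employees)
assert set(first_two[0]) == set(employees[0])
```

Because each helper touches only one aspect of the relation, the order of `derive` and `take` in the pipeline can be reasoned about independently.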
In SQL, we can see the lack of such an invariant when an aggregation function is used
in the SELECT clause. Without it, the number of rows is kept constant, but
introducing an aggregation function causes the whole statement to produce
only one row (per group).
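The SQL behaviour described here can be observed with a small experiment. The sketch below uses Python's standard `sqlite3` module; the `employees` table and its contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "eng", 100), ("Bob", "eng", 80), ("Cy", "ops", 60)],
)

# A plain expression in SELECT keeps one output row per input row.
per_row = conn.execute("SELECT salary * 2 FROM employees").fetchall()

# Swapping in an aggregation function collapses the result to a single row.
collapsed = conn.execute("SELECT SUM(salary) FROM employees").fetchall()

# With GROUP BY, the same kind of SELECT yields one row per group instead.
per_group = conn.execute(
    "SELECT dept, SUM(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()

print(len(per_row), len(collapsed), len(per_group))
```

The same clause, SELECT, changes the shape of the result depending on what appears inside it, which is the kind of interference that separate transforms are meant to avoid.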
These are the currently available transforms:
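| Transform | Purpose | SQL equivalent |
| --- | --- | --- |
| `from` | Start from a table | `FROM` |
| `derive` | Compute new columns | `SELECT *, ... AS ...` |
| `select` | Pick & compute columns | `SELECT ... AS ...` |
| `filter` | Pick rows based on their values | `WHERE`, `HAVING`, `QUALIFY` |
| `sort` | Order rows based on the values of columns | `ORDER BY` |
| `join` | Add columns from another table, matching rows based on a condition | `JOIN` |
| `take` | Pick rows based on their position | `TOP`, `LIMIT`, `OFFSET` |
| `group` | Partition rows into groups and apply a pipeline to each of them | `GROUP BY`, `PARTITION BY` |
| `aggregate` | Summarize many rows into one row | `SELECT foo(...)` |
| `window` | Apply a pipeline to overlapping segments of rows | `OVER`, `ROWS`, `RANGE` |
| `loop` | Iteratively apply a function to a relation until it’s empty | `WITH RECURSIVE ...` |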
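
The last row, iterating until the relation is empty, may be the least familiar, so here is a rough Python sketch of the idea. It assumes that the rows produced by every iteration are collected into the result, in the way `WITH RECURSIVE ... UNION ALL` behaves in SQL; the `loop` helper, its `max_iterations` guard and the `next_numbers` step are hypothetical names used only for this illustration.

```python
from typing import Callable

# Assumption: a relation is modelled as a list of dict rows.
Relation = list[dict]


def loop(
    step: Callable[[Relation], Relation],
    initial: Relation,
    max_iterations: int = 1000,  # hypothetical safety guard, not part of the source
) -> Relation:
    """Apply step repeatedly until it returns an empty relation.

    All rows produced along the way, including the initial ones, are collected.
    """
    result = list(initial)
    current = initial
    for _ in range(max_iterations):
        current = step(current)
        if not current:
            return result
        result.extend(current)
    raise RuntimeError("loop did not reach an empty relation")


def next_numbers(rel: Relation) -> Relation:
    """Produce n + 1 for every row whose n is still below 5."""
    return [{"n": row["n"] + 1} for row in rel if row["n"] < 5]


numbers = loop(next_numbers, [{"n": 1}])
```

Here the step function eventually returns no rows, which ends the iteration; a step that never produces an empty relation would run until the guard trips.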