This library implements Relational Algebra syntax in Python.
- expression tree visualization
- type validation
- query execution
Consider the simple database signature
PERSON / (person_id, name, age)
STUDENT / (person_id, guardian_id)
where each student has at least one guardian.
Suppose we wanted a query to find the IDs of all students who have one or more guardians not yet 18 years old. We could express this in relational algebra as
or
and so on.
On the other hand, expression
Meanwhile, expression
Even though this example is meant to be simple, it is already difficult to tell the difference between a correct and incorrect relational algebra expression!
This library is born out of a desire to automatically verify arbitrary relational algebra queries.
First, we convert the above schema into Python code.
Each attribute is represented by a list of string type names. Type names are semantic. For example, even though student IDs and guardian IDs are just person IDs, we specifically name those as types.
student_id = Attribute("sid")
guardian_id = Attribute("gid")
person_id = Attribute("pid") | student_id | guardian_id
name = Attribute("name")
age = Attribute("age")
A relation combines a sequence of attributes with some example elements of that relation. We specifically set up the elements to test our query:
PERSON = Relation(
attributes=(person_id, name, age),
elements=[
(1, "S_1", 11), # has G<18
(2, "S_2", 12), # has both guardians
(3, "S_3", 13), # has G=18
(4, "G<18", 17),
(5, "G=18", 18),
],
)
# ( pid|sid|gid , name , age )
# 1 , S_1 , 11
# 2 , S_2 , 12
# 3 , S_3 , 13
# 4 , G<18 , 17
# 5 , G=18 , 18
STUDENT = Relation(
attributes=(student_id, guardian_id),
elements=[
(1, 4),
(2, 4),
(2, 5),
(3, 5),
],
)
# ( sid , gid )
# 1 , 4
# 2 , 4
# 2 , 5
# 3 , 5
In our query, we deal with the age constant 18
.
To prevent this constant from being compared with non-age numbers, we build a single-attribute (age) relation with 18
as the only value.
AGE_OF_MAJORITY = ConstantRelation(age, 18)
We can also define the expected result of our query:
expected = Relation(
attributes=(student_id,),
elements=[
(1,),
(2,),
],
)
# ( sid )
# 1
# 2
The following code demonstrates various relational algebra operations using examples.
All people under the age of majority:
where PERSON
.
Attributes are one-indexed. This allows us to do something nifty later on...
select[3 |lt| AGE_OF_MAJORITY](PERSON)
# ( pid|sid|gid , name , age )
# 1 , S_1 , 11
# 2 , S_2 , 12
# 3 , S_3 , 13
# 4 , G<18 , 17
3 |lt| AGE_OF_MAJORITY
is an example of using the bitwise OR operation in Python to simulate custom infix operators such aslt
here. For more details on this method, see Tomer Filiba's blog.
All names:
project[2](PERSON)
# ( name )
# S_1
# S_2
# S_3
# G<18
# G=18
All distinct student IDs:
eliminate(project[1](STUDENT))
# ( sid )
# 1
# 2
# 3
All possible pairs of elements from the relations STUDENT
and PERSON
:
STUDENT |product| PERSON
# ( sid , gid , pid|sid|gid , name , age )
# 1 , 4 , 1 , S_1 , 11
# 1 , 4 , 2 , S_2 , 12
# 1 , 4 , 3 , S_3 , 13
# 1 , 4 , 4 , G<18 , 17
# 1 , 4 , 5 , G=18 , 18
# 2 , 4 , 1 , S_1 , 11
# 2 , 4 , 2 , S_2 , 12
# 2 , 4 , 3 , S_3 , 13
# 2 , 4 , 4 , G<18 , 17
# 2 , 4 , 5 , G=18 , 18
# 2 , 5 , 1 , S_1 , 11
# 2 , 5 , 2 , S_2 , 12
# 2 , 5 , 3 , S_3 , 13
# 2 , 5 , 4 , G<18 , 17
# 2 , 5 , 5 , G=18 , 18
# 3 , 5 , 1 , S_1 , 11
# 3 , 5 , 2 , S_2 , 12
# 3 , 5 , 3 , S_3 , 13
# 3 , 5 , 4 , G<18 , 17
# 3 , 5 , 5 , G=18 , 18
A join is equivalent to a product (STUDENT
and PERSON
on matching guardian ID:
or we could implement this as a join:
Where
-
$\#1$ : the first attribute ofPERSON
-
$\#2\ell$ : the second attribute ofSTUDENT
To express this in code, we use negative indices (e.g.,
STUDENT |join[1 |eq| -2]| PERSON
# ( sid , gid , gid , name , age )
# 1 , 4 , 4 , G<18 , 17
# 2 , 4 , 4 , G<18 , 17
# 2 , 5 , 5 , G=18 , 18
# 3 , 5 , 5 , G=18 , 18
Notice that the type of the first column of
PERSON
changed following the join. This library is smart enough to figure out that an equality-based selection condition (like$\#1 = \#2\ell$ ) will result in a type intersection between the participating columns.
All ages other than 18:
project[3](PERSON) |difference| AGE_OF_MAJORITY
# ( age )
# 11
# 12
# 13
# 17
All students or guardians:
project[1](STUDENT) |union| project[2](STUDENT)
# ( sid|gid )
# 1
# 2
# 2
# 3
# 4
# 4
# 5
# 5
The custom infix operators used in this library are implemented by overloading the bitwise OR operation |
, which is evaluated left to right.
Because of this, we have limited control over the order of operations.
For example, there is no way to give higher precedence to product (
To come full circle, we now implement all 4 example RA expressions.
The resolve
function lets us trace the evaluation tree of the expression on our sample database instance.
eliminate(project[1](STUDENT |join[1 |eq| -2]| select[3 |lt| AGE_OF_MAJORITY](PERSON)))
# ┌─
# │ ┌─
# │ │ ┌─
# │ │ ┤ ( sid , gid ) = ( sid , gid )
# │ │ ┤ 1 , 4
# │ │ ┤ 2 , 4
# │ │ ┤ 2 , 5
# │ │ ┤ 3 , 5
# │ │ ╞═
# │ │ ├ ┌─
# │ │ ├ │ ( pid|sid|gid , name , age ) = ( pid|sid|gid , name , age )
# │ │ ├ │ 1 , S_1 , 11
# │ │ ├ │ 2 , S_2 , 12
# │ │ ├ │ 3 , S_3 , 13
# │ │ ├ │ 4 , G<18 , 17
# │ │ ├ │ 5 , G=18 , 18
# │ │ ├ ├─
# │ │ ├ σ[#3<18] = ( pid|sid|gid , name , age )
# │ │ ├ 1 , S_1 , 11
# │ │ ├ 2 , S_2 , 12
# │ │ ├ 3 , S_3 , 13
# │ │ ├ 4 , G<18 , 17
# │ │ ├─
# │ │ × σ[#1=#2l] = ( sid , gid , gid , name , age )
# │ │ 1 , 4 , 4 , G<18 , 17
# │ │ 2 , 4 , 4 , G<18 , 17
# │ ├─
# │ π[#1] = ( sid )
# │ 1
# │ 2
# ├─
# elim = ( sid )
# 1
# 2
eliminate(project[5](select[1 |gt| 4, 2 |eq| 6](AGE_OF_MAJORITY |product| PERSON |product| STUDENT)))
# ┌─
# │ ┌─
# │ │ ┌─
# │ │ │ ┌─
# │ │ │ ┤ ┌─
# │ │ │ ┤ ┤ 18 = 18
# │ │ │ ┤ ╞═
# │ │ │ ┤ ├ ( sid|pid|gid , name , age ) = ( sid|pid|gid , name , age )
# │ │ │ ┤ ├ 1 , S_1 , 11
# │ │ │ ┤ ├ 2 , S_2 , 12
# │ │ │ ┤ ├ 3 , S_3 , 13
# │ │ │ ┤ ├ 4 , G<18 , 17
# │ │ │ ┤ ├ 5 , G=18 , 18
# │ │ │ ┤ ├─
# │ │ │ ┤ × = ( age , sid|pid|gid , name , age )
# │ │ │ ┤ 18 , 1 , S_1 , 11
# │ │ │ ┤ 18 , 2 , S_2 , 12
# │ │ │ ┤ 18 , 3 , S_3 , 13
# │ │ │ ┤ 18 , 4 , G<18 , 17
# │ │ │ ┤ 18 , 5 , G=18 , 18
# │ │ │ ╞═
# │ │ │ ├ ( sid , gid ) = ( sid , gid )
# │ │ │ ├ 1 , 4
# │ │ │ ├ 2 , 4
# │ │ │ ├ 2 , 5
# │ │ │ ├ 3 , 5
# │ │ │ ├─
# │ │ │ × = ( age , sid|pid|gid , name , age , sid , gid )
# │ │ │ 18 , 1 , S_1 , 11 , 1 , 4
# │ │ │ 18 , 1 , S_1 , 11 , 2 , 4
# │ │ │ 18 , 1 , S_1 , 11 , 2 , 5
# │ │ │ 18 , 1 , S_1 , 11 , 3 , 5
# │ │ │ 18 , 2 , S_2 , 12 , 1 , 4
# │ │ │ 18 , 2 , S_2 , 12 , 2 , 4
# │ │ │ 18 , 2 , S_2 , 12 , 2 , 5
# │ │ │ 18 , 2 , S_2 , 12 , 3 , 5
# │ │ │ 18 , 3 , S_3 , 13 , 1 , 4
# │ │ │ 18 , 3 , S_3 , 13 , 2 , 4
# │ │ │ 18 , 3 , S_3 , 13 , 2 , 5
# │ │ │ 18 , 3 , S_3 , 13 , 3 , 5
# │ │ │ 18 , 4 , G<18 , 17 , 1 , 4
# │ │ │ 18 , 4 , G<18 , 17 , 2 , 4
# │ │ │ 18 , 4 , G<18 , 17 , 2 , 5
# │ │ │ 18 , 4 , G<18 , 17 , 3 , 5
# │ │ │ 18 , 5 , G=18 , 18 , 1 , 4
# │ │ │ 18 , 5 , G=18 , 18 , 2 , 4
# │ │ │ 18 , 5 , G=18 , 18 , 2 , 5
# │ │ │ 18 , 5 , G=18 , 18 , 3 , 5
# │ │ ├─
# │ │ σ[#1>#4,#2=#6] = ( age , gid , name , age , sid , gid )
# │ │ 18 , 4 , G<18 , 17 , 1 , 4
# │ │ 18 , 4 , G<18 , 17 , 2 , 4
# │ ├─
# │ π[#5] = ( sid )
# │ 1
# │ 2
# ├─
# elim = ( sid )
# 1
# 2
eliminate(project[1](STUDENT |join[1 |eq| -1]| select[3 |lt| AGE_OF_MAJORITY](PERSON)))
# ┌─
# │ ┌─
# │ │ ┌─
# │ │ ┤ ( sid , gid ) = ( sid , gid )
# │ │ ┤ 1 , 4
# │ │ ┤ 2 , 4
# │ │ ┤ 2 , 5
# │ │ ┤ 3 , 5
# │ │ ╞═
# │ │ ├ ┌─
# │ │ ├ │ ( sid|pid|gid , name , age ) = ( sid|pid|gid , name , age )
# │ │ ├ │ 1 , S_1 , 11
# │ │ ├ │ 2 , S_2 , 12
# │ │ ├ │ 3 , S_3 , 13
# │ │ ├ │ 4 , G<18 , 17
# │ │ ├ │ 5 , G=18 , 18
# │ │ ├ ├─
# │ │ ├ σ[#3<18] = ( sid|pid|gid , name , age )
# │ │ ├ 1 , S_1 , 11
# │ │ ├ 2 , S_2 , 12
# │ │ ├ 3 , S_3 , 13
# │ │ ├ 4 , G<18 , 17
# │ │ ├─
# │ │ × σ[#1=#1ℓ] = ( sid , gid , sid , name , age )
# │ │ 1 , 4 , 1 , S_1 , 11
# │ │ 2 , 4 , 2 , S_2 , 12
# │ │ 2 , 5 , 2 , S_2 , 12
# │ │ 3 , 5 , 3 , S_3 , 13
# │ ├─
# │ π[#1] = ( sid )
# │ 1
# │ 2
# │ 2
# │ 3
# ├─
# elim = ( sid )
# 1
# 2
# 3
eliminate(project[1](STUDENT))
|difference| eliminate(project[1](STUDENT |join[1 |eq| -2]| select[3 |ge| AGE_OF_MAJORITY](PERSON)))
# ┌─
# ┤ ┌─
# ┤ │ ┌─
# ┤ │ │ ( sid , gid ) = ( sid , gid )
# ┤ │ │ 1 , 4
# ┤ │ │ 2 , 4
# ┤ │ │ 2 , 5
# ┤ │ │ 3 , 5
# ┤ │ ├─
# ┤ │ π[#1] = ( sid )
# ┤ │ 1
# ┤ │ 2
# ┤ │ 2
# ┤ │ 3
# ┤ ├─
# ┤ elim = ( sid )
# ┤ 1
# ┤ 2
# ┤ 3
# ╞═
# ├ ┌─
# ├ │ ┌─
# ├ │ │ ┌─
# ├ │ │ ┤ ( sid , gid ) = ( sid , gid )
# ├ │ │ ┤ 1 , 4
# ├ │ │ ┤ 2 , 4
# ├ │ │ ┤ 2 , 5
# ├ │ │ ┤ 3 , 5
# ├ │ │ ╞═
# ├ │ │ ├ ┌─
# ├ │ │ ├ │ ( sid|pid|gid , name , age ) = ( sid|pid|gid , name , age )
# ├ │ │ ├ │ 1 , S_1 , 11
# ├ │ │ ├ │ 2 , S_2 , 12
# ├ │ │ ├ │ 3 , S_3 , 13
# ├ │ │ ├ │ 4 , G<18 , 17
# ├ │ │ ├ │ 5 , G=18 , 18
# ├ │ │ ├ ├─
# ├ │ │ ├ σ[#3≥18] = ( sid|pid|gid , name , age )
# ├ │ │ ├ 5 , G=18 , 18
# ├ │ │ ├─
# ├ │ │ × σ[#1=#2ℓ] = ( sid , gid , gid , name , age )
# ├ │ │ 2 , 5 , 5 , G=18 , 18
# ├ │ │ 3 , 5 , 5 , G=18 , 18
# ├ │ ├─
# ├ │ π[#1] = ( sid )
# ├ │ 2
# ├ │ 3
# ├ ├─
# ├ elim = ( sid )
# ├ 2
# ├ 3
# ├─
# − = ( sid )
# 1