Math for devs - Relational Algebra

February 25, 2023

Do you remember set theory? A math topic that you probably learned in elementary school, revisited in high school, and then never again? Well, it's time to revisit it. Relational Algebra is a formal language based on set theory, and it's purpose is to query and manipulate data. But attention! Relational Algebra is not a programming language, it's more like a math notation, a formal and theoretical language.

Why do I need to know this?

Relational Algebra is the basis for SQL, the main language used in relational databases. Relational Algebra can also have some applications in NoSQL databases. If you want to understand how databases work "under the hood", you need to know the main concepts of Relational Algebra.

What is Relational Algebra?

Relational algebra is a declarative language for querying and manipulating data. As SQL, you just specify which data you want, what conditions it should satisfy, and how it should be transformed (grouped, sorted, aggregated, etc). The database engine will take care of the rest, so you don't have to worry about how the data is stored, nor how to access it.

Declarative Query Language Analogy

In the image above, we have a woman who wants to order a pizza. She doesn't know how to make a pizza, she doesn't know how to cook it, she doesn't know how to deliver it. She just knows what she wants, and she knows how to ask for it. The waiter is the SQL/Relational Algebra engine, he is an interface between the woman and the chef. The chef is the database engine, he understands the order passed by the waiter, and he knows how to make the pizza.

Declarative query languages usually hide the implementation details from the user. This is a good thing, because it allows the database engine to optimize the query execution, and it also allows the user to focus on the data, not on the algorithm required to access it.

SQL analogy

How to order a pizza in SQL:

SELECT * FROM pizza
WHERE size = 'large'
AND flavor = 'pepperoni';

How to order a pizza in Relational Algebra:

PIZZASσsize="large"    flavor="pepperoni"(pizza)PIZZAS \leftarrow \sigma_{size = "large" \space \space \wedge \space \space flavor = "pepperoni" }(pizza)

Relational Algebra Operations

Ted Codd, a mathematician and computer scientist, proposed eight operations for Relational Algebra. These operations are:

  • Selection (σ\sigma)
  • Projection (π\pi)
  • Cartesian Product (×\times)
  • Union (\cup)
  • Difference (-)
  • Join (\bowtie)
  • Division (÷\div)
  • Intersection (\cap)

In this article, we will only cover the first six operations. The other four are more advanced, and they are not widely used in practice.

Before we dive into the details of each operation, let's define our dataset. We will use a simple dataset with two tables:

Table 1: Players

idnameemailphonecountryId
1Lionel Messileomessi@afa.com+54 9111234-56781
2Cristiano Ronaldocr7@siii.pt+351 912345-6782
3Pelepele@goat.br+55 91234-56783
4Diego Maradonamaradona@afa.com+54 911234-56781

Table 2: Countries

idnamecode
1ArgentinaAR
2PortugalPT
3BrazilBR

Selection (σ\sigma)

The selection operation is used to filter the rows of a table. It is represented by the symbol σ\sigma.

σcondition(table)\sigma_{condition}(table)

The condition is a boolean expression that must be satisfied by the rows of the table. The result of the selection is a new (SET) table with only the rows that satisfy the condition.

Example

Let's say we want to select all the players from Argentina. We can do that with the following query:

SELECT * FROM players
WHERE countryId = 1;

The equivalent query in Relational Algebra is:

σcountryId=1(players)\sigma_{countryId = 1}(players)

The result of the selection is a new table with only the players from Argentina:

idnameemailphonecountryId
1Lionel Messileomessi@afa.com+54 9111234-56781
4Diego Maradonamaradona@afa.com+54 911234-56781

Projection (π\pi)

Instead of selecting rows, like the selection operation, the project operation selects columns. It is represented by the symbol π\pi. This operation allows us to filter the columns of a table.

πcolumns(table)\pi_{columns}(table)

The columns argument is a list of column names. The result of the projection is a new table with only the columns specified in the columns argument.

Example

Projection is very useful when we want to select only a few columns from a table. Let's say we want to select the name and email of all the players.

SELECT name, email FROM players;

The equivalent query in Relational Algebra is:

πname,email(players)\pi_{name, email}(players)

The result of the projection is a new table with only the name and email columns:

nameemail
Lionel Messileomessi@afa.com
Cristiano Ronaldocr7@siii.pt
Pelepele@goat.br
Diego Maradonamaradona@afa.com

We can also combine the selection and projection operations. Let's say we want to select the name and email of all the players from Argentina.

SELECT name, email FROM players
WHERE countryId = 1;

The equivalent query in Relational Algebra is:

πname,email (σcountryId=1(players))\pi_{name, email} \space (\sigma_{countryId = 1}(players))

In the expression above, we first apply the projection operation, to specify which columns we want to be returned by the query. Then, we apply the selection operation, to filter the rows of the table, containing only argentinian players.

Union (\cup)

The union operation is the equivalent of the SQL UNION operator. It is used to combine the rows of two tables. The result of the union operation is a new table with the rows of both tables.

table1table2table_1 \cup table_2

To perform the union operation, the tables must have the same number of columns, and the columns must have the same data types.

Example

Let's say we want to combine the players from Argentina and Brazil. We can do that with the following query:

SELECT * FROM players
WHERE countryId = 1
UNION
SELECT * FROM players
WHERE countryId = 3;

The equivalent query in Relational Algebra is:

ArgentinianPlayersσcountryId=1(players)BrazilianPlayersσcountryId=3(players)ArgentinianPlayers  BrazilianPlayersArgentinianPlayers \leftarrow \sigma_{countryId = 1}(players) \\ BrazilianPlayers \leftarrow \sigma_{countryId = 3}(players) \\ ArgentinianPlayers \space \cup \space BrazilianPlayers

In the expression above, we first query the players from Argentina, then we store the result in the ARGENTINIAN_PLAYERS variable. The same process is repeated for the players from Brazil, and the result is stored in the BRAZILIAN_PLAYERS variable. Finally, we apply the union operation to combine the rows of both tables. The result of the union operation is a new table with the players from Argentina and Brazil:

idnameemailphonecountryId
1Lionel Messileomessi@afa.com+54 9111234-56781
4Diego Maradonamaradona@afa.com+54 911234-56781
3Pelepele@goat.br+55 91234-56783

Difference (-)

With the difference operator, we have as result the rows of the first table that are not present in the second table. It is represented by the symbol -.

Everything that exists in the subset AA and does not exist in the superset BB is the difference between AA and BB.

You probably already saw an image like this one:

Difference

Example

Let's say we want to get all the players that are not brazilian. We can do that with the following query:

SELECT * FROM players
EXCEPT
SELECT * FROM players
WHERE countryId = 3;

The equivalent query in Relational Algebra is:

BrazilianPlayersσcountryId=3(players)players  BrazilianPlayersBrazilianPlayers \leftarrow \sigma_{countryId = 3}(players) \\ players \space - \space BrazilianPlayers

Cartesian Product (×\times)

The cartesian product operation is the equivalent of the SQL CROSS JOIN operator. It is used to combine the rows and columns of two tables. For example, if you have a table "A" with 3 columns and a table "B" with 2 columns, the result of the cartesian product operation is a new table with 5 columns, containing every single combination of rows from table "A" and table "B".

table1×table2table_1 \times table_2

Example

Let's say we want to get all the possible combinations of players and countries. We can do that with the following query:

SELECT * FROM players
CROSS JOIN countries;

The equivalent query in Relational Algebra is:

players × countriesplayers \space \times \space countries

The result of the cartesian product operation is a new table with 5 columns, containing every single combination of rows from table "players" and table "countries":

idnameemailphonecountryIdidnamecode
1Lionel Messileomessi@afa.com+54 9111234-567811ArgentinaAR
2Cristiano Ronaldocr7@siii.pt+351 912345-67822PortugalPT
3Pelepele@goat.br+55 91234-567833BrazilBR
4Diego Maradonamaradona@afa.com+54 911234-567811ArgentinaAR

Join (\bowtie)

The join operation is the equivalent of the SQL JOIN operator. It is used to combine the rows of two tables, based on a common column. The result of the join operation is a new table with the columns of both tables.

table1table2table_1 \bowtie table_2

To perform the join operation, the tables must have at least one column in common. If there are no common columns, the result of the join operation is an empty table.

Example

Let's say we want to get the name and email of all the players, along with the name of the country they are from. We can do that with the following query:

SELECT players.name, players.email, countries.name, clubs.name
FROM players
JOIN countries ON players.countryId = countries.id;

The equivalent query in Relational Algebra is:

πname,email,countryName (ρcountryId=id(playerscountries))\pi_{name, email, countryName} \space (\rho_{countryId = id}(players \bowtie countries))

In the expression above, we first apply the projection operation, to specify which columns we want to be returned by the query. Then, we apply the join operation, to combine the rows of the players table and the countries table, based on the countryId column. Finally, we apply the rename operation, to rename the id column to countryId.

The result of the join operation is a new table with the name, email, and country name of all the players:

nameemailcountryName
Lionel Messileomessi@afa.comArgentina
Cristiano Ronaldocr7@siii.ptPortugal
Pelepele@goat.brBrazil
Diego Maradonamaradona@afa.comArgentina

Conclusion

Now you already know the basics of Relational Algebra. If you want, you can do some extra research on the subject, and learn more about the other operations that are available in Relational Algebra.