# Math and SQL Part 5: Projection and Selection

# Math and SQL Part 5: Projection and Selection

Join the DZone community and get the full member experience.

Join For FreeBuild vs Buy a Data Quality Solution: Which is Best for You? Gain insights on a hybrid approach. Download white paper now!

In "Relational Theory and SQL," Chris Date says that tuples are unordered. Tuples are strongly ordered, but what Date is getting at is that a relation has no natural ordering of columns, and therefore from any relation with a given ordering, a new relation can be created with a different ordering. This process is called 'projection' and it is entirely implicit in SQL. In the interest of being concise, in that book, Date does not stick with his typical approach of starting from clear definitions and discussing what these mean.

This being said, understanding projection, selection, and the difference between them makes understanding relational databases far easier in my view.

Because SQL operates on bags, I will use both SQL (for bag examples) and Set::Relation (for set examples) in order to discuss the math differences. I will finally offer an idea of what a clearer English rendition of SQL's math semantics would be.

#### I: Projection - Transform One Relation into Vertical Subset

Projection, represented by a Pi character (π) creates a set of ordered tuples with an ordering based on the operation. What projection does, essentially, is take a subset (or the whole set) from each tuple, possibly re-ordering it in the process, in order to transform one relation into a derived relation.

In relational algebra, projection takes a set of tuples and returns a set of tuples with a subset of fields (possibly re-arranged).

For example, consider a relation *R*, where the tuple structure of *R* is *(employee_id, first_name, last_name, badge_number, office_id, tax_id)*

( | employee_id | first_name | last_name | office_id | tax_id | ) |

( | 1 | Eric | Young | 1 | 123-45-6789 | ) |

( | 2 | Thomas | Young | 1 | 324-55-3334 | ) |

( | 3 | Julie | James | 1 | 443-45-6234 | ) |

( | 4 | Jared | Young | 2 | 533-44-5442 | ) |

( | 5 | Todd | Kals | 2 | 532-44-2332 | ) |

Then we could write π_{(last_name, office_id)}(*R*), we would get:

( | last_name | office_id | ) |

( | Young | 1 | ) |

( | James | 1 | ) |

( | Young | 2 | ) |

( | Kals | 2 | ) |

Note that there is one fewer row in the output table because in both cases, you have a well-formed set.

The equivalent of the previous expression in SQL would be:

SELECT DISTINCT last_name, office_id FROM R;

From these examples a few things should be fairly clear. First tuples are strongly ordered and we give human-readable labels as a nicety, but that projection can be used to reorder the tuple elements arbitrarily. Consequently tuples are consistently ordered within a relation, but that order is largely arbitrary. Consequently tuples cannot be said to have a natural order. Instead they derive their structure and order from the relation they are a part of.

Unfortunately SQL makes all kinds of implicit tuple transformations either by implicit projection or by some sort of reverse operation (I will call that "overlay" for the moment). Because projection is entirely implicit in SQL, it is never stated. The SQL statement above in fact doesn't perform a selection operation but merely performs a projection.

#### II: Selection - Turn a Relation into a Horizontal Subset

Selection takes a relation and provides a subset of it based on arbitrary criteria. The structure of the relation is unchanged, but a subset of the correspondences are in the new relation. In other words, it selects tuples for further processing, and since the relation is a set, it is guaranteed to give a subset out. It is represented by the sigma character (σ) and the subscript gives the equality condition.

Given the same relation R, we can perform σ_{office_id=1}(R) and get:

( | employee_id | first_name | last_name | office_id | tax_id | ) |

( | 1 | Eric | Young | 1 | 123-45-6789 | ) |

( | 2 | Thomas | Young | 1 | 324-55-3334 | ) |

( | 3 | Julie | James | 1 | 443-45-6234 | ) |

Simple right?

In SQL terms, a plain selection is what is found in the where clause. So the equivalent in SQL is simply:

SELECT * FROM R WHERE office_id = 1;

As you can see from these examples, selection and projection are transformed in SQL in some relatively non-intuitive ways. At the same time, understanding relational databases demands a basic understanding of these two concepts.

#### Some Notes on Set::Relation

In the course of putting this together I spent a little time with Set::Relation. I found that the module seems to behave relatively well. The projection method combined with members() outputs a set. This was as far as I took it in my quick spin. As per the discussion below, Set::Relation operates under the assumption that relations are well-formed sets instead of bags. This can be very helpful in operating in a highly formal relational environment.

Selection could be done using the semijoin method, or using members() and grep.

#### Next: On the Great Null Debate

Build vs Buy a Data Quality Solution: Which is Best for You? Maintaining high quality data is essential for operational efficiency, meaningful analytics and good long-term customer relationships. But, when dealing with multiple sources of data, data quality becomes complex, so you need to know when you should build a custom data quality tools effort over canned solutions. Download our whitepaper for more insights into a hybrid approach.

Published at DZone with permission of Chris Travers , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}