# Understanding Function Properties in Object-Relational Algebra

# Understanding Function Properties in Object-Relational Algebra

Join the DZone community and get the full member experience.

Join For FreeHortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

### Definition and Types of Functions

The general approach of this object-relational algebra involves applying algebra to manipulate operations on functions of sets of tuples. A set of tuples is a relvar, or relational variable. The set of tuples in the relation is the domain of the function, and as with all functions, it represents exactly one result per domain element, so for every tuple, a single value will result. That value of course can be a set (including a null set), a tuple, or even a relvar. However, it is also necessary to note that a function must return the same basic type of result for every tuple. We can't return a simple value for one, a relation for another, and a set of values for a third.

Each element in a function can be seen as being

*relational*in the sense of within the expressive boundaries of relational algebra, or

*non-relational*in the sense of being outside those boundaries. Functions themselves can then be divided into three categories:

- Relational functions contain only relational elements and can be expressed solely in relational algebra.
- Non-relational functions contain only non-relational elements and cannot be further simplified in relational algebra in any circumstance, and
- Semi-relational functions contain both relational and non-relational elements, and these can be partly further simplified in relational operations.

A second level of categorization can be had regarding whether the domain of the relation is fully mathematically self-contained or not.

For example, suppose we have a relvar of employee data, E, and E contains an attribute dob, representing the employee's date of birth. Therefore, for every tuple in E, there is a dob element. We can then have two functions:

age(E) represents the current age of the employee at the time it is evaluated.

age2000(E) represents the age of the employee on New Year's Eve, 2000.

age2000(E) has a fully self-contained domain. The values depend on the values stored in the relvar E, and nowhere else. age(E) however does not have a fully self-contained domain. For any given relational operation, age(E) will behave like a function and we can use it as such, but the results will change over time. Oracle calls these determinate and indeterminate functions respectively. PostgreSQL divides these up into additional categories for planning purposes --- in addition to immutable functions whose output never changes for a given input, you have stable and volatile functions, the latter are not really a function per se of the input.

### Properties of Relational Functions

I didn't really start getting really excited about this until I started working with relational functions. Once I started though, there was no going back. Relational functions can be expressed in relational algebra and therefore can roughly map to subqueries in SQL. Of course this is not exact, but there may be cases where if is helpful to look at from this perspective. This is particularly important when looking at optimizing simple functions written in SQL when called in where clauses.

Relational functions can be expanded algebraically in other relational operations. For example:

Suppose we have a relvar L which represents landmarks, and has an attribute country_id which is a foreign key to country, and suppose further we have a relvar C which represents countries and has an attribute called name which represents the country's name. We can then define the following functions:

let country(L) = σ

_{id=L.country_id}(C)

let frenchl(L) = σ

_{(country(L)).name='France'}(L)

country(L) can be in-lined into frenchl(L), and this can be transformed into a subselect, and eventually (in a set- rather than bag- based approach at least) a left join.

The set of single-value returning relational functions is a proper superset to the number of functional dependencies in the database reachable through appropriate join dependencies. Without grouping and aggregation, these sets are identical. However, with grouping and aggregation, relational functions express a larger array of possible values.

### Properties of Non-Relational Functions

Non-relational functions cannot be expressed in relational algebra and therefore must be solved outside the stream of relational operations themselves. Non-relational functions must be treated as to their value types. A set-returning function can be treated as a set, a single-value returning function can be treated like a relvar's attibute, and a relvar-returning function can be treated like a relvar itself.

This field moves relational algebra from the area of solving relational queries into the area of symbolic simplification.

Next: The special "Join Series" function.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub. Join the discussion.

Published at DZone with permission of Chris Travers , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}