An Introduction to PostgreSQL Table Functions in YugabyteDB
In this article, take a look at an introduction to PostgreSQL table functions in YugabyteDB.
Join the DZone community and get the full member experience.Join For Free
Welcome to the first of a three-part series of posts on PostgreSQL’s table functions. These functions can be easily leveraged in a distributed SQL database like YugabyteDB, which is PostgreSQL compatible.
This series follows on from my “Using Stored Procedures in Distributed SQL Databases” post. In this series of posts we’ll cover:
- What table functions are and why they’re useful
- Demonstrate the use of some built-in SQL table functions
- Introduce how you can implement a user-defined table function, if no built-in one meets your needs, in the same way that you implement regular stored procedures and functions
What’s YugabyteDB? It is an open source, high-performance distributed SQL database built on a scalable and fault-tolerant design inspired by Google Spanner. YugabyteDB uses its own special distributed document store called DocDB. But it provides SQL and stored procedure functionality by re-using the “upper half” of the standard PostgreSQL source code. This is explained in the two part blog post “Distributed PostgreSQL on a Google Spanner Architecture”: (1) Storage Layer; and (2) Query Layer.
Please note that everything that I say in this three-part series applies to both vanilla PostgreSQL and YugabyteDB. All the code examples that I describe have the identical outcome in both environments. There are no special distributed SQL considerations we’ll need to make.
You might also want to read: Working With PostgreSQL Data Types in a Distributed SQL Database
What Are Table Functions?
In the context of programming languages in general, we’re used to the notion that a function can be invoked at any syntax spot where an expression is legal and that it computes a single value that depends on the actual arguments with which it is invoked. The value might be compound, like a record or an array of records — but it is nevertheless a single value.
SQL databases allow a new kind of function. A table function is one that returns a computed row set that depends on the actual arguments with which it is invoked. It can be used, therefore, in the from list of a subquery, in arbitrary combination with tables and views; and a subquery whose from list includes a table function can be used at any syntax spot in a SQL statement where a subquery can be used.
YugabyteDB inherits its set of built-in SQL functions from PostgreSQL. Among these, a few are built-in table functions. It also inherits the capability for implementing user-defined table functions. You do this by using the same constructs with which you implement regular stored procedures and functions.
Built-In SQL Table Functions
In this section I’ll describe three built-in SQL table functions.
The most obvious example of a built-in SQL table function is generate_series(). It returns a table with a single column of datatype integer. It provides a useful way to generate test data, as long as the required pattern can be specified using SQL expressions, like this:
Another example is unnest(). This generates a table from an array of row values. Here’s an example. It uses the same table t.
I’ll show in the “Compact syntax for bulk insert” use case, in my third post in this series, how, when you use “insert into… select… from unnest(…)” as a static SQL statement in a PL/pgSQL procedure, you can insert a set of arbitrarily programmatically computed rows using a single SQL statement. In this use, the argument of unnest() will be a local variable or formal parameter whose datatype is an array of the to-be-inserted records (or scalar values).
The unnest() built-in is also useful in a subquery’s where clause when the in predicate’s argument is a subquery.
I’m including this for completeness. It’s of limited usefulness because it must have an explicit literal tuple for every to-be-inserted row. This means that you can’t use it when you don’t know the number of computed rows until run-time. Having said this, I have seen people describe how to generate the text of an insert statement that uses a run-time-defined number of tuples and then execute it dynamically. This seems to me to be pointless because unnest() lets you implement the intended behavior using a SQL statement that’s fixed at compile time. Here’s an example, again using the same table t:
User-Defined Table Functions
A regular language plpgsql user-defined function is implemented using the plain return statement. Its argument can only be a single value. A user-defined table function, in contrast, is implemented using either return next, to return the next explicitly computed row, or return query, to return the entire result set of a select statement. You can define the statement using either static SQL or dynamic SQL. You can also implement a language sql table function by defining the body entirely with a single static SQL select statement. (You can use one of the function’s formal parameters at a syntax spot in the select statement where a value is needed.)
A Harmless Curiosity
It turns out that it’s legal to use a regular function in the from list of a SQL statement—even though (by definition) it returns just a single value. Here’s an example that uses a built-in SQL function:
Of course, you can do the same with a regular user-defined function. The ability to do this seems to be a consequence of a widely respected programming paradigm:
Don’t write defensive code (and take on its technical debt) to prevent a user doing something pointless, but harmless.
Rather, let the user decide not to do the pointless thing! In this use case (evaluate a SQL function in a client-side program), it’s shorter and clearer to use the bare syntax “select sqrt(4)”:
When you’re doing the high-level implementation design for an application that uses a SQL database (distributed or monolithic), the chances of choosing the best design are maximized when you fully understand the capabilities of each of your major modules (programmable browser, load balancer, application server, connection pool, database, and so on).
So, among other things, you need to understand the full range of possibilities for implementing business logic inside the database. This knowledge spans the SQL language, SQL built-in functions, and everything about implementing stored procedures. In particular, therefore, you need to know about table functions. When a use case calls for a table function, alternative implementations are typically cumbersome to program and inefficient to execute.
Published at DZone with permission of Bryn Llewellyn. See the original article here.
Opinions expressed by DZone contributors are their own.