{{announcement.body}}
{{announcement.title}}

Polymorphism in SQL Part One: Anyelement and Anyarray

DZone 's Guide to

Polymorphism in SQL Part One: Anyelement and Anyarray

This post explains how polymorphism is implemented by declaring a subprogram with formal parameters of the pseudo-types “anyelement” and “anyarray”.

· Database Zone ·
Free Resource

The term “polymorphism”, and its inflections, has various meanings in the context of programming languages in general. Roughly, it means “chameleon-like”. Of course, Wikipedia has something to say about it, here. PostgreSQL uses it with this meaning: Parametric polymorphism: when one or more types are not specified by name but by… symbols that can represent any type”.

The term “variadic” has a universal meaning, and again Wikipedia has something to say about it, here. “A variadic [subprogram] has indefinite “arity”, [that is, it] accepts a variable number of arguments”.

In this two part post, I explain how YugabyteDB stored procedures implement these two central programming language notions.

  • Part One (this post) explains how polymorphism is implemented by declaring a subprogram with formal parameters of the pseudo-types “anyelement” and “anyarray”.
  • Part Two explains how marking a subprogram’s formal parameter with the keyword “variadic” implements that notion. I also bring the two notions together to show you how to implement a non-aggregate version of any built-in aggregate function so that you can use it outside the context of a SQL subquery, just as the “least()” and “greatest()” built-ins already (but uniquely) do this for the “min()” and “max()” aggregate functions.

You can see, then, that I’m stretching the technical meaning of “polymorphism” in the title of this post series. I’m focusing on the chameleon metaphor to capture the idea that I’m showing two powerful techniques that let functions automagically adapt to the way in which they’re invoked.

YugabyteDB is an open source, high-performance distributed SQL database built on a scalable and fault-tolerant design inspired by Google Spanner. YugabyteDB uses its own special distributed document store called DocDB. But it provides SQL and stored procedure functionality by re-using the “upper half” of the standard PostgreSQL source code. This is explained in the two part blog post “Distributed PostgreSQL on a Google Spanner Architecture”: (1) Storage Layer; and (2) Query Layer.

If you’re already used to writing PostgreSQL subprograms in your day job, then I recommend that you read the two parts of my essay in a single short sitting.

Everything that I say in this post applies in the same way to vanilla PostgreSQL, as it does to YugabyteDB. And all the code examples that I describe have identical outcomes in the two environments. There are no special distributed SQL considerations.

Understanding “anyelement” and “anyarray”

A user-defined function is usually created like this:

SQL


The function’s requirements are stated by specifying how the output value is determined by its input values. And commonly the rules can be stated only in terms of inputs of specified data types. It’s the same story for a user-defined procedure — and the same if the implementation uses “language plpgsql” or “language sql”.

Note: My earlier post, Four Compelling Use Cases for PostgreSQL Table Functions, shows examples of these two ways to implement user-defined functions.

Sometimes, though, the requirements reflect the same general rule with a specialization for each of several different input data types. Here’s a compelling example:

  • Return an N-character text value that represents values of various data types according to rules that are specific to each different data type. The output should be right-justified, and should represent NULL as “<null>”.

A proper requirements statement would specify more detail. But I’ll simply show you two implementations that treat some interesting cases in a natural way. User-defined functions and procedures can be created in overload sets, where each set is for a function or procedure with a certain name and members of a set are distinguished from each other by the the data types of their formal parameters and/or by the number of these. So you could meet the present requirements by creating lots (and lots) of functions all called, say, “display()”. Here’s an example for a “boolean” input:

PLSQL


And here’s an example for a “bigint” input:

PLSQL


You can imagine similar overloads for other primitive scalar data types like “numeric”, “double precision”, “money”, “timestamp”, “timestamptz”, “interval” and so on. Maintenance would be simpler if the code for each data type were implemented in a single function. This would optimally self-document, in a single place, which data types it supported. And it could, for example, define a single manifest constant “width” to be used as the second argument for “lpad()” for values of all the supported data types. This is where “anyelement” comes to the rescue.

The anyelement Pseudo-Type

YSQL inherits the PostgreSQL type system. This implements several so-called pseudo-types. A pseudo-type cannot be used as a column data type, or for a local PL/pgSQL variable, but it can be used to declare a function’s formal parameters and result type. This blog post focuses on “anyelement” and “anyarray”.

Here’s the basic proof of concept. Make sure that you first drop any previously existing versions of the function “display()”.

PLSQL


Test it like this:

PLSQL


This is the result:

Plain Text


We see, then, that “anyelement” stands for any scalar (that is non-array) data type — simple (like “boolean” or “bigint”) or composite (like a user-defined “row” type). A function with a pseudo-type formal parameter like “anyelement” (or “anyarray”) is called a polymorphic function. We can now use the approach demonstrated by the proof of concept to encapsulate “boolean”, “bigint”, and “rt” implementations of “display()” into a single polymorphic function, like this. As before, make sure that you first drop any previously existing versions of the function “display()”. And if the type “rt” already exists, you can leave in in place.

PLSQL


Test it like this:

PLSQL


This is the result:

Plain Text


The anyarray Pseudo-Type

This is the natural partner to “anyelement”. It stands for an array of any data type. This simple example demonstrates the idea:

PLSQL


You take advantage, here, of a very useful built-in that materializes the elements of an array as a regular SQL table. Read about this, and more, in the “Array data types and functionality” major section of the YugabyteDB documentation—and, in particular, read about the “unnest()” built-in.

Test it like this:

PLSQL


This is the result:

Plain Text


Conclusion

So far, I’ve shown you just one of the two interesting and useful features that this two-part post explains:

  • How to use the “anyelement” and “anyarray” pseudo-types to write a polymorphic subprogram with a dynamically determined polymorphic return value.

YugabyteDB inherits both features from PostgreSQL thanks to its unique architecture that simply uses its SQL processing code as is.

I can’t resist saying here (but I’ll say it again in the conclusion to Part Two) that Oracle Database currently supports neither anything equivalent to the pseudo-types like “anyelement” and “anyarray”, nor the ability to create variadic user-defined subprograms. And, as I never miss a chance to point out, the other distributed SQL databases with which we compare YugabyteDB (CockroachDB, TiDB, and our common inspiration Google Cloud Spanner) don’t support stored procedures at all.

Please go on now to read Part Two. It explains how to write a variadic subprogram.

Topics:
kubernetes, mongodb, open source, oracle, postgresql, sql, yugabytedb

Published at DZone with permission of Bryn Llewellyn . See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}