Polymorphism in SQL Part Two – Variadic Functions
Polymorphism in SQL Part Two – Variadic Functions
We'll show you how to implement a non-aggregate version of any built-in aggregate function so that you can use it outside the context of a SQL subquery.
Join the DZone community and get the full member experience.Join For Free
Make sure that you read Part One of this two-part series, and in particular the common introduction, before continuing here. Part One explains how polymorphism is implemented by declaring a subprogram with formal parameters of the pseudo-types “anyelement” and “anyarray”.
This part explains how marking a subprogram’s formal parameter with the keyword “variadic” implements the second critical notion.
I'll finish by bringing the two notions together to show you how to implement a non-aggregate version of any built-in aggregate function so that you can use it outside the context of a SQL subquery, just as the “least()” and “greatest()” built-ins already (but uniquely) do this for the “min()” and “max()” aggregate functions.
Understanding Variadic Subprograms
You may have noticed that some of YSQL’s built-in functions can be invoked using a comma-separated list of any number of actual arguments. The built-ins “greatest()” and “least()” are like this.
This ability brings very useful end-user functionality, and you can imagine that it isn’t feasible to implement dedicated overloads of such a function with respectively one, then two, then three, and so on, up to some present maximum of N formal parameters. Rather, one single subprogram with a so-called “variadic” formal parameter provides the functionality.
In other words, a subprogram with such a parameter can be invoked using a comma-separated list of an arbitrary number of actual arguments. “Variadic” is a generic notion that lets you ask the question “Does language X support variadic functions?”) The good news is that YugabyteDB inherits from PostgreSQL the ability to implement user-defined subprograms with “variadic” formal parameters.
As mentioned, the “variadic” ordinary functions “greatest()” and “least()” are functionally equivalent to the aggregate functions “max()” and “min()”. It’s just that their invocation styles are different. It has always struck me as odd that other aggregate functions like, for example, “sum()” and “avg()”, don’t have built-in ordinary “variadic” equivalents. This is true not only for PostgreSQL (and therefore for YugabyteDB); it’s also true for other databases.
I’ll show you, now, how to write a “variadic” version of “avg()” that respects the same semantics for inputs with a mixture of NULLs and NOT NULL values as does the built-in aggregate function. I’ll use a naïve scheme. The following section shows a much better method. My aim, here, is just to demonstrate the technique that you use to implement a “variadic” function. Here it is:
Test it like this:
The result is “4.33…”. Notice that, while on the “outside” the actual arguments are presented as the comma-separated list “6, 3, null, 4”, on the “inside” these values are available as the elements of the array “arr”. The identical functionality could have been provided by implementing “mean()” like this:
and by invoking it thus:
So, “variadic” boils down to no more than syntax sugar. But it is no less useful for that. It’s far more convenient, and clear (when, but only when, you have the use case for this), just to write the input values as an ordinary list than to clutter the invocation with an array constructor whose purpose is entirely technical and has nothing to do with the semantics of the function.
This last point brings us to an interesting realization. It’s not very common, in an ordinary procedural program in any language to handle an unknown number of items using lots of specifically declared scalar variables. This is exactly where an array (in SQL and PL/pgSQL), or a functional equivalent in some other programming language, finds its use. Here’s a thought experiment:
The need arises, in some subprogram, to assign a number of emergent values. You’re sure that the possible number of values lies between two and nine. So you decide to implement this using nine variables, “v1”, “v2”, and so on through “v9”. You need to calculate some function of however many values you end up with in a particular execution of the subprogram. So you write a case statement like this:
Now you want to ask a “how to” question in connection with another part of this subprogram, and you post the whole text on, say Stack Exchange — even saying that, so far, it all works as intended. You’d be pilloried. No end of folks would seize the chance to tell you how silly your scheme is and that you ought to use an array! No amount of protesting that you’re cross-your-heart-hope-to-die certain that there’ll never be more than nine values would change their opinion.
Fortunately, you don’t need to overthink this. You can have your cake and eat it simply by implementing both the straight “array” variant of “mean()” and re-use that to implement a “variadic” variant. You need to realize only that such a function pair cannot be mutual overloads, so you must distinguish between them by name, like this. Be sure to drop the previous version of “mean()” first.
Test it like this:
Putting anyelement/anyarray and Variadic Together
I’ll use the aggregate functions “max()”, “min()”, “sum()”, and “avg()” as examples here. I’ll show you how to implement them all as user-defined “variadic” polymorphic functions. The implementation will show us one final critical notion: how to implement a function whose return data type polymorphically reflects the data type of the actual arguments with which it is invoked.
You’re likely to want an implementation for each of several input data types. Try “\df avg” to see what I mean. This is the result:
Notice that for some input data types (“numeric”, “double precision”, and “interval”), the output data type is the same as that of the input. But for the remaining input data types (“smallint”, “int”, “bigint”, and “real”), the output data type is different from that of the input. This is readily understood by considering the average of two integers, say, “3” and “4”. Common sense tells you that the answer that you want is “3.5” (a “numeric” value) — and that you would not want the answer to be rounded to the nearest integer. This difference between input and output data types, for some overloads, is significant. I’ll come back to this point presently.
You can implement the function like this, as a “language sql” function:
(Here again, you take advantage of the “unnest()” built-in.)
You might prefer to implement the function as a “language plpgsql” so that you can use an “assert” to check, in line with proper practice, that an invocation meets the specified prerequisites.
Notice that, while the input formal parameter is declared as “anyarray”, the return value is declared, as you’d expect, as the scalar “anyelement”. This is critical. It’s this that says that the function’s return data type should polymorphically reflect the data type of the actual arguments with which it is invoked. But its meaning is maybe not quite what you’d hope for. And if you don’t understand the meaning, then you risk run-time errors with the “language sql” implementation and, worse, silently getting wrong results with the “language plpgsql” implementation. I’ll discuss this point after showing the positive tests.
The implementation takes advantage of the pre-existing set of overloads of the built-in avg() function. And not only is it, therefore, terse; also, and more importantly, you get the proper semantics for these operations (especially with respect to possible NULL inputs) for free. This is huge. You can trivially implement “my_sum()”, “my_greatest()”, and “my_least)” simply by replacing “avg()” in the templates above with, respectively, “sum()”, “max()”, and “min()”. Test the user-defined polymorphic variadic “mean()” like this:
This is the result:
Test the polymorphic input/output behavior like this:
This is the result:
Now is the time to come back, as promised, to the fact that for some overloads of “avg()”, the output data type differs from the input data type. (There are similar cases for some of the overloads of “sum()”, “max()”, and “min()”.)
The rule used to determine the actual data type of the return value, for a given invocation of a user-defined polymorphic function with an “anyarray” input and an “anyelement” output is mechanical — and therefore not always what you want. It’s simply that when “anyarray” is “some_type”, “anyelement” is set to be the corresponding scalar “some_type”. But consider this:
This is the result (just as “\df avg” predicts):
It turns out that, for a “language sql” function, this difference between the actual data type of the computed value, and the calculated data type of the “anyelement” return, is fatal. Try this, after having created the “language sql” variant of “mean()”.
This is the outcome:
I think that the error text would be better if if said this:
But that’s just me being fussy. It’s clear what it means.
Now repeat the same test after having created the “language plpgsql” variant of “mean()”. This is the result:
So you silently got the wrong answer—and, in my book, that’s worse than a clear run-time error!
The workaround, however, is simple. And it works for both the “language sql” and the “language plpgsql” variants. Just write the invocation like this:
Now you get the right result, with no error:
I’ve shown you two interesting and useful features that YSQL stored procedures inherit from PostgreSQL:
- How to use the “anyelement” and “anyarray” pseudo-types to create a polymorphic user-defined function (or procedure) with a dynamically determined polymorphic return value.
- How to use the “variadic” keyword to create a user-defined function or procedure that can accept a comma-separated list of any number of input actual arguments.
And I then showed you this:
- How to use these two features together to implement ordinary versions of any aggregate function.
Finally, it’s worth pointing out that Oracle Database currently supports neither anything equivalent to the pseudo-types like “anyelement” and “anyarray”, nor the ability to create variadic user-defined subprograms. And, as I never miss a chance to point out, the other distributed SQL databases with which we compare YugabyteDB (CockroachDB, TiDB, and our common inspiration Google Cloud Spanner) don’t support stored procedures at all.
Published at DZone with permission of Bryn Llewellyn . See the original article here.
Opinions expressed by DZone contributors are their own.