{{announcement.body}}
{{announcement.title}}

4 Compelling Use Cases for PostgreSQL Table Functions

DZone 's Guide to

4 Compelling Use Cases for PostgreSQL Table Functions

In this article, explore four use cases for PostgreSQL table functions.

· Database Zone ·
Free Resource

Welcome to part three of a three-part series of posts on PostgreSQL’s table functions. These functions can be easily leveraged in a distributed SQL database like YugabyteDB, which is PostgreSQL compatible.

In this final post, I’ll discuss four realistic use cases that call for user-defined table functions and I’ll explain how each case is met using complete, self-contained code examples.

If you missed the two previous posts, make sure to check out:

Use Case #1: Parameterized View

In my blog post “Using Stored Procedures in Distributed SQL Databases”, I described an experiment to measure the speeds of transactions with a variable number of insert statements per transaction when these are implemented respectively as a sequence of top-level database calls or as a single call to a stored procedure that has the same effect. I computed the procedure-to-top-level-call speed ratios, as a function of the number of inserts per transaction. I did this by recording the result for each test in a table and by then running code like I describe in this section.

You might also want to read:  Essential PostgreSQL

This table is a stylized version of the table that I used:


SQL
xxxxxxxxxx
1
 
1
create table results(
2
  test_no int,
3
  method varchar(20),
4
  duration dec
5
    constraint results_duration_nn not null,
6
    constraint results_duration_chk check(duration > 0),
7
  constraint results_pk primary key (test_no, method));

Use this to insert some data to illustrate the behavior of the code that I describe in this section:

SQL
xxxxxxxxxx
1
11
 
1
insert into results(test_no, method, duration)
2
values
3
  (1, 'Method A', 10.1),
4
  (1, 'Method B',  8.6),
5
  (1, 'Method C',  5.4),
6
  (2, 'Method A', 20.6),
7
  (2, 'Method B', 17.3),
8
  (2, 'Method C',  9.9),
9
  (3, 'Method A', 30.4),
10
  (3, 'Method B', 23.8),
11
  (3, 'Method C', 16.2);

This table function computes the speed ratios for two specified methods for all the test_no values:

SQL
xxxxxxxxxx
1
18
 
1
create function speed_ratios(
2
  method_1 in results.method%type,
3
  method_2 in results.method%type)
4
  returns table(test_no results.test_no%type, ratio dec)
5
   language sql
6
as $body$
7
select
8
  test_no,
9
  method_a_duration/method_b_duration as ratio from (
10
  select
11
    test_no,
12
    t1.duration as method_a_duration,
13
    t2.duration as method_b_duration
14
  from results t1 inner join results t2
15
  using(test_no)
16
  where t1.method = method_1
17
  and   t2.method = method_2) as a;
18
$body$;

Test it like this:

SQL
xxxxxxxxxx
1
 
1
select
2
  test_no as "Test No.",
3
  to_char(ratio, '990.99')||'x' as "Speed Ratio"
4
from speed_ratios('Method A', 'Method B')
5
order by 1;

It produces this output:

SQL
xxxxxxxxxx
1
 
1
 Test no. | Speed Ratio 
2
----------+-------------
3
        1 |    1.17x
4
        2 |    1.19x
5
        3 |    1.28x


Of course, you could generate a report like this with a client-side program. But doing so would lose a major benefit. The table function approach lets you run arbitrary queries to answer arbitrary questions, like this:

SQL
xxxxxxxxxx
1
 
1
select to_char(
2
  (select max(ratio) from speed_ratios('Method A', 'Method B'))/
3
  (select max(ratio) from speed_ratios('Method B', 'Method C')),
4
  '990.99')
5
as x;

Use Case #2: Pretty Printed Ad Hoc Reports
for Administrators

In “Use case #1”, there are three distinct values for results.test_no in the example sample data that I use. That means that there are three distinct comparison tables to be looked at: “Method A” vs “Method B”“Method B” vs “Method C”, and “Method A” vs “Method C”. Rather than producing these by executing a .sql script with the three explicit “select… from speed_ratios(…)” statements, interspersed with appropriate \echo meta-commands, it would be nicer just to execute a single stored procedure. However, the only way to get output from an ordinary PL/pgSQL procedure is to use raise info (or raise notice). When you run such a program from a script, then every output line looks like this:

SQL
xxxxxxxxxx
1
 
1
ysqlsh:raise_info_demo.sql:9: INFO:  Some output


Moreover, the line numbers will, in general, all be different — so you can’t simply remove this awful clutter with a simple “search and replace all” in an ordinary text editor. This is especially frustrating for developers and administrators who, using other databases, assemble a kit-bag of scripts and programs to generate useful, and usably readable, reports that — just like for the present use case — assemble the output from several separate queries. Here’s where a table function comes to the rescue. This is the basic idea:

SQL
xxxxxxxxxx
1
26
 
1
create function pretty_report_speed_ratios()
2
  returns table(t varchar)
3
  language plpgsql
4
as $body$
5
begin
6
  for j in 1..5 loop
7
    t := '';
8
    return next;
9
  end loop;
10
 
11
  t := '==================================================';
12
  return next;
13
  t := 'Speed Ratios Report';
14
  return next;
15
  t := '==================================================';
16
  return next;
17
  t := '';
18
  return next;
19
 
20
  <<"loop over the method combinations">>
21
  for j in 1..3 loop
22
    ...
23
  end loop "loop over the method combinations";
24
 
25
end;
26
$body$;

Run the report like this:

SQL
xxxxxxxxxx
1
 
1
select t from pretty_report_speed_ratios();

The loop over the method combinations starts like this:

SQL
xxxxxxxxxx
1
23
 
1
<<"loop over the method combinations">>
2
for j in 1..3 loop
3
  declare
4
    method_1 results.method%type;
5
    method_2 results.method%type;
6
  begin
7
    case j
8
      when 1 then
9
        method_1 := 'Method A';
10
        method_2 := 'Method B';
11
      when 2 then
12
        method_1 := 'Method B';
13
        method_2 := 'Method C';
14
      when 3 then
15
        method_1 := 'Method A';
16
        method_2 := 'Method C';
17
    end case;
18
 
            
19
    <<"do each method-pair combination">>
20
      ...
21
    end "do each method-pair combination";
22
  end;
23
end loop "loop over the method combinations";


Here’s the final missing piece:

SQL
xxxxxxxxxx
1
21
 
1
<<"do each method-pair combination">>
2
declare
3
  tt varchar(20);
4
  rr varchar(20);
5
  n int := 0;
6
begin
7
  for tt, rr in (
8
    select
9
      to_char(test_no, '9999999'),
10
      to_char(ratio, '99990.99')||'x'
11
    from speed_ratios(method_1, method_2)
12
    order by 1)
13
  loop
14
    n := n + 1;
15
    t := tt||'   '||rr;
16
    if n = 1 then
17
      t := t||'  << '||quot||method_2||quot||' to '||quot||method_1||quot;
18
    end if;
19
    return next;
20
  end loop;
21
end "do each method-pair combination";

And here’s the output produced by  pretty_report_speed_ratios():

SQL
xxxxxxxxxx
1
19
 
1
==================================================
2
 Speed Ratios Report
3
 ==================================================
4
 
5
 Test No.  Speed Ratio
6
 --------  -----------
7
 
8
        1        1.17x  << 'Method B' to 'Method A'
9
        2        1.19x
10
        3        1.28x
11
 
12
        1        1.59x  << 'Method C' to 'Method B'
13
        2        1.75x
14
        3        1.47x
15
 
16
        1        1.87x  << 'Method C' to 'Method A'
17
        2        2.08x
18
        3        1.88x
19
 ==================================================

We can see at a glance that “Method B” is about 1.2x faster than “Method A”, that “Method C” is about 1.6x faster than “Method B”, and that, of course, “Method C” is, therefore, about 1.9x faster than “Method A”.

The full source text of the pretty_report_speed_ratios() table function is presented in the appendix so that you can simply copy and paste it into ysqlsh and then run it.

Use Case #3: Dynamic IN List

Assume that we have table t with this content:

SQL
xxxxxxxxxx
1
11
 
1
  k  |  v   
2
-----+------
3
 100 | t_01
4
 200 | t_04
5
 300 | t_42
6
 400 | t_04
7
 500 | t_10
8
 600 | t_42
9
 700 | t_07
10
 800 | t_16
11
 900 | t_42

The code to create this is shown in the “Built-in SQL table functions” section under “generate_series()”. Developers and administrators usually type an explicit in list in an ad hoc query like this:

SQL
xxxxxxxxxx
1
 
1
select k, v from t where k in (200, 300, 500, 900) order by 1;

But what if this is to be issued from a PL/pgSQL procedure and the in list members aren’t known until run-time? The naïve programmer tends to think that this calls for dynamic SQL where the in list is constructed programmatically. But there’s a much better and simpler way to write it:

SQL
xxxxxxxxxx
1
 
1
select k, v from t where k in (
2
  select unnest as k
3
  from unnest(
4
    array[200, 300, 500, 900]::int[]))
5
  order by 1;

The functionality of the unnest() built-in SQL table function can be implemented as a user-defined table function thus:

SQL
xxxxxxxxxx
1
13
 
1
create function my_unnest(ks int[])
2
  returns table(k int)
3
  language plpgsql
4
as $body$
5
declare
6
  element int;
7
begin
8
  foreach element in array ks loop
9
    k := element;
10
    return next;
11
  end loop;
12
end;
13
$body$;


Use it like this:

SQL
xxxxxxxxxx
1
 
1
select k, v from t where k in (
2
  select k
3
  from my_unnest(
4
    array[200, 300, 500, 900]::int[]))
5
  order by 1;

Of course, it’s pointless just to re-implement the built-in unnest(). Moreover, the built-in can accept an actual argument of any datatype. I illustrated a few possibilities in the section that described it. You’d have to implement each overload that you needed explicitly if you programmed it using PL/pgSQL.

My purpose in showing you my_unnest() is this:

  • firstly, to show you, ordinarily, how to iterate over the elements in an array in PL/pgSQL;
  • and secondly, to show you the overall structure of a table function that computes values that you can then use as an in list in a subquery.

Who knows what arbitrary computation you might want to use to produce the in list values—and, of course, the built-in unnest() simply has its fixed functionality.

Use Case #4: Compact Syntax for Bulk Insert

Consider the procedure created thus:

SQL
xxxxxxxxxx
1
11
 
1
create procedure insert_rows(vs in varchar[])
2
  language plpgsql
3
as $body$
4
declare
5
  element t.v%type;
6
begin
7
  foreach element in array vs loop
8
    insert into t(v) values(element);
9
  end loop;
10
end
11
$body$;

If we have a table created thus:

SQL
xxxxxxxxxx
1
 
1
create table t(
2
  k uuid
3
  default gen_random_uuid()
4
  constraint t_pk primary key,
5
  v varchar(100));


then we can populate it thus:

SQL
xxxxxxxxxx
1
 
1
call insert_rows(
2
  array['one', 'two', 'three']);

I used the same “default gen_random_uuid()” approach to populate the table’s primary key column in my code example in my “Using Stored Procedures in Distributed SQL Databases” post. I pointed out there that you must make gen_random_uuid() available by installing the PostgreSQL extension like this:

SQL
xxxxxxxxxx
1
 
1
create extension pgcrypto

as explained in the YugabyteDB documentation here.

Notice that the procedure simply loops over the elements of the input array and executes an explicit single-row insert for each value. This looks to be inefficient; but PL/pgSQL doesn’t support methods for bulk binding to insertupdate, or delete statements.

Here’s an alternative implementation of insert_rows() that uses unnest():

SQL
xxxxxxxxxx
1
 
1
create or replace procedure insert_rows(vs in varchar[])
2
  language plpgsql
3
as $body$
4
begin
5
  insert into t(v)
6
  select unnest from unnest(vs::varchar[]);
7
end;
8
$body$;

Of course, the use of this second implementation of insert_rows() is the same as that for the first implementation that I showed. It’s certainly shorter than the implementation that uses this:

SQL
xxxxxxxxxx
1
 
1
foreach element in array vs loop
2
  insert into t(v) values(element);
3
end loop;

It’s, therefore, easier to understand and review. However, my tests showed that the performance of the two implementations, and of a third that uses my_unnest() was the same within the confidence limits of my measurements.

Conclusion

When developers have a rich set of tools, they’re much better placed to implement an optimal approach for a new use case than when they’re limited by an impoverished toolkit. This three-part series of articles has presented several use cases where table functions showed their value. User-defined table functions are part of the larger picture of stored procedures, so to use them, you have to use a distributed SQL database that supports these!

YugabyteDB inherits PostgreSQL’s support, and in this post series, I’ve described examples implemented both in SQL and in PL/pgSQL. The examples have used a very wide range of PostgreSQL’s SQL and stored procedure functionality. The fact that all the code that I’ve shown works identically in YugabyteDB and vanilla PostgreSQL is a testament to the success of our strategy to use the upper half of the PostgreSQL query layer simply “as is” on top of our own unique storage layer. This architecture makes YugabyteDB unique among distributed SQL databases.

Appendix

Here is the complete source code for the use case “Pretty printed ad hoc reports for administrators”:

SQL
xxxxxxxxxx
1
80
 
1
create function pretty_report_speed_ratios()
2
  returns table(t varchar)
3
  language plpgsql
4
as $body$
5
declare
6
  quot constant varchar(1) not null := '''';
7
begin
8
  for j in 1..5 loop
9
    t := '';
10
    return next;
11
  end loop;
12
 
            
13
  t := '==================================================';
14
  return next;
15
  t := 'Speed Ratios Report';
16
  return next;
17
  t := '==================================================';
18
  return next;
19
  t := '';
20
  return next;
21
 
            
22
  t := 'Test No.  Speed Ratio';
23
  return next;
24
  t := '--------  -----------';
25
  return next;
26
 
            
27
  <<"loop over the method combinations">>
28
  for j in 1..3 loop
29
    declare
30
      method_1 results.method%type;
31
      method_2 results.method%type;
32
    begin
33
      case j
34
        when 1 then
35
          method_1 := 'Method A';
36
          method_2 := 'Method B';
37
        when 2 then
38
          method_1 := 'Method B';
39
          method_2 := 'Method C';
40
        when 3 then
41
          method_1 := 'Method A';
42
          method_2 := 'Method C';
43
      end case;
44
 
            
45
      t := '';
46
      return next;
47
 
            
48
      <<"do each method-pair combination">>
49
      declare
50
        tt varchar(20);
51
        rr varchar(20);
52
        n int := 0;
53
      begin
54
        for tt, rr in (
55
          select
56
            to_char(test_no, '9999999'),
57
            to_char(ratio, '99990.99')||'x'
58
          from speed_ratios(method_1, method_2)
59
          order by test_no)
60
        loop
61
          n := n + 1;
62
          t := tt||'   '||rr;
63
          if n = 1 then
64
            t := t||'  << '||
65
                    quot||method_2||quot||' to '||quot||method_1||quot;
66
          end if;
67
          return next;
68
        end loop;
69
      end "do each method-pair combination";
70
    end;
71
  end loop "loop over the method combinations";
72
 
            
73
  t := '==================================================';
74
  return next;
75
  for j in 1..5 loop
76
    t := '';
77
    return next;
78
  end loop;
79
end;
80
$body$;

Run it like this:

SQL
x
 
1
\t on
2
select t from pretty_report_speed_ratios();
3
\t off

The \t on meta-command simply turns off the clutter of the column header (the name of the column t), the underline for that, and the row count footer.

Further Reading

Using Object Types in Relational Tables, Part 4

Creating Tables Dum and Dee in PostgreSQL

Topics:
postgresql ,oracle ,mysql ,mongodb ,sql ,distributed database ,kubernetes ,cloud native

Published at DZone with permission of Bryn Llewellyn . See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}