Introduction to PostgreSQL Arrays, Basic Concepts and Normalization Considerations
Join the DZone community and get the full member experience.
Join For FreeThis is part of a two part series. The second will discuss indexing arrays which is a complex topic.
PostgreSQL has very good support for arrays of other data types. One
can have nested data types where an attribute of a relation stores an
array of tuples, and each of these tuples can hold arrays of other
tuples or primitives.
Arrays are extremely powerful in PostgreSQL but they pose a number of
significant issues as well. They can be used in ways which break first
normal form but mere use of an array does not necessarily break first
normal form's atomicity requirement because arrays differ in significant
fundamental ways from relations. Understanding arrays in PostgreSQL
can help avoid costly errors. Relations and arrays are subject to
fundamentally different constraints and therefore are equivalent in only
a few very special cases.
Arrays derive their power in PostgreSQL due to the fact that they have strong definitional requirements.
An Array is a Mathematical Matrix
PostgreSQL arrays follow the basic definition of a mathematical matrix. The array must be rectangular so if one element has an array in it, all other elements must have an array of the same dimensions. All members of an array must be of the same data types.
The following are thus valid Integer arrays:
- '{1,2,3,4,5}'
- '{{1,2},{3,4},{5,6},{7,8},{9,10}}'
The following are not valid integer arrays:
- '{1,2,3,{4, 5}}'
- '{{1,2},{3,4},{5}}'
Arrays vs Relations vs Tuples (and First Normal Form)
Arrays bear some relationship to relations and tuples, but all three are different in significant ways which means they are best at solving different problems.
Like tuples, arrays are ordered. However:
- Unlike tuples every array element must have the same data type. This means that arrays of text can represent tuples of any type.
- Unlike tuples, arrays do not have a fixed number of elements. Elements can be added without disturbing the basic type.
- Arrays are ordered, relations are not. This means that an array value is a domain, while a relation value is a set or bag of domains (depending on constraints).
- All data types in an array must be identical. Relations do not have this restriction.
For example we could store matrices for matrix arithmetic as numeric[] arrays. We could store some problems that are transformed into numeric[] matrices into the db as well (for example simultaneous linear equations) and these would not necessarily violate 1NF.
Example: Solving Simultaneous Linear Equations in PL/PGSQL
I will leave understanding this as an exercise to the reader. It is, however, a small example. This is a very basic program I originally wrote in BASIC on my C64 when I was younger as a homework solving program, and have since written a number other incarnations as a demonstration. it solves simultaneous linear equations by arraying them in matrix and reducing the matrix accordingly. The program will only accept an n by n+1 matrix, and then will calculate.
This example is useful because it shows that anything you can solve through matrix math you can solve using PostgreSQL arrays....
create or replace function solve_linear(numeric[]) returns numeric[] language plpgsql as $$ declare retval numeric[]; c_outer int; -- counters c_inner int; c_working int; upper1 int; upper2 int; ratio numeric; --caching of calculation begin IF array_upper($1, 1) <> array_upper($1, 2) - 1 THEN RAISE EXCEPTION 'bad input, must be n x n+1 matrix'; END IF; upper1 := array_upper($1, 1); upper2 := array_upper($1, 2); FOR c_outer IN 1 .. upper1 LOOP FOR c_inner IN 1 .. upper1 LOOP IF c_inner = c_outer THEN CONTINUE; END IF; ratio := $1[c_inner][c_outer]/$1[c_outer][c_outer]; for c_working in 1 .. upper2 loop $1[c_inner][c_working] := $1[c_inner][c_working] - ($1[c_outer][c_working] * ratio); end loop; end loop; end loop; retval := '{}'; for c_outer in 1 .. array_upper($1,1) loop retval := retval || $1[c_outer][array_upper($1,1) + 1]/ $1[c_outer][c_outer]; end loop; return retval; end; $$;
As an example:
select * from solve_linear('{{1,2,3},{2,3,4}}'::numeric[]);
solve_linear
----------------------------------------------------------
{-1.00000000000000000000000000000000,2.0000000000000000}
(1 row)
In other words, if we have 1x + 2y = 3 and 2x + 3y = 4, then these will intersect where x is -1 and y is 2.
Yes, it is order independent.
select * from solve_linear('{{2,3,4},{1,2,3}}'::numeric[]);
solve_linear
----------------------------------------------------------------------
{-1.0000000000000000000000000000000000000000,2.00000000000000000000}
(1 row)
However, ordering within a subarray is important:
select * from solve_linear('{{2,4,3},{1,3,2}}'::numeric[]);
solve_linear
---------------------------------------------------------------------
{0.5000000000000000000000000000000000000000,0.50000000000000000000}
(1 row)
One minor issue with the above code is that it throws a division by 0 if the lines are parallel:
select * from solve_linear('{{1,2,3},{1,2,6}}'::numeric[]);
ERROR: division by zero
CONTEXT: PL/pgSQL function "solve_linear" line 19 at assignment
Consequently error handling could be improved, but it isn't too bad for a demonstration.
As it turns out this is a fairly generalizable class of problems and may be particularly helpful when doing geometric math.
I hope this helps show some of the uses and value of PostgreSQL array
types. In the next post we will talk about arrays and indexes.....
Published at DZone with permission of Chris Travers, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments