There is probably no piece of relational database theory which is so poorly understood in professional literature than first normal form. This piece seeks to ground an understanding of 1NF in set mathematics not only so that it can be better leveraged but also so that one can be aware of when (and when not) to violate it.

#### I: Defining First Normal Form

In the previous post, we talked about the definition of function and relation. This is key to understanding first normal form.

First Normal Form requires two things, as defined by Codd:

- Each relation is a well-formed relation (i.e. a well-formed set of fact correlations).
- No element of the relation (i.e. correlated fact) is itself a set. Therefore sets within sets are not allowed.

*Each element of a relation must contain a single value for its domain.*

A clear mathematical way of looking at First Normal Form is:

- Each relation is a set of tuples.
- No tuple contains a set.

#### II: Sets and Ordinality

Let's further note that one defining feature of a set is that it has no ordering. More formally we would say that sets have no ordinality. {1, 2, 3} = {2, 1, 3}. For this reason, no duplicate rows may be allowed, as that means that a relation is not a well-formed set.

Tuples (the correspondences which make up the relation) do have mandatory ordering, and allow duplication. However the ordering is not "natural" and therefore can be specified in relational operations. In relational algebra, the project operation allows you to generate one tuple ordering from another.

Ordinality is an extremely important concept regarding normalization and arrays in the database, as we will see with SQL arrays.

#### III: SQL Arrays, Ordinality and 1NF

An SQL array is actually a mathematical matrix, carrying with it all the principle properties of this. In particular, two basic constraints exist:

- SQL arrays are ordinal, so [1, 2, 3] != [2, 1, 3]
- Multi-dimensional arrays must be well-formed matrices. I.e. [[2, 1, 3], [3, 2, 1]] is ok but [1, 2, [3, 2]] is not.

Arrays can be used in a number of different ways. Some of these violate first normal form. Others do not. For example the following does not violate 1NF (though I am deliberately avoiding PostgreSQL's built-in types for networking):

CREATE TABLE network_addrs ( host_id int primary key, hostname text, ip_as_octets int[] ); INSERT INTO network_addrs(1, 'myhost', ARRAY[192,168,1,124]);

CREATE TABLE web_page ( file_name text primary key, content text, tags text[] ); INSERT INTO web_page ('home.html', '<html></html>', ARRAY['foo', 'bar']);

#### IV: 1NF, Candidate Keys, and Functional Dependencies

select proname, nspname from pg_proc p join pg_namespace n ON p.pronamespace = n.oid;

select proname, nspname from pg_proc p join pg_namespace n ON pronamespace(p) = oid(n);

#### V: Notes about Tuples in Tuples

#### VI: When to Break 1NF

#### VII: Perl modules and Related Concepts

#### VIII: Final Thoughts

First Normal Form is often treated as a given in relational theory. However since most books aimed at database professionals don't cover the mathematical basis for the model, it is generally poorly understood. Understanding First Normal Form however is the beginning of understanding relational logic and relational database systems. It therefore deserves much more exploration than the brief space it is usually given.Understanding 1NF is particularly important when one decides to use a non-1NF database design. These have significant tradeoffs which are not evident to beginners. However, this is only the starting point. In future discussions we will cover related aspects which give a more complete understanding of the basics of the system.

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}