On Databases, Temporal Data Types, and Trains
Temporal data types—those dealing with time—can be troublesome if not handled correctly. See how they've developed and how to implement them properly.
The data type aspect of databases is a key feature, as is it when it comes to programming languages. I would guess that all programming languages, with the possible exception of assembly, provides a set of predefined “built in” datatypes. Some programming languages are limited in this respect, like Forth, where others have a larger set of types. A data type determines what data can be stored, what operations are allowed, and semantics.
One family of data types that is present in more or less all relational databases (I don’t say all here as I know someone will tell me about an arcane relational database system developed in Burundi where this is not true) is the temporal type, i.e. data types that hold a time value. This is in difference to most programming languages, where the native data types are numeric and strings, and all other types are extensions using some kind of structure style.
So, databases have temporal data types, programming languages do not (OK, that is a generalization). In this blog post, I will look at some aspects of temporal datatypes, and in a later blog post, I will dig even deeper into this outrageously interesting subject.
Temporal Datatypes in Databases
Before we get into the aspect of trains, let’s spend some time with the temporal datatypes themselves. As already stated, the type of an item, among other things, determines the semantics of the type. Let’s start with a look at the temporal data types in MariaDB, they are DATETIME, TIMESTAMP, DATE, TIME, and YEAR. If we, for a second, assume that you don’t know anything about how MariaDB looks at these, you might ask yourself what the difference is between TIMESTAMP and DATETIME, so let’s start with there.
Both DATETIME and TIMESTAMP store a date and a time, but there is where the similarities end. And by the way, I’m not saying that we should change the behavior of these datatypes, just that they are sometimes a bit odd.
The DATETIME data type is more recent and is more in line with other relational databases, but on the other hand, it takes up more space on disk. Both of these types also have microsecond support (since MariaDB 5.3). To enable this, you add a precision, such as DATETIME(6). The reason a TIMESTAMP is more compact is that it can only hold a limited range of dates, from Jan. 1, 1970, up to 2038. Well, that should be enough for most purposes right? Yes, just as representing a year with just 2 digits and allowing for 640 K of RAM was “enough” a few years back! Those kinds of assumptions really made the world a better place for us all.
As for the DATE, TIME, and YEAR datatypes, I will skip them for and now focus on DATETIME and TIMESTAMP.
One Thing You Do NOT Want to Use Temporal Types For
I had a customer, many years ago, possible way back during the Reagan administration or so, that had an issue. They were using TIMESTAMP, using millisecond precision, in data in their OLTP systems as a PRIMARY KEY. They had determined that there would not be more than 1 transaction per microsecond, and that this would work. It didn’t. For the simple reasons that:
- On average, there was a lot fewer than 1 transaction per microsecond, but during high load times, it could very well be more than this.
- Computers tend to get faster over time, and the load of popular services also increases, which means this scheme was bound to break faster the better it was.
- This was a stupid assumption.
Their solution was to have the transaction retry when they had a PRIMARY KEY violation, which was neither effective nor performant or practical. Don’t do something even remotely similar to this!
Other Relational Database Support for Temporal Data Types
Other relational databases also support temporal data types, and you might have sensed that I feel that the MariaDB temporal data types are a bit awkward. This is not so, though, as all relational database temporal data types have quirks, to say the least, so it is appropriate to have a look at this, too.
Let’s begin with Oracle, where there is support for DATE and TIMESTAMP. Oracle also supports INTERVAL types. As for Oracle, DATE is the oldest Oracle temporal datatypes, and it is rather odd in a few ways. One such oddity is that although the type is called DATE — and when querying it, by default, you get a proper date back — it actually stores the TIME also, up to seconds. Which means that two fields that look like they have the same value using the default format, when compared, might still fail. Odd, to say the least.
As for SQL Server, things are messier still. Here a TIMESTAMP is actually a table attribute that works much like the way the first TIMESTAMP column in a MariaDB table works — in that it keeps track of the last insert/update to the row. Then SQL Server has both DATETIME and a DATETIME2 data types, where the former has a limited date range. SQL Server also has a DATETIMEOFFSET, which is pretty odd. I will not get into NULL handling with temporal data types in SQL Server and I will avoid giving you a headache. Also in SQL Server are the DATE and TIME data types, as well as a SMALLDATETIME, where the latter is a more compact DATETIME with no fractional seconds and, again, a limited range.
I have not gotten into the issue of how relational databases treat temporal datatypes with incorrect data and NULL values. Note that “incorrect” data when it comes to temporal data is a fuzzy subject. Handling leap years is no big deal, but maybe you haven’t heard about leap seconds? They are there to compensate for the earth slowing down its rotation, and leap seconds are added now and then to compensate for this. In fact, we have one such coming up by the end of this year, the last minute of 2016 will have 61 seconds, so that 2016-12-31 23:59:60 is actually a valid time, something not recognized by MariaDB, Oracle, SQL Server or, for that matter, Linux (at least where I tested it) which all report this as an invalid time specification. If someone asks you, “How many seconds are there in a day?” your answer should be, in true engineering fashion, “It depends,” and if you write code assuming there are always 86400 (24 * 60 * 60) seconds in a day, you might be making a mistake, depending on the situation.
Another situation is with financial services, where domain-specific calendars are used in some cases, like a 360-day calendar where each year is considered to have 12 months of 30 days each. This is used, for example, with interest rate calculations, which is why every monthly mortgage payment of your house is the same amount, despite the fact that some months are shorter and others are longer in your calendar (but not in the calendar used by your bank).
SQL Standard Temporal Types
In the SQL Standard, let’s assume SQL-99, there are three temporal data types: TIMESTAMP, DATE and TIME. The Oracle TIMESTAMP that was added in Oracle 9 are reasonably well in line with Standard SQL-99.
The SQL Standard TIME and TIMESTAMP data types have a number of attributes, namely a precision, in terms of fractional seconds, and whether TIME ZONES are used or not. Which brings us to the issue of time zones. Are they an issue? Of course. But it would probably be even worse without them.
Time Zones and Trains
I guess you are wondering what trains have to do with all this and with relational databases in particular? Well, trains are nice and fun and the same goes for relational databases, right? Or maybe not. No, there is an aspect of temporal data that is related to trains. Let’s go back a few years in time to when trains were all new and hot, which is around the year 1996. No wait, that was Netscape. We are talking about is the mid-1800s when trains caused people to travel a lot more and to travel much longer distances. There were no time zones though, so noon was different for every station on the railroad, which was when the sun was at the zenith at that particular station. This meant that if you traveled for 1 hour, you might end up 53 minutes away from your origin.
In 1884, it was determined that we should all have one meridian (the one in Greenwich) and that we should have 24 time zones spread around the globe. And this would be a world standard and as such enforced, you better follow this, or else…
As we all know, standards are universally accepted more or less always and they are also backward compatible. And this is why the SQL standard works — so that all SQL databases can talk to each other, all SCSI connectors (remember SCSI? I do, but then I’m an old fart) fit any other SCSI connector, and, for a more modern example of a truly successful standard, all HDMI cables and connectors works with all other HDMI cables, connectors, screens, players and what have you.
This explains why the good intentions of having 24 time zones 1 hour apart around the globe isn’t really followed. In particular, India and Australia screw things up with 15, 30 and 45 minute time zones.
MariaDB Temporal Datatypes and Time Zones In Practice
And what has this got to do with relational databases, you ask? Well, there is one difference between TIMESTAMP and DATETIME in MariaDB that we haven’t mentioned so far, and that is timezone support. The MariaDB TIMESTAMP data type supports time zones, which the DATETIME datatypes does not. What does this mean then? Well, it means that a TIMESTAMP data values is stored together with the time zone of the value. To explain what this means, let’s look at an example, first we need a table to insert some data into for our tests:
MariaDB> CREATE TABLE temporaltest (timestamp_t timestamp, datetime_t datetime);
Before moving on from this, let’s look at how MariaDB interprets this table definition:
MariaDB> SHOW CREATE TABLE temporaltest\G *************************** 1. row *************************** Table: temporaltest Create Table: CREATE TABLE `temporaltest` ( `timestamp_t` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `datetime_t` datetime DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1
As you can see, MariaDB adds a few things to the simple TIMESTAMP column, like a NOT NULL clause that we didn’t specify and a default value that we didn’t ask for. This is for backward compatibility — to align more recent MariaDB TIMESTAMP column semantics with how a TIMESTAMP used to work way back during the Harding administration. Before we move on, let’s set the timezone of your server. The default for MariaDB is to use the timezone as defined by the host operating system, but in many cases, that is not a good idea in production use. To be able to set the time zone with MariaDB, we first have to import time zone information into MariaDB, and that is done by running this from the command line:
$ mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p
This assumes that the zoneinfo file is at the default location — if you have installed MariaDB from a tarball, it will be somewhere in that path. With that in place, let’s set the timezone:
MariaDB> SHOW VARIABLES LIKE 'time_zone'; +---------------+--------+ | Variable_name | Value | +---------------+--------+ | time_zone | SYSTEM | +---------------+--------+ 1 row in set (0.00 sec) MariaDB> SET time_zone = 'PST8PDT'; Query OK, 0 rows affected (0.00 sec)
So now we are running with the PST timezone instead of the one defined by the operating system. So far so good. Let’s then insert some data into the table we created above:
MariaDB> INSERT INTO temporaltest VALUES('2015-03-08 02:05:00', '2015-03-08 02:05:00'); MariaDB> SELECT * FROM temporaltest; +---------------------+---------------------+ | timestamp_t | datetime_t | +---------------------+---------------------+ | 2015-03-08 02:05:00 | 2015-03-08 02:05:00 | +---------------------+---------------------+ 1 row in set (0.00 sec)
OK, that looks fine, right? Now, let’s assume that we move this server to the east coast and set the time zone as appropriate for that and select the above data again:
MariaDB> SET time_zone = 'EST'; Query OK, 0 rows affected (0.00 sec) MariaDB> SELECT * FROM temporaltest; +---------------------+---------------------+ | timestamp_t | datetime_t | +---------------------+---------------------+ | 2015-03-07 21:05:00 | 2015-03-08 02:05:00 | +---------------------+---------------------+ 1 row in set (0.00 sec)
As you can see, with a different timezone we get data back adjusted to the timezone different for the TIMESTAMP column, but not for the DATETIME column. That does make a difference, right? If you run with MariaDB clients in different time zones, all those clients may well insert data using different time zones! If the server and the client are in different time zones, the data is converted to the timezone of the server and when retrieving data it is converted back to that of the client. Seems fair, right.
Maybe this thing with time zones wasn’t such a bad and difficult thing after all? Yes, it can be handled, but that was before Germany was about to run out of electricity and in an attempt to fix that caused years of suffering to cows and programmers across the globe. That story will be told in the next part of this series of blogs, though, so don’t touch that dial, I’ll be right back.