Database Fundamentals #8: All About Data Types
Database Fundamentals #8: All About Data Types
It's important to know the ins and outs of each of the myriad data types available in SQL Server. Look at those data types to see what you should use when designing your database.
Join the DZone community and get the full member experience.Join For Free
Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph™ is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.
SQL Server provides all different kinds of data types in order to give you more flexibility and control over how you store your data. You define these data types as you define the columns in your tables. If you wanted to store information about the date and time that a purchase was made online, you’re going to store the values in a column (or columns) that define dates and times in order to ensure accuracy in your data.
Choosing a Data Type
You could make the data type into one that stores just about anything you give it, such as one of the character types like char or varchar. But do this and you’re going to run into issues when you start running queries against that data. For example, your business may ask for the amount of time between purchases. Since you chose to store the date and time as character values, you won’t be able to use some of the functions provided by the date and time data types. Instead, you’ll have to either convert your data into date and time data types or you’ll have to write your own functions. But, if you used the datetime data types, you get all that functionality and more, such as date/time formatting for different countries and validation that the date and time you’re entering are valid dates and times, as well as Universal Time Code (UTC) values and offsets. The same thing is true of all the various data types. You don’t have to use a specific data type, but you’re sacrificing functionality if you don’t try to specify a data type appropriate to the business need.
Each data type takes up a different amount of space on your system. The measure of space for these data types at the lowest level is measure in bytes. A byte basically defines the amount of storage needed to define a single character. A value like "HORSE" would consist of five bytes. It can be much more complicated than this, but in general, this rule will hold up well. You want to try to store your information in the most efficient manner you can. This means, as much as possible, using the smallest possible data type.
Types of Data
Breaking down the data types into several different categories will make it easier to talk about them. You can think about these as some general types of data available to you in SQL Server:
|Data Type Category||Primary Use|
|Textual Data||For storing words, numbers, and symbols, both large and small.|
|Numeric Data||Stores numbers and only numbers can be integers, decimals, or floating point values.|
|Date and Time||Use to keep date and time information so that special calculations can be performed on that data.|
|Binary Data||For storing computer information such as Microsoft Word documents or even programs.|
|Other Data Types||Speciality data types described separately.|
Each of these data types stores a different kind of information and there are different rules and functions associated with them. Describing everything about these types of data is outside the scope of this blog, but you need to have a good understanding of these types before you can use them appropriately within your own tables.
Textual data represents any kind of information that can be brought into your system and displayed in a human readable form. That includes information like the words in this sentence, numbers, dates, times, just about anything you can think of that is readable by people. But it’s important to note that this information is stored as textual data and not as any of those other things. This data can be broken down into two general types: fixed length, and variable length data.
Fixed-Length Textual Data
Fixed-length textual data stores characters of a specified length. Regardless of the data that you enter, the fixed length data will always include enough blank values to completely fill the space. For example, if you create a column that has a length of ten and you put the value "DOG" into that column, SQL Server will automatically fill in seven blanks, which display as spaces, into the column. That’s because the length is fixed, so it must always be exactly what it says. This would be used for very stringent text requirements for values that must fill in a particular space. This is actually a somewhat more old-fashioned method for storing information so that you can make a series of columns of the data all the same length. Data types include CHAR and NCHAR.
These data types must have a length specified. The difference between the two is that the NCHAR data type allows you to store Unicode information. This allows you to define the character and the character set when storing it. By character set, we’re referring to foreign language characters such as Kanji. Fixed-length data can only be a maximum of 8,000 bytes. This means for a CHAR field, you can have 8,000 characters. For a NCHAR field, because you have to define the character set, you can only have 4,000 characters.
Variable-Length Textual Data
Variable-length data is a much more flexible means of storing textual data. You can store up to 2GB of information for each column. That’s a lot of text. But you’re not always going to want to store that much information. In fact, usually, you’re going to want to put limits on the size of information stored so that you have more control. Having a person’s last name be 2GB long doesn’t make any sense. So, you can put a limit on variable length textual data. This means the information stored will only be as long as what you define it, up to the limit defined. For example, if you define a variable length field of having a maximum of four values when you put the value "DOG" into it, you’re only ever storing three characters (or six for Unicode). Variable length textual data types include VARCHAR, NVARCHAR, TEXT, and NTEXT.
The last two data types, TEXT and NTEXT, are included for backward compatibility. They should not be used in any new databases you create since they’re scheduled for deprecation in an upcoming version of SQL Server. The Unicode storage is much the same as it is in the fixed length textual data. You can define a limit on variable length textual data up to 8,000; after that, things are stored differently so you can’t put a limit. For Unicode data, this would be 4,000. To store data past the 8,000/4,000 limit, use the keyword "MAX" when defining the column so you can store up to 2GB in that column.
VARCHAR and NVARCHAR are the most frequently used textual data types.
While you can store numbers within textual data, you’ll get a lot more functionality by using one of the many numeric data types. Numeric data types offer much more efficient methods for storing information than simply storing numbers as text. You can store more numbers in a smaller space by using these data types. You also get all sorts of mathematical functionality with numbers that you just don’t get with textual data. Numeric data types can be generalized between the precise numbers and the approximate numbers.
These numbers are of fixed-length and fixed decimal places. There is actually quite a bit of difference between the non-decimal numbers and the decimal numbers. The integer values will not allow for any type of decimal notation. This means you get numbers 0, 1, 2, -1, and -2 but you can’t store 1.1. The decimal numbers allow you to store both an integer and a decimal, but the integer is stored as an implied decimal, meaning 1.0 instead of 1. Basically, use the type of number that makes the most sense for the business need. For example, counting the number of online purchases made would be a whole, round number and probably use one of the fixed-length, integer data types. But the purchase values themselves would probably be stored as a decimal value.
The integer data types are listed along with the allowable values within them:
|BIT||1 byte||0, 1|
|TINYINT||1 byte||0 – 255|
|SMALLINT||2 bytes||-32,768 – 32,768|
|INT||4 bytes||-2,147,483,648 – 2,147,483,648|
|BIGINT||8 bytes||-9,223,372,036,854,775,808 – 9,223,372,036,854,775,808|
Each of these types stores more and more information, but you can see that they are storing the information much more efficiently than if you made the values text. For example, to type in a BIGINT data type into a VARCHAR you’d need 19 bytes instead of the 8 needed for the appropriate numeric data type.
The decimal data types are defined differently than the integer data types. You first define the precision. This represents the number of digits, in total, that will be stored. Depending on the precision, the number will take up more space. You then have to define the scale. The scale represents the number of digits to the right of the decimal point. This can be confusing. For example, if you want to define a number of like 312.213, you would give it a precision of 6 because it is six digits in total, and a scale of 3 because there are three digits to the right of the decimal point. You can store up to 38 digits, which means a maximum precision of 38. These are the decimal data types: DECIMAL and NUMERIC.
This is how setting different precision values affects storage size of the data types:
In addition to the general decimal types, there are also two types that are dedicated to storing values associated with money. The unique functionality they offer is that they will display as the appropriate money system that you define for them. However, they are much more limited in terms of size and flexibility so are not used that often. These types are SMALLMONEY and MONEY.
The SMALLMONEY data type stores values between -214,748.3648 and 214,748.3647 and is 4 bytes in size. MONEY expands the values to -922,337,203,684,477.5808 to 922,337,203,685,477.5807 with a size of 8 bytes.
These numbers use floating point storage and are therefore not considered to be completely accurate. In theory, as you modify the values, the floating point decimal could cause rounding errors. This is seldom evident, but might be for extremely high number values. The data types and their storage is defined as:
|Real||3.40E + 38 to -1.18E – 38, 0 and 1.18E – 38 to 3.40E + 38||4|
|Float(n)||– 1.79E+308 to -2.23E-308, 0 and 2.23E-308 to 1.79E+308||1-24||7||4|
These data types are used primarily for scientific types of data storage and use of them to store monetary values are not encourage
Date and Time Data
Date and time data types can be some of the more complicated to use and understand. While everyone gets the idea of a date or the time of day, the fact that these data types store information in ways that seem completely unrelated to the date and time that people thought they were recording leads to confusion. This is primarily the result of forgetting that you can store a date, a time, or a date and time. When storing a date and time, just because you didn’t supply either the date or the time value doesn’t mean that one wasn’t supplied. Depending on how you define the information when you store it, you could end up with interesting results.
The key is to use the right data type for the job. Here are time and date data types, their sizes, and uses:
|DATE||3||January 1, 1 CE – December 31, 9999 CE||Stores date values within the range specified.|
|DATETIME||8||January 1, 1753 – December 31, 9999 and times 00:00:00 – 23:59:59.997||Stores date and time values down to fractions of a second within the range specified.|
|SMALLDATETIME||4||January 1, 1900 – June 6, 2079 and times 00:00:00 – 23:59:59||Stores date and time values down to the second within the range specified.|
|DATETIME2(n)||6-8||January 1, 1 CE – DECEMBER 31, 9999 CE and times 00:00:00 – 23:59:59.9999999||A much wider range of time values for more accurate storage than that represented by the standard DATETIME data type. Scale can vary by changing (n).|
|TIME(n)||3-5||00:00:00 – 23:59:59.9999999||Stores time values down to the fraction of a second. Scale can vary by changing the value of (n).|
|DATETIMEOFFSET(n)||10||January 1, 1 CE – December 31, 9999 CE and times 00:00:00 – 23:59:59.9999999||Stores date and time values down to the fraction of a second as well as a time zone for that date based on an offset from GMT.|
As you can see, there is quite a large number of ways you can store date values. It’s always best to store the smallest data type for the type of values needed. Further, with dates and times, you need to be sure whether you need both date and time or simply need one or the other. With all that information, you can make a more informed choice.
Binary is a special data type for storing formatted, machine-readable binary information. You can store it as fixed length or variable length values, similar to the textual data type. This data is primarily used for special types of document management applications or storing encrypted data. You’ll seldom need to use this for most business applications. The data types are BINARY and VARBINARY.
The size of the data depends on the size defined. Like with textual data, you can store up to 8,000 bytes in the fixed data types and up to 2GB in the VARBINARY as long as you use the MAX keyword to define the data type.
Other Data Types
There are a number of special data types within SQL Server for dealing with very particular types of information. Detailing these will take too long, but you should be aware that some of these exist in order to take advantage of the special opportunities they offer. The special data types include the following.
This data type is specially designed to hold well-formed XML information. This means you can define an XML fragment or an entire XML document and have the formatting enforced by the data type. This may be useful if you need to deal with XML information directly.
This data type creates a special hexadecimal value based on information from your computer to make what should be a globally unique value that you can use to identify information. This is especially attractive when you would prefer to create unique values on the client side of an application rather than directly within the database. But these are rather large values, coming in at 16 bytes, so you want to be careful about how and where you use them.
The HIERARCHYID data type is used to create an identifier within a hierarchy such as a relationship tree used when setting up a business hierarchy or similar activity. The data type doesn’t create a tree, but it does store a location within a specified hierarchy.
This stores either geographic or geospatial information so that you can define shapes, points, and lines as part of a drawing or on a map. This offers a whole range of functionality relating to tieing database information to locations. Many businesses find this type of data extremely useful.
That is an overview of the types of information that you can store. With this information in hand, you can go on to create several tables in the database. Just remember that you really should pick the right data type for the right situation. Cramming the wrong data into the wrong data type just leads to problems and lots of rework.
Published at DZone with permission of Grant Fritchey , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.