When using a relational database, normalization can help keep the data free of errors and can also help ensure that the size of the database doesn't grow large with duplicated data. At the same time, some types of operations can be slower in a normalized environment. So, when should you normalize and when is it better to proceed without normalization?
What Is Normalization?
Database normalization is the process of organizing data within a database so that each piece of information is stored in only one place, which reduces redundancy. For example, you likely do not want a username stored in several different tables within your database when you could store it in a single location and point to that user via an ID instead.
By keeping the unchanging user ID in the various tables that need the user, you can always point it back to the appropriate table to get the current username, which is stored in only a single location. Any updates to the username occur only in that place, making the data more reliable.
An example of using the first normal form. Source: ChatterBox's .NET.
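The username example above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended schema: the `users` and `posts` tables, their columns, and the helper functions are all hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical schema: the username lives only in `users`;
# `posts` refers to it through the unchanging user ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body TEXT NOT NULL
    );
    INSERT INTO users (id, username) VALUES (1, 'alice');
    INSERT INTO posts (user_id, body) VALUES (1, 'first post'), (1, 'second post');
""")


def rename_user(conn, user_id, new_name):
    """Change the username in its single storage location."""
    cur = conn.execute(
        "UPDATE users SET username = ? WHERE id = ?", (new_name, user_id)
    )
    return cur.rowcount  # number of rows the update touched


def post_authors(conn):
    """Resolve each post's current username through the user ID."""
    rows = conn.execute(
        "SELECT posts.body, users.username FROM posts "
        "JOIN users ON users.id = posts.user_id ORDER BY posts.id"
    )
    return rows.fetchall()


rows_changed = rename_user(conn, 1, "alice_b")
authors = post_authors(conn)
```

After the rename, every post resolves to the new username even though only one row was updated, because the posts hold the ID rather than a copy of the name.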
What Is Good About Database Normalization?
A normalized database is advantageous when operations will be write-intensive or when ACID compliance is required. Some advantages include:
- Updates run quickly because no data is duplicated in multiple locations.
- Inserts run quickly since each piece of data has a single insertion point and no duplicate copies need to be written.
- Tables are typically smaller than the tables found in non-normalized databases. Smaller tables are more likely to fit into the database's in-memory buffer, which offers faster performance.
- Data integrity and consistency are an absolute must if the database must be ACID compliant. A normalized database helps immensely with such an undertaking.
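To make the update advantage concrete, the sketch below counts how many rows a username change touches in a denormalized table versus a normalized pair of tables. The table names (`orders_flat`, `users`, `orders`) and the sample data are assumptions made up for this illustration, and it uses SQLite in memory rather than any particular production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized (hypothetical): the username is copied into every order row.
    CREATE TABLE orders_flat (id INTEGER PRIMARY KEY, username TEXT, item TEXT);
    INSERT INTO orders_flat (username, item) VALUES
        ('alice', 'book'), ('alice', 'pen'), ('alice', 'lamp');

    -- Normalized (hypothetical): the username is stored once and referenced by ID.
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        item TEXT
    );
    INSERT INTO users (id, username) VALUES (1, 'alice');
    INSERT INTO orders (user_id, item) VALUES (1, 'book'), (1, 'pen'), (1, 'lamp');
""")

# Renaming the user in the flat table rewrites every copy of the name.
flat_rows = conn.execute(
    "UPDATE orders_flat SET username = 'alice_b' WHERE username = 'alice'"
).rowcount

# Renaming the user in the normalized schema rewrites a single row.
normalized_rows = conn.execute(
    "UPDATE users SET username = 'alice_b' WHERE id = 1"
).rowcount
```

With three orders, the flat table needs three row updates while the normalized schema needs one. The gap grows with the number of duplicated copies, and a missed copy in the flat table would leave the data inconsistent.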
What Are the Drawbacks of Database Normalization?
A normalized database is less advantageous when an application is read-intensive. Here are some of the disadvantages of normalization:
- Since data is not duplicated, table joins are required. This makes queries more complicated, and thus read times are slower.
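- Since joins are required, indexing does not work as efficiently. A query that spans several tables typically cannot be answered from a single index, so each joined table needs a suitable index of its own, and read times are slower when one is missing or poorly chosen.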
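The read-side trade-off can be sketched as follows: the same result is fetched once through a join on normalized tables and once from a single denormalized table. All table, column, and index names here are hypothetical, and the example only shows the difference in query shape. It does not measure performance, which depends on the database engine and data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized tables (hypothetical names); reads need a join.
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        item TEXT
    );
    -- An index on the join column, so the join does not rely on scanning orders.
    CREATE INDEX idx_orders_user_id ON orders(user_id);

    -- Denormalized table (hypothetical); the username is duplicated per row.
    CREATE TABLE orders_flat (id INTEGER PRIMARY KEY, username TEXT, item TEXT);

    INSERT INTO users (id, username) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders (user_id, item) VALUES (1, 'book'), (2, 'pen');
    INSERT INTO orders_flat (username, item) VALUES ('alice', 'book'), ('bob', 'pen');
""")

# Normalized read: two tables must be combined to show who ordered what.
normalized_read = conn.execute(
    "SELECT users.username, orders.item FROM orders "
    "JOIN users ON users.id = orders.user_id ORDER BY orders.id"
).fetchall()

# Denormalized read: everything needed is already in one row.
flat_read = conn.execute(
    "SELECT username, item FROM orders_flat ORDER BY id"
).fetchall()
```

Both queries return the same rows. The normalized version pays for its single source of truth with a more complex query, while the flat version pays with duplicated usernames.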
What If the Application Is Read-Intensive and Write-Intensive?
In some cases, it isn't clear that one strategy should be used over the other. Some applications need both normalized and non-normalized data to work as efficiently as possible.
In such cases, companies will often use more than one database: a relational database such as MySQL for ACID-compliant and write-intensive operations, and a NoSQL database such as MongoDB for read-intensive operations on data where duplication is not as big of an issue.
What NoSQL and SQL databases do well. Source: SlideShare.
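One way to picture the two-database arrangement is a normalized store that handles writes alongside a denormalized copy shaped for reads. The sketch below is only an illustration under loud assumptions: SQLite stands in for the relational database, a plain Python dictionary stands in for the document store (no MongoDB API is used), and the function names and the refresh-on-every-write strategy are hypothetical choices, not a description of how any particular company does it.

```python
import sqlite3

# Write side: normalized relational tables (SQLite as a stand-in).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        item TEXT
    );
""")

# Read side: a dict standing in for a document store,
# holding one denormalized document per user.
read_model = {}


def refresh_user_document(user_id):
    """Rebuild the user's read-side document from the normalized tables."""
    username = conn.execute(
        "SELECT username FROM users WHERE id = ?", (user_id,)
    ).fetchone()[0]
    items = [
        row[0]
        for row in conn.execute(
            "SELECT item FROM orders WHERE user_id = ? ORDER BY id", (user_id,)
        )
    ]
    read_model[user_id] = {"username": username, "orders": items}


def add_user(user_id, username):
    with conn:  # commits the insert as one transaction
        conn.execute(
            "INSERT INTO users (id, username) VALUES (?, ?)", (user_id, username)
        )
    refresh_user_document(user_id)


def add_order(user_id, item):
    with conn:
        conn.execute(
            "INSERT INTO orders (user_id, item) VALUES (?, ?)", (user_id, item)
        )
    refresh_user_document(user_id)


add_user(1, "alice")
add_order(1, "book")
add_order(1, "pen")
profile = read_model[1]  # a read needs no join
```

Writes go through the normalized tables, where each value is stored once, and reads come from a ready-made document that duplicates the username next to the orders. In a real system the two stores would be separate services, and keeping them in sync is a design problem of its own that this sketch does not address.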