9 of the Most Common Mistakes in Database Design
When designing a database, you want to avoid making these mistakes.
Join the DZone community and get the full member experience.Join For Free
As a database designer, when you are tasked with a database project, you can expect to run into a couple of challenges during the design process and after the database is deployed to production.
Some of these problems are unavoidable and outside your control. However, a number of them can be traced back to the quality of the database design itself. The decisions you make at this preliminary phase can have a profound impact on how well the database eventually works. The following are some of the most common mistakes of database design.
If you are building a house, you wouldn’t hire a contractor and immediately demand they start laying the foundation within an hour. That would be courting disaster. At a minimum, you’ll need to agree on the house plans and blueprints. It’s not any different when it comes to database design. The better your planning, the better the quality of your design output.
A good database is the result of careful forethought and not an aggregation of ad hoc ideas. Poor design planning can lead to structural problems that would be expensive to unwind once the database has been rolled out. It’s not always possible to anticipate every problem your database will run into but planning ensures you can reduce these to only those that are truly inevitable.
Failure to Understand the Purpose of the Data
Databases are created for a wide range of purposes. From tiny databases that store an individual’s personal data to massive enterprise databases that handle vast volumes of information. The designer must understand the purpose of the database in order to design it in a way that is optimally aligned with these objectives.
Critical questions to ask include the nature of the data, how it is obtained, how frequently it is stored and retrieved, its volume, and what applications will use it. A database where data is manually keyed in at the end of the business day will not thrive under the same design model as a sophisticated industrial database where data is captured and stored automatically and in real time.
The key is in settling on a design that ensures data efficiency, usability, and security (see PostgreSQL security). Ignoring the purpose of the data will lead to a design that ticks all the right boxes but is practically unsound.
Database design isn’t a rigidly deterministic process. Two developers could follow the very same rules of design but still end up with starkly different data layouts. That’s largely because of the inherent place of creativity in any software engineering project. Nevertheless, there are certain core principles of design that are vital in ensuring the database works optimally. One of these principles is normalization. Normalization refers to the techniques used to disaggregate tables into constituent parts. You do it until you have each table representing just one thing while the columns describe the attributes of the item the table represents. Normalization is an old computing concept and has been around for more than 3 decades. In fact, SQL was primarily created to read and manipulate normalized datasets. To understand normalization, it would thus be helpful to look at how SQL works.
SQL is an inherently additive language that is geared toward easily building up a set of results or values. Using the FROM clause, you can extract data from one table and use JOIN to add it to the contents of another table. You can work with a nearly unlimited number of tables to produce the type of data you need. The additive power of SQL is vital for database development and performance.
Indexes work best when they can be synchronized with the key value in entirety. When you must use LIKE, CHARINDEX, SUBSTRING and similar commands, to parse a value combined with values in one column, the SQL paradigm begins to disintegrate and the data is ever less searchable.
Normalizing your database is therefore critical for ease of development and consistently high performance. Nevertheless, there are levels to normalization and there’s such a thing as an over normalized database. Good normalization balances the demands of record inserting, updating, querying and deleting. The most widely accepted best practice is that databases must at the minimum be normalized to the third Normal Form (3NF). However, the fourth (4NF) and fifth (5NF) can be quite useful, are easy to understand and will be worth the effort once you know how to work with them.
Redundant tables and fields are a nightmare for database designers and administrators. They draw on system resources in order to keep them secure, current and backed up. Redundant records may not seem like much when you are talking about just a dozen or so. But in large databases where redundant fields could number thousands or millions, the computing resource overheads are substantial. They unnecessarily increase the size of the database thus reducing efficiency and increasing the risk of data corruption.
Of course, there are times where redundancy may be necessary but this should be the exception and not the rule. Even when redundancy is allowed, the reasons should be clearly documented to ensure removal by future database administrators when the reasons are no longer valid.
There are times when a user or application may need to query numerous columns of a table. As the number of row records in the table grows, the time it takes for these queries to complete will steadily rise. To speed up the queries and reduce the impact of overall table size, it’s prudent that you index the table columns so that the entries in each are almost immediately available when a SELECT query is invoked.
Unfortunately, speeding up the SELECT function usually results in a deterioration of the more routine INSERT, UPDATE and DELETE commands. This is largely because indexes themselves have to be constantly synchronized to the content of the database which in turn means substantial database engine overheads. Ironically, therefore, your attempts at expediting the SELECT queries may lead to a slower database overall. This is a classic case of over indexing.
This problem can be resolved by having just one index for all columns and that is distinct from the primary key used to query the table. You can also order the columns from the most to the least used. Indexing is always a delicate balance and it comes down to getting it right.
A Single Table for All Domain Values
An all-encompassing domain table isn’t the best approach for database design. Remember that relational databases are built around the idea that each object in the database is representative of just one thing. There should be no ambiguity over what any data set refers to. By navigating through the primary key, table name, column name, and relationships, one should quickly decipher what a data set means. Still, a lingering misconception around database design is that the more the tables, the more confusing and complex the database will be.
This is often the rationale for condensing several tables into one table on the assumption that it will simplify the design. It sounds like a good idea but usually ends in an inefficient and unwieldy database. The SQL code will be longwinded, difficult to read and unnatural. It will effectively be mixing apples and oranges. At first glance, domain tables look like an abstract container of text. This is true from an implementation standpoint but isn’t the best way to design a database.
As part of the normalization process, the isolating and breaking down of data ends in every row representing just one thing. And each domain table is distinct from all other domain tables.
The end result of multiple domain tables is:
- It becomes much easier to use the data in queries
- Data can be more naturally validated with foreign key constraints, something that is impractical for a single domain table design. You could do it with the single domain table but the keys required for each table would make maintenance a minefield.
- Whenever you need to add more data about a certain object, the task is as simple as adding one or more columns.
- Small-domain tables will fit into a single page of the hard disk, unlike a large domain table that will likely be sprawled across multiple disk sections. Having the tables in a single page means extracting data can be achieved with a single disk read.
- Having multiple domain tables doesn’t prevent you from using one editor for all rows. The domain tables most probably have the same underlying usage/structure.
Poor or Inconsistent Naming Conventions
Database designers and developers often see their role as entirely a technical one. Non-technical aspects like adherence to naming conventions tend to be pushed to the lower rungs of the list of priorities or even completely ignored. This can be a catastrophic mistake.
Naming may be at the designer’s discretion but it is, in fact, the first and most important element of database documentation (we’ll explore documentation mistakes in the next point). Database designers should see their work as something that will live long after they have moved to a different employer or role. Naming conventions are meant to make it easier for someone who wasn’t involved in the project at all to quickly understand the content of tables and columns. No future administrator, programmer, or user should have to wade through a 1000-page document to understand what a certain table or column name means. Exact details on how one should name their tables aren’t unanimously agreed on by the industry.
What’s most important though is consistency. Once you follow a specific style of naming your objects, stick to it throughout the database. As much as possible, table names must be a full or contracted description of what the table represents while each column name should quickly make it clear what information it represents. For simple databases, this isn’t hard to do. However, things can get complicated once you build tables that reference each other. Strictly following naming conventions is always the right way to go through.
Such convention includes having no character limit to the length of column or table names in order to eliminate the need to use acronyms that aren’t easily understood or remembered. Consider the column name CUST_DSCR. Anyone who reads that will have to make wild guesses on what that column contains. CUSTOMER_DESCRIPTION would be a far better column name and doesn’t force the reader to stretch their imagination.
Avoid redundancy — in a table named ‘Students,’ you don’t need to have columns labeled StudentName, StudentAddress or StudentGrade when Name, Address, and Grade will suffice. Also, don’t use reserved words. Labeling a column ‘Index’ can be confusing and be a source of errors. Instead, have a descriptive prefix such as StudentIndex.
If database developers and designers have a problem prioritizing naming conventions, they have a far bigger issue with documentation. To a developer, documentation sometimes feels like a trivial non-essential aspect of the development process. Yet, many otherwise excellently designed databases have died on the altar of poor documentation. Poor documentation greatly inhibits troubleshooting, structural improvement, upgrades, and continuity.
Database designers must always imagine that they will at some point no longer be involved in the support of the database. The documentation should make it easy for someone else to take over the database design, development or administration. Good documentation must contain definitions of columns, tables, relationships, and constraints that make it clear how each element is meant to be used. You’ll have a greater impact if you can include samples that illustrate what values are expected.
Some designers will use poor documentation as a means of ensuring job security i.e. no one else but them can fully understand the database. This is a shortsighted and doomed strategy since it almost always leads to management seeing through the designer’s intentions. Poor documentation also makes it harder for you as the designer to return years later to rework and improve the code.
You can meticulously follow through all the steps required to design a world-class database. However, you’ll be taking a blind leap into the dark if you don’t subject the database to rigorous testing. Unfortunately, the testing phase is what suffers the most when a project is running late. It’s self-defeating though because a database that’s quickly rushed through will immediately be bogged down by bugs and inconsistencies that would have easily been identified and resolved during the testing phase.
A bug-filled database won’t endear itself to users and administrators, a reputation pit that you’ll struggle to claw yourself out of even when the bugs are eventually fixed. When deep and expansive testing is done before the database goes live, it greatly reduces the number and scale of failures after deployment to production. Good testing will not find every single bug but it certainly helps you get rid of most of them.
Database development and design are at the core of any data-intensive project including nearly all business applications. The design process should therefore always be viewed in this context. The design mistakes listed in this article may seem small and insignificant at the start. Eventually, though, they greatly deteriorate database performance and are costly to fix. By making sure you get things right from the get-go, you increase the odds of building a database that is well suited to its intended purpose.
Published at DZone with permission of Mokhtar Ebrahim, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.