DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Useful System Table Queries in Relational Databases
  • Why Should Databases Go Natural?
  • SQL Interview Preparation Series: Mastering Questions and Answers Quickly
  • Query-First Approach in Cassandra

Trending

  • Driving DevOps With Smart, Scalable Testing
  • Advancing Robot Vision and Control
  • AI-Driven Root Cause Analysis in SRE: Enhancing Incident Resolution
  • How to Perform Custom Error Handling With ANTLR
  1. DZone
  2. Data Engineering
  3. Databases
  4. 9 of the Most Common Mistakes in Database Design

9 of the Most Common Mistakes in Database Design

When designing a database, you want to avoid making these mistakes.

By 
Mokhtar Ebrahim user avatar
Mokhtar Ebrahim
·
Jan. 21, 19 · Opinion
Likes (6)
Comment
Save
Tweet
Share
26.3K Views

Join the DZone community and get the full member experience.

Join For Free

As a database designer, when you are tasked with a database project, you can expect to run into a couple of challenges during the design process and after the database is deployed to production.

Some of these problems are unavoidable and outside your control. However, a number of them can be traced back to the quality of the database design itself. The decisions you make at this preliminary phase can have a profound impact on how well the database eventually works. The following are some of the most common mistakes of database design.

Poor Preplanning

If you are building a house, you wouldn’t hire a contractor and immediately demand they start laying the foundation within an hour. That would be courting disaster. At a minimum, you’ll need to agree on the house plans and blueprints. It’s not any different when it comes to database design. The better your planning, the better the quality of your design output.

A good database is the result of careful forethought and not an aggregation of ad hoc ideas. Poor design planning can lead to structural problems that would be expensive to unwind once the database has been rolled out. It’s not always possible to anticipate every problem your database will run into but planning ensures you can reduce these to only those that are truly inevitable.

Failure to Understand the Purpose of the Data

Databases are created for a wide range of purposes. From tiny databases that store an individual’s personal data to massive enterprise databases that handle vast volumes of information. The designer must understand the purpose of the database in order to design it in a way that is optimally aligned with these objectives.

Critical questions to ask include the nature of the data, how it is obtained, how frequently it is stored and retrieved, its volume, and what applications will use it. A database where data is manually keyed in at the end of the business day will not thrive under the same design model as a sophisticated industrial database where data is captured and stored automatically and in real time.

The key is in settling on a design that ensures data efficiency, usability, and security (see PostgreSQL security). Ignoring the purpose of the data will lead to a design that ticks all the right boxes but is practically unsound.

Inadequate Normalization

Database design isn’t a rigidly deterministic process. Two developers could follow the very same rules of design but still end up with starkly different data layouts. That’s largely because of the inherent place of creativity in any software engineering project. Nevertheless, there are certain core principles of design that are vital in ensuring the database works optimally. One of these principles is normalization. Normalization refers to the techniques used to disaggregate tables into constituent parts. You do it until you have each table representing just one thing while the columns describe the attributes of the item the table represents. Normalization is an old computing concept and has been around for more than 3 decades. In fact, SQL was primarily created to read and manipulate normalized datasets. To understand normalization, it would thus be helpful to look at how SQL works.

SQL is an inherently additive language that is geared toward easily building up a set of results or values. Using the FROM clause, you can extract data from one table and use JOIN to add it to the contents of another table. You can work with a nearly unlimited number of tables to produce the type of data you need. The additive power of SQL is vital for database development and performance.

Indexes work best when they can be synchronized with the key value in entirety. When you must use LIKE, CHARINDEX, SUBSTRING and similar commands, to parse a value combined with values in one column, the SQL paradigm begins to disintegrate and the data is ever less searchable.

Normalizing your database is therefore critical for ease of development and consistently high performance. Nevertheless, there are levels to normalization and there’s such a thing as an over normalized database. Good normalization balances the demands of record inserting, updating, querying and deleting. The most widely accepted best practice is that databases must at the minimum be normalized to the third Normal Form (3NF). However, the fourth (4NF) and fifth (5NF) can be quite useful, are easy to understand and will be worth the effort once you know how to work with them.

Redundant Records

Redundant tables and fields are a nightmare for database designers and administrators. They draw on system resources in order to keep them secure, current and backed up. Redundant records may not seem like much when you are talking about just a dozen or so. But in large databases where redundant fields could number thousands or millions, the computing resource overheads are substantial. They unnecessarily increase the size of the database thus reducing efficiency and increasing the risk of data corruption.

Of course, there are times where redundancy may be necessary but this should be the exception and not the rule. Even when redundancy is allowed, the reasons should be clearly documented to ensure removal by future database administrators when the reasons are no longer valid.

Poor Indexing

There are times when a user or application may need to query numerous columns of a table. As the number of row records in the table grows, the time it takes for these queries to complete will steadily rise. To speed up the queries and reduce the impact of overall table size, it’s prudent that you index the table columns so that the entries in each are almost immediately available when a SELECT query is invoked.

Unfortunately, speeding up the SELECT function usually results in a deterioration of the more routine INSERT, UPDATE and DELETE commands. This is largely because indexes themselves have to be constantly synchronized to the content of the database which in turn means substantial database engine overheads. Ironically, therefore, your attempts at expediting the SELECT queries may lead to a slower database overall. This is a classic case of over indexing.

This problem can be resolved by having just one index for all columns and that is distinct from the primary key used to query the table. You can also order the columns from the most to the least used. Indexing is always a delicate balance and it comes down to getting it right.

A Single Table for All Domain Values

An all-encompassing domain table isn’t the best approach for database design. Remember that relational databases are built around the idea that each object in the database is representative of just one thing. There should be no ambiguity over what any data set refers to. By navigating through the primary key, table name, column name, and relationships, one should quickly decipher what a data set means. Still, a lingering misconception around database design is that the more the tables, the more confusing and complex the database will be.

This is often the rationale for condensing several tables into one table on the assumption that it will simplify the design. It sounds like a good idea but usually ends in an inefficient and unwieldy database. The SQL code will be longwinded, difficult to read and unnatural. It will effectively be mixing apples and oranges. At first glance, domain tables look like an abstract container of text. This is true from an implementation standpoint but isn’t the best way to design a database.

As part of the normalization process, the isolating and breaking down of data ends in every row representing just one thing. And each domain table is distinct from all other domain tables.

The end result of multiple domain tables is:

  • It becomes much easier to use the data in queries
  • Data can be more naturally validated with foreign key constraints, something that is impractical for a single domain table design. You could do it with the single domain table but the keys required for each table would make maintenance a minefield.
  • Whenever you need to add more data about a certain object, the task is as simple as adding one or more columns.
  • Small-domain tables will fit into a single page of the hard disk, unlike a large domain table that will likely be sprawled across multiple disk sections. Having the tables in a single page means extracting data can be achieved with a single disk read.
  • Having multiple domain tables doesn’t prevent you from using one editor for all rows. The domain tables most probably have the same underlying usage/structure.

Poor or Inconsistent Naming Conventions

Database designers and developers often see their role as entirely a technical one. Non-technical aspects like adherence to naming conventions tend to be pushed to the lower rungs of the list of priorities or even completely ignored. This can be a catastrophic mistake.

Naming may be at the designer’s discretion but it is, in fact, the first and most important element of database documentation (we’ll explore documentation mistakes in the next point). Database designers should see their work as something that will live long after they have moved to a different employer or role. Naming conventions are meant to make it easier for someone who wasn’t involved in the project at all to quickly understand the content of tables and columns. No future administrator, programmer, or user should have to wade through a 1000-page document to understand what a certain table or column name means. Exact details on how one should name their tables aren’t unanimously agreed on by the industry.

What’s most important though is consistency. Once you follow a specific style of naming your objects, stick to it throughout the database. As much as possible, table names must be a full or contracted description of what the table represents while each column name should quickly make it clear what information it represents. For simple databases, this isn’t hard to do. However, things can get complicated once you build tables that reference each other. Strictly following naming conventions is always the right way to go through.

Such convention includes having no character limit to the length of column or table names in order to eliminate the need to use acronyms that aren’t easily understood or remembered. Consider the column name CUST_DSCR. Anyone who reads that will have to make wild guesses on what that column contains. CUSTOMER_DESCRIPTION would be a far better column name and doesn’t force the reader to stretch their imagination.

Avoid redundancy — in a table named ‘Students,’ you don’t need to have columns labeled StudentName, StudentAddress or StudentGrade when Name, Address, and Grade will suffice. Also, don’t use reserved words. Labeling a column ‘Index’ can be confusing and be a source of errors. Instead, have a descriptive prefix such as StudentIndex.

Poor Documentation

If database developers and designers have a problem prioritizing naming conventions, they have a far bigger issue with documentation. To a developer, documentation sometimes feels like a trivial non-essential aspect of the development process. Yet, many otherwise excellently designed databases have died on the altar of poor documentation. Poor documentation greatly inhibits troubleshooting, structural improvement, upgrades, and continuity.

Database designers must always imagine that they will at some point no longer be involved in the support of the database. The documentation should make it easy for someone else to take over the database design, development or administration. Good documentation must contain definitions of columns, tables, relationships, and constraints that make it clear how each element is meant to be used. You’ll have a greater impact if you can include samples that illustrate what values are expected.

Some designers will use poor documentation as a means of ensuring job security i.e. no one else but them can fully understand the database. This is a shortsighted and doomed strategy since it almost always leads to management seeing through the designer’s intentions. Poor documentation also makes it harder for you as the designer to return years later to rework and improve the code.

Inadequate Testing

You can meticulously follow through all the steps required to design a world-class database. However, you’ll be taking a blind leap into the dark if you don’t subject the database to rigorous testing. Unfortunately, the testing phase is what suffers the most when a project is running late. It’s self-defeating though because a database that’s quickly rushed through will immediately be bogged down by bugs and inconsistencies that would have easily been identified and resolved during the testing phase.

A bug-filled database won’t endear itself to users and administrators, a reputation pit that you’ll struggle to claw yourself out of even when the bugs are eventually fixed. When deep and expansive testing is done before the database goes live, it greatly reduces the number and scale of failures after deployment to production. Good testing will not find every single bug but it certainly helps you get rid of most of them.

Conclusion

Database development and design are at the core of any data-intensive project including nearly all business applications. The design process should therefore always be viewed in this context. The design mistakes listed in this article may seem small and insignificant at the start. Eventually, though, they greatly deteriorate database performance and are costly to fix. By making sure you get things right from the get-go, you increase the odds of building a database that is well suited to its intended purpose.

Database design Relational database sql

Published at DZone with permission of Mokhtar Ebrahim, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Useful System Table Queries in Relational Databases
  • Why Should Databases Go Natural?
  • SQL Interview Preparation Series: Mastering Questions and Answers Quickly
  • Query-First Approach in Cassandra

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!