Database Design Best Practices
Database Design Best Practices
Want to learn the best practices of designing a database? Read this article to learn about a few!
Join the DZone community and get the full member experience.Join For Free
RavenDB vs MongoDB: Which is Better? This White Paper compares the two leading NoSQL Document Databases on 9 features to find out which is the best solution for your next project.
In today's world, along with Relational databases, we need to consider Non-Relational (No-SQL) databases as well before we start designing database. For more than four decades now, SQL (Structured Query Languages) databases have been a primary data storage mechanism. NoSQL databases like MongoDB, Cassandra, Redis, etc. are gaining attraction in recent days, although NoSQL has existed since the 1960s. No matter, the primary goal of SQL or NoSQL is to store data, only they differ in approaches to storing, retrieving, etc. Also, no one can expect users to explicitly control concurrency, integrity, consistency, or data type validity. That’s why SQL pays a lot of attention to transactional guarantees, schemas, and referential integrity. The end user is often interested in aggregated reporting information, not in separate data items. Depending on a project's needs, we can choose to use a SQL or a NoSQL. We should also keep in mind that one is not a replacement for the other, it's just an alternative.
Let's see a few best practices to design a database and what all we need to consider in the database design process.
What Information (Data) Needs to Be Stored?
This step is to identify what data needs to be stored. Here, along with a database expert, we need a domain expert.
This requires skill on the part of the database designer to elicit the needed information from those with the domain knowledge. This is because those with the necessary domain knowledge frequently cannot express clearly what their system requirements for the database are as they are unaccustomed to thinking in terms of the discrete data elements that must be stored.
Why Are We Storing These?
It's necessary to know why we need these data to be stored. How is somebody going to use these, and who are they?
What Kind of Query Do We Need to Execute?
How are we going to use this data?
Once a designer is aware of the data that is to be stored in the database, they must then determine where the dependency is within the data.
Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure, which can then be mapped into the storage objects supported by the database management system.
The above mentioned points are mostly valid for any types of databases, but there are few important steps that we need to consider based on the database types (relational or non-relational).
- Identifying entities: An entity is an object in the system that we want to model and store information about.
- Identifying entity attributes: Attributes are the characteristic of an entity.
- Identifying attributes types: The data type of attributes like char, date, number, etc.
- Identifying relations between entities: Relationship is a situation that exists between two database tables. Relationships allow databases to split and store data in different tables while linking disparate data items.
- Assigning keys (Primary, Foreign keys): Key is used for identifying unique rows from a table. It also establishes the relationship among tables.
- Identifying constraints needs to be applied to entities or on columns: Constraints enforce roles for data in a table.
- Normalization: Normalization is the process of organizing data in a database. It makes database more flexible by eliminating redundancy and inconsistent dependency.
Non-Relational (NoSQL) Database
- NoSQL data modeling is typically driven by application-specific access patterns, i.e. the types of queries to be supported.
- This data modeling often requires a deeper understanding of data structures and algorithms.
- Data duplication and denormalization are two keys aspects while designing NoSql.
Denormalization: This is the process of copying the same data into multiple documents or tables in order to simplify or optimize query processing or to fit the user’s data into a particular data model. It allows us to store data in a query-friendly structure to simplify query processing. Normalized data increases query execution time for query-join, especially in a distributed architecture.
Soft Schema: All NoSQL provide soft schema capabilities in one way or another:
- Graph Databases and Key-Value Stores typically do not put constraints on values, so values can be of any format.
- BigTable models support soft schema via a variable set of columns within a column family and a variable number of versions for one cell.
- Document databases are inherently schema-less, although some of them allow one to validate incoming data using a user-defined schema.
Soft schema allows one to form classes of entities with complex internal structures (nested entities) and to vary the structure of particular entities and helps in minimization of one-to-many relationships by means of nested entities and, consequently, reduction of joins.
Joins: Joins are rarely supported in NoSQL. For NoSQL, joins are often handled at design time as opposed to relational models where joins are handled at query execution time. Query time joins almost always mean a performance penalty, but in many cases, joins can be avoided using Denormalization and Aggregates, i.e. embedding nested entities. There will be cases where joins are inevitable and should be handled by an application.
Database Scalability: This is a concept in database design that emphasizes on the capability of a database to handle growth in the amount of data and users. Database systems scaling can be broadly categorized into two areas: vertical scaling and horizontal scaling.
- Vertical scaling: Also known as scaling up, it is the process of adding resources such as memory or more powerful CPUs to an existing server.
- Horizontal scaling: It is the process of adding more hardware to a system. This typically means adding nodes (new servers) to an existing system.
Both of the scalings can be combined, with resources added to existing servers to scale vertically and additional servers added to scale horizontally when required. It is wise to consider the tradeoffs between horizontal and vertical scaling as you consider each approach.
Note: This article only covers a few of the best practices and there are many others that exist. Also, there are many other factors (not covered here) that need to be considered while designing a database. Please feel free to add them in the comment section so that all of us can learn from each other.
Opinions expressed by DZone contributors are their own.