Over a million developers have joined DZone.

The Many Faces of Data Relationships

DZone's Guide to

The Many Faces of Data Relationships

There can be three types of data relationships: one-to-one, one-to-many, and many-to-many. Read on for a brief overview!

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Many-to-many relationships can be complicated, especially when it comes to the data in your tables — that is, unless you understand the different scenarios, their resolutions, and how to build a good relationship with your data.

Just as a general overview, let's for a moment get right back to basics and remind ourselves of what we mean by the terms "database" and "relationship." This will make it easier to visualize the different issues as we come to them.

First of all, the definition of "database." This is just a way of organizing information so that you can easily access, manage, and update it as you need to. As you know, this data is organized into rows, columns, and tables, and it's also indexed so that you can find what you need quickly and easily. So far, so obvious.

When we talk about the "relationship" between different items of data in the database, this specifies the logic used to combine data from one or more tables. You create relationships by connecting fields between two or more tables, and this determines what data is reflected in a dashboard widget (such as a pivot table). There can be three types of relationships:

  1. One-to-one relationship: In this scenario, the field used to connect both tables only has unique values in every row.
  2. One-to-many relationship: In this scenario, one table holds unique values for every row but the other table holds duplicate values for any or all of the corresponding values in the first table.
  3. Many-to-many relationship: In this scenario, on both sides of the table, there are duplicated values, causing excessive calculations for every query run against it.

Understanding an M2M Relationship and Its Consequences

Let's go over the definition and negative consequences.


You can consider the general rule that when a field from two or more tables contains the same value, and these values are duplicated in both tables, a connection has been made based on this field a many-to-many relationship is created.

We know, it's confusing, so here's a business case example: a hotel can have a data table with reservation data and a table with payment data. In both tables, the name of the guest is stored. A guest can have multiple reservations under their name, as well as multiple payments for their stay recorded in their name. If a relationship between the reservation and payment table was created based on the guest's name, a many-to-many relationship would be created (as the guest's name appears multiple times in each table).

Negative Consequences

The problem with this kind of relationship is that it can create complex data sets which either: do not return the correct results, or use excessive computing resources and don't return any results. Both scenarios lead to data havoc such as creating duplicates, incorrect results, and performance lags, which is why many people try to avoid many-to-many relationships altogether.

Resolutions: There are several methods to resolve or bypass a many-to-many relationship that should be chosen based on the business model and the logic of the business questions at hand. But, the first step is to test to see if you are dealing with a many-to-many relationship (see next section), and then according to the schema logic, apply the best resolution. For a very technical, step-by-step explanation on how to test and resolve M2M relationships, read this post. If you want a less technical overview, just keep reading.

Determine What Kind of Relationship You're In

You can easily test if a relationship is Many-to-Many by checking the data modeling of the relationship and determining the exact number of unique and duplicate values on each side of the relationship.

  1. If you get the same value for both the unique and duplicate values, then there is no duplication, and this relationship is either a one-to-many or a one-to-one. Continuing with our business case, this would mean that all Guest IDs appear only once-making all values unique. You can stop investigating at this stage given that even if the other side of the relationship has duplicate values for Guest ID, you'll still be dealing with a One to Many relationship-where the unique values are the 'Reservations' side, and the duplicate values are on the 'Payments' side.
  2. If the number of duplicate values is higher than the number of unique values, then this side of the relationship has duplicated values and you will need to investigate the other side of the relationship:
    • If the other side of the relationship yields unique values, this is a one-to-many relationship.
    • If not and there are more than two tables connected to this relationship, we've got a many-to-many relationship on our hands.

For those of you who want to get more technical, here is a simple SQL statement you can use to check for potential M2M relationships.

Best Ways to Resolve M2M Relationships

There are many ways to resolves M2M relationships, but here are common approaches based on the number of M2M relationships that exist as well as the number of tables involved.

If there are two tables and one relationship, you're best options would be to:

  1. Break this relationship into two separate one-to-many relationships.
  2. Create an aggregated table.

If there are more than two tables and more than one relationship, you're best options would be to:

  1. Use the lookup function to copy a value from one table and import it into another based on a logical test.
  2. Combine the two tables into one.

The most important aspect of understanding a many-to-many relationship is to know that they need to be avoided in order to ensure you have accurate data, no duplicate values, and your performance does not lag — a negative consequence of an M2M relationship that will ruin the user experience and discourage queries. 

Many-to-Many Relationships and SQL Databases

Here's the good news: SQL and other relational databases provide plenty of support for many-to-many relationships, thanks to their referential integrity. This keeps relationships working properly with underlying entities, and SQL is very good at combining this in queries.

This makes SQL much better than other programming languages at handling many to many relationships, as most do not have support for making references consistent with referents, or facilitating simple traversal.

With that in mind, let's now take a look at a simple example that shows how we visualize many to many relationships within an SQL database.

Picture a database that's used by a university application to keep track of student data. Two of the tables contained here are "student" and "paper," referring to all the different classes on offer that each student is able to take.

Each student will take many different papers simultaneously. This by itself would be a one-to-many relationship. However, each paper is also being studied by many different students at a time. That makes this a many-to-many relationship.

So how to resolve this in terms of structuring your SQL database?

Well, the relationship will be broken down into intermediary tables. In the example above, that might mean that there is a third table, containing the columns Student_ID, Paper_ID, Semester, and Year. Each student who takes a particular paper in a particular semester in a particular year would then be represented in a new row in the table. The Student_ID refers back to the Student master table and the Paper_ID refers back to the Paper master table. Voila! Many students can be related to many subjects.

Or, let's take another example, where you would need to create multiple instances on both side of the relationship: tracking orders on an e-commerce website.

Each customer is related to one order, but each order could contain multiple items. Plus, each item could appear in multiple orders. This makes it a many-to-many relationship.

In this situation, you would need to create an extra table. Let's assume you call this Items Ordered. This is called a junction table, and its sole purpose is to facilitate many to many relationships, by giving you somewhere that each of these values can connect to.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

big data ,data relationships ,many-to-many ,on-to-many ,sql

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}