DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. Merge Databases With Different Schema and Duplicate Entries

Merge Databases With Different Schema and Duplicate Entries

Removing duplicate entries from merged database tables can be anything but routine — and the source of performance woes.

Darren Perucci user avatar by
Darren Perucci
·
Jun. 06, 17 · Tutorial
Like (2)
Save
Tweet
Share
9.70K Views

Join the DZone community and get the full member experience.

Join For Free

Combining tables frequently results in duplicate entries that can be removed in several ways. The trick is knowing which way is best for a given situation. Often the only way to determine the best approach is by testing several and comparing their effect on database performance.

It is one of the most common operations in database management: Merge two tables that use different schema while also removing duplicate entries. Yet there are as many approaches to this problem as there are types of database tables. There are also as many potential glitches.

Here's a look at three ways to address the situation in SQL and MySQL.

All the News That's Fit to Merge

Combining multiple tables with similar values often creates duplicate entries. Several methods are available for eliminating duplicate values in SQL, but it can be tricky to determine which is best in a given situation.

In a StackOverflow post from October 2012, a number of approaches were proposed for removing duplicates from joined tables. The first was to convert an inner query to a common table expression (CTE):

A common table expression for an inner join often has a lower impact on performance than using the DISTINCT keyword to eliminate duplicates. Source: StackOverflow.

The second approach was to use the DISTINCT keyword, which one poster claims performs better in some cases. Also suggested was the use of the string_agg function and the group by clause.

Getting Up Close and Personal With the UNION Clause

One of the basic elements in the SQL toolbox is the UNION operator, which checks for duplicates and returns only distinct rows, and also stores data from both tables without duplicates:

Insert rows from a second table when their values don't match those of the joined table or create a new table that doesn't affect either of the original tables. Source: StackOverflow

Alternatively, you can use the SELECT INTO command to create a new table from the contents of two separate tables in a way that removes duplicates:

The SELECT INTO command creates a new table from the content of two others and removes duplicates in the original tables. Source: StackOverflow.

Combining Multiple Gigabyte-Size Tables Without a Performance Hit

It isn't unusual for database tables to become massive. Imagine merging a dozen tables with a total of nearly 10 million separate records and more than 3GB. The first suggestion on StackOverflow was to create a new table with a unique constraint on the set of columns that establish a row's uniqueness, then to use INSERT IGNORE INTO ... SELECT FROM to move rows from the old table to the new one, and finally to truncate the old tables and use INSERT INTO ... SELECT FROM to return the rows to the original table.

Another proposed solution was to create a specific view that combines the results of the 12 tables and then to filter the results by querying on the view you just created.

Database sql Merge (version control) Schema

Published at DZone with permission of Darren Perucci, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Best Practices for Writing Clean and Maintainable Code
  • How to Create a Real-Time Scalable Streaming App Using Apache NiFi, Apache Pulsar, and Apache Flink SQL
  • Problems of Cloud Cost Management: A Socio-Technical Analysis
  • Implementing Infinite Scroll in jOOQ

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: