DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Advanced Maintenance of a Multi-Database Citus Cluster With Flyway
  • DuckDB for Python Developers
  • Custom Model Context Protocol (MCP) for NL2SQL: A Rigorous Evaluation Framework on Oracle Database
  • Chat with Your Oracle Database: SQLcl MCP + GitHub Copilot

Trending

  • Stop Debugging Glue Jobs Manually: Building an Agentic Observability Layer for Data Pipelines
  • Observability for Agents and Workflows: Tracing Prompts, Tool Calls, and Business Outcomes End-to-End
  • When One MVP Is Really Four Systems: A Better Way to Plan Multi-Role Apps
  • Securing the AI Host: Spring AI MCP Server Communication With API Keys
  1. DZone
  2. Data Engineering
  3. Databases
  4. Merge Databases With Different Schema and Duplicate Entries

Merge Databases With Different Schema and Duplicate Entries

Removing duplicate entries from merged database tables can be anything but routine — and the source of performance woes.

By 
Darren Perucci user avatar
Darren Perucci
·
Jun. 06, 17 · Tutorial
Likes (2)
Comment
Save
Tweet
Share
11.7K Views

Join the DZone community and get the full member experience.

Join For Free

Combining tables frequently results in duplicate entries that can be removed in several ways. The trick is knowing which way is best for a given situation. Often the only way to determine the best approach is by testing several and comparing their effect on database performance.

It is one of the most common operations in database management: Merge two tables that use different schema while also removing duplicate entries. Yet there are as many approaches to this problem as there are types of database tables. There are also as many potential glitches.

Here's a look at three ways to address the situation in SQL and MySQL.

All the News That's Fit to Merge

Combining multiple tables with similar values often creates duplicate entries. Several methods are available for eliminating duplicate values in SQL, but it can be tricky to determine which is best in a given situation.

In a StackOverflow post from October 2012, a number of approaches were proposed for removing duplicates from joined tables. The first was to convert an inner query to a common table expression (CTE):

A common table expression for an inner join often has a lower impact on performance than using the DISTINCT keyword to eliminate duplicates. Source: StackOverflow.

The second approach was to use the DISTINCT keyword, which one poster claims performs better in some cases. Also suggested was the use of the string_agg function and the group by clause.

Getting Up Close and Personal With the UNION Clause

One of the basic elements in the SQL toolbox is the UNION operator, which checks for duplicates and returns only distinct rows, and also stores data from both tables without duplicates:

Insert rows from a second table when their values don't match those of the joined table or create a new table that doesn't affect either of the original tables. Source: StackOverflow

Alternatively, you can use the SELECT INTO command to create a new table from the contents of two separate tables in a way that removes duplicates:

The SELECT INTO command creates a new table from the content of two others and removes duplicates in the original tables. Source: StackOverflow.

Combining Multiple Gigabyte-Size Tables Without a Performance Hit

It isn't unusual for database tables to become massive. Imagine merging a dozen tables with a total of nearly 10 million separate records and more than 3GB. The first suggestion on StackOverflow was to create a new table with a unique constraint on the set of columns that establish a row's uniqueness, then to use INSERT IGNORE INTO ... SELECT FROM to move rows from the old table to the new one, and finally to truncate the old tables and use INSERT INTO ... SELECT FROM to return the rows to the original table.

Another proposed solution was to create a specific view that combines the results of the 12 tables and then to filter the results by querying on the view you just created.

Database sql Merge (version control) Schema

Published at DZone with permission of Darren Perucci. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Advanced Maintenance of a Multi-Database Citus Cluster With Flyway
  • DuckDB for Python Developers
  • Custom Model Context Protocol (MCP) for NL2SQL: A Rigorous Evaluation Framework on Oracle Database
  • Chat with Your Oracle Database: SQLcl MCP + GitHub Copilot

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook