DZone
Database Zone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
  • Refcardz
  • Trend Reports
  • Webinars
  • Zones
  • |
    • Agile
    • AI
    • Big Data
    • Cloud
    • Database
    • DevOps
    • Integration
    • IoT
    • Java
    • Microservices
    • Open Source
    • Performance
    • Security
    • Web Dev
DZone > Database Zone > Data Replication Automation

Data Replication Automation

At a glance, it appears to be a simple task of exporting the records from the source database and inserting them into the target database. However, there are some issues.

Yigang Chen user avatar by
Yigang Chen
·
Jun. 28, 17 · Database Zone · Tutorial
Like (4)
Save
Tweet
2.00K Views

Join the DZone community and get the full member experience.

Join For Free

This article discusses a technology of data replication automation where specific records get copied from a source database into a target database based on Oracle/MySQL DB.

The Use Case

Consider the following use case. One of the services a QA group provides is replicating data environments — for example, copying specific records (which are uncovering some unseen business logic defects) from a staging environment into a testing environment to reproduce and debug an issue. Replication automation saves the effort of manually recreating data in the testing environment, which is usually complex, time-consuming, and error-prone. Additionally, the results can be unfaithful to the situation being examined in the source environment.

The Issues

At a glance, it appears to be a simple task of exporting the records from the source database and inserting them into the target database. This approach, however, presents some issues to address:

  • Referential integrity is broken. A record may reference another record, called the parent, through some foreign keys. To copy a record into a database, its parent must be copied first.

  • Business relations get lost. A record can reference other records (its parents) and at the same time can be referenced by yet some other records called the children. These references constitute the business structure and relations. Therefore, from a business perspective, to copy a record, its parents and children must also be copied.

To facilitate the discussion that follows, let's define:

  • The family of a record to be a union of the record's parents and children.

  • The data scenario of seed records to be a union of the below records: 

    1. Some records known to the user, called the seeds.

    2. The family of each of the record in #1.

    3. The family of each of the record in #2, and so forth.

With these definitions, the use case of data replication now reduces to: Given the seed records, export the data scenario from a source database and copy them into a target database in an order that complies with referential integrity.

The Solution

The key points in replication automation include:

  1. How to find the parent/child relations. This can be resolved by the information_schema.key_column_usage table available in the Oracle/MySQL DBMS. In particular, the columns table_name, column_name, referenced_table_name, and referenced_column_name implement the references.

  2. How to find the full definition of a record. This can be resolved by the information_schema.columns table, again available in the DBMS. In particular, the columns table_name, column_name, and data_type provide an adequate profile of a record.

  3. How to build an order of inserting records that complies with referential integrity. This can be resolved by a deque. A deque is a double-ended queue in which both the insertion and removal of elements can be performed at both end points. In our case, we impose the discipline of:

    • Inserting parents at the front-end.

    • Inserting children at the back-end.

    • Retrieving/removing records at the front-end.

The following code snippet outlines the solution to the replication automation, with implementation notes further below.

public void repliateDataScenario() {
  Set<Repord> doneRecords = new HashSet<Record>();

  // load replication queue with seed records.
  // reqQueue is an instance of a Deque class, RecQueue<Record>;
  // the Record class houses a table name and a list of primary/foreign keys of the table;
  // the seed records are represented by SELECT...FROM...WHERE... statements, which 
  // RepQueue.init() uses to retrieve the records and loads in the queue.
  repQueue.init();

  for (; !repQueue.isEmpty(); ) {
    Record nextRecord = repQueue.peek();
    // if already exported, discard front record and move on to the next
    if (doneRecords.contains(nextRecord)) {
      repQueue.pollFirst();
      continue;
    }

    // nextRecord is not exported; get together the family if hasn't
    if (!nextRecord.hasGotTogetherFamily()) {
      // the references [table_name, column_name, referenced_table_name, referenced_column_name]
      // are implemented by java Map; Record.queryParents() uses the maps to retrieve 
      // the record's parents, and insert them in front of the record in the queue.
      repQueue.insertFrontEnd(nextRecord.queryParents());
      // Record.queryChildren() uses the maps to retrieve the record's children, and
      // insert them at back of the record in the queue.
      repQueue.insertBackEnd(nextRecord.queryChildren());
      // mark done of querying family
      nextRecord.hasGotTogetherFamily(true);
      continue;
    }

    // otherwise export the record, register it in doneRecords set,
    // remove it from queue, and move on to the next
    nextRecord.export();
    doneRecords.add(nextRecord);
    repQueue.pollFirst();
  }
}

Implementation Notes

  • At Line 35, Record.export() retrieves the full record using the information_schema.columns meta information, then exports the record as an INSERT INTO...VALUES(...)  statement, and writes it to a script file; which upon completion can be executed in the target database to actually replicate the data scenario.
  • Lines 24 and 27 implement an insert_my_parents > insert_myself > insert_my_children linear order for each of the records. The referential integrity is thus honored during the insertion to the target database.

An Example

Consider below a simple data scenario of five records: 2 has one child (1), 3 has two children (1 and 5), 1 has two parents (2 and 3) and two children (4 and 5), 4 has one parent (1), and 5 has two parents (1 and 3).

Image title

Supposing 1 is the chosen seed record, the following list shows the states of doneRecords, repQueue, and the output through the loop of the above code snippet. In the list, 1 denotes a record 1 in repQueue that has not queried its family; and 1* denotes a record 1 that has queried its family.

doneRecords {}; repQueue <1>; output <>
doneRecords {}, repQueue <2,3,1*,4,5>; output <>
doneRecords {}, repQueue <2*,3,1*,4,5,1>; output <>
doneRecords {2}, repQueue <3,1*,4,5,1>; output <2>
doneRecords {2}, repQueue <3*,1*,4,5,1,1,5>; output <2>
doneRecords {2,3}, repQueue <1*,4,5,1,1,5>; output <2,3>
doneRecords {2,3,1}, repQueue <4,5,1,1,5>; output <2,3,1>
doneRecords {2,3,1}, repQueue <1,4*,5,1,1,5>; output <2,3,1>
doneRecords {2,3,1}, repQueue <4*,5,1,1,5>; output <2,3,1>
doneRecords {2,3,1,4}; repQueue <5,1,1,5>; output <2,3,1,4>
doneRecords {2,3,1,4}; repQueue <1,3,5*,1,1,5>; output <2,3,1,4>
doneRecords {2,3,1,4}; repQueue <3,5*,1,1,5>; output <2,3,1,4>
doneRecords {2,3,1,4}; repQueue <5*,1,1,5>; output <2,3,1,4>
doneRecords {2,3,1,4,5}; repQueue <1,1,5>; output <2,3,1,4,5>
doneRecords {2,3,1,4,5}; repQueue <1,5>; output <2,3,1,4,5>
doneRecords {2,3,1,4,5}; repQueue <5>; output <2,3,1,4,5>
doneRecords {2,3,1,4,5}; repQueue <>; output <2,3,1,4,5>

It can be seen that for any record R in the data scenario, the output achieves the linear order of insertion:

  • R's parents > R > R's children.

It can be shown that no matter how the seed records are chosen, the above property is achieved.

Database Data (computing) Replication (computing)

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • How To Integrate Third-Party Login Systems in Your Web App Using OAuth 2.0
  • Debugging the Java Message Service (JMS) API Using Lightrun
  • Image Classification Using SingleStore DB, Keras, and Tensorflow
  • Making Your SSR Sites 42x Faster With Redis Cache

Comments

Database Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • MVB Program
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends:

DZone.com is powered by 

AnswerHub logo