
Comparing Tables in Different Databases With Dynamic Lookup


Comparing tables that live in different server databases can be a challenge, but what if they hold 10 million records? For that, setting up a dynamic lookup might be the key.



Here, we want to compare two table objects that sit in different server databases. A quick search turns up plenty of techniques for comparing two tables across servers, but today we're dealing with tables holding 10 million records.

Comparing two tables with huge datasets is not simple, especially when they are in different databases.

A common solution we find is to load the source table into a temporary table in the destination database, then compare the temporary table with the destination table.

Is it really that simple? We have to load 10 million records into a temporary table first and only then compare. That can cause memory pressure, and our package might get stuck.
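For reference, that staging approach usually boils down to T-SQL like the sketch below. This is a minimal illustration, assuming a hypothetical staging table dbo.Emp_Staging that has already been loaded from the source server; EXCEPT returns the rows that differ.

-- Rows that exist in the staged source copy but not in the destination
-- (dbo.Emp_Staging is a hypothetical staging table, loaded beforehand)
SELECT EmpID, EmpGrade, DOJ FROM dbo.Emp_Staging
EXCEPT
SELECT EmpID, EmpGrade, DOJ FROM dbo.Emp_Destination;

-- And the reverse direction, for rows missing from the staged copy
SELECT EmpID, EmpGrade, DOJ FROM dbo.Emp_Destination
EXCEPT
SELECT EmpID, EmpGrade, DOJ FROM dbo.Emp_Staging;

At 10 million rows, both the staging load and the comparison put real pressure on tempdb and memory, which is exactly the problem described above.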

Another option is the lookup transform. It's a good fit when two different server databases are involved, but if we feed 10 million records into a lookup cache... well, only God knows what happens then.

We might try a merge join transform instead, but it needs two sorted inputs, and sorting the data flow with a sort transform is a fully blocking operation. At 10 million rows, that's a bad idea.

So, What's the Solution?

There is no pre-defined solution here. It depends on how we design our SSIS package and how we feed data into the transforms.

Our strategy is to load the data in slices based on a chosen column. If the table has a date column, that's great for us: we can load by date range, as the sketch below shows. The catch is that it has to happen in an automated fashion.
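In plain T-SQL, the idea is simply a sliced load. This is illustrative only; the local variables @FromDate and @ToDate stand in for the SSIS variables introduced later.

-- Load one date slice at a time instead of the whole table
DECLARE @FromDate DATE = '2010-01-01',
        @ToDate   DATE = '2010-12-31';

SELECT EmpID, EmpGrade, DOJ
FROM dbo.Emp_Source
WHERE DOJ BETWEEN @FromDate AND @ToDate;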

SSIS Solutions

Here, we are taking two tables with a few records just as an example. But remember, we're looking for a solution that scales to 10 million records.

Step 1: Source and Destination Table Objects 

Assume that the data is already migrated from the source to the destination.

CREATE TABLE [dbo].[Emp_Source]
(
    EmpID      INT      NOT NULL IDENTITY,
    EmpGrade   CHAR(1)  NOT NULL,
    DOJ        DATE     NOT NULL
);


CREATE TABLE [dbo].[Emp_Destination]
(
    EmpID      INT      NOT NULL IDENTITY,
    EmpGrade   CHAR(1)  NOT NULL,
    DOJ        DATE     NOT NULL
);
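
To make the walkthrough reproducible, the sample rows can be loaded like this (my own sketch; the article assumes the data is already migrated). EmpID is an IDENTITY column, so only the grade and date are supplied.

INSERT INTO [dbo].[Emp_Source] (EmpGrade, DOJ)
VALUES
    ('C','2010-01-01'), ('C','2010-01-01'), ('C','2010-01-01'), ('C','2010-01-01'),
    ('C','2011-01-01'), ('C','2011-01-01'), ('C','2011-01-01'), ('C','2011-01-01'),
    ('B','2012-01-01'), ('B','2012-01-01'), ('B','2012-01-01'),
    ('A','2013-01-01'), ('A','2013-01-01'), ('A','2013-01-01'),
    ('C','2014-01-01'), ('C','2014-01-01'), ('C','2014-01-01'),
    ('C','2015-01-01'), ('C','2015-01-01'), ('C','2015-01-01'),
    ('C','2016-01-01'),
    ('B','2017-01-01'), ('B','2017-01-01'), ('B','2017-01-01');

-- The destination holds the same data (in practice it lives on another server)
INSERT INTO [dbo].[Emp_Destination] (EmpGrade, DOJ)
SELECT EmpGrade, DOJ FROM [dbo].[Emp_Source];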


So, Source and Destination have the same data.

EmpID | EmpGrade | DOJ
------|----------|-----------
1     | C        | 2010-01-01
2     | C        | 2010-01-01
3     | C        | 2010-01-01
4     | C        | 2010-01-01
5     | C        | 2011-01-01
6     | C        | 2011-01-01
7     | C        | 2011-01-01
8     | C        | 2011-01-01
9     | B        | 2012-01-01
10    | B        | 2012-01-01
11    | B        | 2012-01-01
12    | A        | 2013-01-01
13    | A        | 2013-01-01
14    | A        | 2013-01-01
15    | C        | 2014-01-01
16    | C        | 2014-01-01
17    | C        | 2014-01-01
18    | C        | 2015-01-01
19    | C        | 2015-01-01
20    | C        | 2015-01-01
21    | C        | 2016-01-01
22    | B        | 2017-01-01
23    | B        | 2017-01-01
24    | B        | 2017-01-01


Step 2: Create a Segment Table for a Limited Data Load in the SSIS Package

EmpID | EmpGrade | DOJ        | Segment
------|----------|------------|----------
1     | C        | 2010-01-01 | Segment-1
2     | C        | 2010-01-01 | Segment-1
3     | C        | 2010-01-01 | Segment-1
4     | C        | 2010-01-01 | Segment-1
5     | C        | 2011-01-01 | Segment-2
6     | C        | 2011-01-01 | Segment-2
7     | C        | 2011-01-01 | Segment-2
8     | C        | 2011-01-01 | Segment-2
9     | B        | 2012-01-01 | Segment-3
10    | B        | 2012-01-01 | Segment-3
11    | B        | 2012-01-01 | Segment-3
12    | A        | 2013-01-01 | Segment-4
13    | A        | 2013-01-01 | Segment-4
14    | A        | 2013-01-01 | Segment-4
15    | C        | 2014-01-01 | Segment-5
16    | C        | 2014-01-01 | Segment-5
17    | C        | 2014-01-01 | Segment-5
18    | C        | 2015-01-01 | Segment-6
19    | C        | 2015-01-01 | Segment-6
20    | C        | 2015-01-01 | Segment-6
21    | C        | 2016-01-01 | Segment-7
22    | B        | 2017-01-01 | Segment-8
23    | B        | 2017-01-01 | Segment-8
24    | B        | 2017-01-01 | Segment-8


CREATE TABLE [dbo].[tbl_Segment]
(
  SegmentName          VARCHAR(50)     NOT NULL,
  FromDate             DATE            NOT NULL,
  ToDate               DATE            NOT NULL
);

INSERT INTO [dbo].[tbl_Segment]
    (SegmentName, FromDate, ToDate)
VALUES
    ('Segment-1', '2010-01-01', '2010-12-31'),
    ('Segment-2', '2011-01-01', '2011-12-31'),
    ('Segment-3', '2012-01-01', '2012-12-31'),
    ('Segment-4', '2013-01-01', '2013-12-31'),
    ('Segment-5', '2014-01-01', '2014-12-31'),
    ('Segment-6', '2015-01-01', '2015-12-31'),
    ('Segment-7', '2016-01-01', '2016-12-31'),
    ('Segment-8', '2017-01-01', '2017-12-31');

SELECT * FROM [dbo].[tbl_Segment];


SegmentName | FromDate   | ToDate
------------|------------|-----------
Segment-1   | 2010-01-01 | 2010-12-31
Segment-2   | 2011-01-01 | 2011-12-31
Segment-3   | 2012-01-01 | 2012-12-31
Segment-4   | 2013-01-01 | 2013-12-31
Segment-5   | 2014-01-01 | 2014-12-31
Segment-6   | 2015-01-01 | 2015-12-31
Segment-7   | 2016-01-01 | 2016-12-31
Segment-8   | 2017-01-01 | 2017-12-31


By using this segmentation table, we can load a limited number of records in the SSIS package.
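
As a side note, for a date column a segment table of this shape doesn't have to be hand-typed. A sketch like the one below (my own, assuming SQL Server 2012+ for DATEFROMPARTS) generates one segment per calendar year found in the source:

-- Generate one segment per distinct DOJ year in the source table
INSERT INTO [dbo].[tbl_Segment] (SegmentName, FromDate, ToDate)
SELECT 'Segment-' + CAST(ROW_NUMBER() OVER (ORDER BY y.Yr) AS VARCHAR(10)),
       DATEFROMPARTS(y.Yr, 1, 1),
       DATEFROMPARTS(y.Yr, 12, 31)
FROM (SELECT DISTINCT YEAR(DOJ) AS Yr FROM [dbo].[Emp_Source]) AS y;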

Step 3: SSIS Package Control Flow
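
The control-flow screenshot isn't reproduced here; based on the steps that follow, the layout is an Execute SQL task (which reads the segment ranges into an object variable) followed by a Foreach Loop container that wraps the Data Flow task.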

Step 4: The Variables

Name        | Data Type
------------|----------
SQL         | String
v_FromDT    | String
v_ToDT      | String
vDateFromTo | Object

Step 5: Execute SQL – Get the Date Range
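
The original screenshot isn't reproduced here, but the task presumably runs a query like the one below, with ResultSet set to Full result set and the result mapped to the object variable User::vDateFromTo. Since v_FromDT and v_ToDT are String variables, casting the dates to yyyy-MM-dd text in the query avoids type-mapping surprises (an assumption on my part, not from the original).

SELECT SegmentName,
       CONVERT(VARCHAR(10), FromDate, 120) AS FromDate,
       CONVERT(VARCHAR(10), ToDate, 120)   AS ToDate
FROM [dbo].[tbl_Segment]
ORDER BY FromDate;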

Step 6: ForEach Container
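
Again the screenshot is missing; a likely configuration (my assumption) is the Foreach ADO Enumerator with User::vDateFromTo as the ADO object source variable, and Variable Mappings of index 1 mapped to User::v_FromDT and index 2 mapped to User::v_ToDT (index 0 being SegmentName in the query above).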

Step 7: Set the Variable Expression – Variable Name: SQL

The Expression is:

"SELECT * FROM [dbo].[Emp_Source] WHERE DOJ BETWEEN '"+  @[User::v_FromDT] +"'  AND  '"+  @[User::v_ToDT]+"'"


Step 8: Data Flow Task

Step 9: OLEDB Source Settings
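
The screenshot is gone, but the natural setting here (an assumption based on the steps above) is Data access mode = SQL command from variable, pointing at User::SQL, so each loop iteration reads only the current segment's rows from the source. For the expression on the SQL variable to take effect, its EvaluateAsExpression property should be True.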

Step 10: The Lookup Transform

The main challenge is making the lookup transform dynamic. To start, create an ordinary lookup transform with an OLE DB connection manager, using a SQL command rather than choosing a table object directly.

Now we make the lookup dynamic, reusing the same expression as before:

"SELECT * FROM [dbo].[Emp_Source] WHERE DOJ BETWEEN '"+  @[User::v_FromDT] +"'  AND  '"+  @[User::v_ToDT]+"'"


To do that, select the Data Flow Task that contains the Lookup, open its Properties, choose Expressions, and bind the Lookup's SqlCommand property to the expression above. That's all.
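
Concretely (the property names are my assumption and can vary slightly by SSIS version): in the Data Flow Task's Expressions editor, pick the property [Lookup].[SqlCommand] and paste the expression, so the lookup query is re-evaluated for every loop iteration. From there, rows landing in the Lookup's no-match output are the ones missing or different on the other side; a common pattern (not shown in the original) is to redirect that output to a Row Count transform or an audit table for reporting.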

Hope you like it!



Published at DZone with permission of Joydeep Das, DZone MVB.

