
Data Transition With High Performance Between Hash Partitioned Tables


When transitioning data between tables, hash partitioning both tables on the same key can greatly increase performance. Here's an example!


In this article, I’ll share the solution to a specific problem I ran into. I hope it raises awareness for those experiencing the same problem.

ETL development frequently requires moving large volumes of data between tables, and these transfers can take a great deal of time. Often, we add the /*+parallel*/ hint to shorten this time, or we apply DIY parallelism techniques. Sometimes these methods still take too long — and sometimes they don't work at all. The technique I'm about to explain, when used appropriately, can greatly improve performance.

If the transfer uses the same key fields in both the source and target tables, then hash partitioning both tables' metadata on that key and transferring data between the corresponding partitions will speed things up considerably, because we can run as many independent streams as there are partitions.

Here's an example. 

Say there is one source table and one target table, both divided into eight partitions on their key fields. (If the tables are not already partitioned on their key fields, we can change the metadata ourselves to make them suitable.)
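If a table is not yet hash partitioned, one simple (offline) way to change its metadata is to rebuild it with CREATE TABLE ... AS SELECT and then swap the names. This is a hedged sketch — the old table name src_trans_flat is hypothetical, and for an online change you would use the DBMS_REDEFINITION package instead:

CREATE TABLE src_trans_hash
PARTITION BY HASH (trans_id) PARTITIONS 8
AS SELECT * FROM src_trans_flat;

-- Swap the names once the copy is verified
RENAME src_trans_flat TO src_trans_old;
RENAME src_trans_hash TO src_trans;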

I’m creating my two tables in accordance with the solution and inserting 50 million rows into the source table.

CREATE TABLE SRC_TRANS
(
   TRANS_ID     NUMBER,
   TRANS_NAME   VARCHAR2 (10),
   TRANS_DATE   DATE
)
PARALLEL (DEGREE 8 INSTANCES DEFAULT)
PARTITION BY HASH
   (TRANS_ID)
   (
      PARTITION P1,
      PARTITION P2,
      PARTITION P3,
      PARTITION P4,
      PARTITION P5,
      PARTITION P6,
      PARTITION P7,
      PARTITION P8);
-----------------------------------

CREATE TABLE TRG_TRANS
(
   TRANS_ID     NUMBER,
   TRANS_NAME   VARCHAR2 (10),
   TRANS_DATE   DATE
)
PARALLEL (DEGREE 8 INSTANCES DEFAULT)
PARTITION BY HASH
   (TRANS_ID)
   (
      PARTITION P1,
      PARTITION P2,
      PARTITION P3,
      PARTITION P4,
      PARTITION P5,
      PARTITION P6,
      PARTITION P7,
      PARTITION P8);
-----------------------------------

BEGIN
   FOR i IN 1 .. 50000000
   LOOP
      -- Static SQL; a per-row EXECUTE IMMEDIATE would only add parse overhead
      INSERT INTO src_trans
           VALUES (i, 'Test', SYSDATE);
   END LOOP;

   COMMIT;
END;
/

The setup is complete. First, I’ll measure the elapsed time of a straight transfer between the two tables, without doing anything else.

SET TIMING ON

INSERT INTO TRG_TRANS
SELECT /*+ parallel (t 4) */ * FROM SRC_TRANS T;

COMMIT;

50000000 rows created.
Elapsed: 00:00:35.60

Now let’s do the same transfer divided into eight separate parts. Because Oracle stores each partition as a separate physical segment, we can start these inserts from different sessions and parallelize the work realistically. If our resource pool is wide enough, we can additionally run each part with its own parallel degree. Since both tables are hash partitioned on the same field with the same number of partitions, corresponding partitions hold the same key ranges. With that in mind, I’m writing one SQL statement per partition pair, opening parallel sessions, running them, and looking at the total duration.

INSERT INTO TRG_TRANS PARTITION (P1) SELECT /*+ parallel (t 4) */ * FROM SRC_TRANS  PARTITION(P1) T;

INSERT INTO TRG_TRANS PARTITION (P2) SELECT /*+ parallel (t 4) */ * FROM SRC_TRANS PARTITION(P2) T;

INSERT INTO TRG_TRANS PARTITION (P3) SELECT /*+ parallel (t 4) */ * FROM SRC_TRANS PARTITION(P3) T;

INSERT INTO TRG_TRANS PARTITION (P4) SELECT /*+ parallel (t 4) */ * FROM SRC_TRANS PARTITION(P4) T;

INSERT INTO TRG_TRANS PARTITION (P5) SELECT /*+ parallel (t 4) */ * FROM SRC_TRANS PARTITION(P5) T;

INSERT INTO TRG_TRANS PARTITION (P6) SELECT /*+ parallel (t 4) */ * FROM SRC_TRANS PARTITION(P6) T;

INSERT INTO TRG_TRANS PARTITION (P7) SELECT /*+ parallel (t 4) */ * FROM SRC_TRANS PARTITION(P7) T;

INSERT INTO TRG_TRANS PARTITION (P8) SELECT /*+ parallel (t 4) */ * FROM SRC_TRANS PARTITION(P8) T;

COMMIT;

Elapsed: 00:00:09.60
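Opening eight SQL*Plus windows by hand works, but the launch can also be scripted from a single session. This is a hedged sketch using the DBMS_SCHEDULER package (it assumes you have the CREATE JOB privilege; the TRANS_P job names are made up for this example). Each job runs in its own session, so the eight inserts execute concurrently:

BEGIN
   FOR i IN 1 .. 8
   LOOP
      DBMS_SCHEDULER.CREATE_JOB (
         job_name   => 'TRANS_P' || i,          -- hypothetical job name
         job_type   => 'PLSQL_BLOCK',
         job_action => 'BEGIN
                          INSERT INTO trg_trans PARTITION (P' || i || ')
                          SELECT * FROM src_trans PARTITION (P' || i || ');
                          COMMIT;
                        END;',
         enabled    => TRUE);                   -- starts immediately in its own session
   END LOOP;
END;
/

With enabled => TRUE, each job is auto-dropped after it completes; progress can be followed in DBA_SCHEDULER_JOB_RUN_DETAILS.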

The partition-wise run took almost a quarter of the time (9.6 seconds versus 35.6). If your system resources can sustain more parallel slaves for the individual parts, it may take even less. You will see a tremendous time difference when this technique is applied to truly large tables.
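As a quick sanity check, we can confirm that corresponding partitions ended up with identical row counts (repeat for P2 through P8):

SELECT 'SRC' AS tab, COUNT(*) AS cnt FROM SRC_TRANS PARTITION (P1)
UNION ALL
SELECT 'TRG', COUNT(*) FROM TRG_TRANS PARTITION (P1);

If the counts ever differ, the tables are not hash partitioned identically (same key, same partition count), and the partition-to-partition transfer would silently misplace rows.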



Opinions expressed by DZone contributors are their own.
