DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Backing Up Azure Infrastructure with Python and Aztfexport
  • Seamless Integration: Connecting AWS Lambda to RDS and Writing Data Effortlessly
  • Streaming Data Pipeline Architecture
  • AWS NoSQL Performance Lab Using Python

Trending

  • The Agent Protocol Stack: MCP vs. A2A vs. AG-UI
  • What Nobody Tells You About Multimodal Data Pipelines for AI Training
  • How to Test a PATCH API Request With REST-Assured Java
  • Improving DAG Failure Detection in Airflow Using AI Techniques
  1. DZone
  2. Data Engineering
  3. Databases
  4. Framework to Load Data From Oracle To Azure

Framework to Load Data From Oracle To Azure

Step-by-step process to create a Python framework to extract and load data from Oracle and load into Azure blob storage and Azure Dedicated pool with a code snippet.

By 
Dinesh Eswararaj user avatar
Dinesh Eswararaj
·
May. 02, 23 · Analysis
Likes (2)
Comment
Save
Tweet
Share
3.9K Views

Join the DZone community and get the full member experience.

Join For Free

Oracle is a popular relational database management system (RDBMS) used by many organizations for storing and managing their data. However, with the advent of cloud computing, many organizations are moving towards cloud-based solutions for their data storage and processing needs. Azure Blob Storage and Synapse Dedicated SQL Pool are two popular cloud-based solutions offered by Microsoft for storing and processing data. In this article, we will explore how to create a framework to load data from Oracle to Azure Blob Storage as Parquet file and then load it to Synapse Dedicated SQL Pool.

Data storage

Step 1: Set Up Azure Blob Storage

The first step in this process is to set up Azure Blob Storage. You can do this by creating an Azure account and then creating a storage account. Once you have created a storage account, you can create a container in the storage account where you will store the Parquet file. Make sure to note down the connection string for the storage account and the name of the container, as you will need these later.

Step 2: Install Required Libraries

Next, you need to install the required libraries to interact with Oracle and Azure Blob Storage. You can use the following commands to install the libraries:

Python
 
pip install cx_Oracle 

pip install azure-storage-blob


Step 3: Connect to Oracle

After installing the required libraries, you need to connect to Oracle. You can use the following code to create a connection:

Python
 
import cx_Oracle

conn = cx_Oracle.connect('username/password@hostname:port/servicename')


Replace the username, password, hostname, port, and service name with the appropriate values for your Oracle database.

Step 4: Query Data from Oracle

Once you have established a connection to Oracle, you can query the data that you want to transfer to Azure Blob Storage. You can use the following code to query the data:

Python
 
import pandas as pd

query = 'SELECT * FROM my_table'

df = pd.read_sql(query, conn)

Replace my_table with the name of the table that you want to query.


Step 5: Write Data to Parquet File

After querying the data, you need to write it to a Parquet file. You can use the following code to write the data to a Parquet file:

Python
 
import pyarrow as pa

import pyarrow.parquet as pq

table = pa.Table.from_pandas(df)

pq.write_table(table, 'my_file.parquet')

Replace my_file.parquet with the name that you want to give to the Parquet file.


Step 6: Upload Parquet File to Azure Blob Storage

Once you have created the Parquet file, you need to upload it to Azure Blob Storage. You can use the following code to upload the file:

Python
 
from azure.storage.blob import BlobServiceClient

connection_string = 'DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net'

container_name = 'my_container'

blob_name = 'my_file.parquet'

blob_service_client = BlobServiceClient.from_connection_string(connection_string)

blob_client = blob_service_client.get_blob_client(container_name, blob_name)

with open('my_file.parquet', 'rb') as data:

    blob_client.upload_blob(data)


Replace myaccount, mykey, and my_container with the appropriate values for your storage account and container. Also, replace my_file.parquet with the name of the Parquet file that you created in Step 5.

Step 7: Load Parquet File to Synapse Dedicated SQL Pool

The final step is to load the Parquet file from Azure Blob Storage to Synapse Dedicated SQL Pool. You can use the following code to create an external data source in Synapse Dedicated SQL Pool: 

Python
 
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password';


CREATE DATABASE SCOPED CREDENTIAL cred_name

WITH IDENTITY = 'username',

SECRET = 'password';


CREATE EXTERNAL DATA SOURCE data_source_name

WITH (

    TYPE = HADOOP,

    LOCATION = 'wasbs://container_name@account_name.blob.core.windows.net',

    CREDENTIAL = cred_name

);


Replace password, username, account_name, and container_name with the appropriate values for your Azure Blob Storage account and container.

Next, you can use the following code to create an external table in Synapse Dedicated SQL Pool:

Python
 
CREATE EXTERNAL TABLE external_table_name

(

    column1 datatype1,

    column2 datatype2,

    ...

)

WITH (

    LOCATION = '/path/to/parquet/file',

    DATA_SOURCE = data_source_name,

    FILE_FORMAT = 'parquet'

);


Replace external_table_name, column1, datatype1, and so on with the appropriate values for your external table.

Finally, you can use the following code to insert data from the external table to a regular table in Synapse Dedicated SQL Pool:

Python
 
INSERT INTO regular_table_name

SELECT column1, column2, ...

FROM external_table_name;


Replace regular_table_name, column1, column2, and so on with the appropriate values for your regular table.

Conclusion

In this article, we have discussed how to create a framework to load data from Oracle to Azure Blob Storage as Parquet file and then load it to Synapse Dedicated SQL Pool. By following these steps, you can easily transfer your data from Oracle to Synapse Dedicated SQL Pool and take advantage of the benefits of cloud-based solutions for your data storage and processing needs.

Data storage Relational database azure Python (language)

Opinions expressed by DZone contributors are their own.

Related

  • Backing Up Azure Infrastructure with Python and Aztfexport
  • Seamless Integration: Connecting AWS Lambda to RDS and Writing Data Effortlessly
  • Streaming Data Pipeline Architecture
  • AWS NoSQL Performance Lab Using Python

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook