Streamlining Data Integration

This article demonstrates how to integrate data from Salesforce and Oracle into Amazon Redshift using Python and stored procedures, following the ELT approach.

By Rajesh Remala · Jun. 06, 24 · Tutorial

Integrating data from multiple sources like Salesforce and Oracle into Amazon Redshift is crucial for organizations looking to centralize their analytics. This article demonstrates how to connect to Salesforce and Oracle, extract data using SOQL and SQL queries, load it into Redshift staging tables, and perform transformations using Redshift stored procedures, all orchestrated through Python scripts.

Prerequisites

  • Salesforce: Access to Salesforce with the necessary API permissions.
  • Oracle: Access to an Oracle database with the necessary query permissions.
  • Amazon Redshift: An existing Redshift cluster.
  • Python: Installed with the necessary libraries (simple_salesforce, cx_Oracle, pandas, boto3, psycopg2); an installation note follows this list.
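
If any of these packages are missing, they can usually be installed with pip, for example: pip install simple_salesforce cx_Oracle pandas boto3 psycopg2-binary (the binary wheel avoids building psycopg2 locally). The exact package set is an assumption based on the imports used below; note that cx_Oracle also requires the Oracle Client libraries to be present on the machine running the script.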

Connecting to Salesforce and Extracting Data

First, let's connect to Salesforce and extract data using SOQL.

Python
 
from simple_salesforce import Salesforce
import pandas as pd

# Salesforce credentials
sf = Salesforce(username='your_username', 
                password='your_password', 
                security_token='your_security_token')

# SOQL query to fetch data from Salesforce
query = "SELECT Id, Name, AccountNumber FROM Account"
response = sf.query_all(query)

# Convert response to a DataFrame
data_sf = pd.DataFrame(response['records']).drop(columns='attributes')
print(data_sf.head())


  • Library Import: We import Salesforce from simple_salesforce for easy Salesforce API interaction and pandas for data manipulation.
  • Salesforce Connection: We establish a connection to Salesforce using the provided credentials. In practice, pass the username, password, and security token in from a config file or environment variables rather than hardcoding them; a minimal sketch follows this list.
  • SOQL Query: We execute a SOQL (Salesforce Object Query Language) query to retrieve specific fields (Id, Name, AccountNumber) from the Account object.
  • Data Conversion: The response is converted into a pandas DataFrame for easier manipulation and analysis.
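
As a minimal sketch of that advice, the credentials can be read from environment variables. The variable names used here (SF_USERNAME and so on) are assumptions, not a standard:

Python
 
import os
from simple_salesforce import Salesforce

# Read credentials from the environment instead of hardcoding them.
# The variable names are illustrative; use whatever your deployment defines.
sf = Salesforce(username=os.environ["SF_USERNAME"],
                password=os.environ["SF_PASSWORD"],
                security_token=os.environ["SF_SECURITY_TOKEN"])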

Connecting to Oracle and Extracting Data

Next, let's connect to Oracle and extract data using SQL.

Python
 
import cx_Oracle

# Oracle credentials and connection details
oracle_dsn = cx_Oracle.makedsn("your_oracle_host", "your_oracle_port", service_name="your_service_name")
conn_oracle = cx_Oracle.connect(user="your_username", password="your_password", dsn=oracle_dsn)

# SQL query to fetch data from Oracle
sql_query = "SELECT ID, NAME, ACCOUNT_NUMBER FROM ACCOUNTS"
data_oracle = pd.read_sql(sql_query, con=conn_oracle)
print(data_oracle.head())

# Close Oracle connection
conn_oracle.close()


  • Library Import: We import cx_Oracle for Oracle database connections.
  • Oracle Connection: We establish a connection to Oracle using the provided credentials.
  • SQL Query: We execute a SQL query to retrieve specific fields (ID, NAME, ACCOUNT_NUMBER) from the ACCOUNTS table.
  • Data Conversion: The result is converted into a pandas DataFrame for easier manipulation and analysis; for tables too large to fit in memory, see the chunked variant after this list.
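
For large tables, pandas can stream the result set in batches rather than materializing one large DataFrame. A sketch under the assumption that the same ACCOUNTS query is being read; the chunk size is arbitrary and should be tuned to available memory:

Python
 
# Stream the Oracle result set in batches rather than one large DataFrame.
for chunk in pd.read_sql(sql_query, con=conn_oracle, chunksize=50000):
    print(f"Fetched {len(chunk)} rows")
    # ... stage each chunk into Redshift here before fetching the next ...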

Loading Data Into Redshift Staging Tables

Now, we load the extracted data from Salesforce and Oracle into Redshift staging tables.

Python
 
import boto3  # not used in this snippet itself; see the S3-based COPY sketch after this section
import psycopg2
from io import StringIO

# Redshift credentials and connection details
redshift_host = 'your_redshift_host'
redshift_db = 'your_database'
redshift_user = 'your_user'
redshift_password = 'your_password'
redshift_port = 5439

# Connect to Redshift
conn_redshift = psycopg2.connect(
    host=redshift_host,
    dbname=redshift_db,
    user=redshift_user,
    password=redshift_password,
    port=redshift_port
)
cur_redshift = conn_redshift.cursor()

# Create staging tables (if they don't exist)
create_sf_table_query = """
CREATE TABLE IF NOT EXISTS staging_account_sf (
    Id VARCHAR(18),
    Name VARCHAR(255),
    AccountNumber VARCHAR(40)
);
"""
create_oracle_table_query = """
CREATE TABLE IF NOT EXISTS staging_account_oracle (
    ID VARCHAR(18),
    NAME VARCHAR(255),
    ACCOUNT_NUMBER VARCHAR(40)
);
"""
cur_redshift.execute(create_sf_table_query)
cur_redshift.execute(create_oracle_table_query)
conn_redshift.commit()

# Load Salesforce data into staging table.
# Note: psycopg2's copy_from issues COPY ... FROM STDIN, which Amazon
# Redshift does not support from client connections; against a real
# cluster, stage the file in S3 and use Redshift's COPY instead
# (see the sketch after this section).
csv_buffer_sf = StringIO()
data_sf.to_csv(csv_buffer_sf, index=False, header=False)
csv_buffer_sf.seek(0)
cur_redshift.copy_from(csv_buffer_sf, 'staging_account_sf', sep=',')
conn_redshift.commit()

# Load Oracle data into staging table (same caveat as above)
csv_buffer_oracle = StringIO()
data_oracle.to_csv(csv_buffer_oracle, index=False, header=False)
csv_buffer_oracle.seek(0)
cur_redshift.copy_from(csv_buffer_oracle, 'staging_account_oracle', sep=',')
conn_redshift.commit()


  • Library Import: We import boto3 for AWS interactions, psycopg2 for PostgreSQL/Redshift connections, and StringIO for in-memory file operations.
  • Redshift Connection: We establish a connection to Redshift using the provided credentials.
  • Create Staging Tables: We create staging tables (staging_account_sf for Salesforce data and staging_account_oracle for Oracle data) if they don't already exist.
  • Data Loading: The DataFrames are converted to CSV format and loaded into the respective staging tables using copy_from. Be aware that copy_from issues COPY ... FROM STDIN, which Amazon Redshift does not accept from client connections; the S3-based COPY shown after this list is the standard bulk-load path.
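
Because Redshift's COPY cannot read from a client connection, the production-grade path is to stage the CSV files in Amazon S3 and have the cluster COPY them from there. A minimal sketch; the bucket name, object key, and IAM role ARN are placeholders that must exist in your account:

Python
 
import boto3

# Upload the in-memory CSV to S3 (bucket and key are placeholders).
s3 = boto3.client('s3')
csv_buffer_sf.seek(0)
s3.put_object(Bucket='my-etl-staging-bucket',
              Key='staging/account_sf.csv',
              Body=csv_buffer_sf.getvalue())

# Have Redshift pull the file directly from S3. The IAM role must grant
# the cluster read access to the bucket.
copy_query = """
    COPY staging_account_sf
    FROM 's3://my-etl-staging-bucket/staging/account_sf.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
    FORMAT AS CSV;
"""
cur_redshift.execute(copy_query)
conn_redshift.commit()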

Executing Stored Procedures for ELT

Once the data is in the staging tables, we can call stored procedures in Redshift to transform the data and load it into the final tables.

Python
 
# Call stored procedure for transformation
stored_procedure_query = "CALL transform_data_procedure();"
cur_redshift.execute(stored_procedure_query)
conn_redshift.commit()

# Verify data in the final table
verify_query = "SELECT * FROM final_account_table LIMIT 10;"
cur_redshift.execute(verify_query)
for row in cur_redshift.fetchall():
    print(row)

# Close the connection
cur_redshift.close()
conn_redshift.close()


  • Stored Procedure Call: We call a stored procedure (transform_data_procedure) in Redshift that performs the necessary transformations and loads the data into the final tables. This keeps the transform step of the ELT (Extract, Load, Transform) pattern inside the database, leveraging Redshift's processing power; an illustrative definition follows this list.
  • Data Verification: We run a query to verify that the data has been correctly transformed and loaded into the final table (final_account_table).
  • Close Connection: Finally, we close the cursor and the database connection to clean up resources.
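
The article does not define transform_data_procedure itself. As an illustration only, a Redshift stored procedure that merges the two staging tables into the final table might look like the following; the merge logic and the final table's column names are assumptions:

Python
 
# Illustrative only: one possible shape for transform_data_procedure.
# Assumes final_account_table(id, name, account_number) already exists.
create_procedure_query = """
CREATE OR REPLACE PROCEDURE transform_data_procedure()
AS $$
BEGIN
    -- Rebuild the final table from both staging tables,
    -- letting UNION drop exact duplicate rows.
    TRUNCATE final_account_table;
    INSERT INTO final_account_table (id, name, account_number)
    SELECT Id, Name, AccountNumber FROM staging_account_sf
    UNION
    SELECT ID, NAME, ACCOUNT_NUMBER FROM staging_account_oracle;
END;
$$ LANGUAGE plpgsql;
"""
cur_redshift.execute(create_procedure_query)
conn_redshift.commit()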

Conclusion

This script demonstrates a complete workflow for extracting data from Salesforce and Oracle, loading it into Amazon Redshift, and performing ELT operations using stored procedures. This approach leverages the strengths of each component: Salesforce and Oracle for CRM and relational data, Python for orchestration, and Redshift for scalable data transformations.

By centralizing data in Redshift, organizations can perform more comprehensive analyses and derive valuable insights from their Salesforce and Oracle data, enabling better decision-making and operational efficiency.
