
Connecting an Autonomous Data Warehouse With Python

In this article, take a look at how to connect an autonomous data warehouse with Python and see a machine learning example.

By Emrah Mete · Jan. 10, 20 · Tutorial

In this article, I will connect to an Oracle database running in the cloud (Oracle Autonomous Data Warehouse) and build a simple regression application in a Python environment, using a sample dataset taken from Kaggle.

First of all, I will build this application on Autonomous Data Warehouse (ADW), which is offered as a service on Oracle Cloud. All I need is an Oracle Cloud account. Autonomous Data Warehouse is one of the Always Free (Oracle Free Tier) services, so you can provision and use it on the cloud in minutes, free of charge and without any installation. You can follow the link for detailed information.

You can start the Autonomous Data Warehouse service via the Oracle Cloud Infrastructure console.


The second component I need is the cx_Oracle package, which lets Python connect to the Oracle database in the cloud from my local environment. I also need an Oracle Client installed on my machine.

You can perform the installations mentioned above by following the link.

Finally, I need a dataset and a problem to work on. I chose Boston housing prices as the problem; to solve it, I will build a regression model. I got the dataset from Kaggle (Boston Housing).

Let's first examine the BOSTON_HOUSING dataset.

Column Name   Description                                                               Data Type
crim          per capita crime rate by town                                             Number
zn            proportion of residential land zoned for lots over 25,000 sq. ft.         Number
indus         proportion of non-retail business acres per town                          Number
chas          Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)     Number
nox           nitrogen oxides concentration (parts per 10 million)                      Number
rm            average number of rooms per dwelling                                      Number
age           proportion of owner-occupied units built before 1940                      Number
dis           weighted mean of distances to five Boston employment centers              Number
rad           index of accessibility to radial highways                                 Number
tax           full-value property-tax rate per $10,000                                  Number
ptratio       pupil-teacher ratio by town                                               Number
black         1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town  Number
lstat         lower status of the population (percent)                                  Number
medv          median value of owner-occupied homes in $1000s                            Number

Now that we have reviewed the details of our dataset, let's load the BOSTON_HOUSING data we downloaded into our Oracle database.

First, create the Oracle tables into which we will load the dataset (train.csv) that we downloaded.


SQL

CREATE TABLE BOSTON_HOUSING
(
   ID        NUMBER,
   CRIM      NUMBER,
   ZN        NUMBER,
   INDUS     NUMBER,
   CHAS      NUMBER,
   NOX       NUMBER,
   RM        NUMBER,
   AGE       NUMBER,
   DIS       NUMBER,
   RAD       NUMBER,
   TAX       NUMBER,
   PTRATIO   NUMBER,
   BLACK     NUMBER,
   LSTAT     NUMBER,
   MEDV      NUMBER
);


SQL

CREATE TABLE BOSTON_HOUSING_TEST
(
   ID        NUMBER,
   CRIM      NUMBER,
   ZN        NUMBER,
   INDUS     NUMBER,
   CHAS      NUMBER,
   NOX       NUMBER,
   RM        NUMBER,
   AGE       NUMBER,
   DIS       NUMBER,
   RAD       NUMBER,
   TAX       NUMBER,
   PTRATIO   NUMBER,
   BLACK     NUMBER,
   LSTAT     NUMBER,
   MEDV      NUMBER
);



Now that we have created our table, we will load the dataset we downloaded as CSV into the table; we have multiple methods to do this:

  • Using Oracle External Table.
  • Using Oracle SQL Loader.
  • Using SQL-PL/SQL editors (Oracle SQL Developer, Toad, PL/SQL Developer, etc.).

I will load the data set with the help of the editor I use. I use Oracle SQL Developer as an editor. With Oracle SQL Developer, you can load data as follows.
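If you prefer to script the load instead of using an editor, the same CSV can be bulk-inserted from Python with cx_Oracle's `executemany()`. This is a minimal sketch: the `build_insert_sql` helper and the `train.csv` path are illustrative, and the commented part assumes an open cx_Oracle connection `con`.

```python
import csv

def build_insert_sql(table, n_cols):
    """Build a positional-bind INSERT statement for cx_Oracle's executemany()."""
    binds = ", ".join(f":{i}" for i in range(1, n_cols + 1))
    return f"INSERT INTO {table} VALUES ({binds})"

# 15 columns: ID plus the 14 dataset columns
sql = build_insert_sql("BOSTON_HOUSING", 15)

# Loading train.csv would then look like this (requires an open
# cx_Oracle connection `con`):
# with open("train.csv") as f:
#     rows = [tuple(r) for r in csv.reader(f)][1:]  # skip the header row
# cur = con.cursor()
# cur.executemany(sql, rows)
# con.commit()
```

`executemany()` sends the rows in a single round trip per batch, which is considerably faster than looping over `execute()` for row-by-row inserts.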

SQL

SELECT * FROM BOSTON_HOUSING;

We have completed the dataset loading process.

When we observe the data, we see details of the various characteristics of the houses; each row contains the measurements for a single house. These columns are our basic parameters for the regression analysis, and the MEDV column is the target variable we will predict.

Now let's start writing the necessary code on the Python side.

Python

import cx_Oracle as cx
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# Connect to Autonomous Data Warehouse
con = cx.connect("ADMIN", "yourPass", "mltest_high")

query = 'SELECT * FROM boston_housing'
data_train = pd.read_sql(query, con=con)

query = 'SELECT * FROM boston_housing_test'
data_test = pd.read_sql(query, con=con)



With cx_Oracle, we connected to ADW and pulled the data from the relevant tables into pandas data frames via Python.
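The `pd.read_sql()` pattern is not specific to cx_Oracle; it works over any DB-API connection object. If you want to try it without an ADW account, here is a runnable sketch where an in-memory SQLite database (with a few made-up rows) stands in for the cloud connection:

```python
import sqlite3
import pandas as pd

# An in-memory SQLite database stands in for the ADW connection here;
# pd.read_sql() works the same way over any DB-API connection object.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE boston_housing (id INTEGER, crim REAL, medv REAL)")
con.executemany("INSERT INTO boston_housing VALUES (?, ?, ?)",
                [(1, 0.00632, 24.0), (2, 0.02731, 21.6)])

data_train = pd.read_sql("SELECT * FROM boston_housing", con=con)
print(data_train.shape)  # (2, 3)
con.close()
```

Swapping `sqlite3.connect(...)` for `cx.connect(...)` gives exactly the code used against ADW above.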

Python

data_train.head()
data_test.head()



Now let's create and test our model with the data we have.

Python

######### Building Model with Boosting Method for Regression #######
from sklearn.ensemble import AdaBoostRegressor

X = data_train.iloc[:, 1:14]   # features
y = data_train.iloc[:, 14:15]  # target variable

# training
regr = AdaBoostRegressor(random_state=0, n_estimators=100)
regr.fit(X, y)

# see feature importances
regr.feature_importances_

# test the model with the test data set from the Oracle database (data_test df)
data_test.iloc[1:2, 1:14]  # take one record from the data set





Now let's take a test record and make an estimate using the model we created for this record.

Python

test_rec = data_test.iloc[1:2, 1:14]
test_id = data_test.iloc[1:2, 0:1]  # get the test record's ID for updating the result in ADW
id_final = int(test_id["ID"])

# predict the value using the model
res = regr.predict(test_rec)
pred_medv = res[0]



Now let's update the MEDV value of the estimated record in the database.

Python

# update the database record on ADW
query = 'UPDATE boston_housing_test SET medv = :result WHERE id = :testrecid'
cur = con.cursor()
cur.execute(query, [pred_medv, id_final])
con.commit()
con.close()



As we have seen, we extracted data from the cloud database with Python, used it to train the model we built with sklearn, and then tested the model with a new value.
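Before writing predictions back to the database in practice, it is worth quantifying the model's error on held-out data. The sketch below is self-contained and hedged: synthetic data from `make_regression` stands in for the ADW tables, and mean absolute error is one reasonable metric to report.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data: 13 features, like the feature
# slice data_train.iloc[:, 1:14] used above.
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

regr = AdaBoostRegressor(random_state=0, n_estimators=100)
regr.fit(X_train, y_train)

mae = mean_absolute_error(y_test, regr.predict(X_test))
print(f"test MAE: {mae:.2f}")
```

With the real tables, you would fit on `data_train` and score against the known MEDV values of a labeled hold-out split rather than the unlabeled test table.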
