Sqoop Oracle Example: Getting Started with Oracle -> HDFS Import/Extract

In this post, we'll get Sqoop (1.99.3) connected to an Oracle database, extracting records to HDFS.

Add Oracle Driver to Sqoop Classpath

The first thing we'll need to do is copy the Oracle JDBC jar file into the Sqoop lib directory.  Note that this directory may not exist, so you may need to create it.

For me, this amounted to:

➜  sqoop  mkdir lib
➜  sqoop  cp ~/git/boneill/data-lab/lib/ojdbc6.jar ./lib
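
If you end up repeating this step on several machines, the same two commands can be wrapped so they are safe to re-run. This is only a sketch: install_driver is a hypothetical helper name, and both arguments are placeholders for your own Sqoop directory and driver jar.

```shell
# Sketch: copy a JDBC driver jar into Sqoop's lib directory.
# install_driver is a hypothetical helper, not part of Sqoop.
install_driver() {
  sqoop_home="$1"   # Sqoop installation directory (placeholder)
  driver_jar="$2"   # path to the Oracle JDBC jar, e.g. ojdbc6.jar

  # Fail early if the jar is missing instead of creating an empty lib dir.
  if [ ! -f "$driver_jar" ]; then
    echo "driver jar not found: $driver_jar" >&2
    return 1
  fi

  # mkdir -p succeeds whether or not lib already exists.
  mkdir -p "$sqoop_home/lib" && cp "$driver_jar" "$sqoop_home/lib/"
}
```

From the Sqoop directory, that would be called as: install_driver . ~/git/boneill/data-lab/lib/ojdbc6.jar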

Add YARN and HDFS to Sqoop Classpath

Next, you will need to add the HDFS and YARN jar files to the classpath of Sqoop.  If you recall from the initial setup, the classpath is controlled by the common.loader property in the server/conf/catalina.properties file.  To get jobs submitting to the YARN cluster properly, I appended the directories containing the HDFS and YARN jar files to the common.loader property.

*IMPORTANT*: Restart your Sqoop server so it picks up the new jar files (including the driver jar!).
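
For the restart itself, a minimal sketch follows. It assumes the server is controlled through the same bin/sqoop.sh script used for the client, with "server stop" and "server start" subcommands, which is how I understand the 1.99.x scripts to work. restart_sqoop is a hypothetical helper name.

```shell
# Sketch: restart the Sqoop server so it reloads its classpath.
# restart_sqoop is a hypothetical helper; the "server stop" and
# "server start" subcommands are assumed from the 1.99.x scripts.
restart_sqoop() {
  sqoop_home="$1"   # Sqoop installation directory (placeholder)

  # Stop may fail if the server is not running; start regardless.
  "$sqoop_home/bin/sqoop.sh" server stop
  "$sqoop_home/bin/sqoop.sh" server start
}
```

From the Sqoop directory, that would be: restart_sqoop .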

Create JDBC Connection

After that, we can fire up the client and create a connection with the following:

bin/sqoop.sh client
sqoop> create connection --cid 1
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: my_datasource
Connection configuration
JDBC Driver Class: oracle.jdbc.driver.OracleDriver
JDBC Connection String: jdbc:oracle:thin:@change.me:1521:service.name
Username: your.user
Password: ***********
JDBC Connection Properties:
There are currently 0 values in the map:
Security related configuration options
Max connections: 10
New connection was successfully created with validation status FINE and persistent id 1
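
One thing worth double-checking in the connection string: the Oracle thin driver accepts two URL shapes, host:port:SID and //host:port/service_name. The colon-separated form used above addresses a SID, so if all you have is a service name, the second shape is the one to try. Here is a small sketch that builds either form. oracle_thin_url is a hypothetical helper, and the names ORCL and orcl.example.com in the usage note are made-up placeholders.

```shell
# Sketch: build an Oracle thin-driver JDBC URL in either form.
# oracle_thin_url is a hypothetical helper, not part of Sqoop or Oracle.
oracle_thin_url() {
  kind="$1"   # "sid" or "service"
  host="$2"
  port="$3"
  name="$4"   # the SID or the service name

  case "$kind" in
    sid)     printf 'jdbc:oracle:thin:@%s:%s:%s\n' "$host" "$port" "$name" ;;
    service) printf 'jdbc:oracle:thin:@//%s:%s/%s\n' "$host" "$port" "$name" ;;
    *)       echo "unknown kind: $kind" >&2; return 1 ;;
  esac
}
```

For example, "oracle_thin_url sid change.me 1521 ORCL" yields the colon-separated shape, while "oracle_thin_url service change.me 1521 orcl.example.com" yields the //host:port/service shape.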

Create Sqoop Job

The next step is to create a job.  This is done with the following:

sqoop> create job --xid 1 --type import
Creating job for connection with id 1
Please fill following values to create new job object
Name: data_import

Database configuration

Schema name: MY_SCHEMA

Table name: MY_TABLE
Table SQL statement:
Table column names:
Partition column name: UID
Nulls in partition column:
Boundary query:

Output configuration

Storage type:
  0 : HDFS
Choose: 0
Output format:
Choose: 0
Compression format:
  0 : NONE
Choose: 0
Output directory: /user/boneill/dump/

Throttling resources

New job was successfully created with validation status FINE  and persistent id 3

Everything is fairly straightforward. The output directory is the HDFS directory to which the output will be written.

Run the job!

This was actually the hardest step because, as far as I know, the documentation is out of date.  Instead of the "submission" command that the documentation describes, use "start job" with the persistent id of the job you just created:

sqoop> start job --jid 3
Submission details
Job ID: 3
Server URL: http://localhost:12000/sqoop/
Created by: bone
Creation date: 2014-10-14 13:27:57 EDT
Lastly updated by: bone
External ID: job_1413298225396_0001
2014-10-14 13:27:57 EDT: BOOTING  - Progress is not available

From there, you should be able to see the job in YARN!

After a bit of churning, you should be able to go over to HDFS and find your files in the output directory.
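
To check the result from the command line, something like the following sketch should do, assuming the hdfs client is on your PATH. show_import_output is a hypothetical helper name, and the names of the files Sqoop writes into the directory will vary, hence the wildcard.

```shell
# Sketch: list the job's output directory and peek at the first records.
# show_import_output is a hypothetical helper; it assumes the hdfs CLI
# is installed and configured for your cluster.
show_import_output() {
  out_dir="$1"   # HDFS output directory given when creating the job

  # List the files the job wrote; bail out if the directory is missing.
  hdfs dfs -ls "$out_dir" || return 1

  # The wildcard is quoted so that HDFS, not the local shell, expands it.
  hdfs dfs -cat "${out_dir%/}/*" | head -n 5
}
```

With the job above, that would be: show_import_output /user/boneill/dump/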

Best of luck, all.  Let me know if you have any trouble.


Published at DZone with permission of Brian O'Neill, DZone MVB.
