Sqoop Oracle Example : Getting Started with Oracle -> HDFS Import/Extract
Join the DZone community and get the full member experience.Join For Free
In this post, we'll get Sqoop (1.99.3) connected to an Oracle database, extracting records to HDFS.
Add Oracle Driver to Sqoop ClasspathThe first thing we'll need to do is copy the oracle JDBC jar file into the Sqoop lib directory. Note, this directly may not exist. You may need to create it.
For me, this amounted to:
➜ sqoop mkdir lib ➜ sqoop cp ~/git/boneill/data-lab/lib/ojdbc6.jar ./lib
Add YARN and HDFS to Sqoop Classpath
*IMPORTANT* : Restart your Sqoop server so it picks up the new jar files.
(including the driver jar!)
Create JDBC Connection
After that, we can fire up the client, and create a connection with the following:
bin/sqoop.sh client ... sqoop> create connection --cid 1 Creating connection for connector with id 1 Please fill following values to create new connection object Name: my_datasource Connection configuration JDBC Driver Class: oracle.jdbc.driver.OracleDriver JDBC Connection String: jdbc:oracle:thin:@change.me:1521:service.name Username: your.user Password: *********** JDBC Connection Properties: There are currently 0 values in the map: entry# HIT RETURN HERE! Security related configuration options Max connections: 10 New connection was successfully created with validation status FINE and persistent id 1
Create Sqoop Job
sqoop> create job --xid 1 --type import Creating job for connection with id 1 Please fill following values to create new job object Name: data_import Database configuration Schema name: MY_SCHEMA
Table name: MY_TABLE Table SQL statement: Table column names: Partition column name: UID Nulls in partition column: Boundary query: Output configuration Storage type: 0 : HDFS Choose: 0 Output format: 0 : TEXT_FILE 1 : SEQUENCE_FILE Choose: 0 Compression format: 0 : NONE ... Choose: 0 Output directory: /user/boneill/dump/ Throttling resources Extractors: Loaders: New job was successfully created with validation status FINE and persistent id 3
Everything is fairly straight-forward. The output directory is the HDFS directory to which the output will be written.
Run the job!
sqoop> start job --jid 1 Submission details Job ID: 3 Server URL: http://localhost:12000/sqoop/ Created by: bone Creation date: 2014-10-14 13:27:57 EDT Lastly updated by: bone External ID: job_1413298225396_0001 http://your_host:8088/proxy/application_1413298225396_0001/ 2014-10-14 13:27:57 EDT: BOOTING - Progress is not available
After a bit of churning, you should be able to go over to HDFS and find your files in the output directory.
Best of luck all. Let me know if you have any trouble.
Published at DZone with permission of Brian O' Neill, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.