Programmatically submitting jobs to a remote Hadoop Cluster


I'm adding the ability to deploy a Map/Reduce job from Virgil to a remote Hadoop cluster. With this, users can schedule a Hadoop job by making a REST POST to Virgil (pretty handy).

To get this to work properly, Virgil needed to be able to remotely deploy a job. Ordinarily, to run a job against a remote cluster you issue a command from the shell:

hadoop jar $JAR_FILE $CLASS_NAME 

We wanted to do the same thing, but from within the Virgil runtime. It was easy enough to find the class we needed to use: RunJar. RunJar's main() method stages the jar and submits the job. Thus, to achieve the same functionality as the command line, we used the following:

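List<String> args = new ArrayList<String>();
args.add(jarFile);    // path to the job jar ($JAR_FILE above)
args.add(className);  // main class of the job ($CLASS_NAME above)
RunJar.main(args.toArray(new String[0]));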
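RunJar takes its arguments in the same order as the shell command: the jar file first, then the main class, then anything the job itself needs. As a minimal sketch of assembling that array, assuming a hypothetical helper class RunJarArgs and made-up jar path, class name, and job arguments (none of these come from Virgil):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunJarArgs {
    // Builds the argument array in the order RunJar expects:
    // jar file, main class, then any job-specific arguments.
    public static String[] build(String jarFile, String mainClass, String... jobArgs) {
        List<String> args = new ArrayList<>();
        args.add(jarFile);
        args.add(mainClass);
        args.addAll(Arrays.asList(jobArgs));
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Hypothetical values, for illustration only.
        String[] args = build("/tmp/my-job.jar", "com.example.MyJob", "input", "output");
        System.out.println(String.join(" ", args));
        // With the Hadoop jars on the classpath, the array would then be handed over:
        // org.apache.hadoop.util.RunJar.main(args);
    }
}
```

The RunJar call is left as a comment so the sketch compiles without Hadoop; in a real runtime you would uncomment it and handle whatever the job throws.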

That worked, but it ran the job locally rather than on the cluster. To deploy to a remote cluster, Hadoop has to load the cluster configuration, which is spread across three files: core-site.xml, hdfs-site.xml, and mapred-site.xml. The Hadoop runtime picks these files up from the classpath, so you need to include them there. The key line is in the Javadoc for Hadoop's Configuration class:

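"Unless explicitly turned off, Hadoop by default specifies two resources, loaded in-order from the classpath:
1. core-default.xml: Read-only defaults for hadoop.
2. core-site.xml: Site-specific configuration for a given hadoop installation."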

Once we dropped the cluster configuration onto the classpath, everything worked like a charm.
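Because a missing file silently falls back to a local deployment, it can help to confirm the three files are actually visible before submitting. The following is only a sketch, assuming a hypothetical class ClusterConfigCheck and using nothing but the standard ClassLoader.getResource lookup; it does not touch Hadoop itself:

```java
import java.util.ArrayList;
import java.util.List;

public class ClusterConfigCheck {
    // The three cluster configuration files named in this article.
    static final String[] CONFIG_FILES = {"core-site.xml", "hdfs-site.xml", "mapred-site.xml"};

    // Returns the names that the given class loader cannot find as resources.
    public static List<String> missingFrom(ClassLoader loader, String... names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            if (loader.getResource(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Prefer the thread context class loader; fall back to this class's loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClusterConfigCheck.class.getClassLoader();
        }
        List<String> missing = missingFrom(loader, CONFIG_FILES);
        if (missing.isEmpty()) {
            System.out.println("All cluster configuration files are on the classpath.");
        } else {
            System.out.println("Not on the classpath: " + missing);
        }
    }
}
```

Running it with the directory that holds the cluster's configuration files on the classpath should report all three as found; without that directory, it lists the ones that are missing.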

From http://brianoneill.blogspot.com/2011/12/programmatically-submitting-jobs-to.html
