Programmatically submitting jobs to a remote Hadoop Cluster

I'm adding to Virgil the ability to deploy a Map/Reduce job to a remote Hadoop cluster. With this, users can schedule a Hadoop job with a single REST POST, which is pretty handy.

To get this to work properly, Virgil needed to be able to remotely deploy a job. Ordinarily, to run a job against a remote cluster you issue a command from the shell:

hadoop jar $JAR_FILE $CLASS_NAME 


We wanted to do the same thing, but from within the Virgil runtime. It was easy enough to find the class we needed to use: RunJar. RunJar's main() method stages the jar and submits the job. Thus, to achieve the same functionality as the command line, we used the following:

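import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.util.RunJar;

List<String> args = new ArrayList<String>();
args.add(locationOfJarFile); // path to the job jar, as in $JAR_FILE
args.add(className);         // class to run, as in $CLASS_NAME
RunJar.main(args.toArray(new String[0])); // main() is declared to throw Throwable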
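
The argument order matters here, because RunJar reads its arguments the way the shell command does: jar file first, then the class name, then anything meant for the job itself. A minimal sketch of a helper that keeps that order in one place follows. The class RunJarArgs, its build method, and the sample paths are hypothetical names for illustration, not part of Hadoop or Virgil, and the RunJar call is left as a comment so the sketch compiles without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop or Virgil): builds the argument
// array in the same order as "hadoop jar $JAR_FILE $CLASS_NAME [job args]".
class RunJarArgs {

    static String[] build(String jarFile, String className, String... jobArgs) {
        List<String> args = new ArrayList<String>();
        args.add(jarFile);                   // first: location of the job jar
        args.add(className);                 // second: class whose main() drives the job
        args.addAll(Arrays.asList(jobArgs)); // rest: passed through to that class
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        // Sample values only; substitute a real jar and class.
        String[] args = build("/tmp/wordcount.jar", "example.WordCount", "in", "out");
        System.out.println(Arrays.toString(args));
        // With Hadoop on the classpath, the array would then be handed over:
        // org.apache.hadoop.util.RunJar.main(args);
    }
}
```

Keeping the array construction separate from the submission call also makes it easy to log exactly what is about to be submitted.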

 
That worked, but the job ran locally instead of on the cluster: with no cluster configuration available, Hadoop falls back to its built-in defaults. To deploy to a remote cluster, we needed Hadoop to load the cluster configuration, which is spread across three files: core-site.xml, hdfs-site.xml, and mapred-site.xml. To get the Hadoop runtime to load them, you need to include these files on your classpath. The key line is in the Javadoc for Hadoop's Configuration class:

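"Unless explicitly turned off, Hadoop by default specifies two resources, loaded in-order from the classpath:
1. core-default.xml: Read-only defaults for hadoop.
2. core-site.xml: Site-specific configuration for a given hadoop installation."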


Once we dropped the cluster configuration onto the classpath, everything worked like a charm.
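
Since a missing file leads to a quiet local run rather than an error, it can help to confirm that the files are actually visible before submitting. The following is a minimal sketch using only the standard ClassLoader.getResource call. The class ClusterConfigCheck and its method names are hypothetical, and it assumes the class loader it checks is the same one the Hadoop runtime will use to resolve resources, which may not hold in every container or application server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop or Virgil): reports which of the
// cluster configuration files cannot be found on the classpath.
class ClusterConfigCheck {

    // The three files named in the article.
    static final List<String> CLUSTER_FILES =
            Arrays.asList("core-site.xml", "hdfs-site.xml", "mapred-site.xml");

    // Returns the names in 'resources' that the given class loader cannot locate.
    static List<String> missingFrom(ClassLoader loader, List<String> resources) {
        List<String> missing = new ArrayList<String>();
        for (String name : resources) {
            if (loader.getResource(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the context class loader is the one that matters here;
        // fall back to the system class loader if none is set.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<String> missing = missingFrom(loader, CLUSTER_FILES);
        if (missing.isEmpty()) {
            System.out.println("All cluster configuration files are on the classpath.");
        } else {
            System.out.println("Not on the classpath: " + missing);
        }
    }
}
```

Running the check with the directory that holds the three XML files on the classpath (for example via java -cp) should report nothing missing; running it without that directory should list all three.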

From http://brianoneill.blogspot.com/2011/12/programmatically-submitting-jobs-to.html
