Run Scala-Implemented Hadoop Jobs on HDInsight

The next steps after setting up a Scala app to execute a word count on Hadoop are creating a Hadoop cluster on HDInsight and uploading the app to it.

Previously, we set up a Scala application to execute a simple word count on Hadoop.

What comes next is uploading our application to HDInsight. To do that, we first need a Hadoop cluster on HDInsight.

When creating the Hadoop cluster, we specify the admin console credentials and the SSH user used to log into the head node.

Our Hadoop cluster will be backed by an Azure storage account.

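Then, it is time to upload our text file to the Azure storage account. Any text file will work.

For more information on managing a storage account with Azure CLI, check the official guide. In the command below, mytext.txt is the local file, scalahadoopexample is the storage container, and example/data/input.txt is the blob name. Assuming scalahadoopexample is the cluster's default container, Hadoop sees the blob as /example/data/input.txt.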

azure storage blob upload mytext.txt scalahadoopexample example/data/input.txt

Now, we can SSH to our Hadoop node.
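
HDInsight exposes the head node's SSH endpoint at the cluster name followed by -ssh.azurehdinsight.net. A minimal sketch of building that address, assuming a hypothetical cluster name mycluster and SSH user sshuser (substitute the values chosen when the cluster was created). It only prints the command, so it can be reviewed before it is run:

```shell
# Hypothetical values: replace with your own cluster name and SSH user.
CLUSTER_NAME="mycluster"
SSH_USER="sshuser"

# SSH endpoint of the HDInsight head node.
SSH_TARGET="${SSH_USER}@${CLUSTER_NAME}-ssh.azurehdinsight.net"

# Print the command rather than executing it.
echo "ssh ${SSH_TARGET}"
```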

First, let’s run the examples that come packaged with the HDInsight Hadoop cluster.

hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount /example/data/input.txt /example/data/results

Check the results:

hdfs dfs -text /example/data/results/part-r-00000 

Now we are ready to SCP the assembled Scala JAR to our Hadoop node and run the word count.
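
The copy step might look like the sketch below. The cluster name mycluster and SSH user sshuser are hypothetical placeholders, and the local path of the JAR depends on how the project was built, so only the file name from the job command is used here. The script prints the scp command instead of running it:

```shell
# Hypothetical values: replace with your own cluster name and SSH user.
CLUSTER_NAME="mycluster"
SSH_USER="sshuser"

# JAR name as used in the hadoop jar command; prefix it with your local build path.
JAR="ScalaHadoop-assembly-1.0.jar"

# The trailing colon copies the file into the SSH user's home directory.
SCP_CMD="scp ${JAR} ${SSH_USER}@${CLUSTER_NAME}-ssh.azurehdinsight.net:"

# Print the command rather than executing it.
echo "${SCP_CMD}"
```

After the printed command has been run, the JAR sits in the SSH user's home directory on the head node, and the job can be launched from there.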

hadoop jar ScalaHadoop-assembly-1.0.jar /example/data/input.txt /example/data/results2 

And again, check the results:

hdfs dfs -text /example/data/results2/part-r-00000 
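
The raw output can be long, so it may help to look only at the most frequent words. A small sketch, assuming the job writes one word and its count per line separated by a tab (the default for Hadoop text output, but not confirmed for the Scala job here). The helper name top_words and the sample data are made up for illustration:

```shell
# top_words N: read "word<TAB>count" lines from stdin and print the N
# lines with the highest counts (ties are ordered by word).
top_words() {
  sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "${1:-10}"
}

# Made-up sample data standing in for the job output;
# this prints the hadoop and scala lines.
printf 'the\t3\nhadoop\t7\nscala\t5\n' | top_words 2
```

On the head node, the same helper could be fed from the real results: hdfs dfs -text /example/data/results2/part-r-00000 | top_words 10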

That’s it! HDInsight makes it pretty straightforward!

Published at DZone with permission of Emmanouil Gkatziouras, DZone MVB. See the original article here.
