DZone

Hardik Pandya

Senior Software Developer at Oracle

Mississauga, CA

Joined Jan 2014

About

Java/J2EE and Big Data professional interested in exploring real-world software applications. While I am not an expert in any given area, I would like to share my journey as a software developer by sharing my experiences.

Stats

Reputation: 156
Pageviews: 606.2K
Articles: 6
Comments: 0
Articles

Working With Akka Actors
Here we're going to look at the Akka actor model with a simple example: fetching weather data from Yahoo.
May 18, 2016
· 11,836 Views · 10 Likes
Sqoop: Import Data From MySQL to Hive
Use Sqoop to move your MySQL data to Hive for even easier analysis with Hadoop.
April 14, 2016
· 146,189 Views · 8 Likes
Writing Custom Hive UDF and UDAF
User Defined Functions and User Defined Aggregate Functions allow you to completely customize how Hive evaluates data and manipulate queries.
March 12, 2016
· 93,740 Views · 7 Likes
How to Perform Joins in Apache Hive
Covering the basics of joins in Hive in a detailed tutorial from Hardik Pandya.
March 5, 2016
· 26,968 Views · 5 Likes
Running Hadoop MapReduce Application from Eclipse Kepler
It's very important to learn Hadoop by practice. One of the learning curves is writing your first MapReduce app and debugging it in your favorite IDE, Eclipse. Do we need any Eclipse plugins? No, we do not; we can do Hadoop development without MapReduce plugins. This tutorial will show you how to set up Eclipse and run your MapReduce project and MapReduce job right from your IDE. Before you read further, you should have a Hadoop single-node cluster set up on your machine. You can download the Eclipse project from GitHub.

Use case: We will explore weather data to find the maximum temperature per year, from Chapter 2 of Tom White's book Hadoop: The Definitive Guide (3rd edition), and run it using ToolRunner.

I am using Linux Mint 15 on a VirtualBox VM instance. In addition, you should have a Hadoop (MRv1; I am using 1.2.1) single-node cluster installed and running; if you have not done so, I would strongly recommend setting one up first. Download the Eclipse IDE; as of this writing, the latest version is Kepler.

1. Create a new Java project.

2. Add dependency JARs: right-click the project, select Properties > Java Build Path, and add all JARs from $HADOOP_HOME/lib and $HADOOP_HOME (where the Hadoop core and tools JARs live).

3. Create the Mapper:

package com.letsdobigdata;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}

4. Create the Reducer:

package com.letsdobigdata;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
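The fixed-width parsing in the mapper can be sanity-checked outside Hadoop. The sketch below (class name and padding are my own, hypothetical) builds a synthetic 93-character NCDC-style record, with zeros in every position we do not read, and applies the same substring logic as the mapper:

```java
public class RecordParseDemo {

    static final int MISSING = 9999;

    // Mirror of the substring logic in MaxTemperatureMapper:
    // year at columns 15-18, sign at 87, temperature at 88-91, quality at 92.
    static String parse(String line) {
        String year = line.substring(15, 19);
        int airTemperature = (line.charAt(87) == '+')
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
        String quality = line.substring(92, 93);
        boolean keep = airTemperature != MISSING && quality.matches("[01459]");
        return year + " " + airTemperature + " keep=" + keep;
    }

    public static void main(String[] args) {
        // Synthetic record: zeros everywhere except the fields we read.
        StringBuilder sb = new StringBuilder("0".repeat(93));
        sb.replace(15, 19, "1949");   // year
        sb.replace(87, 92, "+0111");  // +111 tenths of a degree = 11.1 C
        sb.replace(92, 93, "1");      // quality code 1 (passed checks)
        System.out.println(parse(sb.toString())); // 1949 111 keep=true
    }
}
```

Running this shows which records survive the quality filter before you ever submit a job, which makes the column offsets much easier to debug than inside a running cluster.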
5. Create the driver for the MapReduce job. The job is launched through the handy Hadoop utility class ToolRunner:

package com.letsdobigdata;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/* This class is responsible for running the MapReduce job. */
public class MaxTemperatureDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperatureDriver <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(MaxTemperatureDriver.class);
        job.setJobName("Max Temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        MaxTemperatureDriver driver = new MaxTemperatureDriver();
        int exitCode = ToolRunner.run(driver, args);
        System.exit(exitCode);
    }
}

6. Supply input and output. We need to supply the input file that will be used during the map phase; the final output will be generated in the output directory by the reduce task. Edit the run configuration and supply the command-line arguments. sample.txt resides in the project root.

7. Run the MapReduce job from Eclipse.

8. Final output. If you managed to come this far, once the job is complete it will create the output directory containing _SUCCESS and part-nnnnn files; double-click a part file to view it in the Eclipse editor. We supplied five rows of weather data (downloaded from NCDC) and wanted to find the maximum temperature in a given year; the output contains two rows with the max temperature (in tenths of a degree Celsius) for each supplied year:

1949 111 (11.1 C)
1950 22 (2.2 C)

Make sure you delete the output directory before running the application again, or Hadoop will fail with an error saying the directory already exists. Happy Hadooping!
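To see why five input rows collapse to two output rows, the shuffle-and-reduce step can be sketched in plain Java with no Hadoop involved. The (year, temperature) pairs below are my reading of the book's five sample records, so treat them as illustrative; the per-key maximum logic is the same as in MaxTemperatureReducer:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MaxPerYearDemo {

    // Reduce-side logic: for each year (key), keep the maximum temperature,
    // just as MaxTemperatureReducer does for each key group.
    static Map<String, Integer> maxPerYear(List<Map.Entry<String, Integer>> mapped) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            result.merge(e.getKey(), e.getValue(), Math::max);
        }
        return result;
    }

    public static void main(String[] args) {
        // (year, temperature) pairs as they would leave the map phase.
        List<Map.Entry<String, Integer>> mapped = List.of(
                Map.entry("1950", 0),
                Map.entry("1950", 22),
                Map.entry("1950", -11),
                Map.entry("1949", 111),
                Map.entry("1949", 78));
        System.out.println(maxPerYear(mapped)); // {1950=22, 1949=111}
    }
}
```

The grouping-by-key that Map.merge does here is exactly what Hadoop's shuffle provides for free between the map and reduce phases.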
February 21, 2014
· 143,943 Views · 2 Likes
How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Part 1
Learn how to set up a four node Hadoop cluster using AWS EC2, PuTTy(gen), and WinSCP.
January 23, 2014
· 134,934 Views · 3 Likes
