Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Some Hadoop and Hive Gotchas and Developer Tips

DZone's Guide to

Some Hadoop and Hive Gotchas and Developer Tips

· Cloud Zone
Free Resource

MongoDB Atlas is a database as a service that makes it easy to deploy, manage, and scale MongoDB. So you can focus on innovation, not operations. Brought to you in partnership with MongoDB.

This is my log on several mistakes (some pretty dumb on the hindsight :) ) that I did while getting started with Hadoop and Hive some time back, along with some tricks on debugging Hadoop and Hive. I am using Hadoop 0.20.203 and Hive 0.8.1.

localhost: Error: JAVA_HOME is not set

This almost undecipherable and cryptic error message :) during Hadoop startup (namenode/jobtracker etc.) says Hadoop cannot find the Java installation. Wait!! I have already set JAVA_HOME enviornment variable?? Seems it’s not enough. So where else to set it? Turns out that you have to set JAVA_HOME in hadoop-env.sh present in conf folder to get the elephant moving.

Name node mysteriously fails to start

When you start the namenode things seems fine except for the fact that the server is not up and running. And of course I hadn’t formatted the HDFS on the namenode. So why should it work right? :) So there goes. Format the darn namenode before doing anything distributed with Hadoop.

bin/hadoop namenode -format

java.io.IOException Call to localhost/127.0.0.1:9000 failed on local exception: java.io.EOFException

This one was bit tricky. After fiddling and struggling for some time found out that Hadoop dependency version used in the JobClient in order to communicate with JobTracker is different from the version that’s present inside the running Hadoop instance. Hadoop uses a homegrown RPC mechanism to communicate with job tracker and name nodes. And it seems certain different Hadoop versions have incompatibilities in this interface.

Now it’s time for some debugging tips.

Debugging Hadoop Local (Standalone) mode

Add debugging options for JVM as follows in conf/hadoop-env.sh.

export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=[DEBUG_PORT]"

Debugging Hive Server

Start Hive with following command line to remote debug Hive.

./hive --service hiveserver --debug[port=[DEBUG_PORT],mainSuspend=y,childSuspend=y]

 

 

END

MongoDB Atlas is the best way to run MongoDB on AWS — highly secure by default, highly available, and fully elastic. Get started free. Brought to you in partnership with MongoDB.

Topics:

Published at DZone with permission of Buddhika Chamith, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}