Hadoop and the mystery of the version number

When I’m working with people on Hadoop, I ask what you would think is a simple question: what version of Hadoop are you using? The answer is usually one of several attempts to explain what’s installed, including:

Answer                  Translation
Hortonworks/Cloudera    This is my Hadoop distribution.
Hortonworks 2           I know we aren’t using version 1.
Hadoop 2                I don’t know my distro, but I’m using Hadoop 2.
Apache                  Someone else is working on this. I have no idea.

In reality, though, it’s not as straightforward as you might think. The quickest way to get a reliable answer is to look at the version number of the installed package. On yum-based systems you can run:

yum list hive\*
Loaded plugins: fastestmirror, priorities
Determining fastest mirrors
Installed Packages
hive.noarch                    0.13.0.2.1.1.0-385.el6    @HDP-2.1
hive-hcatalog.noarch           0.13.0.2.1.1.0-385.el6    @HDP-2.1
hive-jdbc.noarch               0.13.0.2.1.1.0-385.el6    @HDP-2.1
hive-webhcat.noarch            0.13.0.2.1.1.0-385.el6    @HDP-2.1
Available Packages
hive-hcatalog-server.noarch    0.13.0.2.1.1.0-385.el6    HDP-2.1
hive-metastore.noarch          0.13.0.2.1.1.0-385.el6    HDP-2.1
hive-server.noarch             0.13.0.2.1.1.0-385.el6    HDP-2.1
hive-server2.noarch            0.13.0.2.1.1.0-385.el6    HDP-2.1
hive-webhcat-server.noarch     0.13.0.2.1.1.0-385.el6    HDP-2.1
hivex.i686                     1.3.3-4.2.el6             base
hivex.x86_64                   1.3.3-4.2.el6             base
hivex-devel.i686               1.3.3-4.2.el6             base
hivex-devel.x86_64             1.3.3-4.2.el6             base

This returns a list of what’s installed and what’s available. You can also query the RPM database directly:

rpm -qa | grep hadoop
hadoop-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-yarn-proxyserver-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-hdfs-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-yarn-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-mapreduce-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-yarn-resourcemanager-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-libhdfs-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-client-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-mapreduce-historyserver-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-yarn-nodemanager-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-lzo-0.6.0-1.x86_64
hadoop-lzo-native-0.6.0-1.x86_64
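If you need these fields programmatically (for example, to compare package versions across the nodes of a cluster), each line can be split from the right, because package names such as hadoop-yarn-nodemanager contain hyphens of their own. The following Python sketch assumes every line has the name-version-release.arch shape shown above; RpmPackage and parse_nvra are hypothetical names used only for illustration.

```python
# A minimal sketch: split `rpm -qa` lines into name, version, release and arch.
# Assumption: every line has the name-version-release.arch shape shown above.
from typing import NamedTuple


class RpmPackage(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


def parse_nvra(line: str) -> RpmPackage:
    """Parse 'name-version-release.arch', working from the right.

    Package names may contain hyphens (hadoop-yarn-nodemanager), so we peel
    off the architecture first, then the release, then the version.
    """
    rest, arch = line.strip().rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return RpmPackage(name, version, release, arch)


rpm_output = """\
hadoop-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-yarn-proxyserver-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-lzo-0.6.0-1.x86_64
"""

for line in rpm_output.splitlines():
    pkg = parse_nvra(line)
    print(f"{pkg.name:<25} {pkg.version:<15} {pkg.release:<10} {pkg.arch}")
```

On a live system it may be simpler to let rpm separate the fields for you, for example with rpm -qa --queryformat '%{NAME} %{VERSION} %{RELEASE} %{ARCH}\n' 'hadoop*'.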

If you run SLES, use zypper instead; on most recent versions of Windows, look in the Add/Remove Programs dialog. Either way, you are still left with a cryptic string to decode. Look closely, though, and there is a method to the madness. It helps to know this level of detail when working in an area like Hadoop, where a minor version number or a build number can make all the difference.

For example:
package name-version-release.architecture
hadoop-2.4.0.2.1.1.0-385.el6.x86_64

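This package comes from a Hortonworks distribution, so the version number has seven dot-separated fields, or eight places once you count the build number. The trailing el6 marks the target platform (Enterprise Linux 6).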

package version-HDP Version-build number
2.4.0-2.1.1.0-build 385
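The same decoding can be written as a small Python sketch. It assumes, based only on the Hortonworks examples in this article, that the version field is a three-part Apache version followed by a four-part HDP version, and that the release field is the build number followed by a platform tag. decode_hdp and HdpVersion are hypothetical names, and other HDP releases may not follow this exact layout.

```python
# A minimal sketch: decode an HDP-style RPM version/release pair.
# Assumption: 3-field Apache version + 4-field HDP version, as in the
# examples above (2.4.0 + 2.1.1.0 for Hadoop, 0.13.0 + 2.1.1.0 for Hive).
from typing import NamedTuple


class HdpVersion(NamedTuple):
    apache_version: str  # upstream Apache version of the component
    hdp_version: str     # Hortonworks stack version
    build: str           # HDP build number
    platform: str        # dist tag such as el6


def decode_hdp(version: str, release: str) -> HdpVersion:
    fields = version.split(".")
    if len(fields) != 7:
        raise ValueError(f"expected 7 dot-separated fields, got {version!r}")
    # The release looks like "385.el6": build number, then platform tag.
    build, _, platform = release.partition(".")
    return HdpVersion(
        apache_version=".".join(fields[:3]),
        hdp_version=".".join(fields[3:]),
        build=build,
        platform=platform,
    )


print(decode_hdp("2.4.0.2.1.1.0", "385.el6"))
# HdpVersion(apache_version='2.4.0', hdp_version='2.1.1.0', build='385', platform='el6')
print(decode_hdp("0.13.0.2.1.1.0", "385.el6"))
# HdpVersion(apache_version='0.13.0', hdp_version='2.1.1.0', build='385', platform='el6')
```

Raising an error on an unexpected field count is deliberate: silently mis-splitting a version string is exactly the kind of confusion this article warns about.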

It’s important to know both the version of Hadoop and the version of the package you are working on. For example, if someone says “I’m working on Hive,” you really need to know which Hive version AND which Hadoop version, because the two are intimately linked. Suppose someone gives you this Hive package string:

hive-0.13.0.2.1.1.0-385.el6.noarch

That alone is not enough to tell which version of Hadoop someone is using. You know they are on HDP 2.1.1.0, so you either ask for the same information about the installed Hadoop package OR consult the distribution’s release notes to map the distribution version to the Apache Hadoop version. Each distribution ships a different combination of packages, and it pays to know EXACTLY what you are getting when you download one. Cloudera has exactly the same issue, and its package names may be even more forthcoming, since they tell you how many patches were applied. Hortonworks reports this in its release notes.

package name-package version+CDH version+patches

hadoop-2.3.0+cdh5.1+384
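A Cloudera-style string can be pulled apart the same way. This Python sketch uses a regular expression built only from the single example above (name-version+cdhX.Y+patches); real CDH package names may carry extra release or architecture fields, so treat the pattern and the hypothetical decode_cdh helper as assumptions to adapt.

```python
# A minimal sketch: split a Cloudera-style string of the form
# name-version+cdhX.Y+patches into its parts.
# Assumption: the pattern is derived from the one example in this article.
import re
from typing import NamedTuple


class CdhVersion(NamedTuple):
    name: str
    apache_version: str
    cdh_version: str
    patches: int


# The greedy name group lets package names contain hyphens (hadoop-yarn).
CDH_PATTERN = re.compile(
    r"^(?P<name>.+)-(?P<apache>\d+(?:\.\d+)*)"
    r"\+cdh(?P<cdh>\d+(?:\.\d+)*)\+(?P<patches>\d+)$"
)


def decode_cdh(package: str) -> CdhVersion:
    match = CDH_PATTERN.match(package)
    if match is None:
        raise ValueError(f"not a name-version+cdhX+patches string: {package!r}")
    return CdhVersion(
        name=match["name"],
        apache_version=match["apache"],
        cdh_version=match["cdh"],
        patches=int(match["patches"]),
    )


print(decode_cdh("hadoop-2.3.0+cdh5.1+384"))
# CdhVersion(name='hadoop', apache_version='2.3.0', cdh_version='5.1', patches=384)
```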

Hopefully you now have a better understanding of Hadoop package version numbers.

Published at DZone with permission of Adam Diaz, DZone MVB. See the original article here.

