Hadoop and the mystery of the version number

By Adam Diaz · Oct. 15, 14 · Interview

When I’m working with people on Hadoop, I ask what you would think is a simple question: what version of Hadoop are you using? The answer is normally one of several attempts to explain what’s installed, including:

Answer                  Translation
Hortonworks/Cloudera    This is my Hadoop distribution.
Hortonworks 2           I know we aren’t using version 1.
Hadoop 2                I don’t know my distro, but I’m using Hadoop 2.
Apache                  Someone else is working on this. I have no idea.

In reality, though, it’s not as straightforward as you might think. I think the easiest way to get the most bang for your buck is to look at the version number of the installed package. On yum-based systems you could simply run

yum list hive\*
Loaded plugins: fastestmirror, priorities
Determining fastest mirrors
Installed Packages
hive.noarch                    0.13.0.2.1.1.0-385.el6    @HDP-2.1
hive-hcatalog.noarch           0.13.0.2.1.1.0-385.el6    @HDP-2.1
hive-jdbc.noarch               0.13.0.2.1.1.0-385.el6    @HDP-2.1
hive-webhcat.noarch            0.13.0.2.1.1.0-385.el6    @HDP-2.1
Available Packages
hive-hcatalog-server.noarch    0.13.0.2.1.1.0-385.el6    HDP-2.1
hive-metastore.noarch          0.13.0.2.1.1.0-385.el6    HDP-2.1
hive-server.noarch             0.13.0.2.1.1.0-385.el6    HDP-2.1
hive-server2.noarch            0.13.0.2.1.1.0-385.el6    HDP-2.1
hive-webhcat-server.noarch     0.13.0.2.1.1.0-385.el6    HDP-2.1
hivex.i686                     1.3.3-4.2.el6             base
hivex.x86_64                   1.3.3-4.2.el6             base
hivex-devel.i686               1.3.3-4.2.el6             base
hivex-devel.x86_64             1.3.3-4.2.el6             base

and get back a list of what’s installed and what’s available. You could also simply query the rpm database:

rpm -qa | grep hadoop
hadoop-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-yarn-proxyserver-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-hdfs-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-yarn-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-mapreduce-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-yarn-resourcemanager-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-libhdfs-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-client-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-mapreduce-historyserver-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-yarn-nodemanager-2.4.0.2.1.1.0-385.el6.x86_64
hadoop-lzo-0.6.0-1.x86_64
hadoop-lzo-native-0.6.0-1.x86_64

If you run SLES you will need to use zypper, and on Windows you can look at the Add/Remove Programs dialog on most of the major newer versions of Windows. In the end you are still left with this cryptic string to decode. If you look closely there is a method to the madness, and it helps to know this level of detail when working in an area like Hadoop, where a minor version number or a build number could make all the difference.

For example:
package name-version-architecture
hadoop-2.4.0.2.1.1.0-385.el6.x86_64

The version number in this case is from a Hortonworks distribution, so we have a seven-digit (eight-place) version number.

package version-HDP Version-build number
2.4.0-2.1.1.0-build 385
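
The breakdown above can be sketched in a few lines of Python. This is only an illustration, assuming every HDP package string follows the shape shown here (a three-part component version, a four-part HDP version, then the build number). The names `HdpPackage` and `parse_hdp_rpm` are made up for this example and are not part of any Hortonworks tooling.

```python
import re
from typing import NamedTuple, Optional


class HdpPackage(NamedTuple):
    name: str               # package name, e.g. "hadoop"
    component_version: str  # version of the packaged component, e.g. "2.4.0"
    hdp_version: str        # HDP distribution version, e.g. "2.1.1.0"
    build: int              # build number, e.g. 385
    dist: str               # OS tag, e.g. "el6"
    arch: str               # architecture, e.g. "x86_64" or "noarch"


# Assumed layout: name-<x.y.z>.<a.b.c.d>-<build>.<dist>.<arch>
_HDP_RPM = re.compile(
    r"^(?P<name>[a-z][a-z0-9-]*?)-"
    r"(?P<component>\d+\.\d+\.\d+)\."
    r"(?P<hdp>\d+\.\d+\.\d+\.\d+)-"
    r"(?P<build>\d+)\."
    r"(?P<dist>[^.]+)\."
    r"(?P<arch>[^.]+)$"
)


def parse_hdp_rpm(package: str) -> Optional[HdpPackage]:
    """Split an HDP-style package string; return None if it does not fit."""
    match = _HDP_RPM.match(package.strip())
    if match is None:
        return None
    return HdpPackage(
        name=match.group("name"),
        component_version=match.group("component"),
        hdp_version=match.group("hdp"),
        build=int(match.group("build")),
        dist=match.group("dist"),
        arch=match.group("arch"),
    )


if __name__ == "__main__":
    print(parse_hdp_rpm("hadoop-2.4.0.2.1.1.0-385.el6.x86_64"))
    # A package that does not carry an HDP version is rejected.
    print(parse_hdp_rpm("hadoop-lzo-0.6.0-1.x86_64"))
```

Strings that do not match the assumed layout, such as the hadoop-lzo packages in the rpm listing, come back as None rather than being guessed at.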

It’s important to know both the version of Hadoop and the version of the package you are working on. For example, if someone says “I’m working on Hive,” you really need to know what Hive version AND what Hadoop version, because the two are intimately linked. If someone gives you the Hive package string:

hive-0.13.0.2.1.1.0-385.el6.noarch

It’s really not enough information for you to tell what version of Hadoop someone is using. You know they are using HDP 2.1.1.0, so you either ask for the same information on the installed Hadoop package OR go to the release notes for the distro to decode the distribution version number into the Apache Hadoop version. Each distribution uses a different combination of packages, and it pays to know EXACTLY what you are getting when you download a distro. Cloudera has exactly the same issue, and its packaging may in fact be even more forthcoming in that it tells you how many patches were applied. Hortonworks provides this information in its release notes.

package name-package version+CDH version+patches

hadoop-2.3.0+cdh5.1+384
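
The Cloudera-style string can be taken apart the same way. The sketch below assumes only the shape shown in this example (name, package version, "+cdh" version, "+" patch count) and ignores anything that might follow the patch count. `CdhPackage` and `parse_cdh_package` are hypothetical names chosen for illustration.

```python
import re
from typing import NamedTuple, Optional


class CdhPackage(NamedTuple):
    name: str               # package name, e.g. "hadoop"
    component_version: str  # version of the packaged component, e.g. "2.3.0"
    cdh_version: str        # CDH distribution version, e.g. "5.1"
    patches: int            # number of patches applied, e.g. 384


# Assumed layout: name-<package version>+cdh<CDH version>+<patches>
_CDH_PACKAGE = re.compile(
    r"^(?P<name>.+?)-"
    r"(?P<component>\d+(?:\.\d+)*)"
    r"\+cdh(?P<cdh>\d+(?:\.\d+)*)"
    r"\+(?P<patches>\d+)"
)


def parse_cdh_package(package: str) -> Optional[CdhPackage]:
    """Split a CDH-style package string; return None if it does not fit."""
    match = _CDH_PACKAGE.match(package.strip())
    if match is None:
        return None
    return CdhPackage(
        name=match.group("name"),
        component_version=match.group("component"),
        cdh_version=match.group("cdh"),
        patches=int(match.group("patches")),
    )


if __name__ == "__main__":
    print(parse_cdh_package("hadoop-2.3.0+cdh5.1+384"))
```

Here the Hadoop version, the CDH version, and the patch count all come straight out of the string, which is the sense in which this packaging is more forthcoming.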

Hopefully now you have a better understanding of Hadoop package versions.

Published at DZone with permission of Adam Diaz, DZone MVB.
