Hadoop 101: An Explanation of the Hadoop Ecosystem

by Gil Allouche · Dec. 17, 14 · Interview

Big data is taking off in 2014. More companies than ever are finding uses for it, both for managing everyday business routines and for solving complex business problems. It’s quickly moving away from its position as a hype word and establishing itself as a viable technology for businesses and entities both big and small.

Big data, simply put, is the huge amount of data generated all around us by smart devices, internet usage, social media, chat rooms, mobile apps, phone calls, purchasing history, and numerous other sources. Big data technology gathers, stores and analyzes this information, which is generally on the petabyte scale.

The technology is completely changing how people look at data and databases and how that data is used. In the military, big data is being used to prevent injuries. In the NBA, it’s being used to capture and analyze millions of individual movements during a game. The healthcare industry is using big data to fight cancer and heart disease. Car companies are using the technology to build self-driving cars that communicate with one another.

Big data is changing the world. What, though, is the software behind all of this? What keeps the big data technology up and running?

Hadoop.

Many people assume that Hadoop is big data. It’s not. There was big data before Hadoop and there continues to be big data without Hadoop. However, Hadoop is now a huge player in big data. There’s a reason it’s treated as synonymous with big data: so many people use it. You would have your work cut out to find companies with big data that aren’t using some sort of Hadoop software. What exactly is Hadoop?

It’s a “software library” that gives users the ability to process “large data sets across clusters of computers using simple programming models.” In other words, it gives companies the capability to gather, store and analyze huge sets of data.

Additionally, an important part to understand about Hadoop is that it’s a “software library.” There’s a large library of programs that complement the base Hadoop framework and give companies the specific tools they need to get the desired Hadoop results.

Let’s take a look at the Hadoop ecosystem. This information and more can be found at Hadoop’s website.

The Hadoop project itself contains four modules: Hadoop Common, Hadoop Distributed File System (HDFS), Hadoop YARN and Hadoop MapReduce. Together these modules support the additional Hadoop projects mentioned below, and give users the ability to process large data sets in parallel while automatically scheduling jobs and managing cluster resources.
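To make the MapReduce module a little more concrete, here is a minimal sketch of the map, shuffle and reduce phases written in plain Python. It is only an illustration of the programming model, not Hadoop code: it uses no Hadoop API, it runs in a single process, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group values by key, roughly what happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order on a small in-memory input."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "Hadoop processes big data"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run on many machines at once, with the input split across HDFS blocks and the jobs scheduled by YARN. The sketch only shows how the work is divided into independent steps.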

To complement the Hadoop modules there are also a variety of other projects that provide specialized services.

  • Apache Hive: “A data warehouse infrastructure that provides data summarization and ad hoc querying.” It’s a system that gives users the tools to run powerful queries and get results, often in near real time.

  • Apache Spark: Apache Spark is a general compute engine that offers fast data analysis on a large scale. Spark can run on top of HDFS but bypasses MapReduce and instead uses its own data processing framework. Common use cases for Apache Spark include real-time queries, event stream processing, iterative algorithms, complex operations and machine learning.

  • Apache Ambari: Ambari was created to help manage Hadoop. It offers support for many of the tools in the Hadoop ecosystem including Hive, HBase, Pig, Sqoop and ZooKeeper. The tool features a management dashboard that keeps track of cluster health and can help diagnose performance issues.

  • Apache Pig: Pig is a platform with a high-level query language built to handle large data sets.

  • Apache HBase: HBase is a non-relational database management system that runs on top of HDFS. It is built to handle sparse data sets common to big data projects.
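As a rough illustration of what “sparse data sets” means in the HBase description above, the sketch below keeps only the cells that were actually written, so a row with two columns and a row with one column cost two cells and one cell respectively. This is not the HBase client API: SparseTable and its methods are hypothetical names for this example, and the "family:qualifier" column strings are just a naming convention assumed here.

```python
class SparseTable:
    """Toy in-memory table: each row stores only the cells that were written."""

    def __init__(self):
        # row key -> {"family:qualifier": value}
        self._rows = {}

    def put(self, row_key, column, value):
        """Write one cell, creating the row on first use."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column, default=None):
        """Read one cell; missing rows or columns return the default."""
        return self._rows.get(row_key, {}).get(column, default)

    def cell_count(self):
        """Total number of stored cells across all rows."""
        return sum(len(cells) for cells in self._rows.values())


table = SparseTable()
table.put("user1", "profile:name", "Ada")
table.put("user1", "profile:email", "ada@example.com")
table.put("user2", "activity:last_login", "2014-12-17")
```

Here user2 has no profile columns at all, and nothing is stored for them. A fixed-schema relational table would typically carry a NULL for every missing column instead.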

Other common Hadoop projects include Avro, Cassandra, Chukwa, Mahout, and ZooKeeper.

By implementing Hadoop, users gain access to a large number of tools and resources that allow them to tailor their big data setup to whatever their business needs may be.
