An Introduction to HBase

In this article, let's take a look at an introduction to HBase and also explore how to create a 3-node HBase cluster.

By Nitin Ranjan · Sep. 19, 18 · Analysis


In our last two articles, we talked about the HDFS cluster and the ZooKeeper cluster, both of which are needed for deploying OpenTSDB in clustered mode. Continuing the series, we are going to talk about HBase, which OpenTSDB will use to store data in the cluster.

HBase is a column-oriented NoSQL database management system that runs on top of the Hadoop Distributed File System (HDFS).

It is part of the Hadoop ecosystem and provides random, real-time read/write access to data in the Hadoop file system.

One can store data in HDFS either directly or through HBase. Data consumers read and access the data in HDFS randomly using HBase, which sits on top of the Hadoop file system and provides read and write access.

It is well suited for sparse data sets, which are common in many big data use cases. Like most other Apache projects, it is mainly written in Java. It can store huge amounts of data, from terabytes to petabytes. HBase is not a relational database system; unlike relational database systems, it does not support a structured query language like SQL. It is built for low-latency operations and has some specific features compared to traditional relational models.

Storage Mechanism in HBase:

HBase is a column-oriented database. It stores data in tables, sorted by row ID. In the table schema, only the column families are defined; each cell is a key-value pair. A table has multiple column families, and each column family can have any number of columns. Although HBase stores data on disk in a column-oriented format, it is distinctly different from traditional columnar databases.
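To make the data model concrete, here is a minimal Python sketch (purely illustrative, not HBase code): every cell is addressed by a row key and a "family:qualifier" column name, and rows are kept sorted by row key, which is what makes range scans efficient:

```python
# Illustrative sketch of HBase's logical data model (not real HBase code):
# each cell is addressed by (row key, "family:qualifier"), and rows are
# kept in sorted row-key order.

table = {}  # row key -> {"family:qualifier": value}

def put(row, family, qualifier, value):
    table.setdefault(row, {})[f"{family}:{qualifier}"] = value

def get(row, family, qualifier):
    return table.get(row, {}).get(f"{family}:{qualifier}")

def scan(start_row, stop_row):
    # Rows come back in sorted row-key order, like an HBase scan.
    for row in sorted(table):
        if start_row <= row < stop_row:
            yield row, table[row]

put("user#001", "info", "name", "alice")
put("user#002", "info", "name", "bob")
print(get("user#001", "info", "name"))               # alice
print([row for row, _ in scan("user#001", "user#003")])
```

Note that there is no schema beyond the column-family names: each row can carry a different set of qualifiers, which is what makes sparse data cheap to store.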

Architecture:

In HBase, tables are divided into regions and served by region servers.

The main component of HBase is the Master server, which:

  • uses Apache ZooKeeper and assigns regions to the region servers
  • is responsible for load balancing: it reduces the load on busy servers by reassigning regions to less-occupied servers
  • is responsible for schema changes (HBase table creation, the creation of column families, etc.)
  • provides an interface for creating, deleting, and updating tables
  • monitors all the region servers in the cluster
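The load-balancing duty can be pictured with a toy sketch (hypothetical names; the real Master coordinates through ZooKeeper and uses far richer balancing logic): each region is handed to the region server currently holding the fewest regions.

```python
# Toy sketch of the Master's load-balancing duty: assign each region to the
# region server currently holding the fewest regions. Names are hypothetical;
# the real HBase Master uses ZooKeeper and much richer balancing heuristics.

def assign_regions(regions, servers):
    load = {server: [] for server in servers}
    for region in regions:
        least_loaded = min(load, key=lambda s: len(load[s]))
        load[least_loaded].append(region)
    return load

assignment = assign_regions(["r1", "r2", "r3", "r4", "r5"],
                            ["rs1", "rs2", "rs3"])
print(assignment)
```

With five regions and three servers, no server ends up holding more than one region above any other.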

HBase Region:

HBase tables are split horizontally into regions, which are managed by region servers.

HBase Region Server:

Regions are assigned to nodes in the cluster called region servers, which manage them. When the data size grows beyond a limit, HBase automatically splits the table and distributes the load to another region server, reducing the load on any single one. A single region server can serve around 1,000 regions.

The process of splitting tables into regions is called sharding, and it is done automatically.
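Sharding by row key can be sketched as follows (a simplification: HBase splits on actual region data size, whereas this illustration splits by key count):

```python
# Simplified sketch of sharding: a region covers a contiguous, sorted range of
# row keys; when it grows past a threshold it splits at its midpoint key.
# HBase splits on data size; here we split on key count for illustration.

MAX_KEYS = 4

def split_region(keys):
    keys = sorted(keys)
    if len(keys) <= MAX_KEYS:
        return [keys]
    mid = len(keys) // 2
    return split_region(keys[:mid]) + split_region(keys[mid:])

def find_region(regions, key):
    # Route a read/write to the region whose key range contains `key`.
    for i, region in enumerate(regions):
        if key <= region[-1]:
            return i
    return len(regions) - 1

regions = split_region([f"row{i:02d}" for i in range(10)])
print(len(regions))                  # 10 keys split into 4 small regions
print(find_region(regions, "row07")) # 3
```

Because regions are contiguous key ranges, routing a request is just a matter of finding which range the row key falls into.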

Role of the Region Server:

  • It communicates with the client and handles data-related operations.
  • It decides the size of a region.
  • It splits regions automatically.
  • It handles read and write requests for all the regions under it.

HFile:

HFile is a file-based data structure used to store data in HBase: a file of sorted key/value pairs, where both keys and values are byte arrays. This data structure supports random read and write operations on the table; using the key, HBase locates and updates values in the table.
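The core HFile idea — an immutable, sorted sequence of byte-string key/value pairs with fast point lookups — can be sketched like this (real HFiles add blocks, indexes, and bloom filters on top):

```python
# Sketch of the HFile idea: an immutable, sorted sequence of key/value pairs
# (both byte strings) supporting fast point lookups via binary search.
import bisect

class HFileSketch:
    def __init__(self, pairs):
        # Keys and values are byte strings; pairs are sorted by key, then frozen.
        self._pairs = sorted(pairs)
        self._keys = [k for k, _ in self._pairs]

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._pairs[i][1]
        return None

hf = HFileSketch([(b"row2", b"v2"), (b"row1", b"v1"), (b"row3", b"v3")])
print(hf.get(b"row2"))   # b'v2'
```

Keeping the keys sorted is what makes both point lookups and range scans cheap.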

MemStore:

MemStore is a write buffer. Before data is written permanently, it is buffered in the MemStore. When the MemStore is full, its contents are flushed to an HFile. It does not write into an existing HFile; instead, it creates a new one.
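The write path described above — buffer in memory, flush full buffers into new immutable files — can be sketched as follows (an illustration only; the real MemStore flushes by byte size and also writes a write-ahead log):

```python
# Sketch of the MemStore write path: writes are buffered in memory and, when
# the buffer fills, flushed as a NEW sorted file; existing files are never
# modified. Reads check the buffer first, then the flushed files newest-first.

FLUSH_THRESHOLD = 3  # real HBase flushes by byte size, not key count

memstore = {}
hfiles = []  # flushed, immutable sorted snapshots (newest last)

def write(key, value):
    memstore[key] = value
    if len(memstore) >= FLUSH_THRESHOLD:
        hfiles.append(dict(sorted(memstore.items())))  # flush into a new file
        memstore.clear()

def read(key):
    if key in memstore:
        return memstore[key]
    for snapshot in reversed(hfiles):  # the newest flushed file wins
        if key in snapshot:
            return snapshot[key]
    return None

for i in range(7):
    write(f"row{i}", f"v{i}")
print(len(hfiles), len(memstore))  # 2 flushed files, 1 key still buffered
print(read("row0"))                # v0
```

Because flushes always create a new file, reads must consult the MemStore and every HFile, which is why HBase periodically compacts HFiles together.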

HDFS:

HBase uses HDFS to store data. For more info, please refer to our article: An Introduction to HDFS.

ZooKeeper:

HBase uses ZooKeeper as a centralized monitoring server to maintain configuration information. It also provides distributed synchronization. For more info, please refer to our last article: An Introduction to ZooKeeper.

Deploy HBase:

For deploying HBase, we will use the harisekhon/hbase:1.2 Docker image.

hbase-site.xml:

Create an hbase-site.xml file in the /root/hadoop/ location on all 3 VMs.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zoo1,zoo2,zoo3</value>
  </property>
  <property>
    <name>hbase.zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.status.published</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.region.replica.replication.enabled</name>
    <value>true</value>
  </property>
</configuration>

Replace zoo1, zoo2, and zoo3 with the respective ZooKeeper IPs.

HBase:

HBase on VM 1:

docker run -dit --name hbase1 \
  -p 8080:8080 -p 8085:8085 -p 9090:9090 -p 9095:9095 \
  -p 16000:16000 -p 16010:16010 -p 16201:16201 -p 16301:16301 \
  -v /root/hadoop/hbase-site.xml:/hbase-1.2.6/conf/hbase-site.xml \
  --env-file hbase_env --network generic-class-net \
  -h hbase1.generic-class-net harisekhon/hbase:1.2

HBase on VM 2:

docker run -dit --name hbase2 \
  -p 8080:8080 -p 8085:8085 -p 9090:9090 -p 9095:9095 \
  -p 16000:16000 -p 16010:16010 -p 16201:16201 -p 16301:16301 \
  -v /root/hadoop/hbase-site.xml:/hbase-1.2.6/conf/hbase-site.xml \
  --env-file hbase_env --network generic-class-net \
  -h hbase2.generic-class-net harisekhon/hbase:1.2

HBase on VM 3:

docker run -dit --name hbase3 \
  -p 8080:8080 -p 8085:8085 -p 9090:9090 -p 9095:9095 \
  -p 16000:16000 -p 16010:16010 -p 16201:16201 -p 16301:16301 \
  -v /root/hadoop/hbase-site.xml:/hbase-1.2.6/conf/hbase-site.xml \
  --env-file hbase_env --network generic-class-net \
  -h hbase3.generic-class-net harisekhon/hbase:1.2

Once all the services are deployed, you can see the HBase status at http://<vm1 | vm2 | vm3 ip>:16010/master-status.

In this article, we studied HBase and learned how to create a 3-node HBase cluster. In the next article, we will study OpenTSDB and add it to our HDFS, ZooKeeper, and HBase cluster.


Published at DZone with permission of Nitin Ranjan, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
