
What Is HBase in Hadoop NoSQL?


In this article, take a look at HBase in Hadoop NoSQL and see the characteristics and architecture of HBase.


HBase is a column-oriented data store that sits on top of the Hadoop Distributed File System (HDFS) and provides random, real-time reads and writes over big data. HDFS follows a "Write Once, Read Many" architecture: a file written to the HDFS storage layer cannot be modified, only read any number of times. HBase, however, layers a schema on top of HDFS files so that records can be read and updated any number of times.
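How HBase reconciles updates with HDFS's write-once files can be pictured with versioned cells: an "update" appends a new timestamped version of a cell, and a read returns the newest version. The following is a minimal plain-Java illustration of that idea (a toy model, not the HBase API):

```java
import java.util.TreeMap;

// Toy model of an HBase cell: updates never overwrite in place.
// Each put appends a new version keyed by timestamp, and a read
// returns the most recent version -- which is how HBase layers
// mutability over immutable HDFS files.
public class VersionedCell {
    // Versions sorted by timestamp; lastEntry() is the newest.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    public void put(long timestamp, String value) {
        versions.put(timestamp, value); // append-only: old versions remain
    }

    public String get() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(1L, "John");
        cell.put(2L, "Johnny"); // an "update" is just a newer version
        System.out.println(cell.get()); // prints the latest version: Johnny
    }
}
```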

HBase Characteristics

Strong Consistency

HBase provides strong consistency for both reads and writes: a read always returns the latest data, and a write does not complete until all replicas have been updated.

Horizontally Scalable

HBase provides automatic sharding using the concept of regions, which are distributed over the cluster. Whenever a region grows too large, it is automatically split, and the resulting regions are distributed across multiple machines.
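Auto-sharding can be pictured as splitting a sorted slice of the row-key space in half once it grows past a threshold. A toy plain-Java sketch of the idea (illustrative only, not how HBase implements splits):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy sketch of region splitting: a "region" holds a sorted slice of
// row keys; once it exceeds maxSize it splits at its middle key,
// mirroring how HBase divides a table into regions across servers.
public class RegionSplitDemo {
    static List<TreeSet<String>> splitIfNeeded(TreeSet<String> region, int maxSize) {
        List<TreeSet<String>> out = new ArrayList<>();
        if (region.size() <= maxSize) {
            out.add(region);
            return out;
        }
        // Find the midpoint key and split the sorted key range there.
        String mid = new ArrayList<>(region).get(region.size() / 2);
        out.add(new TreeSet<>(region.headSet(mid))); // [start, mid)
        out.add(new TreeSet<>(region.tailSet(mid))); // [mid, end)
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> region = new TreeSet<>();
        for (int i = 0; i < 6; i++) region.add("row" + i); // row0..row5
        List<TreeSet<String>> regions = splitIfNeeded(region, 4);
        System.out.println(regions.size());        // 2 regions after the split
        System.out.println(regions.get(0).last()); // row2: last key of the first region
    }
}
```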

Fault-Tolerant

HBase provides automatic region failover in case of failures.

HDFS/MapReduce Integration

HBase is built on top of HDFS and integrates with MapReduce programs, acting as both a source and a sink for MapReduce jobs.

Java API/Rest/Thrift API

HBase provides a Java API as well as REST and Thrift APIs for non-Java clients.

Query Optimization

HBase has an inbuilt block cache and bloom filter for query optimization.
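A Bloom filter lets a region server skip data files that definitely do not contain a requested row key, at the cost of occasional false positives. A minimal sketch of the data structure (illustrative only, far simpler than HBase's tuned implementation):

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: k hash functions set k bits per key;
// a lookup that finds any unset bit proves the key was never added.
// HBase uses this idea to avoid reading HFiles that cannot contain
// the requested row.
public class BloomSketch {
    private final BitSet bits = new BitSet(1024);

    private int hash(String key, int seed) {
        // Simple seeded hash for illustration only.
        return Math.floorMod(key.hashCode() * 31 + seed * 17, 1024);
    }

    public void add(String key) {
        for (int s = 0; s < 3; s++) bits.set(hash(key, s));
    }

    // false = definitely absent; true = possibly present
    public boolean mightContain(String key) {
        for (int s = 0; s < 3; s++)
            if (!bits.get(hash(key, s))) return false;
        return true;
    }

    public static void main(String[] args) {
        BloomSketch bloom = new BloomSketch();
        bloom.add("row1");
        System.out.println(bloom.mightContain("row1"));        // true
        System.out.println(bloom.mightContain("missing-row")); // false here (collisions are rare)
    }
}
```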

When Not to Use HBase?

  • When your data is not big enough. HBase suits data volumes in the billions of rows that cannot be accommodated by a traditional RDBMS.
  • When your data arrives at a constant, modest rate and is not expected to grow in the future.
  • When you need transaction control, triggers, secondary indexes, and other features that traditional databases support but HBase does not.

HBase Architecture

HBase has a master-slave architecture with one HBase Master, known as the HMaster, and multiple slaves called region servers (HRegionServers).

Regions: Tables in HBase are split into multiple regions, and these regions are distributed over multiple machines in the cluster.

HBase Master: The HMaster is responsible for assigning regions to region servers, providing admin operations (creating, updating, and deleting tables), and handling failures. Note that client reads and writes go directly to the appropriate region server; the client locates that server via ZooKeeper and the META table rather than routing requests through the HMaster.

Region Server Slaves: Region servers run on all worker nodes, and each serves a set of regions. A region server maintains a block cache, which holds frequently accessed data to serve read requests more efficiently, and a memstore, a write cache that buffers new data not yet written to disk. Data is flushed from the memstore into HFiles on the region server's disk.
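The region server write path above can be sketched as: puts accumulate in an in-memory sorted buffer (the memstore), which is written out as a new immutable sorted file once it fills up. A toy plain-Java sketch under those assumptions (not the HBase implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy write path: puts land in a sorted in-memory buffer (the
// "memstore"); when it reaches flushSize, the buffer is written out
// as a new immutable sorted "HFile" and the memstore is cleared.
public class MemstoreDemo {
    final TreeMap<String, String> memstore = new TreeMap<>();
    final List<TreeMap<String, String>> hfiles = new ArrayList<>();
    private final int flushSize;

    MemstoreDemo(int flushSize) { this.flushSize = flushSize; }

    void put(String rowKey, String value) {
        memstore.put(rowKey, value);
        if (memstore.size() >= flushSize) flush();
    }

    void flush() {
        hfiles.add(new TreeMap<>(memstore)); // a new immutable sorted file
        memstore.clear();
    }

    public static void main(String[] args) {
        MemstoreDemo server = new MemstoreDemo(2);
        server.put("row1", "John");
        server.put("row2", "Mary");   // reaches flushSize: triggers a flush
        server.put("row3", "Albert"); // still buffered in the memstore
        System.out.println(server.hfiles.size());   // 1 flushed file
        System.out.println(server.memstore.size()); // 1 buffered row
    }
}
```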

ZooKeeper: HBase uses ZooKeeper for coordination and failure recovery. ZooKeeper holds configuration information about the HBase Master and region servers, and a client must contact ZooKeeper first in order to connect to the HBase cluster. The ZooKeeper quorum (the ensemble of ZooKeeper daemons) monitors for failures so that failed nodes can be recovered. ZooKeeper is thus an integral part of the HBase architecture, maintaining coordination and synchronization across the cluster.

HBase Data Model



HBase Tables: A table is a collection of rows, spread over distributed regions.

HBase Row: A row represents a single entity in an HBase table.

Row Key: Like a primary key, the row key uniquely identifies each row in an HBase table.

Columns: Columns represent the attributes of an entity. For example, in a customer HBase table, the columns could be customer name, age, phone number, etc.

Column Family: Columns that share similar characteristics can be grouped together in a column family, and each column family is stored on the Hadoop Distributed File System as its own set of HFiles.
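The data model above is often described as a sorted, multi-dimensional map: row key → column family → column qualifier → value. A plain-Java sketch of that logical model (illustrative only, not the HBase API):

```java
import java.util.TreeMap;

// HBase's logical data model as nested sorted maps:
// rowKey -> columnFamily -> columnQualifier -> value.
public class DataModelDemo {
    static final TreeMap<String, TreeMap<String, TreeMap<String, String>>> table = new TreeMap<>();

    static void put(String row, String family, String qualifier, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             .put(qualifier, value);
    }

    static String get(String row, String family, String qualifier) {
        return table.getOrDefault(row, new TreeMap<>())
                    .getOrDefault(family, new TreeMap<>())
                    .get(qualifier);
    }

    public static void main(String[] args) {
        // Mirrors the employee example used later in the article.
        put("1", "personal", "name", "John");
        put("1", "professional", "salary", "7000");
        System.out.println(get("1", "personal", "name")); // John
    }
}
```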

Getting Started With HBase

We will create the table below, named employee, first using the HBase shell and then using the Java API. The employee table has two column families: the personal column family, which holds personal information such as name and age, and the professional column family, which holds professional information such as salary and designation.

HBase Shell Commands

# Create table 'employee' with column families 'personal' and 'professional'
create 'employee', 'personal', 'professional'

# Insert data into table 'employee' (one row key per employee)
put 'employee','1','personal:name','John'
put 'employee','1','personal:age','24'
put 'employee','1','professional:designation','Manager'
put 'employee','1','professional:salary','7000'
put 'employee','2','personal:name','Mary'
put 'employee','2','personal:age','30'
put 'employee','2','professional:designation','Developer'
put 'employee','2','professional:salary','4000'
put 'employee','3','personal:name','Albert'
put 'employee','3','personal:age','45'
put 'employee','3','professional:designation','Director'
put 'employee','3','professional:salary','12000'

# Read data from 'employee'
get 'employee', '1', {COLUMN => 'personal:name'}



Java API

Create Table

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateEmployeeTable {
   public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin hAdmin = new HBaseAdmin(conf);
      // Describe the table and its two column families
      HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("employee"));
      tableDescriptor.addFamily(new HColumnDescriptor("personal"));
      tableDescriptor.addFamily(new HColumnDescriptor("professional"));
      hAdmin.createTable(tableDescriptor);
      System.out.println("Employee table created");
      hAdmin.close();
   }
}



Put Data

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class InsertEmployeeData {
   public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable hTable = new HTable(conf, "employee");
      // Each Put targets one row; cells are addressed as family + qualifier
      Put p = new Put(Bytes.toBytes("row1"));
      p.add(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("John"));
      p.add(Bytes.toBytes("personal"), Bytes.toBytes("age"), Bytes.toBytes("24"));
      p.add(Bytes.toBytes("professional"), Bytes.toBytes("designation"), Bytes.toBytes("Manager"));
      p.add(Bytes.toBytes("professional"), Bytes.toBytes("salary"), Bytes.toBytes("7000"));
      hTable.put(p);
      System.out.println("Employee row inserted");
      hTable.close();
   }
}



Update Table

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class UpdateEmployeeData {
   public static void main(String[] args) throws IOException {
      Configuration hconfig = HBaseConfiguration.create();
      HTable hTable = new HTable(hconfig, "employee");
      // An update is simply a Put to an existing row: HBase stores a new
      // version of the cell and serves the latest version on reads
      Put p = new Put(Bytes.toBytes("row1"));
      p.add(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("Jim"));
      hTable.put(p);
      hTable.close();
   }
}



Read Data

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadEmployeeData {
   public static void main(String[] args) throws IOException {
      Configuration hconfig = HBaseConfiguration.create();
      HTable table = new HTable(hconfig, "employee");
      // Fetch the row, then extract individual cells from the Result
      Get g = new Get(Bytes.toBytes("row1"));
      Result result = table.get(g);
      byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
      byte[] value1 = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("age"));
      String name = Bytes.toString(value);
      String age = Bytes.toString(value1);
      System.out.println("name: " + name + " age: " + age);
      table.close();
   }
}



Alter Table

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class AddColumnFamily {
   public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      // The table must be disabled before its schema can be altered
      admin.disableTable("employee");
      HColumnDescriptor columnDescriptor = new HColumnDescriptor("address");
      admin.addColumn("employee", columnDescriptor);
      admin.enableTable("employee");
      System.out.println("Column family added");
      admin.close();
   }
}



Conclusion

HBase is an ideal choice when your big data already lives on Hadoop. It mitigates a key limitation of HDFS by providing random reads, writes, and updates, and it is a distributed, horizontally scalable, fault-tolerant data store that works well alongside the rest of a Hadoop cluster.


Opinions expressed by DZone contributors are their own.
