HBase Schema Introduction for Programmers

by Chase Seibert · Apr. 30, 13 · Interview
Schema design in NoSQL is very different from schema design in an RDBMS. Once you get something like HBase up and running, you may find yourself staring blankly at a shell, lost in the possibilities of creating your first table.

You’re probably used to thinking of tables like this:

rowkey | title | url | clicks | clicks_twitter | clicks_facebook
fcb75-bit.ly/z0pngz | some page | http://www.example.com | 16 | 13 | 3
fb499-bit.ly/15c2tlf | null | null | 1 | null | null

In HBase, this is actually modelled like this:

[Figure: HBase table schema]

Notice that each row is basically a linked list, ordered by column family and then column name. This is how it’s laid down on disk as well. Missing columns are free, because no space is pre-allocated on disk for a null column. Given that, it’s reasonable to design a schema where rows have hundreds or thousands of columns.
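As a rough illustration of that layout, the sketch below flattens the two example rows into one sorted list of cells, skipping null columns entirely. The column family names `meta` and `stats` are assumptions made for this example (the original figure is not reproduced here), and the tuple layout is a simplification, not HBase's actual storage format.

```python
# Logical view of the two example rows. The family names "meta" and "stats"
# are hypothetical; None marks a column that was never written.
rows = {
    "fcb75-bit.ly/z0pngz": {
        "meta": {"title": "some page", "url": "http://www.example.com"},
        "stats": {"clicks": 16, "clicks_twitter": 13, "clicks_facebook": 3},
    },
    "fb499-bit.ly/15c2tlf": {
        "meta": {"title": None, "url": None},
        "stats": {"clicks": 1, "clicks_twitter": None, "clicks_facebook": None},
    },
}


def to_cells(rows):
    """Flatten rows into cells sorted by row key, family, then column name."""
    cells = []
    for row_key, families in rows.items():
        for family, columns in families.items():
            for qualifier, value in columns.items():
                if value is None:
                    continue  # a missing column produces no cell at all
                cells.append((row_key, family, qualifier, value))
    # Sort on the key parts only; values never take part in the ordering.
    return sorted(cells, key=lambda cell: cell[:3])


cells = to_cells(rows)
for row_key, family, qualifier, value in cells:
    print(f"{row_key}  {family}:{qualifier} = {value}")
```

The sparse second row contributes a single cell, which is why wide, mostly empty rows cost so little in this model.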

Just as columns are laid down on disk like a linked list, so too are rows. They are put on disk in order by row key. Because row keys can be any sequence of bytes, they are ordered lexicographically, byte by byte (roughly alphabetically). This is in contrast to most RDBMSs, where keys are typically integers and ordered numerically.

Consider the following row key order: 1 < 256 < 43 < 7. The row key 256 actually comes before 43, because 2 comes before 4. This is why it’s common in HBase to make at least parts of your row key fixed width, e.g. 00000001 < 00000007 < 00000043 < 00000256. However, this introduces another problem, known as hot spotting.
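The ordering above is easy to reproduce with plain byte strings, which Python also compares lexicographically. This is only a sketch of the sort order; the helper name `fixed_width` and the width of 8 are choices made for the example.

```python
# Row keys are byte strings, so they sort byte by byte, not numerically.
keys = [b"1", b"7", b"43", b"256"]
lexicographic = sorted(keys)
print(lexicographic)  # [b'1', b'256', b'43', b'7']


def fixed_width(n, width=8):
    """Zero-pad a non-negative integer so byte order matches numeric order."""
    return str(n).zfill(width).encode("ascii")


padded = sorted(fixed_width(n) for n in (1, 7, 43, 256))
print(padded)  # [b'00000001', b'00000007', b'00000043', b'00000256']
```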

If all your row keys start with the same value, they will all go to the same region, and thus the same server. This can easily happen with monotonically increasing row keys, such as traditional RDBMS auto-incrementing primary keys, or timestamps. A big write job can then end up blocked waiting on a single region server, instead of spreading its writes across the whole cluster. A common way to avoid this is to prefix row keys, for example with the MD5 hash of the customer ID.
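A minimal sketch of that prefixing idea, using Python's standard `hashlib`. The function name `salted_row_key`, the four-character prefix length, and the `-` separator are assumptions for illustration, not a prescribed HBase convention.

```python
import hashlib


def salted_row_key(customer_id: str, suffix: str, prefix_len: int = 4) -> bytes:
    """Build a row key that starts with part of the MD5 of the customer ID.

    The hash prefix scatters different customers across the key space, while
    all rows for one customer still share the same prefix and stay together.
    """
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{customer_id}-{suffix}".encode("utf-8")


# Sequential customer IDs no longer produce sequential row keys.
for customer_id in ("1001", "1002", "1003"):
    print(salted_row_key(customer_id, "20130430"))
```

Because the prefix is derived from the customer ID rather than chosen at random, a reader who knows the customer ID can recompute the prefix and still scan that customer's rows directly.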

Rows can most efficiently be read back by scanning consecutive blocks. Say you have a table with a row key of customer-date-user. You can easily read back all the data for a given customer and date range using the prefix customer-first-part-of-date, but you can’t easily read back a date range for a single user without scanning the rows for every user in that range. If you reverse the row key and use customer-user-date, you have the reverse problem. So think about what your primary read pattern is going to be when designing your keys.
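To make the trade-off concrete, the sketch below models a table as a sorted list of customer-date-user keys and runs a prefix scan over it with `bisect`. The customer and user names are made up, and `prefix_scan` is a stand-in for a real scan, not an HBase client call.

```python
from bisect import bisect_left

# A toy "table": row keys of the form customer-date-user, kept in sorted order.
table = sorted([
    b"acme-20130101-alice",
    b"acme-20130101-bob",
    b"acme-20130102-alice",
    b"acme-20130215-bob",
    b"initech-20130101-carol",
])


def prefix_scan(sorted_keys, prefix):
    """Return the consecutive run of keys that start with the given prefix."""
    start = bisect_left(sorted_keys, prefix)  # jump straight to the first match
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing later can match
        result.append(key)
    return result


# Cheap: one customer, one month, read as a single contiguous block.
january = prefix_scan(table, b"acme-201301")

# Expensive with this key layout: one user's rows are scattered across dates,
# so every row for the customer has to be examined and filtered.
bob = [key for key in prefix_scan(table, b"acme-") if key.endswith(b"-bob")]

print(january)
print(bob)
```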

Say your primary read pattern is going to be reading off the most recent rows. Depending on the format of the dates in your row keys, you may end up with the most recent data at the end of the table. For example: 20130101 < 20130102 < 20130303. Instead, a common pattern is to invert your dates, for example by subtracting each from 99999999, which gives 79869898 > 79869897 > 79869696, so the most recent date sorts first. This may not be necessary if you know at run time which range of values is the most recent, e.g. the last 30 days.
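A small sketch of date inversion, assuming the inversion is done by subtracting an eight-digit YYYYMMDD date from 99999999, which is consistent with the first two inverted values quoted in the text. The names `MAX_DATE` and `inverted_date` are made up for this example.

```python
MAX_DATE = 99999999  # assumed ceiling for eight-digit YYYYMMDD dates


def inverted_date(yyyymmdd: str) -> str:
    """Invert a YYYYMMDD date so that newer dates sort before older ones."""
    return str(MAX_DATE - int(yyyymmdd)).zfill(8)


dates = ["20130101", "20130102", "20130303"]
inverted_keys = sorted(inverted_date(d) for d in dates)

# The newest date (20130303) now has the smallest key and comes first.
print(inverted_keys)
```

The `zfill(8)` call keeps the inverted value fixed width, so the lexicographic order of the keys still matches their numeric order.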

For more about HBase schema design, I recommend the online HBase reference book, as well as the excellent HBase: The Definitive Guide.

Published at DZone with permission of Chase Seibert, DZone MVB. See the original article here.
