Why Imgur Dropped MySQL in Favor of HBase

Imgur engineer Carlos Espinoza describes how Imgur made the switch from MySQL to HBase.

In a Medium post, Imgur engineer Carlos Espinoza described how Imgur switched from MySQL to HBase to better handle the high number of notifications it was implementing. For years, Imgur had used MySQL as its primary store while HBase handled other parts of its stack. Because of the complexity of the notifications the team was building, Espinoza said, they decided to build the new system on top of the wide-column store.

HBase is an open-source Apache project built on top of Hadoop. In the DB-Engines ranking, it holds fifteenth place, behind Elasticsearch and above Hive.

MySQL is an open-source RDBMS with customers that include Pinterest, Apple, Dell, and Motorola. First created in 1995, it's been the second most used database system since 2013, behind the Oracle RDBMS.

Some of the reasons Imgur chose HBase over MySQL include:

  • Less restrictive schema: Each notification on Imgur is composed of only a handful of events, so storing notifications in a MySQL table would leave "NULL" values in the columns that do not apply. HBase doesn't have that issue, although Espinoza said that the table may appear sparse.

  • Atomic increments: Imgur's notification delivery logic is more "lightweight" because a notification is sent only when an increment crosses a specific milestone.

  • "Fast" table scans

  • Linear scalability

  • Replication
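
The atomic-increment point can be illustrated with a small sketch. This is not Imgur's code and does not use the HBase client API: `NotificationCounter`, its `increment` method, and the milestone values are hypothetical names chosen for illustration, and a lock-protected dictionary stands in for the store's atomic counter. The idea is that each increment returns enough information to tell whether a milestone was just crossed, so a notification is sent only at that moment.

```python
import threading


class NotificationCounter:
    """In-memory stand-in for a store with atomic increments (hypothetical, not HBase)."""

    def __init__(self, milestones):
        self._milestones = sorted(milestones)
        self._counts = {}
        self._lock = threading.Lock()

    def increment(self, post_id, amount=1):
        """Atomically add `amount` and return the milestones crossed by this call."""
        with self._lock:
            before = self._counts.get(post_id, 0)
            after = before + amount
            self._counts[post_id] = after
        # A milestone is crossed only if the count moved from below it to at or above it.
        return [m for m in self._milestones if before < m <= after]


counter = NotificationCounter(milestones=[100, 1000])
sent = []
for _ in range(150):
    # Each like is one increment; most calls cross nothing and send nothing.
    for milestone in counter.increment("post-1"):
        sent.append(milestone)

print(sent)  # [100]
```

Only the increment that moves the count from 99 to 100 produces a notification; the other 149 calls return an empty list, which is what keeps the delivery logic light.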

The restrictive schema of an RDBMS is a commonly cited reason for leaving a SQL store in favor of a NoSQL alternative. Telerik notes that as the needs surrounding data grow and change, so too may the need for scalability; rather than throw more hardware at an RDBMS, it may make more sense to scale horizontally with a NoSQL database.

Previously, Imgur had only two types of notifications, so it was much easier to manage their data. Now the team is adding several notifications with conditional logic; for example, a user will get a notification when their post earns 100 likes, but not before. As more notification types were added, the number of columns grew, and it became easier to drop MySQL in favor of a more flexible NoSQL data model.
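
To make the column-growth problem concrete, here is a minimal sketch that compares a fixed-schema row with a sparse, wide-column style row. The event names in `EVENT_COLUMNS` and both helper functions are assumptions made for illustration; the article does not list Imgur's actual columns, and plain dictionaries stand in for both stores.

```python
# Hypothetical notification event types; the real column set is not given in the article.
EVENT_COLUMNS = ["likes", "comments", "replies", "mentions", "follows"]


def to_fixed_row(events):
    """Fixed-schema row: every column is present, unused ones hold None (NULL)."""
    return {column: events.get(column) for column in EVENT_COLUMNS}


def to_sparse_row(events):
    """Wide-column style row: only the cells that have a value are stored."""
    return {column: value for column, value in events.items() if value is not None}


events = {"likes": 100, "comments": 3}
fixed = to_fixed_row(events)
sparse = to_sparse_row(events)

null_count = sum(1 for value in fixed.values() if value is None)
print(null_count)      # 3
print(sorted(sparse))  # ['comments', 'likes']
```

Every new notification type adds another mostly empty column to the fixed row, while the sparse row only grows for notifications that actually use the new event.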

Espinoza's full post is available on Medium.

