Heron Turns Open-Source as Twitter Prioritizes Faster Stream Analytics


A summary of the history and benefits of Heron, Twitter's open source successor to Apache Storm.


You have surely heard of Heron, a real-time stream-processing system. Devised by Twitter, it served as the company's in-house replacement for Apache Storm. That alternative is finally out in the open: Twitter has open-sourced it, roughly a year after first announcing it.

Figure 1: Putting Heron at the forefront of Twitter’s functional hierarchy

Looking back at the facts, Twitter’s motive for creating Heron becomes clear. The foremost requirement was speed, which a real-time stream-processing system is built to provide. The ability to scale up was another necessity. Apache Storm served well at times, but the new platform offered easier deployment, better management, easier debugging, and better use of a shared, multitenant cluster environment.

That said, Apache Storm served well to begin with. It was created by BackType, a marketing intelligence company that Twitter acquired in 2011. After the takeover, Storm was open-sourced and later handed over to the Apache Software Foundation. Storm offers plenty of advantages, not least the entire ecosystem built around it. Getting data into it is easy, but the system itself is much harder to understand.

Figure 2: Heron reduces spout latency

Admittedly, Apache Storm has long been regarded as an intricate system that delivers results only after sustained effort. No wonder it has been challenged by other projects, notably Apache Spark with its own revamped framework for real-time streaming, even after Storm's recent 1.0 release.

All these factors pushed Twitter to look elsewhere, and instead of refurbishing the existing project, the company opted to start from scratch. It began with a container- and cluster-oriented design. Jobs, known as topologies, are submitted to a scheduler. The scheduler then launches each topology as a series of containers.
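To make the submit-then-launch flow concrete, here is a minimal toy sketch in Python. It is not Heron's code or API: the `Topology`, `Container`, and `launch` names are hypothetical, and the round-robin placement is simply an assumed policy chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Hypothetical job description: component name -> number of parallel instances."""
    name: str
    components: dict


@dataclass
class Container:
    """Hypothetical container hosting some of a topology's instances."""
    index: int
    instances: list = field(default_factory=list)


def launch(topology, num_containers):
    """Toy scheduler: spread every instance over the containers round-robin (assumed policy)."""
    containers = [Container(i) for i in range(num_containers)]
    slot = 0
    for component, parallelism in topology.components.items():
        for task in range(parallelism):
            containers[slot % num_containers].instances.append(f"{component}-{task}")
            slot += 1
    return containers


# Submit a small word-count style topology and see where its pieces land.
word_count = Topology("word-count", {"sentence-spout": 2, "count-bolt": 3})
containers = launch(word_count, num_containers=2)
for container in containers:
    print(container.index, container.instances)
```

The point of the sketch is only the shape of the flow: the user describes a topology, hands it to a scheduler, and the scheduler decides which container runs which piece.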

Twitter lets users pick the scheduler, with the choices being Apache Aurora, Apache Mesos, or something else. Apache Storm loses out here, because its clusters have to be provisioned manually, mainly when scaling up. The best decision Twitter made was to keep Heron backward compatible with the Storm API. This was a practical move: many systems still use Apache Storm, and their bolts and spouts can now be moved over to the new platform. The idea is similar to a messaging app such as Kik staying compatible with its older versions, so that data carries across PC and mobile versions without loss.

Coming back to Heron, topologies written for the older platform can run on the new system with only minor modifications. Teams still invested in Storm can therefore migrate to Heron with little effort, without having to start a separate project.
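The idea of backward compatibility can be sketched in a few lines of Python. This is a conceptual illustration only, not the Storm or Heron API: `LegacySpout`, `LegacyBolt`, `NewRuntime`, and their method names are all hypothetical stand-ins. The spout and bolt are written once against the "old" interface, and the "new" engine runs them unchanged because it still calls the same methods.

```python
class LegacySpout:
    """Stand-in for a data source written against the older interface (hypothetical)."""

    def __init__(self, sentences):
        self._pending = list(sentences)

    def next_tuple(self):
        # Emit one item per call, or None once the source is exhausted.
        return self._pending.pop(0) if self._pending else None


class LegacyBolt:
    """Stand-in for a processing step written against the older interface (hypothetical)."""

    def __init__(self):
        self.counts = {}

    def execute(self, sentence):
        for word in sentence.split():
            self.counts[word] = self.counts.get(word, 0) + 1


class NewRuntime:
    """Hypothetical newer engine that still honors the old method names."""

    def run(self, spout, bolt):
        while True:
            item = spout.next_tuple()
            if item is None:
                break
            bolt.execute(item)
        return bolt.counts


# The legacy components run on the new engine without any changes.
counts = NewRuntime().run(LegacySpout(["to be or", "not to be"]), LegacyBolt())
print(counts)
```

In this toy setup the migration cost is zero; the article's "minor modifications" suggests that real migrations involve somewhat more than that.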

Figure 3: This is how ‘Heron’ works

Backward compatibility also gives Twitter a way to win over existing Storm users. With Heron, expect efficiency gains of roughly two to five times. Both capex and opex should improve considerably with Heron on board.

If you are looking for a faster system that processes streams in real time, check out Twitter’s newest offering.


Opinions expressed by DZone contributors are their own.
