
Apache Storm vs WSO2 Stream Processor, Part 2

We wrap up this two-part series by looking at thirteen scenarios which data scientists may face, and how these two platforms compare in each.

Welcome back! If you missed Part 1, you can check it out here.

5. When to Use What From a Use Case Perspective

Let’s look at 13 streaming analytics patterns, one by one, and evaluate to what extent they are applicable to Apache Storm and WSO2 Stream Processor.

Pattern 1: Preprocessing

Preprocessing is often done as a projection from one data stream to another or through filtering. Some of the potential operations include:

  • Filtering and removing some events.

  • Reshaping a stream by removing, renaming, or adding new attributes to a stream.

  • Splitting and combining attributes in a stream.

  • Transforming attributes.

When implementing this type of use case, the most important feature to consider is the programming model. Here, Storm has a competitive advantage over WSO2 SP because Storm lets the developer specify the filtering task in a lower-level language and makes it easy to specify distributed processing.
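
The operations listed above can be sketched in a few lines of plain Python. This is only an illustration of filtering, splitting, renaming, and transforming attributes; it uses neither the Storm nor the Siddhi API, and the event fields (sensor_id, location, temp_f) are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def preprocess(events: Iterable[Dict]) -> Iterator[Dict]:
    """Filter, reshape, and transform a stream of sensor readings."""
    for event in events:
        # Filtering: drop events that carry no temperature reading.
        if event.get("temp_f") is None:
            continue
        # Splitting an attribute: "building-room" becomes two attributes.
        building, room = event["location"].split("-", 1)
        # Reshaping and transforming: rename attributes and convert
        # Fahrenheit to Celsius.
        yield {
            "sensor": event["sensor_id"],
            "building": building,
            "room": room,
            "temp_c": round((event["temp_f"] - 32) * 5 / 9, 1),
        }


raw = [
    {"sensor_id": "s1", "location": "A-101", "temp_f": 212.0},
    {"sensor_id": "s2", "location": "B-202", "temp_f": None},
]
cleaned = list(preprocess(raw))
```

In a Storm topology, logic of this kind would typically live inside a bolt, while in WSO2 SP it would be expressed as a query with a filter and a projection.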

Pattern 2: Alerts and Thresholds

This pattern detects a condition and generates alerts based on it (e.g., an alarm on high temperature). These alerts can be based on a simple value or on more complex conditions such as the rate of increase. To implement such alert generation scenarios, the types of operators in the stream processor query language play a major role, since we should be able to specify complex alert conditions with the query language. Here, WSO2 SP has a competitive advantage over Storm because WSO2 SP uses the Siddhi complex event processing library, which can specify complex event pattern matching queries using its Streaming SQL capabilities. We call a language that enables users to write SQL-like queries to query streaming data a “Streaming SQL” language.

Comparison: The Siddhi query language lets users specify complex stream processing queries quite easily with Streaming SQL. If someone wants to write code for their logic (for example, in simple filtering applications), the recommended choice would be Storm. However, writing Java code that handles time windows and temporal event sequence patterns is quite complicated [17]. Hence, for alerting scenarios, WSO2 Stream Processor is a better choice than Storm. Furthermore, WSO2 SP’s built-in dashboard capabilities enable the creation of dashboards with alert messages.
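
To make the two kinds of alert condition concrete (a simple value and a rate of increase), here is a minimal plain-Python sketch. The thresholds max_temp and max_rate and the message format are assumptions chosen for illustration; this is not Siddhi or Storm code.

```python
from typing import Iterable, Iterator, Optional, Tuple


def temperature_alerts(readings: Iterable[Tuple[float, float]],
                       max_temp: float = 80.0,
                       max_rate: float = 2.0) -> Iterator[str]:
    """Yield alerts for (timestamp_seconds, temperature) readings."""
    previous: Optional[Tuple[float, float]] = None
    for ts, temp in readings:
        # Simple value condition: temperature above a fixed threshold.
        if temp > max_temp:
            yield f"t={ts}: HIGH temperature {temp}"
        # Rate condition: degrees per second since the previous reading.
        if previous is not None:
            prev_ts, prev_temp = previous
            elapsed = ts - prev_ts
            if elapsed > 0 and (temp - prev_temp) / elapsed > max_rate:
                yield f"t={ts}: RAPID rise from {prev_temp} to {temp}"
        previous = (ts, temp)


alerts = list(temperature_alerts([(0, 70.0), (1, 71.0), (2, 75.0), (3, 81.0)]))
```

Even this small example has to carry state (the previous reading) by hand, which is the kind of bookkeeping a Streaming SQL language hides from the user.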

Pattern 3: Simple Counting and Counting With Windows

This pattern covers aggregate functions such as Min, Max, Percentiles, etc. Simple counts can be computed without storing any data (e.g., counting the number of failed transactions). Counts are often used with a time window attached to them (e.g., the failure count in the last hour). Distributed processing capabilities are essential for certain types of stream processing applications. When the volume of data to handle reaches gigabytes per second, the stream processing system needs to scale out to multiple nodes. Travel time prediction for each individual vehicle in a very large, populated city [29] and processing of streaming image data received by large-field radio telescopes [30] are some examples of applications in this category. This kind of application may not include complicated event processing logic; rather, it has to deal with the sheer volume of data gathered from a large number of sensors.

Comparison: Apache Storm is considerably strong in implementing such use cases since it can operate as a distributed stream processor. Although both Storm and WSO2 SP have windowing capabilities, windows in Storm have to be implemented from first principles, whereas in WSO2 SP windows are available through its Streaming SQL. Use of Streaming SQL provides the advantages of usability and portability, as discussed in this talk. Hence, the better option would be to use WSO2 SP.
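
As a rough idea of what implementing a window from first principles involves, the following plain-Python sketch keeps a sliding time window of event timestamps and reports the count inside it. The class name is hypothetical, and the sketch assumes timestamps arrive in order on a single node; distribution and out-of-order handling are left out.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp: float) -> int:
        """Record an event and return the count inside the current window."""
        self._timestamps.append(timestamp)
        # Expire events that have fallen out of the window. The event just
        # added is always inside the window, so the deque never empties here.
        while self._timestamps[0] <= timestamp - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps)


# Failure count over the last hour, fed with timestamps in seconds.
failures = SlidingWindowCounter(window_seconds=3600)
counts = [failures.add(t) for t in (0, 600, 3000, 3700, 7400)]
```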

Pattern 4: Joining Event Streams

Combining data from two sensors, detecting the proximity of two vehicles, and combining data from a football player’s foot sensors and the football’s sensors to track the movement of the football among the players of a match are some examples of such use cases. Both Storm and WSO2 SP are equally capable of joining streams.
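
A stream join of this kind usually matches events from two streams that fall within the same time window. The sketch below, in plain Python, joins hypothetical "player" and "ball" events that are close in both time and position. The one-dimensional position, the window length, and the distance threshold are simplifying assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, float, float]  # (source, timestamp, position)


def window_join(events: Iterable[Event], window: float = 1.0,
                max_distance: float = 0.5) -> Iterator[Tuple[Event, Event]]:
    """Yield (player_event, ball_event) pairs close in time and space."""
    recent = {"player": deque(), "ball": deque()}
    for event in events:
        source, ts, pos = event
        other = "ball" if source == "player" else "player"
        # Drop buffered events from both streams that left the time window.
        for buffered in recent.values():
            while buffered and buffered[0][1] < ts - window:
                buffered.popleft()
        # Join the new event against the other stream's buffered events.
        for candidate in recent[other]:
            if abs(candidate[2] - pos) <= max_distance:
                yield (event, candidate) if source == "player" else (candidate, event)
        recent[source].append(event)


stream = [("player", 0.0, 10.0), ("ball", 0.4, 10.2),
          ("player", 2.0, 30.0), ("ball", 2.5, 31.0)]
pairs = list(window_join(stream))
```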

Pattern 5: Data Correlation, Missing Events, and Erroneous Data

In this use case, in addition to joining, we also need to correlate the data within the same stream. This is because different data sensors can send events at different rates, and many use cases require this fundamental operator. Some of the possible sub use cases are as follows:

  • Matching up two data streams that send events at different speeds.

  • Detecting a missing event in a data stream (e.g. detect a customer request that has not been responded to within one hour of its reception).

  • Detecting erroneous data.

Comparison: WSO2 SP has a competitive advantage over Storm for this use case because the Siddhi query language has out-of-the-box support for implementing all the above use cases, whereas with Storm, custom code has to be written.
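
The missing-event case from the list above (a request that is not answered within one hour) can be sketched as follows in plain Python. The event layout (timestamp, kind, request_id) is hypothetical. Note one simplification: this version only notices an overdue request when a later event arrives, whereas a real engine would typically rely on a timer.

```python
from typing import Dict, Iterable, List, Tuple


def unanswered_requests(events: Iterable[Tuple[float, str, str]],
                        timeout: float = 3600.0) -> List[str]:
    """Return ids of requests with no response within `timeout` seconds.

    Each event is (timestamp, kind, request_id), where kind is "request"
    or "response". Events are assumed to arrive in timestamp order.
    """
    pending: Dict[str, float] = {}
    overdue: List[str] = []
    for ts, kind, request_id in events:
        # Any pending request older than the timeout is now overdue.
        for rid, requested_at in list(pending.items()):
            if ts - requested_at > timeout:
                overdue.append(rid)
                del pending[rid]
        if kind == "request":
            pending[request_id] = ts
        elif kind == "response":
            pending.pop(request_id, None)
    return overdue


late = unanswered_requests([
    (0, "request", "r1"),
    (100, "request", "r2"),
    (200, "response", "r2"),
    (4000, "request", "r3"),
])
```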

Pattern 6: Interacting With Databases

The need to interact with databases arises when we need to combine real-time data with historical data stored on disk. Some examples include:

  • When a transaction happens, looking up the customer’s age in the customer database by customer ID, for use in fraud detection (enrichment).

  • Checking a transaction against blacklists and whitelists in the database.

  • Receiving input from the user.

Comparison: WSO2 SP has built-in extensions and a fine-tuned event persistence layer for interacting with databases. For example, the RDBMS event table extension enables an RDBMS such as MySQL, H2, or Oracle to be paired with the stream processing application. Similar extensions exist for accessing NoSQL databases such as HBase, Cassandra, MongoDB, etc. This enables out-of-the-box event stream persistence with WSO2 Stream Processor. However, with Storm, the developer has to write custom code to interact with databases [28].
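
The enrichment and blacklist examples above amount to a lookup against a table for every incoming event. A minimal sketch, using Python’s standard sqlite3 module with an in-memory database as a stand-in for the customer database (the table layout and field names are assumptions):

```python
import sqlite3

# In-memory stand-in for the customer database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE customers (id TEXT PRIMARY KEY, age INTEGER, blacklisted INTEGER)"
)
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [("c1", 34, 0), ("c2", 17, 1)])


def enrich(transaction: dict) -> dict:
    """Add the customer's age and blacklist status to a transaction event."""
    row = db.execute(
        "SELECT age, blacklisted FROM customers WHERE id = ?",
        (transaction["customer_id"],),
    ).fetchone()
    # Unknown customers get no age and are treated as not blacklisted.
    age, blacklisted = row if row else (None, 0)
    return {**transaction, "age": age, "blocked": bool(blacklisted)}


enriched = [enrich(t) for t in (
    {"customer_id": "c1", "amount": 20.0},
    {"customer_id": "c2", "amount": 950.0},
)]
```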

Pattern 7: Detecting Temporal Event Sequence Patterns

Detecting a sequence of events arranged in time is quite a common use case of streaming analytics. A simple example is the following credit card fraud detection scenario. A thief, having stolen a credit card, would try a smaller transaction to make sure it works before making a large transaction. Here, the small transaction followed by a large transaction is a temporal sequence of events and can be detected using a regular expression written on top of an event sequence.

Comparison: WSO2 SP’s Siddhi query language has out-of-the-box support for detecting such scenarios through its temporal event pattern detection capabilities. With Storm, however, the user has to implement this feature themselves.
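
For a sense of what a hand-written implementation of this fraud pattern involves, here is a plain-Python sketch that flags a large transaction following a small one on the same card. The thresholds (small, large) and the time limit (within) are assumptions for illustration; the article itself does not specify them.

```python
from typing import Dict, Iterable, Iterator, Tuple


def small_then_large(transactions: Iterable[Tuple[float, str, float]],
                     small: float = 5.0, large: float = 500.0,
                     within: float = 600.0) -> Iterator[Tuple[str, float]]:
    """Yield (card, timestamp) when a large transaction follows a small one.

    Each transaction is (timestamp_seconds, card, amount), in time order.
    """
    last_small: Dict[str, float] = {}
    for ts, card, amount in transactions:
        if amount <= small:
            # Remember the most recent small "probe" transaction per card.
            last_small[card] = ts
        elif amount >= large:
            probe_ts = last_small.pop(card, None)
            if probe_ts is not None and ts - probe_ts <= within:
                yield card, ts


flagged = list(small_then_large([
    (0, "A", 1.0), (50, "B", 700.0), (120, "A", 900.0),
    (130, "C", 2.0), (5000, "C", 800.0),
]))
```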

Pattern 8: Tracking

Tracking corresponds to following something over space and time and detecting given conditions. Examples include tracking wildlife to make sure the animals are alive and have not been sent to the wrong destinations. Both WSO2 SP and Storm can be equally applied to this use case.

Pattern 9: Trend Detection

Detecting patterns in time series data and bringing them to an operator’s attention are common use cases. Trends include rise, fall, turn, outliers, and complex trends like triple bottom. WSO2 SP’s built-in event sequence pattern detection capabilities make trend detection easier to implement with WSO2 SP than with Storm.
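
Two of the trends mentioned above, turns and a triple bottom, can be sketched in plain Python as follows. The definition of a triple bottom used here (three consecutive troughs within a relative tolerance of each other) is a simplification chosen for illustration, and it assumes positive values such as prices.

```python
from typing import List, Sequence, Tuple


def detect_turns(values: Sequence[float]) -> List[Tuple[int, str]]:
    """Return (index, kind) for each local "peak" or "trough"."""
    turns = []
    for i in range(1, len(values) - 1):
        before, here, after = values[i - 1], values[i], values[i + 1]
        if here > before and here > after:
            turns.append((i, "peak"))
        elif here < before and here < after:
            turns.append((i, "trough"))
    return turns


def has_triple_bottom(values: Sequence[float], tolerance: float = 0.05) -> bool:
    """True if three consecutive troughs lie at roughly the same level."""
    lows = [values[i] for i, kind in detect_turns(values) if kind == "trough"]
    for a, b, c in zip(lows, lows[1:], lows[2:]):
        if max(a, b, c) - min(a, b, c) <= tolerance * min(a, b, c):
            return True
    return False


prices = [10, 8, 9, 8.1, 9.5, 7.9, 11]
turns = detect_turns(prices)
```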

Pattern 10: Running the Same Query in Batch and Real-Time Pipelines

In this scenario, we run the same query in both real-time and batch pipelines. This is also known as the Lambda Architecture. Nathan Marz, who created Apache Storm, also came up with the concept of the Lambda Architecture. It is an approach to building stream processing applications on top of MapReduce and Storm or similar systems [32][33]. Here, an immutable sequence of records is captured and fed into a batch system and a stream processing system in parallel. The same query is implemented twice, once in the batch system and once in the stream processing system. The results from both systems are used to produce a complete answer.

Both Storm and WSO2 SP have the capability to implement the Lambda Architecture.
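
The idea of implementing one query twice and merging the results can be shown with a toy example in plain Python. Here the query is a count per key; the batch view is computed over the stored records, the real-time view is updated one record at a time, and a query merges the two. The variable names are hypothetical, and real batch and speed layers would of course be separate systems.

```python
from collections import Counter
from typing import Iterable


def count_by_key(records: Iterable[str]) -> Counter:
    """The query itself: number of records per key."""
    return Counter(records)


# Batch layer: recomputed periodically over the stored, immutable records.
master_dataset = ["a", "b", "a"]
batch_view = count_by_key(master_dataset)

# Speed layer: the same query, applied incrementally to records that
# arrived after the last batch run.
realtime_view = Counter()
for record in ["a", "c"]:
    realtime_view.update(count_by_key([record]))


def query(key: str) -> int:
    """Merge both views to produce the complete answer."""
    return batch_view[key] + realtime_view[key]
```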

Pattern 11: Detecting and Switching to Detailed Analysis

This scenario detects a condition which suggests some anomaly and further analyzes it using historical data. For example, use basic rules to detect possible fraud, then pull all transactions made against that credit card over a longer time period from a batch pipeline and run a detailed analysis.

Comparison: WSO2 SP can specify complex event pattern matching queries using its Siddhi query language. Furthermore, WSO2 SP’s built-in capabilities for dealing with event stores allow it to query specific information. With Storm, these features have to be custom coded.
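
The two-stage flow can be sketched in plain Python: a cheap rule runs on every event, and only events that trip it trigger a lookup of the card’s history and a more detailed check. The HISTORY dictionary stands in for the batch pipeline or event store, and both the amount limit and the "three standard deviations" test are assumptions made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical historical store: past transaction amounts per card.
HISTORY = {
    "card-1": [20.0, 25.0, 22.0, 19.0, 24.0],
    "card-2": [300.0, 280.0, 320.0, 250.0, 350.0],
}


def looks_suspicious(amount: float, limit: float = 200.0) -> bool:
    """Stage 1: a basic rule evaluated on every event."""
    return amount > limit


def detailed_analysis(card: str, amount: float) -> bool:
    """Stage 2: compare the amount against the card's own history."""
    past = HISTORY.get(card, [])
    if len(past) < 2:
        return True  # too little history to clear the transaction
    return amount > mean(past) + 3 * pstdev(past)


def classify(card: str, amount: float) -> str:
    if not looks_suspicious(amount):
        return "ok"
    return "fraud" if detailed_analysis(card, amount) else "cleared"
```

With this data, a 250.0 transaction on card-1 is far outside that card’s usual range, while a 310.0 transaction on card-2 trips the basic rule but is cleared by the detailed check.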

Pattern 12: Using an ML Model

Often we face the scenario of training a model (often a Machine Learning model) and then using it within the real-time pipeline to make decisions. For example, you can build a model using R and export it as PMML. Some example applications are fraud detection, segmentation, predicting the next value, and predicting churn.

Comparison: To implement such functionality, WSO2 SP’s streaming machine learning extension can be used to load a model in PMML format and make predictions. With Storm, however, the user has to implement the complete functionality from scratch.
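
The train, export, and score workflow can be sketched with a deliberately tiny model. In the plain-Python example below, a one-variable linear regression is fitted offline, serialized to JSON, and then loaded by the scoring side. JSON is used here only as a stand-in for PMML, and the model and data are hypothetical.

```python
import json
from typing import Dict, Sequence


def train(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Offline step: fit y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


# Export the trained model (JSON as a stand-in for a PMML file).
exported = json.dumps(train([1, 2, 3, 4], [2, 4, 6, 8]))

# Real-time side: load the exported model once, then score each event.
model = json.loads(exported)


def predict(x: float) -> float:
    return model["slope"] * x + model["intercept"]


predictions = [predict(x) for x in (5, 6)]
```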

Pattern 13: Online Control

Use cases such as autopilot, self-driving, and robotics are examples of situations where we need to control things online. These may involve problems like current situation awareness, predicting the next value(s), and deciding on corrective actions.

Comparison: Similar to Pattern 12, implementing such online control typically requires the use of machine learning techniques. With its streaming machine learning extension, WSO2 SP makes online control use cases easier to implement than Storm does.

6. Conclusion

This two-part article conducted a side-by-side comparison between the features of Apache Storm and WSO2 Stream Processor. Next, it discussed how these features get applied in 13 streaming analytics patterns. The results of this study indicate that WSO2 Stream Processor and Apache Storm have their own pros and cons. Table II summarizes their applicability from a use case point of view.

Table II: When to Use What?

Use Case                                                  Apache Storm    WSO2 Stream Processor

Preprocessing                                             Preferred       -

Alerts and Thresholds                                     -               Preferred

Simple Counting and Counting With Windows                 -               Preferred

Joining Event Streams                                     Equal           Equal

Data Correlation, Missing Events, and Erroneous Data      -               Preferred

Interacting With Databases                                -               Preferred

Detecting Temporal Event Sequence Patterns                -               Preferred

Tracking                                                  Equal           Equal

Trend Detection                                           -               Preferred

Running the Same Query in Batch and Real-Time Pipelines   Equal           Equal

Detecting and Switching to Detailed Analysis              -               Preferred

Using an ML Model                                         -               Preferred

Online Control                                            -               Preferred

References

[1] Apache Software Foundation (2015), Apache Storm, http://storm.apache.org/

[2] Forrester (2014), The Forrester Wave™: Big Data Streaming Analytics Platforms, Q3 2014, https://www.forrester.com/report/The+Forrester+Wave+Big+Data+Streaming+Analytics+Platforms+Q3+2014/-/E-RES113442

[3] WSO2 (2018), Analytics Solutions, https://wso2.com/analytics/solutions/

[4] De Silva, R. and Dayarathna, M. (2017), Processing Streaming Human Trajectories with WSO2 CEP, https://www.infoq.com/articles/smoothing-human-trajectory-streams

[5] WSO2 (2017), Video Analytics: Technologies and Use Cases, http://wso2.com/whitepapers/innovating-with-video-analytics-technologies-and-use-cases

[6] WSO2 (2015), Fraud Detection and Prevention: A Data Analytics Approach, http://wso2.com/whitepapers/fraud-detection-and-prevention-a-data-analytics-approach

[7] WSO2 (2018), WSO2 Helps Safeguard Stock Exchange via Real-Time Data Analysis and Fraud Detection, http://wso2.com/casestudies/wso2-helps-safeguard-stock-exchange-via-real-time-data-analysis-and-fraud-detection

[8] Apache Software Foundation (2015), Trident Tutorial, http://storm.apache.org/releases/1.1.2/Trident-tutorial.html

[9] Luckham, D. (2016), Proliferation of Open Source Technology for Event Processing, http://www.complexevents.com/2016/06/15/proliferation-of-open-source-technology-for-event-processing/

[10] Zapletal, P. (2016), Comparison of Apache Stream Processing Frameworks: Part 1, http://www.cakesolutions.net/teamblogs/comparison-of-apache-stream-processing-frameworks-part-1

[11] WSO2 (2017), [WSO2Con USA 2017] Scalable Real-time Complex Event Processing at Uber, http://wso2.com/library/conference/2017/2/wso2con-usa-2017-scalable-real-time-complex-event-processing-at-uber/

[12] Apache Software Foundation (2015), Storm SQL integration, http://storm.apache.org/releases/2.0.0-SNAPSHOT/storm-sql.html

[13] GitHub (2018), siddhi-execution-reorder, https://github.com/wso2-extensions/siddhi-execution-reorder

[14] Microsoft (2018), Stream Analytics Documentation, https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-comparison-storm

[15] Apache Software Foundation (2015), Apache Storm, http://storm.apache.org/about/multi-language.html

[16] Tsai, B. (2014), Fault Tolerant Message Processing in Storm, https://bryantsai.com/fault-tolerant-message-processing-in-storm-6b57fd303512

[17] Blogger (2015), Why We need SQL like Query Language for Realtime Streaming Analytics?, http://srinathsview.blogspot.com/2015/02/why-we-need-sql-like-query-language-for.html

[18] GitHub (2018), WSO2 Siddhi, https://github.com/wso2/siddhi

[19] Apache Software Foundation (2015), Apache Storm, http://storm.apache.org/releases/current/Fault-tolerance.html

[20] WSO2 (2018), Introduction - Stream Processor 4.0.0, https://docs.wso2.com/display/SP400/Introduction

[21] Andrade, H.C.M., Gedik, B. and Turaga, D.S. (2014), Fundamentals of Stream Processing: Application Design, Systems, and Analytics, ISBN 9781107434004, Cambridge University Press

[22] GitHub (2018), Siddhi Query Guide - Partition, https://wso2.github.io/siddhi/documentation/siddhi-4.0/#partition

[23] Apache Software Foundation (2015), Resource Aware Scheduler, http://storm.apache.org/releases/1.1.2/Resource_Aware_Scheduler_overview.html

[24] Apache Software Foundation (2015), Storm State Management, http://storm.apache.org/releases/1.1.2/State-checkpointing.html

[25] Apache Software Foundation (2018), Interface IRichSpout, http://storm.apache.org/releases/1.1.2/javadocs/org/apache/storm/topology/IRichSpout.html

[26] Bigml (2013), Machine Learning From Streaming Data: Two Problems, Two Solutions, Two Concerns, and Two Lessons, https://blog.bigml.com/2013/03/12/machine-learning-from-streaming-data-two-problems-two-solutions-two-concerns-and-two-lessons/

[27] SAP (2017), Forrester Research Names SAP in Leaders Category for Streaming Analytics, https://reprints.forrester.com/#/assets/2/308/%27RES136545%27/reports

[28] Apache Software Foundation (2015), Storm JDBC Integration, http://storm.apache.org/releases/1.1.2/storm-jdbc.html

[29] T. Hunter, T. Das, M. Zaharia, P. Abbeel and A. M. Bayen, Large-Scale Estimation in Cyberphysical Systems Using Streaming Data: A Case Study With Arterial Traffic Estimation, in IEEE Transactions on Automation Science and Engineering, vol. 10, no. 4, pp. 884-898, Oct. 2013. doi: 10.1109/TASE.2013.2274523

[30] A. Biem, B. Elmegreen, O. Verscheure, D. Turaga, H. Andrade and T. Cornwell, A streaming approach to radio astronomy imaging, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, 2010, pp. 1654-1657.

[31] Pathirage, M. (2018), Kappa Architecture, http://milinda.pathirage.org/kappa-architecture.com/

[32] MapR Technologies (2018), Architecture, https://mapr.com/developercentral/lambda-architecture/

[33] Hausenblas, M., Bijnens, N. (2017), Lambda Architecture, http://lambda-architecture.net/

[34] Hortonworks (2018), Apache Storm Component Guide, https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_storm-component-guide/content/storm-trident-intro.html

[35] Weinberger, Y. (2015), Exactly-Once Processing with Trident - The Fake Truth, https://www.alooma.com/blog/trident-exactly-once


