Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Understanding WSO2 Stream Processor, Part 1

DZone's Guide to

Understanding WSO2 Stream Processor, Part 1

In this article, we take a quick look at this open source stream processor and how it can help data professionals better work with large streams of data.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Streaming analytics has been one of the trending topics in the software industry for some time. With the production of billions of events through various sources, analyzing these events provides a competitive advantage for any business. The process of streaming analytics can be divided into 3 main sections.

  1. Collect — Collecting events from various sources.
  2. Analyze — Analyzing the events and deriving meaningful insights.
  3. Act — Take action on the results.

WSO2 Stream Processor (WSO2 SP) is an intuitive approach to stream processing. It provides the necessary capabilities to process events and derive meaningful insights with its state of the art “Siddhi” stream processing runtime. The below figure showcases how WSO2 SP acts as a stream processing engine for various events.


Source: https://docs.wso2.com/display/SP410

With the WSO2 SP, events generated from various sources like devices, sensors, applications, and services can be received. The received events are processed in real time using the streaming SQL language “Siddhi.” Once the results are derived, those results can be published through APIs, alerts, or visualizations so that business users can act on them accordingly.

Users of WSO2 SP need to understand a set of basic concepts around the product. Let’s identify the main components which a user needs to interact with.

WSO2 Stream processor comes with built-in components to configure, run, and monitor the product. Here are the main components.

  • WSO2 SP runtime (worker) — Executes the real-time processing logic which is implemented using Siddhi streaming SQL.
  • Editor — Allows users (developers) to implement their logic using Siddhi streaming SQL and debug, deploy, and run their implementations similar to an IDE.
  • Business Rules — Allows business users to change the processing logic by simply modifying a few values stored in a simple form.
  • Job Manager — Allows the deployment and management of siddhi applications across multiple worker nodes.
  • Portal — Provides ability to visualize the results generated from processing logic which was implemented.
  • Status Dashboard — Monitor multiple worker nodes in a cluster and showcases the information about those nodes and the siddhi applications which are deployed.

In addition to the above components, the diagram includes:

  • Source — Devices, Apps, and Services which generates events.
  • Sink — Results of the processing logic are passed into various sinks like APIs, dashboards, and notifications.

With these components, users can implement a plethora of use cases around streaming analytics and/or stream processing. The next thing you need to understand about WSO2 SP is the “Siddhi” streaming SQL language and its high-level concepts. Let’s take a look at those concepts as well.

Figure: Siddhi high-level concepts in a nutshell

The above figure depicts the concepts which need to be understood by WSO2 SP users. Except for the source and sink which we have looked through in the previous section, all the other concepts are new. Let’s have a look at these concepts one by one.

  • Event — Actual data coming from sources which are formatted according to the schema.
  • Schema — Define the format of the data which is coming with events.
  • Stream — A running (continuous) set of incoming events are considered as a stream.
  • Window — Is a set of events which are selected based on the number of events (length) or a time period (duration).
  • Partition — Is a set of events which are selected based on a specific condition of data (e.g. events with the same “name” field).
  • Table — Is a static set of events which are selected based on a defined schema and can be stored in a data store.
  • Query — Is the processing logic which uses streams, tables, windows, partitions to derive meaningful data out of the incoming data events.
  • Store — Is a table stored in a persistent database for later consumption through queries for further processing or to take actions (visualizations).
  • Aggregation — Is a function (pre-defined) applied to events and produces outputs for further processing or as final results.
  • Triggers — Are used to inject events according to a given schema so that processing logic executes periodically through these events.

Now that we have a basic understanding of WSO2 SP and its main concepts, let’s try to do a real streaming analysis using the product. Before doing that, we need to understand the main building block of WSO2 SP runtime which is a “Siddhi Application.” It is the place where users configure WSO2 SP runtime to make it happen.

Figure: Siddhi application components

Within a Siddhi application, we have three main sections.

  • Source definition — This is the place to define incoming event sources and their schemas. Users can configure different transport protocols, messaging formats, etc.
  • Sink definition — This section defines the place to emit the results of the processing. Users can choose to store the events in tables, output to log files, etc.
  • Processing Logic — This section implements the actual business logic for data processing using the Siddhi streaming SQL language.

Now that you have a basic understanding of WSO2 SP and it’s main concepts, the next thing you can do is to get your hands dirty by trying out a few examples. The tutorials section of the documentation is a good point to start things off.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
big data ,stream processing ,data analysis ,wso2 ,streaming analytics

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}