Drools Kinesis Analytics: Process Streaming Data in Real-Time With Drools


Learn about processing streaming data using Amazon Kinesis Analytics. This is perfect for developers without advanced big data skill sets who are on tight timelines.


As you may know, Amazon Kinesis Analytics provides a service to process streaming data using SQL. Its applications and benefits are described here by AWS chief evangelist Jeff Barr. He says in the article:

"You can build a powerful, end-to-end stream processing pipeline in five minutes without having to write anything more complex than a SQL query." So now we can process "...clickstreams from web applications, telemetry and sensor reports from connected devices, and more ... all in real-time!"

All of this can be done by developers without advanced big data skill sets, even on tight timelines.

SQL queries normally return data as of some point in time. When working against a data stream, we usually want something different: to query data over a time period. For example, we may want to count the number of events per minute, or do something if the time between consecutive events falls below some threshold. To meet this need, Amazon extended the SQL language with concepts like a processing "window," PUMPs, and other special functions and mechanisms, making it possible to run continuous SQL queries against your streaming data. Still, these extensions could not address the core problem with SQL: it's a primitive scripting language!
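To make the two examples above concrete (events per minute, and the gap between consecutive events), here is a minimal plain-Java sketch of what such time-based queries compute. It is not Kinesis Analytics SQL and not a streaming implementation: it runs over a finite, already-sorted list of timestamps, and the class and method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class StreamWindowSketch {

    /** Counts events per one-minute tumbling window; keys are window start times in millis. */
    public static Map<Long, Integer> countPerMinute(List<Long> timestampsMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMillis) {
            long windowStart = ts - (ts % 60_000L); // round down to the start of the minute
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns events that arrived less than thresholdMillis after the previous event. */
    public static List<Long> tooCloseToPrevious(List<Long> sortedTimestampsMillis,
                                                long thresholdMillis) {
        List<Long> hits = new ArrayList<>();
        for (int i = 1; i < sortedTimestampsMillis.size(); i++) {
            long gap = sortedTimestampsMillis.get(i) - sortedTimestampsMillis.get(i - 1);
            if (gap < thresholdMillis) {
                hits.add(sortedTimestampsMillis.get(i));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical event timestamps, in milliseconds since some start time
        List<Long> events = Arrays.asList(1_000L, 2_500L, 59_000L, 61_000L, 125_000L);
        System.out.println(countPerMinute(events));             // {0=3, 60000=1, 120000=1}
        System.out.println(tooCloseToPrevious(events, 2_000L)); // [2500]
    }
}
```

A real stream never ends, so a continuous query has to emit each window's result as the window closes rather than after all data has been read. That bookkeeping is what the windowing extensions take care of.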

SQL lacks basics like complex types, functions, variables, and many other features available in other languages. When we need just one simple query, SQL is perfect. But real-world applications need to run several queries, correlate data, and perform calculations. While this is possible in SQL using stored procedures, it's discouraged due to language constraints. As a result, SQL is mostly used for data retrieval or updates, and the application logic is written in languages like Java, C#, and others.

Does this mean that it's better to write Kinesis Analytics apps in Java? While AWS offers this possibility as well, and it definitely provides the maximum flexibility, it's not simple. The main reason is that you would need to implement by yourself all the nice SQL extensions that Amazon developed for Amazon Kinesis Analytics, and also cope with huge amounts of data coming in at high rates, which is not trivial. To do that, a developer must have an advanced big data skill set and considerable time to write and debug the implementation.

Drools is a rule engine whose language is specifically designed for writing rule-based apps (analytics). It also has native stream processing extensions. The language has all the high-level constructs developers are familiar with, adapted to handle data streams. It's a rule-based language with simple semantics: "When some conditions occur, then do some tasks."

rule "<name>"
when
    // <Conditions>, e.g. match an Item with cost < 200 and, if matched, bind it to $i
    $i : Item(cost < 200)
then
    // <Actions/Consequence>: write any logic using Java syntax. You can use $i here
end

As a result, the application becomes a state machine with rule(s) for every state.

The then part accepts any valid Java, making it easy to implement any logic. By inserting new objects, the system transitions to a new state or outputs results.

Here, you can see an example walk-through of a fire alarm system.
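The linked walk-through is not reproduced here. As a rough stand-in, the following plain-Java sketch imitates the idea that inserting a new object either moves the system to a new state or produces an output. It does not use the Drools API, and it is heavily simplified: a real rule engine keeps facts in working memory and matches patterns across them, while this sketch only tests each rule against the fact being inserted. All class names (`RuleSketch`, `Fire`, `Alarm`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class RuleSketch {

    /** Hypothetical fact types, loosely modeled on a fire alarm scenario. */
    public static final class Fire {
        public final String room;
        public Fire(String room) { this.room = room; }
    }

    public static final class Alarm {
        public final String room;
        public Alarm(String room) { this.room = room; }
    }

    /** A rule pairs a condition ("when") with an action ("then"). */
    public static final class Rule {
        final Predicate<Object> when;
        final Consumer<Object> then;
        public Rule(Predicate<Object> when, Consumer<Object> then) {
            this.when = when;
            this.then = then;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    public final List<String> log = new ArrayList<>();

    public void addRule(Rule rule) { rules.add(rule); }

    /** Runs every rule whose condition matches the inserted fact. */
    public void insert(Object fact) {
        for (Rule rule : rules) {
            if (rule.when.test(fact)) {
                rule.then.accept(fact);
            }
        }
    }

    /** Builds an engine with two rules: a Fire raises an Alarm, and an Alarm is reported. */
    public static RuleSketch fireAlarmEngine() {
        RuleSketch engine = new RuleSketch();
        engine.addRule(new Rule(
            f -> f instanceof Fire,
            f -> engine.insert(new Alarm(((Fire) f).room))));      // transition to a new state
        engine.addRule(new Rule(
            f -> f instanceof Alarm,
            f -> engine.log.add("ALARM in " + ((Alarm) f).room))); // output a result
        return engine;
    }

    public static void main(String[] args) {
        RuleSketch engine = fireAlarmEngine();
        engine.insert(new Fire("kitchen"));
        System.out.println(engine.log); // prints [ALARM in kitchen]
    }
}
```

Inserting the `Fire` fact triggers the first rule, whose action inserts an `Alarm`. That new fact in turn triggers the second rule. This chaining is what turns the set of rules into a state machine.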

Shifting to a state/rule-based system makes the application semantic and therefore easier to map to the problem it tries to solve. The powerful Java capabilities in the then part, combined with full compilation and type safety, make the whole solution very compelling. It's worth mentioning that Drools is an open-source project sponsored by Red Hat under the JBoss umbrella.

Now there is also a fully managed AWS service that lets you write Kinesis Analytics apps using Drools. It supports multiple Kinesis Streams as input (unlike Amazon Kinesis Analytics) and DynamoDB for static reference data. Kinesis Streams, SNS, or DynamoDB can be used for outputs. There is an easy-to-use pad for writing rules, with instant compilation for error checking, and a "Test" console that lets you run the rules against actual inputs, with extensive logging, to test your logic. And finally, it's totally free until you are satisfied and deploy the solution to production!


