
Closed Loop: Machine Learning and Real Time Stream Processing

A summary of Kai Wähner's talk at Voxxed Zurich about machine learning and real time stream processing with Apache Spark, TIBCO, Apache Storm, and other tools.



In March 2016, I gave a talk at Voxxed Zurich titled “How to Apply Machine Learning and Big Data Analytics to Real Time Processing”.


Finding Insights With R, H2O, Apache Spark MLlib, PMML, and TIBCO Spotfire

“Big Data” is currently heavily hyped. Large amounts of historical data are stored in Hadoop or other platforms. Business intelligence tools and statistical computing are used to derive new knowledge and find patterns in this data, for example for promotions, cross-selling, or fraud detection. The key challenge is how to carry these findings over from historical data into new transactions in real time, in order to make customers happy, increase revenue, or prevent fraud.

Putting Analytic Models Into Action With Apache Storm, Flink, Spark Streaming, or TIBCO StreamBase

“Fast Data” via stream processing is the way to embed patterns, which were obtained from analyzing historical data, into future transactions in real time. The following slide deck uses several real-world success stories to explain the concepts behind stream processing and how it relates to Apache Hadoop and other big data platforms. I discuss how patterns and statistical models from R, Apache Spark MLlib, H2O, and other technologies can be integrated into real-time processing using open source stream processing frameworks (such as Apache Storm, Spark Streaming, or Flink) or products (such as IBM InfoSphere Streams or TIBCO StreamBase). It is a great benefit if a framework or tool already offers out-of-the-box connectors that execute analytic models without the need to rebuild the model or rewrite any code.
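To make the idea concrete, here is a minimal sketch in plain Python, not taken from the talk. It assumes a model that was fitted offline on historical data and exported as a small parameter file, and it applies that model unchanged to each new event as it arrives. The JSON layout, the feature names, the weights, the threshold, and the function names (load_model, score, flag_stream) are all hypothetical; a real setup would use an exchange format such as PMML and a stream processing engine instead of an in-memory list.

```python
import json
import math
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical model export: in practice this would come from the offline
# analytics step (R, Spark MLlib, H2O, PMML, ...). The feature names,
# weights, and threshold below are invented for illustration.
MODEL_JSON = """
{
  "intercept": -4.0,
  "weights": {"amount_thousands": 1.5, "foreign_country": 2.0},
  "threshold": 0.5
}
"""


def load_model(text: str) -> Dict:
    """Parse the exported model parameters; the model is not retrained here."""
    return json.loads(text)


def score(model: Dict, event: Dict) -> float:
    """Logistic-regression score: estimated probability that the event is fraud."""
    z = model["intercept"] + sum(
        weight * event.get(name, 0.0) for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def flag_stream(
    model: Dict, events: Iterable[Dict]
) -> Iterator[Tuple[str, float, bool]]:
    """Score events one at a time as they arrive, without buffering the stream."""
    for event in events:
        probability = score(model, event)
        yield event["id"], probability, probability >= model["threshold"]


# Stand-in for a live feed; a real deployment would read from a message
# broker or a stream processing engine instead of an in-memory list.
incoming = [
    {"id": "tx-1", "amount_thousands": 0.05, "foreign_country": 0},
    {"id": "tx-2", "amount_thousands": 2.0, "foreign_country": 1},
]

model = load_model(MODEL_JSON)
for tx_id, probability, suspicious in flag_stream(model, incoming):
    print(f"{tx_id}: p={probability:.3f} suspicious={suspicious}")
```

With these made-up parameters, the small domestic transaction stays below the threshold and the large foreign one is flagged. The point of the sketch is the separation of concerns: the scoring code only reads exported parameters, so the model can be refitted on fresh historical data and swapped in without touching the streaming logic.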

Slide Deck and Video Recording From Voxxed Zurich 2016

Here is the slide deck:

How to Apply Machine Learning with R, H2O, Apache Spark MLlib or PMML to Real Time Streaming Analytics from Kai Wähner

The video recording of this talk is available on YouTube:

A live demo showed the complete development lifecycle, combining analytics with TIBCO Spotfire, machine learning via R, and stream processing via TIBCO StreamBase and TIBCO Live Datamart.


big data,stream processing,r language,spark,tibco,open source,storm,machine learning,analytics,fast data

Published at DZone with permission of Kai Wähner, DZone MVB. See the original article here.

