Hadoop Summit San Jose 2016 - Demo #4: Data Hacks

Hortonworks shows a detailed demo of correlating reputation data with Twitter data using Apache NiFi and Spark.

Streaming Analytics to Create an Accurate Single Buyer Identity in Real-time

The fourth and final demo of the Data Hacks and Demos session at Hadoop Summit San Jose was presented by Simon Ball. It showcased how Apache NiFi moved parallel streams of data into Spark, where further analysis combined Hortonworks Community Connection (HCC) information with Twitter data to create an accurate single buyer identity.

So What Did Simon Tell the Audience?

The demo correlated reputation data from HCC and linked it back to some of the Twitter data. The HCC data is very clean and straightforward, but the Twitter data is not so clean. To identify the “right customer,” the team used Spark to federate reputation data from HCC, link queries across multiple data sets, and then bring in the data extracted from Twitter to tie it all together. (By the way, the demo also used Apache NiFi to rate-limit SMS services so as not to incur overages.)
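The linking step can be illustrated with a minimal sketch. This is plain Python rather than the Spark job shown on stage, and the record fields (`name`, `handle`, `reputation`) and the name-normalization rule are assumptions made for illustration, not details from the demo. It joins clean HCC records to noisy Twitter records on a normalized name:

```python
import re
from collections import defaultdict


def normalize_name(name):
    """Lowercase, drop non-letters, and collapse whitespace so noisy names compare equal."""
    cleaned = re.sub(r"[^a-z\s]", " ", name.lower())
    return " ".join(cleaned.split())


def link_records(hcc_users, twitter_users):
    """Attach every Twitter account whose normalized name matches an HCC user."""
    by_name = defaultdict(list)
    for account in twitter_users:
        by_name[normalize_name(account["name"])].append(account)

    linked = []
    for user in hcc_users:
        candidates = by_name.get(normalize_name(user["name"]), [])
        linked.append({"hcc": user, "twitter_candidates": candidates})
    return linked


# Hypothetical sample data: one clean HCC record, three messy Twitter records.
hcc_users = [{"name": "Brad Anderson", "reputation": 120}]
twitter_users = [
    {"handle": "@brad_a", "name": "brad  anderson"},
    {"handle": "@banderson", "name": "Brad Anderson!"},
    {"handle": "@someone", "name": "Someone Else"},
]

linked = link_records(hcc_users, twitter_users)
for row in linked:
    handles = [c["handle"] for c in row["twitter_candidates"]]
    print(row["hcc"]["name"], "->", handles)
```

A name match alone leaves several candidate accounts per person, which is why a further disambiguation step is still needed.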

Once the data is pulled together, it is possible to start visualizing it and to see duplicate names. For example, there are many Brad Andersons on Twitter; which one is the one we really care about? Only those who are interested in Hadoop are prospects of interest. So we put all this data into a Spark machine learning model to cluster it and identify communities within the data. From there we can tell who the “Hadoop people” and the “non-Hadoop people” are.
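The article does not say which clustering algorithm or features the Spark model used. As a rough illustration only, the sketch below swaps in a tiny one-dimensional k-means (k = 2) written in plain Python, with a single assumed feature: the share of an account's words that come from a hypothetical Hadoop vocabulary. The account handles and tweets are made up.

```python
# Assumed vocabulary; the real demo's features are not described in the article.
HADOOP_TERMS = {"hadoop", "spark", "nifi", "hive", "hdfs"}


def hadoop_share(tweets):
    """Fraction of words across an account's tweets that are Hadoop-related."""
    words = [w.strip("#@.,!?").lower() for t in tweets for w in t.split()]
    if not words:
        return 0.0
    return sum(w in HADOOP_TERMS for w in words) / len(words)


def two_means_1d(values, iterations=20):
    """Tiny 1-D k-means with k=2, seeded from the smallest and largest value.

    Returns a label per value: 0 for the low cluster, 1 for the high cluster.
    """
    low, high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the nearer centroid.
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        # Update step: move each centroid to the mean of its members.
        group0 = [v for v, label in zip(values, labels) if label == 0]
        group1 = [v for v, label in zip(values, labels) if label == 1]
        if group0:
            low = sum(group0) / len(group0)
        if group1:
            high = sum(group1) / len(group1)
    return labels


# Hypothetical accounts that all matched the same name.
accounts = {
    "@brad_a": ["Tuning Spark jobs on Hadoop", "NiFi to HDFS pipeline"],
    "@banderson": ["Great game last night", "Coffee time"],
    "@brad_data": ["Hive on Hadoop is fast"],
}

shares = {handle: hadoop_share(tweets) for handle, tweets in accounts.items()}
labels = two_means_1d(list(shares.values()))
hadoop_people = [handle for handle, label in zip(shares, labels) if label == 1]
print(hadoop_people)
```

Seeding the centroids from the minimum and maximum keeps the example deterministic. A production model would use richer features and a library implementation.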

This data was then pushed back out through a Storm topology that was processing incoming votes in real time. The topology combined the votes with the “Hadoop people” information to create the data set of attendees who were eligible to win the light boxes!
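The vote-enrichment step can be pictured as a filter over a stream. The sketch below is plain Python, not a Storm topology, and the vote fields (`voter`, `demo`) and the once-per-voter rule are assumptions for illustration. It yields a voter the first time a vote arrives from someone in the set of Hadoop people:

```python
def eligible_voters(vote_stream, hadoop_people):
    """Yield each voter once, on their first vote, if they are a known Hadoop person."""
    seen = set()
    for vote in vote_stream:
        voter = vote["voter"]
        if voter in hadoop_people and voter not in seen:
            seen.add(voter)
            yield voter


# Hypothetical incoming votes; in the demo these arrived in real time.
votes = [
    {"voter": "@brad_a", "demo": 4},
    {"voter": "@banderson", "demo": 2},
    {"voter": "@brad_a", "demo": 1},
    {"voter": "@brad_data", "demo": 4},
]

eligible = list(eligible_voters(iter(votes), {"@brad_a", "@brad_data"}))
print(eligible)
```

Because the function is a generator, it handles votes one at a time as they arrive instead of waiting for the full list, which loosely mirrors how a streaming topology processes tuples.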

One of the lucky winners was Scott Seligman, pictured here with Joe Witt, the host of the entire Data Hacks and Demos session, Apache NiFi PMC Member, and Senior Director of Engineering at Hortonworks.

[Photo: Scott Seligman with Joe Witt]


In conclusion, in 20 minutes the combined team of Joe Witt, Jeremy Dyer, Kay Lerch, and Simon Ball showcased how a brick-and-mortar retail store could identify which customers are walking in the door, greet them, interact with them, and provide personalized offers in real time. Are you ready to try this yourself? Here are some starting points.

Published at DZone with permission of Anna Yong, DZone MVB. See the original article here.
