
Hadoop Summit San Jose 2016 - Demo #4: Data Hacks


Hortonworks shows a detailed demo of correlating reputation data and Twitter data with Apache NiFi and Spark.


Streaming Analytics to Create an Accurate Single Buyer Identity in Real Time

The fourth and final demo of the Data Hacks and Demos session at Hadoop Summit San Jose was presented by Simon Ball. It showcased how Apache NiFi moves parallel streams of data into Spark, where further analysis combines Hortonworks Community Connection (HCC) reputation data with Twitter data to create an accurate single buyer identity.

So What Did Simon Tell the Audience?

The demo correlates reputation data from HCC and links it back to some of the Twitter data. The HCC data is very clean and straightforward, but the Twitter data is not. To identify the right customer, Spark is used to federate the reputation data from HCC, link queries across multiple data sets, and bring it all together with the data extracted from Twitter. (It also used Apache NiFi to rate-limit calls to the SMS service so as not to run into overages.)
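The session didn't publish its actual code, but a minimal PySpark sketch of that federation step might look like the following. The file paths, schemas, and column names (user_name, reputation_score, and so on) are assumptions for illustration, not the demo's real pipeline.

```python
# Hypothetical sketch of federating HCC reputation data with Twitter data in Spark.
# Paths, schemas, and column names are illustrative assumptions, not the demo's code.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hcc-twitter-federation").getOrCreate()

# Clean, structured reputation data exported from Hortonworks Community Connection.
hcc = spark.read.json("hdfs:///data/hcc/reputation.json") \
    .select("user_name", "reputation_score", "hadoop_answers")

# Noisier Twitter data landed by NiFi; normalize the display name before joining.
tweets = spark.read.json("hdfs:///data/twitter/stream/*.json") \
    .select(
        F.lower(F.trim(F.col("user.name"))).alias("user_name"),
        F.col("user.screen_name").alias("handle"),
        "text",
    )

# Link the two data sets on the normalized name to build a single buyer identity.
identities = tweets.join(
    hcc.withColumn("user_name", F.lower(F.trim(F.col("user_name")))),
    on="user_name",
    how="left",
)

identities.show(truncate=False)
```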

Once the data is pulled together, it is possible to start visualizing it and spot duplicate names. For example, there are many Brad Andersons on Twitter; which one is the one we really care about? Only those who are interested in Hadoop are prospects of interest. So we put all this data into a Spark machine learning model to cluster it and identify communities within the data, and then we can tell the “Hadoop people” from the “non-Hadoop people.”
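The clustering model itself wasn't shown either; a hedged sketch of that step with Spark ML's KMeans could look like the snippet below. It assumes the joined identities DataFrame from the earlier sketch has been enriched with numeric features such as hadoop_tweet_count and follower_count, which are invented names.

```python
# Illustrative clustering sketch with Spark ML; the feature columns and k are assumptions.
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

# Turn per-user signals (reputation, how often they tweet about Hadoop, reach)
# into a feature vector.
assembler = VectorAssembler(
    inputCols=["reputation_score", "hadoop_tweet_count", "follower_count"],
    outputCol="features",
)
features = assembler.transform(identities.na.fill(0))

# Cluster the users into communities; one cluster should surface the "Hadoop people".
kmeans = KMeans(k=2, seed=42, featuresCol="features", predictionCol="community")
model = kmeans.fit(features)
clustered = model.transform(features)

clustered.groupBy("community").count().show()
```

With only two clusters, inspecting the cluster centers via model.clusterCenters() is usually enough to tell which community corresponds to the “Hadoop people.”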

This data was then pushed back out through a Storm topology that was processing incoming votes in real time, combining them with the “Hadoop people” information to create the data set of attendees who were eligible to win the light boxes!
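The Storm topology wasn't published, so the following is only a framework-agnostic Python sketch of the per-vote enrichment a bolt in that topology would perform; the hadoop_people set and the vote fields are invented for illustration.

```python
# Framework-agnostic sketch of the enrichment a Storm bolt would apply to each
# incoming vote. The hadoop_people set and the vote schema are invented for
# illustration; the real topology and field names were not published.
from typing import Optional

# Twitter handles the Spark clustering step flagged as "Hadoop people" (hypothetical values).
hadoop_people = {"bradanderson42", "simonball", "nifi_fan"}

def process_vote(vote: dict) -> Optional[dict]:
    """Combine a real-time vote with the 'Hadoop people' flag so that only
    eligible attendees remain in the prize-draw data set."""
    handle = vote.get("twitter_handle", "").lower()
    if handle in hadoop_people:
        return {**vote, "eligible": True}
    return None

# Example: filter a small stream of votes down to eligible attendees.
votes = [
    {"twitter_handle": "BradAnderson42", "session": "data-hacks", "score": 5},
    {"twitter_handle": "random_user", "session": "data-hacks", "score": 4},
]
eligible = [enriched for enriched in (process_vote(v) for v in votes) if enriched]
print(eligible)
```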

One of the lucky winners was Scott Seligman, pictured here with Joe Witt, the host of the entire Data Hacks and Demos session, Apache NiFi PMC member, and Senior Director of Engineering at Hortonworks.

[Photo: Scott Seligman with Joe Witt]

Summary

So, in conclusion: in 20 minutes, the combined team of Joe Witt, Jeremy Dyer, Kay Lerch, and Simon Ball showcased how a brick-and-mortar retail store could identify which customers are walking in the door, greet them, interact with them, and provide personalized offers in real time. Are you ready to try this yourself? Here are some starting points.


Topics:
nifi, hortonworks, spark, hadoop, big data, storm

Published at DZone with permission of Anna Yong, DZone MVB. See the original article here.

