Streaming Analytics to Create an Accurate Single Buyer Identity in Real-time
The fourth and final demo of the Data Hacks and Demos session at Hadoop Summit San Jose was presented by Simon Ball. It showed how Apache NiFi moved parallel streams of data into Spark, where further analysis combined Hortonworks Community Connection (HCC) information with Twitter data to create an accurate single buyer identity.
So What Did Simon Tell the Audience?
The demo correlated reputation data from HCC and linked it back to some of the Twitter data. The HCC data is very clean and straightforward, but the Twitter data is not so clean. To identify the “right customer,” the team used Spark to federate reputation data from HCC, link queries across multiple data sets, and then bring it all together with the data extracted from Twitter. (By the way, the demo also used Apache NiFi to rate-limit the SMS services so as not to incur overages.)
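To make the linking step concrete, here is a minimal sketch in plain Python rather than Spark. It is not the code from the demo: the record fields, the names, and the handles are all hypothetical, and matching on a normalized display name is an assumed strategy chosen only to show why one clean HCC record can map to several Twitter candidates.

```python
from collections import defaultdict

def normalize(name):
    """Lowercase and collapse whitespace so 'Alex  Smith' matches 'alex smith'."""
    return " ".join(name.lower().split())

def link_by_name(hcc_users, twitter_profiles):
    """Map each HCC user id to the Twitter handles sharing its display name."""
    handles_by_name = defaultdict(list)
    for profile in twitter_profiles:
        handles_by_name[normalize(profile["name"])].append(profile["handle"])
    return {
        user["id"]: handles_by_name.get(normalize(user["name"]), [])
        for user in hcc_users
    }

# Hypothetical sample records, not data from the demo.
hcc_users = [{"id": 1, "name": "Alex Smith", "reputation": 420}]
twitter_profiles = [
    {"handle": "@alex_hadoop", "name": "Alex Smith"},
    {"handle": "@alex_bbq", "name": "alex  smith"},
    {"handle": "@sam_lee", "name": "Sam Lee"},
]

candidates = link_by_name(hcc_users, twitter_profiles)
```

Here `candidates[1]` holds two handles for the same HCC user, which is exactly the ambiguity that a name match alone cannot resolve.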
Once the data is pulled all together, it is possible to start visualizing it and to see duplicate names. For example, there are many Brad Andersons on Twitter; which one is the one we really care about? Only the ones who are interested in Hadoop are prospects of interest. So we put all this data into a Spark machine learning model to cluster it and identify communities within the data, and then we can identify who are the “Hadoop people” and the “non-Hadoop people.”
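The clustering idea can be sketched without a cluster. The demo used a Spark machine learning model; the snippet below is a plain-Python stand-in, not that model. It assumes a single made-up feature (the share of an account's tweets that mention a Hadoop-ecosystem keyword) and splits accounts into two groups with a tiny one-dimensional k-means. The keyword list, handles, and tweets are hypothetical.

```python
# Hypothetical keyword list; the demo's real features are not described.
HADOOP_TERMS = {"hadoop", "spark", "nifi", "hive", "storm"}

def hadoop_share(tweets):
    """Fraction of tweets that mention at least one keyword."""
    if not tweets:
        return 0.0
    hits = 0
    for tweet in tweets:
        words = {w.strip(".,!?#@") for w in tweet.lower().split()}
        if words & HADOOP_TERMS:
            hits += 1
    return hits / len(tweets)

def two_means(values, iterations=10):
    """1-D k-means with k=2, seeded with the min and max; label 1 = higher cluster."""
    low_center, high_center = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer center (ties go to the low cluster).
        labels = [1 if abs(v - high_center) < abs(v - low_center) else 0
                  for v in values]
        low = [v for v, label in zip(values, labels) if label == 0]
        high = [v for v, label in zip(values, labels) if label == 1]
        if low:
            low_center = sum(low) / len(low)
        if high:
            high_center = sum(high) / len(high)
    return labels

# Hypothetical accounts and tweets.
timelines = {
    "@alex_hadoop": ["Tuning Spark jobs on Hadoop", "NiFi flow is live", "coffee time"],
    "@alex_bbq": ["Brisket all day", "new smoker arrived", "ribs tonight"],
    "@data_dana": ["Hive query tips", "Storm topology deployed"],
}

handles = list(timelines)
labels = two_means([hadoop_share(timelines[h]) for h in handles])
hadoop_people = {h for h, label in zip(handles, labels) if label == 1}
```

A real model would use far richer features (follow graphs, hashtags, HCC reputation), but the shape is the same: turn each account into numbers, cluster, and keep the cluster that looks like the Hadoop community.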
Then this data was pushed back out through a Storm topology that processed incoming votes in real time, combined them with the “Hadoop people” information, and created the data set of attendees who were eligible to win the light boxes!
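As a rough illustration of that last step, the sketch below filters a stream of votes against a set of “Hadoop people.” It is plain Python, not a Storm topology, and it assumes a hypothetical vote record with a `handle` field and a rule of one entry per voter; neither detail comes from the demo.

```python
def eligible_voters(votes, hadoop_people):
    """Yield each voter once, in arrival order, if they are in hadoop_people."""
    seen = set()
    for vote in votes:
        voter = vote["handle"]
        if voter in hadoop_people and voter not in seen:
            seen.add(voter)
            yield voter

# Hypothetical incoming votes; in the demo these arrived in real time.
incoming_votes = [
    {"handle": "@data_dana", "choice": "demo-4"},
    {"handle": "@alex_bbq", "choice": "demo-2"},
    {"handle": "@alex_hadoop", "choice": "demo-4"},
    {"handle": "@data_dana", "choice": "demo-1"},
]

eligible = list(eligible_voters(iter(incoming_votes), {"@alex_hadoop", "@data_dana"}))
```

Because the function is a generator, it handles votes one at a time as they arrive, which is the same per-event style a streaming topology works in.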
One of the lucky winners was Scott Seligman, pictured here with Joe Witt, the host of the entire Data Hacks and Demos session, Apache NiFi PMC Member and Senior Director of Engineering, Hortonworks.
In conclusion, in 20 minutes the combined team of Joe Witt, Jeremy Dyer, Kay Lerch, and Simon Ball showcased how a brick-and-mortar retail store could identify which customers are walking in the door, greet them, interact with them, and provide personalized offers in real time. Are you ready to try this yourself? Here are some starting points.
- Download Apache NiFi, part of Hortonworks DataFlow
- Download Sandbox
- Voice-activated real-time stock quotes with #ApacheNiFi + Mac Dictation
- Apache NiFi Image metadata extraction
- Apache Spark in 5 Minutes
- Apache NiFi & Twitter Tutorial
- Running NiFi on Raspberry Pi – Best Practices
- Hadoop Summit Cool Tech Voting GitHub