
Play-by-Play: Data Hacks and Demos (Demo 2)

See some examples of using Apache NiFi running in Azure, Kafka, Spark, and Zeppelin together.


During the second demo of the Data Hacks & Demos session at Hadoop Summit San Jose, Simon Ball demonstrated how to take data received from the edge and run facial recognition on a more powerful cloud-based cluster. The whole stack ran on Azure: Apache NiFi to collect the data, Kafka as the substrate across all the analytics, Spark on top of YARN, and Zeppelin on top of that.


So What Did Simon Tell the Audience?

Apache NiFi provides real-time edge analytics for basic facial recognition. But sometimes, you need more powerful computer vision machine learning.

Edge devices have limited processing power, which only allows for basic facial recognition. Using that basic facial recognition, Apache NiFi lets you prioritize which images are more important than others. Then, with Apache NiFi’s site-to-site protocol, prioritized images are transferred first, along with the metadata from the bar codes on the badges. On the cluster that receives the prioritized images, we use Spark and Zeppelin together with an additional library, dlib, which provides machine learning for computer vision.
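The "prioritized images are transferred first" idea can be sketched in a few lines of plain Python. This is only an illustration of the ordering logic, not NiFi configuration or the code used in the demo: the `TransferQueue` class, the `face_score` field, and the file names and badge IDs are all hypothetical.

```python
import heapq
from itertools import count


class TransferQueue:
    """Hypothetical edge-side queue: highest face score leaves first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: equal scores keep arrival order

    def put(self, filename, face_score, badge_id):
        # heapq is a min-heap, so negate the score to pop the largest first.
        heapq.heappush(
            self._heap, (-face_score, next(self._arrival), filename, badge_id)
        )

    def get(self):
        # Return the image together with its badge metadata.
        _, _, filename, badge_id = heapq.heappop(self._heap)
        return filename, badge_id

    def __len__(self):
        return len(self._heap)


queue = TransferQueue()
# Made-up sample data: the score stands in for the basic edge-side face check.
queue.put("frame_001.jpg", face_score=0.20, badge_id="B-1001")
queue.put("frame_002.jpg", face_score=0.95, badge_id="B-1002")
queue.put("frame_003.jpg", face_score=0.60, badge_id="B-1003")

transfer_order = [queue.get() for _ in range(len(queue))]
for filename, badge_id in transfer_order:
    print(filename, badge_id)
```

In this sketch the image that scored highest on the edge-side check goes out first, and the badge metadata travels with each image.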

How Did Facial Recognition With Spark Work?

In a cluster running in the cloud, Spark’s machine learning capability and its ability to parallelize across very large datasets make more sophisticated analytics possible. For example, one can compare and correlate data against an entire customer database, which is not practical to store on a Raspberry Pi edge device in a store. We can also do things like facial alignment: by taking advantage of Spark’s support for NumPy and Spark’s ability to crunch a large number of matrices, we can start to identify facial landmarks and align faces. We can then take the facial landmark vectors and pass them into classifiers trained in Spark, compare them with reference photos, and have the system start to tell you names based solely on images (without needing the bar code information used earlier).
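To make the "compare with reference photos" step concrete, here is a minimal sketch that matches a landmark vector against labeled reference vectors by Euclidean distance. It is a deliberate simplification: a nearest-neighbor lookup stands in for the classifiers trained in Spark, the three-element vectors are made-up stand-ins for the landmark vectors a library such as dlib would produce, and the `identify` function and `max_distance` threshold are assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(vector, references, max_distance=0.5):
    """Return the name of the closest reference vector,
    or None if no reference is within max_distance."""
    best_name, best_dist = None, float("inf")
    for name, ref in references.items():
        dist = euclidean(vector, ref)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


# Hypothetical reference vectors, one per known person.
references = {
    "alice": [0.10, 0.80, 0.30],
    "bob": [0.90, 0.20, 0.50],
}

match = identify([0.12, 0.79, 0.33], references)  # close to alice's reference
unknown = identify([5.0, 5.0, 5.0], references)   # far from every reference
print(match, unknown)
```

The threshold matters as much as the distance: without it, every face would be assigned the name of whichever reference happens to be nearest, even for people who are not in the database.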

That was the second demo of Data Hacks & Demos at Hadoop Summit San Jose. The third demo, using IoT to get real-time feedback, is up next in this blog series. In the meantime, to get started with building something like this yourself, check out these links:


Published at DZone with permission of Anna Yong, DZone MVB. See the original article here.

