
Play-by-Play: Data Hacks and Demos (Demo 2)


See some examples of using Apache NiFi running in Azure, Kafka, Spark, and Zeppelin together.



During the second demo of the Data Hacks & Demos session at Hadoop Summit San Jose, Simon Ball demonstrated how to take data received from the edge and run facial recognition on a more powerful cloud-based cluster. In this setup, Apache NiFi running in Azure collects the data, Kafka (also running on Azure) acts as the substrate across all the analytics, and Spark runs on top of YARN, with Zeppelin on top for exploration.
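To make the flow concrete, here is a minimal sketch of the kind of record that might travel from the edge NiFi instance, through Kafka, to the Spark jobs. The field names and the `make_edge_record` helper are illustrative assumptions, not taken from the original demo.

```python
import json

def make_edge_record(image_id, badge_barcode, faces_detected):
    """Package an edge capture plus its badge metadata as a JSON message
    suitable for publishing to a Kafka topic (hypothetical schema)."""
    return json.dumps({
        "image_id": image_id,
        "badge_barcode": badge_barcode,    # scanned from the attendee badge
        "faces_detected": faces_detected,  # count from the edge detector
        "source": "edge-nifi",
    })

# A downstream Spark consumer would parse the message back into a dict.
record = make_edge_record("img-0042", "HS16SJ-1234", 2)
parsed = json.loads(record)
```

Keeping the message a flat JSON document like this makes it easy for both NiFi processors and Spark jobs to read the same stream without shared code.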

[Video: Demo 2, Facial Recognition with Apache NiFi and Spark (Hortonworks Hack Demos, HS16SJ)]


So What Did Simon Tell the Audience?

Apache NiFi provides real-time edge analytics for basic facial recognition. But sometimes, you need more powerful computer vision machine learning.

Edge devices have limited power and processing, which only allows for basic facial recognition. Using that basic recognition, Apache NiFi can prioritize which images are more important than others. Then, with Apache NiFi's site-to-site protocol, the prioritized images are transferred first, along with the metadata from the bar codes on the badges. On the cluster that receives the prioritized images, Spark and Zeppelin are used together with an additional library, dlib, which specializes in computer vision machine learning.
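The prioritization idea can be sketched in a few lines: images in which the basic edge detector found faces are marked so that they transfer first. This is a toy stand-in for a NiFi prioritizer configuration; the dictionary keys and the `prioritize` function are assumptions for illustration, not NiFi's actual API.

```python
def prioritize(flowfiles):
    """Order flowfiles so face-bearing images transfer first, mirroring
    the effect of a priority-attribute prioritizer on a NiFi queue."""
    def priority(ff):
        # Lower number sorts earlier, i.e. gets transferred sooner.
        return 0 if ff.get("faces_detected", 0) > 0 else 9
    return sorted(flowfiles, key=priority)

queue = [
    {"image_id": "a", "faces_detected": 0},
    {"image_id": "b", "faces_detected": 2},
]
ordered = prioritize(queue)
```

After sorting, the image with detected faces ("b") sits at the front of the queue, so the limited uplink bandwidth is spent on the images most likely to matter.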

How Did Facial Recognition With Spark Work?

In a cluster running in the cloud, Spark's machine learning capability and its ability to parallelize across very large datasets make more sophisticated analytics possible. For example, one can compare and correlate data against an entire customer database, which is not practical to store on a Raspberry Pi edge device in a store. One can also perform facial alignment: taking advantage of Spark's support for numpy and its ability to crunch a large number of matrices, the system identifies facial landmarks and aligns faces. The facial landmark vectors are then passed into classifiers trained in Spark and compared with reference photos, so the system can start to tell you names based solely on images, without needing the bar code information used earlier.
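The final matching step reduces to nearest-neighbor search over face vectors: once a library like dlib has produced an embedding per face, identification means finding the closest reference vector. The sketch below uses toy 4-dimensional vectors and made-up names in place of dlib's 128-dimensional face descriptors; the threshold value is likewise an assumption.

```python
import math

# Toy stand-ins for precomputed reference embeddings (one per known person).
REFERENCE_FACES = {
    "alice": [0.1, 0.9, 0.3, 0.5],
    "bob":   [0.8, 0.2, 0.7, 0.1],
}

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(face_vector, threshold=0.6):
    """Return the closest known name, or None if nothing is close enough."""
    name, dist = min(
        ((n, euclidean(face_vector, v)) for n, v in REFERENCE_FACES.items()),
        key=lambda pair: pair[1],
    )
    return name if dist < threshold else None

guess = identify([0.12, 0.88, 0.31, 0.52])  # very close to "alice"
```

In the demo's setting, the per-face distance computations are exactly the kind of embarrassingly parallel work Spark distributes well across a large reference database.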

That was the second demo of Data Hacks & Demos at Hadoop Summit San Jose. The third demo, which uses IoT to get real-time feedback, is up next in this blog series. In the meantime, to get started with building something like this yourself, check out these links:




Published at DZone with permission of
