During the second demo of the Data Hacks & Demos session at Hadoop Summit San Jose, Simon Ball demonstrated how to take data received from the edge and run facial recognition on a more powerful cloud-based cluster: Apache NiFi running in Azure collects the data, Kafka serves as the substrate across all the analytics, and Spark runs on top of YARN with Zeppelin on top, all in Azure.
So What Did Simon Tell the Audience?
Apache NiFi provides real-time edge analytics for basic facial recognition. But sometimes, you need more powerful computer vision machine learning.
Edge devices have limited power and processing capacity, which allows only basic facial recognition. Using that basic recognition, Apache NiFi can prioritize which images are more important than others. Then, with Apache NiFi's site-to-site protocol, the prioritized images are transferred first, along with the metadata from the bar codes on the badges. On the cluster that receives the prioritized images, we use Spark and Zeppelin, together with an additional library, dlib, which specializes in computer vision and machine learning.
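The prioritization idea can be sketched in a few lines. This is a hypothetical illustration, not NiFi configuration: each captured image gets a score from the edge's basic face detector (here, simply the number of faces found), and higher-scoring images are queued to transfer first, analogous to how a NiFi prioritizer orders flowfiles. The image names and scores are made up.

```python
import heapq

def prioritize(images):
    """Order images so the highest face-count transfers first.

    `images` is a list of (filename, faces_detected) pairs, where
    faces_detected comes from a basic edge-side face detector.
    """
    # heapq is a min-heap, so negate the score to pop highest first.
    heap = [(-faces, name) for name, faces in images]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

captured = [("hallway.jpg", 0), ("booth.jpg", 3), ("entrance.jpg", 1)]
print(prioritize(captured))  # booth.jpg transfers first: most faces
```

In a real flow, NiFi would apply this ordering to flowfiles in a queue before the site-to-site transfer, so the bandwidth-constrained link carries the most valuable images first.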
How Did Facial Recognition With Spark Work?
In a cluster running in the cloud, Spark's machine learning capability and its ability to parallelize across very large datasets enable more sophisticated analytics. For example, one can compare and correlate data against an entire customer database, which is not practical to store on a Raspberry Pi edge device in a store. One can also perform facial alignment: by combining numpy with Spark's ability to crunch a large number of matrices in parallel, the system identifies facial landmarks and aligns faces. The resulting facial landmark vectors are then passed into classifiers trained in Spark and compared against reference photos, so the system can report names based solely on images (without needing the bar code information used earlier).
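The matching step can be sketched as a nearest-neighbor lookup. This is a simplified stand-in for the Spark/dlib pipeline described above: each aligned face is reduced to a numeric feature vector (dlib, for instance, produces a 128-dimensional embedding), and a face is identified by finding the closest reference vector in the customer database. The names, vectors, and the 0.6 threshold here are all illustrative assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(face_vec, references, threshold=0.6):
    """Return the name of the closest reference face, or None if no
    reference is within the distance threshold (an unknown face)."""
    name, dist = min(
        ((n, euclidean(face_vec, v)) for n, v in references.items()),
        key=lambda t: t[1],
    )
    return name if dist < threshold else None

# Toy reference database: name -> stored facial feature vector.
references = {"alice": [0.1, 0.9, 0.3], "bob": [0.8, 0.2, 0.5]}
print(identify([0.12, 0.88, 0.31], references))  # closest to alice
```

In the demo's setting, the reference vectors would live in the cloud-side database and the comparison would be parallelized across the cluster with Spark, which is what makes matching against an entire customer base feasible.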
That was the second demo of Data Hacks & Demos at Hadoop Summit San Jose. The third demo, using IoT to get real-time feedback, is up next in this blog series. In the meantime, to get started with building something like this yourself, check out these links: