
Using GluonCV 0.3 With Apache MXNet 1.3 and Apache NiFi 1.7

In this tutorial article, we will learn how to use Apache NiFi for deep learning workflows with the new GluonCV. Read on to get started!

Tim Spann · Sep. 27, 18 · Big Data Zone · Tutorial

Using GluonCV 0.3 With Apache MXNet 1.3

Source code: https://github.com/tspannhw/nifi-gluoncv-yolo3

Captured and processed image available for viewing in stream in Apache NiFi 1.7.x

Use Case

I need to easily monitor the contents of my security vault. It is a fixed number of known things.

What we need in the real world is one or more good cameras (maybe four to eight, depending on the angles of the room), a device like an NVIDIA Jetson TX2, the MiNiFi 0.5 Java Agent, JDK 8, Apache MXNet, GluonCV, lots of Python libraries, a network connection, and a simple workflow. Outside of my vault, I will need one or more servers or clusters to do the more advanced processing, though I could run it all on the local box. If the number of items, or certain items I am watching, are no longer on the screen, then we should send an immediate alert. That could go to SMS, email, a Slack message, an alerting system, or other means. Most of that is implemented below. If anyone wants to do the complete use case, I can assist.
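The alert rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the demo: the function name missing_items, the EXPECTED_VAULT_CONTENTS inventory, and the sample detections are all hypothetical, and the labels are assumed to be classes the model can actually detect.

```python
from collections import Counter


def missing_items(expected, detected_labels):
    """Return {label: shortfall} for every expected label seen too few times."""
    seen = Counter(detected_labels)
    return {
        label: count - seen[label]
        for label, count in expected.items()
        if seen[label] < count
    }


# Hypothetical inventory of what should always be visible in the vault.
EXPECTED_VAULT_CONTENTS = {"bottle": 3, "tvmonitor": 1}

# Hypothetical labels returned by one detection pass over a camera frame.
detected = ["bottle", "bottle", "tvmonitor", "person"]

shortfall = missing_items(EXPECTED_VAULT_CONTENTS, detected)
if shortfall:
    # In a real flow this is where an SMS, email, or Slack alert would be sent.
    print("ALERT: missing from vault:", shortfall)
```

Unexpected extra objects (the "person" above) are ignored by this check; a stricter rule could flag them as well.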

Demo Implementation

I wanted to use the new YOLO 3 model, which is part of the new 0.3 stream, so I installed a 0.3 pre-release build. The final release may be out by the time you read this, so you can try a regular pip3.6 install -U gluoncv command and see what you get.

pip3.6 install -U gluoncv==0.3.0b20180924

YOLO v3 is a great pre-trained model to use for object detection.

See: https://gluon-cv.mxnet.io/build/examples_detection/demo_yolo.html

The GluonCV Model Zoo is very rich and incredibly easy to use. We just grab the model "yolo3_darknet53_voc" with an automatic one-time download and we are ready to go. The GluonCV team provides easy-to-customize code to start with. I wrote my processed image and JSON results out for ingest by Apache NiFi. You will notice this is similar to what we did for the Open Computer Vision talks.

This version is updated and even easier: I dropped MQTT and now just write out image files and some JSON to read.

GluonCV makes working with Computer Vision extremely clean and easy.

Why Apache NiFi for Deep Learning Workflows

Let me count the top five ways:

1. Provenance - This lets me see everything, everywhere, all the time with the data and the metadata.

2. Configurable Queues - Queues are everywhere and they are extremely configurable on size and priority. There's always backpressure and safety between every step. Sinks, sources, and steps can be offline as things happen in the real-world internet. Offline, online, wherever, I can recover and have full visibility into my flows as they spread between devices, servers, networks, clouds, and nation-states.

3. Security - Secure at every level, from SSL to data encryption, with integration with leading-edge tools including Apache Knox, Apache Ranger, and Apache Atlas. See here.

4. UI - A simple UI to develop, monitor, and manage incredibly complex flows including IoT, Deep Learning, Logs, and every data source you can throw at it.

5. Agents - MiNiFi gives me two different agents for my devices or systems to stream data headless.

Running the GluonCV YOLOv3 Model

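I wrap my Python script in a shell script to throw away warnings and junk:

# Run from the project directory so relative output paths (for example images/) resolve
cd /Volumes/TSPANN/2018/talks/ApacheDeepLearning101/nifi-gluoncv-yolo3
# -W ignore silences Python warnings; 2>/dev/null discards everything written to stderr
python3.6 -W ignore /Volumes/TSPANN/2018/talks/ApacheDeepLearning101/nifi-gluoncv-yolo3/yolonifi.py 2>/dev/null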

List of Possible Objects We Can Detect

["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", 
"diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", 
"tvmonitor"]
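To show how a label from this list ends up in the output, here is a minimal sketch that picks the highest-scoring detection and maps its class index to a name. It is not taken from yolonifi.py: the function top_detection and its min_score parameter are hypothetical, the inputs are assumed to be plain Python lists already pulled out of the model output, and the handling of negative class IDs assumes the detector pads unused slots with -1.

```python
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def top_detection(class_ids, scores, min_score=0.0):
    """Return (label, percent) for the best detection, or None if nothing qualifies."""
    best = None
    for class_id, score in zip(class_ids, scores):
        # Assumption: empty detection slots are padded with class ID -1.
        if class_id < 0 or score < min_score:
            continue
        if best is None or score > best[1]:
            best = (VOC_CLASSES[int(class_id)], score)
    if best is None:
        return None
    # Scores are assumed to be in the range 0..1; report them as a percentage.
    return best[0], best[1] * 100.0


# Hypothetical detections: a person at 0.25, a tvmonitor at 0.5, one padded slot.
print(top_detection([14, 19, -1], [0.25, 0.5, -1.0]))
```

Raising min_score is a simple way to keep weak detections out of the results.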

I am going to train this with my own data for the upcoming INTERNET OF BEER. For the vault use case, we would need pictures of your vault contents.

See here. 

Example Output in JSON

{  
   "imgname":"images/gluoncv_image_20180924190411_b90c6ba4-bbc7-4bbf-9f8f-ee5a6a859602.jpg",
   "imgnamep":"images/gluoncv_image_p_20180924190411_b90c6ba4-bbc7-4bbf-9f8f-ee5a6a859602.jpg",
   "class1":"tvmonitor",
   "pct1":"49.070724999999996",
   "host":"HW13125.local",
   "shape":"(1, 3, 512, 896)",
   "end":"1537815855.105193",
   "te":"4.199203014373779",
   "battery":100,
   "systemtime":"09/24/2018 15:04:15",
   "cpu":33.7,
   "diskusage":"49939.2 MB",
   "memory":60.1,
   "id":"20180924190411_b90c6ba4-bbc7-4bbf-9f8f-ee5a6a859602"
}
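A record shaped like the one above could be assembled roughly as follows. This is a sketch under assumptions, not the actual script: build_record and its parameters are hypothetical, the ID and file name patterns are inferred from the example output, and the battery, cpu, diskusage, and memory fields are left out because they need a system-metrics library that is not shown in the article.

```python
import json
import socket
import time
import uuid
from datetime import datetime


def build_record(label, percent, shape, started, now=None):
    """Build one JSON-ready result record for a processed image."""
    now = time.time() if now is None else now
    stamp = datetime.fromtimestamp(now)
    # ID pattern inferred from the sample: timestamp, underscore, random UUID.
    unique_id = "{}_{}".format(stamp.strftime("%Y%m%d%H%M%S"), uuid.uuid4())
    return {
        "imgname": "images/gluoncv_image_{}.jpg".format(unique_id),
        "imgnamep": "images/gluoncv_image_p_{}.jpg".format(unique_id),
        "class1": label,
        "pct1": str(percent),        # the sample output stores numbers as strings
        "host": socket.gethostname(),
        "shape": str(shape),
        "end": str(now),
        "te": str(now - started),    # elapsed seconds for this run
        "systemtime": stamp.strftime("%m/%d/%Y %H:%M:%S"),
        "id": unique_id,
    }


start = time.time()
record = build_record("tvmonitor", 49.070724999999996, (1, 3, 512, 896), start)
print(json.dumps(record, indent=3))
```

Writing numeric values such as pct1 and te as strings mirrors the sample, but it also means a schema inferred from this JSON will type them as strings.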

Example Processed Image Output

It found one generic person. We could train against a known set of humans that are allowed in an area or are known users.

NiFi Flows

Gateway Server (We could skip this, but aggregating multiple camera agents is useful)

Send the Flow to the Cloud

Cloud Server Site-to-Site

After we infer the schema of the data once, we don't need it again. We could derive the schema manually or from another tool, but this is easy. Once you are done, then you can delete the InferAvroSchema processor from your flow. I left mine in for your use if you wish to start from this flow that is attached at the end of the article.

Flow Steps

When there is no error, route the flow to merge record, then convert the aggregated Apache Avro records into one Apache ORC file.

Then store it in an HDFS directory. Once complete, a DDL statement is added to the metadata. You can send it to a PutHiveQL processor, or create the table manually in Beeline, Zeppelin, or Hortonworks Data Analytics Studio.

Schema: gluoncvyolo

{  
   "type":"record",
   "name":"gluoncvyolo",
   "fields":[  
      {  
         "name":"imgname",
         "type":"string",
         "doc":"Type inferred from '\"images/gluoncv_image_20180924211055_8f3b9dac-5645-49aa-94e7-ee5176c3f55c.jpg\"'"
      },
      {  
         "name":"imgnamep",
         "type":"string",
         "doc":"Type inferred from '\"images/gluoncv_image_p_20180924211055_8f3b9dac-5645-49aa-94e7-ee5176c3f55c.jpg\"'"
      },
      {  
         "name":"class1",
         "type":"string",
         "doc":"Type inferred from '\"tvmonitor\"'"
      },
      {  
         "name":"pct1",
         "type":"string",
         "doc":"Type inferred from '\"95.71207000000001\"'"
      },
      {  
         "name":"host",
         "type":"string",
         "doc":"Type inferred from '\"HW13125.local\"'"
      },
      {  
         "name":"shape",
         "type":"string",
         "doc":"Type inferred from '\"(1, 3, 512, 896)\"'"
      },
      {  
         "name":"end",
         "type":"string",
         "doc":"Type inferred from '\"1537823458.559896\"'"
      },
      {  
         "name":"te",
         "type":"string",
         "doc":"Type inferred from '\"3.580893039703369\"'"
      },
      {  
         "name":"battery",
         "type":"int",
         "doc":"Type inferred from '100'"
      },
      {  
         "name":"systemtime",
         "type":"string",
         "doc":"Type inferred from '\"09/24/2018 17:10:58\"'"
      },
      {  
         "name":"cpu",
         "type":"double",
         "doc":"Type inferred from '12.0'"
      },
      {  
         "name":"diskusage",
         "type":"string",
         "doc":"Type inferred from '\"48082.7 MB\"'"
      },
      {  
         "name":"memory",
         "type":"double",
         "doc":"Type inferred from '70.6'"
      },
      {  
         "name":"id",
         "type":"string",
         "doc":"Type inferred from '\"20180924211055_8f3b9dac-5645-49aa-94e7-ee5176c3f55c\"'"
      }
   ]
}
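The types in this schema follow directly from how the values appear in the JSON: quoted numbers such as pct1 become strings, while bare numbers become int or double. The sketch below imitates that rule in plain Python. It is a simplified stand-in and not NiFi's InferAvroSchema implementation; the function names are hypothetical, and nulls, nested records, and long values are not handled.

```python
import json


def avro_type(value):
    """Map a parsed JSON value to a simple Avro primitive type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(name, record):
    """Build a flat Avro record schema from one sample JSON object."""
    return {
        "type": "record",
        "name": name,
        "fields": [
            {"name": key, "type": avro_type(value)}
            for key, value in record.items()
        ],
    }


sample = json.loads('{"class1": "tvmonitor", "pct1": "95.71207000000001", '
                    '"battery": 100, "cpu": 12.0}')
print(json.dumps(infer_schema("gluoncvyolo", sample), indent=3))
```

If pct1 or te should be numeric downstream, the simplest fix is to emit them as unquoted numbers in the source JSON.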

Tabular data has fields with types and properties. Let's specify those for automated analysis, conversion, and live stream SQL.

Hive table schema: gluoncvyolo

CREATE EXTERNAL TABLE IF NOT EXISTS gluoncvyolo (imgname STRING, imgnamep STRING, class1 STRING, pct1 STRING, host STRING, shape STRING, `end` STRING, te STRING, battery INT, systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING) STORED AS ORC;

Apache NiFi generates tables for me in Apache Hive 3.x as Apache ORC files for fast performance.
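For readers who want to see how such a DDL statement relates to the Avro schema, here is a small sketch that derives one from the other. It is not how NiFi produces its DDL: hive_ddl and HIVE_TYPES are hypothetical names, only flat schemas with primitive types are supported, and quoting every column name with backticks is a choice made here so that reserved words such as end are always safe.

```python
# Avro primitive type -> Hive column type
HIVE_TYPES = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
}


def hive_ddl(schema, external=True):
    """Build a CREATE TABLE statement for an ORC-backed Hive table."""
    columns = ", ".join(
        "`{}` {}".format(field["name"], HIVE_TYPES[field["type"]])
        for field in schema["fields"]
    )
    prefix = "CREATE EXTERNAL TABLE IF NOT EXISTS" if external else "CREATE TABLE"
    return "{} {} ({}) STORED AS ORC".format(prefix, schema["name"], columns)


schema = {
    "type": "record",
    "name": "gluoncvyolo",
    "fields": [
        {"name": "class1", "type": "string"},
        {"name": "end", "type": "string"},
        {"name": "battery", "type": "int"},
        {"name": "cpu", "type": "double"},
    ],
}
print(hive_ddl(schema))
```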


Hive acid table schema: gluoncvyoloacid

CREATE TABLE gluoncvyoloacid
(imgname STRING, imgnamep STRING, class1 STRING, pct1 STRING, host STRING, shape STRING, `end` STRING, te STRING, battery INT, systemtime STRING, cpu DOUBLE, diskusage STRING, memory DOUBLE, id STRING)
STORED AS ORC TBLPROPERTIES ('transactional'='true')

I can just as easily insert or update data into Hive 3.x ACID 2 tables.

We have data, so now we can query it. It's easy: there is no need to install separate analytics tooling to get tables, Leaflet.js, AngularJS, graphs, maps, and charts.

NiFi Flow Registry

To manage version control, I am using the NiFi Registry, which is great. In the newest version, 0.2, there is the ability to back it up with GitHub! It's easy. Everything you need to know is in the docs and Bryan Bende's excellent post on the subject.

There were a few gotchas for me.

  • Use your own new GitHub project with permissions, and then clone it locally: git clone https://github.com/tspannhw/nifi-registry-github.git
  • Make sure the GitHub directory has permission and is empty (no readme or junk).
  • Make sure you put in the full directory path.
  • Update your config like below:
    <flowPersistenceProvider>
        <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
        <property name="Flow Storage Directory">/Users/tspann/Documents/nifi-registry-0.2.0/conf/nifi-registry-github</property>
        <property name="Remote To Push">origin</property>
        <property name="Remote Access User">tspannhw</property>
        <property name="Remote Access Password">generatethis</property>
    </flowPersistenceProvider> 
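The gotchas above are easy to check before restarting the registry. The sketch below parses a provider fragment like the one shown and reports obvious problems. It is a hypothetical helper, not part of NiFi Registry: the function name and the messages are made up for illustration, and it assumes it is given just the flowPersistenceProvider element as a string.

```python
import os
import xml.etree.ElementTree as ET


def flow_storage_problems(provider_xml):
    """Return a list of human-readable problems found in the provider config."""
    root = ET.fromstring(provider_xml)
    props = {p.get("name"): (p.text or "").strip() for p in root.iter("property")}
    problems = []

    directory = props.get("Flow Storage Directory", "")
    if not os.path.isabs(directory):
        problems.append("Flow Storage Directory should be a full (absolute) path")
    elif not os.path.isdir(os.path.join(directory, ".git")):
        problems.append("Flow Storage Directory is not a local git clone")

    if not props.get("Remote To Push"):
        problems.append("Remote To Push is not set")
    return problems


# Hypothetical fragment with a relative path, which the check should flag.
fragment = """
<flowPersistenceProvider>
    <property name="Flow Storage Directory">conf/nifi-registry-github</property>
    <property name="Remote To Push">origin</property>
</flowPersistenceProvider>
"""
for problem in flow_storage_problems(fragment):
    print("config problem:", problem)
```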

This is my GitHub directory to hold versions: https://github.com/tspannhw/nifi-registry-github

Resources

  • https://github.com/tspannhw/UsingGluonCV
  • https://gluon.mxnet.io/chapter01_crashcourse/ndarray.html
  • https://gluon-cv.mxnet.io/build/examples_detection/demo_yolo.html#sphx-glr-build-examples-detection-demo-yolo-py
  • https://gluon-cv.mxnet.io/model_zoo/index.html#object-detection
  • https://community.hortonworks.com/articles/215271/iot-edge-processing-with-deep-learning-on-hdf-32-a-2.html
  • https://community.hortonworks.com/articles/198912/ingesting-apache-mxnet-gluon-deep-learning-results.html

Zeppelin Notebook

apache-mxnet-gluoncv-yolov3-copy.json

NiFi Flow

gluoncv-server.xml
