
Introduction to Apache MXNet


Here is your comprehensive intro to Apache MXNet, as part one of a series of articles about deep learning using Apache's framework.


Apache Deep Learning 101 Series: Part 1: Intro to Apache MXNet

You can easily run Apache MXNet on an OSX machine or a Linux workstation using a Python script. I have forked the standard Apache MXNet Wine Detector Tutorial to read from a local OSX webcam. You may need to change your OpenCV webcam port from 0 to 1 or 2, depending on how many webcams you have and which one you want to use. Webcam numbering starts at 0, so if you only have one, use 0. I am running this on an OSX laptop connected to a monitor with a built-in webcam, so I use that one, which is 1.
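
If you are not sure which port number your webcam has, a minimal sketch like the one below can probe the first few indices. It assumes the camera is opened with OpenCV's cv2.VideoCapture; the helper name working_camera_indices and the max_index limit are hypothetical and not part of the tutorial script.

```python
def working_camera_indices(open_capture, max_index=3):
    """Return every camera index below max_index that delivers a frame.

    open_capture is a callable such as cv2.VideoCapture. It is passed in
    (an illustrative design choice) so the probing logic can be exercised
    without a camera attached.
    """
    found = []
    for index in range(max_index):
        capture = open_capture(index)
        try:
            if capture.isOpened():
                ok, _frame = capture.read()  # read() returns (success, image)
                if ok:
                    found.append(index)
        finally:
            capture.release()  # free the device before trying the next one
    return found

# Typical use, assuming the opencv-python package is installed:
#   import cv2
#   print(working_camera_indices(cv2.VideoCapture))
```

On a setup like the one described above (laptop camera plus a monitor camera), you would expect more than one index to come back, and you would pick the one you want by hand.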

Let's get this installed!

git clone https://github.com/apache/incubator-mxnet.git


The installation instructions on the Apache MXNet website are excellent. Pick your platform and your style. I am doing this the simplest way on a Mac, but a Python virtual environment may work better for you.

git clone https://github.com/tspannhw/ApacheBigData101.git


You will want to copy my shell script osxlocalrun.sh, the inception copy, and the analyze.py script to your machine. If you don't have a webcam, use the CentOS version of the shell script and Python script instead; that one works with a static image that you supply. I am assuming you are running a recently updated Mac with 16GB of RAM or more, and that pip, Homebrew, and Python 3 are already installed. If not, install them first. If you have a pre-1.0 version of Apache MXNet, please upgrade. You will also need curl and tar, which should already be installed.

git clone https://github.com/apache/incubator-mxnet.git

cd incubator-mxnet
mkdir images
curl --header 'Host: data.mxnet.io' --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Language: en-US,en;q=0.5' --header 'Referer: http://data.mxnet.io/models/imagenet/' --header 'Connection: keep-alive' 'http://data.mxnet.io/models/imagenet/inception-bn.tar.gz' -o 'inception-bn.tar.gz' -L

tar -xvzf inception-bn.tar.gz
cp Inception-BN-0126.params Inception-BN-0000.params
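
The final cp matters because MXNet's checkpoint loader looks for a parameter file named after the model prefix and a four-digit epoch number. The archive ships epoch 126, and copying it to epoch 0 lets a script load the model as epoch 0. The sketch below reproduces that copy in Python. The helper names are hypothetical, and the idea that the script loads epoch 0 is an assumption inferred from the cp command above.

```python
import shutil
from pathlib import Path

def params_filename(prefix, epoch):
    """Build the parameter file name MXNet expects: <prefix>-<epoch, 4 digits>.params."""
    return f"{prefix}-{epoch:04d}.params"

def copy_checkpoint_epoch(directory, prefix, from_epoch, to_epoch):
    """Copy one epoch's parameter file to another epoch number; return the new path."""
    source = Path(directory) / params_filename(prefix, from_epoch)
    target = Path(directory) / params_filename(prefix, to_epoch)
    shutil.copyfile(source, target)
    return target

# Equivalent of the cp command above, run from the incubator-mxnet directory:
#   copy_checkpoint_epoch(".", "Inception-BN", 126, 0)
```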

Then install the dependencies:

brew update
pip install --upgrade pip
pip install --upgrade setuptools
pip install mxnet==1.0.0
brew install graphviz
pip install graphviz

If your machine has two versions of Python, you may need to use pip3 instead of pip, and you may need to run the commands with sudo. It depends on how your machine is set up and how locked down it is.

We created a directory called images that will fill with OpenCV capture images. You will probably want to delete them or ingest them periodically. It's very easy to ingest them with Apache NiFi or MiNiFi, both of which run on OSX with ease. See: https://community.hortonworks.com/articles/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html
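
If you choose to delete the captures rather than ingest them, a small cleanup helper is enough. The following is only a sketch: the function name, the one-hour default, and the list of image extensions are assumptions for illustration and are not part of the original scripts.

```python
import time
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def delete_old_captures(directory="images", max_age_seconds=3600, now=None):
    """Delete image files older than max_age_seconds; return the removed names."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # leave subdirectories and non-image files alone
        if now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling delete_old_captures() from the incubator-mxnet directory would clear anything in images that is more than an hour old.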

We call a simple shell script (osxlocalrun.sh), which calls our custom Python 3 script. You can easily convert this to Python 2 if you need to; in a future article, I have this running on Python 2.7 on a CentOS 7 HDP 2.6.4 cluster node. I send warnings to /dev/null because they relate to OSX configuration that you may or may not have and cannot easily change. You will probably need to chmod 755 your osxlocalrun.sh. If you are running on a Linux variant, follow the directions on the Apache MXNet site or wait for my next article on installing and using Apache MXNet in CentOS-based HDP 2.6.4 and HDF 3.1 clusters.

 python3 -W ignore analyze.py 2>/dev/null

For Apache NiFi Flow Templates

You can download my Apache NiFi flows from GitHub or this article.

Architecture

  • Local Apache NiFi 1.5 with NiFi Registry running with JDK 8 on OSX
  • Local Apache MXNet installation with Python 3
  • Remote HDF 3.1 cluster running on CentOS 7 on OpenStack with Apache Ambari, Apache NiFi, NiFi Registry, and Hortonworks Schema Registry
  • Remote HDP 2.6.4 cluster running on CentOS 7 on OpenStack with Apache Hive and Apache Ambari

The flow is easy:

  • ExecuteProcess: Execute that shell script
  • UpdateAttribute: Add the schema name
  • InferAvroSchema: You only need this once, and only if you don't want to create your schema by hand; push the result to an attribute
  • Remote Process Group: Send via HTTP Site-to-Site to an HDF 3.1 cluster.
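
If you would rather create the schema by hand than run InferAvroSchema, the sketch below builds a plausible Avro record schema in Python. The field names come from the JSON the script emits and the columns of the Hive table in this article, all typed as strings. The record name inception3 is an assumption borrowed from the Hive table name, and the schema that InferAvroSchema actually produces may differ.

```python
import json

# Field names as they appear in the script's JSON output and the Hive DDL.
FIELD_NAMES = [
    "uuid",
    "top1pct", "top1", "top2pct", "top2", "top3pct", "top3",
    "top4pct", "top4", "top5pct", "top5",
    "imagefilename", "runtime",
]

def build_avro_schema(record_name="inception3"):
    """Return an Avro record schema (as a dict) with every field typed as string."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": name, "type": "string"} for name in FIELD_NAMES],
    }

# Serialized form, ready to paste into a schema registry or a processor property.
schema_json = json.dumps(build_avro_schema(), indent=2)
```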

Local OSX Processing

Cluster-Based Record Processing

In the cloud, we use ConvertRecord to convert the JSON generated by the Apache MXNet Python script into Avro. We merge a bunch of those records together, then convert the larger Avro file to ORC. The ORC file is stored in HDFS. Apache NiFi automatically generates Hive DDL that we can execute immediately via Apache NiFi or run manually; I do this manually in Apache Zeppelin. I could easily augment this data with weather, Twitter, and other REST feeds, which I have covered in other articles. I could also push the results to Kafka 1.0 for additional processing in Hortonworks Streaming Analytics Manager. I will do that at a future time.

Apache Hive SQL DDL

CREATE EXTERNAL TABLE IF NOT EXISTS inception3 (uuid STRING, top1pct STRING, top1 STRING, top2pct STRING, top2 STRING, top3pct STRING, top3 STRING, top4pct STRING, top4 STRING, top5pct STRING, top5 STRING, imagefilename STRING, runtime STRING) STORED AS ORC
LOCATION '/mxnet/local'

Example Output

{"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store", "top3pct": "4.80000004172", "top3": "n03141823 crutch", "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt", "top5pct": "2.80000008643", "top5": "n02834397 bib", "imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"}

Query Results

Example OpenCV Captured Image

{"top1pct": "67.6", "top5": "n03485794 handkerchief, hankie, hanky, hankey", "top4": "n04590129 window shade", "top3": "n03938244 pillow", "top2": "n04589890 window screen", "top1": "n02883205 bow tie, bow-tie, bowtie", "top2pct": "11.5", "imagefilename": "nanotie7.png", "top3pct": "4.5", "uuid": "mxnet_uuid_img_20180211161220", "top4pct": "2.8", "top5pct": "2.8", "runtime": "3.0"}
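
Records shaped like the examples in this article can be assembled from a list of (probability, label) pairs. The sketch below is not the code in analyze.py. The function name and arguments are hypothetical, the timestamp format in the uuid is inferred from the example values, and percentages are rounded to one decimal place as an illustrative choice.

```python
import json
from datetime import datetime

def build_record(predictions, image_filename, runtime_seconds, when=None):
    """Build the JSON record from (probability, label) pairs.

    predictions: iterable of (probability between 0 and 1, label string).
    Only the five most probable entries are kept; all values are strings.
    """
    when = when or datetime.now()
    top5 = sorted(predictions, key=lambda item: item[0], reverse=True)[:5]
    record = {"uuid": "mxnet_uuid_img_" + when.strftime("%Y%m%d%H%M%S")}
    for rank, (probability, label) in enumerate(top5, start=1):
        record[f"top{rank}pct"] = str(round(probability * 100, 1))
        record[f"top{rank}"] = label
    record["imagefilename"] = image_filename
    record["runtime"] = str(runtime_seconds)
    return json.dumps(record)
```

Keeping every value as a string matches the all-STRING columns of the Hive table, at the cost of casting later if you want numeric comparisons in SQL.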

My cat assists me in some Deep Learning work, so I use Apache NiFi to track him to make sure he's working and hasn't taken his tie off during office hours. I run a strict office here in the Princeton lab.

Source Code


