IoT Edge Processing With Apache NiFi, MiniFi, and Multiple Deep Learning Libraries: Part 2

Check out a tutorial that explains IoT edge processing with Apache NiFi, MiniFi, and multiple deep learning libraries.

For: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/68140

See Part 1!

We had a lot of fun at the conference, with lots of interesting things going on. There's an interesting new way to easily run TensorFlow on YARN called Submarine, and I'll write an article on that in the near future.

Step By Step Processing

Step 1: Install Apache NiFi (One or More Nodes or Clusters)

Choose:

1.  HDF Enterprise Install

2.  For the container folks: docker pull hortonworks/nifi

3.  Download standalone Apache NiFi 1.7.1 from Apache NiFi site.

For full Apache NiFi Configuration for IoT, see this.

You will need to set nifi.remote.input.host and nifi.remote.input.socket.port in conf/nifi.properties or in the Ambari settings.

Step 2: Install Apache NiFi: MiniFi on Your Device(s)

Download MiniFi.

You can choose Java or C++. For your first usage, I recommend the Java edition unless your device is too small.

You can also install on a RHEL or Debian Linux machine or OSX.

Download MiniFi Toolkit.

Step 3: Install Apache MXNet (On MiniFi Devices and NiFi Nodes: optional)

Install Apache MXNet

Install the build tools and build from scratch (walkthrough install).

For the source on GitHub, see this.

Hive — SQL — IoT Data Storage

In this section, we will focus on converting JSON to Avro to Apache ORC and on the storage options in Apache Hive 3. I am using two styles of storage for one of the tables, rainbow: ORC files behind an external table, and the Streaming API writing into an ACID table.

NiFi — SQL — On Streams — Calcite

SELECT *
FROM FLOWFILE
WHERE CAST(memory AS FLOAT) > 0 

SELECT *
FROM FLOWFILE
WHERE CAST(tempf AS FLOAT) > 65

I check the flows as they are ingested in real time and filter on conditions such as memory or temperature. This makes for powerful yet simple event processing. It is very handy when you want to filter out standard conditions where no anomaly has occurred.
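To make the filtering logic concrete, here is a minimal Python sketch of what a predicate like `CAST(tempf AS FLOAT) > 65` does to a batch of records. It is only an illustration outside NiFi, not how NiFi evaluates the query; the function name `passes_filter` and the two sample records are made up for this example.

```python
def passes_filter(record, field, threshold):
    """Mimic `WHERE CAST(field AS FLOAT) > threshold` for a single record."""
    value = record.get(field)
    if value is None:
        # A comparison against a missing (NULL) value is not true, so drop the row.
        return False
    try:
        # float("nan") > threshold is False, so NaN readings are dropped too.
        return float(value) > threshold
    except (TypeError, ValueError):
        # Assumption: unparseable values are simply dropped in this sketch;
        # a real CAST may raise an error instead.
        return False


# Hypothetical sample records shaped like the sensor JSON in this article.
records = [
    {"host": "rainbow", "tempf": "72.4", "memory": 26.5},
    {"host": "rainbow", "tempf": "61.0", "memory": 0.0},
]

warm = [r for r in records if passes_filter(r, "tempf", 65)]
busy = [r for r in records if passes_filter(r, "memory", 0)]
```

Only the first record survives both filters here, which is the "drop the boring rows" behavior described above.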

IoT Data Storage Options

For time series data, we are blessed with many options in HDP 3.x. I am starting with the simplest choice: a plain Apache Hive 3.x table. Since our data is only 2 rows per minute, we are good.
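As a quick sanity check on that volume, the arithmetic below assumes a steady 2 rows per minute from a single device (the rate mentioned above) and nothing else.

```python
ROWS_PER_MINUTE = 2  # rate stated in the article, assumed constant

rows_per_day = ROWS_PER_MINUTE * 60 * 24
rows_per_year = rows_per_day * 365

print(rows_per_day)   # 2880
print(rows_per_year)  # 1051200
```

Roughly a million small rows per year per device is a comfortable fit for a plain Hive table.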

For second and sub-second data, we need to consider either Druid or HBase. The nice thing is these NoSQL options also have SQL interfaces to use. It comes down to how you are going to query the data and which one you like.

HBase + Phoenix is performant and has been used in production for years. HBase 2.x brings really impressive updates that make this a good option.

For richer analytics, including some really cool visualizations with Apache Superset, it's hard not to recommend Druid. Druid has improved a lot recently and is well integrated with Hive 3's rich querying.

Example of Our Geo Data

{"speed": "0.145", "diskfree": "4643.2 MB", 
 "altitude": "6.2", "ts": "2018-08-30 17:47:03", 
 "cputemp": 52.0, "latitude": "38.9789405", 
 "track": "0.0", "memory": 26.5, 
 "host": "rainbow", "uniqueid": "gps_uuid_20180830174705", 
 "ipaddress": "172.20.10.8", "epd": "nan", 
 "utc": "2018-08-30T17:47:05.000Z", 
 "epx": "21.91", "epy": "31.536", 
 "epv": "73.37", "ept": "0.005", "eps": "63.07", 
 "longitude": "-74.824475167", "mode": "3", 
 "time": "1535651225.0", "climb": "0.0", 
 "epc": "nan"}
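Note that most of the numeric GPS values arrive as strings, and some are the literal string "nan". The following sketch, which uses a trimmed copy of the record above, shows one way to turn those into floats or None before analysis. The field list and the helper `to_float_or_none` are my own choices for illustration, not part of the flow.

```python
import json
import math

# Trimmed copy of the sample record from the article.
RAW = """{"speed": "0.145", "cputemp": 52.0, "latitude": "38.9789405",
          "host": "rainbow", "epd": "nan", "longitude": "-74.824475167",
          "ts": "2018-08-30 17:47:03"}"""

# Assumption: these are the string fields we want as numbers.
NUMERIC_FIELDS = ("speed", "latitude", "longitude", "epd")


def to_float_or_none(text):
    """Parse a numeric string; map NaN to None so it behaves like a SQL NULL."""
    value = float(text)
    return None if math.isnan(value) else value


record = json.loads(RAW)
cleaned = dict(record)
for name in NUMERIC_FIELDS:
    cleaned[name] = to_float_or_none(record[name])

print(cleaned["speed"], cleaned["epd"])
```

The same idea applies if you later decide to store these columns as DOUBLE rather than STRING.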

Tables

CREATE EXTERNAL TABLE IF NOT EXISTS rainbow (
  tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, 
  host STRING, uniqueid STRING, ipaddress STRING, 
  temp DOUBLE, diskfree STRING, altitude DOUBLE, 
  ts STRING, tempf2 DOUBLE, memory DOUBLE
) STORED AS ORC LOCATION '/rainbow';

CREATE TABLE rainbowacid (
  tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, 
  host STRING, uniqueid STRING, ipaddress STRING, 
  temp DOUBLE, diskfree STRING, altitude DOUBLE, 
  ts STRING, tempf2 DOUBLE, memory DOUBLE
) STORED AS ORC TBLPROPERTIES ('transactional' = 'true');

CREATE EXTERNAL TABLE IF NOT EXISTS gps (
  speed STRING, diskfree STRING, altitude STRING, 
  ts STRING, cputemp DOUBLE, latitude STRING, 
  track STRING, memory DOUBLE, host STRING, 
  uniqueid STRING, ipaddress STRING, 
  epd STRING, utc STRING, epx STRING, 
  epy STRING, epv STRING, ept STRING, 
  eps STRING, longitude STRING, mode STRING, 
  `time` STRING, climb STRING, epc STRING
) STORED AS ORC LOCATION '/gps';

CREATE TABLE IF NOT EXISTS gpsacid (
  speed STRING, diskfree STRING, altitude STRING, 
  ts STRING, cputemp DOUBLE, latitude STRING, 
  track STRING, memory DOUBLE, host STRING, 
  uniqueid STRING, ipaddress STRING, 
  epd STRING, utc STRING, epx STRING, 
  epy STRING, epv STRING, ept STRING, 
  eps STRING, longitude STRING, mode STRING, 
  `time` STRING, climb STRING, epc STRING
) STORED AS ORC TBLPROPERTIES ('transactional' = 'true');

CREATE EXTERNAL TABLE IF NOT EXISTS movidiussense (
  label5 STRING, runtime STRING, label1 STRING, 
  diskfree STRING, top1 STRING, starttime STRING, 
  label2 STRING, label3 STRING, top3pct STRING, 
  host STRING, top5pct STRING, humidity DOUBLE, 
  currenttime STRING, roll DOUBLE, uuid STRING, 
  label4 STRING, tempf DOUBLE, y DOUBLE, 
  top4pct STRING, cputemp2 DOUBLE, top5 STRING, 
  top2pct STRING, ipaddress STRING, 
  cputemp INT, pitch DOUBLE, x DOUBLE, 
  z DOUBLE, yaw DOUBLE, pressure DOUBLE, 
  top3 STRING, temp DOUBLE, memory DOUBLE, 
  top4 STRING, imagefilename STRING, 
  top1pct STRING, top2 STRING
) STORED AS ORC LOCATION '/movidiussense';

CREATE EXTERNAL TABLE IF NOT EXISTS minitensorflow2 (
  image STRING, ts STRING, host STRING, 
  score STRING, human_string STRING, 
  node_id INT
) STORED AS ORC LOCATION '/minifitensorflow2';
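The column types in these tables follow the JSON value types: quoted values become STRING and bare numbers become DOUBLE or INT. As a rough sketch of that mapping (not the tool used to produce the DDL above), the hypothetical helper `hive_columns` below builds a column list from one sample record and backquotes `time`, as the gpsacid definition does.

```python
# Assumption: a simple one-to-one mapping from Python JSON types to Hive types.
HIVE_TYPES = {str: "STRING", float: "DOUBLE", int: "INT"}

# Assumption: column names to backquote, following the gpsacid DDL.
QUOTED = {"time"}


def hive_columns(record):
    """Build a Hive column list from the keys and value types of one record."""
    parts = []
    for name, value in record.items():
        # Anything not in the mapping falls back to STRING.
        hive_type = HIVE_TYPES.get(type(value), "STRING")
        column = f"`{name}`" if name in QUOTED else name
        parts.append(f"{column} {hive_type}")
    return ", ".join(parts)


sample = {"speed": "0.145", "cputemp": 52.0, "time": "1535651225.0"}
ddl_columns = hive_columns(sample)
print(ddl_columns)
```

This only inspects a single record, so treat the result as a starting point and review the types by hand.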

I have added some ACID tables, which are the default for managed tables in HDP 3/Hive 3. This lets me do updates! I will try both styles to see which is best.

There will be more coming soon!
