
IoT Edge Processing With Apache NiFi, MiniFi, and Multiple Deep Learning Libraries: Part 1

Let's look at how to use the new HDP 3.0 and HDF 3.2 stacks to develop IoT applications with Deep Learning.


In preparation for my talk on utilizing edge devices for Deep Learning, IoT sensor reading, and Big Data processing, I have updated my environment to the latest and greatest tools available.

With the upgrade of HDF to 3.2, I can now use Apache NiFi 1.7 and MiNiFi 0.5 for IoT data ingestion, simple event processing, conversion, data processing, data flow, and storage.

The architecture diagram above shows the basic flow we are utilizing.

Step-by-Step

  1. A Raspberry Pi with the latest patches, Python, GPS software, a USB camera, sensor libraries, Java 8, MiNiFi 0.5, TensorFlow, and Apache MXNet installed.
  2. The MiNiFi flow pushes JSON and JPEGs over HTTP(S) / Site-to-Site to an Apache NiFi gateway server.
  3. Optionally, NiFi can push to a central NiFi cloud cluster and/or a Kafka cluster, both of which run in HDF 3.2 environments.
  4. The Apache NiFi cluster pushes to Hive, HDFS, a Dockerized API running in HDP 3.0, and third-party APIs.
  5. NiFi and Kafka integrate with Schema Registry for our tabular data, including the rainbow and GPS JSON data.

SQL Tables in Hive

I stream my data into Apache ORC files stored on HDP 3.0 HDFS directories and build external tables on them.

CREATE EXTERNAL TABLE IF NOT EXISTS rainbow (tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, temp DOUBLE, diskfree STRING, altitude DOUBLE, ts STRING, 
 tempf2 DOUBLE, memory DOUBLE) 
STORED AS ORC LOCATION '/rainbow';

CREATE EXTERNAL TABLE IF NOT EXISTS gps (speed STRING, diskfree STRING, altitude STRING, ts STRING, cputemp DOUBLE, latitude STRING, track STRING, memory DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, epd STRING, utc STRING, epx STRING, epy STRING, epv STRING, ept STRING, eps STRING, longitude STRING, mode STRING, time STRING, climb STRING, epc STRING) 
STORED AS ORC LOCATION '/gps';
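
Because these are external tables, any ORC file that lands in '/rainbow' or '/gps' becomes queryable without a separate load step. As a minimal sketch (the aggregation is only an illustration and is not part of the original flow), a per-host summary of the rainbow readings could look like this:

```sql
-- Illustrative only: summarize sensor readings per device.
-- Column names come from the rainbow table defined above.
SELECT host,
       COUNT(*)     AS readings,
       AVG(tempf)   AS avg_tempf,
       MAX(cputemp) AS max_cputemp
FROM rainbow
GROUP BY host;
```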

For my processing needs, I also have Hive 3 ACID tables for general table usage and updates.

CREATE TABLE rainbowacid (tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, temp DOUBLE, diskfree STRING, altitude DOUBLE, ts STRING, 
 tempf2 DOUBLE, memory DOUBLE) 
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

CREATE TABLE IF NOT EXISTS gpsacid (speed STRING, diskfree STRING, altitude STRING, ts STRING, cputemp DOUBLE, latitude STRING, track STRING, memory DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, epd STRING, utc STRING, epx STRING, epy STRING, epv STRING, ept STRING, eps STRING, longitude STRING, mode STRING, time STRING, climb STRING, epc STRING) 
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

Then I load my initial data.

INSERT INTO rainbowacid
SELECT * FROM rainbow;

INSERT INTO gpsacid
SELECT * FROM gps;
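
With the ACID tables loaded, row-level changes become possible. The statements below are a sketch of what that might look like. The predicates and the backfill rule are hypothetical, chosen only to show the syntax.

```sql
-- Hypothetical cleanup: backfill a missing secondary temperature reading.
UPDATE rainbowacid
SET tempf2 = tempf
WHERE tempf2 IS NULL;

-- Hypothetical cleanup: drop GPS rows that never got a position fix.
DELETE FROM gpsacid
WHERE latitude IS NULL OR longitude IS NULL;
```

The same statements would fail against the external rainbow and gps tables, since UPDATE and DELETE require a transactional table.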

Hive 3.x Updates

%jdbc(hive) CREATE TABLE Persons_default (
    ID Int NOT NULL,
    Name String NOT NULL,
    Age Int,
    Creator String DEFAULT CURRENT_USER(),
    CreateDate Date DEFAULT CURRENT_DATE()
)

One of the cool new features in Hive 3 is column defaults. As you can see, a DEFAULT constraint can fill in standard values such as the current user or the current date when an insert omits them. This gives us even more relational-style features in Hive.
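
To see the defaults in action, an insert can name only the required columns. This is a sketch that assumes the DEFAULT constraints on Persons_default are enabled. The row values are made up, and in a Zeppelin notebook each statement would need the same %jdbc(hive) prefix as the CREATE TABLE.

```sql
-- Only ID and Name are supplied; the values are invented for illustration.
INSERT INTO Persons_default (ID, Name) VALUES (1, 'Ada');

-- Age has no default, so it should come back NULL, while Creator and
-- CreateDate should be filled from CURRENT_USER() and CURRENT_DATE().
SELECT ID, Name, Age, Creator, CreateDate
FROM Persons_default;
```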

Another very interesting feature is materialized views, which store precomputed query results so that repeated aggregations and subqueries stay clean and fast. Here is a cool example:

CREATE MATERIALIZED VIEW mv1
AS
SELECT dest,origin,count(*)
FROM flights_hdfs 
GROUP BY dest,origin
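
A materialized view stores its results, so it has to be refreshed when the source table changes. The following is a minimal sketch of working with mv1, assuming automatic query rewriting is enabled on the cluster:

```sql
-- List the materialized views in the current database.
SHOW MATERIALIZED VIEWS;

-- Recompute the stored results after flights_hdfs has changed.
ALTER MATERIALIZED VIEW mv1 REBUILD;

-- A query with the same shape as the view definition; when rewriting is
-- enabled, the optimizer can answer it from mv1 instead of rescanning
-- flights_hdfs.
SELECT dest, origin, count(*)
FROM flights_hdfs
GROUP BY dest, origin;
```

Running EXPLAIN on the last query is one way to check whether mv1 was actually used.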

Thanks, and let me know your thoughts in the comments section!


