
IoT Edge Processing With Apache NiFi, MiniFi, and Multiple Deep Learning Libraries: Part 2

Check out a tutorial that explains IoT edge processing with Apache NiFi, MiniFi, and multiple Deep Learning libraries.

By Tim Spann · Sep. 17, 18 · Tutorial

For: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/68140

See Part 1!

We had a lot of fun at the conference, with lots of interesting things going on. There's an interesting new way to easily run TensorFlow on YARN! It's called Submarine, and I'll do an article on that in the near future.


Step By Step Processing

Step 1: Install Apache NiFi (One or More Nodes or Clusters)

Choose:

1.  HDF Enterprise Install

2.  For the container folks: docker pull hortonworks/nifi

3.  Download standalone Apache NiFi 1.7.1 from the Apache NiFi site.

For full Apache NiFi Configuration for IoT, see this.

You will need to set nifi.remote.input.host and nifi.remote.input.socket.port in conf/nifi.properties or in the Ambari settings.

Step 2: Install Apache NiFi: MiniFi on Your Device(s)

Download MiniFi.

You can choose Java or C++. For your first usage, I recommend the Java edition unless your device is too small.

You can also install on a RHEL or Debian Linux machine or OSX.

Download MiniFi Toolkit.

Step 3: Install Apache MXNet (On MiniFi Devices and NiFi Nodes: optional)

Install Apache MXNet

Install build tools and build from scratch. Walkthrough install:

For Source Github: see this.

Hive — SQL — IoT Data Storage

In this section, we will focus on converting JSON to Apache Avro to Apache ORC and on storage options in Apache Hive 3. I am using two styles of storage for one of the tables, rainbow: I store ORC files behind an external table, and I also use the Streaming API to write into an ACID table.

NiFi — SQL — On Streams — Calcite

SELECT *
FROM FLOWFILE
WHERE CAST(memory AS FLOAT) > 0 

SELECT *
FROM FLOWFILE
WHERE CAST(tempf AS FLOAT) > 65

I check the flows in real time as they are ingested and filter on conditions such as memory or temperature. This gives us simple event processing that is both powerful and easy to set up. It is very handy when you want to filter out standard conditions where no anomaly has occurred.
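The two filters above can be tried out locally before they go into a flow. The sketch below is only a stand-in and not NiFi itself: it loads a few made-up sensor rows into an in-memory SQLite table named FLOWFILE and runs the same queries against it. SQLite substitutes for the Calcite engine here, and the sample rows and the run_filter helper are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample records; values are strings, as they often arrive from JSON.
SAMPLE_ROWS = [
    {"host": "rainbow", "tempf": "72.5", "memory": "26.5"},
    {"host": "rainbow", "tempf": "61.0", "memory": "27.1"},
    {"host": "rainbow", "tempf": "66.2", "memory": "0"},
]


def run_filter(query, rows):
    """Run a query against an in-memory table named FLOWFILE (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE FLOWFILE (host TEXT, tempf TEXT, memory TEXT)")
    conn.executemany(
        "INSERT INTO FLOWFILE VALUES (:host, :tempf, :memory)", rows
    )
    result = [dict(row) for row in conn.execute(query)]
    conn.close()
    return result


# The same two filters as in the article, unchanged.
warm = run_filter(
    "SELECT * FROM FLOWFILE WHERE CAST(tempf AS FLOAT) > 65", SAMPLE_ROWS
)
in_use = run_filter(
    "SELECT * FROM FLOWFILE WHERE CAST(memory AS FLOAT) > 0", SAMPLE_ROWS
)
print(len(warm), len(in_use))
```

Because the values are stored as text, the CAST is what turns the comparison into a numeric one. Calcite and SQLite may differ in how they handle values that cannot be cast, so treat this as a quick sanity check of the filter logic only.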

IoT Data Storage Options

For time series data, we are blessed with many options in HDP 3.x. I am starting with the simplest choice: a plain Apache Hive 3.x table. Since our data arrives at only 2 rows per minute, that is good enough.
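To put that rate in perspective, here is a quick back-of-the-envelope calculation. Only the 2 rows per minute figure comes from this article; the rest is plain arithmetic for a single device.

```python
ROWS_PER_MINUTE = 2  # rate stated in the article, for one device

rows_per_day = ROWS_PER_MINUTE * 60 * 24
rows_per_year = rows_per_day * 365
print(rows_per_day, rows_per_year)
```

That works out to a few thousand rows per day and roughly a million per year per device.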

For second and sub-second data, we need to consider either Druid or HBase. The nice thing is that these NoSQL options also have SQL interfaces. It comes down to how you are going to query the data and which one you like.

HBase + Phoenix is performant and has been used in production for years. HBase 2.x brings really impressive updates that make this a good option.

For richer analytics and some really cool dashboards with Apache Superset, it's hard not to recommend Druid. Druid has improved a lot recently and is well integrated with Hive 3's rich querying.

Example of Our Geo Data

{"speed": "0.145", "diskfree": "4643.2 MB", 
 "altitude": "6.2", "ts": "2018-08-30 17:47:03", 
 "cputemp": 52.0, "latitude": "38.9789405", 
 "track": "0.0", "memory": 26.5, 
 "host": "rainbow", "uniqueid": "gps_uuid_20180830174705", 
 "ipaddress": "172.20.10.8", "epd": "nan", 
 "utc": "2018-08-30T17:47:05.000Z", 
 "epx": "21.91", "epy": "31.536", 
 "epv": "73.37", "ept": "0.005", "eps": "63.07", 
 "longitude": "-74.824475167", "mode": "3", 
 "time": "1535651225.0", "climb": "0.0", 
 "epc": "nan"}
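Most numeric values in this record arrive as strings, and a few are the literal text "nan". A minimal sketch of cleaning such a record before analysis follows; the trimmed sample, the NUMERIC_FIELDS list, and the to_float and clean_record helpers are assumptions for illustration, not part of the flow described here.

```python
import json
import math

# A trimmed copy of the record above.
RAW = (
    '{"speed": "0.145", "altitude": "6.2", "cputemp": 52.0, '
    '"latitude": "38.9789405", "longitude": "-74.824475167", '
    '"host": "rainbow", "epd": "nan", "epx": "21.91"}'
)

# Hypothetical choice of which fields to treat as numeric.
NUMERIC_FIELDS = ("speed", "altitude", "latitude", "longitude", "epd", "epx")


def to_float(value):
    """Return a float, or None when the value is missing, unparseable, or NaN."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def clean_record(raw_json):
    """Parse one JSON record and convert the chosen fields to floats."""
    record = json.loads(raw_json)
    for field in NUMERIC_FIELDS:
        record[field] = to_float(record.get(field))
    return record


cleaned = clean_record(RAW)
print(cleaned["latitude"], cleaned["epd"])
```

Mapping "nan" to None keeps missing error estimates from leaking into averages or comparisons later on.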

Tables

CREATE EXTERNAL TABLE IF NOT EXISTS rainbow (
  tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, 
  host STRING, uniqueid STRING, ipaddress STRING, 
  temp DOUBLE, diskfree STRING, altitude DOUBLE, 
  ts STRING, tempf2 DOUBLE, memory DOUBLE
) STORED AS ORC LOCATION '/rainbow';

CREATE TABLE IF NOT EXISTS rainbowacid (
  tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, 
  host STRING, uniqueid STRING, ipaddress STRING, 
  temp DOUBLE, diskfree STRING, altitude DOUBLE, 
  ts STRING, tempf2 DOUBLE, memory DOUBLE
) STORED AS ORC TBLPROPERTIES ('transactional' = 'true');

CREATE EXTERNAL TABLE IF NOT EXISTS gps (
  speed STRING, diskfree STRING, altitude STRING, 
  ts STRING, cputemp DOUBLE, latitude STRING, 
  track STRING, memory DOUBLE, host STRING, 
  uniqueid STRING, ipaddress STRING, 
  epd STRING, utc STRING, epx STRING, 
  epy STRING, epv STRING, ept STRING, 
  eps STRING, longitude STRING, mode STRING, 
  `time` STRING, climb STRING, epc STRING
) STORED AS ORC LOCATION '/gps';

CREATE TABLE IF NOT EXISTS gpsacid (
  speed STRING, diskfree STRING, altitude STRING, 
  ts STRING, cputemp DOUBLE, latitude STRING, 
  track STRING, memory DOUBLE, host STRING, 
  uniqueid STRING, ipaddress STRING, 
  epd STRING, utc STRING, epx STRING, 
  epy STRING, epv STRING, ept STRING, 
  eps STRING, longitude STRING, mode STRING, 
  `time` STRING, climb STRING, epc STRING
) STORED AS ORC TBLPROPERTIES ('transactional' = 'true');

CREATE EXTERNAL TABLE IF NOT EXISTS movidiussense (
  label5 STRING, runtime STRING, label1 STRING, 
  diskfree STRING, top1 STRING, starttime STRING, 
  label2 STRING, label3 STRING, top3pct STRING, 
  host STRING, top5pct STRING, humidity DOUBLE, 
  currenttime STRING, roll DOUBLE, uuid STRING, 
  label4 STRING, tempf DOUBLE, y DOUBLE, 
  top4pct STRING, cputemp2 DOUBLE, top5 STRING, 
  top2pct STRING, ipaddress STRING, 
  cputemp INT, pitch DOUBLE, x DOUBLE, 
  z DOUBLE, yaw DOUBLE, pressure DOUBLE, 
  top3 STRING, temp DOUBLE, memory DOUBLE, 
  top4 STRING, imagefilename STRING, 
  top1pct STRING, top2 STRING
) STORED AS ORC LOCATION '/movidiussense';

CREATE EXTERNAL TABLE IF NOT EXISTS minitensorflow2 (
  image STRING, ts STRING, host STRING, 
  score STRING, human_string STRING, 
  node_id INT
) STORED AS ORC LOCATION '/minifitensorflow2';

I have added some ACID tables, which are the default in HDP 3/Hive 3. This lets me do updates! I will try both styles to see which works best.

There will be more coming soon!
