
Integration of Apache NiFi and Cloudera Data Science Workbench for Deep Learning Workflows

Deep learning integration with Apache NiFi.

Tim Spann · CORE · Feb. 15, 19 · Tutorial

Summary

Now that we have shown that it is easy to do standard NLP, next up is Deep Learning. As you can see, NLP, Machine Learning, Deep Learning, and more are all within your reach for building your own AI as a Service using tools from Cloudera. These can run in public or private clouds at scale. You can now run and integrate machine learning services, computer vision APIs, and anything you have created in-house with your own Data Scientists. The script will download the image from the URL to /tmp to process it with the pre-trained YOLO model, and the Python 3 script will also download the GluonCV model for YOLO3.

Using Pre-trained Model:

yolo3_darknet53_voc 

Image Sources:

https://github.com/tspannhw/images and/or https://picsum.photos/400/600

Example Input:

{ "url": "https://raw.githubusercontent.com/tspannhw/images/master/89389-nifimountains.jpg" }

Sample Call to Our REST Service:

curl -H "Content-Type: application/json" -X POST http://myurliscoolerthanyours.com/api/altus-ds-1/models/call-model -d '{"accessKey":"longkeyandstuff","request":{"url":"https://raw.githubusercontent.com/tspannhw/images/master/89389-nifimountains.jpg"}}'
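The same call can be made from Python. Below is a minimal sketch; the endpoint and access key are the placeholder values from the curl example above, so substitute your own deployment's values. The actual HTTP call is shown commented out since it needs a live deployment (and the requests package):

```python
import json

# Placeholder endpoint and access key from the curl example -- use your own.
ENDPOINT = "http://myurliscoolerthanyours.com/api/altus-ds-1/models/call-model"
ACCESS_KEY = "longkeyandstuff"

def build_payload(image_url: str) -> str:
    """Build the JSON body the model endpoint expects."""
    body = {"accessKey": ACCESS_KEY, "request": {"url": image_url}}
    return json.dumps(body)

payload = build_payload(
    "https://raw.githubusercontent.com/tspannhw/images/master/89389-nifimountains.jpg"
)
print(payload)

# To actually call the service:
# import requests
# resp = requests.post(ENDPOINT, data=payload,
#                      headers={"Content-Type": "application/json"})
# print(resp.json())
```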

Sample JSON Result Set:

{
  "class1": "cat",
  "pct1": "98.15670800000001",
  "host": "gluoncv-apache-mxnet-29-49-67dfdf4c86-vcpvr",
  "shape": "(1, 3, 566, 512)",
  "end": "1549671127.877511",
  "te": "10.178656578063965",
  "systemtime": "02/09/2019 00:12:07",
  "cpu": 17,
  "memory": 12.8
}
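On the client side, pulling the interesting fields out of that result set takes only a few lines of Python. A sketch using the sample result above; note that the percentages come back as strings, so convert them before comparing:

```python
import json

# The sample result set from the REST service, verbatim.
sample = """
{
  "class1": "cat",
  "pct1": "98.15670800000001",
  "host": "gluoncv-apache-mxnet-29-49-67dfdf4c86-vcpvr",
  "shape": "(1, 3, 566, 512)",
  "end": "1549671127.877511",
  "te": "10.178656578063965",
  "systemtime": "02/09/2019 00:12:07",
  "cpu": 17,
  "memory": 12.8
}
"""

result = json.loads(sample)

top_class = result["class1"]
confidence = float(result["pct1"])  # string in the payload, so convert
print(top_class, round(confidence, 2))
```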

Example Deployment:

Model Resources:
Replicas: 1
Total CPU: 1 vCPU
Total Memory: 2.00 GiB
For Deep Learning models, I recommend more vCPUs and more memory, as you will be manipulating images and large tensors. I also recommend more replicas for production use cases; you can have up to 9. I like the idea of 3, 5, or 7 replicas.

How-To

Step 1: Let's Clean Up and Test Some Python 3 Code

First, I take some existing Python 3 GluonCV Apache MXNet YOLO example code that I already have. As you can see, it uses a pre-trained model from Apache MXNet's rich model zoo. This started here. I pared down the libraries as I used an interactive Python 3 session to test and refine my code. As before, I set a variable to pass in my data; this time, a URL pointing to an image.

As you can see in my interactive session, I can run my YOLO function and get results. I had a library in there to display the annotated image while I was testing. I took this code off to save time, memory, and to reduce libraries. This was needed while testing though. The model seems to be working because it identified me as a person and my cat as a cat.
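The scoring function follows the general shape below. This is a minimal sketch, not the exact code from the repository: the GluonCV/Apache MXNet calls are shown as comments so the skeleton runs without downloading the model, and the placeholder prediction values and field names mirror the sample result set shown earlier:

```python
import os
import time
import urllib.request
from datetime import datetime

def yolo(args):
    """Sketch of the scoring function the model service calls with the
    request JSON. The model-zoo lines are commented out so this skeleton
    stays runnable without the GluonCV download."""
    start = time.time()

    url = args["url"]
    local_path = "/tmp/" + os.path.basename(url)
    # urllib.request.urlretrieve(url, local_path)   # fetch the image to /tmp

    # from gluoncv import model_zoo, data
    # net = model_zoo.get_model('yolo3_darknet53_voc', pretrained=True)
    # x, img = data.transforms.presets.yolo.load_test(local_path, short=512)
    # class_ids, scores, bboxes = net(x)

    # Placeholder results standing in for the model output above.
    class1, pct1, shape = "cat", 98.156708, "(1, 3, 566, 512)"

    end = time.time()
    return {
        "class1": class1,
        "pct1": str(pct1),
        "shape": shape,
        "end": str(end),
        "te": str(end - start),
        "systemtime": datetime.now().strftime("%m/%d/%Y %H:%M:%S"),
    }

print(yolo({"url": "https://raw.githubusercontent.com/tspannhw/images/master/89389-nifimountains.jpg"}))
```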

Step 2: Create, Build, and Deploy a Model

I go to Models, point to my new file and the function I used (YOLO), and put in a sample input and response.

I deploy it, and it then appears in the list of available models. You go through a few steps as the Docker container is deployed to Kubernetes, and all the required pip packages are installed during a build process.

Once it is built, you can see the build(s) in the Build screen.

Step 3: Test the Model

Once it is done building and marked as deployed, we can use the built-in tester from Overview.

We can see that the result in JSON is ready to travel over an HTTP REST API.

Step 4: Monitor the Deployed Model

We can see the standard output and standard error, as well as how many times the model has been called and how many of those calls succeeded.

You can see that it downloaded the model from the Apache MXNet zoo.

If you need to stop, rebuild, or replace a model, it's easy.

Step 5: Apache NiFi Flow

As you can see, it takes a few steps to run the flow. I am using GenerateFlowFile to get us started, but I could have a cron scheduler starting us or react to a Kafka/MQTT/JMS message or another trigger. I then build the JSON needed to call the REST API. Example: {"accessKey":"accesskey","request":{"url":"${url}"}}

Then, we call the REST API via an HTTP Post (http://myurliscoolerthanyours.com/api/altus-ds-1/models/call-model).

We then parse the JSON that it returns to keep just the fields we want; we don't really need the status.

We name our schema so we can run Apache Calcite SQL queries against it.

Let's only save Cats and People to our Amazon S3 bucket.
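In QueryRecord terms, that filter amounts to keeping only records whose class1 is cat or person (roughly SELECT * FROM FLOWFILE WHERE class1 IN ('cat', 'person') in the Calcite SQL the processor runs). For illustration only, the same logic in plain Python over parsed records:

```python
# Record classes we want to keep, matching the NiFi filter.
WANTED = {"cat", "person"}

def keep_cats_and_people(records):
    """Keep only records classified as cat or person."""
    return [r for r in records if r.get("class1") in WANTED]

# Example records shaped like the parsed REST results.
records = [
    {"class1": "cat", "pct1": "98.16"},
    {"class1": "dog", "pct1": "77.20"},
    {"class1": "person", "pct1": "91.03"},
]
print(keep_cats_and_people(records))
```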

At this point, I can add more queries and destinations. I can store it anywhere and everywhere.

Example Output

{
  "success": true,
  "response": {
    "class1": "cat",
    "cpu": 38.3,
    "end": "1549672761.1262221",
    "host": "gluoncv-apache-mxnet-29-50-7fb5cfc5b9-sx6dg",
    "memory": 14.9,
    "pct1": "98.15670800000001",
    "shape": "(1, 3, 566, 512)",
    "systemtime": "02/09/2019 00:39:21",
    "te": "3.380652666091919"
  }
}

Build a Schema for the Data and Store It in the Apache NiFi Avro Schema Registry or Cloudera Schema Registry

{
  "type": "record",
  "name": "gluon",
  "fields": [
    { "name": "class1", "type": ["string", "null"] },
    { "name": "cpu", "type": ["double", "null"] },
    { "name": "end", "type": ["string", "null"] },
    { "name": "host", "type": ["string", "null"] },
    { "name": "memory", "type": ["double", "null"] },
    { "name": "pct1", "type": ["string", "null"] },
    { "name": "shape", "type": ["string", "null"] },
    { "name": "systemtime", "type": ["string", "null"] },
    { "name": "te", "type": ["string", "null"] }
  ]
}

I like to allow for nulls in case we have missing data, but that is up to your Data Steward and team. If you wish to share a schema and need to add a new version with a new field, you must include "null" as an option for that field, since the old data won't have it.
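For example, a hypothetical version 2 of the schema adding a new field (the field name here is made up) would declare it as nullable with a null default, so records written against the original schema still resolve:

```json
{ "name": "newfield", "type": ["null", "string"], "default": null }
```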

Source

https://github.com/tspannhw/nifi-cdsw-gluoncv

cdswmxnet.xml

Data science · Apache NiFi · Deep learning · Integration

Opinions expressed by DZone contributors are their own.
