Using Cloudera Data Science Workbench With Apache NiFi


Integrating machine learning models and data flow pipelines.


Using Deployed Models as a Function as a Service

With Cloudera Data Science Workbench (CDSW), we can call functions in our deployed models from Apache NiFi as part of a flow. I am working against CDSW on HDP, but the same approach should work for any CDSW installation, regardless of install type.

In my simple example, I built a Python model that uses TextBlob to run sentiment analysis on a sentence passed to it. It returns sentiment polarity and subjectivity, which we can act on immediately in our flow.
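As a rough illustration of what such a model function might look like, here is a minimal sketch. The output keys `polarity` and `subjectivity` and the `scorer` hook are assumptions made for this example, not the author's actual code, which may differ.

```python
def textblob_scores(sentence):
    """Return (polarity, subjectivity) for a sentence using TextBlob."""
    # Imported lazily so this module still loads where TextBlob is absent.
    from textblob import TextBlob

    result = TextBlob(sentence).sentiment
    return result.polarity, result.subjectivity


def sentiment(args, scorer=textblob_scores):
    """Model entry point: takes the request dict, returns a JSON-ready dict.

    `scorer` is a hypothetical hook added for this sketch so the function
    can be exercised without TextBlob installed.
    """
    sentence = args["sentence"]  # "sentence" is the input parameter name
    polarity, subjectivity = scorer(sentence)
    # Key names below are assumed for illustration.
    return {"polarity": polarity, "subjectivity": subjectivity}
```

With TextBlob installed, calling `sentiment({"sentence": "some text"})` would return a dictionary holding the two scores.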

CDSW is extremely easy to work with, and I was up and running in a few minutes. For my model, I created a Python 3 script and a shell script with the install details. Both artifacts are available on GitHub.

My Apache NiFi 1.8 flow is on GitHub (I use no custom processors).

Deploying a Machine Learning Model as a REST Service

Once you log in to CDSW, create a project or choose an existing one. From your project, open the workbench, where you can install some libraries and test some Python. I am using a Python 3 session to download the TextBlob/NLTK corpora for NLP.

Let's Pip Install Some Libraries for Testing

Let's Create a New Model

Choose your file (mine is sentiment.py; see GitHub). The function name is sentiment. My first attempt had a typo in it, so I had to rebuild and redeploy. Set up an example input (sentence is the input parameter name) and an example output. Input and output will be JSON since this is a REST API.

Let's Deploy It (Python 3)

We can see the standard output, standard error, status, number of REST calls received, and success.

Once Deployed, It's Ready To Test and Use From Apache NiFi

Just click test and see the JSON results. We can now call it from an Apache NiFi flow.

Once Deployed, We Can Monitor The Model

Once a Model is Deployed, We Can Control It

We can stop it, rebuild it, or replace the files if need be. I had to update things a few times. The amount of resources used to host the model's REST endpoint is your choice from a drop-down. Since I am doing something small, I picked the smallest option, with only 1 virtual CPU and 2 GB of RAM. All of this runs in Docker on Kubernetes.

Let's Run the Test

See the status and response!

Apache NiFi Example Flow

Step 1: Call Twitter

Step 2: Extract Social Attributes of Interest

Step 3: Build our web call with our access key and function parameter

Step 4: Extract our string as a flow file to send to the HTTP Post

Step 5: Call Our Cloudera Data Science Workbench REST API (see tester).

Step 6: Extract the two result values.
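Outside of NiFi, the same extraction can be sketched in Python. The response shape used here (a `success` flag plus a `response` object holding `polarity` and `subjectivity`) is an assumption for illustration; check the JSON your own model's test page returns and adjust the keys.

```python
import json


def extract_scores(response_text):
    """Pull polarity and subjectivity out of a model response body.

    The "success" / "response" layout and the key names are assumed here.
    """
    payload = json.loads(response_text)
    if not payload.get("success", False):
        raise ValueError("model call did not report success")
    result = payload["response"]
    return float(result["polarity"]), float(result["subjectivity"])


# Hypothetical response body, made up for this example.
sample = '{"success": true, "response": {"polarity": -0.5, "subjectivity": 0.75}}'
polarity, subjectivity = extract_scores(sample)
```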

Step 7: Let's route on the sentiment

Polarity can be negative (<0), neutral (0), positive (>0), or very positive (1). See the TextBlob documentation for more information on how this works.
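The routing rule can be written as a small Python function. This is only a sketch of the thresholds listed above; the function name and the label strings are made up for illustration, and in the flow itself the routing is done by NiFi.

```python
def route_sentiment(polarity):
    """Map a polarity score to a route name using the thresholds above."""
    if polarity < 0:
        return "negative"
    if polarity == 0:
        return "neutral"
    if polarity >= 1:
        return "very positive"
    return "positive"
```

For example, `route_sentiment(-0.3)` gives `"negative"`, which is the branch that would be sent on for human review.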

Step 8: Send bad sentiment to a Slack channel for human analysis.

We send all the related information to a Slack channel, including the message below.

Example Message Sent to Slack

Step 9: Store all the results (or some) in either Phoenix/HBase, Hive LLAP, Impala, Kudu, or HDFS.

Results as Attributes

Slack Message Call 

${msg:append(" User:"):append(${user_name}):append(${handle}):append(" Geo:"):append(${coordinates}):append(${geo}):append(${location}):append(${place}):append(" Hashtags:"):append(${hashtags}):append(" Polarity:"):append(${polarity}):append(" Subjectivity:"):append(${subjectivity}):append(" Friends Count:"):append(${friends_count}):append(" Followers Count:"):append(${followers_count}):append(" Retweet Count:"):append(${retweet_count}):append(" Source:"):append(${source}):append(" Time:"):append(${time}):append(" Tweet ID:"):append(${tweet_id})}
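For readers less familiar with NiFi Expression Language, the expression above concatenates the tweet text with a series of labeled attributes. A rough Python equivalent is sketched below; treating missing attributes as empty strings is an assumption of this sketch, and `build_slack_message` is a hypothetical helper name.

```python
# (label, attribute names) pairs in the same order as the expression above.
FIELDS = [
    (" User:", ["user_name", "handle"]),
    (" Geo:", ["coordinates", "geo", "location", "place"]),
    (" Hashtags:", ["hashtags"]),
    (" Polarity:", ["polarity"]),
    (" Subjectivity:", ["subjectivity"]),
    (" Friends Count:", ["friends_count"]),
    (" Followers Count:", ["followers_count"]),
    (" Retweet Count:", ["retweet_count"]),
    (" Source:", ["source"]),
    (" Time:", ["time"]),
    (" Tweet ID:", ["tweet_id"]),
]


def build_slack_message(attrs):
    """Concatenate the message text with each label and its attribute values."""
    parts = [str(attrs.get("msg", ""))]
    for label, names in FIELDS:
        parts.append(label)
        # Missing attributes are rendered as empty strings (an assumption).
        parts.extend(str(attrs.get(name, "")) for name in names)
    return "".join(parts)
```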

REST Call to Model

{"accessKey":"from your workbench","request":{"sentence":"${msg:replaceAll('\"', ''):replaceAll('\n','')}"}}
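To try the same call outside NiFi, a minimal Python sketch using only the standard library could look like this. The `MODEL_URL` value is a placeholder, not a real endpoint, and the helper names are hypothetical; use the URL and access key shown for your own model in the workbench. The payload builder strips double quotes and newlines from the text, mirroring the two `replaceAll` calls above.

```python
import json
import urllib.request

# Placeholder endpoint: substitute the URL shown for your deployed model.
MODEL_URL = "https://modelservice.cdsw.example.com/model"


def build_payload(access_key, message):
    """Build the JSON body, stripping double quotes and newlines as the flow does."""
    sentence = message.replace('"', "").replace("\n", "")
    body = {"accessKey": access_key, "request": {"sentence": sentence}}
    return json.dumps(body).encode("utf-8")


def call_model(access_key, message, url=MODEL_URL):
    """POST the payload to the model endpoint and return the decoded JSON."""
    req = urllib.request.Request(
        url,
        data=build_payload(access_key, message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the payload construction separate from the network call makes it easy to inspect the exact JSON being sent before pointing it at a live model.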

