Real-Time Ingestion and AI

Learn about the talk I gave at the last DataWorks Summit on using Apache NiFi to ingest and transform various data types using my Inception V3 TensorFlow Apache NiFi Processor.

If you have not yet attended a DataWorks Summit, I highly recommend doing so. It is an amazing event held at three locations each year and is a great community experience. The content is deep and highly technical and you will learn about the current state-of-the-art processes and what's coming next. It's not just big data but also AI, streaming, microservices, containers, cloud, and many other topics that startups and enterprises alike need to know.

My topic was a simple talk on using Apache NiFi to ingest and transform various data types.

There is a small group forming around my quickly released Inception V3 TensorFlow Apache NiFi Processor. I encourage you to try it and provide feedback, pull requests, bug reports, documentation, unit tests, examples, and more. The Java API for TensorFlow is new, so the processor is still really basic. Thanks to Simon Elliston Ball for a major cleanup on it.

What do we want to do?

  • MiNiFi ingests camera images and sensor data
  • Run TensorFlow Inception v3 to recognize objects in images
  • NiFi stores images, metadata, and enriched data in Hadoop
  • NiFi ingests social data and feeds
  • NiFi analyzes sentiment of textual data
    • TensorFlow (C++, Python, Java) via ExecuteStreamCommand
    • TensorFlow NiFi Java custom processor
    • TensorFlow running on edge nodes (MiNiFi)
    • TensorFlow Mobile (iOS, Android, RPi)
    • TensorFlow on Spark (Yahoo) via Livy, S2S, Kafka
    • TensorFlow running on containers in YARN 3.0 on Hadoop

(NiFi 1.4) gRPC call to TensorFlow Server:

python classify_image.py --image_file /dir/solarroofpanel.jpg

solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
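
The classifier prints one human-readable line per label, which is awkward to store as metadata alongside the image. Here is a minimal sketch of turning that output into structured records. It assumes the "label (score = value)" layout shown above, and parse_classifications is a hypothetical helper name, not part of classify_image.py or NiFi:

```python
import json
import re

# Matches lines such as "window screen (score = 0.00196)".
# Assumption: the output keeps the "label (score = value)" layout shown above.
LINE_PATTERN = re.compile(
    r"^(?P<label>.+?)\s*\(score = (?P<score>\d+(?:\.\d+)?)\)\s*$"
)


def parse_classifications(output):
    """Return a list of {"label", "score"} dicts, one per recognized line."""
    results = []
    for line in output.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            results.append({
                "label": match.group("label"),
                "score": float(match.group("score")),
            })
    return results


if __name__ == "__main__":
    sample = (
        "solar dish, solar collector, solar furnace (score = 0.98316)\n"
        "window screen \t\t (score = 0.00196)\n"
    )
    print(json.dumps(parse_classifications(sample), indent=2))
```

Lines that do not match the pattern are skipped rather than raising an error, so stray log output from the script does not break the parse.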

Python uses:

pip install -U textblob
python -m textblob.download_corpora
pip install -U spacy
python -m spacy.en.download all
pip install -U nltk
pip install -U numpy


python sentiment.py "$@"


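# Requires the VADER lexicon: python -m nltk.downloader vader_lexicon
import sys
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Score the text passed as the first command-line argument.
sid = SentimentIntensityAnalyzer()
ss = sid.polarity_scores(sys.argv[1])
print('Compound {0} Negative {1} Neutral {2} Positive {3}'.format(
    ss['compound'], ss['neg'], ss['neu'], ss['pos']))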
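
When the script runs under NiFi, a machine-readable result is easier to route on than a formatted sentence. Here is a minimal sketch that returns the same four VADER scores as one JSON string. The helper names format_scores and score_text are hypothetical and not part of NLTK or NiFi, and the analyzer is passed in so the helpers themselves do not require NLTK:

```python
import json


def format_scores(scores):
    """Serialize the four VADER polarity scores as a key-sorted JSON string."""
    keys = ("compound", "neg", "neu", "pos")
    return json.dumps({key: scores[key] for key in keys}, sort_keys=True)


def score_text(text, analyzer):
    """Score text with any object exposing polarity_scores(text) -> dict."""
    return format_scores(analyzer.polarity_scores(text))


# Usage inside sentiment.py (requires NLTK and the vader_lexicon resource):
#   import sys
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   print(score_text(sys.argv[1], SentimentIntensityAnalyzer()))
```

Emitting one JSON object per run means a downstream step can read the scores by key instead of splitting a string.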

These are some good Python libraries that you should be using with this. I recommend using Python 3.X unless you are stuck with 2.6/2.7.

I have also created two processors for working with text/NLP: one for Apache OpenNLP and one for Stanford CoreNLP.

Please comment on this article, check out GitHub and do pull requests, and come to a meetup!

References: Code, Examples, Templates and Scripts for DataWorksSummit 2017 Sydney Tal
