
Creating Word Clouds From DataFlows With Apache NiFi and Python


WordCloud is easy to integrate with Apache NiFi as part of your big data pipeline, making it another great tool in the data engineer's toolbox.


Python is awesome, and it has many excellent libraries, one of which is WordCloud. You don't even need to write a Python app to use it: when you do the pip install, you also get a CLI that runs it for you. Brilliant! And it's easy to integrate with Apache NiFi as part of your big data pipeline, making it another great tool in the data engineer's toolbox.

Python Word Cloud

Integrating existing Python libraries and scripts is very easy in Apache NiFi. I install the library for both versions of Python that I have on my system, although I am moving all new scripts to the 3.x branch.

Install the library for both Python 2.7 and 3.5:

pip install wordcloud 
pip3 install wordcloud 

Example usage:

printf 'NiFi\nHadoop\nSpark\n' | wordcloud_cli.py --imagefile wordcloud.png
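The same cloud can also be built from Python instead of the CLI. Below is a minimal sketch, assuming the wordcloud package's WordCloud class and its generate and to_file methods. The helper names build_text and render_cloud, the image size, and the output file name are illustrative choices, not part of the original flow.

```python
def build_text(words):
    """Join words into newline-separated text, like the CLI example's input."""
    return "\n".join(w.strip() for w in words if w.strip())


def render_cloud(text, image_path="wordcloud.png"):
    """Render the text as a word cloud and save it as a PNG file."""
    # Imported here so build_text stays usable without the library installed.
    from wordcloud import WordCloud

    # 400x200 is an arbitrary illustrative size.
    cloud = WordCloud(width=400, height=200).generate(text)
    cloud.to_file(image_path)
    return image_path
```

Calling render_cloud(build_text(["NiFi", "Hadoop", "Spark"])) should write wordcloud.png to the current directory, provided the wordcloud package is installed.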

For use in NiFi, I wrap my call with a shell script wc.sh:

echo "$1" | tr " " "\n" | wordcloud_cli.py

The script takes its first parameter (our tweet), puts each word on its own line, and pipes the result to the WordCloud CLI. That builds a PNG that I can store in a file system or in HDFS. I also update the flow file's filename to add png at the end. You can use other sources or other methods of splitting words.
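As one possible alternative to splitting on spaces with tr, the words can be cleaned up in Python first. This is only a sketch under assumptions: the regular expressions, the choice to drop URLs and the "rt" marker, and the function names tweet_words and word_frequencies are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns for tweet noise; adjust them to your own data.
URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[A-Za-z0-9_']+")


def tweet_words(tweet):
    """Split a tweet into lowercase words, dropping URLs and the 'rt' marker."""
    cleaned = URL_RE.sub(" ", tweet)
    words = [w.lower() for w in WORD_RE.findall(cleaned)]
    # Skip single characters and the retweet prefix, which add little to a cloud.
    return [w for w in words if w != "rt" and len(w) > 1]


def word_frequencies(tweets):
    """Count how often each word appears across several tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet_words(tweet))
    return dict(counts)
```

The resulting dictionary could then be handed to the library's WordCloud.generate_from_frequencies method, or the words could be printed one per line and piped to the CLI as before.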

I am pulling Twitter messages, so I use ReplaceText to replace the flow file content with ${msg}, which is just the tweet text.

Then I execute the Python WordCloud CLI by calling the wc.sh wrapper with the tweet as its parameter.
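Before wiring the wrapper into a NiFi processor, it can help to run it by hand and confirm that it really emits a PNG. The sketch below assumes that wc.sh is executable, sits in the current directory, and writes the image to standard output (which is what the CLI does when no --imagefile is given). The function names are illustrative.

```python
import subprocess

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data):
    """Return True if the bytes start with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def run_wrapper(tweet, script="./wc.sh"):
    """Run the wrapper script with the tweet as its only argument.

    Returns whatever the script wrote to standard output, as bytes.
    Raises CalledProcessError if the script exits with a non-zero status.
    """
    result = subprocess.run([script, tweet], stdout=subprocess.PIPE, check=True)
    return result.stdout
```

For example, is_png(run_wrapper("NiFi Hadoop Spark NiFi")) should be true if the script and the CLI are set up as described.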


And that's it!



Published at DZone with permission of Tim Spann, DZone MVB. See the original article here.
