This Week in Hadoop and More: NLP and DL

Natural language processing and various deep learning libraries combined with Spark.

Many of the interesting new libraries coming out are in Python, so I suggest you install at least Python 2.7 (or Python 3) and have pip available to install cool parsing, ML, and DL libraries.

 pip install -U spacy 

One warning: many deep learning and NLP libraries include large training datasets that can take up gigabytes or more of space on your hard drive.
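Before pulling down a large model or dataset, it can help to check how much room is left. The following is a minimal sketch using only the Python 3 standard library; the helper names `free_gigabytes` and `has_room_for` are hypothetical, and the size you pass in is your own estimate, not something these libraries report.

```python
import shutil


def free_gigabytes(path="."):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_room_for(required_gb, path="."):
    """Return True if at least `required_gb` GiB are free at `path`."""
    return free_gigabytes(path) >= required_gb


# Example: only start a download if roughly 5 GiB are available.
# The 5 GiB figure is an illustrative assumption.
# if has_room_for(5):
#     ...run your pip install or dataset download here...
```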


One excellent use of NLP is identifying names in a corpus of text, such as a large set of tweets stored in your data lake or a huge collection of corporate documents.   See this article I wrote on the topic.
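As a rough illustration of name extraction, here is a sketch using spaCy's named entity recognizer. It assumes a recent spaCy release with the small English model `en_core_web_sm` already downloaded (`python -m spacy download en_core_web_sm`); the model name and the helper functions `extract_entities` and `count_person_names` are my assumptions for illustration, not something from the linked article.

```python
from collections import Counter


def count_person_names(entities):
    """Count how often each PERSON entity appears in (text, label) pairs."""
    return Counter(text for text, label in entities if label == "PERSON")


def extract_entities(texts, model_name="en_core_web_sm"):
    """Yield (entity text, entity label) pairs for an iterable of strings."""
    # Imported lazily so the counting helper is usable without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)  # assumes the model has been downloaded
    for doc in nlp.pipe(texts):
        for ent in doc.ents:
            yield ent.text, ent.label_


# Example usage (requires spaCy and the model):
# tweets = ["Ada Lovelace met Charles Babbage in London."]
# print(count_person_names(extract_entities(tweets)).most_common(10))
```

Keeping the counting step separate from the spaCy call makes it easy to swap in a different NER library later, or to run the same tally over entities produced by a Spark job.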

Quick Sentiment Analysis in a Few Lines of Python

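# Requires the VADER lexicon: run nltk.download('vader_lexicon') once beforehand.
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import sys

sid = SentimentIntensityAnalyzer()
# polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
ss = sid.polarity_scores(sys.argv[1])
if ss['compound'] == 0.00:
    print('Neutral')
elif ss['compound'] < 0.00:
    print('Negative')
else:
    print('Positive')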
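The script above compares the compound score against exactly zero, so even a faintly positive or negative score gets a non-neutral label. A common variation is to treat a small band around zero as neutral. The sketch below assumes a cutoff of 0.05, which is a widely used convention for VADER rather than anything from this article; the function names `label_from_compound` and `classify` are hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score (between -1 and 1) to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def classify(text, threshold=0.05):
    """Score `text` with VADER and return a label.

    Requires NLTK and the 'vader_lexicon' resource to be installed.
    """
    # Imported lazily so label_from_compound works without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    sid = SentimentIntensityAnalyzer()
    return label_from_compound(sid.polarity_scores(text)["compound"], threshold)


# Example usage (requires NLTK and the lexicon):
# print(classify("I love this library!"))
```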

Deep Learning

Caffe is another great deep learning library; this one has support from Yahoo and others.   For using the ever-popular ImageNet, check this out.  There is also a web demo of interfacing with Caffe.

There are so many flavours of deep learning that I am hoping Keras will help unify them all.   My odds-on favorites are TensorFlow and DeepLearning4J, which I expect to dominate thanks to their ecosystems, communities, backers, quality, mind share, and Keras.   It's nice to have Microsoft and Google competing to see who can provide the best open source libraries!   I am hoping many of these libraries will move into Apache and get unified under one banner.   Imagine all those developers and scientists working on one unified framework with shared algorithms, models, and documentation.   Skynet in 2 years...

Pretrained model zoos are awesome, but I think Pet Clone Farm sounds better, as pets are pretrained.   You can take animals from the zoo, but they are not pretrained.

Speaking of Model Zoos

Must-Watch Presentations to Start Your Year

Using GPUs Within Spark

IBM has a few interesting enhancements to Spark that allow it to use GPU processing power.   GPUs are becoming extremely useful for machine learning, deep learning, and number-crunching jobs.

Deep Learning Resources
