This Week in Hadoop and More: NLP and DL
Natural language processing and various deep learning libraries combined with Spark.
Many of the interesting new libraries coming out are in Python, so I suggest you get at least Python 2.7 (or Python 3) installed and have pip available to install cool parsing, ML, and DL libraries.
pip install -U spacy
One warning: a lot of deep learning and NLP libraries include large training datasets that can take up gigabytes or more of space on your hard drive.
One excellent use of NLP is to identify names in a corpus of text, say a lot of tweets you have stored in your data lake or a huge collection of corporate documents. See this article I wrote on the topic.
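As a rough illustration of the name-finding use case above, here is a minimal sketch using spaCy's named entity recognizer. It assumes the small English model `en_core_web_sm` has already been installed (for example with `python -m spacy download en_core_web_sm`). The helper names `extract_entities` and `find_names_in_text` are hypothetical and only there for illustration.

```python
from collections import defaultdict


def extract_entities(doc, labels=("PERSON", "ORG", "GPE")):
    """Group entity texts by label for a parsed spaCy Doc.

    Only the `ents`, `text`, and `label_` attributes of the Doc are used.
    """
    found = defaultdict(list)
    for ent in doc.ents:
        if ent.label_ in labels:
            found[ent.label_].append(ent.text)
    return dict(found)


def find_names_in_text(text, model_name="en_core_web_sm"):
    """Load a spaCy pipeline and return the named entities found in `text`.

    Assumption: the model named by `model_name` is installed locally.
    """
    import spacy  # imported lazily so extract_entities has no hard dependency

    nlp = spacy.load(model_name)
    return extract_entities(nlp(text))
```

Calling `find_names_in_text` on a sentence returns a dictionary keyed by entity label (people, organizations, places). The exact entities depend on the model version. For a large pile of tweets or documents, it is usually better to load the pipeline once and stream the texts through `nlp.pipe(...)` rather than reloading the model per call.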
Quick Sentiment Analysis in a Few Lines of Python
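from nltk.sentiment.vader import SentimentIntensityAnalyzer
import sys
# Requires the VADER lexicon: run nltk.download('vader_lexicon') once beforehand.
sid = SentimentIntensityAnalyzer()
# polarity_scores expects a string, so join the command-line arguments into one sentence.
ss = sid.polarity_scores(' '.join(sys.argv[1:]))
if ss['compound'] == 0.00:
    print('Neutral')
elif ss['compound'] < 0.00:
    print('Negative')
else:
    print('Positive')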
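The script above treats only a compound score of exactly 0.00 as neutral. A slightly more forgiving variant, sketched here, uses a small neutral band around zero. The ±0.05 cutoff is the convention suggested in the VADER project's own documentation rather than anything from this article, and `label_sentiment` and `classify_text` are hypothetical helper names.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def classify_text(text, analyzer=None):
    """Score `text` with VADER and return (label, compound score).

    Assumption: NLTK is installed and the lexicon was fetched once with
    nltk.download('vader_lexicon'). Pass an existing analyzer to reuse it.
    """
    if analyzer is None:
        from nltk.sentiment.vader import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
    compound = analyzer.polarity_scores(text)["compound"]
    return label_sentiment(compound), compound
```

Keeping the thresholding in its own function makes it easy to tune the neutral band for your data, and passing in a single analyzer avoids rebuilding it for every tweet or document.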
Caffe is another great deep learning library; this one has support from Yahoo and others. For using the ever-popular ImageNet, check this out. A web demo of interfacing with Caffe.
There are so many flavours of deep learning that I am hoping Keras will help unify them all. My odds-on favorites are TensorFlow and DeepLearning4J, which I expect to dominate due to their ecosystems, communities, backers, quality, mind share, and Keras. It's nice to have Microsoft and Google competing to see who can provide the best open source libraries! I am hoping many of these libraries will move into Apache and get unified under one banner. Imagine all those developers and scientists working on one unified framework, with shared algorithms, models, and documentation. Skynet in 2 years...
Pretrained model zoos are awesome, but I think Pet Clone Farm sounds better, as pets are pretrained. You can take animals from the zoo, and they are not pretrained.
Speaking of Model Zoos
- Caffe's Model Zoo - the original
- DeepLearning4J Model Zoo - JVM!
- Universal Dependencies - not a model zoo, but helpful
Must-Watch Presentations to Start Your Year
Visual Detection Recognition and Tracking with Deep Learning by Yu Huang, Senior Architect, Autonomous Driving@Baidu USA: This 225-page presentation is amazingly in-depth.
Using GPUs Within SPARK
Deep Learning Resources