There is a lot of exciting work happening in Natural Language Processing (NLP) in the Apache Spark world. There are plenty of libraries, and new work is going on in OpenNLP and StanfordNLP.
Many interesting applications can be built by applying NLP to large sources of text. The notes, presentations and code examples that follow should be a helpful starting point.
The last few Apache Spark Summits have produced some great talks on NLP.
From NBCU comes this cool talk: Use of Spark MLlib for Predicting the Offlining of Digital Media
- NLP on a Billion Documents: Scalable Machine Learning with Apache Spark using Python
From the Advanced Apache Spark Meetup in San Francisco, there was a wealth of great documents and source code: