Natural Language Processing With Apache Spark

This article focuses on natural language processing (text munging and machine learning) on the Apache Spark platform.

There is a lot of exciting work going on in natural language processing (NLP) in the Apache Spark world, including a growing set of libraries and new development around OpenNLP and StanfordNLP.

Many interesting applications can be built by applying NLP to large sources of text. The notes, presentations, and code examples that follow should be helpful.

The last few Apache Spark Summits have produced some great talks on NLP.

The Advanced Apache Spark Meetup in San Francisco produced a ton of great documents and source code.
