Natural Language Processing With Apache Spark

This article focuses on natural language processing (text munging and machine learning) on the Apache Spark platform.

There is a lot of exciting activity in natural language processing (NLP) in the Apache Spark world. There are many libraries to choose from, and new work is going on in OpenNLP and StanfordNLP.

Many interesting applications can be built with NLP and large sources of text. The notes, presentations, and code examples that follow should be helpful.

The last few Apache Spark Summits have produced some great talks on NLP.

The Advanced Apache Spark Meetup in San Francisco also produced a lot of great documents and source code.
