
Book Review: Natural Language Processing with Python

“Natural Language Processing with Python” provides a nice overview of NLP techniques and Python, using NLTK (Natural Language Toolkit), a framework maintained by the book’s authors. It’s intended for use as (I assume) an undergraduate textbook; some of its examples of “difficult” bits of code will not appear difficult to more experienced programmers.

Don’t be put off by the use of a specific library, or by the idea of reading a textbook: the book is written in an easy-to-read, engaging style, and the library makes it easy to get into NLP. Most of the examples could be reproduced in any preferred framework or language, given access to the right data. The framework exists in part to make it very easy to get started with NLP, and it provides an easy mechanism to download some useful datasets, as well as APIs that are thorough enough to get you going quickly. I haven’t used NLTK enough yet to have a feeling one way or the other about whether it is suitable for production use, but clearly it is good for prototyping.

The book’s treatment of Python is interesting: various language structures are introduced throughout the book, mostly sprinkled at the end of each chapter. Someone experienced with the language could easily skip these. Not knowing Python, I found the examples sufficient to get me writing code in no time, without need for external references.

I found the exercises at the ends of chapters quite helpful, even though I didn’t complete them all, as they provide food for thought and insight into how the techniques are used. Different chapters cover basic analysis, part-of-speech tagging, entity extraction, summarizing text contents, and grammars.

It becomes clear on reading through the book that a lot of NLP is very similar to an ETL data cleaning process, with the caveat that it will likely never be fully “solved.” A lot of the techniques are specific tactics for handling large volumes of English text; much of the work gets you most of the way to a solution, at which point you are forced to correct the errors in the data, feed another process that is OK with errors, or just accept them.
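To make the ETL comparison concrete, here is a minimal sketch of a cleaning pass written with only the Python standard library. It is not taken from the book and does not use NLTK; the stop-word list, the function names, and the sample sentence are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in"}

def clean_tokens(text):
    """Lowercase the text, extract runs of letters/apostrophes, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def top_words(text, n=3):
    """Return the n most frequent cleaned tokens with their counts."""
    return Counter(clean_tokens(text)).most_common(n)

sample = "The cat sat. The cat's hat is in the hall, and the cat is happy."
print(clean_tokens(sample))
print(top_words(sample, 1))
```

Note that the possessive "cat's" survives as its own token rather than being counted with "cat". That is the kind of leftover error described above: you can add another rule, pass it downstream, or accept it.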

There are different angles of approach to NLP problems, ranging from specific tricks and tactics, through statistical modelling techniques, to formalized grammars on the more rigid mathematical side. A surprise to me was the coverage that formalized grammars and lambda calculus receive in this book. Natural language clearly does not make formalized grammars easy to develop, and the book covers a series of powerful extensions to concepts I learned in school, like context-free grammars, which make them more attractive.
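For readers who have not met context-free grammars, the sketch below shows a plain one in action. It is a small CYK recognizer in standard-library Python; it is not NLTK's grammar API and does not cover the extensions the book describes. The toy grammar, the vocabulary, and the name `recognizes` are assumptions made up for this example.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
# BINARY maps a pair of child symbols to the parent symbol they form.
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
# LEXICAL maps each known word to its single part of speech.
LEXICAL = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "saw": "V", "chased": "V"}

def recognizes(words, start="S"):
    """Return True if the word list can be derived from `start` (CYK algorithm)."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the symbols that derive words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        if word in LEXICAL:
            table[i][i].add(LEXICAL[word])
    for span in range(2, n + 1):              # length of the substring
        for i in range(n - span + 1):         # start of the substring
            j = i + span - 1                  # end of the substring
            for k in range(i, j):             # split point
                for left, right in product(table[i][k], table[k + 1][j]):
                    parent = BINARY.get((left, right))
                    if parent:
                        table[i][j].add(parent)
    return start in table[0][n - 1]

print(recognizes("the dog chased a cat".split()))
print(recognizes("dog the chased".split()))
```

A grammar this rigid accepts "the dog chased a cat" and rejects anything outside its handful of rules, which hints at why hand-written grammars for real English are hard to develop.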

Perhaps more surprising is that the most accurate NLP results appear to come not from a particular approach, but from combining the results of different types of algorithms. “Natural Language Processing with Python” has numerous passing mentions of real use cases, which I found helpful for seeing the value of the material: text to speech, language translation, entity recognition, text summarization, etc. Overall, a good read.
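One simple way to combine algorithms is a majority vote. The sketch below is an assumption about how that might look, not the book's method and not an NLTK API: three deliberately crude, hypothetical part-of-speech taggers each guess a tag, and the most common guess wins.

```python
from collections import Counter

def rule_tagger(word):
    """Hypothetical suffix rules that guess a coarse part of speech."""
    if word.endswith("ing") or word.endswith("ed"):
        return "VERB"
    if word.endswith("ly"):
        return "ADV"
    return "NOUN"

# Hypothetical lookup table standing in for a tagger trained on labelled text.
LEXICON = {"the": "DET", "dog": "NOUN", "quickly": "ADV", "barked": "VERB"}

def lexicon_tagger(word):
    """Look the word up; fall back to NOUN for unknown words."""
    return LEXICON.get(word, "NOUN")

def shape_tagger(word):
    """Hypothetical heuristic: treat very short words as determiners."""
    return "DET" if len(word) <= 3 else rule_tagger(word)

def vote(word, taggers):
    """Return the tag most taggers agree on; ties go to the tag seen first."""
    tags = [tagger(word) for tagger in taggers]
    return Counter(tags).most_common(1)[0][0]

TAGGERS = [lexicon_tagger, rule_tagger, shape_tagger]
sentence = "the dog barked quickly".split()
print([(word, vote(word, TAGGERS)) for word in sentence])
```

Here the suffix rules mislabel "the" and the length heuristic mislabels "dog", but in each case the other two taggers outvote the mistake.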

Published at DZone with permission of Gary Sieling, DZone MVB. See the original article here.
