Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Solving a Polyglot Error in Python

DZone's Guide to

Solving a Polyglot Error in Python

I came across some issues when I tried do analyze data about Russian Twitter trolls, but ran into some issues. Here's how I solved them.

· Big Data Zone
Free Resource

Access NoSQL and Big Data through SQL using standard drivers (ODBC, JDBC, ADO.NET). Free Download 

I wanted to use the polyglot NLP library that my colleague Will Lyon mentioned in his analysis of Russian Twitter trolls, but had installation problems that I thought I’d share in case anyone else experiences the same issues.

I started by trying to install polyglot:

$ pip install polyglot
 
ImportError: No module named 'icu'

Hmmm, I’m not sure what icu is, but luckily there’s a GitHub issue covering this problem. That led me to Toby Fleming’s blog post that suggests the following steps:

brew install icu4c
export ICU_VERSION=58
export PYICU_INCLUDES=/usr/local/Cellar/icu4c/58.2/include
export PYICU_LFLAGS=-L/usr/local/Cellar/icu4c/58.2/lib
pip install pyicu

I already had icu4c installed, so I just had to make sure that I had the same version of that library as Toby did. I ran the following command to check that:

$ ls -lh /usr/local/Cellar/icu4c/
total 0
drwxr-xr-x  12 markneedham  admin   408B 28 Nov 06:12 58.2

That still wasn’t enough, though! I had to install these two libraries, as well:

pip install pycld2
pip install morfessor

I was then able to install polyglot, but had to then run the following commands to download the files needed for entity extraction:

polyglot download embeddings2.de
polyglot download ner2.de
polyglot download embeddings2.en
polyglot download ner2.en

And that's all!

The fastest databases need the fastest drivers - learn how you can leverage CData Drivers for high performance NoSQL & Big Data Access.

Topics:
big data ,python ,nlp ,data analytics ,tutorial

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}