Open Source Tool Aims to Provide Smart Textual Analysis

Recently I looked at a couple of platforms that aimed to locate the latest research in particular fields. The automated systems used some clever technology to analyze the text contained in papers to judge the merits as well as the content of each paper for the user.

I have some doubts about the sense-making capabilities of the various platforms on the market, but what is not in doubt is the quality of the text analysis tools currently available.

A good example is a tool recently developed by researchers at USC. Called TACIT (Text Analysis, Crawling and Interpretation Tool), it is an open source tool for gathering, managing and analyzing text.

“Currently [text analysis] techniques are available as independent programs or software, but they require a lot of expertise and because social scientists often don’t have the programming background, they don’t use them,” the team say. “So we’ve created a very researcher-friendly environment where they can easily access and use these methods. And if they want more, anyone can write their own plugins for the system.”

Smart Text Analysis

The tool uses a number of techniques to make sense of a piece of text. What’s nice about it is that the software is open source, so other developers can easily build plugins to extend its functionality.

The software comes with three core components:

  1. a crawler to allow text to be captured from a range of online sources
  2. a corpus management feature to process and store bodies of text
  3. an analysis tool to count instances and ratios of words
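
To make the third component concrete, here is a minimal sketch of counting word instances and their ratios in a piece of text. It is not TACIT's implementation; the function names `word_counts` and `word_ratios`, the simple tokenizing rule and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_counts(text):
    """Count occurrences of each lowercased word (hypothetical helper)."""
    # Assumed tokenizing rule: runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


def word_ratios(counts):
    """Convert raw counts into each word's share of all tokens."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


# Illustrative sample text, not taken from any real corpus.
sample = "The crawler gathers text. The analysis counts words in the text."
counts = word_counts(sample)
ratios = word_ratios(counts)

# Print the three most frequent words with their share of all tokens.
for word, n in counts.most_common(3):
    print(f"{word}: {n} ({ratios[word]:.2f})")
```

A real analysis would likely need a more careful tokenizer and some handling of very common words, but the basic shape is the same: tokenize, count, then normalize by the total.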

The software is due for a beta launch this month, with a final release due to go live in March 2016. The team are confident of the demand for it, however.

“In the first week the program launched,” they say, “there were over 2,500 hits to our website. We had people from Kenya to Vietnam, from Uruguay to Estonia and from Hawaii to Maine downloading the software.”

The last few years have seen tremendous gains in language processing, with innovations such as Siri achieving high levels of competence.

The TACIT team hope that their own system can take things one step further. It will be an interesting project to follow.


Published at DZone with permission of Adi Gaskell, DZone MVB. See the original article here.
