
AI Determines How Content-Dense News Stories Are

The news media has gone through considerable scrutiny thanks to clickbait and fake news. These researchers are determined to use AI to analyze this.


The debate over fake news has put the news media under considerable scrutiny in recent years. A big part of this trend has been the huge growth in content, with digital channels churning out stories around the clock. While some of this content might be somewhat "fake," there is also a high degree of "me too" content that adds little to the discourse.

Research led by the University of Pennsylvania uses an AI tool to autonomously rank content according to its "content density." The system was able to accurately sort and classify news stories across a range of domains, comparing each piece of content with articles that had already been correctly classified.

The algorithm was trained on a batch of around 50,000 articles from the New York Times linguistic dataset. This contains not only the original articles but also their metadata and short summaries of each piece. The leading paragraph of each story was then compared to the summary attached to the article, with the difference between the two used as an indicator of the information richness of the piece. The reasoning is that a summary is, by design, very content dense, which makes it a good benchmark to compare against.
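The comparison of leading paragraph and summary can be illustrated with a rough sketch. The article does not say how the researchers measured the difference, so the word-overlap (Jaccard) measure below is an assumption chosen purely for illustration, and the function names tokenize and lead_summary_gap are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return the set of its word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def lead_summary_gap(lead, summary):
    """Return 1 minus the Jaccard overlap of lead and summary vocabularies.

    Illustrative assumption only: 0.0 means the lead uses exactly the
    same words as the summary, 1.0 means they share no words at all.
    This is not the measure used in the research described above.
    """
    lead_tokens = tokenize(lead)
    summary_tokens = tokenize(summary)
    union = lead_tokens | summary_tokens
    if not union:
        # Two empty texts are treated as identical.
        return 0.0
    shared = lead_tokens & summary_tokens
    return 1.0 - len(shared) / len(union)
```

Under this stand-in measure, a lead that closely tracks its summary gets a small gap and would count as content dense, while a lead that opens with an anecdote or scene-setting gets a large one.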

Rating the News

The content density of a story is, therefore, measured by how far its leading paragraph departs from its summary. The stories were initially rated by a combination of recruits from Mechanical Turk, the research team, and the algorithm they'd developed. These articles, together with their content density ratings, were then fed to the algorithm so that it could develop an understanding of what is and is not content dense.

As you might expect, the rules for what is and is not content-dense will vary significantly depending on the topic of the story. For instance, sports stories would veer towards the non-content-dense end of things.

The algorithm was put through its paces against a training subset that had already been labeled accurately. Its output matched the labeled density in around 80% of instances, which is a reasonable starting point.
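For readers unfamiliar with how such a figure is arrived at, the sketch below shows the basic calculation: the share of stories where the predicted label matches the human label. The function name accuracy and the sample labels are made up for illustration and are not data from the study.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one labeled example is required")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical labels for five stories (not taken from the research).
human_labels = ["dense", "dense", "not dense", "dense", "not dense"]
model_labels = ["dense", "not dense", "not dense", "dense", "not dense"]
print(accuracy(model_labels, human_labels))  # prints 0.8
```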

"We have confirmed that the automatic annotation of data captures distinctions in informativeness as perceived by people," the authors say. "We also show proof-of-concept experiments that show how the approach can be used to improve single-document summarization of news and the generation of summary snippets in news-browsing applications. In future work the task can be extended to more fine-grained levels, with predictions on sentence level and the predictor will be integrated into a fully functioning summarization system."


Topics:
ai, algorithm, machine learning, data analytics

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
