DZone
Using NLP To Uncover Truth in the Age of Fake News and Bots

Cutting through the noise and misinformation shared online is crucial for creating accurate sentiment models.

by Oussama Belmejdoub · Dec. 09, 22 · Opinion

The modern political landscape is full of division. This is nothing new, and a number of factors have always contributed to the cut and thrust of political discourse. But today, political sentiment is influenced by more dynamic and immediate forces that can be used as tools in the information war. Traditional modes of communication, such as print media, political campaigns, and advertisements, are of course still prominent, but the modern information landscape adds the web and, more significantly, social media.

We are now in an age in which sentiment on any number of topics can be uncovered by analyzing enormous amounts of data, ranging from traditional sources, such as polls, election results, and expert analysis, to alternative data sets, such as social media posts. To get a true picture of any sentiment, however, we must be confident that the information we analyze is credible, and credibility is becoming increasingly difficult to establish. As a data scientist with extensive experience in building sentiment models using natural language processing (NLP), I’d like to share my experience in uncovering the truth in today’s increasingly challenging information landscape.

Seeking Validation

When it comes to creating a model for sentiment analysis, the value of alternative data sources cannot be overstated. Social media platforms provide a wealth of information that can be analyzed and categorized in real time, whereas traditional sources, such as opinion polls and news reports, offer snapshots in time.

At the beginning of any sentiment analysis project, it’s essential to decide which data sources will feed the model and which methodology will be used to construct the indicator, i.e., how to ensure its output reflects validated reality. As with any data science project, this requires a long period of discovery, data wrangling, and validation. It’s also important to ensure all data is anonymized before the analysis is performed, in full compliance with data protection and privacy regulations.
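As a minimal sketch of the anonymization step, assuming each raw post arrives as a dictionary with hypothetical fields `user_id`, `screen_name`, and `location`, direct identifiers can be dropped and the user ID replaced with a keyed hash (HMAC-SHA256 from Python's standard library). Strictly speaking this is pseudonymization, which some regulations (the GDPR, for example) still treat as personal data, so it is one step toward compliance rather than the whole of it. The key and field names are illustrative assumptions, not part of the author's pipeline.

```python
import hashlib
import hmac

# Hypothetical key: in practice it would come from a secrets store and never
# be kept alongside the data it protects.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Hypothetical field names for direct identifiers in a raw post record.
DIRECT_IDENTIFIERS = {"user_id", "screen_name", "location"}


def pseudonymize(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a user ID to a keyed SHA-256 digest (HMAC).

    The same ID always yields the same token, so posts can still be grouped
    per account, but the raw ID is not carried into the analysis.
    """
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_post(post: dict) -> dict:
    """Return a copy of the post with direct identifiers dropped and a token added."""
    cleaned = {k: v for k, v in post.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["author_token"] = pseudonymize(post["user_id"])
    return cleaned


# Usage with a made-up record.
raw = {"user_id": "12345", "screen_name": "someone", "location": "Paris", "text": "Polls open today"}
print(anonymize_post(raw))
```

Keying the hash, rather than hashing IDs directly, means someone without the key cannot rebuild the mapping by hashing a list of known account IDs.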

All projects begin with an idea. Collaboration and validation with subject matter experts are essential to transforming ideas into workable solutions. Indicators geared towards specific markets, such as real estate, require expertise from those industries to ensure the methodologies behind them are sound. Once the models are run, variations can be queried and adjustments made in line with feedback, improving the models’ performance.

For sentiment analysis that focuses on different countries, the language of the target population must be fully understood. To decide whether a social media post expresses a negative or positive sentiment, we have to account for slang and dialect, which can vary at the local, regional, and national level. For these use cases, linguists, native-language experts, and data scientists all play an essential role.
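One simple way to fold linguists' input into a pipeline is a curated slang dictionary applied during tokenization, so that later scoring sees standard terms. The sketch below assumes a small hypothetical mapping; a real one would be built and reviewed with native speakers for each region, and would not capture meaning that depends on context.

```python
import re

# Hypothetical, hand-curated mapping from regional slang to standard terms.
# In practice this would be built and reviewed with linguists and native speakers.
SLANG_MAP = {
    "gr8": "great",
    "dodgy": "suspicious",
    "gutted": "disappointed",
}

TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalize(text: str, slang_map: dict[str, str] = SLANG_MAP) -> list[str]:
    """Lowercase, tokenize, and replace known slang with standard equivalents."""
    tokens = TOKEN_RE.findall(text.lower())
    return [slang_map.get(tok, tok) for tok in tokens]


print(normalize("Gutted about the result, dodgy count!"))
```

Keeping the mapping as plain data, separate from the code, makes it easy for non-programmers to review and extend it for each dialect.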

Ultimately, our job when analyzing social media posts is to identify credible engagements with a topic, such as an election to political office. On social media platforms such as Twitter, Telegram, and WeChat, a credible source does not have to be an expert; it just has to be a real person engaging with the topic of discussion. In the age of the bot, however, this is where things can become difficult.

Finding Fakes

Increasingly, bots and fake news accounts dedicated to spreading misinformation and disinformation are being used to influence our perception of reality. This is where sentiment indicators that can sift through the noise and deliver true insights become invaluable. 

NLP is used to build both political and financial indicators, and for both it is essential to exclude bots and fake news accounts. Political use cases, such as election results, attract far more real and fake users engaging with the topics, which means there is more data to analyze. In my work creating sentiment analysis indicators, I have found that many accounts are bots, and these must be removed from the data pipeline that feeds the models.

Through NLP that incorporates input from subject matter experts and linguists, bots can be detected and discounted from the discourse under analysis, i.e., removed from the indicator. Twitter is, of course, the most popular platform for this kind of analysis, so I will use it as my example use case.
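Removing bots from the pipeline can start with simple heuristics over account metadata before any text is analyzed. The sketch below is a hypothetical rule-based filter, not the author's detector: the fields loosely mirror the kind of metadata Twitter exposes (creation date, follower and following counts, post count, default profile image), and every threshold is an illustrative assumption. A production system would more likely learn such weights from labeled accounts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AccountMeta:
    # Hypothetical subset of account metadata; field names are illustrative.
    created: date
    followers: int
    following: int
    posts: int
    default_profile_image: bool


def bot_score(acct: AccountMeta, today: date) -> int:
    """Count simple red flags; a higher count means more bot-like.

    All thresholds are illustrative assumptions, not validated values.
    """
    age_days = max((today - acct.created).days, 1)
    flags = 0
    if age_days < 30:  # very new account
        flags += 1
    if acct.posts / age_days > 100:  # implausibly high posting rate
        flags += 1
    if acct.following > 10 * max(acct.followers, 1):  # follows far more than it is followed
        flags += 1
    if acct.default_profile_image:  # never customized the profile
        flags += 1
    return flags


def filter_accounts(accounts: list[AccountMeta], today: date, max_flags: int = 1) -> list[AccountMeta]:
    """Keep only accounts with at most `max_flags` red flags."""
    return [a for a in accounts if bot_score(a, today) <= max_flags]


# Usage with made-up accounts.
today = date(2022, 12, 9)
human = AccountMeta(date(2015, 1, 1), followers=500, following=300, posts=4000, default_profile_image=False)
bot = AccountMeta(date(2022, 12, 1), followers=3, following=2000, posts=5000, default_profile_image=True)
print(filter_accounts([human, bot], today))
```

Returning a count of flags rather than a yes/no verdict keeps the decision threshold adjustable as validation work reveals how aggressive the filter should be.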

Deciding which accounts are bots involves a number of stages. First, Twitter provides metadata on accounts, which forms an initial layer of analysis. After further validation work, the next layer is where the model must ascribe a sentiment to a Tweet. This requires the creation of a term-document matrix, from which negative, positive, and neutral sentiments can be determined through text analysis. State-of-the-art NLP methods, such as Bidirectional Encoder Representations from Transformers (BERT), can then be used to capture the context, syntax, and semantics of the text, further improving accuracy when determining the sentiment related to a subject. Again, this is where the earlier work with subject matter experts, deciding which values are ascribed to which terms, comes into play.
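To make the term-document matrix concrete, here is a minimal lexicon-based sketch in plain Python: rows are terms, columns are documents (Tweets), and each document's score is the sum of expert-assigned term polarities weighted by term counts. The lexicon is a hypothetical stand-in for values agreed with subject matter experts, and this counting baseline is not BERT; a transformer model would replace the bag-of-words step with contextual representations.

```python
from collections import Counter

# Hypothetical expert-assigned polarities: +1 positive, -1 negative.
LEXICON = {"win": 1, "strong": 1, "fraud": -1, "weak": -1}


def term_document_matrix(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build a term-document matrix: one row per term, one column per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = [Counter(doc) for doc in docs]
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def document_scores(vocab: list[str], matrix: list[list[int]], lexicon: dict[str, int] = LEXICON) -> list[int]:
    """Sum each term's polarity times its count, per document (column)."""
    n_docs = len(matrix[0]) if matrix else 0
    scores = [0] * n_docs
    for term, row in zip(vocab, matrix):
        weight = lexicon.get(term, 0)  # terms outside the lexicon contribute nothing
        for j, count in enumerate(row):
            scores[j] += weight * count
    return scores


def label(score: int) -> str:
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


# Usage with made-up, already tokenized Tweets.
docs = [["strong", "win", "today"], ["fraud", "claims", "weak", "turnout"], ["polls", "open"]]
vocab, matrix = term_document_matrix(docs)
print([label(s) for s in document_scores(vocab, matrix)])
```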

For an economic indicator, the term “increase production” in a Tweet would be positive when discussing a major exporter of crude oil but negative when relating to crude oil prices. This is why the other terms within the same Tweet must also be considered, along with the relationships between terms and the context in which they are used. By analyzing all the sentiments within a Tweet, the model produces a score that is either positive or negative; neutral results are discounted from the final output.
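The “increase production” example can be expressed as a lexicon keyed by both phrase and indicator topic, so the same phrase carries opposite polarity depending on what is being measured, with neutral Tweets dropped before aggregation. This is a minimal sketch: the topic names and weights are hypothetical, and plain substring matching ignores negation and wider context, which is exactly the gap that models such as BERT are used to close.

```python
from typing import Optional

# Hypothetical context-dependent lexicon: (phrase, indicator topic) -> polarity.
CONTEXT_LEXICON = {
    ("increase production", "oil_exporter_economy"): 1,
    ("increase production", "crude_oil_price"): -1,
    ("sanctions", "oil_exporter_economy"): -1,
}


def tweet_sentiment(text: str, topic: str, lexicon: dict = CONTEXT_LEXICON) -> Optional[str]:
    """Score every known phrase in the Tweet for the given topic.

    Returns "positive" or "negative"; a net score of zero returns None so
    the Tweet can be discounted from the indicator.
    """
    lowered = text.lower()
    score = sum(
        weight * lowered.count(phrase)
        for (phrase, ctx), weight in lexicon.items()
        if ctx == topic
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def indicator_labels(tweets: list[str], topic: str) -> list[str]:
    """Label each Tweet for the topic and drop the neutral ones."""
    labels = [tweet_sentiment(t, topic) for t in tweets]
    return [lab for lab in labels if lab is not None]


tweet = "OPEC members agree to increase production next quarter"
print(tweet_sentiment(tweet, "oil_exporter_economy"), tweet_sentiment(tweet, "crude_oil_price"))
```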

No Black Boxes

When developing an indicator, it’s essential that the underlying technology, data, and methodology used to construct the model are entirely explainable. Being able to explain every stage of the process, from data collation and validation to processing and fine-tuning, provides confidence to users that the model is not missing key data, was not constructed with bias, and ascribes sentiment in a logical and fair manner. 
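For a lexicon-based scorer, one simple form of explainability is to report which terms drove each score, so anyone reviewing the indicator can trace a result back to specific words and the values experts assigned them. The sketch below assumes a hypothetical term-to-polarity dictionary; transformer models such as BERT do not decompose this neatly and need separate attribution techniques.

```python
def explain(tokens: list[str], lexicon: dict[str, int]) -> tuple[int, list[tuple[str, int]]]:
    """Return a document's total score together with each matched term's contribution."""
    contributions = [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]
    total = sum(weight for _, weight in contributions)
    return total, contributions


# Hypothetical expert-assigned lexicon for illustration.
LEXICON = {"strong": 1, "fraud": -1}
total, parts = explain(["strong", "turnout", "fraud", "claims", "fraud"], LEXICON)
for term, weight in parts:
    print(f"{term:>8}: {weight:+d}")
print(f"{'total':>8}: {total:+d}")
```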

Ultimately, the end result is an indicator, but as with all machine learning models, the output accounts for only around two per cent of the work that goes into creating the model. Showing the workings is not only best practice but also crucial for continuous improvement and for accelerating the development of more compelling solutions.


Opinions expressed by DZone contributors are their own.
