

LLMs for Bad Content Detection: Pros and Cons

This post evaluates two different methods for identifying harmful content on the Internet: training supervised classifiers and using large language models.

By Kashyap Puranik and Pranesh Srinivasan · Sep. 25, 23 · Tutorial


Harmful content detection is the task of identifying content that harms Internet users. Examples of harmful content include hateful or offensive content, spam, harassment, sexual content, phishing and scams, and solicitation.

Harmful content on content platforms can have a huge negative impact, including:

  • Emotional distress, humiliation, and even physical harm to the users
  • Damage to the reputation of the platforms that host it
  • Reduction in active users and difficulty in attracting advertisers

It is therefore crucial to identify and oversee harmful content so that it can be removed. User-generated content (UGC) platforms are especially at risk because they allow users to upload a wide range of content; they include social media, messaging services, forums, gaming platforms, and marketplaces. Detecting and mitigating harmful content on these platforms is of significant importance.

To minimize the number of users exposed to such content, platforms often rely on automated detection and takedown of harmful content. Automated detection is challenging: harmful content can take many forms (text, videos, images, links, etc.), and it can be difficult to distinguish what is harmful from what is not. On top of this, false positives (automated systems incorrectly flagging benign content as harmful) have their own negative effects, including harm to users, damage to the platform's reputation, and potential legal challenges. Platforms use artificial intelligence (AI) to automatically detect harmful content, but they must carefully balance detection against the avoidance of false positives.

Supervised Classifiers

The most popular approach to automated detection of harmful content today is training classifiers (supervised machine learning models) on a labeled dataset. A labeled dataset for a particular harm type consists of both harmful and benign examples. Training consists of extracting features from the content and then fitting supervised classifiers on the extracted features and labels.

With the emergence of pre-trained foundational models, however, the amount of labeled data required has been significantly reduced. For text classification, the foundational-model approach takes a pre-trained model such as BERT or RoBERTa, generates embeddings of the text, and uses those embeddings as features to train a traditional supervised classifier. Embeddings are fixed-length vector representations of text that capture its meaning, so the supervised model learns to classify whether the meaning of the text is harmful. This approach requires a much smaller labeled dataset.
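
Here is a minimal sketch of that embedding-plus-classifier approach, assuming DistilBERT as the encoder and a scikit-learn logistic regression as the supervised classifier (the tiny example texts and labels are purely illustrative):

Python
 
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")

def embed(texts):
  # Mean-pool the last hidden states into one fixed-length vector per text.
  inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
  with torch.no_grad():
    outputs = encoder(**inputs)
  mask = inputs["attention_mask"].unsqueeze(-1)
  summed = (outputs.last_hidden_state * mask).sum(dim=1)
  return (summed / mask.sum(dim=1)).numpy()

# Tiny illustrative dataset: 0 = benign, 1 = harmful.
texts = ["Have a great day!", "I will hurt you", "Thanks for the help", "You people are worthless"]
labels = [0, 1, 0, 1]
classifier = LogisticRegression().fit(embed(texts), labels)
print(classifier.predict(embed(["What a lovely morning"])))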

Free, open-source foundational models such as BERT, RoBERTa, and DistilBERT can be used as described above, or fine-tuned directly for classification.

Images can be additionally processed through optical character recognition (OCR), and audio/video can be processed through automated speech recognition (ASR) to extract text that can be subjected to harmful content detection.
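
As an illustration, the text-extraction step might look like the sketch below, assuming pytesseract (with the Tesseract binary installed) and openai-whisper are available; detect_harm is a hypothetical placeholder for whichever text classifier is used downstream.

Python
 
import pytesseract
import whisper  # openai-whisper
from PIL import Image

def extract_text_from_image(path):
  # OCR: pull any visible text out of the image.
  return pytesseract.image_to_string(Image.open(path))

def extract_text_from_audio(path):
  # ASR: transcribe speech to text with a small Whisper model.
  model = whisper.load_model("base")
  return model.transcribe(path)["text"]

# The extracted text is then fed to the same harmful content classifier used for plain text,
# e.g., detect_harm(extract_text_from_image("user_upload.png")).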

Here is some sample code to train a hate speech classifier. It fine-tunes a pre-trained model and writes checkpoints to a local directory called "hate".

Python
 
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorWithPadding,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer
)
import numpy as np
from datasets import load_metric
 
# Load any dataset of choice for training.
hate_dataset = load_dataset("SetFit/toxic_conversations")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess_function(examples):
  return tokenizer(examples["text"], truncation=True)

# Tokenize the train and test splits of the dataset.
tokenized_train = hate_dataset["train"].map(preprocess_function, batched=True)
tokenized_test = hate_dataset["test"].map(preprocess_function, batched=True)
# Dynamically pad each batch to its longest sequence.
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
model = AutoModelForSequenceClassification.from_pretrained(
  "distilbert-base-uncased", num_labels=2)
    
def compute_metrics(eval_pred):
  load_accuracy = load_metric("accuracy")
  logits, labels = eval_pred
  predictions = np.argmax(logits, axis=-1)
  accuracy = load_accuracy.compute(
    predictions=predictions, references=labels)["accuracy"]
  return {"accuracy": accuracy}

training_args = TrainingArguments(
  output_dir="hate",
  evaluation_strategy = "epoch",
  save_strategy = "epoch",
  learning_rate=2e-5,
  per_device_train_batch_size=16,
  per_device_eval_batch_size=16,
  num_train_epochs=2,
  weight_decay=0.01,
)
trainer = Trainer(
  model=model,
  args=training_args,
  train_dataset=tokenized_train,
  eval_dataset=tokenized_test,
  tokenizer=tokenizer,
  data_collator=data_collator,
  compute_metrics=compute_metrics,
)
trainer.train()
trainer.evaluate()
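
Once training finishes, the fine-tuned model can be used for inference. Here is a minimal sketch; the explicit save_model call and the "hate/final" directory name are illustrative additions, not part of the training script above.

Python
 
from transformers import pipeline

trainer.save_model("hate/final")  # also saves the tokenizer passed to the Trainer
hate_classifier = pipeline("text-classification", model="hate/final")
# Returns something like [{'label': 'LABEL_1', 'score': ...}], where LABEL_1 is the harmful class.
print(hate_classifier("I can't stand people like you"))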


Disadvantages of Supervised Classifiers

While using foundational models that have been trained on large amounts of text significantly reduces the number of labeled training examples needed to train a classifier, there are some disadvantages to this technique:

  • Supervised learning still requires labeled data, which often has to be created manually and can be time-consuming and expensive to collect.
  • Supervised learning models can be sensitive to noise in the data. This means that even a small amount of incorrect or irrelevant data can significantly degrade the performance of the model.
  • Supervised learning models can be biased if the training data is biased. This means that the model may learn to make predictions that are not accurate or fair.

N-Shot Classification Using Large Language Models

N-shot classification is a machine learning technique that allows a model to classify examples from previously unseen classes without receiving any specific training for those classes. This is done by providing the model with class descriptions and, optionally, a small number (N) of labeled examples in the prompt, which the model uses to distinguish the classes.

To prompt an LLM to detect bad content, one can use a variety of techniques. One common technique is to use a natural language question, such as "Is this text hate speech?" The LLM can then be used to answer this question by predicting the class of the text. Another technique is to use a prompt that provides more information about the text, such as "This text contains the word 'hate' and the phrase 'kill all Immigrants.' Is it hate speech?" The LLM can then use this information to make a more informed decision about the class of the text. In addition to the question, a few examples can be provided as part of the prompt to help the LLM improve its performance.

The advantages of using LLMs for zero-shot classification of harmful content include:

  • LLMs can be trained on large datasets of text and code, which makes them more robust to variations in the way that harmful content is written.
  • They can be used to classify harmful content from previously unseen classes and subclasses without receiving any specific training for those classes. This makes them well-suited for emerging forms of harmful content.
  • They can be used to detect harmful content in a variety of languages. This makes them a valuable tool for global content moderation.
  • Most importantly, a big labeled dataset is not needed to train a supervised classifier, which can reduce operational costs and time to launch.

Here is some sample ChatGPT API code to detect hate speech. It uses zero-shot classification, but an N-shot prompt would be similar; a few-shot sketch follows the code below. It is impressive how much less code is needed compared to the supervised approach.

Python
 
import openai
openai.api_key = "YOUR_API_KEY"  # insert your API key here

def detect_hate(input_text):
  response = openai.ChatCompletion.create(
      model="gpt-3.5-turbo", # Or gpt-4
      temperature=0,  # For deterministic response with the most likely answer
      messages = [
        {"role": "system",
         "content" :
         """You are an expert content moderator."""},
        {"role": "user",
         "content" : "Is this hate speech?  ```%s```" %(input_text)}])

  return response["choices"][0]["message"]["content"]

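For comparison, here is a sketch of a few-shot (N-shot) variant: the only change is the prompt, where a handful of labeled examples are included as prior conversation turns before the text to classify. It reuses the openai setup above; the example texts and labels are invented purely for illustration.

Python
 
def detect_hate_few_shot(input_text):
  response = openai.ChatCompletion.create(
      model="gpt-3.5-turbo",
      temperature=0,
      messages=[
        {"role": "system",
         "content": "You are an expert content moderator. Answer YES or NO."},
        # Few-shot examples: each user turn is a piece of content, each assistant turn is its label.
        {"role": "user",
         "content": "Is this hate speech? ```You people are subhuman and don't belong here.```"},
        {"role": "assistant", "content": "YES"},
        {"role": "user",
         "content": "Is this hate speech? ```I strongly disagree with this new policy.```"},
        {"role": "assistant", "content": "NO"},
        # The actual content to classify.
        {"role": "user",
         "content": "Is this hate speech? ```%s```" % input_text}])

  return response["choices"][0]["message"]["content"]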

Disadvantages of Using LLMs for Zero-Shot/N-Shot Classification

  • They can be computationally expensive to train and deploy. Training a new large language model from scratch is highly discouraged; instead, use proprietary models like GPT-4, PaLM 2, or Claude 2, or open-source models like Llama 2 and Falcon. Even with these models, inference can be computationally expensive.
  • They can be susceptible to bias, which can lead to misclassification of harmful content.
  • It is hard to scale detection horizontally, as proprietary models have their own rate limits.
  • Using an external model also requires sharing potentially sensitive, private user-generated data with external parties.
  • The additional computation adds latency, and external service calls add further latency depending on the size of the prompt.
  • While a training dataset is not needed, prompts still have to be evaluated for performance; small changes to a prompt can lead to large changes in performance.
  • Complicated, model-specific prompt engineering that does not transfer across models may be required and can still demand some initial learning investment.

Conclusion

Harmful content detection is a challenging but important task. With the right approach, it is possible to build systems that effectively detect harmful content and protect users from harm. Large language models enable N-shot classification and let a team quickly launch classifiers for a wide range of harmful content types across languages without a large training dataset, while supervised detection with smaller models, given good training data, can achieve lower latency and cost, run in house, and operate at scale.

AI, Language model, Machine learning, Speech recognition

Opinions expressed by DZone contributors are their own.

