Categorizing Content Without Labels Using Zero-Shot Classification

Learn how zero-shot classification uses pre-trained models to categorize content without needing any labeled data.

By Vamsi Kavuri (DZone Core) · Dec. 02, 24 · Tutorial

Usually, when we want to classify content, we rely on labeled data: we train a machine learning model on labeled examples and then use it to make predictions on new or unseen data. For example, we might label each image in an image dataset as "dog" or "cat," or categorize each article as "tutorial" or "review." The model learns from these labels and applies what it learned to new data.

But here is the problem: labeled data is not always easy to get. It can be expensive or time-consuming to produce, and new labels may appear over time. That is where zero-shot classification comes into the picture. With zero-shot models, we can classify content into categories the model was never explicitly trained on. These models build on language models pre-trained on huge amounts of text, which lets them generalize to new categories described in natural language.
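For NLI-based models such as facebook/bart-large-mnli, which this article uses, the Hugging Face zero-shot pipeline turns each candidate label into a hypothesis (by default, "This example is {label}.") and asks the model whether the text entails it. The sketch below reproduces only the scoring step in plain Python. The logits are made-up numbers standing in for real model output, and `zero_shot_scores` is a hypothetical helper, not a library function. In the default single-label mode, the entailment logits are normalized across labels so the scores sum to 1. With `multi_label=True`, each label instead gets an independent entailment-vs-contradiction probability.

```python
import math

# Default hypothesis template used by the Hugging Face zero-shot pipeline.
HYPOTHESIS_TEMPLATE = "This example is {}."


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(nli_logits, multi_label=False):
    """Turn per-label NLI logits into ranked zero-shot scores (hypothetical helper).

    nli_logits maps each label to its (contradiction, neutral, entailment)
    logits for the pair (text, HYPOTHESIS_TEMPLATE.format(label)).
    """
    labels = list(nli_logits)
    if multi_label:
        # Judge each label on its own: softmax over (contradiction, entailment)
        # and keep the entailment probability. Scores need not sum to 1.
        scores = [softmax([nli_logits[name][0], nli_logits[name][2]])[1] for name in labels]
    else:
        # Labels compete: softmax of the entailment logits across all labels,
        # so the scores sum to 1.
        scores = softmax([nli_logits[name][2] for name in labels])
    # Rank labels from most to least likely, as the pipeline does.
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return {"labels": [name for name, _ in ranked], "scores": [s for _, s in ranked]}


# Made-up logits for illustration only, not real model output.
example_logits = {
    "Tutorial": (-2.0, 0.5, 2.5),
    "Opinion": (1.5, 0.5, -1.0),
    "Review": (0.0, 0.5, 1.0),
}

for label in example_logits:
    print(HYPOTHESIS_TEMPLATE.format(label))

single = zero_shot_scores(example_logits)
multi = zero_shot_scores(example_logits, multi_label=True)
print(single["labels"])  # ['Tutorial', 'Review', 'Opinion']
print(multi["labels"])   # ['Tutorial', 'Review', 'Opinion']
```

The ranking is the same in both modes here, but only the single-label scores are forced to sum to 1.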

Zero-Shot Classification With Hugging Face

In this article, I will use Hugging Face's Transformers library to perform zero-shot classification with a pre-trained BART model. Let's take a short summary of a DZone article and categorize it into one of the following categories: "Tutorial," "Opinion," "Review," "Analysis," or "Survey."

Environment Setup

  • Ensure Python 3.10 or higher is installed.
  • Install the necessary packages mentioned below.
Shell
 
pip install transformers torch
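Optionally, a quick check like the following can confirm the setup before the model is downloaded. It is a minimal sketch that uses only the standard library. The function name `check_environment` is hypothetical, and it only verifies that the packages can be found, not which versions are installed.

```python
import importlib.util
import sys

REQUIRED_PACKAGES = ["transformers", "torch"]
MIN_PYTHON = (3, 10)


def check_environment():
    """Report whether Python is recent enough and which required packages are importable."""
    report = {"python_ok": sys.version_info >= MIN_PYTHON}
    for name in REQUIRED_PACKAGES:
        # find_spec returns None when the package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'OK' if ok else 'NOT OK'}")
```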


Now, let's run zero-shot classification on the following short summary of my previous article to find which of the categories above fits it best:

Summary:

"Learn how Integrated Gradients helps identify which input features contribute most to the model's predictions to ensure transparency."

Python
 
from transformers import pipeline

# Initialize a zero-shot classification pipeline with the pre-trained BART MNLI model
zero_shot_classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# TL;DR from this article: https://dzone.com/articles/integrated-gradients-ai-explainability
article_summary = """
Learn how Integrated Gradients helps identify which input features contribute most to the model's predictions to ensure transparency.
"""

# Sample DZone article categories
sample_categories = ["Tutorial", "Opinion", "Review", "Analysis", "Survey"]

# Classify the article summary against the sample categories
category_scores = zero_shot_classifier(article_summary, sample_categories)

# Labels come back sorted by score, so the first one is the most likely category
category = category_scores['labels'][0]
print(f"The article is most likely a '{category}'")


The model classifies the article as most likely a Tutorial. Instead of picking only the category with the highest score, we can also check the score of each category.

Python
 
# Print the score for each category, from highest to lowest
for label, score in zip(category_scores['labels'], category_scores['scores']):
    print(f"{label}: {score:.2f}")


Here is the output: 

Plain Text
 
Tutorial: 0.53
Review: 0.20
Survey: 0.12
Analysis: 0.10
Opinion: 0.06


These scores are helpful if you want to use zero-shot classification to pick the most appropriate tags for your content rather than a single category.
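As a sketch, the scores can be turned into tags by keeping every category that clears a threshold. The helper `select_tags`, the 0.15 threshold, and the three-tag cap are hypothetical choices. The scores are the ones printed above, arranged in the `labels`/`scores` shape the pipeline returns. Note that in the default mode these scores are normalized across categories and sum to roughly 1, so categories compete with each other. If content can genuinely belong to several categories, passing `multi_label=True` to the pipeline scores each category independently, which may suit tagging better.

```python
# Scores printed in the article, arranged like the pipeline's output dict
category_scores = {
    "labels": ["Tutorial", "Review", "Survey", "Analysis", "Opinion"],
    "scores": [0.53, 0.20, 0.12, 0.10, 0.06],
}


def select_tags(result, threshold=0.15, max_tags=3):
    """Return up to max_tags labels whose score is at least threshold (hypothetical helper).

    Assumes result["labels"] is already sorted by descending score,
    which is how the zero-shot pipeline returns it.
    """
    tags = [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]
    return tags[:max_tags]


print(select_tags(category_scores))  # ['Tutorial', 'Review']
```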

Conclusion

In this article, we explored how zero-shot classification can categorize content without training on labeled data. As the code shows, it takes only a few lines to implement.

While easy and flexible, these models may not work well for specialized categories whose terminology the model does not understand. For example, classifying a medical report as "Cardiology," "Oncology," or "Neurology" requires a deep understanding of medical terms that may be poorly represented in the model's pre-training data. In those cases, you might still need to fine-tune the model on domain-specific datasets for better results.

Additionally, zero-shot models may have trouble with ambiguous language or context-dependent tasks, such as detecting sarcasm or cultural references.
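Before deciding to fine-tune, it can help to measure how the zero-shot model performs on a small labeled sample from your own domain. The sketch below assumes only that the classifier is a callable taking `(text, candidate_labels)` and returning a dict whose `labels` list is sorted by descending score, which matches how `zero_shot_classifier` is called earlier. The helper `evaluate_zero_shot`, the keyword-matching stand-in classifier, and the medical snippets are all hypothetical, chosen so the example runs offline.

```python
from collections import Counter


def evaluate_zero_shot(classify, examples, candidate_labels):
    """Compare top-1 predictions against a small labeled sample (hypothetical helper).

    classify: callable(text, candidate_labels) -> dict whose "labels" list is
              sorted by descending score.
    examples: list of (text, expected_label) pairs.
    Returns (accuracy, Counter of (expected, predicted) mistakes).
    """
    mistakes = Counter()
    correct = 0
    for text, expected in examples:
        predicted = classify(text, candidate_labels)["labels"][0]
        if predicted == expected:
            correct += 1
        else:
            mistakes[(expected, predicted)] += 1
    return correct / len(examples), mistakes


# Hypothetical stand-in classifier so the sketch runs offline: it ranks labels
# by how many of their keywords appear in the text (ties keep label order).
KEYWORDS = {
    "Cardiology": ["heart", "arrhythmia"],
    "Oncology": ["tumor", "chemotherapy"],
    "Neurology": ["seizure", "migraine"],
}


def keyword_classifier(text, candidate_labels):
    lowered = text.lower()
    hits = {label: sum(word in lowered for word in KEYWORDS[label]) for label in candidate_labels}
    return {"labels": sorted(candidate_labels, key=hits.get, reverse=True)}


labels = ["Cardiology", "Oncology", "Neurology"]
examples = [
    ("Patient reports irregular heart rhythm; arrhythmia suspected.", "Cardiology"),
    ("MRI shows a tumor; chemotherapy is planned.", "Oncology"),
    ("Recurring episodes of aura before headaches.", "Neurology"),
]

accuracy, mistakes = evaluate_zero_shot(keyword_classifier, examples, labels)
print(f"accuracy: {accuracy:.2f}")  # accuracy: 0.67
print(mistakes)  # Counter({('Neurology', 'Cardiology'): 1})
```

With the real pipeline, you would pass `zero_shot_classifier` in place of the stand-in. A low accuracy, or a recurring (expected, predicted) pair, points to the kind of terminology gap where fine-tuning may be worth the effort.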


Opinions expressed by DZone contributors are their own.
