A Developer's Guide to Sentiment Analysis With Naive Bayes and Python

Learn sentiment analysis with Python and Scikit-learn using Naive Bayes. Build, train, and evaluate a text classifier for real-world applications.

By Soumya Banerjee · Nov. 06, 25 · Tutorial

Sentiment analysis is a powerful tool for understanding customer feedback, social media comments, and product reviews. It allows us to programmatically determine whether a piece of text is positive, negative, or neutral. While complex models like Transformers (e.g., BERT) often grab the headlines, the classic Multinomial Naive Bayes classifier remains a surprisingly effective, efficient, and interpretable baseline, especially for text-based tasks.

In this guide, we'll walk through a complete sentiment analysis project using Python and Scikit-learn. We'll cover:

  • Why Naive Bayes is a great starting point for text
  • Exploratory Data Analysis (EDA) to understand our dataset
  • Data preprocessing to clean and prepare the text
  • Vectorization using TF-IDF
  • Model training using a Scikit-learn Pipeline
  • Performance evaluation with code for Confusion Matrix, ROC-AUC, and Precision-Recall curves
  • Making predictions on new, unseen data

Why Naive Bayes for Text?

Before we dive in, why choose Naive Bayes?

  1. Speed: It's incredibly fast to train, even on large datasets.
  2. Efficiency: It requires a relatively small amount of training data to produce a decent result.
  3. Works well with high dimensions: Text classification is a high-dimensional problem (one dimension for every unique word in your vocabulary). Naive Bayes handles this "wide" data gracefully.

It's called "naive" because it assumes that, given the class, the presence of one word in a document is independent of the presence of every other word. This is obviously false (the word "New" is highly dependent on the word "York"), yet the model often works well in practice for text classification.
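
To make the independence assumption concrete, here is a minimal sketch of a multinomial Naive Bayes scorer written by hand. The four training sentences, the function names, and the choice of alpha = 1.0 (Laplace smoothing) are assumptions made for illustration; this is not the Scikit-learn implementation used later in the guide.

```python
import math
from collections import Counter

# Hypothetical toy data, not the dataset used in this guide
train = [
    ("great great movie", 1),
    ("great acting", 1),
    ("bad movie", 0),
    ("bad bad plot", 0),
]

def fit_counts(docs):
    """Count word occurrences per class and documents per class."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in docs:
        word_counts[label].update(text.split())
        class_counts[label] += 1
    return word_counts, class_counts

def log_score(text, label, word_counts, class_counts, alpha=1.0):
    """Unnormalized log P(label | text): log prior plus a sum of per-word
    log likelihoods. Summing per word is the 'naive' independence step."""
    vocab = set(word_counts[0]) | set(word_counts[1])
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / sum(class_counts.values()))
    for word in text.split():
        if word not in vocab:
            continue  # words never seen in training are ignored
        smoothed = (word_counts[label][word] + alpha) / (total_words + alpha * len(vocab))
        score += math.log(smoothed)
    return score

def predict(text, word_counts, class_counts):
    """Return the class with the higher score."""
    return max((0, 1), key=lambda c: log_score(text, c, word_counts, class_counts))

word_counts, class_counts = fit_counts(train)
print(predict("great movie", word_counts, class_counts))
print(predict("bad plot", word_counts, class_counts))
```

Each word contributes its own term to the score, regardless of which other words appear next to it. That is why training reduces to counting, and why it is so fast.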

Step 1: Setup and Exploratory Data Analysis (EDA)

Before writing any machine learning code, we must first understand our data. For this project, we'll simulate a movie review dataset with pre-labeled positive (1) and negative (0) reviews.

First, let's get our imports and data ready.

Python
 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from wordcloud import WordCloud

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import (
    accuracy_score, 
    confusion_matrix, 
    classification_report, 
    RocCurveDisplay, 
    PrecisionRecallDisplay
)

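# --- Download NLTK assets ---
# (Only needed once. Uncomment on the first run; without these resources,
# stopwords.words() and the lemmatizer raise a LookupError.)
# nltk.download('stopwords')
# nltk.download('wordnet')
# nltk.download('omw-1.4') # For WordNet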

# --- 1. Simulate our DataFrame ---
# In a real project, you'd use pd.read_csv() here
data = {
    'review': [
        "This movie was absolutely fantastic! The acting was superb.",
        "I loved this film. The story was compelling and beautiful.",
        "A truly great and moving picture. Highly recommend.",
        "What a wonderful movie. I'll watch it again.",
        "The best film I've seen all year.",
        "Completely boring. I fell asleep halfway through.",
        "A terrible plot and awful acting. Do not recommend.",
        "This was a bad movie. Just plain bad.",
        "I hated it. The end was a letdown.",
        "The characters were flat and the story was predictable."
    ],
    'sentiment': [1, 1, 1, 1, 1, 0, 0, 0, 0, 0] # 1 = Positive, 0 = Negative
}
df = pd.DataFrame(data)


Now, let's explore.

1. Check Class Balance

Is our dataset balanced? An imbalanced dataset (e.g., 90% positive, 10% negative) can mislead our model.

Python
 
sns.countplot(x='sentiment', data=df)
plt.title('Class Distribution (0=Negative, 1=Positive)')
plt.show()


Our tiny dataset is perfectly balanced, which is ideal.
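
To see why imbalance can mislead, consider a sketch with hypothetical labels split 90/10. A "model" that always predicts the majority class looks accurate while never finding a single negative review. The label counts here are invented for illustration.

```python
# Hypothetical imbalanced labels: 90 positive (1), 10 negative (0)
labels = [1] * 90 + [0] * 10

# A "model" that ignores its input and always predicts the majority class
majority_class = max(set(labels), key=labels.count)
predictions = [majority_class] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the negative class: how many actual negatives were found?
negatives_found = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))
negative_recall = negatives_found / labels.count(0)

print(f"Accuracy: {accuracy:.2f}")
print(f"Negative recall: {negative_recall:.2f}")
```

Accuracy comes out at 0.90 while recall on the negative class is 0.00, which is why the class distribution is worth checking before trusting any single metric.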

2. Visualize With Word Clouds

Word clouds show the most frequent words for each sentiment.

Python
 
positive_text = ' '.join(df[df['sentiment'] == 1]['review'])
negative_text = ' '.join(df[df['sentiment'] == 0]['review'])

# Positive Word Cloud
wc_positive = WordCloud(width=800, height=400, background_color='white').generate(positive_text)
plt.figure(figsize=(10, 5))
plt.imshow(wc_positive, interpolation='bilinear')
plt.title('Most Frequent Words in Positive Reviews')
plt.axis('off')
plt.show()


Positive reviews are dominated by words like "fantastic," "great," "loved," "superb," and "beautiful."

Python
 
# Negative Word Cloud
wc_negative = WordCloud(width=800, height=400, background_color='black', colormap='Reds').generate(negative_text)
plt.figure(figsize=(10, 5))
plt.imshow(wc_negative, interpolation='bilinear')
plt.title('Most Frequent Words in Negative Reviews')
plt.axis('off')
plt.show()


Negative reviews feature "terrible," "awful," "boring," "bad," and "hated."

3. Analyze Review Lengths

Is there a correlation between the length of a review and its sentiment?

Python
 
df['review_length'] = df['review'].apply(len)
sns.histplot(data=df, x='review_length', hue='sentiment', multiple='stack', bins=20)
plt.title('Distribution of Review Lengths by Sentiment')
plt.show()


In our simple data, there's no clear pattern, but in a larger dataset, you might find (for example) that negative reviews are often shorter and more "to the point."
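
A histogram is easier to read alongside summary numbers. The sketch below compares average character and word counts per class with a pandas groupby. The four reviews and the names toy and length_summary are hypothetical and exist only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical mini-dataset for illustration
toy = pd.DataFrame({
    "review": ["Loved it", "Great film overall", "Bad", "Boring and slow"],
    "sentiment": [1, 1, 0, 0],
})

# Length in characters and in whitespace-separated words
toy["review_length"] = toy["review"].str.len()
toy["word_count"] = toy["review"].str.split().str.len()

# Mean length per sentiment class
length_summary = toy.groupby("sentiment")[["review_length", "word_count"]].mean()
print(length_summary)
```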

Step 2: Data Preprocessing — Cleaning Our Text

Raw text is messy. To make it usable, we need to clean it. We'll write a single function to do this.

  1. Convert to lowercase: Ensures "Movie" and "movie" are treated as the same word.
  2. Remove punctuation and numbers: These characters generally don't add sentiment value.
  3. Remove stop words: Eliminate common words like "the," "a," and "is" that don't carry sentiment.
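  4. Lemmatization: Reduce words to their dictionary form (e.g., "movies" becomes "movie"). This groups related words together. Note that WordNetLemmatizer treats every word as a noun unless you pass a part-of-speech tag, so verb forms such as "running" are reduced to "run" only when it is called with pos='v'.

Python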
 
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def preprocess_text(text):
    # 1. Convert to lowercase
    text = text.lower()
    
    # 2. Remove punctuation and numbers
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    
    # 3. Tokenize (split into words)
    words = text.split()
    
    # 4. Remove stop words and lemmatize
    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]
    
    # 5. Join back into a string
    return ' '.join(words)

# Let's see the 'before and after'
print(f"Original: {df['review'][0]}")
print(f"Cleaned:  {preprocess_text(df['review'][0])}")

# Apply this function to our entire 'review' column
df['cleaned_review'] = df['review'].apply(preprocess_text)


Output:

Plain Text
 
Original: This movie was absolutely fantastic! The acting was superb.
Cleaned:  movie absolutely fantastic acting superb
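
One side effect of the punctuation step is worth knowing about: deleting apostrophes fuses contractions into new tokens. The sketch below isolates the same regular expression used in preprocess_text; the helper name strip_non_letters is an assumption introduced for this example.

```python
import re

def strip_non_letters(text):
    """Lowercase, then drop everything except letters and whitespace
    (the same regex as step 2 of preprocess_text)."""
    return re.sub(r'[^a-zA-Z\s]', '', text.lower())

first = strip_non_letters("I'll watch it again.")
second = strip_non_letters("The best film I've seen all year.")

print(first)   # ill watch it again
print(second)  # the best film ive seen all year
```

Tokens such as "ill" and "ive" are unlikely to appear in a stop-word list, so they can survive as features. On a large dataset this is usually harmless noise, but it is a reason to inspect the cleaned text before training.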


Step 3: Building and Training the Naive Bayes Model

Now we convert the cleaned text into numbers that our model can understand.

Vectorization (TF-IDF)

We'll use TF-IDF (Term Frequency-Inverse Document Frequency).

  • Term frequency (TF): How often a word appears in a single document (a review).
  • Inverse document frequency (IDF): How rare a word is across all documents.

This technique gives a high score to words that are frequent in one review but rare in all other reviews. This helps the model find unique, sentiment-bearing words.
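
The scoring can be worked through by hand. This sketch uses the smoothed IDF formula that TfidfVectorizer applies by default, idf = ln((1 + n) / (1 + df)) + 1, on three hypothetical documents. It deliberately skips the L2 normalization that TfidfVectorizer applies to each row afterwards, so the numbers will not match the vectorizer's output exactly.

```python
import math

# Hypothetical mini-corpus for illustration
docs = [
    "great movie great acting",
    "bad movie",
    "boring movie",
]

def term_frequency(term, doc):
    """Raw count of the term in one document."""
    return doc.split().count(term)

def smoothed_idf(term, corpus):
    """ln((1 + n) / (1 + df)) + 1, where df is the number of documents containing the term."""
    n_docs = len(corpus)
    doc_freq = sum(term in doc.split() for doc in corpus)
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

scores = {
    term: term_frequency(term, docs[0]) * smoothed_idf(term, docs)
    for term in ("movie", "great")
}
print(scores)
```

"movie" appears in every document, so its IDF drops to the minimum of 1.0. "great" appears twice in the first document and nowhere else, so it receives a noticeably higher score there.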

Using a Pipeline (The Right Way)

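The best practice in Scikit-learn is to use a Pipeline. A pipeline chains our steps (vectorizer and model) into one object. This prevents data leakage: the vectorizer learns its vocabulary and IDF weights only from the data passed to fit(), so nothing from the test set influences training.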

Python
 
# 1. Define our features (X) and target (y)
X = df['cleaned_review']
y = df['sentiment']

# 2. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 3. Create the Scikit-learn Pipeline
# This pipeline will:
# 1. Apply TfidfVectorizer
# 2. Train a MultinomialNB model
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ('model', MultinomialNB())
])

# 4. Train the model
# We just call .fit() on the pipeline!
pipeline.fit(X_train, y_train)

# 5. Make predictions
y_pred = pipeline.predict(X_test)


Step 4: Evaluating Model Performance

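How did our model do? We evaluate its predictions on the X_test data. Keep in mind that with test_size=0.2 on ten reviews, the test set holds only two reviews, so the numbers below illustrate the mechanics rather than measure real-world performance.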

Classification Report and Confusion Matrix

  • Accuracy: Overall percentage of correct predictions. (Use with caution on imbalanced data!)
  • Precision: Of all reviews we predicted as positive, how many were actually positive?
  • Recall: Of all actual positive reviews, how many did we find?
  • F1-score: The harmonic mean of Precision and Recall. A great all-around metric.

Python
 
# 1. Accuracy
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("-" * 40)

# 2. Classification Report
print(classification_report(y_test, y_pred))
print("-" * 40)

# 3. Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted Negative', 'Predicted Positive'],
            yticklabels=['Actual Negative', 'Actual Positive'])
plt.title('Confusion Matrix')
plt.show()
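
The four metrics can also be computed by hand from the cells of a confusion matrix, which helps when reading the report. The counts below are hypothetical and are not the output of the model above.

```python
# Hypothetical confusion-matrix counts for a binary classifier
tp = 40  # actual positive, predicted positive
fp = 10  # actual negative, predicted positive
fn = 20  # actual positive, predicted negative
tn = 30  # actual negative, predicted negative

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1-score:  {f1:.2f}")
```

With these counts, precision (0.80) is higher than recall (0.67): the model is usually right when it predicts positive, but it misses a third of the actual positives. The F1-score (0.73) sits between the two.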


ROC Curve (Receiver Operating Characteristic)

The ROC Curve plots the True Positive Rate against the False Positive Rate.

  • A curve that hugs the top-left corner indicates a strong model; AUC = 1.0 is a perfect one.
  • A diagonal line (AUC = 0.5) is a model that's no better than random guessing.

Python
 
# Plot ROC Curve
RocCurveDisplay.from_estimator(pipeline, X_test, y_test)
plt.title('ROC Curve for Naive Bayes Classifier')
plt.plot([0, 1], [0, 1], 'r--', label='Random Guess')
plt.legend()
plt.show()
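
AUC has a useful interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive/negative pair. The function name auc_by_ranking and all scores are hypothetical.

```python
def auc_by_ranking(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

y_true = [1, 1, 1, 0, 0, 0]

perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]   # every positive outranks every negative
constant_scores = [0.5] * 6                       # no ranking information at all
mixed_scores = [0.9, 0.4, 0.7, 0.6, 0.2, 0.1]     # one positive ranked below a negative

print(auc_by_ranking(y_true, perfect_scores))
print(auc_by_ranking(y_true, constant_scores))
print(round(auc_by_ranking(y_true, mixed_scores), 3))
```

The three cases give 1.0, 0.5, and about 0.889, matching the perfect model, the random-guess diagonal, and a model in between.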


Precision-Recall Curve

This curve is particularly useful when dealing with imbalanced datasets, as it focuses on the performance of the positive class.

Python
 
# Plot Precision-Recall Curve
PrecisionRecallDisplay.from_estimator(pipeline, X_test, y_test)
plt.title('Precision-Recall Curve')
plt.show()
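
Each point on a precision-recall curve corresponds to one decision threshold. The sketch below computes precision and recall at three thresholds for hypothetical scores on an imbalanced set (2 positives, 8 negatives). Returning a precision of 1.0 when nothing is predicted positive is a convention assumed here.

```python
def precision_recall_at(threshold, y_true, scores):
    """Precision and recall when every score >= threshold is predicted positive."""
    predictions = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, y_true))
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # assumed convention
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical imbalanced data: 2 positives, 8 negatives
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1]

for threshold in (0.85, 0.5, 0.0):
    precision, recall = precision_recall_at(threshold, y_true, scores)
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

At a threshold of 0.0 everything is predicted positive, so recall is 1.0 and precision falls to 0.20, the share of positives in the data. That share is the baseline a useful model's curve should stay above.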


Step 5: Predicting on New Data

The best part! Let's use our trained pipeline to predict sentiment on new, raw text. Because the pipeline was trained on the cleaned_review column and preprocess_text is not one of its steps, we must clean new text with preprocess_text ourselves before passing it in. The pipeline then handles the TF-IDF vectorization and the prediction.

Python
 
def predict_sentiment(text):
    # 1. Apply the same cleaning used on the training data
    #    (preprocess_text is not part of the pipeline)
    cleaned = preprocess_text(text)

    # 2. The pipeline TF-IDF vectorizes the cleaned text and predicts
    prediction = pipeline.predict([cleaned])[0]
    probability = pipeline.predict_proba([cleaned])[0]

    if prediction == 1:
        return f"Positive (Confidence: {probability[1]:.2f})"
    else:
        return f"Negative (Confidence: {probability[0]:.2f})"

# Try it out
print(predict_sentiment("This was the best movie I have ever seen!"))
print(predict_sentiment("The acting was stiff and the plot was just awful."))


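Example output (illustrative; the exact confidence values depend on the training data and, with a dataset this small, will likely be less extreme):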

Plain Text
 
Positive (Confidence: 0.95)
Negative (Confidence: 0.99)


Conclusion

We've successfully built an end-to-end sentiment analysis classifier. We started by exploring our text data, then moved on to cleaning the text, building a robust Pipeline, training a Naive Bayes model, and finally, evaluating its performance with industry-standard metrics.

This process provides a solid baseline that can be applied to a wide range of text classification problems. From here, you could experiment with different vectorizers (like CountVectorizer), tune model hyperparameters (like alpha for Naive Bayes), or use this model's performance as a benchmark to see if more complex models (like Logistic Regression or Transformers) provide a significant lift.
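
As a starting point for the tuning mentioned above, here is a sketch of a small grid search over alpha using GridSearchCV. The mini-corpus, the candidate alpha values, and cv=2 are assumptions chosen only to keep the example runnable; a real search would use your full dataset and more folds.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical mini-corpus, only large enough to make the search run
texts = [
    "great movie", "wonderful acting", "loved the story", "superb film",
    "bad movie", "awful acting", "hated the story", "boring film",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline_to_tune = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("model", MultinomialNB()),
])

# "model__alpha" targets the alpha parameter of the step named "model"
search = GridSearchCV(
    pipeline_to_tune,
    param_grid={"model__alpha": [0.1, 0.5, 1.0]},
    cv=2,
    scoring="accuracy",
)
search.fit(texts, labels)

print(search.best_params_)
print(round(search.best_score_, 2))
```

Because the whole pipeline is passed to the search, the vectorizer is refit inside every fold, which keeps the cross-validation scores free of leakage.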
