A Guide to Aspect-Based Sentiment Analysis With GPT and BERT
Explore rapid prototyping with GPT and custom BERT fine-tuning to extract targeted sentiment insights for nuanced text analysis and business applications.
Aspect-based sentiment analysis (ABSA) focuses on determining the sentiment (positive, negative, or neutral) associated with a specific aspect of a text. For example, in the sentence "The battery life is nice, but the screen is dim," ABSA helps identify that the sentiment toward the "battery life" is positive, while the sentiment toward the "screen" is negative. This capability is crucial for businesses to gain nuanced insights from customer feedback, product reviews, and social media.
ABSA is particularly useful for businesses and researchers to gain granular insights into user feedback, such as identifying how customers perceive different product features or services. It has applications in:
- E-commerce: Analyzing product reviews to identify customer satisfaction with specific features.
- Customer Support: Extracting actionable insights from feedback or complaints.
- Social Media Monitoring: Understanding public sentiment on particular aspects of a topic or event.
Approach 1: Using OpenAI's GPT API
This approach leverages OpenAI's GPT, a cutting-edge language model capable of performing sophisticated NLP tasks. By crafting prompts, we can instruct GPT to focus on specific aspects of a sentence and classify its sentiment. This approach is excellent for rapid prototyping, especially if you're new to sentiment analysis or lack labeled data.
Why Choose GPT for ABSA?
- Ease of use: Minimal coding required; no need to collect or prepare large datasets.
- High accuracy: Benefiting from pre-trained GPT models capable of understanding complex language nuances.
- Flexible applications: Ideal for scenarios where quick insights are needed without building a full-fledged machine learning model.
Prerequisites
- Python installed: Ensure Python 3.7 or later is installed.
- OpenAI API Key: Sign up at OpenAI to get your API key.
Step-by-Step Guide
1. Install Dependencies
Before proceeding, install the openai library, which enables you to interact with the OpenAI GPT API:
pip install openai
2. Set Up the OpenAI API Key
The script requires your OpenAI API key to authenticate requests. Replace ADD-YOUR-KEY-GET-FROM-OPENAI with your actual API key in the following code snippet:
openai.api_key = "ADD-YOUR-KEY-GET-FROM-OPENAI"
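Hardcoding keys in source files is risky if the code is shared or committed to version control. A safer alternative is to read the key from the environment; here is a minimal sketch, assuming the key is stored in an OPENAI_API_KEY environment variable:

import os
import openai

# Read the key from the environment instead of embedding it in code
openai.api_key = os.environ["OPENAI_API_KEY"]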
3. Understanding the Code
The script is built around the function get_aspect_sentiment, which accepts a sentence and an aspect as input. Here's the breakdown:
- Input parameters:
- sentence: The text to analyze.
- aspect: The specific aspect to identify sentiment for.
- Prompt design: The script uses a carefully crafted prompt to instruct the GPT model to focus on analyzing the sentiment of the given aspect within the context of the sentence. The prompt also instructs the model to consider subtle expressions and negations.
prompt = (
    f"Identify the sentiment for the aspect '{aspect}' in the following sentence:\n\n"
    f"Sentence: \"{sentence}\"\n\n"
    "Sentiment options: Positive, Negative, or Neutral.\n"
    "Please consider any subtle expressions and negations while analyzing the sentiment."
)
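For instance, with the sentence from the introduction and the aspect "screen", this template renders to:

Identify the sentiment for the aspect 'screen' in the following sentence:

Sentence: "The battery life is nice, but the screen is dim."

Sentiment options: Positive, Negative, or Neutral.
Please consider any subtle expressions and negations while analyzing the sentiment.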
- OpenAI API call: The openai.ChatCompletion.create function sends the prompt to the GPT model for processing. The model's response is parsed to extract the sentiment.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an assistant that analyzes sentiment for specific aspects in a sentence."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=50,
    temperature=0
)
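Note that openai.ChatCompletion.create is the legacy interface from openai versions below 1.0. If you are on the v1 client, the equivalent call looks roughly like this (a sketch; the parameter values mirror the snippet above):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an assistant that analyzes sentiment for specific aspects in a sentence."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=50,
    temperature=0
)
sentiment = response.choices[0].message.content.strip()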
- Output: The function returns the sentiment as a string (Positive, Negative, or Neutral).
Complete Code:
import openai

# Set up the OpenAI API key
openai.api_key = "ADD-YOUR-KEY-GET-FROM-OPENAI"

def get_aspect_sentiment(sentence, aspect):
    # Craft the prompt for aspect-based sentiment analysis
    prompt = (
        f"Identify the sentiment for the aspect '{aspect}' in the following sentence:\n\n"
        f"Sentence: \"{sentence}\"\n\n"
        "Sentiment options: Positive, Negative, or Neutral.\n"
        "Please consider any subtle expressions and negations while analyzing the sentiment."
    )

    # Make the API call
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are an assistant that analyzes sentiment for specific aspects in a sentence."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=50,
        temperature=0
    )

    # Extract the response text
    sentiment = response['choices'][0]['message']['content'].strip()
    return sentiment

# Example usage
sentence = "The battery life is nice."
aspect = "battery life"
sentiment = get_aspect_sentiment(sentence, aspect)
print(f"Sentiment for '{aspect}': {sentiment}")
Approach 2: Fine-Tuning BERT for ABSA
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained transformer model designed for various NLP tasks. Unlike GPT, BERT is bidirectional, meaning it understands the context of words based on both preceding and following words in a sentence. In ABSA, BERT is fine-tuned to predict sentiment for specific aspects using a labeled dataset.
Why Choose BERT for ABSA?
- Domain adaptation: Fine-tune BERT on your specific dataset for high accuracy in niche applications.
- Offline capability: Use the trained model locally without requiring an internet connection.
- Scalability: Suitable for large-scale projects with diverse datasets.
Step-by-Step Guide
1. Install Dependencies
pip install transformers torch scikit-learn pandas
2. Prepare the Dataset
The dataset includes sentences, aspects, and corresponding sentiment labels (e.g., 0 = Negative, 1 = Neutral, 2 = Positive).
import pandas as pd

data = pd.DataFrame({
    'sentence': ["The battery life is amazing", "The camera quality is poor", "The screen is bright and clear"],
    'aspect': ["battery life", "camera", "screen"],
    'label': [2, 0, 2]  # Sentiment: 2 = Positive, 0 = Negative
})
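In practice you would load a real dataset rather than an inline sample. A minimal sketch, assuming a hypothetical reviews.csv file with sentence, aspect, and label columns:

import pandas as pd

# Hypothetical file; adjust the path and column names to your data
data = pd.read_csv("reviews.csv", usecols=["sentence", "aspect", "label"])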
3. Create a Dataset Class
This class converts the dataset into a format compatible with the BERT tokenizer.
import torch
from torch.utils.data import Dataset

class ABSADataset(Dataset):
    def __init__(self, data, tokenizer, max_length):
        self.data = data
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sentence = self.data.iloc[idx]['sentence']
        aspect = self.data.iloc[idx]['aspect']
        label = self.data.iloc[idx]['label']
        # Encode sentence and aspect as a sentence pair: [CLS] sentence [SEP] aspect [SEP]
        inputs = self.tokenizer(
            sentence, aspect,
            add_special_tokens=True,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        item = {key: val.squeeze() for key, val in inputs.items()}
        item['labels'] = torch.tensor(label, dtype=torch.long)
        return item
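Passing the sentence and aspect as two arguments makes the tokenizer build a sentence-pair input, which is how BERT learns which aspect the sentiment label refers to. A small sketch of what that encoding looks like:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
enc = tokenizer("The battery life is amazing", "battery life", add_special_tokens=True)
print(tokenizer.convert_ids_to_tokens(enc['input_ids']))
# ['[CLS]', 'the', 'battery', 'life', 'is', 'amazing', '[SEP]', 'battery', 'life', '[SEP]']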
4. Model and Training
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)

train_dataset = ABSADataset(train_data, tokenizer, max_length=128)
test_dataset = ABSADataset(test_data, tokenizer, max_length=128)
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=2,
    per_device_train_batch_size=4,
    evaluation_strategy="epoch",
    logging_dir='./logs'
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset
)
trainer.train()
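To get the offline capability mentioned earlier, persist the fine-tuned weights and tokenizer after training. A minimal sketch; the directory name is arbitrary:

# Save the fine-tuned model and tokenizer for offline reuse
trainer.save_model("./absa-bert")
tokenizer.save_pretrained("./absa-bert")

# Later, reload them locally without retraining
model = BertForSequenceClassification.from_pretrained("./absa-bert")
tokenizer = BertTokenizer.from_pretrained("./absa-bert")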
5. Evaluation and Prediction
def predict_sentiment(model, tokenizer, sentence, aspect):
    inputs = tokenizer(
        sentence, aspect,
        return_tensors="pt",
        truncation=True,
        padding='max_length',
        max_length=128
    )
    # Run inference without tracking gradients
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=1).item()
    sentiment_labels = {0: 'Negative', 1: 'Neutral', 2: 'Positive'}
    return sentiment_labels[predicted_class]

print(predict_sentiment(model, tokenizer, "The battery life is amazing", "battery life"))
- Output: Positive
Complete Code:
import os
import torch
import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset

# Disable WandB and other logging integrations
os.environ["WANDB_DISABLED"] = "true"

# Sample data (replace with actual dataset)
data = pd.DataFrame({
    'sentence': ["The battery life is amazing", "The camera quality is poor", "The screen is bright and clear"],
    'aspect': ["battery life", "camera", "screen"],
    'label': [2, 0, 2]  # 2 = Positive, 0 = Negative
})

# Train-test split
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

class ABSADataset(Dataset):
    def __init__(self, data, tokenizer, max_length):
        self.data = data
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sentence = self.data.iloc[idx]['sentence']
        aspect = self.data.iloc[idx]['aspect']
        label = self.data.iloc[idx]['label']
        # Encode sentence and aspect as a sentence pair
        inputs = self.tokenizer(
            sentence, aspect,
            add_special_tokens=True,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        item = {key: val.squeeze() for key, val in inputs.items()}
        item['labels'] = torch.tensor(label, dtype=torch.long)
        return item

# Initialize tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)

# Create datasets (the Trainer handles batching internally, so no DataLoader is needed)
train_dataset = ABSADataset(train_data, tokenizer, max_length=128)
test_dataset = ABSADataset(test_data, tokenizer, max_length=128)

# Define metrics computation
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='weighted')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=100,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    evaluation_strategy="epoch",
    report_to="none"  # Disable wandb and other logging integrations
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics
)

# Train the model
trainer.train()

# Evaluate the model
results = trainer.evaluate()
print("Evaluation Results:", results)

# Define prediction function
def predict_sentiment(model, tokenizer, sentence, aspect):
    inputs = tokenizer(
        sentence, aspect,
        return_tensors="pt",
        truncation=True,
        padding='max_length',
        max_length=128
    )
    # Run inference without tracking gradients
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=1).item()
    sentiment_labels = {0: 'Negative', 1: 'Neutral', 2: 'Positive'}
    return sentiment_labels[predicted_class]

# Example prediction
print(predict_sentiment(model, tokenizer, "The battery life is amazing", "battery life"))
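Once trained, the same function can score a whole table of sentence-aspect pairs. A quick sketch with pandas, assuming the data DataFrame defined above:

# Predict a sentiment for every (sentence, aspect) row
data['predicted'] = data.apply(
    lambda row: predict_sentiment(model, tokenizer, row['sentence'], row['aspect']),
    axis=1
)
print(data[['aspect', 'predicted']])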
Choosing the Right Approach
When selecting the right approach for ABSA, it's important to weigh the strengths and limitations of each method. See the table below for a quick comparison between the two:
| Criteria | GPT API | BERT Fine-Tuning |
|---|---|---|
| Ease of use | Easy to set up | Requires ML expertise |
| Customization | Limited | Highly customizable |
| Domain-specific applications | Moderate | Excellent |
| Online/offline | Online only | Offline after training |
Conclusion
In this tutorial, we examined two effective approaches to ABSA: using the ease of OpenAI's GPT API for rapid, low-setup sentiment extraction, and fine-tuning BERT for sophisticated, domain-specific applications. Each approach has distinct benefits depending on the project's complexity, scalability, and customization needs.
These methods offer a strong foundation for learning ABSA, regardless of your level of experience. Beginners can dive into sentiment analysis with minimal preparation, while experienced developers can build a reliable offline solution tailored to specific datasets.
Start small with GPT, hone your skills with BERT, and keep iterating as you extract useful information from your text data. With these tools at your disposal, you're ready to take on the nuanced challenges of contextualizing sentiment. The possibilities are virtually limitless.