My Sentiments, Erm… Not Exactly

In this article, we look at how Hugging Face and GPT-3 perform classifying a potentially ambiguous sentence, along with little help from AllenNLP’s semantic role labeling.

Denis Rothman

Jun. 19, 22 · Tutorial

Like (1)

Save

6.2K Views

This article is an excerpt from the book Transformers for Natural Language Processing, Second Edition. This edition includes working with GPT-3 engines, more use cases, such as casual language analysis and computer vision tasks, and an introduction to OpenAI's Codex.

Have a look at the following sentence:

Though the customer seemed unhappy, she was, in fact, satisfied but thinking of something else at the time, which gave a false impression.

Applying sentiment analysis with the Hugging Face transformers, Microsoft/MiniLM-L12-H384-uncased and RoBERTa-large-mnli labeled the sentence “neutral.”

But is that true?

Labeling this sentence “neutral” bothered me. I was curious to see if OpenAI GPT-3 could do better. After all, GPT-3 is a foundation model that can theoretically do many things it wasn’t trained for.

When I read the sentence closely, I could see that the customer is she. When I looked deeper, I understood that she is in fact satisfied. I decided not to try models blindly until I reached one that works. Trying one model after the other is not productive.

I needed to get to the root of the problem using logic and experimentation. I didn’t want to rely on an algorithm that would find the reason automatically. Sometimes we need to use our neurons!

Could the problem be that it is difficult to identify she as the customer for a machine? Let’s ask AllenNLP’s SRL BERT.

Investigating With SRL

I first ran She was satisfied using the Semantic Role Labeling interface on https://demo.allennlp.org/.

The result was correct:

The analysis is clear in the frame of this predicate: was is the verb, She is ARG1, and satisfied is ARG2.

We should find the same analysis in a complex sentence, and we do:

Satisfied is still ARG2, so the problem might not be there.

Now, the focus is on ARGM-ADV, which modifies was as well. The word false is quite misleading because ARGM-ADV is relative to ARG2, which contains thinking.

The thinking predicate gave a false impression, but thinking is not identified as a predicate in this complex sentence. Could it be that she was is an unidentified ellipsis?

We can quickly verify that by entering the full sentence without an ellipsis:

Though the customer seemed unhappy, she was, in fact, satisfied but she was thinking of something else at the time, which gave a false impression.

The problem with SRL was the ellipsis again. We now have five correct predicates with five accurate frames.

Frame 1 shows that unhappy is correctly related to seemed:

Frame 2 shows that satisfied is now separated from the sentence and individually identified as an argument of was in a complex sentence:

Now, let’s go straight to the predicate containing thinking, which is the verb we wanted BERT SRL to analyze correctly. Now that we suppressed the ellipsis and repeated she was in the sentence, the output is correct:

Now, we can leave our SRL investigation with two clues:

The word false is a confusing argument for an algorithm to relate to other words in a complex sentence
The ellipsis of the repetition of she was

Before we go to GPT-3, let’s go back to Hugging Face with our clues.

Investigating With Hugging Face

Let’s try the DistilBERT-base-uncased-finetuned-sst-2-english model.

We will investigate our two clues (the ellipsis of she was and the presence of false in a positive sentence.

The Ellipsis of She Was

We will first submit a full sentence with no ellipsis:

Though the customer seemed unhappy, she was, in fact, satisfied but she was thinking of something else at the time, which gave a false impression

The output remains negative:

The Presence of False in an Otherwise Positive Sentence

We will now take false out of the sentence but leave the ellipsis:

Though the customer seemed unhappy, she was, in fact, satisfied but thinking of something else at the time, which gave an impression

Bingo! The output is positive:

We know that the word false creates confusion for SRL if there is an ellipsis of was thinking. We also know that false creates confusion for the sentiment analysis Hugging Face transformer model we used.

Can GPT-3 do better? Let’s see.

Investigating With the GPT-3 Playground

Let’s use OpenAI’s example of an Advanced tweet classifier and modify it to satisfy our investigation in three steps:

Step 1: Showing GPT-3 What we Expect

Sentence: “The customer was satisfied”

Sentiment: Positive

Sentence: “The customer was not satisfied”

Sentiment: Negative

Sentence: “The service was good”

Sentiment: Positive

Sentence: “This is the link to the review”

Sentiment: Neutral

Step 2: Showing it a Few Examples of the Output Format We Expect

1. “I loved the new Batman movie!”
2. “I hate it when my phone battery dies”
3. “My day has been good”
4. “This is the link to the article”
5. “This new music video blew my mind”

Sentence sentiment ratings:

1: Positive
2: Negative
3: Positive
4: Neutral
5: Positive

Step 3: Entering Our Sentence Among Others (Number 3):

1. “I can’t stand this product”

2. “The service was bad!”

3. “Though the customer seemed unhappy, she was, in fact, satisfied but thinking of something else at the time, which gave a false impression”

4. “The support team was lovely”

5. “Here is the link to the product.”

Sentence sentiment ratings:

1: Negative

2: Positive

3: Positive

4: Positive

5: Neutral

The output seems satisfactory since our sentence is positive (number 3). Is this result reliable? We could run the example here several times. But let’s go down to the code level to find out.

GPT-3 Code

We just click on View code in the playground, copy it, and paste it into a notebook. We add a line to only print what we want to see:

     Python
    
 

    response = openai.Completion.create(
  engine=“davinci”,
  prompt=“This is a Sentence sentiment classifier\nSentence: \”The customer was satisfied\”\nSentiment: Positive\n###\nSentence: \”The customer was not satisfied\”\nSentiment: Negative\n###\nSentence: \”The service was good\”\nSentiment: Positive\n###\nSentence: \”This is the link to the review\”\nSentiment: Neutral\n###\nSentence text\n\n\n1. \”I loved the new Batman movie!\”\n2. \”I hate it when my phone battery dies\”\n3. \”My day has been good\”\n4. \”This is the link to the article\”\n5. \”This new music video blew my mind\”\n\n\nSentence sentiment ratings:\n1: Positive\n2: Negative\n3: Positive\n4: Neutral\n5: Positive\n\n\n###\nSentence text\n\n\n1. \”I can’t stand this product\”\n2. \”The service was bad!\”\n3. \”Though the customer seemed unhappy she was in fact satisfied but thinking of something else at the time, which gave a false impression\”\n4. \”The support team was lovely\”\n5. \”Here is the link to the product.\”\n\n\nSentence sentiment ratings:\n”,
  temperature=0.3,
  max_tokens=60,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=[“###”]
)
r = (response[“choices”][0])
print(r[“text”])

   

The output is not stable, as we can see in the following responses.

Run 1: Our Sentence (Number 3) Is Neutral:

1: Negative

2: Negative

3: Neutral

4: Positive

5: Positive

Run 2: Our Sentence (Number 3) Is Positive:

1: Negative

2: Negative

3: Positive

4: Positive

5: Neutral

In run 3, our sentence is positive, and in run 4, our sentence is negative.

Conclusions

This leads us to the conclusions of our investigation:

SRL shows that if a sentence is simple and complete (no ellipsis, no missing words), we will get a reliable sentiment analysis output.
SRL shows that if the sentence is moderately difficult, the output might, or might not, be reliable.
SRL shows that if the sentence is complex (ellipsis, several propositions, many ambiguous phrases to solve, and so on), the result is not stable and therefore not reliable.

The conclusions of the job positions of developers in the present and future are:

Less AI development will be required with Cloud AI and ready-to-use modules.
More design skills will be required.
Classical development of pipelines to feed AI algorithms, control them, and analyze their outputs will require thinking and targeted development.

GPT-3

Published at DZone with permission of Denis Rothman. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending

My Sentiments, Erm… Not Exactly

In this article, we look at how Hugging Face and GPT-3 perform classifying a potentially ambiguous sentence, along with little help from AllenNLP’s semantic role labeling.

Investigating With SRL

Investigating With Hugging Face

The Ellipsis of She Was

The Presence of False in an Otherwise Positive Sentence

Investigating With the GPT-3 Playground

Step 1: Showing GPT-3 What we Expect

Step 2: Showing it a Few Examples of the Output Format We Expect

Step 3: Entering Our Sentence Among Others (Number 3):

GPT-3 Code

Run 1: Our Sentence (Number 3) Is Neutral:

Run 2: Our Sentence (Number 3) Is Positive:

Conclusions

Related

Partner Resources