DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Ten Years of Beam: From Google's Dataflow Paper to 4 Trillion Events at LinkedIn
  • The Hidden Latency of Autoscaling
  • Evolving Spring Boot APIs to an Event-Driven Mesh
  • End-to-End Event Streaming With Kafka, Spring Boot and AWS SQS/SNS (Production-Ready Code Guide)

Trending

  • Introduction to Tactical DDD With Java: Steps to Build Semantic Code
  • Stateless JWT Auth Microservice Architecture With Spring Boot 3 and Redis Sentinel
  • Event-Driven Pipelines With Apache Pulsar and Go
  • Slopsquatting: Building a Scanner That Catches AI-Hallucinated Packages Before They Reach Production

Understanding the Confusion Matrix

Want to learn more about understanding the confusion matrix? Check out this post to learn more about using the confusion matrix for performance measurement.

By 
Sunil Kappal user avatar
Sunil Kappal
·
Aug. 18, 18 · Analysis
Likes (4)
Comment
Save
Tweet
Share
13.3K Views

Join the DZone community and get the full member experience.

Join For Free

The confusion matrix is one of the most popular and widely used performance measurement techniques for classification models. While it is super easy to understand, its terminology can be a bit confusing.

Therefore, keeping the above premise under consideration, this article aims to clear the "fog" around this amazing model evaluation system.

To get things started, I have included a working example of this matrix to make things simple and easy to understand. However, the application of this technique can be used on more than two classes.

What Can We Learn From This Matrix?

  • There are two possible classifications: "yes" and "no." If we have to predict the presence of an occurrence, then "yes" would mean that that event has occurred, and "no" would mean the opposite, i.e. it didn't happen.
  • The model made a total of 100 predictions about an event
  • Out of those 100 results, the model predicted "yes" 45 times and "no" 55 times. Whereas, in reality, 60 times the event in question occurred and 40 times it didn't.

Basic Terms to Keep in Mind 

  1. True Positive (TP): These are the events that were correctly predicted by the model as "occurred = Yes."
  2. True Negative (TN): These are the events that were correctly predicted by the model as "not occurred = No."
  3. False Positive (FP): These are the events that were predicted as "occurred = Yes," but in reality, it was "not occurred = No."
  4. False Negative (FN): This is the opposite of FP, i.e. predicted as "not occurred = No," but in reality, it was "occurred = Yes."

How These Metrics Are Computed for a Binary Classifier

  • Accuracy: Overall, how often does the classifier predict right?
    (TP+TN)/Total = (35+50)/100 = 85%
  • Misclassification Rate (aka Error Rate): Overall, how often is it wrong?
    (FP+FN)/Total = (5+10)/100 = 15%
    This is the equivalent of 1 minus Accuracy.
  • True Positive Rate: Events that were correctly predicted by the model as "occurred = Yes."
    TP/ actual yes = 35/40 = 87% (aka "Recall" or "Sensitivity")
  • False Positive Rate: Events that were predicted as "occurred = Yes," but in reality, it was "not occurred = No."
    FP/actual no = 10/60 = 17%
  • Specificity: When an event was actually predicated as "no," and it was actually a "no."
    TN/actual no = 50/60 83%
  • Precision: When an event is predicted "yes," how often it is correct?
    TP/predicted yes = 35/45 = 77%
  • Prevalence: How often does the "yes" event occur in the sample?
    Actual Yes/Total = 40/100 = 40%

Key Terms Worth Knowing

Positive Predictive Value: This is very similar to precision with a difference of the fact that it takes prevalence into consideration. In the case where the classes are perfectly balanced (prevalence in 50%), the positive predictive value (PPV) is equivalent to precision.

Null Error Rate: This term defines how often your prediction would be wrong if you always predicted the majority class. In the above example, the null error rate would be 60/100 = 60%, because if you always predicted yes, you would on be wrong for the 60 "no" events. This can be considered and used as a baseline metric to compare your classifier against. However, the best classifier for a particular application will sometimes have a higher error rate than the null error rate.

F Score: This is a weighted average of the true positive (recall) and precision.

Roc Curve: This shows the true positive rates (an event) against false positives (a non-event marked as an event) rate at various cut points. A ROC Curve demonstrates a trade-off between sensitivity (recall and specificity or the true negative rate) and an increase in sensitivity, which will introduce a reduction in specificity. The closer the curve follows to the left-hand border and then top border of the ROC plot, the more accurate the test.

Matrix (protocol) Event

Published at DZone with permission of Sunil Kappal. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Ten Years of Beam: From Google's Dataflow Paper to 4 Trillion Events at LinkedIn
  • The Hidden Latency of Autoscaling
  • Evolving Spring Boot APIs to an Event-Driven Mesh
  • End-to-End Event Streaming With Kafka, Spring Boot and AWS SQS/SNS (Production-Ready Code Guide)

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook