# Blazing Through Commonly Used Statistical Performance Metrics

### Performance metrics are a great way of gauging the accuracy of your statistical models' predictions. Read on to learn more!

Performance metrics are often used to evaluate how effective statistical models have been in predicting response variables based on the given set of observations that the models were trained on.

In this post, we are restricting the discussion to classification problems. Classification can be binary, for example:

- A person being classified as either **creditworthy** or **non-creditworthy**.
- A consumer's decision to either **switch services** or **stay with you**.

Your statistical model may be able to classify some response variables correctly, and perhaps some will always be missed. If it misses nothing at all, it has probably adapted too closely to this particular set of observations and may not be able to reproduce this learning with a different set of observations. This is called variance in learning, as opposed to bias.

Bias prevents your model from becoming flexible enough to learn the nuances in the data. Too much flexibility, on the other hand, leaves the model too meek to stand up and adapt to a different set of circumstances.

So, every model involves a trade-off between **bias** and **variance**. You don't want a model that is too rigid (it would miss out on learning from the variations in the data), but at the same time you don't want a model that is too flexible (it would not hold up when the observation set changes, such as when going from **Training** to **Test**).

Leaving those aspects aside, how do you measure the performance of your model in sheer quantitative terms?

Here are a few measures explained. Let's assume, for the sake of simplicity, that the predicted classification is binary (TRUE or FALSE, 0 or 1, POSITIVE or NEGATIVE).

Often, it is easy to visualize this using a tabular approach:
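As a minimal sketch of that tabular view, here is one way to cross-tabulate predictions against actual values in R. The vectors `actual` and `predicted` are hypothetical example data, not from the article:

```r
# Hypothetical binary labels: 1 = positive, 0 = negative
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
predicted <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)

# Cross-tabulate predictions against actual values: a confusion matrix
conf <- table(Predicted = predicted, Actual = actual)
print(conf)
# Counts here: TN = 4, FN = 2, FP = 1, TP = 3
```

Each cell of the table is one of the four counts (true/false positives and negatives) that the metrics below are built from.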

**1) Precision**: I know how many **positives** (or **"1"s**) my model has predicted for the response variable.

The question is: of the values it marked positive, how many are actually positive?

Hence,

**Precision = TP/MP**, where **TP** is True Positives and **MP** is Marked Positives (everything the model predicted as positive).

But **TP = MP-FP**, because my model may have misclassified some **negative** values as **positive**.

Hence,

**Precision = (MP-FP)/MP**

So, precision compares how many positives my model marked in the response variable against how many of those actually are positive. This tells you how precisely your model recognizes a given class versus the other classes (in this case, the positive value versus all other values).
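This calculation can be sketched in R using the article's MP/FP notation; the label vectors are hypothetical example data:

```r
# Hypothetical binary labels: 1 = positive, 0 = negative
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
predicted <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)

MP <- sum(predicted == 1)                # marked positives: all predicted 1s
FP <- sum(predicted == 1 & actual == 0)  # false positives among them
precision <- (MP - FP) / MP              # equivalently TP / (TP + FP)
precision                                # 0.75: 3 of the 4 marked positives are real
```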

**2) Recall:** When we measured the precision of our model, we did not think about the symmetry of misclassification. Recall gauges a model's ability to recognize a given class, such as positive (while adjusting for any false positive misclassifications), relative to how many such positives actually existed (including any lost to false negative misclassifications).

That is, when a model can misclassify a negative as a positive, it can also misclassify a positive as a negative and hence we need to quantify that aspect too.

Let's adjust the **True Positive (TP)** count by adding back **False Negatives**: **TP+FN**.

Hence,

**Recall = (MP-FP)/(MP-FP+FN)**

So, compared with precision, the denominator brings in two symmetric corrections: adjusting for false positives as well as for false negatives. Equivalently, **Recall = TP/(TP+FN)**.

This measure is also called **sensitivity**.
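Recall can be sketched the same way, again with hypothetical label vectors and the article's MP/FP/FN notation:

```r
# Hypothetical binary labels: 1 = positive, 0 = negative
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
predicted <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)

MP <- sum(predicted == 1)                # marked positives
FP <- sum(predicted == 1 & actual == 0)  # false positives
FN <- sum(predicted == 0 & actual == 1)  # false negatives
recall <- (MP - FP) / (MP - FP + FN)     # equivalently TP / (TP + FN)
recall                                   # 0.6: 3 of the 5 actual positives were found
```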

**3) Specificity:** This measure is similar to precision, but the difference is that it examines the negative classification rather than the positive.

**Specificity = (MN-FN)/MN**, where **MN** is Marked Negatives.

So, you are trying to measure how precisely your model can recognize negative values. This, combined with precision, allows you to see the contribution of both positive and negative misclassifications. (Strictly speaking, this quantity, TN/(TN+FN), is usually called the negative predictive value; specificity is conventionally defined against the actual negatives as TN/(TN+FP).)
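Both the formula as written here and the conventional definition of specificity can be computed side by side; the label vectors are hypothetical example data:

```r
# Hypothetical binary labels: 1 = positive, 0 = negative
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
predicted <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)

MN <- sum(predicted == 0)                # marked negatives: all predicted 0s
FN <- sum(predicted == 0 & actual == 1)  # false negatives among them
npv <- (MN - FN) / MN                    # the formula above, i.e. TN / (TN + FN)

TN <- sum(predicted == 0 & actual == 0)
FP <- sum(predicted == 1 & actual == 0)
spec <- TN / (TN + FP)                   # conventional specificity: TN over actual negatives

c(npv = npv, specificity = spec)         # here: npv = 4/6, specificity = 4/5
```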

**4) Accuracy:** Observe the numerators and denominators in the previous two metrics. These values are only concerned with one specific state of a given class in the response variable: either **positive** or **negative**.

What is so easy to miss is that when a model falsely recognizes a value as **positive**, it also misses an opportunity to mark this value as **negative**.

Similarly, when a model does not recognize a value as **positive**, it also ends up falsely recognizing that value as **negative**.

The most difficult part is to remember the symmetrical impact on misclassification.

**Accuracy = ((MP-FP)+(MN-FN)) / (TP+FP+TN+FN) = (TP+TN) / (TP+FP+TN+FN)**

Thus, the accuracy measure tells you how well your model recognizes **not only positives but also negatives**.
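Accuracy reduces to correct classifications over all observations, which is a one-liner in R; the label vectors are hypothetical example data:

```r
# Hypothetical binary labels: 1 = positive, 0 = negative
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0)
predicted <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)

TP <- sum(predicted == 1 & actual == 1)
TN <- sum(predicted == 0 & actual == 0)
accuracy <- (TP + TN) / length(actual)   # (TP + TN) / (TP + FP + TN + FN)
accuracy                                 # 0.7: 7 of the 10 values classified correctly
```

Note that `mean(predicted == actual)` gives the same result even more directly.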

If this looks tedious, R has a package called ROCR (available from CRAN) that does this for you.

All you need is a collection of predictions and corresponding labels.

```
library(ROCR)

# dfpredictmatrix is assumed to hold the model's scores and the true labels
pred1 <- prediction(dfpredictmatrix$predictions, dfpredictmatrix$labels)
perf1 <- performance(pred1, "prec", "rec")  # precision vs. recall
plot(perf1)
abline(h = 0.8, col = "red", lty = 2)  # reference line at 80% precision
```

That's it! This will allow you to examine the tradeoff between **precision** and **recall**.
