
Multi-level Classification, Cohen Kappa, Krippendorff Alpha, and Cancer


Two interesting methods in R for computing agreement between raters, used here to test the performance of classifiers that predict 33 cancer types.


I faced an interesting problem last week. Playing with data from The Cancer Genome Atlas (TCGA; full genetic and clinical data for thousands of patients), I was building a classifier that predicts the type of cancer from sets of genetic signatures.

The PANCAN33 subset contains samples for 33 different types of cancer, and the classifier should assign a new sample to one of these 33 classes. I tried different methods, such as random forest, SVM, bgmm, and a few others, and ended up with a collection of classifiers. How do I choose the best one?

We need a method that computes the agreement between classifier predictions and the true labels (cancer types). For binary classifiers there are many commonly used metrics, such as precision, recall, and accuracy. But here we have 33 classes. The confusion matrix has 33×33 cells, which is a lot of numbers to compare.
Of course, there are some straightforward solutions, such as the fraction of samples for which the classifier correctly guesses the true label. But such simple solutions suffer badly when the classes are unequally distributed, which is quite common. They can be high even for a dummy classifier that always votes for the most common class. It is better to avoid such metrics.
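
To make the problem with plain accuracy concrete, here is a minimal sketch in base R. The data are simulated and the three class names and the 90/5/5 split are assumptions chosen only for illustration; they are not taken from PANCAN33.

```r
# Hypothetical, simulated labels: three classes with a 90/5/5 split.
set.seed(1)
trueLabels <- sample(c("BRCA", "LUAD", "GBM"), size = 1000,
                     replace = TRUE, prob = c(0.90, 0.05, 0.05))

# Dummy classifier: always predict the most frequent class.
majorityClass <- names(which.max(table(trueLabels)))
predictions <- rep(majorityClass, length(trueLabels))

# Plain accuracy = fraction of samples labelled correctly.
accuracy <- mean(predictions == trueLabels)
print(accuracy)
```

The dummy learns nothing about the two rare classes, yet its accuracy simply equals the share of the most common class, so it looks strong on imbalanced data.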

Other Measures of Agreement

I used two interesting ones: Cohen's kappa and Krippendorff's alpha. Both take into account the distribution of votes for each rater. Moreover, Krippendorff's alpha handles missing data (find more information here).
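
For reference, the standard textbook definitions of the two coefficients can be written as follows. The symbols are my own notation, not the article's: $n_{kk}$ is the number of samples both raters assign to class $k$, $n_{k+}$ and $n_{+k}$ are the per-rater totals for class $k$, $N$ is the number of samples, and $D_o$ and $D_e$ are the observed and chance-expected disagreement.

```latex
% Cohen's kappa: observed agreement corrected for chance agreement
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \frac{1}{N}\sum_{k} n_{kk},
\qquad
p_e = \frac{1}{N^2}\sum_{k} n_{k+}\, n_{+k}

% Krippendorff's alpha: one minus the ratio of observed to expected disagreement
\alpha = 1 - \frac{D_o}{D_e}
```

Both coefficients equal 1 for perfect agreement and are close to 0 when agreement is no better than chance, which is what makes them robust to unequal class distributions.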

Both coefficients are widely used by psychometricians (e.g. to assess how well two psychiatrists agree on a diagnosis). Here we use them to estimate the performance of a classifier, treating the classifier and the true labels as two raters. Both coefficients are implemented in the irr package.

Below you will find an example application:

library(irr)

# kappa2() expects one row per sample and one column per rater
kappa2(cbind(predictions, trueLabels))
# Cohen's Kappa for 2 Raters (Weights: unweighted)
# Subjects = 3599 
#   Raters = 2 
#    Kappa = 0.941 
#        z = 160 
#  p-value = 0 

# kripp.alpha() expects one row per rater and one column per sample
kripp.alpha(rbind(predictions, trueLabels))
# Krippendorff's alpha
# Subjects = 3599 
#   Raters = 2 
#    alpha = 0.941 
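
To see what the kappa value measures, the unweighted coefficient can also be computed by hand from the confusion table. This is only a sketch: `cohenKappa` is a hypothetical helper written for illustration (it is not part of irr), and the toy labels are made up.

```r
# Unweighted Cohen's kappa for two label vectors.
# Hypothetical helper for illustration; not part of the irr package.
cohenKappa <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  lv <- union(a, b)                                # shared set of classes
  tab <- table(factor(a, levels = lv), factor(b, levels = lv))
  n <- sum(tab)
  po <- sum(diag(tab)) / n                         # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2     # agreement expected by chance
  (po - pe) / (1 - pe)                             # NaN if pe == 1 (both raters constant)
}

# Toy data: 90/5/5 class split and a dummy that always votes for the majority.
toyTruth <- rep(c("A", "B", "C"), times = c(90, 5, 5))
toyDummy <- rep("A", length(toyTruth))

dummyAccuracy <- mean(toyDummy == toyTruth)
dummyKappa    <- cohenKappa(toyDummy, toyTruth)
perfectKappa  <- cohenKappa(toyTruth, toyTruth)
print(c(dummyAccuracy, dummyKappa, perfectKappa))
```

On this toy data the dummy classifier reaches an accuracy of 0.9 but a kappa of 0, while perfect predictions give a kappa of 1, which is why kappa is the safer number to compare across classifiers.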



Published at DZone with permission of
