
Multi-level Classification, Cohen Kappa, Krippendorff Alpha, and Cancer


There are two interesting methods in R to compute agreement between raters; here they are used to test the performance of classifiers that predict 33 cancer types.


I was facing an interesting problem last week. Playing with data from The Cancer Genome Atlas (full genetic and clinical data for thousands of patients), I was building a classifier that predicts the type of cancer based on sets of genetic signatures.

In the PANCAN33 subset there are samples for 33 different types of cancer, and the classifier should be able to assign a new sample to one of these 33 classes. I tried different methods like random forest, SVM, bgmm, and a few others, and ended up with a collection of classifiers. How to choose the best one?

We need a method that computes the agreement between classifier predictions and the true labels/cancer types. For binary classifiers there are many commonly used metrics like precision, recall, accuracy, etc. But here we have 33 classes, so the confusion matrix is 33×33 cells large: a lot of numbers to compare.

Of course, there are some straightforward solutions, like the fraction of samples on which the classifier correctly guesses the true label. But such simple metrics suffer when the class distribution is unequal (quite common): they may score highly even for a dummy classifier that always votes for the most common class. It is better to avoid such metrics.
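To see why raw accuracy misleads on imbalanced classes, here is a minimal sketch with made-up toy labels (not the TCGA data):

```r
# Toy imbalanced data: 90 samples of class "A", 10 of class "B"
trueLabels <- factor(c(rep("A", 90), rep("B", 10)))

# A dummy classifier that always votes for the most common class
dummyPredictions <- factor(rep("A", 100), levels = levels(trueLabels))

# Plain accuracy looks impressive...
accuracy <- mean(dummyPredictions == trueLabels)
accuracy  # 0.9

# ...even though the classifier never identifies class "B" at all
table(dummyPredictions, trueLabels)
```

With 33 classes of very different sizes, the same effect lets a useless classifier post a deceptively high accuracy.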

Other Measures of Agreement

Actually, I used two interesting ones: Cohen's Kappa and Krippendorff's Alpha. They take into account the distribution of votes for each rater. Moreover, Krippendorff's Alpha handles missing data (you can find more information here).

Both coefficients are widely used by psychometricians (e.g. to assess how well two psychiatrists agree on a diagnosis). Here we use them to estimate the performance of a classifier. Both coefficients are implemented in the irr package.
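For intuition, Cohen's Kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's marginal class proportions: kappa = (p_o - p_e) / (1 - p_e). A hand-rolled sketch on toy vectors (the irr package does this for you, this is just to show the formula):

```r
# Two raters' labels on the same 10 subjects (toy data)
rater1 <- c("A", "A", "B", "B", "A", "A", "B", "A", "A", "B")
rater2 <- c("A", "A", "B", "A", "A", "A", "B", "A", "B", "B")

# Observed agreement: fraction of subjects with identical labels
p_o <- mean(rater1 == rater2)

# Chance agreement: product of marginal proportions, summed over classes
classes <- union(rater1, rater2)
p1 <- table(factor(rater1, classes)) / length(rater1)
p2 <- table(factor(rater2, classes)) / length(rater2)
p_e <- sum(p1 * p2)

kappa <- (p_o - p_e) / (1 - p_e)
kappa  # about 0.583: well above chance, but far from perfect
```

A dummy classifier that always predicts the majority class has p_o equal to p_e, so its kappa is 0, which is exactly the correction we want for unbalanced classes.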

Below you will find an example application:

library(irr)

# kappa2() expects a subjects x raters matrix, hence cbind()
kappa2(cbind(predictions, trueLabels))
# Cohen's Kappa for 2 Raters (Weights: unweighted)
# Subjects = 3599 
#   Raters = 2 
#    Kappa = 0.941 
#        z = 160 
#  p-value = 0 

# kripp.alpha() expects a raters x subjects matrix, hence rbind()
kripp.alpha(rbind(predictions, trueLabels))
# Krippendorff's alpha
# Subjects = 3599 
#   Raters = 2 
#    alpha = 0.941 
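Since kripp.alpha treats NA entries as missing data, it also works when one rater skipped some subjects. A small sketch with toy labels (nominal scale; assumes the irr package is installed):

```r
library(irr)

# Raters-by-subjects matrix; rater 2 did not label subjects 4 and 5
ratings <- rbind(
  rater1 = c(1, 1, 2, 2, 1, 3, 3, 1),
  rater2 = c(1, 1, 2, NA, NA, 3, 3, 2)
)

# Nominal-scale alpha; NA cells are simply excluded from the computation
kripp.alpha(ratings, method = "nominal")
```

This is handy when comparing classifiers that refuse to vote on some samples, a case where Cohen's Kappa would force you to drop those subjects entirely.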


Published at DZone with permission of
