# Association & Concordance Measures with R

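In order to define association measures or concordance measures, we first define a *concordance function* as follows.

Let $(X_1,Y_1)$ be a random pair with copula $C_1$, and $(X_2,Y_2)$ an independent random pair with copula $C_2$, both pairs having the same continuous margins. Then define

$$Q(C_1,C_2)=\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)>0\big)-\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)<0\big),$$

the so-called *concordance function*. Since the margins are continuous, ties have probability zero and the two probabilities sum to one. Thus

$$Q(C_1,C_2)=2\,\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)>0\big)-1.$$

As proved last week,

$$Q(C_1,C_2)=4\int_0^1\!\!\int_0^1 C_2(u,v)\,dC_1(u,v)-1.$$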

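Based on that function, several concordance measures can be derived. A popular measure is Kendall's tau, from Kendall (1938), defined as $\tau=Q(C,C)$, i.e.

$$\tau=4\int_0^1\!\!\int_0^1 C(u,v)\,dC(u,v)-1,$$

which is simply $4\,\mathbb{E}[C(U,V)]-1$, where $(U,V)$ has distribution $C$.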
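As a quick sanity check, Kendall's tau can be evaluated by hand for two textbook copulas, assuming the standard form $\tau=4\,\mathbb{E}[C(U,V)]-1$ with $(U,V)$ distributed according to $C$. The two copulas below (independence and comonotonicity) are chosen here for illustration and are not discussed in the text.

```latex
% Independence copula C(u,v) = uv, for which dC(u,v) = du dv:
\tau = 4\int_0^1\!\!\int_0^1 uv \,du\,dv - 1 = 4\cdot\tfrac{1}{4} - 1 = 0.

% Comonotone copula M(u,v) = \min(u,v), whose mass sits on the diagonal u = v:
\tau = 4\int_0^1 \min(u,u)\,du - 1 = 4\cdot\tfrac{1}{2} - 1 = 1.
```

Independent components give $\tau=0$ and perfectly increasing dependence gives $\tau=1$, as one would expect from a concordance measure.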

Here, computation can be tricky. Consider the following sample:

    set.seed(1)
    n=40
    library(mnormt)
    # bivariate Gaussian sample with correlation 0.4
    X=rmnorm(n,c(0,0),matrix(c(1,.4,.4,1),2,2))
    # pseudo-observations: ranks rescaled to (0,1)
    U=cbind(rank(X[,1]),rank(X[,2]))/(n+1)

Then, using R's built-in `cor` function, we can obtain Kendall's tau easily,

    cor(X,method="kendall")[1,2]
    # [1] 0.3794872

To get our own code (and to understand a bit more how to get that coefficient), we can use

    # enumerate all pairs (i,j) with i<j
    i=rep(1:(n-1),(n-1):1)
    j=2:n
    for(k in 3:n){j=c(j,k:n)}
    # each row holds one pair of observations: (x_i, y_i, x_j, y_j)
    M=cbind(X[i,],X[j,])
    concordant=sum((M[,1]-M[,3])*(M[,2]-M[,4])>0)
    discordant=sum((M[,1]-M[,3])*(M[,2]-M[,4])<0)
    total=n*(n-1)/2
    (K=(concordant-discordant)/total)
    # [1] 0.3794872

or the following (we'll use the random variable $Z$ quite frequently),

    # all n^2 ordered pairs (i,j)
    i=rep(1:n,each=n)
    j=rep(1:n,n)
    # TRUE when observation i dominates observation j in both coordinates
    Z=((X[i,1]>X[j,1])&(X[i,2]>X[j,2]))
    (K=4*mean(Z)*n/(n-1)-1)
    # [1] 0.3794872
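The factor `n/(n-1)` in the last expression may look arbitrary, so here is a short derivation of why both computations return the same value. The notation is introduced here for illustration: $N_c$ and $N_d$ are the numbers of concordant and discordant pairs among the $n(n-1)/2$ unordered pairs, and the sample is assumed to have no ties.

```latex
% Each concordant unordered pair {i,j} is counted exactly once among the
% n^2 ordered pairs (the ordering in which i dominates j), so
\overline{Z} = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n Z_{ij} = \frac{N_c}{n^2}.

% Without ties, N_c + N_d = n(n-1)/2, hence
\frac{N_c - N_d}{n(n-1)/2}
  = \frac{2N_c - n(n-1)/2}{n(n-1)/2}
  = \frac{4N_c}{n(n-1)} - 1
  = 4\,\overline{Z}\,\frac{n}{n-1} - 1.
```

The second snippet is therefore an empirical version of $4\,\mathbb{E}[C(U,V)]-1$, with the factor $n/(n-1)$ correcting for the $n$ diagonal pairs $(i,i)$ that can never be concordant.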

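Another measure is Spearman's rank correlation, from Spearman (1904),

$$\rho=3\Big[\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)>0\big)-\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)<0\big)\Big],$$

where $(X_1,Y_1)$ has copula $C$ and $(X_2,Y_2)$, independent of $(X_1,Y_1)$, has the same margins but independent components, i.e. copula $C^\perp(u,v)=uv$.

Here, $\rho=3\,Q(C,C^\perp)$, which leads to the following expressions

$$\rho=12\int_0^1\!\!\int_0^1 C(u,v)\,du\,dv-3=12\int_0^1\!\!\int_0^1 uv\,dC(u,v)-3=12\,\mathbb{E}[UV]-3.$$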

Numerically, we have the following

    cor(X,method="spearman")[1,2]
    # [1] 0.5388368
    # same value: Pearson correlation computed on the ranks
    cor(rank(X[,1]),rank(X[,2]))
    # [1] 0.5388368
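To see what `cor(..., method="spearman")` computes, the sketch below reproduces it from the ranks in two ways: a finite-sample analogue of $12\,\mathbb{E}[UV]-3$, and the classical rank-difference formula. It assumes a sample without ties, and it simulates its own data with base R only, so the numbers will differ from those of the article's `mnormt` sample; the names `rho_prod`, `rho_diff` and `rho_ref` are illustrative.

```r
# Simulated data (assumption: a base-R stand-in for the article's sample)
set.seed(1)
n <- 40
x <- rnorm(n)
y <- 0.4 * x + sqrt(1 - 0.4^2) * rnorm(n)

Rx <- rank(x)
Ry <- rank(y)

# Finite-sample analogue of 12 E[UV] - 3 (exact when there are no ties)
rho_prod <- 12 * sum(Rx * Ry) / (n * (n^2 - 1)) - 3 * (n + 1) / (n - 1)

# Classical formula based on squared rank differences (no ties)
rho_diff <- 1 - 6 * sum((Rx - Ry)^2) / (n * (n^2 - 1))

# Reference value from base R
rho_ref <- cor(x, y, method = "spearman")

print(c(product = rho_prod, difference = rho_diff, reference = rho_ref))
```

All three values should agree up to floating-point error. With ties, the two closed-form expressions are no longer exact and the Pearson correlation of the ranks should be used instead.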

Note that it is also possible to write

$$\rho=3\int_0^1\!\!\int_0^1\Big([u+v-1]^2-[u-v]^2\Big)\,dC(u,v).$$

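Another measure is the cograduation index, from Gini (1914), obtained by substituting an $L^1$ norm for the $L^2$ one in the previous expression (and adjusting the normalizing constant),

$$\gamma=2\int_0^1\!\!\int_0^1\Big(|u+v-1|-|u-v|\Big)\,dC(u,v).$$

Note that this index can also be defined as $\gamma=Q(C,M)+Q(C,W)$, where $M(u,v)=\min(u,v)$ and $W(u,v)=\max(u+v-1,0)$ are the Fréchet-Hoeffding bounds. Here, on our sample,

    Rx=rank(X[,1]);Ry=rank(X[,2]);
    (G=2/(n^2)*(sum(abs(Rx+Ry-n-1))-sum(abs(Rx-Ry))))
    # [1] 0.41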

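Finally, another measure is the one from Blomqvist (1950). Let $\tilde{X}$ denote the median of $X$, i.e.

$$\mathbb{P}(X\leq\tilde{X})=\mathbb{P}(X>\tilde{X})=\frac{1}{2},$$

and similarly let $\tilde{Y}$ denote the median of $Y$. Then define

$$\beta=\mathbb{P}\big((X-\tilde{X})(Y-\tilde{Y})>0\big)-\mathbb{P}\big((X-\tilde{X})(Y-\tilde{Y})<0\big),$$

or equivalently

$$\beta=4\,C\Big(\frac{1}{2},\frac{1}{2}\Big)-1.$$

    Mx=median(X[,1]);My=median(X[,2])
    # empirical version of 4*C(1/2,1/2)-1
    (B=4*sum((X[,1]<=Mx)*((X[,2]<=My)))/n-1)
    # [1] 0.4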
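The two usual ways of writing Blomqvist's measure, as a difference of concordance probabilities around the medians and as $4\,C(1/2,1/2)-1$, can be compared numerically. The sketch below is illustrative only: it simulates its own data with base R (so the value differs from the one above), and it assumes an even sample size without ties, so that no observation sits exactly on a median. The names `beta_sign` and `beta_quad` are hypothetical.

```r
# Simulated data (assumption: a base-R stand-in for the article's sample)
set.seed(1)
n <- 40
x <- rnorm(n)
y <- 0.4 * x + sqrt(1 - 0.4^2) * rnorm(n)

mx <- median(x)
my <- median(y)

# Definition: P(concordant with the medians) - P(discordant with the medians)
beta_sign <- mean(sign((x - mx) * (y - my)))

# Copula form: 4 * C(1/2, 1/2) - 1, estimated by the lower-left quadrant frequency
beta_quad <- 4 * mean(x <= mx & y <= my) - 1

print(c(sign_form = beta_sign, quadrant_form = beta_quad))
```

With an even `n` and no ties, exactly half of the points fall below each median, so the lower-left and upper-right quadrants contain the same number of points and the two expressions coincide.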

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.