Association & Concordance Measures with R

To define association (or concordance) measures, we first define a concordance function as follows:

Let http://freakonometrics.blog.free.fr/public/perso6/conc-28.gif be a random pair with copula http://freakonometrics.blog.free.fr/public/perso6/conc-27.gif, and http://freakonometrics.blog.free.fr/public/perso6/conc-29.gif with copula http://freakonometrics.blog.free.fr/public/perso6/conc-26.gif. Then define

http://freakonometrics.blog.free.fr/public/perso6/cibc-25.gif

the so-called concordance function. Thus

http://freakonometrics.blog.free.fr/public/perso6/conc-23.gif

As proved last week,

http://freakonometrics.blog.free.fr/public/perso6/conc-24.gif

Based on that function, several concordance measures can be derived. A popular measure is Kendall's tau, from Kendall (1938), defined as http://freakonometrics.blog.free.fr/public/perso6/conc-22.gif i.e.

 http://freakonometrics.blog.free.fr/public/perso6/conc-21.gif

which is simply http://freakonometrics.blog.free.fr/public/perso6/conc-20.gif.
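For readers who cannot see the images, here are the standard forms of these quantities from the copula literature. The notation is an assumption and may differ from the original figures: the two pairs (X_1, Y_1) and (X_2, Y_2) are taken to be independent with common continuous margins, and (U, V) is a pair with distribution C.

```latex
Q(C_1, C_2)
  = \mathbb{P}\big[(X_1 - X_2)(Y_1 - Y_2) > 0\big]
  - \mathbb{P}\big[(X_1 - X_2)(Y_1 - Y_2) < 0\big]
  = 4 \iint_{[0,1]^2} C_2(u, v)\, dC_1(u, v) - 1,
\qquad
\tau = Q(C, C) = 4\, \mathbb{E}\big[C(U, V)\big] - 1 .
```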

Here, computation can be tricky. Consider the following sample:

> set.seed(1)
> n=40
> library(mnormt)
> X=rmnorm(n,c(0,0),
+ matrix(c(1,.4,.4,1),2,2))
> U=cbind(rank(X[,1]),rank(X[,2]))/(n+1)

Then, using R's built-in cor function, we can obtain Kendall's tau easily,

> cor(X,method="kendall")[1,2]
[1] 0.3794872

To write our own code (and to better understand how that coefficient is computed), we can enumerate all pairs of observations and count the concordant and discordant ones,

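> # indices of all pairs (i,j) with i<j
> i=rep(1:(n-1),(n-1):1)
> j=2:n
> for(k in 3:n){j=c(j,k:n)}
> # each row of M holds one pair of observations: (x_i, y_i, x_j, y_j)
> M=cbind(X[i,],X[j,])
> # a pair is concordant when both coordinates are ordered the same way
> concordant=sum((M[,1]-M[,3])*(M[,2]-M[,4])>0)
> discordant=sum((M[,1]-M[,3])*(M[,2]-M[,4])<0)
> total=n*(n-1)/2
> (K=(concordant-discordant)/total)
[1] 0.3794872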

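or, using all n^2 ordered pairs (we'll use random variable http://freakonometrics.blog.free.fr/public/perso6/conc-30.gif quite frequently; the factor n/(n-1) corrects for the n pairs with i=j, which are never counted as concordant),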

> i=rep(1:n,each=n)
> j=rep(1:n,n)
> Z=((X[i,1]>X[j,1])&(X[i,2]>X[j,2]))
> (K=4*mean(Z)*n/(n-1)-1)
[1] 0.3794872
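The pair counting above can also be wrapped in a small function. What follows is only a minimal sketch, not the algorithm used internally by cor: the name kendall_tau and the simulated data are illustrative assumptions, and the formula assumes there are no ties.

```r
# Hypothetical helper: Kendall's tau from pairwise sign products (assumes no ties).
kendall_tau <- function(x, y) {
  n <- length(x)
  # sign of (x_i - x_j) * (y_i - y_j) for every ordered pair (i, j)
  s <- sign(outer(x, x, "-") * outer(y, y, "-"))
  # diagonal entries are 0, and each unordered pair appears twice in s
  sum(s) / (n * (n - 1))
}

# Illustrative data (not the sample used in the text)
set.seed(1)
x <- rnorm(40)
y <- 0.4 * x + rnorm(40)
kendall_tau(x, y)
cor(x, y, method = "kendall")
```

The two printed values should agree. The outer-product version needs memory of order n^2, so it is only reasonable for small samples.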

Another measure is Spearman's rank correlation, from Spearman (1904),

http://freakonometrics.blog.free.fr/public/perso6/conc-05.gif

where http://freakonometrics.blog.free.fr/public/perso6/conc-19.gif has distribution http://freakonometrics.blog.free.fr/public/perso6/conc-17.gif.

Here, http://freakonometrics.blog.free.fr/public/perso6/conc-07.gif which leads to the following expressions

http://freakonometrics.blog.free.fr/public/perso6/conc-06.gif

Numerically, we have the following

> cor(X,method="spearman")[1,2]
[1] 0.5388368
> cor(rank(X[,1]),rank(X[,2]))
[1] 0.5388368

Note that it is also possible to write

http://freakonometrics.blog.free.fr/public/perso6/conc-04.gif
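Since the expression above is only available as an image, the sketch below checks numerically two standard rank-based expressions for Spearman's correlation, which may or may not match the notation of the image. The simulated data and the names rho_a, rho_b and rho_c are illustrative assumptions, and both formulas assume there are no ties.

```r
# Sketch: Spearman's rho from ranks, written three equivalent ways (assumes no ties).
set.seed(1)
n <- 40
x <- rnorm(n)
y <- 0.4 * x + rnorm(n)
Rx <- rank(x); Ry <- rank(y)

# (a) Pearson correlation of the ranks
rho_a <- cor(Rx, Ry)
# (b) classical formula based on squared rank differences
rho_b <- 1 - 6 * sum((Rx - Ry)^2) / (n * (n^2 - 1))
# (c) difference of two sums of squares (an L2-type expression)
rho_c <- 3 / (n^3 - n) * (sum((Rx + Ry - n - 1)^2) - sum((Rx - Ry)^2))

c(rho_a, rho_b, rho_c)
```

The three values should coincide up to floating-point error.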

Another measure is the cograduation index, from Gini (1914), obtained by substituting an L1 norm for the L2 one in the previous expression,

http://freakonometrics.blog.free.fr/public/perso6/concord-01.gif

Note that this index can also be defined as http://freakonometrics.blog.free.fr/public/perso6/concor-02.gif. Here,

> Rx=rank(X[,1]);Ry=rank(X[,2]);
> (G=2/(n^2) *(sum(abs(Rx+Ry-n-1))-
+ sum(abs(Rx-Ry))))
[1] 0.41
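The factor 2/n^2 used above is appropriate here because n is even. A minimal sketch of a reusable version follows, assuming the usual normalization by floor(n^2/2) so that odd sample sizes are also covered; the name gini_gamma is hypothetical and the function assumes there are no ties.

```r
# Hypothetical helper: Gini's cograduation index (assumes no ties).
gini_gamma <- function(x, y) {
  n <- length(x)
  Rx <- rank(x); Ry <- rank(y)
  # floor(n^2 / 2) equals n^2 / 2 for even n, matching the 2/n^2 factor above
  (sum(abs(Rx + Ry - n - 1)) - sum(abs(Rx - Ry))) / floor(n^2 / 2)
}

gini_gamma(1:7, 1:7)   # perfectly concordant sample
gini_gamma(1:7, 7:1)   # perfectly discordant sample
```

With this normalization, the two calls return 1 and -1 respectively, which is the expected range for a concordance measure.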

Finally, another measure is the one from Blomqvist (1950). Let http://freakonometrics.blog.free.fr/public/perso6/conc-10.gif denote the median of http://freakonometrics.blog.free.fr/public/perso6/conc-12.gif, i.e.

http://freakonometrics.blog.free.fr/public/perso6/conc-15.gif

Then define

http://freakonometrics.blog.free.fr/public/perso6/conc-09.gif

or equivalently

http://freakonometrics.blog.free.fr/public/perso6/conc-08.gif

> Mx=median(X[,1]);My=median(X[,2])
> (B=4*sum((X[,1]<=Mx)*((X[,2]<=My)))/n-1)
[1] 0.4
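Blomqvist's coefficient can also be read as the proportion of points in the concordant quadrants (around the two medians) minus the proportion in the discordant ones. A minimal sketch of that reading follows; the name blomqvist_beta and the simulated data are illustrative assumptions, and the comparison with the quadrant count assumes an even sample size and no ties, so that no observation sits exactly on a median.

```r
# Hypothetical helper: Blomqvist's beta as concordant-minus-discordant quadrant frequency.
blomqvist_beta <- function(x, y) {
  # the sign is +1 in the concordant quadrants and -1 in the discordant ones
  mean(sign((x - median(x)) * (y - median(y))))
}

# Illustrative data (not the sample used in the text)
set.seed(1)
n <- 40
x <- rnorm(n)
y <- 0.4 * x + rnorm(n)
blomqvist_beta(x, y)
# same value from the lower-left quadrant count, as in the code above
4 * sum((x <= median(x)) * (y <= median(y))) / n - 1
```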

Published at DZone with permission of Arthur Charpentier , DZone MVB .
