Association & Concordance Measures with R


To define association (or concordance) measures, we first define a concordance function, as follows:

Let $(X_1,Y_1)$ be a random pair with copula $C_1$, and $(X_2,Y_2)$ an independent random pair with copula $C_2$, both pairs having the same continuous margins. Then define

$$Q = Q(C_1,C_2) = \mathbb{P}\big((X_1-X_2)(Y_1-Y_2)>0\big) - \mathbb{P}\big((X_1-X_2)(Y_1-Y_2)<0\big),$$

the so-called concordance function. Thus, since ties occur with probability zero,

$$Q(C_1,C_2) = 2\,\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)>0\big) - 1.$$

As proved last week,

$$Q(C_1,C_2) = 4\int_0^1\!\!\int_0^1 C_2(u,v)\,dC_1(u,v) - 1.$$

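The definition of $Q$ can be checked by simulation. Below is a minimal sketch in R, where `concordance_Q` is a hypothetical helper (not a package function): it takes two independent samples, assumed to come from copulas $C_1$ and $C_2$, compares every row of the first with every row of the second, and averages the signs of $(X_1-X_2)(Y_1-Y_2)$. With samples from the Fréchet-Hoeffding bounds $M(u,v)=\min(u,v)$ and $W(u,v)=\max(u+v-1,0)$ the result is exact.

```r
# Hypothetical helper: empirical concordance function Q(C1, C2).
# s1, s2: two independent samples (2-column matrices), assumed to be
# drawn from copulas C1 and C2 respectively.
concordance_Q <- function(s1, s2) {
  dx <- outer(s1[, 1], s2[, 1], "-")  # X1 - X2 for every cross pair
  dy <- outer(s1[, 2], s2[, 2], "-")  # Y1 - Y2 for every cross pair
  # +1 for a concordant pair, -1 for a discordant one (0 on ties)
  mean(sign(dx * dy))
}

set.seed(1)
u <- runif(200)
v <- runif(200)
M1 <- cbind(u, u); M2 <- cbind(v, v)          # samples from M(u,v) = min(u,v)
W1 <- cbind(u, 1 - u); W2 <- cbind(v, 1 - v)  # samples from W(u,v) = max(u+v-1, 0)
concordance_Q(M1, M2)  # 1: every cross pair is concordant
concordance_Q(W1, W2)  # -1: every cross pair is discordant
```

For two independent samples from the independence copula, the value should only be close to 0, up to sampling noise.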
Based on that function, several concordance measures can be derived. A popular one is Kendall's tau, from Kendall (1938), defined as $\tau = Q(C,C)$, i.e.

$$\tau = 4\int_0^1\!\!\int_0^1 C(u,v)\,dC(u,v) - 1,$$

which is simply $4\,\mathbb{E}[C(U,V)] - 1$, where $(U,V)$ has distribution $C$.

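As a quick check of $\tau = 4\,\mathbb{E}[C(U,V)] - 1$, two cases can be worked out by hand: the independence copula $\Pi(u,v)=uv$, whose density is 1 on the unit square, and the upper Fréchet-Hoeffding bound $M(u,v)=\min(u,v)$, whose mass lies on the diagonal $u=v$ (so that $U=V$).

```latex
\tau(\Pi) = 4\int_0^1\!\!\int_0^1 uv\,du\,dv - 1 = 4\cdot\frac{1}{4} - 1 = 0,
\qquad
\tau(M) = 4\,\mathbb{E}\big[\min(U,U)\big] - 1 = 4\int_0^1 u\,du - 1 = 4\cdot\frac{1}{2} - 1 = 1.
```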
Computing the empirical version requires some care. Consider the following sample:

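> n=40
> library(mnormt)
> # n draws from a bivariate Gaussian vector with correlation 0.4
> X=rmnorm(n,c(0,0),
+ matrix(c(1,.4,.4,1),2,2))
> # pseudo-observations: ranks rescaled to (0,1)
> U=cbind(rank(X[,1]),rank(X[,2]))/(n+1)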

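The mnormt package is only used here to draw the Gaussian sample. If it is not installed, a similar sample can be drawn with base R through a Cholesky factor of the correlation matrix. `rbinorm` below is a hypothetical helper written for this illustration, and since no seed is set, the simulated values (and hence the coefficients printed below) will differ from run to run.

```r
# Hypothetical helper: n draws from a centered bivariate Gaussian vector
# with unit variances and correlation r, using base R only.
rbinorm <- function(n, r) {
  Sigma <- matrix(c(1, r, r, 1), 2, 2)
  Z <- matrix(rnorm(2 * n), n, 2)  # independent N(0,1) entries
  Z %*% chol(Sigma)                # chol() gives R with t(R) %*% R = Sigma, so rows have covariance Sigma
}

n <- 40
X <- rbinorm(n, 0.4)
# pseudo-observations: ranks rescaled to (0,1)
U <- cbind(rank(X[, 1]), rank(X[, 2])) / (n + 1)
```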
Using the base R function cor(), Kendall's tau is easy to obtain:

> cor(X,method="kendall")[1,2]
[1] 0.3794872

To write our own code (and to understand a bit better how that coefficient is obtained), we can count concordant and discordant pairs directly:

> # enumerate all pairs (i,j) with i<j
> i=rep(1:(n-1),(n-1):1)
> j=2:n
> for(k in 3:n){j=c(j,k:n)}
> M=cbind(X[i,],X[j,])
> # concordant: both coordinates move in the same direction
> concordant=sum((M[,1]-M[,3])*(M[,2]-M[,4])>0)
> discordant=sum((M[,1]-M[,3])*(M[,2]-M[,4])<0)
> total=n*(n-1)/2
> (K=(concordant-discordant)/total)
[1] 0.3794872

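The loop over pairs can also be written with `outer()`, which builds the full $n\times n$ matrix of pairwise comparisons. The sketch below (the name `kendall_tau` is hypothetical) computes the tau-a version, (concordant - discordant) / (n(n-1)/2); on a sample without ties, such as a continuous one, it should match `cor(..., method = "kendall")`. Memory grows like $n^2$, so it is only meant for moderate sample sizes.

```r
# Kendall's tau (tau-a) from all ordered pairs at once:
# sign(x_i - x_j) * sign(y_i - y_j) is +1 for a concordant pair,
# -1 for a discordant one, and 0 on the diagonal (i = j) or on ties.
kendall_tau <- function(x, y) {
  n <- length(x)
  S <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
  sum(S) / (n * (n - 1))  # each unordered pair appears twice in S
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)
kendall_tau(x, y)              # 0.6: 8 concordant and 2 discordant pairs
cor(x, y, method = "kendall")  # same value, since there are no ties
```

Applied to the sample above, `kendall_tau(X[,1], X[,2])` should reproduce the value returned by `cor()`.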
or the following, based on $\tau = 4\,\mathbb{E}[C(U,V)] - 1$ (we will use the random variable $Z = C(U,V)$ quite frequently):

> i=rep(1:n,each=n)
> j=rep(1:n,n)
> Z=((X[i,1]>X[j,1])&(X[i,2]>X[j,2]))
> (K=4*mean(Z)*n/(n-1)-1)
[1] 0.3794872

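The factor $n/(n-1)$ in the last computation can be justified as follows. Let $c$ and $d$ be the numbers of concordant and discordant (unordered) pairs, and $N = n(n-1)/2$, so that $c + d = N$ when there are no ties. Each concordant pair contributes exactly once to the double sum behind `mean(Z)` (in the order where observation $i$ is above observation $j$ in both coordinates), and the diagonal $i=j$ contributes nothing:

```latex
\overline{Z} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\mathbf{1}\{X_{i1}>X_{j1},\,X_{i2}>X_{j2}\} = \frac{c}{n^2},
\qquad
4\,\overline{Z}\,\frac{n}{n-1} - 1 = \frac{4c}{n(n-1)} - 1 = \frac{2c}{N} - 1 = \frac{c-(N-c)}{N} = \frac{c-d}{N}.
```

This is exactly the ratio computed from the concordant and discordant counts.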
Another measure is Spearman's rank correlation, from Spearman (1904),

$$\rho = 3\,Q(C,\Pi) = 12\int_0^1\!\!\int_0^1 C(u,v)\,du\,dv - 3 = 12\,\mathbb{E}(UV) - 3,$$

where $\Pi(u,v)=uv$ is the independence copula and $(U,V)$ has distribution $C$.

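Here, $U$ and $V$ are uniform on $[0,1]$, so $\mathbb{E}(U)=\mathbb{E}(V)=1/2$ and $\text{Var}(U)=\text{Var}(V)=1/12$, which leads to the following expressions

$$\rho = 12\,\mathbb{E}(UV) - 3 = \frac{\mathbb{E}(UV) - \mathbb{E}(U)\,\mathbb{E}(V)}{\sqrt{\text{Var}(U)\,\text{Var}(V)}} = \text{corr}(U,V).$$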

Numerically, we have the following

> cor(X,method="spearman")[1,2]
[1] 0.5388368
> cor(rank(X[,1]),rank(X[,2]))
[1] 0.5388368

Note that it is also possible to write

$$\rho = 3\,\mathbb{E}\big[(U+V-1)^2 - (U-V)^2\big] = 1 - 6\,\mathbb{E}\big[(U-V)^2\big].$$

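Since $U$ and $V$ are uniform, $\rho$ can also be written $1 - 6\,\mathbb{E}[(U-V)^2]$, whose empirical counterpart in terms of ranks $R_i$ and $S_i$ is the classical formula $1 - 6\sum_i (R_i - S_i)^2/(n(n^2-1))$. The normalization $n(n^2-1)$ is what makes it coincide with the correlation between ranks when there are no ties. A minimal sketch (`spearman_rho` is a hypothetical name):

```r
# Rank-difference formula for Spearman's rho:
# rho_n = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), with d_i = R_i - S_i.
# It matches cor(..., method = "spearman") only when there are no ties.
spearman_rho <- function(x, y) {
  n <- length(x)
  d <- rank(x) - rank(y)
  1 - 6 * sum(d^2) / (n * (n^2 - 1))
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)
spearman_rho(x, y)              # 0.8
cor(x, y, method = "spearman")  # 0.8 as well
```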
Another measure is the cograduation index, from Gini (1914), obtained by substituting an L1 norm for the L2 norm in the previous expression,

$$\gamma = 2\,\mathbb{E}\big[\,|U+V-1| - |U-V|\,\big].$$

Note that this index can also be defined as $\gamma = Q(C,M) + Q(C,W)$, where $M(u,v)=\min(u,v)$ and $W(u,v)=\max(u+v-1,0)$ are the Fréchet-Hoeffding bounds. Numerically, using ranks:

> Rx=rank(X[,1]);Ry=rank(X[,2]);
> (G=2/(n^2) *(sum(abs(Rx+Ry-n-1))-
+ sum(abs(Rx-Ry))))
[1] 0.41

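With the $2/n^2$ scaling used above, a perfectly concordant sample gives exactly 1 only when $n$ is even (for odd $n$ it gives $1 - 1/n^2$). A variant divides by $\lfloor n^2/2\rfloor$ instead, which makes the index exactly $+1$ or $-1$ for perfectly concordant or discordant ranks. The sketch below uses that normalization; `gini_gamma` is a hypothetical name, and for $n = 40$ the two versions coincide.

```r
# Gini's cograduation index from ranks, normalized by floor(n^2 / 2),
# so that it equals +1 (resp. -1) for perfectly concordant (resp. discordant) ranks.
gini_gamma <- function(x, y) {
  n <- length(x)
  p <- rank(x); q <- rank(y)
  (sum(abs(p + q - n - 1)) - sum(abs(p - q))) / floor(n^2 / 2)
}

x <- c(3, 1, 4, 1.5, 9)                # n = 5 (odd), no ties
gini_gamma(x, x)                       # 1
gini_gamma(x, -x)                      # -1
2 / 5^2 * sum(abs(2 * rank(x) - 6))    # the 2/n^2 scaling gives 0.96 here
```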
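Finally, another measure is the one from Blomqvist (1950). Let $\tilde{x}$ denote the median of $X$, i.e.

$$\mathbb{P}(X \le \tilde{x}) = \frac{1}{2},$$

and similarly $\tilde{y}$ for $Y$, where $(X,Y)$ has copula $C$. Then define

$$\beta = \mathbb{P}\big((X-\tilde{x})(Y-\tilde{y})>0\big) - \mathbb{P}\big((X-\tilde{x})(Y-\tilde{y})<0\big),$$

or equivalently

$$\beta = 4\,C\Big(\frac{1}{2},\frac{1}{2}\Big) - 1.$$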

> Mx=median(X[,1]);My=median(X[,2])
> (B=4*sum((X[,1]<=Mx)*((X[,2]<=My)))/n-1)
[1] 0.4

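Blomqvist's beta can also be estimated from signs around the medians, as the proportion of points in the concordant quadrants minus the proportion in the discordant ones. A minimal sketch (`blomqvist_beta` is a hypothetical name): when $n$ is even and there are no ties, exactly half of the observations lie below each median, and this version coincides with the `4*sum(...)/n-1` computation above; for odd $n$ the median is itself an observation and the two can differ slightly.

```r
# Blomqvist's beta from the signs of (x - median(x)) * (y - median(y)):
# +1 in the concordant quadrants, -1 in the discordant ones,
# 0 for points lying exactly on a median line.
blomqvist_beta <- function(x, y) {
  s <- sign((x - median(x)) * (y - median(y)))
  mean(s)
}

x <- c(1, 2, 3, 4)
y <- c(1, 3, 2, 4)
blomqvist_beta(x, y)                           # 0
4 * mean(x <= median(x) & y <= median(y)) - 1  # 0, the 4*C(1/2,1/2)-1 form
```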

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.
