# Association & Concordance Measures with R


To define association measures or concordance measures, we first define a concordance function as follows:

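Let $(X_1,Y_1)$ be a random pair with copula $C_1$, and $(X_2,Y_2)$ an independent pair with copula $C_2$, both pairs having the same continuous margins. Then define

$$Q(C_1,C_2)=\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)>0\big)-\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)<0\big),$$

the so-called concordance function. Thus

$$Q(C_1,C_2)=2\,\mathbb{P}\big((X_1-X_2)(Y_1-Y_2)>0\big)-1.$$

As proved last week,

$$Q(C_1,C_2)=4\iint_{[0,1]^2}C_2(u,v)\,dC_1(u,v)-1.$$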

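Based on that function, several concordance measures can be derived. A popular measure is Kendall's tau, from Kendall (1938), defined as $\tau=Q(C,C)$, i.e.

$$\tau=4\iint_{[0,1]^2}C(u,v)\,dC(u,v)-1,$$

which is simply $4\,\mathbb{E}[C(U,V)]-1$ for a pair $(U,V)$ with distribution $C$.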
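As a rough numerical check, the concordance function $Q$, understood as the probability that two independent pairs are concordant minus the probability that they are discordant, can be estimated by Monte Carlo. The sketch below assumes two Gaussian pairs with standard normal margins and illustrative correlations `r1` and `r2` (values chosen here, not taken from the text). In that special case the coordinate differences are again bivariate Gaussian with correlation $(r_1+r_2)/2$, so $Q=\frac{2}{\pi}\arcsin\big((r_1+r_2)/2\big)$, and taking $r_1=r_2$ gives Kendall's tau. The helper names `rbinorm` and `Q_hat` are hypothetical.

```r
# Simulate N pairs from a bivariate Gaussian with correlation r (base R only)
rbinorm <- function(N, r) {
  z1 <- rnorm(N)
  z2 <- rnorm(N)
  cbind(z1, r * z1 + sqrt(1 - r^2) * z2)
}

# Monte Carlo estimate of Q: P(concordant) - P(discordant),
# pairing row k of P1 with row k of P2 (the two samples are independent)
Q_hat <- function(P1, P2) {
  s <- (P1[, 1] - P2[, 1]) * (P1[, 2] - P2[, 2])
  mean(s > 0) - mean(s < 0)
}

set.seed(1)
N  <- 1e5
r1 <- 0.4   # illustrative correlation of the first pair (assumption)
r2 <- 0.8   # illustrative correlation of the second pair (assumption)

Q_mc     <- Q_hat(rbinorm(N, r1), rbinorm(N, r2))
Q_theory <- 2 / pi * asin((r1 + r2) / 2)   # closed form, Gaussian case only
c(monte_carlo = Q_mc, closed_form = Q_theory)
```

The two numbers should agree up to Monte Carlo error, which is of order $1/\sqrt{N}$ here.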

Here, computation can be tricky. Consider the following sample:

```
> set.seed(1)
> n=40
> library(mnormt)
> X=rmnorm(n,c(0,0),
+ matrix(c(1,.4,.4,1),2,2))
> U=cbind(rank(X[,1]),rank(X[,2]))/(n+1)
```

Then, using R's built-in `cor` function, we can obtain Kendall's tau easily:

```
> cor(X,method="kendall")[1,2]
[1] 0.3794872
```

To get our own code (and to understand a bit more how to get that coefficient), we can use

```
> # enumerate all pairs (i,j) with i<j
> i=rep(1:(n-1),(n-1):1)
> j=2:n
> for(k in 3:n){j=c(j,k:n)}
> # one row per pair: columns 1-2 hold observation i, columns 3-4 observation j
> M=cbind(X[i,],X[j,])
> concordant=sum((M[,1]-M[,3])*(M[,2]-M[,4])>0)
> discordant=sum((M[,1]-M[,3])*(M[,2]-M[,4])<0)
> total=n*(n-1)/2
> (K=(concordant-discordant)/total)
[1] 0.3794872
```
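The same pair enumeration can be wrapped in a small reusable function. The version below is a sketch that uses `combn` to list the index pairs instead of building `i` and `j` by hand; the name `kendall_tau` is hypothetical, the data are simulated for illustration (not the sample above), and the function assumes there are no ties.

```r
# Kendall's tau from all n*(n-1)/2 unordered pairs (assumes no ties)
kendall_tau <- function(x, y) {
  idx <- combn(length(x), 2)                 # 2 x choose(n,2) matrix of index pairs
  s <- sign(x[idx[1, ]] - x[idx[2, ]]) *
       sign(y[idx[1, ]] - y[idx[2, ]])       # +1 if concordant, -1 if discordant
  sum(s) / ncol(idx)
}

set.seed(1)
x <- rnorm(40)
y <- 0.4 * x + sqrt(1 - 0.4^2) * rnorm(40)   # illustrative correlated sample
c(own_code = kendall_tau(x, y), builtin = cor(x, y, method = "kendall"))
```

Both entries should coincide, since without ties `cor(..., method = "kendall")` reduces to the same ratio.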

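or the following, based on the indicator variable `Z` that observation `i` exceeds observation `j` in both coordinates (we will use this kind of random variable quite frequently),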

```
> i=rep(1:n,each=n)
> j=rep(1:n,n)
> Z=((X[i,1]>X[j,1])&(X[i,2]>X[j,2]))
> (K=4*mean(Z)*n/(n-1)-1)
[1] 0.3794872
```
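The factor `n/(n-1)` in that last expression may look arbitrary, so here is a short derivation, assuming no ties. Write $N_c$ and $N_d$ for the numbers of concordant and discordant pairs among the $n(n-1)/2$ unordered pairs (notation introduced here). The mean of `Z` runs over all $n^2$ ordered pairs $(i,j)$, including $i=j$, and each concordant unordered pair contributes exactly one ordered pair with `Z` equal to 1, so $\overline{Z}=N_c/n^2$. Then

$$4\,\overline{Z}\,\frac{n}{n-1}-1=\frac{4N_c}{n(n-1)}-1=\frac{2N_c-(N_c+N_d)}{n(n-1)/2}=\frac{N_c-N_d}{n(n-1)/2},$$

where the middle step uses $N_c+N_d=n(n-1)/2$. This is the same ratio as the one computed from the explicit pair enumeration.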

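Another measure is Spearman's rank correlation, from Spearman (1904),

$$\rho=12\iint_{[0,1]^2}uv\,dC(u,v)-3=12\,\mathbb{E}(UV)-3,$$

where $(U,V)$ has distribution $C$.

Here, $\mathbb{E}(U)=\mathbb{E}(V)=1/2$ and $\text{Var}(U)=\text{Var}(V)=1/12$, which leads to the following expressions

$$\rho=\frac{\mathbb{E}(UV)-1/4}{1/12}=\frac{\mathbb{E}(UV)-\mathbb{E}(U)\mathbb{E}(V)}{\sqrt{\text{Var}(U)\text{Var}(V)}}=\text{corr}(U,V).$$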

Numerically, we have the following

```
> cor(X,method="spearman")[1,2]
[1] 0.5388368
> cor(rank(X[,1]),rank(X[,2]))
[1] 0.5388368
```
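Spearman's coefficient can also be computed from squared rank differences, using the classical formula $1-6\sum_i d_i^2/\big(n(n^2-1)\big)$, which is exact when there are no ties. The sketch below checks this against `cor`; the function name `spearman_rho` is hypothetical and the data are simulated for illustration rather than taken from the sample above.

```r
# Spearman's rho from squared rank differences (valid when there are no ties)
spearman_rho <- function(x, y) {
  n <- length(x)
  d <- rank(x) - rank(y)                     # rank differences
  1 - 6 * sum(d^2) / (n * (n^2 - 1))
}

set.seed(1)
x <- rnorm(40)
y <- 0.4 * x + sqrt(1 - 0.4^2) * rnorm(40)   # illustrative correlated sample
c(own_code = spearman_rho(x, y), builtin = cor(x, y, method = "spearman"))
```

With ties the two values would differ, because `cor` then works with average ranks and the shortcut formula no longer applies.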

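Note that it is also possible to write

$$\rho=3\iint_{[0,1]^2}\big([u+v-1]^2-[u-v]^2\big)\,dC(u,v).$$

Another measure is the cograduation index, from Gini (1914), obtained by substituting an L1 norm for the L2 norm in the previous expression,

$$\gamma=2\iint_{[0,1]^2}\big(|u+v-1|-|u-v|\big)\,dC(u,v).$$

Note that this index can also be defined as $\gamma=Q(C,M)+Q(C,W)$, where $M$ and $W$ denote the upper and lower Fréchet-Hoeffding bounds. Here,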

```
> Rx=rank(X[,1]);Ry=rank(X[,2]);
> (G=2/(n^2) *(sum(abs(Rx+Ry-n-1))-
+ sum(abs(Rx-Ry))))
[1] 0.41
```
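The normalisation `2/(n^2)` used above is the one that applies when `n` is even. A slightly more general sketch, assuming the usual sample normalisation by `floor(n^2/2)` (which equals `n^2/2` for even `n`), is given below; the function name `gini_gamma` is hypothetical.

```r
# Gini's cograduation index from ranks, normalised by floor(n^2 / 2)
# so that the extreme values are attained for both even and odd n
gini_gamma <- function(x, y) {
  n  <- length(x)
  rx <- rank(x)
  ry <- rank(y)
  (sum(abs(rx + ry - n - 1)) - sum(abs(rx - ry))) / floor(n^2 / 2)
}

gini_gamma(1:8, 1:8)   # perfectly concordant ranks, even n
gini_gamma(1:7, 7:1)   # perfectly discordant ranks, odd n
```

The two calls are sanity checks on the bounds: identical rankings give 1 and reversed rankings give -1.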

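Finally, another measure is the one from Blomqvist (1950). Let $\tilde{x}$ denote the median of $X$, i.e.

$$\mathbb{P}(X\leq\tilde{x})=\mathbb{P}(X>\tilde{x})=\frac{1}{2},$$

and let $\tilde{y}$ denote the median of $Y$. Then define

$$\beta=\mathbb{P}\big((X-\tilde{x})(Y-\tilde{y})>0\big)-\mathbb{P}\big((X-\tilde{x})(Y-\tilde{y})<0\big),$$

or equivalently

$$\beta=4\,C\left(\frac{1}{2},\frac{1}{2}\right)-1.$$

Numerically,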

```
> Mx=median(X[,1]);My=median(X[,2])
> (B=4*sum((X[,1]<=Mx)*((X[,2]<=My)))/n-1)
[1] 0.4
```
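Blomqvist's coefficient can equivalently be read as the share of observations in the two concordant quadrants around the medians minus the share in the two discordant ones. The sketch below compares that sign-based version with the quadrant-count formula used above; the name `blomqvist_beta` is hypothetical, the data are simulated for illustration, and the equality relies on an even sample size with no ties, so that no observation sits exactly on a median.

```r
# Blomqvist's beta as mean sign of (x - median) * (y - median)
blomqvist_beta <- function(x, y) {
  mean(sign((x - median(x)) * (y - median(y))))
}

set.seed(1)
x <- rnorm(40)
y <- 0.4 * x + sqrt(1 - 0.4^2) * rnorm(40)   # illustrative correlated sample

# quadrant-count version: 4 * (share in the lower-left quadrant) - 1
a <- sum(x <= median(x) & y <= median(y))
c(sign_version = blomqvist_beta(x, y), quadrant_version = 4 * a / length(x) - 1)
```

For odd `n` the median coincides with an observation and the two versions can differ slightly.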

Published at DZone with permission of
