# Heuristics on Correspondence Analysis

### Check out this cool data science example of heuristics on correspondence analysis with presidential election examples.

In the course on unsupervised techniques for data science, we’ve been using a dataset with one candidate in the 2002 French presidential election per row and one newspaper per column. To visualize that dataset, consider three candidates and three newspapers:

```
> base=read.table(
"http://freakonometrics.free.fr/election2002.txt",header=TRUE)
> sb=base[,c(2,3,4)]
> sb=sb[c(4,12,7),]
> (N=sb)
LeFigaro Liberation LeMonde
Jospin 7 41 26
Chirac 35 9 18
Mamere 1 10 7
```

The first part is based on a description of the rows. Each row is read as a conditional probability distribution over the set of newspapers, given the candidate:

```
> (L=N/apply(N,1,sum))
LeFigaro Liberation LeMonde
Jospin 0.09459459 0.5540541 0.3513514
Chirac 0.56451613 0.1451613 0.2903226
Mamere 0.05555556 0.5555556 0.3888889
```

The “average row” is the marginal distribution of the newspapers:

```
> (Lbar=apply(N,2,sum)/sum(N))
LeFigaro Liberation LeMonde
0.2792208 0.3896104 0.3311688
```

Those individuals can be visualized in the set of newspapers, that is, in the simplex of the newspaper space.

[Figure: candidates in the simplex of the newspaper space]

But we will not actually stay in the simplex. Instead, a PCA is run with weights on the individuals, reflecting the importance of the different candidates, and with weights in the scalar product, so that distances are chi-square distances rather than standard Euclidean ones:

```
> matL0=t(t(L)-Lbar)
> library(FactoMineR)
> acpL=PCA(matL0,scale.unit=FALSE,
+ row.w=(apply(N,1,sum)),
+ col.w=1/(apply(N,2,sum)))
> plot.PCA(acpL,choix="ind",ylim=c(-.02,.02))
```
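To make the chi-square distance concrete, here is a minimal base-R sketch, assuming the usual definition in which each squared difference between profiles is divided by the average profile `Lbar`. The PCA call above uses `1/colSums(N)`, which is proportional to `1/Lbar`, so the two differ only by the constant factor `sum(N)`. The helper `chi2_dist2` and the other new names are introduced here for illustration.

```r
# The 3x3 sub-table shown earlier, typed in by hand
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))
n        <- sum(N)
L        <- N / rowSums(N)     # row profiles
Lbar     <- colSums(N) / n     # average row
row_mass <- rowSums(N) / n     # weight of each candidate

# Squared chi-square distance between two profiles p and q
# (hypothetical helper, not part of FactoMineR)
chi2_dist2 <- function(p, q, center) sum((p - q)^2 / center)

d2_jospin_mamere <- chi2_dist2(L["Jospin", ], L["Mamere", ], Lbar)
d2_jospin_chirac <- chi2_dist2(L["Jospin", ], L["Chirac", ], Lbar)

# Squared distance of every candidate to the average row
d2_to_center <- apply(L, 1, chi2_dist2, q = Lbar, center = Lbar)

# Weighted sum of those squared distances (the total inertia of the cloud)
inertia_rows <- sum(row_mass * d2_to_center)

# Pearson chi-square statistic of the table, divided by the grand total
E            <- outer(rowSums(N), colSums(N)) / n
inertia_chi2 <- sum((N - E)^2 / E) / n

all.equal(inertia_rows, inertia_chi2)
```

The last comparison is what gives the distance its name: the weighted spread of the candidates around the average row is the chi-square statistic of the table divided by the total count.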

The second part is based on a description of the columns. Here, each column is read as a conditional probability distribution over the set of candidates, given the newspaper:

```
> (C=t(t(N)/apply(N,2,sum)))
LeFigaro Liberation LeMonde
Jospin 0.16279070 0.6833333 0.5098039
Chirac 0.81395349 0.1500000 0.3529412
Mamere 0.02325581 0.1666667 0.1372549
```

Here again, we can compute the “average column”:

```
> (Cbar=apply(N,1,sum)/sum(N))
Jospin Chirac Mamere
0.4805195 0.4025974 0.1168831
```

In the simplex of the candidate space, the newspapers are points.

[Figure: newspapers in the simplex of the candidate space]

But here again, we won’t use that simplex. We run a PCA with two vectors of weights: one to account for the weights of the newspapers, and one to get a chi-square distance:

```
> Cbar=apply(N,1,sum)/sum(N)
> matC0=C-Cbar
> acpC=PCA(t(matC0),scale.unit=FALSE,
+ row.w=(apply(N,2,sum)),
+ col.w=1/(apply(N,1,sum)))
```
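One way to see why the two analyses are linked is through the matrix of standardized residuals used in the standard formulation of correspondence analysis (this formulation is an assumption here, it is not spelled out in the text above). The sketch below, in base R, suggests that the row-side and column-side eigenvalues coincide; `S`, `r_mass` and `c_mass` are names introduced for illustration, and the eigenvalues may differ from those reported by the `PCA` calls by a constant factor coming from the scaling of the weights.

```r
# The 3x3 sub-table shown earlier, typed in by hand
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))

P      <- N / sum(N)       # joint frequencies
r_mass <- rowSums(P)       # weights of the candidates
c_mass <- colSums(P)       # weights of the newspapers

# Standardized residuals: (P - r c') rescaled by the square roots of the weights
S <- diag(1 / sqrt(r_mass)) %*% (P - outer(r_mass, c_mass)) %*% diag(1 / sqrt(c_mass))

# Row-side and column-side eigenvalues
eig_rows <- eigen(S %*% t(S), symmetric = TRUE)$values
eig_cols <- eigen(t(S) %*% S, symmetric = TRUE)$values

round(cbind(eig_rows, eig_cols), 6)

# Their sum is the chi-square statistic divided by the grand total
total_inertia <- sum(S^2)
```

With a 3×3 table, only two eigenvalues are non-zero, which is why a two-dimensional plot loses nothing here.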

Now, we can almost overlap the two projections. Almost, because the axes may sometimes be flipped between the two analyses (left and right, or top and bottom): if $u$ is a (unit) eigenvector, so is $-u$. Here, for instance, we should switch them.

`> CA(N)`
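The sign ambiguity can be checked directly. The following base-R sketch, which again assumes the standard SVD formulation of correspondence analysis rather than the internals of `CA`, verifies that a singular vector and its opposite are both valid, and that flipping an axis leaves the distances between candidates unchanged. All object names are introduced here for illustration.

```r
# The 3x3 sub-table shown earlier, typed in by hand
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))

P      <- N / sum(N)
r_mass <- rowSums(P)
c_mass <- colSums(P)
S <- diag(1 / sqrt(r_mass)) %*% (P - outer(r_mass, c_mass)) %*% diag(1 / sqrt(c_mass))

sv      <- svd(S)
u1      <- sv$u[, 1]       # first left singular vector (unit norm)
lambda1 <- sv$d[1]^2       # corresponding eigenvalue of S S'
M       <- S %*% t(S)

# Both u1 and -u1 satisfy M u = lambda u
res_plus  <- max(abs(M %*% u1    - lambda1 * u1))
res_minus <- max(abs(M %*% (-u1) - lambda1 * (-u1)))

# Coordinates on the first two axes; dividing by sqrt(mass) rescales each row
row_coord <- (sv$u[, 1:2] / sqrt(r_mass)) %*% diag(sv$d[1:2])
col_coord <- (sv$v[, 1:2] / sqrt(c_mass)) %*% diag(sv$d[1:2])
rownames(row_coord) <- rownames(N)
rownames(col_coord) <- colnames(N)

# Flip the first axis: the picture is mirrored, distances are unchanged
row_flipped <- row_coord
row_flipped[, 1] <- -row_coord[, 1]
all.equal(as.vector(dist(row_coord)), as.vector(dist(row_flipped)))
```

When overlaying a row plot and a column plot computed separately, it is therefore enough to multiply the offending axis of one of them by -1.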

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.
