In the course on unsupervised techniques for data science, we have been using a dataset on the 2002 French presidential election, with one candidate per row and one newspaper per column. To visualize that dataset, consider three candidates and three newspapers:

```
> base=read.table(
"http://freakonometrics.free.fr/election2002.txt",header=TRUE)
> sb=base[,c(2,3,4)]
> sb=sb[c(4,12,7),]
> (N=sb)
       LeFigaro Liberation LeMonde
Jospin        7         41      26
Chirac       35          9      18
Mamere        1         10       7
```

The first part is based on a description of the rows. Here, each row is read as a conditional probability distribution over the set of newspapers:

```
> (L=N/apply(N,1,sum))
         LeFigaro Liberation   LeMonde
Jospin 0.09459459  0.5540541 0.3513514
Chirac 0.56451613  0.1451613 0.2903226
Mamere 0.05555556  0.5555556 0.3888889
```

The “average row” is the marginal distribution of newspapers:

```
> (Lbar=apply(N,2,sum)/sum(N))
  LeFigaro Liberation    LeMonde
 0.2792208  0.3896104  0.3311688
```
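As a quick sanity check on the two computations above, here is a minimal base R sketch. It assumes the contingency table is typed in by hand from the printed output (instead of being read from the URL), and the names `r` and `weighted_mean` are illustrative only. It checks that the average row is the mean of the row profiles, weighted by the mass of each candidate.

```r
# Contingency table typed in by hand from the output printed above
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))

L    <- prop.table(N, margin = 1)  # row profiles: each row sums to 1
r    <- rowSums(N) / sum(N)        # mass (weight) of each candidate
Lbar <- colSums(N) / sum(N)        # average row

# Mass-weighted mean of the row profiles
weighted_mean <- drop(r %*% L)

# Should be TRUE: the weighted mean of the profiles is the average row
isTRUE(all.equal(unname(weighted_mean), unname(Lbar)))
```

This is why the rows get weights in the PCA that follows: with those weights, the cloud of row profiles is centered on the average row.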

Those individuals can be visualized in the set of newspapers: each row profile is nonnegative and sums to one, so the three candidates lie in the simplex of the newspaper space.

But we will not actually stay in the simplex. Instead, we run a PCA with weights on the individuals, to account for the relative importance of the different candidates, and weights in the scalar product, so that distances are chi-square distances rather than standard Euclidean distances:

```
> matL0=t(t(L)-Lbar)
> library(FactoMineR)
> acpL=PCA(matL0,scale.unit=FALSE,
+ row.w=(apply(N,1,sum)),
+ col.w=1/(apply(N,2,sum)))
> plot.PCA(acpL,choix="ind",ylim=c(-.02,.02))
```
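To make the chi-square distance concrete, here is a small sketch in base R. It assumes the usual definition, in which each squared difference between two row profiles is divided by the corresponding newspaper mass; the column weights passed to `PCA` above are proportional to these inverse masses. The helper `chisq_dist` is a hypothetical name introduced for illustration, and the table is typed in by hand.

```r
# Contingency table typed in by hand from the output printed above
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))

L    <- prop.table(N, margin = 1)  # row profiles
Lbar <- colSums(N) / sum(N)        # newspaper masses (average row)

# Hypothetical helper: chi-square distance between two profiles p and q,
# with squared differences weighted by 1 / w
chisq_dist <- function(p, q, w) sqrt(sum((p - q)^2 / w))

d_JM <- chisq_dist(L["Jospin", ], L["Mamere", ], Lbar)  # Jospin vs Mamere
d_JC <- chisq_dist(L["Jospin", ], L["Chirac", ], Lbar)  # Jospin vs Chirac

c(Jospin_Mamere = d_JM, Jospin_Chirac = d_JC)
```

With these profiles, Jospin and Mamere come out much closer to each other than Jospin and Chirac, which matches what the row profiles suggest by eye.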

The second part is based on a description of the columns. Here, each column is read as a conditional probability distribution over the set of candidates:

```
> (C=t(t(N)/apply(N,2,sum)))
         LeFigaro Liberation   LeMonde
Jospin 0.16279070  0.6833333 0.5098039
Chirac 0.81395349  0.1500000 0.3529412
Mamere 0.02325581  0.1666667 0.1372549
```

Here again, we can compute the “average column”:

```
> (Cbar=apply(N,1,sum)/sum(N))
   Jospin    Chirac    Mamere
0.4805195 0.4025974 0.1168831
```

The column profiles can likewise be plotted as points in the simplex of the candidate space.

But here again, we will not use that simplex. We consider a PCA with two vectors of weights: one to account for the weights of the newspapers, and one to get a chi-square distance:

```
> Cbar=apply(N,1,sum)/sum(N)
> matC0=C-Cbar
> acpC=PCA(t(matC0),scale.unit=FALSE,
+ row.w=(apply(N,2,sum)),
+ col.w=1/(apply(N,1,sum)))
```
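One way to see why the row analysis and the column analysis are two views of the same object is to compare their total weighted inertia. The sketch below is in base R, with the table typed in by hand; the names `r`, `cm`, `inertia_rows` and `inertia_cols` are illustrative. It computes the chi-square inertia of the row profiles around the average row, and of the column profiles around the average column.

```r
# Contingency table typed in by hand from the output printed above
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))

r  <- rowSums(N) / sum(N)       # candidate masses (the average column)
cm <- colSums(N) / sum(N)       # newspaper masses (the average row)
L  <- prop.table(N, margin = 1) # row profiles
C  <- prop.table(N, margin = 2) # column profiles

# Squared chi-square distance of each row profile to the average row,
# then averaged with the candidate masses
inertia_rows <- sum(r * rowSums(sweep(sweep(L, 2, cm, "-")^2, 2, cm, "/")))

# Squared chi-square distance of each column profile to the average column,
# then averaged with the newspaper masses
inertia_cols <- sum(cm * colSums(sweep(sweep(C, 1, r, "-")^2, 1, r, "/")))

c(rows = inertia_rows, columns = inertia_cols)
```

Both quantities equal the Pearson chi-square statistic of the table divided by the total count, so the two clouds carry the same total inertia.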

Now we can almost overlay the two projections. Almost, because the axes may come out flipped (left and right, or top and bottom): if $u$ is a (unit) eigenvector, so is $-u$. Here, for instance, we should switch them.
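The sign ambiguity can be checked numerically. The sketch below is only an illustration: it assumes, as a stand-in for the matrices diagonalized by the two PCAs, the symmetric matrix `A` built from the standardized residuals of the table (typed in by hand), and verifies that both `u` and `-u` satisfy the eigenvector equation for the same eigenvalue.

```r
# Contingency table typed in by hand from the output printed above
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))

P  <- N / sum(N)
r  <- rowSums(P)
cm <- colSums(P)

# Standardized residuals, and an illustrative symmetric matrix built from them
S <- (P - outer(r, cm)) / sqrt(outer(r, cm))
A <- S %*% t(S)

e      <- eigen(A, symmetric = TRUE)
u      <- e$vectors[, 1]   # unit eigenvector for the largest eigenvalue
lambda <- e$values[1]

# Both u and -u satisfy A u = lambda u, and both have unit norm
ok_plus  <- isTRUE(all.equal(as.numeric(A %*% u),    lambda * u))
ok_minus <- isTRUE(all.equal(as.numeric(A %*% (-u)), lambda * (-u)))
c(ok_plus = ok_plus, ok_minus = ok_minus, norm = sqrt(sum(u^2)))
```

Which of the two signs a routine returns is arbitrary, so the row plot and the column plot may need one axis mirrored before they line up.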

`> CA(N)`
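For reference, a correspondence analysis can also be sketched in base R through a singular value decomposition of the standardized residuals. This is a textbook formulation under stated assumptions, not necessarily how `CA` in FactoMineR is implemented; the table is typed in by hand and the names `P`, `r`, `cm`, `S`, `row_coord` and `col_coord` are illustrative.

```r
# Contingency table typed in by hand from the output printed above
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))

P  <- N / sum(N)   # joint relative frequencies
r  <- rowSums(P)   # candidate masses
cm <- colSums(P)   # newspaper masses

# Standardized residuals and their singular value decomposition
S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))
sv <- svd(S)

# Principal coordinates of rows and columns on the first two axes
row_coord <- sweep(sv$u[, 1:2], 1, sqrt(r),  "/") %*% diag(sv$d[1:2])
col_coord <- sweep(sv$v[, 1:2], 1, sqrt(cm), "/") %*% diag(sv$d[1:2])
rownames(row_coord) <- rownames(N)
rownames(col_coord) <- colnames(N)

total_inertia <- sum(sv$d^2)   # chi-square statistic divided by sum(N)
row_coord
col_coord
```

Because rows and columns come out of the same decomposition, they share the same axes and can be drawn on a single plot. The coordinates should be comparable to those of `CA(N)`, possibly up to a sign flip on each axis.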
