
Heuristics on Correspondence Analysis

A data science walk-through of the heuristics behind correspondence analysis, using data from the 2002 French presidential election.

by Arthur Charpentier · Mar. 14, 16 · Big Data Zone · Analysis

In the course on unsupervised techniques for data science, we have been using a dataset that crosses candidates in the 2002 French presidential election (one per row) with newspapers (one per column). To visualize it, consider only three candidates and three newspapers:

> base=read.table(
+   "http://freakonometrics.free.fr/election2002.txt",header=TRUE)
> sb=base[,c(2,3,4)]   # keep three newspaper columns
> sb=sb[c(4,12,7),]    # keep three candidate rows
> (N=sb)
       LeFigaro Liberation LeMonde
Jospin        7         41      26
Chirac       35          9      18
Mamere        1         10       7
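To follow the computations without downloading the file, the 3 × 3 sub-table can be typed in by hand. This sketch assumes the counts printed above and stores them as a matrix rather than the data frame returned by read.table, which does not change any of the computations. The totals it prints are the candidate and newspaper weights used later on.

```r
# Sub-table printed above, entered by hand (values copied from the output)
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7), nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))
n <- sum(N)   # grand total: 154
rowSums(N)    # candidate totals: 74, 62, 18
colSums(N)    # newspaper totals: 43, 60, 51
```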

The first part describes the rows. Here, each row is turned into a conditional distribution over the newspapers (the row profile):

> (L=N/apply(N,1,sum))
         LeFigaro Liberation   LeMonde
Jospin 0.09459459  0.5540541 0.3513514
Chirac 0.56451613  0.1451613 0.2903226
Mamere 0.05555556  0.5555556 0.3888889
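In base R, the same row profiles can be obtained with prop.table, using margin 1 for rows. The short sketch below reuses the hand-entered table and checks that every profile sums to 1, which is exactly what makes each candidate a point of the newspaper simplex.

```r
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7), nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))
# Row profiles: each count divided by its row total,
# same result as N/apply(N,1,sum)
L <- prop.table(N, margin = 1)
round(L, 4)
# Each profile sums to 1, so it is a point of the newspaper simplex
rowSums(L)
```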

The “average row” is the marginal distribution of the newspapers:

> (Lbar=apply(N,2,sum)/sum(N))
 LeFigaro Liberation   LeMonde
0.2792208  0.3896104 0.3311688
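The average row is not an arbitrary choice: it is the mean of the three row profiles, weighted by each candidate's share of the grand total (the row masses). A quick base-R check, assuming the same hand-entered table; the name r for the masses is our own.

```r
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7), nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))
L    <- prop.table(N, margin = 1)   # row profiles
r    <- rowSums(N) / sum(N)         # row masses: 74/154, 62/154, 18/154
Lbar <- colSums(N) / sum(N)         # average row, as in the article
# Weighted mean of the row profiles, sum_i r_i * L[i, ];
# r is recycled down each column, so row i is multiplied by r_i
Lbar_check <- colSums(L * r)
all.equal(Lbar, Lbar_check)
```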

If we visualize those individuals in the set of newspapers, i.e., in the simplex of the newspaper space, we get the following points.

[Figure: the three row profiles in the newspaper simplex]

But we will not actually stay in the simplex. Instead, we run a PCA with two sets of weights: weights on the individuals, which account for the importance of each candidate (the row totals), and weights for the scalar product (the inverse of the newspaper totals), so that the distance between profiles is related to the chi-square distance rather than to the standard Euclidean distance.
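The chi-square distance between two row profiles $i$ and $i'$ divides each squared difference by the corresponding entry of the average row, $d^2(i,i') = \sum_j (L_{ij}-L_{i'j})^2/\bar L_j$, so that differences on less-used newspapers count more. The sketch below compares candidates with a small helper, chi2_dist, which is a hypothetical name of our own (not a FactoMineR function). It also checks that the column weights 1/colSums(N) used in the article give the same distance up to the constant factor $1/\sqrt{n}$.

```r
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7), nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))
L    <- prop.table(N, margin = 1)   # row profiles
Lbar <- colSums(N) / sum(N)         # average row
# Hypothetical helper: Euclidean distance with each squared term divided by w
chi2_dist <- function(a, b, w) sqrt(sum((a - b)^2 / w))
d_JC <- chi2_dist(L["Jospin", ], L["Chirac", ], Lbar)   # about 1.11
d_JM <- chi2_dist(L["Jospin", ], L["Mamere", ], Lbar)   # about 0.10
# With weights 1/colSums(N), as in the article's col.w,
# every distance is divided by sqrt(n)
d_JC_article <- chi2_dist(L["Jospin", ], L["Chirac", ], colSums(N))
all.equal(d_JC_article, d_JC / sqrt(sum(N)))
```

Jospin's profile is much closer to Mamere's than to Chirac's, which is what the projections below display.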

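> library(FactoMineR)
> # center the row profiles on the average row
> matL0=t(t(L)-Lbar)
> # weighted PCA: candidates weighted by their totals,
> # newspapers weighted by 1/total (chi-square metric)
> acpL=PCA(matL0,scale.unit=FALSE,
+   row.w=(apply(N,1,sum)),
+   col.w=1/(apply(N,2,sum)))
> plot.PCA(acpL,choix="ind",ylim=c(-.02,.02))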
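What PCA() does with these weights can be reproduced with base R alone. The sketch below is our own construction, not FactoMineR's source code: it uses the usual correspondence analysis normalisation (row masses r and metric 1/Lbar), whereas the article's row.w and col.w differ from these only by constant factors, which can at most rescale the eigenvalues and leave the principal axes unchanged. The eigenvalues are the principal inertias, and their sum equals Pearson's $X^2$ statistic divided by $n$.

```r
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7), nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))
L    <- prop.table(N, margin = 1)     # row profiles
r    <- rowSums(N) / sum(N)           # row masses (candidate weights)
Lbar <- colSums(N) / sum(N)           # average row
X0   <- sweep(L, 2, Lbar)             # centred profiles, like matL0
# Fold the row weights and the chi-square metric into one matrix Z, so an
# ordinary symmetric eigen-decomposition performs the weighted PCA
Xs  <- sweep(X0, 2, sqrt(Lbar), "/")  # X0 times Q^(1/2), with Q = diag(1/Lbar)
Z   <- sqrt(r) * Xs                   # row i scaled by sqrt(r_i)
eig <- eigen(crossprod(Z), symmetric = TRUE)
eig$values                            # principal inertias; the last is ~ 0
# Coordinates of the candidates on the first two axes (signs are arbitrary)
coords <- Xs %*% eig$vectors[, 1:2]
# Total inertia equals Pearson's X^2 statistic divided by n
sum(eig$values)
unname(chisq.test(N)$statistic) / sum(N)
```

The third eigenvalue vanishes because the centred profiles sum to zero across newspapers, so with three newspapers at most two axes carry information.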

The second part describes the columns. Here, each column is a conditional distribution over the candidates (the column profile):

> (C=t(t(N)/apply(N,2,sum)))
         LeFigaro Liberation   LeMonde
Jospin 0.16279070  0.6833333 0.5098039
Chirac 0.81395349  0.1500000 0.3529412
Mamere 0.02325581  0.1666667 0.1372549
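The column profiles are the same prop.table call with margin 2. As a sketch on the hand-entered table, the check below confirms that each newspaper's profile sums to 1 over the candidates.

```r
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7), nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))
# Column profiles: each count divided by its column total,
# same result as t(t(N)/apply(N,2,sum))
C <- prop.table(N, margin = 2)
round(C, 4)
colSums(C)   # each column profile sums to 1
```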

Here again, we can compute the “average column”, i.e., the marginal distribution of the candidates:

> (Cbar=apply(N,1,sum)/sum(N))
   Jospin    Chirac    Mamere
0.4805195 0.4025974 0.1168831
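Symmetrically with the rows, the average column is the mean of the column profiles weighted by the newspapers' shares of the total (the column masses). A sketch of the check, where cmass is a name of our own:

```r
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7), nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))
C     <- prop.table(N, margin = 2)   # column profiles
cmass <- colSums(N) / sum(N)         # column masses (newspaper weights)
Cbar  <- rowSums(N) / sum(N)         # average column, as in the article
# Weighted mean of the column profiles: sum_j cmass_j * C[, j]
Cbar_check <- rowSums(sweep(C, 2, cmass, "*"))
all.equal(Cbar, Cbar_check)
```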

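In the simplex of the candidate space, the three newspapers are the following points.

[Figure: the three column profiles in the candidate simplex]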

But here again, we will not use that simplex. We consider a PCA with two vectors of weights: one that accounts for the weights of the newspapers (the column totals), and one that yields a chi-square distance (the inverse of the candidate totals).

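> Cbar=apply(N,1,sum)/sum(N)
> # center each column profile on the average column
> matC0=C-Cbar
> # newspapers are now the individuals, weighted by their totals;
> # candidates get weights 1/total (chi-square metric)
> acpC=PCA(t(matC0),scale.unit=FALSE,
+          row.w=(apply(N,2,sum)),
+          col.w=1/(apply(N,1,sum)))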
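The two analyses are not independent. Once the masses and the chi-square metrics are folded in, as in the row-side sketch, the column-side matrix is exactly the transpose of the row-side one, so both PCAs share the same non-zero eigenvalues (principal inertias). A base-R check, again under the usual correspondence analysis normalisation rather than the exact row.w and col.w constants of the article:

```r
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7), nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))
r     <- rowSums(N) / sum(N)   # candidate masses
cmass <- colSums(N) / sum(N)   # newspaper masses
L <- prop.table(N, margin = 1) # row profiles
C <- prop.table(N, margin = 2) # column profiles
# Row side: candidates weighted by r, metric 1/cmass
Zr <- sqrt(r) * sweep(sweep(L, 2, cmass), 2, sqrt(cmass), "/")
# Column side: newspapers weighted by cmass, metric 1/r
Zc <- sqrt(cmass) * sweep(sweep(t(C), 2, r), 2, sqrt(r), "/")
ev_rows <- eigen(crossprod(Zr), symmetric = TRUE)$values
ev_cols <- eigen(crossprod(Zc), symmetric = TRUE)$values
all.equal(ev_rows[1:2], ev_cols[1:2])           # same principal inertias
all.equal(Zc, t(Zr), check.attributes = FALSE)  # one matrix, transposed
```

Both entries reduce to $(p_{ij} - r_i c_j)/\sqrt{r_i c_j}$, with $p_{ij} = n_{ij}/n$, which is why the two projections can be overlaid.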

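Now we can almost overlay the two projections. Almost, because left and right, or top and bottom, may sometimes be switched: if $u$ is a (unit) eigenvector, so is $-u$. Here, for instance, we should flip them. The CA function from FactoMineR performs both analyses at once and plots rows and columns on the same map.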
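The sign ambiguity is a one-line consequence of the eigenvector equation. Writing $X$ for the centred profiles (matL0 or matC0), $D$ for the diagonal matrix of row weights, $Q$ for the diagonal matrix of column weights (notation introduced here), the weighted PCA diagonalizes $A = X^\top D X Q$ and gives the coordinates $F = X Q u$ on the axis $u$:

$$
A u = \lambda u \;\Longrightarrow\; A(-u) = \lambda(-u), \qquad (-u)^\top Q (-u) = u^\top Q u = 1,
$$
$$
X Q (-u) = -F .
$$

Replacing $u$ by $-u$ keeps the same eigenvalue (the same inertia) but mirrors every coordinate on that axis, so software may return either orientation.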

> CA(N)

[Figure: Correspondence analysis]
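For readers without FactoMineR, the same analysis can be sketched with svd() from base R, using the standard formulation of correspondence analysis as a singular value decomposition of the standardized residuals $S_{ij} = (p_{ij} - r_i c_j)/\sqrt{r_i c_j}$. This is our own minimal sketch, not the code behind CA(); its axes may come out with opposite signs for the reason given above.

```r
N <- matrix(c( 7, 41, 26,
              35,  9, 18,
               1, 10,  7), nrow = 3, byrow = TRUE,
            dimnames = list(c("Jospin", "Chirac", "Mamere"),
                            c("LeFigaro", "Liberation", "LeMonde")))
P     <- N / sum(N)                  # correspondence matrix p_ij
r     <- rowSums(P)                  # row masses
cmass <- colSums(P)                  # column masses
S     <- (P - outer(r, cmass)) / sqrt(outer(r, cmass))  # standardized residuals
dec   <- svd(S)
inertia <- dec$d^2                   # principal inertias (third one ~ 0)
# Principal coordinates: rows and columns on the same two axes
rows_xy <- (dec$u[, 1:2] / sqrt(r))     %*% diag(dec$d[1:2])
cols_xy <- (dec$v[, 1:2] / sqrt(cmass)) %*% diag(dec$d[1:2])
dimnames(rows_xy) <- list(rownames(N), c("Dim1", "Dim2"))
dimnames(cols_xy) <- list(colnames(N), c("Dim1", "Dim2"))
round(rbind(rows_xy, cols_xy), 3)
```

The squared singular values sum to the same total inertia as the two weighted PCAs, and the weighted variance of each set of coordinates along an axis equals that axis' inertia.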

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
