Data Mining the Votes of Members of the Polish Parliament
Visualizing the different votes of the members of the Polish Parliament. Get ready for some crazy graphs and trees!
The 7th term of the Sejm has come to an end. It would be nice to see how the Members of the Polish Parliament voted over these last 4 years. In total, they took part in over 6000 votings. Did representatives of the same club vote similarly to each other? Did Members of Parliament who changed clubs vote differently from the members of their former clubs? Let’s see!
In order to display the similarity between the votes of the Members of Parliament we will use a technique known to geneticists — namely, a phylogenetic tree. Such diagrams are employed to present similarities between sequences of DNA/proteins of various organisms/genes. In our analyses the Members of Parliament will stand for organisms and their votes will serve as their DNA. We will build the phylogenetic tree for the Members of Parliament.
Phylogenetic trees are built from the similarities between the objects they show. In this case we compare how similar the votes cast by the Members of Parliament are. First, let us look at how that similarity is calculated. As an example, take six Members of Parliament and a dozen or so votings (here, the 14 votings concerning the Act on Infertility Treatment). In each voting a Member of Parliament could vote For, vote Against, Abstain (which is a "soft" Against), or simply be absent. Let us encode these options with the numbers +2, -2, -1 and 0 respectively, or with the colours blue, red, yellow and light blue.
The following graphic presents the votes of each of the six Members of Parliament in each of the votings under analysis. The left part of the diagram shows the similarity between their votes as a tree. In these votings, J. Palikot and L. Miller voted in the same manner; E. Kopacz voted similarly; J. Piechociński voted less similarly, but still fairly alike. The other two Members of Parliament, B. Szydło and J. Gowin, voted similarly to each other but quite differently from the remaining four. The distance between the tree branches reflects how much the voting profiles differ.
Now we increase the number of votings from 14 to 6000. The vector of votes is longer, but the similarity is calculated in the same way, using the Euclidean distance between the vectors.
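The encoding and the distance described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis (which was done in R): the function names and the three short voting records are made up, and only the numeric codes +2, -2, -1 and 0 come from the text.

```python
import math

# Numeric codes taken from the text: For, Against, Abstain, Absent.
VOTE_CODES = {"For": 2, "Against": -2, "Abstain": -1, "Absent": 0}


def encode(votes):
    """Turn a list of vote labels into a numeric vote vector."""
    return [VOTE_CODES[v] for v in votes]


def euclidean(a, b):
    """Euclidean distance between two vote vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vote vectors must cover the same votings")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical records for three MPs over four votings (not real data).
mp_a = encode(["For", "For", "Against", "Abstain"])
mp_b = encode(["For", "For", "Against", "Abstain"])
mp_c = encode(["Against", "Against", "For", "Absent"])

print(euclidean(mp_a, mp_b))  # identical profiles give distance 0.0
print(euclidean(mp_a, mp_c))  # opposite votes give a large distance
```

With this encoding a For/Against disagreement contributes 16 to the squared distance, while an Abstain/Against disagreement contributes only 1, which matches the idea that abstaining is a softer form of voting against.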
One diagram cannot present all 6000 votings clearly, so we do not draw them; the diagrams show only the names of the Members of Parliament. To make the diagram more legible, next to each name I put the names of all the clubs that the Member of Parliament belonged to during the 7th term of the Sejm. Colours represent the club in which a politician spent most of the 7th term. Below you may see a fragment of the tree. It shows that J. Żalek and J. Gowin voted rather similarly to each other but very differently from the rest of their club. The tree also shows that both of them cast most of their votes while they belonged to PO, but both also belonged to ZP and KPSP. Members of Parliament from PSL and PO usually voted quite alike, and for that reason they belong to the same subtree.
We may also present the whole tree, although it has many leaves. When we include the MPs who left the Sejm and those who entered it during the term, we have over 500 names. As we may see, the Members of Parliament from PO in most cases voted similarly. They form a separate subtree together with PSL. PiS, with part of the right wing, also forms its own subtree. The remaining two subtrees represent SLD and Twój Ruch/Ruch Palikota.
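How subtrees emerge from pairwise distances can be illustrated with a small agglomerative clustering sketch. The article does not say which linkage method was used, so average linkage is an assumption here, and the five MPs and their votes are hypothetical. The analysis itself relied on R packages rather than on code like this.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two vote vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_tree(profiles):
    """Build a tree (nested tuples of names) by repeatedly merging
    the two clusters with the smallest average pairwise distance.
    Average linkage is an assumption, not the article's stated method."""
    # Each cluster is a pair: (tree built so far, list of member names).
    clusters = [(name, [name]) for name in sorted(profiles)]

    def distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        total = sum(euclidean(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical vote vectors (+2 For, -2 Against, -1 Abstain, 0 Absent).
profiles = {
    "A": [2, 2, -2, 2],
    "B": [2, 2, -2, 2],
    "C": [2, 2, -2, -1],
    "D": [-2, -2, 2, 2],
    "E": [-2, -2, 2, 0],
}
tree = average_linkage_tree(profiles)
print(tree)
```

In this toy data A and B vote identically and are merged first, C joins them next, and D and E form a second subtree, much like the government and opposition subtrees in the full diagram.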
The same tree may be presented in various manners; above, for example, we may see a more packed version of it.
However, a little fan like the one above is far more comprehensible. There is more space for names of the Members of Parliament.
If we take all the votings into consideration, we notice that the greatest differences are between the government and the opposition. On the government side there are two Members of Parliament (Jarosław Gowin and Jacek Żalek) who cast most of their votes as MPs from PO (the colour on the diagram corresponds to the club that a given Member of Parliament belonged to during most votings), yet whose profiles differ considerably from those of the remaining MPs. Moreover, that pair moved from one party to another, which may explain their divergence from the stance of PO. As far as PiS is concerned, the least compliant voters were Artur Górski and Jan Tomaszewski (who finally transferred to PO at the end of the year).
There are many more such interesting stories, where an MP's voting profile does not match that of the "main" club; they can be found in every club. Just take a while to look for them.
When we look at votings on specific acts, the picture tends to change. Below you may see an example diagram for the votings on the Personal Income Tax Act.
Here you may look at the results of the votings on the Act on Higher Education. PO and PSL voted so similarly that the Members of Parliament from both clubs are mixed together. SLD and TR form their own subtrees. There are also groups of MPs, both within PO and within PiS, who voted identically in all 53 votings on that act (their tree branches are very short).
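MPs who voted identically in every voting on an act have a distance of zero between them, which is why their branches are so short. A minimal sketch of how such groups could be found, using made-up names and only three votings instead of 53:

```python
from collections import defaultdict


def identical_voting_groups(profiles):
    """Return groups of two or more MPs whose vote vectors are equal."""
    groups = defaultdict(list)
    for name, votes in profiles.items():
        # Identical vote vectors map to the same dictionary key.
        groups[tuple(votes)].append(name)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Hypothetical vote vectors (+2 For, -2 Against, -1 Abstain, 0 Absent).
profiles = {
    "MP A": [2, 2, -2],
    "MP B": [2, 2, -2],
    "MP C": [-2, -2, 2],
    "MP D": [-2, -2, 2],
    "MP E": [2, -1, 0],
}
groups = identical_voting_groups(profiles)
print(groups)
```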
You may see an enlarged version of every diagram by clicking on it. Computer scientists sometimes joke that, unlike normal trees, their trees grow upside down.
As it turns out, the trees created by data analysts may grow in any direction. Or in every direction at the same time!
The R packages cluster, ggdendro and ape were used for the cluster analysis.
The source code can be found on GitHub: https://github.com/mi2-warsaw/JakOniGlosowali/tree/master/glosy.
Data on votings may be found in the package called sejmRP: https://github.com/mi2-warsaw/sejmRP.
Published at DZone with permission of deepsense.io Blog, DZone MVB. See the original article here.