The following article was originally written by Tony Hirst on his blog, OUseful.Info.
The code is horrible and the visualisations quite possibly misleading, but I’m dead tired and there are a couple of tricks in the following R code that I want to remember, so here’s a contrived bit of fumbling with some data of the form:
| 1 | strongly agree | strongly disagree |
| 2 | strongly agree | strongly disagree |
| 3 | neither agree nor disagree | strongly disagree |
That is, N rows, no identifiers, two columns; each column relates to a questionnaire question with a scaled response enumerated as 'strongly agree', 'agree ', 'neither agree nor disagree', 'disagree', 'strongly disagree' (note the trailing space in 'agree ', which the code below has to match).
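To have something to run the later code against, here is a minimal sketch that fakes up a data frame of that shape. The column names `enjoyCompany` and `tooMuchFamily` are the ones the code further down uses; the responses themselves are random and purely hypothetical, with the occasional blank thrown in.

```r
# The five response labels, in order (note the trailing space in 'agree ')
likert_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                   'disagree', 'strongly disagree')

# Hypothetical data: 50 random responses per question, blanks included
set.seed(1)
fd <- data.frame(
  enjoyCompany  = sample(c(likert_levels, ''), 50, replace = TRUE),
  tooMuchFamily = sample(c(likert_levels, ''), 50, replace = TRUE),
  stringsAsFactors = TRUE
)

# Quick look at the structure
str(fd)
```

Because the responses are drawn independently, this toy data will not show any real relationship between the two questions; it is only there to exercise the code.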
The first thing I tried to do was some “traditional” Likert scale style stacked bar charts using ggplot2 (surely there must be a Likert scale visualisation library around? If so, how would it work with data in the above (and below) forms? Answers via the comments please…)
require(reshape)
require(ggplot2)

#My sample data doesn't have row based identifiers, so here's a hacked incremental index based ID
fd$a=1
fd$b=cumsum(fd$a)
fd=subset(fd,select=c('enjoyCompany','tooMuchFamily','b'))

#melt the data into a dataframe with 3 cols: the id col, /b/; a /variable/ column
#that contains the original column heading; and a /value/ column that contains
#the original cell value for the corresponding row and column.
ff=melt(fd,id.var='b')

#Get rid of blank values
ff=subset(ff,value!='')

#Get rid of unused levels
ff$value=factor(ff$value)
##Check:
#levels(ff$value)

#Reorder the levels into a meaningful order
ff$value <- factor(ff$value, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))

ggplot(ff)+geom_bar(aes(variable,fill=value))+ coord_flip()
A couple of notable issues with the resulting diagram:
- the colours aren’t that pleasing to look at;
- we have lost all sense of correlation between values. We may like to think that the agree/strongly agree ratings from one question are correlated with the disagree/strongly disagree responses from the other, but there is nothing in that chart that says this for sure…
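On the colour issue, one possible tweak (a sketch, not necessarily the best choice) is to swap the default fill for a diverging ColorBrewer palette via `scale_fill_brewer()`, so that the agree and disagree ends get opposing hues. The data here is a tiny hypothetical sample, and the long format is built by hand rather than with `melt()` so the snippet stands on its own.

```r
likert_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                   'disagree', 'strongly disagree')

# Hypothetical responses, for illustration only
fd <- data.frame(
  enjoyCompany  = c('strongly agree', 'agree ', 'agree ', 'disagree', ''),
  tooMuchFamily = c('strongly disagree', 'disagree', 'neither agree nor disagree',
                    'agree ', 'strongly agree'),
  stringsAsFactors = FALSE
)

# Long format built by hand: one row per (question, answer) pair
ff <- data.frame(
  variable = rep(names(fd), each = nrow(fd)),
  value = unlist(fd, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop blanks, then put the levels into a meaningful order
ff <- subset(ff, value != '')
ff$value <- factor(ff$value, levels = rev(likert_levels))

if (requireNamespace('ggplot2', quietly = TRUE)) {
  library(ggplot2)
  # Same stacked bar chart, but with a diverging red/blue palette
  p <- ggplot(ff) +
    geom_bar(aes(variable, fill = value)) +
    scale_fill_brewer(palette = 'RdBu') +
    coord_flip()
  # Only draw when run interactively; drop the guard to plot from a script
  if (interactive()) print(p)
}
```

This only addresses the colours; it still says nothing about how the answers to the two questions relate to each other.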
However, a pairwise comparison may help…
#Let's count how many times the different scale values occur with each other,
#and then plot some sort of correlation plot.
fs=as.data.frame(table(subset(fd,select=c('enjoyCompany','tooMuchFamily'))))
fs=subset(fs,enjoyCompany!='' & tooMuchFamily!='')
fs$enjoyCompany <- factor(fs$enjoyCompany, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
fs$tooMuchFamily <- factor(fs$tooMuchFamily, levels =rev(c('strongly agree','agree ','neither agree nor disagree','disagree','strongly disagree')))
ggplot(fs)+geom_point(aes(x=enjoyCompany,y=tooMuchFamily,size=Freq))
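As a rough numeric companion to the plot, one option (my assumption, not something the original analysis did) is to map the five labels onto the integers 1 to 5 and compute a Spearman rank correlation with base R's `cor()`. The data below is hypothetical and deliberately constructed so that the two questions pull in opposite directions; `to_score` is a made-up helper name.

```r
likert_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                   'disagree', 'strongly disagree')

# Hypothetical responses in which the two questions are answered in opposition
fd <- data.frame(
  enjoyCompany  = c('strongly agree', 'strongly agree', 'agree ',
                    'neither agree nor disagree', 'disagree',
                    'strongly disagree', ''),
  tooMuchFamily = c('strongly disagree', 'strongly disagree', 'disagree',
                    'neither agree nor disagree', 'agree ',
                    'strongly agree', 'agree '),
  stringsAsFactors = FALSE
)

# Keep only rows where both questions were answered
complete <- subset(fd, enjoyCompany != '' & tooMuchFamily != '')

# Hypothetical helper: label -> integer score (1 = strongly agree, 5 = strongly disagree)
to_score <- function(x) as.integer(factor(x, levels = likert_levels))

# Spearman works on ranks, so it only relies on the ordering of the labels
rho <- cor(to_score(complete$enjoyCompany),
           to_score(complete$tooMuchFamily),
           method = 'spearman')
print(rho)
```

A strongly negative value here would back up the hunch that agreeing with one question goes with disagreeing with the other. Treating Likert labels as ranks is a judgment call, so read the number as a rough indicator rather than anything definitive.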
If I had rather more than two question columns, how would I generate a lattice of pairwise correlation charts to get a visual overview of how all the question answers interact at the pairwise level?
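One possible approach, sketched under a few assumptions: count the co-occurrences for every pair of question columns using `combn()` and `table()`, stack the results into one data frame, and let `facet_grid()` lay the pairs out as a lattice. The third column, `likeTravel`, and the helper `pair_counts` are hypothetical names, and the data is random.

```r
likert_levels <- c('strongly agree', 'agree ', 'neither agree nor disagree',
                   'disagree', 'strongly disagree')

# Hypothetical data with three questions (likeTravel is a made-up column)
set.seed(42)
qs <- data.frame(
  enjoyCompany  = sample(likert_levels, 40, replace = TRUE),
  tooMuchFamily = sample(likert_levels, 40, replace = TRUE),
  likeTravel    = sample(likert_levels, 40, replace = TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: co-occurrence counts for one pair of question columns
pair_counts <- function(df, xq, yq) {
  x <- factor(df[[xq]], levels = rev(likert_levels))
  y <- factor(df[[yq]], levels = rev(likert_levels))
  counts <- as.data.frame(table(x = x, y = y))  # columns: x, y, Freq
  counts$xq <- xq  # which question is on the x axis
  counts$yq <- yq  # which question is on the y axis
  counts
}

# Every unordered pair of question columns
pairs <- combn(names(qs), 2, simplify = FALSE)
all_counts <- do.call(rbind, lapply(pairs, function(p) pair_counts(qs, p[1], p[2])))

if (requireNamespace('ggplot2', quietly = TRUE)) {
  library(ggplot2)
  # One panel per pair of questions; point size shows how often each combination occurs
  p <- ggplot(subset(all_counts, Freq > 0)) +
    geom_point(aes(x = x, y = y, size = Freq)) +
    facet_grid(yq ~ xq) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  # Only draw when run interactively; drop the guard to plot from a script
  if (interactive()) print(p)
}
```

Blank responses fall outside the listed levels, so `factor()` turns them into `NA` and `table()` leaves them out of the counts by default. With many questions the grid gets large quickly, so it may be worth restricting `pairs` to the combinations of interest.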