R: ggplot - Cumulative frequency graphs
Join the DZone community and get the full member experience.
Join For FreeIn my continued playing around with ggplot I wanted to create a chart showing the cumulative growth of the number of members of the Neo4j London meetup group.
My initial data frame looked like this:
> head(meetupMembers) joinTimestamp joinDate monthYear quarterYear week dayMonthYear 1 1.376572e+12 2013-08-15 13:13:40 2013-08-01 2013-07-01 2013-08-15 2013-08-15 2 1.379491e+12 2013-09-18 07:55:11 2013-09-01 2013-07-01 2013-09-12 2013-09-18 3 1.349454e+12 2012-10-05 16:28:04 2012-10-01 2012-10-01 2012-10-04 2012-10-05 4 1.383127e+12 2013-10-30 09:59:03 2013-10-01 2013-10-01 2013-10-24 2013-10-30 5 1.372239e+12 2013-06-26 09:27:40 2013-06-01 2013-04-01 2013-06-20 2013-06-26 6 1.330295e+12 2012-02-26 22:27:00 2012-02-01 2012-01-01 2012-02-23 2012-02-26
The first step was to transform the data so that I had a data frame where a row represented a day where a member joined the group. There would then be a count of how many members joined on that date.
We can do this with dplyr like so:
library(dplyr) > head(meetupMembers %.% group_by(dayMonthYear) %.% summarise(n = n())) Source: local data frame [6 x 2] dayMonthYear n 1 2011-06-05 7 2 2011-06-07 1 3 2011-06-10 1 4 2011-06-12 1 5 2011-06-13 1 6 2011-06-15 1
To turn that into a chart we can plug it into ggplot and use the cumsum function to generate a line showing the cumulative total:
ggplot(data = meetupMembers %.% group_by(dayMonthYear) %.% summarise(n = n()), aes(x = dayMonthYear, y = n)) + ylab("Number of members") + xlab("Date") + geom_line(aes(y = cumsum(n)))

Alternatively we could bring the call to cumsum forward and generate a data frame which has the cumulative total:
> head(meetupMembers %.% group_by(dayMonthYear) %.% summarise(n = n()) %.% mutate(n = cumsum(n))) Source: local data frame [6 x 2] dayMonthYear n 1 2011-06-05 7 2 2011-06-07 8 3 2011-06-10 9 4 2011-06-12 10 5 2011-06-13 11 6 2011-06-15 12
And if we plug that into ggplot we’ll get the same curve as before:
ggplot(data = meetupMembers %.% group_by(dayMonthYear) %.% summarise(n = n()) %.% mutate(n = cumsum(n)), aes(x = dayMonthYear, y = n)) + ylab("Number of members") + xlab("Date") + geom_line()
If we want the curve to be a bit smoother we can group it by quarter rather than by day:
> head(meetupMembers %.% group_by(quarterYear) %.% summarise(n = n()) %.% mutate(n = cumsum(n))) Source: local data frame [6 x 2] quarterYear n 1 2011-04-01 13 2 2011-07-01 18 3 2011-10-01 21 4 2012-01-01 43 5 2012-04-01 60 6 2012-07-01 122
Now let’s plug that into ggplot:
ggplot(data = meetupMembers %.% group_by(quarterYear) %.% summarise(n = n()) %.% mutate(n = cumsum(n)), aes(x = quarterYear, y = n)) + ylab("Number of members") + xlab("Date") + geom_line()

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments