R: dplyr - Group by field dynamically
‘regroup’ is deprecated / no applicable method for ‘as.lazy’ applied to an object of class “list”
A few months ago I wrote a blog post explaining how to dynamically/programmatically group a data frame by a field using dplyr, but that approach has been deprecated in the latest version.
To recap, the original function looked like this:
    library(dplyr)

    groupBy = function(df, field) {
      df %.% regroup(list(field)) %.% summarise(n = n())
    }
And if we execute that with a sample data frame we’ll see the following:
    > data = data.frame(
        letter = sample(LETTERS, 50000, replace = TRUE),
        number = sample(1:10, 50000, replace = TRUE)
      )
    > groupBy(data, 'letter') %>% head(5)
    Source: local data frame [5 x 2]

      letter    n
    1      A 1951
    2      B 1903
    3      C 1954
    4      D 1923
    5      E 1886
    Warning messages:
    1: %.% is deprecated. Please use %>%
    2: %.% is deprecated. Please use %>%
    3: 'regroup' is deprecated.
    Use 'group_by_' instead.
    See help("Deprecated")
I replaced the deprecated ‘%.%’ operator with ‘%>%’ and the deprecated ‘regroup’ function with ‘group_by_’, and ended up with this function:
    groupBy = function(df, field) {
      df %>% group_by_(list(field)) %>% summarise(n = n())
    }
Now if we run that:
    > groupBy(data, 'letter') %>% head(5)
    Error in UseMethod("as.lazy") :
      no applicable method for 'as.lazy' applied to an object of class "list"
It turns out that ‘group_by_’ doesn’t want to receive a list of fields as its argument, so let’s remove the call to list and pass the field name directly:
    groupBy = function(df, field) {
      df %>% group_by_(field) %>% summarise(n = n())
    }
And now if we run that:
    > groupBy(data, 'letter') %>% head(5)
    Source: local data frame [5 x 2]

      letter    n
    1      A 1951
    2      B 1903
    3      C 1954
    4      D 1923
    5      E 1886
Good times! We get the correct result and no deprecation messages.
If we want to group by multiple fields we can just pass in the field names like so:
    groupBy = function(df, field1, field2) {
      df %>% group_by_(field1, field2) %>% summarise(n = n())
    }

    > groupBy(data, 'letter', 'number') %>% head(5)
    Source: local data frame [5 x 3]
    Groups: letter

      letter number   n
    1      A      1 200
    2      A      2 218
    3      A      3 205
    4      A      4 176
    5      A      5 203
Or with this simpler version:
    groupBy = function(df, ...) {
      df %>% group_by_(...) %>% summarise(n = n())
    }

    > groupBy(data, 'letter', 'number') %>% head(5)
    Source: local data frame [5 x 3]
    Groups: letter

      letter number   n
    1      A      1 200
    2      A      2 218
    3      A      3 205
    4      A      4 176
    5      A      5 203
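A side note for readers on newer releases: the underscore verbs, ‘group_by_’ included, were themselves deprecated in dplyr 0.7.0. The sketch below shows one way to keep passing field names as strings, assuming dplyr 1.0.0 or later, where across() and all_of() are available. The function name group_by_names and the fixed seed are my own illustrative choices, not part of the original function.

```r
library(dplyr)

set.seed(1)  # illustrative seed so the counts are reproducible
data <- data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

# Hypothetical helper: `fields` is a character vector of column names.
# all_of() selects exactly those columns and errors if one is missing.
group_by_names <- function(df, fields) {
  df %>%
    group_by(across(all_of(fields))) %>%
    summarise(n = n(), .groups = "drop")  # .groups assumes dplyr >= 1.0.0
}

by_letter <- group_by_names(data, "letter")
by_both   <- group_by_names(data, c("letter", "number"))

head(by_letter, 5)
head(by_both, 5)
```

Taking a single character vector rather than ‘...’ means the same function covers one field or several, and the field names can come straight from configuration or user input.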
I realised that we can actually just use group_by itself and pass in the field names without quotes, something I couldn’t get to work in earlier versions:
    groupBy = function(df, ...) {
      df %>% group_by(...) %>% summarise(n = n())
    }

    > groupBy(data, letter, number) %>% head(5)
    Source: local data frame [5 x 3]
    Groups: letter

      letter number   n
    1      A      1 200
    2      A      2 218
    3      A      3 205
    4      A      4 176
    5      A      5 203
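Forwarding ‘...’ works because group_by captures the unquoted column names exactly as they were passed. If the wrapper takes a single named argument instead, the name has to be forwarded explicitly. Here is a minimal sketch of that case, assuming a dplyr and rlang recent enough to support the {{ }} (embrace) operator, which arrived in rlang 0.4.0, and dplyr 1.0.0 or later for the .groups argument. The name count_by_field is hypothetical.

```r
library(dplyr)

set.seed(1)  # illustrative seed so the counts are reproducible
data <- data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

# Hypothetical helper taking one unquoted column name.
# {{ field }} forwards the caller's column name into group_by().
count_by_field <- function(df, field) {
  df %>%
    group_by({{ field }}) %>%
    summarise(n = n(), .groups = "drop")
}

result <- data %>% count_by_field(letter)
head(result, 5)
```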
We could even get a bit of pipelining going on if we fancied it:
    > data %>% groupBy(letter, number) %>% head(5)
    Source: local data frame [5 x 3]
    Groups: letter

      letter number   n
    1      A      1 200
    2      A      2 218
    3      A      3 205
    4      A      4 176
    5      A      5 203
And as of dplyr 0.3 we can simplify our groupBy function to make use of the new count function which combines group_by and summarise:
    groupBy = function(df, ...) {
      df %>% count(...)
    }

    > data %>% groupBy(letter, number) %>% head(5)
    Source: local data frame [5 x 3]
    Groups: letter

      letter number   n
    1      A      1 200
    2      A      2 218
    3      A      3 205
    4      A      4 176
    5      A      5 203
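To see that count is shorthand for the group_by plus summarise pair, the two results can be compared column by column. This is a sketch assuming dplyr 1.0.0 or later (for the .groups argument); the seed and the variable names long_form, short_form and same_counts are illustrative. In the releases assumed here, count on ungrouped input returns ungrouped output, so the ‘Groups:’ line printed above may not appear.

```r
library(dplyr)

set.seed(1)  # illustrative seed so the counts are reproducible
data <- data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

# Explicit version: group, then count the rows in each group.
long_form <- data %>%
  group_by(letter, number) %>%
  summarise(n = n(), .groups = "drop")

# Shorthand version using count().
short_form <- data %>% count(letter, number)

# Compare the key columns and the counts of the two results.
same_counts <- identical(long_form$letter, short_form$letter) &&
  identical(long_form$number, short_form$number) &&
  identical(long_form$n, short_form$n)
same_counts
```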
Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.