R: dplyr - Group by field dynamically

‘regroup’ is deprecated / no applicable method for ‘as.lazy’ applied to an object of class “list”

A few months ago I wrote a blog post explaining how to dynamically/programmatically group a data frame by a field using dplyr, but the function that approach relied on has been deprecated in the latest version.

To recap, the original function looked like this:

library(dplyr)
 
groupBy = function(df, field) {
  df %.% regroup(list(field)) %.% summarise(n = n())
}

And if we execute that with a sample data frame we’ll see the following:

> data = data.frame(
      letter = sample(LETTERS, 50000, replace = TRUE),
      number = sample(1:10, 50000, replace = TRUE)
  )
 
> groupBy(data, 'letter') %>% head(5)
Source: local data frame [5 x 2]
 
  letter    n
1      A 1951
2      B 1903
3      C 1954
4      D 1923
5      E 1886
Warning messages:
1: %.% is deprecated. Please use %>% 
2: %.% is deprecated. Please use %>% 
3: 'regroup' is deprecated.
Use 'group_by_' instead.
See help("Deprecated")
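
Since the data frame is built with sample, the exact counts will differ from run to run. A minimal sketch of making them repeatable, assuming we are happy to pick an arbitrary seed (the value 42 and the names first and second are chosen purely for illustration and are not from the run shown above):

```r
# Fix the random number generator so that sample() returns the same draws each time.
set.seed(42)  # arbitrary seed, chosen for illustration only
first = data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

# Resetting the seed replays the same sequence of draws.
set.seed(42)
second = data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

# Both data frames were built from the same draws, so this should be TRUE.
identical(first, second)
```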

I swapped the deprecated %.% operator for %>% and regroup for group_by_, and ended up with this function:

groupBy = function(df, field) {
  df %>% group_by_(list(field)) %>% summarise(n = n())
}

Now if we run that:

> groupBy(data, 'letter') %>% head(5)
Error in UseMethod("as.lazy") : 
  no applicable method for 'as.lazy' applied to an object of class "list"

It turns out that ‘group_by_’ doesn’t want to receive a list of fields this way (as the error says, there is no ‘as.lazy’ method for a list), so let’s remove the call to list:

groupBy = function(df, field) {
  df %>% group_by_(field) %>% summarise(n = n())
}

And now if we run that:

> groupBy(data, 'letter') %>% head(5)
Source: local data frame [5 x 2]
 
  letter    n
1      A 1951
2      B 1903
3      C 1954
4      D 1923
5      E 1886

Good times! We get the correct result and no deprecation messages.

If we want to group by multiple fields we can just pass in the field names like so:

groupBy = function(df, field1, field2) {
  df %>% group_by_(field1, field2) %>% summarise(n = n())
}
 
> groupBy(data, 'letter', 'number') %>% head(5)
Source: local data frame [5 x 3]
Groups: letter
 
  letter number   n
1      A      1 200
2      A      2 218
3      A      3 205
4      A      4 176
5      A      5 203

Or with this simpler version:

groupBy = function(df, ...) {
  df %>% group_by_(...) %>% summarise(n = n())
}
 
> groupBy(data, 'letter', 'number') %>% head(5)
Source: local data frame [5 x 3]
Groups: letter
 
  letter number   n
1      A      1 200
2      A      2 218
3      A      3 205
4      A      4 176
5      A      5 203
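
The versions above take quoted field names through group_by_. As a hedged aside for readers on dplyr 1.0 or later (an assumption, since this post targets an older release), a character vector of names can instead be handed to the regular group_by using across and all_of. The helper name groupByNames and the seed are made up for this sketch:

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the sample is repeatable
data = data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

# Hypothetical helper: `fields` is a character vector of column names.
# Assumes dplyr >= 1.0.0, where across() and all_of() can be used inside group_by()
# and summarise() accepts the .groups argument.
groupByNames = function(df, fields) {
  df %>%
    group_by(across(all_of(fields))) %>%
    summarise(n = n(), .groups = "drop")
}

result = groupByNames(data, c("letter", "number"))
result %>% head(5)
```

Taking a single vector rather than separate arguments means the list of fields can be built up at runtime, for example from a configuration value.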

I realised that we can actually just use group_by itself and pass in the field names without quotes, something I couldn’t get to work in earlier versions:

groupBy = function(df, ...) {
  df %>% group_by(...) %>% summarise(n = n())
}
 
> groupBy(data, letter, number) %>% head(5)
Source: local data frame [5 x 3]
Groups: letter
 
  letter number   n
1      A      1 200
2      A      2 218
3      A      3 205
4      A      4 176
5      A      5 203

We could even get a bit of pipelining going on if we fancied it:

> data %>% groupBy(letter, number) %>% head(5)
Source: local data frame [5 x 3]
Groups: letter
 
  letter number   n
1      A      1 200
2      A      2 218
3      A      3 205
4      A      4 176
5      A      5 203
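
Because the data frame is the first argument of groupBy, its result can carry on down the pipeline as well. A small sketch, assuming we want the most frequent combinations first; the ungroup and arrange steps, the seed and the name top are additions for illustration rather than part of the function above:

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the sample is repeatable
data = data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

groupBy = function(df, ...) {
  df %>% group_by(...) %>% summarise(n = n())
}

# summarise leaves the result grouped by letter, so ungroup before sorting
# in order to rank the combinations across the whole table.
top = data %>%
  groupBy(letter, number) %>%
  ungroup() %>%
  arrange(desc(n)) %>%
  head(5)

top
```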

And as of dplyr 0.3 we can simplify our groupBy function by using the new count function, which combines group_by and summarise:

groupBy = function(df, ...) {
  df %>% count(...)
}
 
> data %>% groupBy(letter, number) %>% head(5)
Source: local data frame [5 x 3]
Groups: letter
 
  letter number   n
1      A      1 200
2      A      2 218
3      A      3 205
4      A      4 176
5      A      5 203
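
To convince ourselves that count really is shorthand for the group_by and summarise pair, we can compare the two results. This is only a sketch, under the assumption that just the values matter, so both results are ungrouped and converted to plain data frames before comparing. The names viaSummarise, viaCount and same, and the seed, are made up for the example:

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the sample is repeatable
data = data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

# The long-hand version: group, then count the rows in each group.
viaSummarise = data %>%
  group_by(letter, number) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  as.data.frame()

# The short-hand version using count.
viaCount = data %>%
  count(letter, number) %>%
  ungroup() %>%
  as.data.frame()

# all.equal returns TRUE when the two data frames hold the same values.
same = isTRUE(all.equal(viaSummarise, viaCount))
same
```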


Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.
