
R: Refactoring to dplyr


I’ve been looking back over some of the early code I wrote using R before I knew about the dplyr library and thought it’d be an interesting exercise to refactor some of the snippets.

We’ll use the following data frame for each of the examples:

library(dplyr)

# 50,000 rows: a random capital letter and a random number from 1 to 10
data <- data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)
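Since the data frame is built with sample(), the values you get will differ from the output shown in this post. If you want the same rows on every run, one option is to fix the random seed first. This is a minimal sketch; the seed value 1 is an arbitrary choice of mine and was not used to produce the output in this post.

```r
# Assumption: the seed value 1 is arbitrary and not from the original post
set.seed(1)
first_draw <- sample(1:10, 5, replace = TRUE)

# Resetting the same seed replays the same sequence of random draws
set.seed(1)
second_draw <- sample(1:10, 5, replace = TRUE)

identical(first_draw, second_draw)
```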

Take {n} rows

> data[1:5,]
  letter number
1      R      7
2      Q      3
3      B      8
4      R      3
5      U      2

becomes:

> data %>% head(5)
  letter number
1      R      7
2      Q      3
3      B      8
4      R      3
5      U      2

Order by numeric value descending

> data[order(-(data$number)),][1:5,]
   letter number
14      H     10
17      G     10
63      L     10
66      W     10
73      R     10

becomes:

> data %>% arrange(desc(number)) %>% head(5)
  letter number
1      H     10
2      G     10
3      L     10
4      W     10
5      R     10
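One difference visible in the two outputs above is the row labels: base indexing keeps each row's original row name, while arrange() renumbers the result from 1. The sketch below shows this on a tiny data frame; the name df and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical four-row data frame, small enough to check by eye
df <- data.frame(
  letter = c("A", "B", "C", "D"),
  number = c(2, 9, 5, 7)
)

# Base R: reorder the rows with order(); original row names travel with the rows
base_sorted <- df[order(-df$number), ]

# dplyr: same ordering, but row names are reset to 1..n
dplyr_sorted <- df %>% arrange(desc(number))

rownames(base_sorted)
rownames(dplyr_sorted)
```

If later code relies on the original row positions, it is safer to store them in an explicit column before sorting than to depend on row names.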

Count number of items

> length(data[,1])
[1] 50000

becomes:

> data %>% count()
Source: local data frame [1 x 1]
 
      n
1 50000

Filter by column value

> length(subset(data, number == 1)[, 1])
[1] 4928

becomes:

> data %>% filter(number == 1) %>% count()
Source: local data frame [1 x 1]
 
     n
1 4928

Group by variable and count

> aggregate(data, by = list(data$number), FUN = length)
   Group.1 letter number
1        1   4928   4928
2        2   5045   5045
3        3   5064   5064
4        4   4823   4823
5        5   5032   5032
6        6   5163   5163
7        7   4945   4945
8        8   5077   5077
9        9   5025   5025
10      10   4898   4898

becomes:

> data %>% count(number)
Source: local data frame [10 x 2]
 
   number    n
1       1 4928
2       2 5045
3       3 5064
4       4 4823
5       5 5032
6       6 5163
7       7 4945
8       8 5077
9       9 5025
10     10 4898
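For comparison, count(number) is shorthand for grouping and then counting the rows in each group. Here is a small sketch of the spelled-out form next to the short one, using a hand-made data frame (df and its values are hypothetical) so the counts are easy to verify.

```r
library(dplyr)

# Hypothetical six-row data frame: number 1 appears twice, 2 three times, 3 once
df <- data.frame(
  letter = c("A", "B", "A", "C", "B", "A"),
  number = c(1, 2, 1, 3, 2, 2)
)

# Shorthand form used above
short_form <- df %>% count(number)

# Spelled-out form: group by the column, then count rows per group with n()
long_form <- df %>%
  group_by(number) %>%
  summarise(n = n())

short_form
long_form
```

The longer form is handy once you want more than a count, since other summaries such as a mean or a max can be added inside the same summarise() call.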

Select a range of rows

> data[4:5,]
  letter number
4      R      3
5      U      2

becomes:

> data %>% slice(4:5)
  letter number
1      R      3
2      U      2

There’s certainly more code in some of the dplyr examples, but I find it easier to remember how the dplyr code works when I come back to it, so I tend to favour that approach.
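To illustrate how the individual verbs chain together, here is a sketch that combines several of the steps from this post into one pipeline. The data frame is a smaller version of the one used above, and the seed, the threshold of 5 and the variable names are assumptions chosen only for this example.

```r
library(dplyr)

# Assumption: seed and sizes are arbitrary, picked to keep the example small
set.seed(42)
df <- data.frame(
  letter = sample(LETTERS, 1000, replace = TRUE),
  number = sample(1:10, 1000, replace = TRUE)
)

# Keep rows with number above 5, count rows per letter,
# sort by that count (largest first), and take the top three
top_letters <- df %>%
  filter(number > 5) %>%
  count(letter) %>%
  arrange(desc(n)) %>%
  head(3)

top_letters
```

Each step reads left to right in the order it is applied, which is the property that makes this style easier to revisit later.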


Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.
