Over a million developers have joined DZone.
Platinum Partner

Thoughts on Software Development R: Apply a Custom Function Across Multiple Lists

· Big Data Zone

The Big Data Zone is presented by Exaptive.  Learn about how to rapidly iterate data applications, while reusing existing code and leveraging open source technologies.

In my continued playing around with R I wanted to map a custom function over two lists comparing each item with its corresponding items.

If we just want to use a built in function such as subtraction between two lists it’s quite easy to do:

> c(10,9,8,7,6,5,4,3,2,1) - c(5,4,3,4,3,2,2,1,2,1)
 [1] 5 5 5 3 3 3 2 2 0 0

I wanted to do a slight variation on that where instead of returning the difference I wanted to return a text value representing the difference e.g. ’5 or more’, ’3 to 5′ etc.

I spent a long time trying to figure out how to do that before finding an excellent blog post which describes all the different ‘apply’ functions available in R.

As far as I understand ‘apply’ is the equivalent of ‘map’ in Clojure or other functional languages.

In this case we want the mapply variant which we can use like so:

> mapply(function(x, y) { 
    if((x-y) >= 5) {
        "5 or more"
    } else if((x-y) >= 3) {
        "3 to 5"
    } else {
        "less than 5"
    }    
  }, c(10,9,8,7,6,5,4,3,2,1),c(5,4,3,4,3,2,2,1,2,1))
 [1] "5 or more"   "5 or more"   "5 or more"   "3 to 5"      "3 to 5"      "3 to 5"      "less than 5"
 [8] "less than 5" "less than 5" "less than 5"

We could then pull that out into a function if we wanted:

summarisedDifference <- function(one, two) {
  mapply(function(x, y) { 
    if((x-y) >= 5) {
      "5 or more"
    } else if((x-y) >= 3) {
      "3 to 5"
    } else {
      "less than 5"
    }    
  }, one, two)
}

which we could call like so:

> summarisedDifference(c(10,9,8,7,6,5,4,3,2,1),c(5,4,3,4,3,2,2,1,2,1))
 [1] "5 or more"   "5 or more"   "5 or more"   "3 to 5"      "3 to 5"      "3 to 5"      "less than 5"
 [8] "less than 5" "less than 5" "less than 5"

I also wanted to be able to compare a list of items to a single item which was much easier than I expected:

> summarisedDifference(c(10,9,8,7,6,5,4,3,2,1), 1)
 [1] "5 or more"   "5 or more"   "5 or more"   "5 or more"   "5 or more"   "3 to 5"      "3 to 5"     
 [8] "less than 5" "less than 5" "less than 5"

If we wanted to get a summary of the differences between the lists we could plug them into ddply like so:

> library(plyr)
> df = data.frame(x=c(10,9,8,7,6,5,4,3,2,1), y=c(5,4,3,4,3,2,2,1,2,1))
> ddply(df, .(difference=summarisedDifference(x,y)), summarise, count=length(x))
   difference count
1      3 to 5     3
2   5 or more     3
3 less than 5     4

The Big Data Zone is presented by Exaptive.  Learn how rapid data application development can address the data science shortage.

Topics:

Published at DZone with permission of Mark Needham , DZone MVB .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}