
Thoughts on Software Development R: Apply a Custom Function Across Multiple Lists


In my continued playing around with R I wanted to map a custom function over two lists, comparing each item in one with the corresponding item in the other.

If we just want to use a built-in function such as subtraction between two lists, it’s quite easy to do:

> c(10,9,8,7,6,5,4,3,2,1) - c(5,4,3,4,3,2,2,1,2,1)
 [1] 5 5 5 3 3 3 2 2 0 0

I wanted a slight variation on that: instead of returning the difference, I wanted to return a text value representing it, e.g. ‘5 or more’, ‘3 to 5’ etc.

I spent a long time trying to figure out how to do that before finding an excellent blog post which describes all the different ‘apply’ functions available in R.

As far as I understand, ‘apply’ is the equivalent of ‘map’ in Clojure or other functional languages.
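
To make the comparison with ‘map’ concrete, here is a minimal sketch of three members of the family. The toy inputs and variable names are illustrative assumptions, not taken from the post: sapply maps over one vector, mapply walks several vectors in parallel, and Map does the same but always returns a list.

```r
# Toy inputs, chosen only for illustration
xs <- c(1, 2, 3)
ys <- c(10, 20, 30)

# sapply: map a function over a single vector, simplifying to a vector
squares <- sapply(xs, function(x) x^2)

# mapply: map a function over several vectors in parallel
sums <- mapply(function(x, y) x + y, xs, ys)

# Map: like mapply, but the result is always a list
sumsAsList <- Map(function(x, y) x + y, xs, ys)

print(squares)
print(sums)
print(sumsAsList)
```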

In this case we want the mapply variant, which we can use like so:

> mapply(function(x, y) { 
    if((x-y) >= 5) {
        "5 or more"
    } else if((x-y) >= 3) {
        "3 to 5"
    } else {
        "less than 5"
    }    
  }, c(10,9,8,7,6,5,4,3,2,1),c(5,4,3,4,3,2,2,1,2,1))
 [1] "5 or more"   "5 or more"   "5 or more"   "3 to 5"      "3 to 5"      "3 to 5"      "less than 5"
 [8] "less than 5" "less than 5" "less than 5"

We could then pull that out into a function if we wanted:

summarisedDifference <- function(one, two) {
  mapply(function(x, y) { 
    if((x-y) >= 5) {
      "5 or more"
    } else if((x-y) >= 3) {
      "3 to 5"
    } else {
      "less than 5"
    }    
  }, one, two)
}

which we could call like so:

> summarisedDifference(c(10,9,8,7,6,5,4,3,2,1),c(5,4,3,4,3,2,2,1,2,1))
 [1] "5 or more"   "5 or more"   "5 or more"   "3 to 5"      "3 to 5"      "3 to 5"      "less than 5"
 [8] "less than 5" "less than 5" "less than 5"
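
For a simple threshold check like this one, the same labels can probably be produced without mapply at all, because comparison operators and ifelse already work on whole vectors. The sketch below is an alternative technique rather than what the post does, and the name summarisedDifferenceVec is hypothetical.

```r
# Hypothetical vectorised variant; the name is illustrative, not from the post
summarisedDifferenceVec <- function(one, two) {
  d <- one - two
  # ifelse tests the whole vector at once, so no explicit mapping is needed
  ifelse(d >= 5, "5 or more",
         ifelse(d >= 3, "3 to 5", "less than 5"))
}

x <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
y <- c(5, 4, 3, 4, 3, 2, 2, 1, 2, 1)

labels <- summarisedDifferenceVec(x, y)
print(labels)
```

mapply remains the more general tool when the function being applied cannot easily be written in vectorised form.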

I also wanted to be able to compare a list of items to a single item, which was much easier than I expected:

> summarisedDifference(c(10,9,8,7,6,5,4,3,2,1), 1)
 [1] "5 or more"   "5 or more"   "5 or more"   "5 or more"   "5 or more"   "3 to 5"      "3 to 5"     
 [8] "less than 5" "less than 5" "less than 5"
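
This works because mapply recycles shorter arguments to the length of the longest one, so the single value is paired with every element of the first vector. A small sketch of that behaviour, using a plain subtraction instead of the labelling function (the variable names are illustrative):

```r
x <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)

# The length-one second argument is reused for every element of x
recycled <- mapply(function(a, b) a - b, x, 1)

# Spelling the recycling out by hand gives the same result
explicit <- mapply(function(a, b) a - b, x, rep(1, length(x)))

print(recycled)
print(identical(recycled, explicit))
```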

If we wanted to get a summary of the differences between the lists, we could plug them into ddply like so:

> library(plyr)
> df = data.frame(x=c(10,9,8,7,6,5,4,3,2,1), y=c(5,4,3,4,3,2,2,1,2,1))
> ddply(df, .(difference=summarisedDifference(x,y)), summarise, count=length(x))
   difference count
1      3 to 5     3
2   5 or more     3
3 less than 5     4
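
If plyr is not available, a similar count can be sketched in base R with table(). This is an alternative to the ddply call above, not what the post used; the function definition is repeated so the snippet runs on its own, and the names counts and summaryDf are illustrative.

```r
# Same function as defined earlier, repeated so this snippet is self-contained
summarisedDifference <- function(one, two) {
  mapply(function(x, y) {
    if ((x - y) >= 5) {
      "5 or more"
    } else if ((x - y) >= 3) {
      "3 to 5"
    } else {
      "less than 5"
    }
  }, one, two)
}

x <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
y <- c(5, 4, 3, 4, 3, 2, 2, 1, 2, 1)

# table() counts how often each label occurs
counts <- table(summarisedDifference(x, y))
print(counts)

# Convert to a data frame with the same column names as the ddply result
summaryDf <- as.data.frame(counts, stringsAsFactors = FALSE)
names(summaryDf) <- c("difference", "count")
print(summaryDf)
```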


Published at DZone with permission of
