
Code Snippets for R - Part 1

Zone Leader Sibanjan Das came across some temporal code snippets for R and thought he would share them with everyone.

There are times when we know what to do but can't execute it immediately. For example, suppose a task requires us to transform a categorical variable. It is easy to say that one-hot encoding or label encoding is the appropriate technique for converting categorical variables to an equivalent numeric format. However, when we start writing the code, we run into difficulty. First, we search the internet for code, which is time-consuming and repetitive. Working in machine learning and artificial intelligence (AI), we should streamline our own work before we automate the world. caret is an excellent package that has most of the functions we need while working in R. But sometimes that's not enough, and occasionally we need to work on things that are not available in caret.

So, I have started putting together a few R snippets that come in handy when I work in R, and I thought I would share them with you all.

Categorical Treatment

One-hot encoding and label encoding functions for transforming categorical data.

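one_hot_encode = function(outcome, vars, df){
  # Note: `outcome` is accepted but not used; designTreatmentsZ builds an
  # outcome-free treatment plan.
  # Load the package vtreat
  library(vtreat)
  # Create the treatment plan
  treatplan <- designTreatmentsZ(df, vars, verbose = FALSE)
  # Apply the treatment plan to the data
  temp.treat <- prepare(treatplan, df)
  # Join the treated columns with the untouched original columns
  # (drop = FALSE keeps a data frame even if only one column remains)
  temp.clean <- cbind(df[, !(names(df) %in% vars), drop = FALSE], temp.treat)
  temp.clean
}

label_encode = function(vars){
  as.factor(vars)
}

label_encode_xgboost = function(vars){
  # Go through a factor first: as.numeric() on a character vector returns NA
  as.numeric(as.factor(vars))
}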
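To make the difference between the two encodings concrete, here is a minimal base-R sketch on a toy vector. It does not use vtreat; the `colour` variable and its values are hypothetical and exist only for illustration.

```r
# Hypothetical toy data: one categorical variable with three levels
colour <- factor(c("red", "green", "blue", "green"))

# Label encoding: each level is mapped to its integer code
# (levels are sorted alphabetically: blue = 1, green = 2, red = 3)
labels <- as.integer(colour)

# One-hot encoding: one 0/1 indicator column per level
# ("- 1" drops the intercept so that every level gets its own column)
one_hot <- model.matrix(~ colour - 1)

# Pitfall: converting the raw strings directly yields only NA
direct <- suppressWarnings(as.numeric(as.character(colour)))

print(labels)
print(one_hot)
```

Label encoding keeps a single column but imposes an arbitrary order on the levels, whereas one-hot encoding avoids that order at the cost of one column per level.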

Temporal Data Treatment

It is essential to create features from a temporal attribute before using it to build a supervised learning model. The time_features function below creates 11 new attributes from a temporal variable.

library(lubridate)

time_features = function(time, col_name)
{
  # Derive calendar and clock features from the date-time vector
  numeric_time   <- as.numeric(time)
  day_of_week    <- wday(time)
  day_of_month   <- mday(time)
  day_of_quarter <- qday(time)
  day_of_year    <- yday(time)
  hr_of_day      <- hour(time)
  min_of_day     <- 60*hour(time) + minute(time)
  sec_of_day     <- 3600*hour(time) + 60*minute(time) + second(time)
  week_of_year   <- week(time)
  month_of_year  <- month(time)
  year           <- year(time)

  df_temp <- data.frame(numeric_time,
                        day_of_week,
                        day_of_month,
                        day_of_quarter,
                        day_of_year,
                        hr_of_day,
                        min_of_day,
                        sec_of_day,
                        week_of_year,
                        month_of_year,
                        year)

  # Prefix every feature with the source column name, e.g. <col_name>_day_of_week
  time_df <- setNames(df_temp, paste(col_name, names(df_temp), sep="_"))
  return(time_df)
}
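As a sanity check on the arithmetic behind a few of these features, the sketch below works them out for a single timestamp using only base R (POSIXlt fields rather than lubridate). The timestamp and the "pickup" prefix are hypothetical, chosen just for illustration.

```r
# One fixed timestamp; UTC avoids any local time-zone surprises
ts <- as.POSIXct("2018-03-15 14:30:45", tz = "UTC")
lt <- as.POSIXlt(ts, tz = "UTC")

features <- c(
  day_of_week  = lt$wday + 1,  # POSIXlt counts from Sunday = 0; shift to Sunday = 1
  day_of_month = lt$mday,
  day_of_year  = lt$yday + 1,  # POSIXlt counts days of the year from 0
  hr_of_day    = lt$hour,
  min_of_day   = 60 * lt$hour + lt$min,
  sec_of_day   = 3600 * lt$hour + 60 * lt$min + lt$sec
)

# Prefix each feature with the (hypothetical) source column name
names(features) <- paste("pickup", names(features), sep = "_")
print(features)
```

For this timestamp (a Thursday), the day of the year is 31 + 28 + 15 = 74, the minute of the day is 14 * 60 + 30 = 870, and the second of the day is 870 * 60 + 45 = 52245. The Sunday = 1 convention for the day of the week is assumed here to mirror lubridate's default for wday().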


Numerical Binning

Sometimes it is necessary to convert continuous numerical data to discrete data. For example, Naive Bayes and the Apriori algorithm are typically applied to discrete values. The function below uses equal-width binning to convert continuous data to a discrete format.

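equi_width_binning = function(input, no_of_bins){
  # Equal-width binning: split the range of `input` into no_of_bins intervals
  minimumVal <- min(input, na.rm = TRUE)
  maximumVal <- max(input, na.rm = TRUE)
  # no_of_bins intervals need no_of_bins + 1 evenly spaced break points;
  # length.out guarantees that the last break is exactly the maximum
  breaks <- seq(minimumVal, maximumVal, length.out = no_of_bins + 1)
  # include.lowest = TRUE keeps the minimum value in the first bin
  # (otherwise it would be returned as NA)
  bins <- cut(input, breaks = breaks, include.lowest = TRUE)
  bins
}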
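A small worked example may help show what equal-width binning produces. The sketch below uses base R on a hypothetical six-value vector and two bins, and also illustrates one detail of cut() worth knowing: intervals are open on the left by default, so the minimum value falls outside the first bin unless include.lowest = TRUE is set.

```r
# Hypothetical input: six values spanning the range 1 to 11
x <- c(1, 3, 5, 7, 9, 11)

# Two equal-width bins need three break points: 1, 6, 11
breaks <- seq(min(x), max(x), length.out = 3)

# With include.lowest = TRUE the first interval is closed on the left
binned <- cut(x, breaks = breaks, include.lowest = TRUE)

# Without it, the minimum value (1) is not assigned to any bin
binned_default <- cut(x, breaks = breaks)

print(table(binned))
print(binned_default)
```

Each bin here has a width of (11 - 1) / 2 = 5, and three values land in each bin.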

This is just the beginning. We will continue creating similar modules for repetitive tasks. You can download the code from my GitHub and start using it. If you need something in R to be modularized or want to contribute, feel free to add your code to the project and help us out.

Topics:
big data, data science, artificial intelligence, ai, r, code snippets, r automation
