
Code Snippets for R - Part 1


Zone Leader Sibanjan Das came across some temporal code snippets for R and thought he would share them with everyone.


There are times when we know what to do but can't execute it immediately. For example, suppose we are working on a task that requires transforming a categorical variable. It is easy to say that one-hot encoding or label encoding is the appropriate technique for converting categorical variables to an equivalent numeric format. However, when we start writing the code, we get stuck. We end up searching the internet for code, which is time-consuming and repetitive. Working in the field of Machine Learning and Artificial Intelligence (AI), we should streamline our own work before we automate the world. CARET is an excellent package that has most of the functions we need while working in R. But sometimes that's not enough, and occasionally we need things that are not available in CARET.

So, I have started putting together a few R snippets that are handy when I work in R, and thought I would share them with you all.

Categorical Treatment

One-hot encode and label encode functions for transforming categorical data.

one_hot_encode <- function(outcome, vars, df) {
  # Note: `outcome` is not used; designTreatmentsZ() builds an outcome-free plan
  # Load the package vtreat
  library(vtreat)
  # Create the treatment plan
  treatplan <- designTreatmentsZ(df, vars, verbose = FALSE)
  # Prepare the training data
  temp.treat <- prepare(treatplan, df)
  # Join the treated data with the untouched columns of the original data;
  # drop = FALSE keeps a data frame even when only one column remains
  temp.clean <- cbind(df[, !(names(df) %in% vars), drop = FALSE], temp.treat)
  temp.clean
}

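label_encode <- function(vars) {
  # Convert to a factor; R stores each level as an integer code internally
  as.factor(vars)
}

label_encode_xgboost <- function(vars) {
  # Go through a factor first so character input yields integer codes, not NA
  as.numeric(as.factor(vars))
}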
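
To make the difference between the two encodings concrete, here is a minimal sketch that uses only base R. It swaps in model.matrix() for the one-hot step instead of vtreat, and the colour vector is made-up illustration data, so read it as an example under those assumptions rather than as the functions above.

```r
# Hypothetical example data: a categorical variable with three levels
colour <- c("red", "green", "blue", "green")

# Label encoding: factor() sorts the levels (blue, green, red),
# so each value maps to the integer position of its level
colour_factor <- factor(colour)
label_codes <- as.integer(colour_factor)

# One-hot encoding with base R (an alternative to vtreat, used here only
# to keep the sketch dependency-free): "- 1" drops the intercept so every
# level gets its own 0/1 indicator column
one_hot <- model.matrix(~ colour_factor - 1)
colnames(one_hot) <- levels(colour_factor)

print(label_codes)
print(one_hot)
```

Label encoding yields one integer column, which implies an ordering between levels; one-hot encoding yields one indicator column per level and implies none.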

Temporal Data Treatment

To use a temporal attribute in a supervised learning model, we first need to derive features from it. The time_features function below creates 11 new attributes from a temporal variable.

library(lubridate)

time_features <- function(time, col_name) {
  numeric_time   <- as.numeric(time)
  day_of_week    <- wday(time)
  day_of_month   <- mday(time)
  day_of_quarter <- qday(time)
  day_of_year    <- yday(time)
  hr_of_day      <- hour(time)
  min_of_day     <- 60 * hour(time) + minute(time)
  sec_of_day     <- 3600 * hour(time) + 60 * minute(time) + second(time)
  week_of_year   <- week(time)
  month_of_year  <- month(time)
  year           <- year(time)

  df_temp <- data.frame(numeric_time,
                        day_of_week,
                        day_of_month,
                        day_of_quarter,
                        day_of_year,
                        hr_of_day,
                        min_of_day,
                        sec_of_day,
                        week_of_year,
                        month_of_year,
                        year)

  # Prefix every column name with col_name
  time_df <- setNames(df_temp, paste(col_name, names(df_temp), sep = "_"))
  return(time_df)
}
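
As a quick sanity check on the time-of-day arithmetic, the sketch below recomputes a few of these features for a single timestamp. It uses base R's POSIXlt fields instead of lubridate so that it runs without extra packages; the timestamp and the "pickup" prefix are hypothetical.

```r
# Hypothetical timestamp, fixed to UTC so the result does not depend on the
# machine's local time zone
ts <- as.POSIXct("2018-03-15 13:45:30", tz = "UTC")
parts <- as.POSIXlt(ts)

# POSIXlt stores hour, min and sec as separate fields
hr_of_day  <- parts$hour
min_of_day <- 60 * parts$hour + parts$min
sec_of_day <- 3600 * parts$hour + 60 * parts$min + parts$sec

# yday is zero-based in POSIXlt, so add 1 to get the day of the year
day_of_year <- parts$yday + 1

features <- data.frame(hr_of_day, min_of_day, sec_of_day, day_of_year)
# Prefix the column names in the same way as time_features
features <- setNames(features, paste("pickup", names(features), sep = "_"))
print(features)
```

For 13:45:30 this gives 825 minutes and 49530 seconds into the day, and 15 March 2018 is day 74 of the year.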


Numerical Binning

Sometimes we need to convert continuous numerical data to discrete data. For example, Naive Bayes and the Apriori algorithm typically expect discrete values. The function below uses equal-width binning to convert continuous data to a discrete format.

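equi_width_binning <- function(input, no_of_bins) {
  # Equal-width binning: split the range [min, max] into no_of_bins intervals
  minimumVal <- min(input, na.rm = TRUE)
  maximumVal <- max(input, na.rm = TRUE)
  # no_of_bins + 1 evenly spaced break points, ending exactly at the maximum
  breaks <- seq(minimumVal, maximumVal, length.out = no_of_bins + 1)
  # include.lowest = TRUE keeps the minimum value in the first bin instead of NA
  cut(input, breaks = breaks, include.lowest = TRUE)
}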
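
One detail worth checking with cut() is how it treats the smallest value. The sketch below, on made-up data, compares the default behaviour with include.lowest = TRUE; the variable names are illustrative only.

```r
# Hypothetical input values spanning 0 to 10
x <- c(0, 2, 5, 7, 10)
n_bins <- 2

# n_bins + 1 evenly spaced break points: 0, 5, 10
breaks <- seq(min(x), max(x), length.out = n_bins + 1)

# Default cut(): intervals are open on the left, so the minimum is left out
open_left <- cut(x, breaks = breaks)

# include.lowest = TRUE closes the first interval, so the minimum is binned
closed <- cut(x, breaks = breaks, include.lowest = TRUE)

print(table(open_left, useNA = "ifany"))
print(table(closed))
```

With the default settings the value 0 becomes NA, because the first interval is (0,5]. With include.lowest = TRUE the first interval is [0,5] and all five values land in a bin.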

This is just the beginning. We will continue creating similar modules for repetitive tasks. You can download the code from my GitHub and start using it. If you need something in R to be modularized or want to contribute, feel free to add your code to the project and help us out.


