Introduction to R: The Statistical Programming Language

Now you can get started using R, a statistical programming language, even if you're coming from a traditional programming background.

R is a powerful language used widely for data analysis and statistical computing. It was developed in the early 1990s and is one of the most popular languages among statisticians, data analysts, researchers, and marketers for retrieving, cleaning, analyzing, visualizing, and presenting data. It is free and open source, and it is cross-platform: R code written on one operating system can generally be run on another with little or no change.

IEEE Spectrum publishes a ranking of the most popular programming languages each year. R was ranked 5th in 2016, up from 6th in 2015. It is a big deal for a domain-specific language like R to rank above a general-purpose language like C#.

R is easy to learn. All you need is data and a clear intent to draw a conclusion from analyzing that data. However, programmers who come from a Python, PHP, or Java background might find R quirky and confusing at first, because its syntax differs from that of other common programming languages.

To install R on an Ubuntu system, run the following commands: sudo apt-get update and sudo apt-get install r-base.

After installation, type R in your terminal and you are good to go!

Basics of R Programming

R can be used like a calculator, and indeed one of its principal uses is to carry out mathematical and statistical calculations, from simple arithmetic to complex analyses.

To get familiar with the R coding environment, let’s start with some basic calculations at the R console:

> 2 + 3
[1] 5

> 8 / 2
[1] 4

> (2*15)/(3*2)
[1] 5

> log(12)
[1] 2.484907

> sqrt(169)
[1] 13


Notice that each result line begins with [1] rather than the > prompt. R is telling you that the value that follows is the first element of the answer. At the moment this does not seem very useful, but it becomes clearer later, when answers grow long enough to span several lines.
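
To see why the index marker helps, here is a small sketch that prints a vector long enough to wrap in most consoles. The variable name x and the range 101:130 are chosen purely for illustration, and the exact markers you see depend on the width of your console.

```r
# A vector long enough to wrap across several console lines
x <- 101:130
print(x)
# Each printed line starts with the index of its first element,
# so a wrapped line begins with a marker such as [14] or [27],
# depending on how wide the console is.

# The same index can be used to pull out a single element
fourteenth <- x[14]
print(fourteenth)  # 114
```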

Datatypes

R is built around objects: it expects to find named things to work with. For example, if you are conducting an experiment and collecting data from several samples, you would create several named data objects in R so that you can work on them and analyze them later.

A vector, a matrix, a data frame, and even a single variable are all objects. R has five basic classes of objects:

1. Character.
2. Numeric (Real Numbers).
3. Integer (Whole Numbers).
4. Complex.
5. Logical (TRUE / FALSE).

> c <- "Hello"
> print(class(c))
[1] "character"

> n <- 26.9
> print(class(n))
[1] "numeric"

> i <- 2L
> print(class(i))
[1] "integer"

> m <- 2 + 5i
> print(class(m))
[1] "complex"

> l <- TRUE
> print(class(l))
[1] "logical"
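
Beyond class(), base R offers is.*() functions to test a value's class and as.*() functions to convert between classes. The short sketch below reuses the value 26.9 from above; the other variable names are illustrative only.

```r
n <- 26.9

# Test the class of a value
n_is_numeric <- is.numeric(n)      # TRUE
n_is_char    <- is.character(n)    # FALSE

# Convert between classes
n_int  <- as.integer(n)            # 26: the fractional part is dropped
n_chr  <- as.character(n)          # "26.9"
a_flag <- as.logical("TRUE")       # TRUE

print(class(n_int))                # "integer"
print(class(n_chr))                # "character"
```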

R has several data structures, including vectors (numeric, integer, etc.), matrices, data frames, and lists. Let’s look at them one by one.

Vector

A vector contains elements of the same type. The type can be logical, integer, double, character, complex, or raw. You can still combine values of different classes in a call to c(), but when you do, coercion occurs: all the values are converted to a single class. Coercion goes from lower to higher types: logical, then integer, then double, then complex, then character.

Vectors are generally created using the c() function, which combines (concatenates) its arguments. For example:

> x <- c(1,9,3,12,43)
> typeof(x)
[1] "double"

> length(x)
[1] 5

> x <- c(3,6.3,TRUE,"R Programming")
> x
[1] "3"             "6.3"           "TRUE"          "R Programming"

> typeof(x)
[1] "character"
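
The coercion order can be checked one step at a time. This is a minimal sketch with illustrative variable names; each call to c() mixes two types, and typeof() reports the type that wins.

```r
# Each pair mixes a "lower" and a "higher" type
t1 <- typeof(c(TRUE, 2L))      # logical + integer   -> "integer"
t2 <- typeof(c(2L, 3.5))       # integer + double    -> "double"
t3 <- typeof(c(3.5, 1+2i))     # double  + complex   -> "complex"
t4 <- typeof(c(3.5, "a"))      # double  + character -> "character"
print(c(t1, t2, t3, t4))

# Coercion is also why logical values can be summed:
# TRUE counts as 1 and FALSE as 0
true_count <- sum(c(TRUE, FALSE, TRUE))
print(true_count)              # 2
```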


List

A list is a special type of vector that can contain elements of different data types.

A list can be created using the list() function. For example:

> mylist <- list("name" = "John", "age" = 30, "likes" = "R programming")
> mylist
$name
[1] "John"

$age
[1] 30

$likes
[1] "R programming"

> str(mylist)
List of 3
 $ name : chr "John"
 $ age  : num 30
 $ likes: chr "R programming"
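
Once a list exists, its elements can be read and changed by name. The sketch below rebuilds the same list and shows the $ and [[ ]] accessors; note that single brackets return a smaller list rather than the element itself.

```r
mylist <- list(name = "John", age = 30, likes = "R programming")

# Two equivalent ways to extract one element
person_name <- mylist$name         # "John"
person_age  <- mylist[["age"]]     # 30

# Single brackets keep the list wrapper
sub_list <- mylist["age"]
print(class(sub_list))             # "list"

# Elements can be modified in place
mylist$age <- mylist$age + 1
print(mylist$age)                  # 31
print(names(mylist))               # "name" "age" "likes"
```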


Matrices

Matrices are R objects in which the elements are arranged in a two-dimensional rectangular layout. All elements of a matrix have the same atomic type. Although we can create a matrix containing only characters or only logical values, matrices most often hold numeric elements for use in mathematical calculations. A matrix is created using the matrix() function.

The basic syntax for creating a matrix in R is:
matrix(data, nrow, ncol, byrow, dimnames)

data is the input vector whose elements become the elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical value. If TRUE, the input vector elements are arranged by row; if FALSE (the default), by column.
dimnames is an optional list of the names assigned to the rows and columns.

> myMatrix <- matrix((1:15),3,5,FALSE)
> myMatrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   13
[2,]    2    5    8   11   14
[3,]    3    6    9   12   15

> newMatrix <- matrix((24:31),4,2,TRUE)
> newMatrix
     [,1] [,2]
[1,]   24   25
[2,]   26   27
[3,]   28   29
[4,]   30   31
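
The examples above leave dimnames unset. The following sketch uses named arguments and supplies row and column names; the matrix name scores and the labels are hypothetical. It also shows a few common matrix operations from base R.

```r
# A 2 x 3 matrix filled by row, with row and column names
scores <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
                 dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(scores)

# Elements can be addressed by name or by position
by_name <- scores["r2", "b"]        # 5
by_pos  <- scores[2, 2]             # 5

# Transpose, element-wise arithmetic, and matrix multiplication
transposed <- t(scores)             # 3 x 2
doubled    <- scores * 2            # every element multiplied by 2
product    <- scores %*% t(scores)  # 2 x 2 matrix product
print(product)
```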


Factors

A factor is a data structure used for fields that take only a predefined, finite number of values (categorical data). For example, a field such as marital status may contain only the values single, married, separated, divorced, or widowed. In such a case, we know the possible values beforehand, and these predefined, distinct values are called levels. The following is an example of a factor in R:

> data <- c("single","married","married","married","single","single","married")
> factor_data <- factor(data)
> factor_data
[1] single  married married married single  single  married
Levels: married single
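
By default, factor() derives the levels from the values it sees and sorts them alphabetically. The sketch below inspects a factor built from the same values and then declares all five marital-status levels up front; the variable names are illustrative.

```r
status <- c("single", "married", "married", "married",
            "single", "single", "married")
status_factor <- factor(status)

# Inspect the levels and count the observations per level
lv     <- levels(status_factor)     # "married" "single"
counts <- table(status_factor)
print(counts)

# Internally, each value is stored as an integer code into the levels
codes <- as.integer(status_factor)  # 2 1 1 1 2 2 1

# Levels can be declared explicitly, including ones not yet observed
status_full <- factor(status,
                      levels = c("single", "married", "separated",
                                 "divorced", "widowed"))
print(nlevels(status_full))         # 5
```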


Data Frames

The data frame is the most commonly used member of the data types family. It is used to store tabular data. It differs from a matrix: in a matrix, every element must have the same class, but in a data frame, each column can hold a different class. Internally, a data frame is a list of equal-length vectors, one per column, which is why typeof() reports it as a list.

For example:

> student <- data.frame("Roll No." = 1:3, "Name" = c("John","Sam","Mary"))
> student
  Roll.No. Name
1        1 John
2        2  Sam
3        3 Mary

> class(student)
[1] "data.frame"

> typeof(student)
[1] "list"
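
Because a data frame is a list of columns, columns can be read and added with the same $ accessor used for lists. In this sketch the column names roll_no and passed, and the values in passed, are made up for illustration; stringsAsFactors = FALSE is set so that name stays a character column in any R version.

```r
student <- data.frame(roll_no = 1:3,
                      name = c("John", "Sam", "Mary"),
                      stringsAsFactors = FALSE)

# Read a column, and a single row
all_names  <- student$name         # "John" "Sam" "Mary"
second_row <- student[2, ]

# Add a new column (illustrative values)
student$passed <- c(TRUE, FALSE, TRUE)

# Each column keeps its own class
col_classes <- sapply(student, class)
print(col_classes)

# Filter rows with a logical condition
passed_names <- student[student$passed, "name"]
print(passed_names)                # "John" "Mary"
```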


Functions in R

Functions are used to logically break our code into simpler parts which become easy to maintain and understand.

Syntax for writing a function in R:

function_name <- function(arg1, arg2, ...) {
  # function body
}
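
As a concrete instance of this syntax, here is a minimal sketch of a user-defined function. The name rectangle_area and its arguments are hypothetical; the second argument has a default value, and the value of the last expression in the body is what the function returns.

```r
# Compute the area of a rectangle; height defaults to 1
rectangle_area <- function(width, height = 1) {
  width * height
}

area_full    <- rectangle_area(3, 4)                  # 12
area_default <- rectangle_area(5)                     # 5, uses height = 1
area_named   <- rectangle_area(height = 2, width = 6) # 12, arguments matched by name

print(c(area_full, area_default, area_named))
```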


R provides a number of built-in functions, such as seq(), mean(), max(), sum(), and paste():

> print(seq(33,37))
[1] 33 34 35 36 37

> print(mean(15:25))
[1] 20

> print(sum(36:48))
[1] 546
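
The remaining functions named above, max() and paste(), work the same way. A short sketch with illustrative inputs, plus seq() with a step size:

```r
# Largest value in a vector
largest <- max(c(4, 18, 7))               # 18

# paste() joins its arguments into strings, separated by a space by default
sentence <- paste("R", "is", "fun")       # "R is fun"

# paste() is vectorized; sep changes the separator
labels <- paste("id", 1:3, sep = "_")     # "id_1" "id_2" "id_3"

# seq() also accepts a step size
steps <- seq(0, 1, by = 0.25)             # 0.00 0.25 0.50 0.75 1.00

print(largest)
print(sentence)
print(labels)
print(steps)
```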


This has been an introduction to R. We’ll dive deeper into R in future posts.

References:

  1. Beginning R The Statistical Programming Language – Dr. Mark Gardener
  2. https://www.programiz.com/r-programming
  3. https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/

Published at DZone with permission of Ramandeep Kaur. See the original article here.

Opinions expressed by DZone contributors are their own.