# Introduction to R: The Statistical Programming Language

### Get started with R, a statistical programming language, even if you're coming from a traditional programming background.

R is a powerful language used widely for data analysis and statistical computing. It was developed in the early 1990s and is one of the most popular languages among statisticians, data analysts, researchers, and marketers for retrieving, cleaning, analyzing, visualizing, and presenting data. It is free and open source, and it is cross-platform: R code written on one operating system can generally be run on another with little or no change.

IEEE publishes a list of the most popular programming languages each year. R was ranked 5th in 2016, up from 6th in 2015. It is a big deal for a domain-specific language like R to be more popular than a general purpose language like C#.

R is easy to learn. All you need is data and a clear intent to draw a conclusion from an analysis of that data. However, programmers who come from a Python, PHP, or Java background might find R quirky and confusing at first, because its syntax differs somewhat from that of other common programming languages.

To install and run R on an Ubuntu system, use the following commands: `sudo apt-get update` and `sudo apt-get install r-base`.

After installation, type R in your terminal and you are good to go!

**Basics of R Programming**

R can be used like a calculator, and indeed one of its principal uses is to carry out mathematical and statistical calculations, from simple to complex.

To get familiar with the R coding environment, let’s start with some basic calculations in the R console:

```
> 2 + 3
[1] 5
> 8 / 2
[1] 4
> (2*15)/(3*2)
[1] 5
> log(12)
[1] 2.484907
> sqrt(169)
[1] 13
```

Notice that each result line begins with [1] rather than the > prompt. In the first example, R is telling you that the first element of the answer is 5. This does not seem very useful yet, but it becomes helpful when a result contains many elements and wraps over several lines.
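To see why the index prefix is useful, here is a small sketch that prints a longer vector. The name `long_vector` is made up for illustration, and the index shown at the start of each wrapped line depends on the width of your console, so no output is reproduced here.

```r
# A sequence long enough to wrap across several console lines.
long_vector <- seq(101, 150)

# Each printed line starts with the index of its first element,
# so the first line always begins with [1].
print(long_vector)

# The same index can be used to pull out a single element.
element_26 <- long_vector[26]
print(element_26)
```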

**Datatypes**

In R, everything you work with is an object, and you give objects names so that you can refer to them later. For example, if you are conducting an experiment and collecting data from several samples, you would create several named data objects in R so that you can work on them and run your analyses later on.

A vector, a matrix, a data frame, and even a single value stored in a variable are all objects. R has five basic classes of objects:

1. Character.
2. Numeric (real numbers).
3. Integer (whole numbers).
4. Complex.
5. Logical (`TRUE` / `FALSE`).

```
> c <- "Hello"
> print(class(c))
[1] "character"
> n <- 26.9
> print(class(n))
[1] "numeric"
> i <- 2L
> print(class(i))
[1] "integer"
> m <- 2 + 5i
> print(class(m))
[1] "complex"
> l <- TRUE
> print(class(l))
[1] "logical"
```

R has several data structures built on these classes, including vectors, lists, matrices, factors, and data frames. Let’s look at them one by one.

**Vector**

A vector contains elements of the same type. The type can be logical, integer, double, complex, character, or raw. If you try to combine values of different types in one vector, coercion occurs: R converts all the elements to a single type. Coercion goes from the less general to the more general type, that is, from logical to integer to double to complex to character.

Vectors are generally created using the `c()` function, which combines (concatenates) its arguments. For example:

```
> x <- c(1,9,3,12,43)
> typeof(x)
[1] "double"
> length(x)
[1] 5
> x <- c(3,6.3,TRUE,"R Programming")
> x
[1] "3" "6.3" "TRUE" "R Programming"
> typeof(x)
[1] "character"
```
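The coercion order can be checked one step at a time. The following is a minimal sketch; the variable names are made up for illustration.

```r
# Mixing types in c() converts every element to the most general type present.
logical_with_integer <- c(TRUE, 2L)
integer_with_double  <- c(2L, 3.5)
double_with_complex  <- c(3.5, 1+2i)
double_with_text     <- c(3.5, "text")

typeof(logical_with_integer)  # "integer"
typeof(integer_with_double)   # "double"
typeof(double_with_complex)   # "complex"
typeof(double_with_text)      # "character"

# Explicit conversion is available through the as.*() functions.
as.numeric("6.3")   # 6.3
as.integer(TRUE)    # 1
```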

**List**

A list is a special type of vector that can contain elements of different data types.

A list can be created using the `list()` function. For example:

```
> mylist <- list("name" = "John", "age" = 30, "likes" = "R programming")
> mylist
$name
[1] "John"
$age
[1] 30
$likes
[1] "R programming"
> str(mylist)
List of 3
$ name : chr "John"
$ age : num 30
$ likes: chr "R programming"
```
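Once a list exists, its elements can be read back by name or by position. The sketch below uses a hypothetical list called `person` with the same fields as above; the `city` value is invented purely for illustration.

```r
person <- list(name = "John", age = 30, likes = "R programming")

# Access a single element by name or by position.
person$name        # "John"
person[["age"]]    # 30
person[[3]]        # "R programming"

# Single brackets return a sub-list rather than the element itself.
class(person["age"])     # "list"
class(person[["age"]])   # "numeric"

# Elements can be added by assignment (hypothetical value).
person$city <- "Delhi"
length(person)     # 4
```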

**Matrices**

Matrices are R objects in which the elements are arranged in a two-dimensional rectangular layout. All elements must be of the same atomic type. Although a matrix can hold characters or logical values, matrices are most commonly filled with numeric elements for use in mathematical calculations. A matrix is created using the `matrix()` function.

The basic syntax for creating a matrix in R is:

`matrix(data, nrow, ncol, byrow, dimnames)`

- *data* is the input vector whose elements become the data elements of the matrix.
- *nrow* is the number of rows to be created.
- *ncol* is the number of columns to be created.
- *byrow* is a logical flag. If `TRUE`, the input vector elements are arranged by row; otherwise they are arranged by column.
- *dimnames* is a list of the names assigned to the rows and columns.

```
> myMatrix <- matrix(1:15, nrow = 3, ncol = 5, byrow = FALSE)
> myMatrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    4    7   10   13
[2,]    2    5    8   11   14
[3,]    3    6    9   12   15
> newMatrix <- matrix(24:31, nrow = 4, ncol = 2, byrow = TRUE)
> newMatrix
     [,1] [,2]
[1,]   24   25
[2,]   26   27
[3,]   28   29
[4,]   30   31
```
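The `dimnames` argument is not used in the examples above, so here is a minimal sketch of it. The matrix name `scores`, its values, and the row and column labels are all made up for illustration.

```r
# dimnames takes a list: row names first, then column names.
scores <- matrix(
  c(90, 85, 70,
    65, 88, 92),
  nrow = 2, ncol = 3, byrow = TRUE,
  dimnames = list(c("row1", "row2"), c("a", "b", "c"))
)
print(scores)

# Named dimensions allow lookups by label instead of by position.
scores["row2", "b"]   # 88

# Common operations work on the whole matrix at once.
dim(scores)           # 2 3
transposed <- t(scores)
doubled <- scores * 2
```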

**Factors**

A factor is a data structure used for fields that take only a predefined, finite set of values (categorical data). For example, a data field such as marital status may contain only the values single, married, separated, divorced, or widowed. In such a case, we know the possible values beforehand, and these predefined, distinct values are called levels. The following is an example of a factor in R:

```
> data <- c("single","married","married","married","single","single","married")
> factor_data <- factor(data)
> factor_data
[1] single married married married single single married
Levels: married single
```
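The levels of a factor can be inspected, counted, or declared up front. The sketch below reuses the same marital status values; the variable names are made up for illustration.

```r
status <- c("single", "married", "married", "married",
            "single", "single", "married")
status_factor <- factor(status)

# Levels are sorted alphabetically by default.
levels(status_factor)      # "married" "single"

# table() counts how many observations fall into each level.
status_counts <- table(status_factor)
print(status_counts)

# Declaring all possible levels keeps categories that have no data yet.
status_full <- factor(
  status,
  levels = c("single", "married", "separated", "divorced", "widowed")
)
nlevels(status_full)       # 5
```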

**Data Frames**

The data frame is the most commonly used member of the data types family. It is used to store tabular data. It differs from a matrix: in a matrix, every element must have the same class, whereas in a data frame each column can have a different class. Internally, a data frame is a list of vectors of equal length, one vector per column.

For example:

```
> student <- data.frame("Roll No." = 1:3, "Name" = c("John","Sam","Mary"))
> student
Roll.No. Name
1 1 John
2 2 Sam
3 3 Mary
> class(student)
[1] "data.frame"
> typeof(student)
[1] "list"
```
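Columns and rows of a data frame can be selected much like list elements and matrix cells. This is a minimal sketch; the column names `roll_no` and `passed`, and the values in `passed`, are assumptions made for illustration.

```r
student <- data.frame(
  roll_no = 1:3,
  name = c("John", "Sam", "Mary"),
  stringsAsFactors = FALSE   # keep names as plain character strings
)

# A column is an ordinary vector.
student$name                 # "John" "Sam" "Mary"

# Rows and columns can be selected with [row, column].
second_row <- student[2, ]
later_names <- student[student$roll_no > 1, "name"]   # "Sam" "Mary"

# Adding a column is an assignment (hypothetical values).
student$passed <- c(TRUE, FALSE, TRUE)

# str() summarises the class of each column.
str(student)
```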

**Functions in R**

Functions let us break our code into smaller logical parts that are easier to maintain and understand.

Syntax for writing a function in R:

```
function_name <- function(arg1, arg2, ...) {
  # function body
}
```
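As a concrete instance of this syntax, here is a minimal sketch of a user-defined function. The function `rectangle_area` and its arguments are hypothetical and chosen only for illustration.

```r
# Returns the area of a rectangle; height defaults to 1 if not supplied.
rectangle_area <- function(width, height = 1) {
  # The value of the last expression is returned automatically.
  width * height
}

rectangle_area(4, 5)                    # 20
rectangle_area(4)                       # 4, using the default height
rectangle_area(height = 2, width = 3)   # 6, arguments matched by name
```

Arguments can be passed by position or by name, and a default value makes an argument optional.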

R provides a number of built-in functions, such as `seq()`, `mean()`, `max()`, `sum()`, and `paste()`:

```
> print(seq(33,37))
[1] 33 34 35 36 37
> print(mean(15:25))
[1] 20
> print(sum(36:48))
[1] 546
```
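The console session above does not show `max()` or `paste()`, so here is a short sketch of both, along with the `by` argument of `seq()`. The variable names and values are made up for illustration.

```r
# max() returns the largest value in a vector.
largest <- max(c(12, 45, 7))            # 45

# paste() joins its arguments into strings, separated by a space by default.
greeting <- paste("Hello", "R")         # "Hello R"

# paste() is vectorised, and sep changes the separator.
labels <- paste("id", 1:3, sep = "_")   # "id_1" "id_2" "id_3"

# seq() accepts a step size through the by argument.
evens <- seq(2, 10, by = 2)             # 2 4 6 8 10

print(largest)
print(greeting)
print(labels)
print(evens)
```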

This has been an introduction to R. We’ll dive deeper into R in future posts.

**References:**

- Beginning R The Statistical Programming Language – Dr. Mark Gardener
- https://www.programiz.com/r-programming
- https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/

Published at DZone with permission of Ramandeep Kaur, DZone MVB.
