R has many different data structures for different scenarios.

### Lists

Lists are vectors that allow their elements to be any type of object. They are created using the `list()` function.

`> x <- list(1, "two", c(3, 4))`

In this example, we’ve defined `x` as a list consisting of three elements: the number `1`, the string `"two"`, and a vector, `3 4`, of length 2. We can examine the structure of `x` using the `str()` function.

```
> str(x)
List of 3
 $ : num 1
 $ : chr "two"
 $ : num [1:2] 3 4
```

Remember that each element of the list is a vector; `1` is a numeric vector of length 1, and `"two"` is a character vector of length 1.

One particularly interesting ability of a list is that it can contain lists within it. Had we defined `x` as `x <- list(1, "two", list(3, 4))`, the `str()` function would have returned:

```
> str(x)
List of 3
 $ : num 1
 $ : chr "two"
 $ :List of 2
  ..$ : num 3
  ..$ : num 4
```

This means that a list is a recursive object (you can test this with the `is.recursive()` function). In principle, lists can be nested to any depth.
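As a quick check of the point above, the following sketch builds both versions of the list and tests them. The names `flat` and `nested` are illustrative only, and the `[[ ]]` operator used to pull out a single element is base R's element-extraction operator, which this section has not otherwise introduced.

```r
# Two versions of the list from the text: one holding a vector, one holding a list.
flat   <- list(1, "two", c(3, 4))
nested <- list(1, "two", list(3, 4))

is.recursive(flat)      # TRUE: a list is a recursive object
is.recursive(c(3, 4))   # FALSE: an atomic vector is not

# [[ ]] extracts one element; here, the third element of each list.
is.list(flat[[3]])      # FALSE: the third element is a numeric vector
is.list(nested[[3]])    # TRUE: the third element is itself a list
```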

### Factors

A factor is a vector that stores categorical data: data that can be classified into a finite number of categories. These categories are known as the `levels` of a factor.

Say you define `x` as a collection of the strings `"a"`, `"b"`, and `"c"`: `x <- c("b", "a", "b", "c", "a", "a")`.

Using the `factor()` function, you can have R convert the atomic character vector into a factor. R will automatically determine the levels of the factor from the values it finds; `factor()` will produce an error when it is given a non-atomic argument. Let’s take a look at the factor here:

```
> x <- c("b", "a", "b", "c", "a", "a")
> x <- factor(x)
> # this can also be written as x <- factor(c("b", "a", "b", "c", "a", "a"))
> x
[1] b a b c a a
Levels: a b c
> str(x)
Factor w/ 3 levels "a","b","c": 2 1 2 3 1 1
> levels(x)
[1] "a" "b" "c"
> table(x)
x
a b c
3 2 1
```

By using the `factor()` function on `x`, R grouped the values into “levels.” When `x` was printed, R returned the elements in their original order, but it also printed the levels of the factor. Examining the structure of `x` shows that `x` is a factor with three levels, lists the levels (alphabetically), and then shows which level each element of the factor corresponds to. So here, since `"b"` is alphabetically second, the `2` in `2 1 2 3 1 1` corresponds with `"b"`.
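The correspondence between the integer codes and the levels can be checked directly. This is a minimal sketch using the base R function `as.integer()`, which the text above does not mention; the variable name `codes` is illustrative.

```r
x <- factor(c("b", "a", "b", "c", "a", "a"))

# The integer codes that str(x) displays after the levels.
codes <- as.integer(x)
codes                   # 2 1 2 3 1 1

# Looking each code up in the levels recovers the original strings.
levels(x)[codes]        # "b" "a" "b" "c" "a" "a"
```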

The `levels()` function returns a vector containing only the names of the different levels of the factor. So here, `levels(x)` returns the three levels “a”, “b”, and “c”, in order (here from the lowest level to the highest).

The `table()` function gives a table summarizing the factor. Using the `table()` function on `x` returned the name of the variable, the levels of `x`, and then, underneath each level, the number of values in `x` that belong to it. So this table shows us that, in the factor `x`, there are three instances of the level `"a"`, two instances of `"b"`, and one instance of `"c"`.

If the levels of your factor need to be in a particular order, you can use the `factor()` argument `levels` to define the order, and set the argument `ordered` to `TRUE`:

```
> x <- c("b", "a", "b", "c", "a", "a")
> x <- factor(x, levels = c("c", "b", "a"), ordered = TRUE)
> x
[1] b a b c a a
Levels: c < b < a
> str(x)
Ord.factor w/ 3 levels "c"<"b"<"a": 2 3 2 1 3 3
> levels(x)
[1] "c" "b" "a"
> table(x)
x
c b a
1 2 3
```

Now R returned the levels in the order specified by the vector given to the `levels` argument. The `<` (less than) symbol in the output of `x` and `str(x)` indicates that these levels are ordered, and `str(x)` reports that the object is an ordered factor.
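One practical consequence of an ordered factor is that its elements can be compared. The sketch below is an illustration rather than part of the original example; it relies on the base R functions `is.ordered()` and `max()`, and on the comparison operators that R defines for ordered factors.

```r
x <- factor(c("b", "a", "b", "c", "a", "a"),
            levels = c("c", "b", "a"), ordered = TRUE)

is.ordered(x)    # TRUE

# Comparisons follow the declared order c < b < a, not the alphabet.
x[4] < x[1]      # TRUE: "c" comes before "b"
x < "a"          # TRUE FALSE TRUE TRUE FALSE FALSE

# Summaries such as max() also respect the order.
max(x)           # a
```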

### Matrixes

A matrix is, in most cases, a two-dimensional atomic data structure (though you can have a one-dimensional matrix, or a non-atomic matrix made from a list). To create a matrix, you can use the `matrix()` function on a vector with the `nrow` and/or `ncol` arguments. `matrix(1:20, nrow = 5)` will produce a matrix with five rows and four columns containing the numbers 1 through 20. `matrix(1:20, ncol = 4)` produces the same matrix.

```
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
```

The matrix will fill by column unless the argument `byrow` is set to `TRUE`.
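To make the difference in fill order concrete, here is a small sketch comparing the two settings. The names `by_col` and `by_row` are illustrative.

```r
# Default: values fill down each column in turn.
by_col <- matrix(1:20, nrow = 5)

# byrow = TRUE: values fill across each row in turn.
by_row <- matrix(1:20, nrow = 5, byrow = TRUE)

by_col[1, ]   # 1 6 11 16
by_row[1, ]   # 1 2 3 4
```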

Note that the position indexes are assigned to rows **and** columns here. Since a matrix is naturally two-dimensional, R provides column indexes to more easily interact with the matrix. You can use the index vector `[]` to return the value of an individual cell of the matrix. Assuming the matrix above has been stored as `x` (for example with `x <- matrix(1:20, nrow = 5)`), `x[1,2]` will return the value of row 1, column 2: `6`. You can also use the index vector to return the values of whole rows or columns. `x[1,]` will return `1 6 11 16`, the elements of the first row of the matrix.
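The indexing forms described above can be tried on the same matrix. This sketch assumes the matrix is stored in a variable named `x`, which the earlier `matrix()` calls did not do explicitly.

```r
x <- matrix(1:20, nrow = 5)

x[1, 2]    # 6: a single cell (row 1, column 2)
x[1, ]     # 1 6 11 16: the whole first row
x[, 2]     # 6 7 8 9 10: the whole second column
```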

You can also create a matrix by assigning dimensions to a vector using the `dim()` function, as shown here:

```
x <- 1:20
dim(x) <- c(5, 4)
```

This created the same matrix you saw earlier. With the `dim()` function, you can also redefine the dimensions of a matrix. `dim(x) <- c(4,5)` will “redraw” the matrix to have four rows and five columns.
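Reassigning the dimensions does not reorder the underlying values; it only changes where the column breaks fall. A short sketch of that effect, using the base R check `is.matrix()`:

```r
x <- 1:20
dim(x) <- c(5, 4)
is.matrix(x)   # TRUE: the vector is now a 5 x 4 matrix
x[1, 2]        # 6: the second column starts at the sixth value

dim(x) <- c(4, 5)
x[1, 2]        # 5: columns now hold four values, so column 2 starts at 5
```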

### Arrays

What happens if the vector you passed to the `dim()` function had more than two elements? If we had written `dim(x) <- c(5, 2, 2)` we would have created another data structure: an array.

Technically, a matrix is specifically a two-dimensional array, but arrays can have any number of dimensions. When `x` contained 20 elements (`x <- 1:20`), executing `dim(x) <- c(5, 2, 2)` would have given `x` three dimensions. R would represent this as a “series” of matrixes:

```
> x
, , 1

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

, , 2

     [,1] [,2]
[1,]   11   16
[2,]   12   17
[3,]   13   18
[4,]   14   19
[5,]   15   20
```

In the case of an array, the “row” and “column” numbers remain in the same order, and R will show the other dimensions above each matrix. In this case, we received two matrixes (based on the third dimension given) of five rows (based on the first dimension given) and two columns (based on the second dimension given). When there are several extra dimensions, R steps through the earliest of them fastest, so if we had an array of **four** dimensions (say `5, 2, 2, 2`), it would print matrixes `, , 1, 1`, then `, , 2, 1`, then `, , 1, 2`, and lastly `, , 2, 2`.

Again, you can use index vectors to find a particular element, or particular elements, of the array. In our three-dimensional array shown earlier, `x[1, 2, 2]` will return `16`. You can see by the way R has printed the array that rows come **before** the first comma, columns come **after** the first comma, and the third dimension of the array comes **after the second comma**.
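The same indexing can be sketched in code. Leaving a position empty selects everything along that dimension, just as with a matrix.

```r
x <- 1:20
dim(x) <- c(5, 2, 2)

x[1, 2, 2]       # 16: row 1, column 2, third-dimension slice 2
x[, , 2]         # the whole second slice, a 5 x 2 matrix holding 11..20
dim(x[, , 2])    # 5 2
```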

### Data Frames

A data frame is a (generally) two-dimensional structure consisting of vectors of the same length. Data frames are used often, as they are the closest data structure in R to a spreadsheet or a relational database table. You can use the `data.frame()` function to create a data frame.

```
> x <- data.frame(y = 1:3, z = c("one", "two", "three"), stringsAsFactors = FALSE)
> x
  y     z
1 1   one
2 2   two
3 3 three
```

In this example, we have created a data frame with two columns and three rows. Using `y =` and `z =` defines the names of the columns, which will make them easier to access, manipulate, and analyze. Here, we’ve used the argument `stringsAsFactors = FALSE` to make column `z` an atomic character vector instead of a factor. Before R 4.0.0, `data.frame()` coerced vectors of strings into factors by default; since R 4.0.0 the default is `stringsAsFactors = FALSE`, so the argument is only needed on older versions or to make the intent explicit.
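The effect of `stringsAsFactors` can be seen by building the same data frame both ways and inspecting the column classes. This is a sketch: the names `as_chr` and `as_fct` are illustrative, and `sapply()` is a base R helper not otherwise used in this section. The argument is passed explicitly in both calls so the result does not depend on the R version's default.

```r
as_chr <- data.frame(y = 1:3, z = c("one", "two", "three"),
                     stringsAsFactors = FALSE)
as_fct <- data.frame(y = 1:3, z = c("one", "two", "three"),
                     stringsAsFactors = TRUE)

# Class of each column in each data frame.
sapply(as_chr, class)   # y: "integer", z: "character"
sapply(as_fct, class)   # y: "integer", z: "factor"

# As a factor, z gains alphabetically sorted levels.
levels(as_fct$z)        # "one" "three" "two"
```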

You can use the `names()` function to change the names of your columns. `names(x) <- c("a", "b")` provides a vector of new values to replace the column names, changing the columns to `a` and `b`. To change a certain column or columns, you can use the index vector to specify which column(s) to rename.

```
> names(x)[1] <- "a"
> x
  a     z
1 1   one
2 2   two
3 3 three
```

You can combine data frames with the `cbind()` function or the `rbind()` function. `cbind()` will add the columns of one data frame to another, as long as the frames have the same number of rows.

```
> cbind(x, data.frame(b = c("I", "II", "III"), stringsAsFactors = FALSE))
  a     z   b
1 1   one   I
2 2   two  II
3 3 three III
```

`rbind()` will add the rows of one data frame to the rows of another, so long as the frames have the same number of columns and have the same column names.

```
> rbind(x, data.frame(a = 4, z = "four"))
  a     z
1 1   one
2 2   two
3 3 three
4 4  four
```

`cbind()` and `rbind()` will also coerce vectors and matrixes of the proper lengths into a data frame, so long as one of the arguments of the bind function is a data frame. We could have used `rbind(x, c(4, "four"))` to take the data frame `x` we defined earlier, and coerce the vector `c(4, "four")` to fit into the existing data frame. But coercion can affect the way your data frame stores your data. In this case, the vector `c(4, "four")` would have coerced the number `4` into the character `"4"`. Then the data frame would have coerced the entire first column into a character vector. This makes it safer to use `rbind()` and `cbind()` to bind data frames with each other.
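The coercion described above can be observed by comparing the two approaches side by side. In this sketch the names `coerced` and `safe` are illustrative, and `x` is rebuilt with the renamed column `a` so the block runs on its own.

```r
x <- data.frame(a = 1:3, z = c("one", "two", "three"),
                stringsAsFactors = FALSE)

# Binding a bare vector: c(4, "four") is already the character vector
# c("4", "four"), so column a is dragged to character as well.
coerced <- rbind(x, c(4, "four"))
class(coerced$a)    # "character"

# Binding a one-row data frame keeps column a numeric.
safe <- rbind(x, data.frame(a = 4, z = "four", stringsAsFactors = FALSE))
class(safe$a)       # "numeric"
```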
