R: Ordering Rows in a Data Frame by Multiple Columns

In one of the assignments for Computing for Data Analysis, we needed to sort a data frame by the values in two of its columns and then return the top value.

The initial data frame looked a bit like this:

> names <- c("paul", "mark", "dave", "will", "john")
> values <- c(1,4,1,2,1)
> smallData <- data.frame(name = names, value = values)
> smallData
  name value
1 paul     1
2 mark     4
3 dave     1
4 will     2
5 john     1

I want to sort the data frame by value and then by name, both in ascending order, so the final result should look like this:

  name value
3 dave     1
5 john     1
1 paul     1
4 will     2
2 mark     4

To do that we can use the order function, which returns the indices that would put a vector into sorted order.

For example, with our values:

> order(c(1,4,1,2,1))
[1] 1 3 5 4 2
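
To make the link between order and sorting concrete, here is a minimal sketch that indexes the vector by the result of order and compares it with sort. The variable names idx and sorted_values are mine, not part of the assignment.

```r
# The same numbers as the value column of smallData
values <- c(1, 4, 1, 2, 1)

# order() returns positions, not values; tied elements keep
# their original relative order
idx <- order(values)
print(idx)

# Indexing the vector by those positions sorts it
sorted_values <- values[idx]
print(sorted_values)

# This should agree with what sort() gives us directly
print(identical(sorted_values, sort(values)))
```

So order gives us the permutation rather than the sorted data, which is what lets us apply it to the rows of a data frame.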

If we pass a vector of row indices to the extract operator ([), it returns the rows in that order. For example:

> smallData[c(5,4,3,2,1),]
  name value
5 john     1
4 will     2
3 dave     1
2 mark     4
1 paul     1

In our case we wire everything together like this to sort by the second column (value):

> smallData[order(smallData[,2]),]
  name value
1 paul     1
3 dave     1
5 john     1
4 will     2
2 mark     4

It’s a reasonably small tweak to get it to sort first by the second column (value) and then by the first (name), which is what we want:

> smallData[order(smallData[,2], smallData[,1]),]
  name value
3 dave     1
5 john     1
1 paul     1
4 will     2
2 mark     4
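
The only difference between the last two results is how the three rows with value 1 are arranged. A small sketch comparing the two index vectors shows this; byValue and byValueThenName are names I've made up, and I've added stringsAsFactors = FALSE so that name is a character column regardless of R version.

```r
# Rebuild the example data frame (stringsAsFactors = FALSE is my addition)
smallData <- data.frame(
  name  = c("paul", "mark", "dave", "will", "john"),
  value = c(1, 4, 1, 2, 1),
  stringsAsFactors = FALSE
)

# One key: rows 1, 3 and 5 all have value 1 and stay in their original order
byValue <- order(smallData[, 2])

# Two keys: name is only consulted to break ties on value
byValueThenName <- order(smallData[, 2], smallData[, 1])

print(byValue)          # 1 3 5 4 2
print(byValueThenName)  # 3 5 1 4 2
```

The later arguments to order only come into play when the earlier ones are equal, so will and mark stay where they were.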

If we wanted to use the column names instead of indices we’d do the following:

> smallData[order(smallData$value, smallData$name),]
  name value
3 dave     1
5 john     1
1 paul     1
4 will     2
2 mark     4

We could also rewrite it using the with function if we want to reduce the code further:

> smallData[with(smallData, order(value, name)),]
  name value
3 dave     1
5 john     1
1 paul     1
4 will     2
2 mark     4

As I understand it, with builds an environment from smallData and evaluates its second argument inside that environment, which is why we can refer to the columns of smallData by name.
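
A quick way to check that reading is to compare the with version against the $ version, and then pull out the top row, which is what the assignment ultimately asked for. This is only a sketch: viaWith, viaDollar, sorted and top are my own names, and stringsAsFactors = FALSE is an addition to keep name as a character column.

```r
# Rebuild the example data frame (stringsAsFactors = FALSE is my addition)
smallData <- data.frame(
  name  = c("paul", "mark", "dave", "will", "john"),
  value = c(1, 4, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Inside with(), value and name are looked up in smallData
viaWith   <- with(smallData, order(value, name))
viaDollar <- order(smallData$value, smallData$name)

# Both spellings should produce the same index vector
print(identical(viaWith, viaDollar))

# Sort the rows, then take the first one as the "top" row
sorted <- smallData[viaWith, ]
top <- sorted[1, ]
print(top)
```

With the data above the top row is dave with a value of 1.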

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.
