# R: dplyr - Select 'random' rows from a data frame


Frequently I find myself wanting to take a sample of the rows in a data frame where just taking the head isn’t enough.

Let’s say we start with the following data frame:

    library(dplyr)

    data = data.frame(
      letter = sample(LETTERS, 50000, replace = TRUE),
      number = sample(1:10, 50000, replace = TRUE)
    )

And we’d like to sample 10 rows to see what it contains. We’ll start by generating 10 random row numbers using the sample function:

    > randomRows = sample(1:length(data[,1]), 10, replace=T)
    > randomRows
    [1]  8723 18772  4964 36134 27467 31890 16313 12841 49214 15621
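Two things are worth knowing about this step: the draw changes on every run, and with `replace=T` the same row number can come up more than once. The following is a minimal sketch of one way to handle both, assuming base R only. The small `df` is a stand-in invented for illustration, not the 50,000-row data frame above.

```r
# Assumption: a small stand-in data frame, used only for this illustration
df <- data.frame(letter = LETTERS[1:20], number = 1:20)

set.seed(42)                        # fix the RNG state so the draw is repeatable
rows_a <- sample.int(nrow(df), 10)  # 10 distinct row numbers (no replacement by default)

set.seed(42)
rows_b <- sample.int(nrow(df), 10)  # same seed, so the same row numbers again

identical(rows_a, rows_b)           # checks that both draws match
anyDuplicated(rows_a)               # 0 means no row number repeats
```

Dropping `replace=T` only works when we ask for no more rows than the data frame has; otherwise `sample.int` raises an error.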

We can then pass that list of row numbers into dplyr’s slice function like so:

    > data %>% slice(randomRows)
       letter number
    1       Z      4
    2       F      1
    3       Y      6
    4       R      6
    5       Y      4
    6       V     10
    7       R      6
    8       D      6
    9       J      7
    10      E      2
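For comparison, here is a small sketch of the same selection written with base R indexing. The data frame `df` and the row numbers in `rows` are made up for the example. The values come out the same either way and only the row names differ.

```r
library(dplyr)

# Assumption: a small stand-in data frame and hand-picked row numbers
df <- data.frame(letter = LETTERS[1:20], number = 1:20)
rows <- c(3, 7, 7, 15)

with_slice <- df %>% slice(rows)  # dplyr: result rows are renumbered from 1
with_base  <- df[rows, ]          # base R: original row names kept (made unique)

# Same values either way; only the row names differ
all(with_slice$letter == with_base$letter)
```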

If we use that snippet in several places, we might want to pull it out into a function like so:

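    pickRandomRows = function(df, numberOfRows = 10) {
      # sample() draws whole row numbers between 1 and nrow(df);
      # replace = TRUE matches the earlier snippet, so a row can repeat
      df %>% slice(sample(nrow(df), numberOfRows, replace = TRUE))
    }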

And then call it like so:

    > data %>% pickRandomRows()
       letter number
    1       W      5
    2       Y      3
    3       E      6
    4       Q      8
    5       M      9
    6       H      9
    7       E     10
    8       T      2
    9       I      5
    10      V      4
    > data %>% pickRandomRows(7)
      letter number
    1      V      7
    2      N      4
    3      W      1
    4      N      8
    5      G      7
    6      V      1
    7      N      7
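dplyr also ships helpers that do this in one call, which may remove the need for a custom function. The sketch below assumes dplyr 1.0.0 or later for `slice_sample()`; `sample_n()` is the older equivalent. The small `df` is again a stand-in for illustration.

```r
library(dplyr)

# Assumption: a small stand-in data frame, used only for this illustration
df <- data.frame(letter = LETTERS[1:20], number = 1:20)

# dplyr >= 1.0.0: slice_sample() draws n rows, without replacement by default
picked <- df %>% slice_sample(n = 10)

# Older dplyr versions provide sample_n() for the same job
picked_old <- df %>% sample_n(7)

nrow(picked)
nrow(picked_old)
```

Both helpers sample without replacement unless `replace = TRUE` is passed, so they never return the same row twice by default.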

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.
