R: dplyr - Select 'random' rows from a data frame
Frequently I find myself wanting to take a sample of the rows in a data frame where just taking the head isn’t enough.
Let’s say we start with the following data frame:
library(dplyr)

data = data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)
And we’d like to sample 10 rows to see what it contains. We’ll start by generating 10 random row numbers using the sample function:
> randomRows = sample(1:length(data[,1]), 10, replace=T)
> randomRows
 [1]  8723 18772  4964 36134 27467 31890 16313 12841 49214 15621
We can then pass that list of row numbers into dplyr’s slice function like so:
> data %>% slice(randomRows)
   letter number
1       Z      4
2       F      1
3       Y      6
4       R      6
5       Y      4
6       V     10
7       R      6
8       D      6
9       J      7
10      E      2
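One thing to keep in mind: because the call to sample above passes replace=T, the same row number can be drawn more than once, in which case slice returns that row twice. Here is a minimal sketch, assuming you want distinct rows and a repeatable draw. The seed value and the names distinctRows and sampled are illustrative and not part of the original code:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the draw can be repeated

data = data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

# replace = FALSE (the default for sample) guarantees no row number is drawn twice
distinctRows = sample(nrow(data), 10, replace = FALSE)

sampled = data %>% slice(distinctRows)

nrow(sampled)                # 10
anyDuplicated(distinctRows)  # 0, meaning no repeated row numbers
```

Using nrow(data) instead of length(data[,1]) is a small robustness choice: it does not depend on how single-column subsetting behaves for the object being sampled.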
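If we use that snippet in several places, we might want to pull it out into a function like so:
pickRandomRows = function(df, numberOfRows = 10) {
  # sample() yields whole-number row indices in 1..nrow(df);
  # runif() would yield fractional values that are not valid row numbers
  df %>% slice(sample(nrow(df), numberOfRows, replace = TRUE))
}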
And then call it like so:
> data %>% pickRandomRows()
   letter number
1       W      5
2       Y      3
3       E      6
4       Q      8
5       M      9
6       H      9
7       E     10
8       T      2
9       I      5
10      V      4
> data %>% pickRandomRows(7)
  letter number
1      V      7
2      N      4
3      W      1
4      N      8
5      G      7
6      V      1
7      N      7
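dplyr also ships helpers for this, so a hand-rolled function may not always be needed. A short sketch, assuming dplyr 1.0.0 or later for slice_sample (older versions offer sample_n instead); the names tenRows and sevenRows are illustrative:

```r
library(dplyr)

data = data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

# slice_sample() draws n rows, without replacement by default (dplyr >= 1.0.0)
tenRows = data %>% slice_sample(n = 10)

# sample_n() is the older helper; it still works but is superseded by slice_sample()
sevenRows = data %>% sample_n(7)

nrow(tenRows)    # 10
nrow(sevenRows)  # 7
```

Both helpers accept replace = TRUE if sampling with replacement is actually what you want.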
Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.