
Random Sampling From a File


See the Linux commands in action that will set up a random sampling of values from a file, either with or without replacement.


I recently learned about the Linux command-line utility shuf from browsing The Art of Command Line. It is handy for drawing a random sample of lines from a file.

Given just a file name, shuf randomly permutes the lines of the file.
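A quick way to convince yourself that this is a permutation, with no lines added or dropped, is to compare sorted copies of the input and the output. The sketch below assumes GNU shuf is installed; the scratch directory and the three-line file contents are made up for illustration.

```shell
# Build a small sample file in a scratch directory (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/foo.txt"

# Shuffle the file and save the result.
shuf "$tmpdir/foo.txt" > "$tmpdir/shuffled.txt"

# A permutation keeps the same lines, so the sorted copies should match.
if [ "$(sort "$tmpdir/foo.txt")" = "$(sort "$tmpdir/shuffled.txt")" ]; then
  same_lines=yes
else
  same_lines=no
fi
echo "same set of lines: $same_lines"
```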

With the option -n, you can specify how many lines to return. Each line can be chosen at most once, so this is sampling without replacement. For example...

shuf -n 10 foo.txt


... would select 10 lines from foo.txt.

Actually, it would select at most 10 lines. You can’t select 10 lines without replacement from a file with fewer than 10 lines. If you ask for more lines than the file contains, the -n limit has no effect and shuf simply returns every line, shuffled.
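To see the "at most" behavior directly, you can ask for more lines than a small file holds and count what comes back. This is a minimal sketch assuming GNU shuf; the scratch directory and file contents are hypothetical.

```shell
# Build a three-line sample file in a scratch directory (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/foo.txt"

# Ask for 10 lines without replacement; only 3 are available.
count=$(shuf -n 10 "$tmpdir/foo.txt" | wc -l)
echo "lines returned: $count"
```

With a three-line file, the count should be 3 rather than 10.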

You can also sample with replacement using the -r option. In that case, you can select more lines than are in the file since lines may be reused. For example, you could run ...

shuf -r -n 10 foo.txt


... to select 10 lines drawn with replacement from foo.txt, regardless of how many lines foo.txt has. For example, when I ran the command above on a file containing

alpha
beta
gamma


I got the output:

beta
gamma
gamma
beta
alpha
alpha
gamma
gamma
beta
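If you want a rough check that each line is equally likely under -r, one option is to draw a large sample and tally the results with sort and uniq -c. This is only an illustrative sketch, assuming a GNU shuf recent enough to support -r; the sample size of 3000 and the scratch file are arbitrary choices.

```shell
# Build a three-line sample file in a scratch directory (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/foo.txt"

# Draw 3000 lines with replacement and count how often each line appears.
shuf -r -n 3000 "$tmpdir/foo.txt" | sort | uniq -c > "$tmpdir/tally.txt"
cat "$tmpdir/tally.txt"

# The per-line counts should add up to the number of draws requested.
total=$(awk '{ sum += $1 } END { print sum }' "$tmpdir/tally.txt")
echo "total draws: $total"
```

Each of the three lines should show up roughly a thousand times, though the exact counts will vary from run to run.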


I don’t know how shuf seeds its random generator. Maybe from the system time. But if you run it twice you will get different results. Probably.
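If you need the same sample every time, GNU shuf has a --random-source=FILE option that takes its random bytes from a file instead of its default source. The sketch below saves some bytes from /dev/urandom and reuses them for two runs; the file name seed.bin and the 4096-byte size are assumptions chosen for illustration, not anything shuf requires.

```shell
# Build a three-line sample file in a scratch directory (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/foo.txt"

# Save some random bytes to act as a reusable "seed" (hypothetical file name).
head -c 4096 /dev/urandom > "$tmpdir/seed.bin"

# Two runs that read the same bytes should produce the same ordering.
first=$(shuf --random-source="$tmpdir/seed.bin" "$tmpdir/foo.txt")
second=$(shuf --random-source="$tmpdir/seed.bin" "$tmpdir/foo.txt")

if [ "$first" = "$second" ]; then
  echo "reproducible"
else
  echo "different"
fi
```

Keeping the seed file around lets you reproduce the sample later. Deleting it and generating a new one gives a fresh ordering.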

Topics: open source, linux commands, command line, random sampling, tutorial

Published at DZone with permission of
