Unix Parallel: Populating All the USB Sticks
We take a look at how to leverage GNU's "parallel" command to improve the throughput of some menial work: in this case, creating a number of USB sticks at once. Read on to find out more!
The day before Graph Connect Europe 2016, we needed to create a bunch of USB sticks containing Neo4j and the training materials. We eventually iterated our way to a half-decent approach that made use of the GNU parallel command, which I've always wanted to use!
But first, I needed a USB hub so I could do lots of them at the same time. I bought the EasyAcc USB 3.0 hub, but there are lots of others that do the same job.
Next, I mounted all the USB sticks and then renamed the volumes to NEO4J1 through NEO4J7:
for i in 1 2 3 4 5 6 7; do diskutil renameVolume "USB DISK" NEO4J${i}; done
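Before copying anything, it may be worth confirming that all seven renamed sticks are actually mounted: if one was missed, rsync could end up creating /Volumes/NEO4J<n> as an ordinary directory on the local disk instead of failing. A minimal bash sketch; check_volumes is a hypothetical helper that isn't part of the original workflow, and its optional second argument (defaulting to /Volumes) is only there so it can be tried against a scratch directory:

```shell
# Hypothetical helper (not from the original post): check that the
# renamed sticks NEO4J1..NEO4J<count> are all mounted under ROOT.
# Usage: check_volumes COUNT [ROOT]   (ROOT defaults to /Volumes)
check_volumes() {
  local count=$1
  local root=${2:-/Volumes}
  local missing=0
  local i
  for ((i = 1; i <= count; i++)); do
    if [[ ! -d "${root}/NEO4J${i}" ]]; then
      echo "missing: ${root}/NEO4J${i}" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero if any expected volume is absent
  [[ $missing -eq 0 ]]
}
```

For example, `check_volumes 7 && seq 1 7 | parallel duplicate` would only start the copy once every stick is present.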
I then created a bash function called ‘duplicate’ to do the copying work:
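function duplicate() {
  local i=${1}
  echo "${i}"
  time rsync -avP --size-only --delete --exclude '.*' --omit-dir-times \
    /Users/markneedham/Downloads/graph-connect-europe-2016/ "/Volumes/NEO4J${i}/"
}
# GNU parallel runs each job in a new shell, so the function must be exported
export -f duplicate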
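GNU parallel starts each job in a fresh shell, so a bash function defined in your interactive session is only visible to those jobs once it has been exported with bash's `export -f`; otherwise the jobs fail with a "command not found" error. The sketch below shows the mechanism without parallel at all: `greet` is a hypothetical stand-in for duplicate, and `bash -c` plays the role of the child shell that parallel would spawn.

```shell
# Hypothetical stand-in for duplicate(): just reports which stick it got.
greet() {
  echo "copying to NEO4J${1}"
}

# Without this export, the child bash below would report
# "greet: command not found".
export -f greet

# Simulate one job: a brand-new bash process calling the exported function.
bash -c 'greet 3'   # prints: copying to NEO4J3
```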
We can now call this function in parallel like so:
seq 1 7 | parallel duplicate
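By default GNU parallel runs as many jobs at once as it detects CPUs, so on a machine with fewer than seven, some sticks would wait for others to finish. A hedged variant of the call above pins the job count to the number of sticks with `--jobs` and adds `--tag`, which prefixes every output line with its input (here the stick number) so the interleaved rsync output stays attributable. `copy_all` is a hypothetical wrapper, and it still assumes duplicate has been exported with `export -f duplicate`:

```shell
# Hypothetical wrapper (not from the original post) around the call above.
#   --jobs N : run up to N jobs at once instead of parallel's default of
#              one job per detected CPU, so every stick is written concurrently
#   --tag    : prefix each output line with its argument (the stick number)
# Assumes duplicate() is defined and exported with `export -f duplicate`.
copy_all() {
  local count=${1:-7}
  seq 1 "$count" | parallel --jobs "$count" --tag duplicate
}
```

`copy_all 7` would then copy to NEO4J1 through NEO4J7 with all seven transfers running at once.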
And that's it. We didn't get a 7x improvement in throughput from doing 7 in parallel, but it took ~9 minutes to complete all 7 compared to 5 minutes for each one on its own. Presumably there's still some part of the copying further down that is sequential: Amdahl's law #ftw.
I want to go and find other things that I can pipe into parallel now!
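The timings above can be plugged into Amdahl's law as a back-of-the-envelope check. This assumes that doing the seven sticks one after another would have taken 7 × 5 = 35 minutes, and it treats the whole job as a single fixed split between a parallelizable fraction p and a serial remainder, with n = 7 concurrent copies, which is a simplification:

```latex
\begin{aligned}
T_{\text{seq}} &= 7 \times 5\ \text{min} = 35\ \text{min}, \qquad T_{\text{par}} \approx 9\ \text{min} \\
S &= \frac{T_{\text{seq}}}{T_{\text{par}}} \approx \frac{35}{9} \approx 3.9 \\
S(n) &= \frac{1}{(1-p) + p/n} \;\Rightarrow\; p = \frac{1 - 1/S}{1 - 1/n} \\
p &\approx \frac{1 - 9/35}{1 - 1/7} = \frac{26/35}{6/7} = \frac{13}{15} \approx 0.87
\end{aligned}
```

Read this way, about 13% of the work behaves as if it were serial. The model doesn't say where that part lives; contention for the hub's shared bandwidth could look much the same from the outside.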
Published at DZone with permission of Mark Needham, DZone MVB.