Over a million developers have joined DZone.

Running Bash Commands in Parallel

A modern server is typically multi-core, perhaps even multi-CPU. That is plenty of computing power to unleash on a given job. However, unless you run a job in parallel, you are not maximizing the use of all that power.

· Performance Zone

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

A modern server is typically multi-core, perhaps even multi-CPU. That is plenty of computing power to unleash on a given job. However, unless you run a job in parallel, you are not maximizing the use of all that power.

Below are some typical everyday operations we can speed up using parallel computing:

  1. Backup files from multiple source directories to a removable disk.
  2. Resize image files in a directory.
  3. Compress files in a directory.

To execute a job in parallel, you can use any of the following commands:

  • ppss
  • pexec
  • GNU parallel

This post focuses on the GNU parallel command.

Installation of GNU Parallel

To install GNU parallel on a Debian/Ubuntu system, run the following command:

$ sudo apt-get install parallel

General Usage

The GNU parallel program provides many options which you can specify to customize its behavior. Interested readers can read its man page to learn more about their usage. In this post, I will narrow the execution of GNU parallel to the following scenario.

My objective is to run a shell command in parallel but on the same multi-core machine. The command can take multiple options, but only 1 is variable. Specifically, you run concurrent instances of the command by providing a different value for that one variable option. The different values are fed, one per line, to GNU parallel via the standard input.

The rest of this post shows how GNU parallel can backup multiple source directories by running rsync in parallel.

Parallel Backup

The following command backs up 2 directories in parallel: /home/peter and /data.

$ echo -e '/home/peter\n/data' | parallel -j-2 -k --eta rsync -R -av {} /media/myBKUP

Standard Input

The echo command assembles the 2 source directory locations, separated by a newline character (\n), and pipes it to GNU parallel.

How Many Jobs?

By default, GNU parallel deploys 1 job per core. You can override the default using the -j option.

-j specifies the maximum number of parallel jobs that GNU parallel can deploy. The maximum number can be specified in 1 of several ways:

  • -j followed by a number-j2 means that up to 2 jobs can run in parallel.
  • -j+ followed by a number-j+2 means that the maximum number of jobs is the number of cores plus 2.
  • -j- followed by a number-j-2 means that the maximum number of jobs is the number of cores minus 2.

If you don't know how many cores the machine has, run the command below:

$ parallel --number-of-cores
8

Keeping Output Order

Each job may output lines to the standard output. When multiple jobs are run in parallel, the default behavior is that a job's output is displayed as soon as the job finishes. You may find this confusing because the output order may be different from the input order. The -k option keeps the output sequence the same as the input sequence.

Showing Progress

The --eta option reports progress while GNU parallel executes, including the estimated remaining time (in seconds).

Input Place-Holder

GNU parallel substitutes the {} parameter with the next line in the standard input.

Each input line is a directory location, e.g., /home/peter. Instead of the full location, you can specify other parameters in order to extract a portion thereof - e.g., the directory name(/home) and the basename (peter). Please refer to the man page for details.

Summary

GNU parallel is a tool that Linux administrators should add to their repertoire. Running a job in parallel can only improve one's efficiency. If you are already familiar with xargs, you will find the syntax familiar. Even if you are new to the command, there is a wealth of on-line help on the GNU parallel website.

Learn tips and best practices for optimizing your capacity management strategy with the Market Guide for Capacity Management, brought to you in partnership with BMC.

Topics:
shell script ,parallel

Published at DZone with permission of Peter Leung, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}