
Java/Scala: Runtime.exec hanging/in ‘pipe_w’ state


On the system that I’m currently working on we have a data ingestion process which needs to take zip files, unzip them and then import their contents into the database.

As a result we delegate from Scala code to the system unzip command like so:

def extract(): Unit = {
  // The paths are placeholders for the real archive and target directory
  val command = "unzip %s -d %s".format("/file/to/unzip.zip", "/place/to/unzip/to")
  var process: Process = null

  try {
    process = Runtime.getRuntime.exec(command)
    // Blocks until unzip exits; nothing reads unzip's output here
    val exitCode = process.waitFor()
  } catch {
    case e: Exception => // do some stuff
  } finally {
    // close the stream here
  }
}

 

We ran into a problem where the unzipping process was hanging, and executing ‘ps’ showed us that the ‘unzip’ process was stuck in the ‘pipe_w’ (pipe waiting) state, which suggested that it was blocked on a pipe, waiting for something at the other end.

After a bit of googling, Duncan found a blog post which explained that we needed to read the output of the process we had started, otherwise it might end up hanging.

a.k.a. RTFM:

The Runtime.exec methods may not work well for special processes on certain native platforms, such as native windowing processes, daemon processes, Win16/DOS processes on Microsoft Windows, or shell scripts.

The created subprocess does not have its own terminal or console. All its standard io (i.e. stdin, stdout, stderr) operations will be redirected to the parent process through three streams (Process.getOutputStream(), Process.getInputStream(), Process.getErrorStream()).

The parent process uses these streams to feed input to and get output from the subprocess.

Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.

For most of the zip files we presumably hadn’t been reaching the limit of the buffer because the list of files being sent to STDOUT by ‘unzip’ wasn’t that long.

In order to get around the problem we needed to gobble up the output stream from unzip like so:

import org.apache.commons.io.IOUtils

def extract(): Unit = {
  val command = "unzip %s -d %s".format("/file/to/unzip.zip", "/place/to/unzip/to")
  var process: Process = null

  try {
    process = Runtime.getRuntime.exec(command)
    // process.getInputStream is the STDOUT of unzip; reading it to the end
    // stops unzip from blocking on a full pipe
    val thisVariableIsNeededToSuckDataFromUnzipDoNotRemove = "Output: " + IOUtils.readLines(process.getInputStream)
    val exitCode = process.waitFor()
  } catch {
    case e: Exception => // do some stuff
  } finally {
    // close the stream here
  }
}

We need to do the same thing with the error stream in case ‘unzip’ ends up overflowing that buffer as well.
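One way to cover the error stream without reading two streams is to merge STDERR into STDOUT with ProcessBuilder.redirectErrorStream(true), so that a single read loop drains both. The following is only a sketch of that idea, not what we actually ran: the object name MergedOutputExample and its run helper are made up for illustration.

```scala
import java.io.{BufferedReader, InputStreamReader}

object MergedOutputExample {
  // Hypothetical helper: runs a command with stderr merged into stdout
  // and returns the exit code together with every line the child printed.
  def run(command: Seq[String]): (Int, List[String]) = {
    val builder = new ProcessBuilder(command: _*)
    builder.redirectErrorStream(true) // stderr now arrives on getInputStream
    val process = builder.start()
    val reader = new BufferedReader(new InputStreamReader(process.getInputStream))
    try {
      // Read until end of stream so the child can never block on a full pipe
      val lines = Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
      (process.waitFor(), lines)
    } finally {
      reader.close()
    }
  }
}
```

With this in place, something like MergedOutputExample.run(Seq("unzip", "/file/to/unzip.zip", "-d", "/place/to/unzip/to")) would drain everything ‘unzip’ prints before waiting for it to exit. The trade-off is that normal output and error output can no longer be told apart.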

A couple of blog posts that we came across suggested that we should ‘gobble up’ the output and error streams on separate threads, but we weren’t sure exactly why that was considered necessary…

If anyone knows then please let me know in the comments.
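One likely reason for the separate threads: if the parent reads STDOUT to the end before touching STDERR, and the child meanwhile writes enough to STDERR to fill that buffer, the child blocks on STDERR and never closes STDOUT, so both sides wait on each other forever. Draining each stream on its own thread avoids that ordering problem. Below is a minimal sketch of the idea; StreamGobblerExample, Gobbler and run are hypothetical names chosen for illustration.

```scala
import java.io.InputStream
import scala.io.Source

object StreamGobblerExample {
  // Hypothetical helper: drains one stream on its own thread and keeps the lines.
  private class Gobbler(in: InputStream) extends Thread {
    @volatile var lines: List[String] = Nil
    override def run(): Unit = {
      val source = Source.fromInputStream(in)
      try lines = source.getLines().toList
      finally source.close()
    }
  }

  // Returns (exit code, stdout lines, stderr lines) for the given command.
  def run(command: Seq[String]): (Int, List[String], List[String]) = {
    val process = new ProcessBuilder(command: _*).start()
    val out = new Gobbler(process.getInputStream)
    val err = new Gobbler(process.getErrorStream)
    // Start both readers before waiting, so neither pipe can fill up
    out.start()
    err.start()
    val exitCode = process.waitFor()
    // Wait for the readers to reach end of stream before using their results
    out.join()
    err.join()
    (exitCode, out.lines, err.lines)
  }
}
```

Because both readers are running before waitFor is called, it no longer matters which stream the child fills first.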

 

From http://www.markhneedham.com/blog/2011/11/20/javascala-runtime-exec-hangingin-pipe_w-state
