Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Debugging Oddjob: Java Parallel Runtime Execs Running Serially Under Java 7

DZone's Guide to

Debugging Oddjob: Java Parallel Runtime Execs Running Serially Under Java 7

Java’s native process support is notoriously flaky; updating to Java 7u80 fixes anomalous log messages with Oddjob.

· Java Zone
Free Resource

Build vs Buy a Data Quality Solution: Which is Best for You? Gain insights on a hybrid approach. Download white paper now!

Several Oddjob users have reported that, when running several execs in parallel on Windows, they all appeared to wait for each other to complete. The issue was easy to reproduce using this Oddjob configuration:

<oddjob>
  <job>
    <parallel>
      <jobs>
        <exec redirectStderr="true"><![CDATA[TestJob.cmd 2]]></exec>
        <exec redirectStderr="true"><![CDATA[TestJob.cmd 10]]></exec>
      </jobs>
    </parallel>
  </job>
</oddjob>


Where TestJob.cmd is:

ping -n %1 127.0.0.1
echo Finished pinging for %1.
exit 0


The problem can be seen here:

Parallel Exec Jobs

From the console of the first Exec Job it has clearly finished but its icon is still showing as Executing.

Java’s native process support is notoriously flaky, especially on Windows, and was the prime suspect. However, first I had to eliminate Oddjob from the enquiry. Here is some simple Java code that reproduces the problem:

public class ExecMain {

static class Exec implements Runnable {
private final String waitSeconds;

Exec(String waitSeconds) {
this.waitSeconds = waitSeconds;
}

@Override
public void run() {
long startTime = System.currentTimeMillis();

final ByteArrayOutputStream captureOutput = new ByteArrayOutputStream();

ProcessBuilder processBuilder = 
new ProcessBuilder("TestJob.cmd", waitSeconds);
processBuilder.redirectErrorStream(true);

try {
final Process process = processBuilder.start();
Thread t = new Thread(new Runnable() {
@Override
public void run() {
copy(process.getInputStream(), captureOutput);
}
});
t.start();
process.waitFor();
System.out.println("Process for TestJob.cmd " + waitSeconds + 
" finished in " + secondsFrom(startTime) + " seconds.");
t.join();
System.out.println("Output thread for TestJob.cmd " + waitSeconds + 
" joined after " + secondsFrom(startTime) + " seconds.");
}
catch (InterruptedException | IOException e) {
throw new RuntimeException(e);
}
}

void copy(InputStream from, OutputStream to) {
byte[] buf = new byte[0x0400];
try {
while (true) {
int r = from.read(buf);
if (r == -1) {
break;
}
to.write(buf, 0, r);
}
}
catch (IOException e) {
throw new RuntimeException(e);
}
}

int secondsFrom(long startMillis) {
return Math.round((System.currentTimeMillis() - startMillis) / 1000);
}
}

public static void main(String... args) {

new Thread(new Exec("2")).start();
new Thread(new Exec("10")).start();
}
}


And here’s the output:

Process for TestJob.cmd 2 finished in 1 seconds.
Output thread for TestJob.cmd 2 joined after 9 seconds.
Process for TestJob.cmd 10 finished in 9 seconds.
Output thread for TestJob.cmd 10 joined after 9 seconds.

We can see that the process ends as expected after a second, but joining on the stream copying thread doesn’t happen until the sibling process has finished. This can only be if the first processes output stream isn’t being closed. Is it waiting for its siblings process output stream to close too?

Hours of Googling prove fruitless. Then by happenstance, I run my sample against Java 8. It works as expected. Off to the Java bug database – nothing obvious. Oddjob is currently supported on Java 7 and above so I downloaded the latest Java 7u80 release just to see, and it works to. Here is the correct output:

Process for TestJob.cmd 2 finished in 1 seconds.
Output thread for TestJob.cmd 2 joined after 1 seconds.
Process for TestJob.cmd 10 finished in 9 seconds.
Output thread for TestJob.cmd 10 joined after 9 seconds

And now in Oddjob we can see the Exec Job completes when the process does:

Parallel Exec Fixed
So this is a story with a happy ending but a niggling loose end. What was the Java bug that caused this? If you have an idea, please post a comment for others to see!

Build vs Buy a Data Quality Solution: Which is Best for You? Maintaining high quality data is essential for operational efficiency, meaningful analytics and good long-term customer relationships. But, when dealing with multiple sources of data, data quality becomes complex, so you need to know when you should build a custom data quality tools effort over canned solutions. Download our whitepaper for more insights into a hybrid approach.

Topics:
oddjob ,java 7

Published at DZone with permission of Rob Gordon, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}