Parallel.ForEach we are getting reduced performance and our waits are timing out. In this part we’ll dive deep into the problem and find out what is going on.
Clearly the problem has to do with the Process class and/or with spawning processes in general. Looking more carefully, we notice that besides the spawned process we also have two asynchronous reads. We have intentionally requested these to be executed on separate threads. That shouldn’t be a problem, unless we have misused the async API.
It is reasonable to suspect that the approach used to do the async reading is at fault. This is where I ventured to look at the Process class code. After all, it’s a wrapper around the Win32 API, and it might make assumptions that I was ignoring or contradicting. Regrettably, that didn’t help me figure out what was going on, although it did prompt me to write the said previous post.
Looking at the BeginOutputReadLine() function, we see that it creates an AsyncStreamReader, which is internal to the .NET Framework, and then calls BeginReadLine(), which presumably is where the async action happens.
In AsyncStreamReader.BeginReadLine() we see the familiar asynchronous read on Stream.
Unfortunately, I had incorrectly assumed that this async wait was executed on one of the I/O Completion Port threads of the ThreadPool. It seems that this is not the case, as ThreadPool.GetAvailableThreads() always returned the same number for completionPortThreads (incidentally, the workerThreads value didn’t change much either, but I didn’t notice that at first).
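For reference, this is roughly the kind of probe I mean. It is a minimal sketch, not the code from my test project: the ThreadPoolProbe class and its method names are made up for illustration, and only ThreadPool.GetMaxThreads() and ThreadPool.GetAvailableThreads() are real framework calls. Subtracting the available count from the maximum gives the number of pool threads currently in use.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of the original code) that reports pool usage.
static class ThreadPoolProbe
{
    // Worker threads currently in use = maximum - available.
    public static int WorkersInUse()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return maxWorkers - freeWorkers;
    }

    // I/O completion port threads currently in use = maximum - available.
    public static int CompletionPortThreadsInUse()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return maxIo - freeIo;
    }

    static void Main()
    {
        // Call this before, during and after the async reads to compare the numbers.
        Console.WriteLine("workers in use: " + WorkersInUse());
        Console.WriteLine("completion port threads in use: " + CompletionPortThreadsInUse());
    }
}
```

Sampling these two values from inside the loop body is what showed me that the completion port count never moved.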
A breakthrough came when I started changing the maximum parallelism (i.e. maximum thread count) of the parallel loop. I thought I should increase the maximum number of threads to resolve the issue. Indeed, for certain values of MaxDegreeOfParallelism, I could never reproduce the problem (all processes finished very swiftly, with no timeouts). For everything else, the problem was reproducible most of the time: nine times out of ten I’d get timeouts. However, and to my surprise, the problem went away when I reduced MaxDegreeOfParallelism!
The magic number was 12. Yes, the number of cores at my disposal on my dev machine. If we limit the number of concurrent ForEach executions to fewer than 12, everything finishes swiftly; otherwise, we get timeouts and ExecAll() takes a long time to finish. In fact, with maxThreads=11, 500 process executions finish in under 8500ms, which is very commendable. However, with maxThreads=12, every batch of 12 processes waits until it times out, so finishing all 500 takes several minutes.
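To repeat the experiment on a machine with a different core count, the limit can be derived from Environment.ProcessorCount instead of being hard-coded. The following is a minimal sketch under that assumption; ParallelismExperiment, OneBelowCoreCount() and RunOne() are hypothetical names, and RunOne() is only a placeholder for the real loop body that spawns a process and waits for it.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical harness for the experiment; not the original ExecAll() code.
static class ParallelismExperiment
{
    // Placeholder for the real body (spawn a process, wait for exit and output).
    static void RunOne(int index)
    {
    }

    // Builds options that cap the loop at one below the reported core count.
    public static ParallelOptions OneBelowCoreCount()
    {
        // Never go below 1, in case the machine reports a single core.
        int limit = Math.Max(1, Environment.ProcessorCount - 1);
        return new ParallelOptions { MaxDegreeOfParallelism = limit };
    }

    static void Main()
    {
        ParallelOptions options = OneBelowCoreCount();
        Console.WriteLine("MaxDegreeOfParallelism = " + options.MaxDegreeOfParallelism);
        Parallel.ForEach(Enumerable.Range(0, 500), options, i => RunOne(i));
    }
}
```

This only reproduces the observation; it is not meant as the fix.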
With this information, I tried increasing the ThreadPool limit of threads using ThreadPool.SetMaxThreads(). But it turns out the defaults are 1023 worker threads and 1000 I/O Completion Port threads, as reported by ThreadPool.GetMaxThreads(), so the maximum was never the bottleneck. I had been assuming that if the available thread count was lower than required, the ThreadPool would simply create new threads until it reached the maximum configured value.
Putting It All Together
The assumption that Parallel.ForEach simply executes its body on the ThreadPool, and that said body can be treated as a black box, is clearly flawed. In our case the body initiates asynchronous I/O, which needs threads of its own. Apparently, these come not from the I/O thread pool but from the worker thread pool. In addition, the number of threads in this pool is initially set to the number of cores available on the target machine. Even worse, the pool will resist creating new threads until absolutely necessary. Unfortunately, in our case that is too late, as our waits are already timing out. What I held back until this point (both for dramatic effect and to leave the solution for you, the reader, to find) is that the timeouts were happening on the StandardError streams. That is, even though the child processes had exited long before, we were still waiting to read their output.
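The pool's lower and upper bounds can be inspected side by side. This is a minimal sketch, with PoolLimits and Snapshot() being names I made up for illustration; ThreadPool.GetMinThreads() and ThreadPool.GetMaxThreads() are the real calls. As far as I can tell, the minimum worker count is the number of threads the pool hands out on demand before it starts throttling thread creation, and by default it matches the core count, but treat that as my reading rather than a guarantee for every runtime version.

```csharp
using System;
using System.Threading;

// Hypothetical helper that collects the configured pool bounds.
static class PoolLimits
{
    // Returns { minWorkers, minIo, maxWorkers, maxIo }.
    public static int[] Snapshot()
    {
        int minWorkers, minIo, maxWorkers, maxIo;
        ThreadPool.GetMinThreads(out minWorkers, out minIo);
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        return new[] { minWorkers, minIo, maxWorkers, maxIo };
    }

    static void Main()
    {
        int[] limits = Snapshot();
        Console.WriteLine("cores:                  " + Environment.ProcessorCount);
        Console.WriteLine("min worker threads:     " + limits[0]);
        Console.WriteLine("min completion threads: " + limits[1]);
        Console.WriteLine("max worker threads:     " + limits[2]);
        Console.WriteLine("max completion threads: " + limits[3]);
    }
}
```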
Let me spell it out, in case it’s not obvious: each call to spawn and wait for a child process is executed on a ThreadPool worker thread, and uses it exclusively until the waits return. The async stream reads on StandardError need to run on some thread. Since they are apparently queued to run on a ThreadPool thread, they will starve if we use all of the available threads in the pool to wait for them to finish. The read waits therefore time out, because we have a deadlock.
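The same pattern can be shown without any processes at all. The sketch below is a simplified stand-in, not the original code: StarvationDemo and Run() are hypothetical names, and a queued work item plays the role of the async stream read. Each loop iteration blocks its thread while waiting for a work item that itself needs a pool thread. Whether and how often the waits time out depends on the runtime version, the core count and the timeout chosen, so I make no claim about the exact numbers it prints.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, process-free illustration of the starvation pattern.
static class StarvationDemo
{
    // Returns how many iterations timed out waiting for their queued work item.
    public static int Run(int iterations, int degree, TimeSpan timeout)
    {
        int timedOut = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = degree };

        Parallel.ForEach(Enumerable.Range(0, iterations), options, i =>
        {
            // Deliberately not disposed: the work item may still signal it
            // after this iteration has given up waiting.
            var done = new ManualResetEventSlim(false);

            // Stand-in for the async read: it needs a pool worker thread to run.
            ThreadPool.QueueUserWorkItem(_ => done.Set());

            // Block the current thread, like the wait on the stream read.
            if (!done.Wait(timeout))
            {
                Interlocked.Increment(ref timedOut);
            }
        });

        return timedOut;
    }

    static void Main()
    {
        TimeSpan timeout = TimeSpan.FromMilliseconds(200);
        int cores = Environment.ProcessorCount;
        Console.WriteLine("degree=1: timed out = " + Run(64, 1, timeout));
        Console.WriteLine("degree=" + cores + ": timed out = " + Run(64, cores, timeout));
    }
}
```

The important part is the shape of the loop body: it holds one thread hostage while depending on another thread from the same pool.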
This is a case of Leaky Abstraction: our black box of “execute on ThreadPool” failed miserably when the executed code itself depended on the ThreadPool. Specifically, once we had used all available threads in the pool, we left none for the code that depends on the ThreadPool to use. We shot ourselves in the proverbial foot. Our abstraction failed.
In the next part we’ll attempt to solve the problem.