This can’t be happening. This was pretty much the only thought in my head when staring at the log files. The JVM generating those logs was getting SIGTERM signals out of nowhere and disappearing without a trace.
Let me repeat the last sentence – several times a week someone was deliberately killing an otherwise perfectly decent Java batch job.
The mystery started to unravel when I managed to get access to the rest of the data on this machine.
Apparently this Java process was not the only inhabitant of this machine. I discovered another JVM running on the very same box. The second JVM was running a small webapp deployed to the Tomcat app server. What immediately caught my eye was the correlation in the availability of those applications. It seemed that whenever the problematic batch job died, Tomcat was also facing some sort of outage. But Tomcat recovered shortly after, while the batch job remained dead.
An hour later I had gone through the Tomcat log files and found another interesting pattern. Right before each Tomcat restart, the all-too-familiar java.lang.OutOfMemoryError: Java heap space was staring me right in the face. So apparently Tomcat was dying due to lack of memory. But that still did not explain why the batch job was behaving the way it was.
And then I found myself staring at the -XX:OnOutOfMemoryError parameter in the JAVA_OPTS used to launch Tomcat.
I was not even aware that such an option existed. But apparently you can indeed register a shell script to be executed when your JVM has run out of memory. OutOfMemoryErrors must be quite a pain point if they deserve a special flag in the JVM.
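For readers who have not met the flag either, here is a minimal sketch of how such a hook is typically wired in. The script path is a placeholder of mine, not the value from the machine in this story. The flag itself is a HotSpot option that runs the given command when the JVM first throws an OutOfMemoryError.

```shell
# Sketch only: /path/to/on-oom.sh is a hypothetical placeholder,
# not the script that was configured on the box described here.
JAVA_OPTS="$JAVA_OPTS -XX:OnOutOfMemoryError=/path/to/on-oom.sh"
export JAVA_OPTS
```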
But jokes aside, the author of this solution did not take into account the possibility of several Java processes running on the same machine. And so he fired a SIGTERM at every Java process on the box. Mystery solved.
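If a hook like this has to exist at all, it can at least be scoped to the one process that ran out of memory. The sketch below is my own illustration, not the script from this story. The function name terminate_one and the hook path are made up, and it assumes HotSpot expands %p in the command string to the PID of the JVM that hit the error, as in -XX:OnOutOfMemoryError="/path/to/oom-hook.sh %p".

```shell
#!/bin/sh
# Hypothetical hook: terminate only the process whose PID is passed in,
# instead of signalling every Java process on the machine.

terminate_one() {
    pid="$1"
    # Refuse to act without a purely numeric PID rather than guess at targets.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: terminate_one <pid>" >&2
            return 2
            ;;
    esac
    # SIGTERM goes to this single PID; other JVMs are left alone.
    kill -TERM "$pid"
}

# When invoked as a script with an argument, act on that PID.
if [ "$#" -gt 0 ]; then
    terminate_one "$1"
fi
```

This only narrows the blast radius. The webapp still runs out of memory, which is the actual problem.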
Moral of the story? If there was a world cup for guys who are into hiding the symptoms instead of solving the underlying problem, this one would have made it to the semi-finals. Why on earth should you think it is a good idea to solve a shortage of memory and/or memory leak in such a peculiar way?
If you belong to the ranks of engineers who always try to get down to the root cause then subscribe to our Twitter feed for performance tuning advice.