Gracefully Shutting Down Java in Containers
Let's look at some lessons learned about gracefully shutting down Java programs in the worlds of Docker, Kubernetes, and enterprise integration.
Gracefulness is not only an admirable human quality: it is also a must-have for any application program, especially one bearing the burden of mission-critical domains.
UltraESB has had a good history of maintaining gracefulness throughout its runtime, including shutdown. The new UltraESB-X honored the tradition and implemented graceful shutdown in its 17.07 release.
When we composed the ips-worker Docker image for our Integration Platform (IPS) as a tailored version of UltraESB-X, we could guarantee that ESBs running in the platform would shut down gracefully. Or so we thought.
Unfortunately not.
As soon as we redeploy or change the replication count of a cluster, all ESB instances running under the cluster terminate (and new instances are spawned to take their place). The termination is supposed to be graceful: the ESBs first stop accepting any new incoming messages, then hold off the internal shutdown sequence for a few seconds until processing of in-flight messages completes, or a timeout ends the hold-off.
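That hold-off pattern can be sketched with a plain ExecutorService drain. This is a minimal illustration of the idea, not UltraESB's actual API; the class and method names are made up for the example:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of the hold-off pattern: stop accepting new
// messages, wait for in-flight ones, give up after a timeout.
public class GracefulDrain {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    public void accept(Runnable message) {
        workers.submit(message);
    }

    /** Returns true if all in-flight work finished before the timeout. */
    public boolean shutdown(long timeoutSeconds) throws InterruptedException {
        workers.shutdown();               // reject any new incoming messages
        boolean drained = workers.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        if (!drained) {
            workers.shutdownNow();        // timeout elapsed: interrupt stragglers
        }
        return drained;
    }

    public static void main(String[] args) {
        GracefulDrain esb = new GracefulDrain();
        for (int i = 0; i < 4; i++) {
            esb.accept(() -> {
                try { Thread.sleep(200); } catch (InterruptedException ignored) {}
            });
        }
        // Wire the drain to JVM termination, the same way a shutdown hook would.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                System.out.println(esb.shutdown(5) ? "drained" : "timed out");
            } catch (InterruptedException ignored) {}
        }));
    }
}
```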
On our Kubernetes-based mainstream IPS release, we retrieve logs of ESB instances (pods) via the K8s API as well as via a database appender, so that we can analyze them later. In analyzing the logs, we noticed that we were never seeing any ESB shutdown logs, no matter how big the log store had grown. It was as if the ESBs were being brutally killed as soon as the termination signal was received.
To investigate the issue, I started off with a quick-and-dirty Java program: one that registers a shutdown hook (the world-famous way of implementing graceful shutdown in Java, which we had utilized in both our ESBs) and keeps running forever, printing some text periodically to indicate that the main thread is active. As soon as the shutdown hook is triggered, it interrupts the main thread, changes the output to indicate that we are shutting down, and lets the handler finish after a few seconds, consistent with a "mock" graceful shutdown.
class Kill {
    private static Thread main;

    public static void main(String[] a) throws Exception {
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            public void run() {
                System.out.println("TERM");
                main.interrupt();
                for (int i = 0; i < 4; i++) {
                    System.out.println("busy");
                    try {
                        Thread.sleep(1000);
                    } catch (Exception e) {}
                }
                System.out.println("exit");
            }
        }));
        main = Thread.currentThread();
        while (true) {
            Thread.sleep(1000);
            System.out.println("run");
        }
    }
}
Testing it is pretty easy:
javac Kill.java
java Kill
While the program keeps on printing:
run
run
run
...
Press Ctrl+C to see what happens:
...
run
run
^CTERM
busy
Exception in thread "main" java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at Kill.main(Kill.java:22)
busy
busy
busy
exit
Looks good.
That done, converting this into a fully fledged Docker container took only a few minutes, with the following Dockerfile:
FROM openjdk:8-jre-alpine
ADD Kill*.class /
ENTRYPOINT ["java", "Kill"]
Plus the build command:
docker build -t kill:v1 .
Next, I ran a container with the new image:
docker run -it --rm kill:v1
Which gave the expected output:
run
run
run
...
Then I sent a TERM signal to the process using the kill command. (SIGTERM is the default signal sent by kill; like the SIGINT that Ctrl+C sends, it triggers Java's shutdown hooks.)
# pardon the fancy functions;
# they are quite useful for me when dealing with processes
function pid() {
    ps -ef | grep $1 | grep -v grep | awk '{print $2}'
}
function killsig() {
    for i in `pid $2`; do
        sudo kill $1 $i
    done
}
alias termit='killsig -15'

# with all the above in place, I just have to run:
termit Kill
As expected, the shutdown hook got invoked and executed smoothly.
Going a step further, I made the whole thing into a standalone K8s pod (backed by a single-replica Deployment):
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kill
spec:
  selector:
    matchLabels:
      k8s-app: kill
  template:
    metadata:
      labels:
        k8s-app: kill
    spec:
      containers:
      - name: kill
        image: kill:v1
...And tried out the same thing, this time by zeroing out spec.replicas (the same way we do it in IPS) via the kubectl edit deployment command, instead of a manual kill -TERM:
kubectl edit deployment kill
# vi is my default editor
# set "replicas" to 0 (line 20 in my case)
# <ESC>:wq<ENTER>
...While having a console tail of the pod in a separate window:
# fancy stuff again
function findapp() {
    kubectl get pod -l k8s-app=$1 -oname | cut -b 6-;
}
function klog() {
    kubectl logs -f `findapp $1`;
}

# the final command
klog kill
Showing the output:
run
run
...
run
TERM
busy
Exception in thread "main" java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at Kill.main(Kill.java:22)
busy
busy
busy
exit
Damn, it still shuts down gracefully!
So what's wrong with my ips-worker?
Just to verify, I got a single-replica cluster running on IPS, manually changed the image (spec.template.spec.containers[0].image) and startup command (spec.template.spec.containers[0].command) of the K8s deployment via kubectl edit (keeping all other factors, such as environment variables and volume mounts, unchanged), and tried out the same zero-out sequence.
Same result! Graceful shutdown!
Then it occurred to me that, while my kill container simply uses a java Kill command, ips-worker uses a somewhat more complicated command:
/bin/sh -c \
    <copy some files> && \
    <run some custom command> && \
    <run ultraesb-x.sh>
where, in the last part, we construct (with a specially fabricated classpath and some JVM parameters) and execute a pretty long java command that starts up the UltraESB-X beast.
So ultimately, the final live command in the container boils down to:
/bin/sh -c <basepath>/ultraesb-x.sh
Hence I tried a shell-form command on my kill container by slightly changing the Dockerfile:
FROM openjdk:8-jre-alpine
ADD Kill*.class /
# note the missing brackets and quotes, so that the command gets the default /bin/sh -c prefix
ENTRYPOINT java Kill
And yay! Graceful shutdown was no more. The Java process got killed brutally, on Docker (docker stop) as well as on K8s (replica zero-out).
Investigating further, I was guided by Google to this popular SE post, which basically said that the shell (sh) does not pass received signals to its child processes by default. The suggested alternative was to run the internal command via exec, which would basically replace the parent process (sh) with the child (java, in the case of kill):
FROM openjdk:8-jre-alpine
ADD Kill*.class /
ENTRYPOINT exec java Kill
For kill, that did the trick right away.
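The shell's signal-swallowing can even be reproduced outside Docker with a few lines of Java (Unix only; ProcessHandle needs Java 9+). Here sleep stands in for the wrapped java process, and the compound command "sleep 30; true" forces sh to fork a child rather than exec'ing it:

```java
import java.util.concurrent.TimeUnit;

// Demonstrates (on Unix) that sh does not forward SIGTERM to its children.
public class ShSignalDemo {
    public static void main(String[] args) throws Exception {
        // sh forks sleep as a child, mirroring ENTRYPOINT java Kill
        // being wrapped by /bin/sh -c
        Process sh = new ProcessBuilder("/bin/sh", "-c", "sleep 30; true").start();
        Thread.sleep(500); // give sh a moment to fork the sleep child

        ProcessHandle child = sh.toHandle().children().findFirst().orElseThrow();

        sh.destroy();                    // delivers SIGTERM to sh only
        sh.waitFor(5, TimeUnit.SECONDS);

        // sh dies, but the sleep it forked never saw the signal
        System.out.println("sh alive?    " + sh.isAlive());
        System.out.println("child alive? " + child.isAlive());
        child.destroy();                 // clean up the orphan
    }
}
```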
For ips-worker, things were a bit different, as there were two levels of invocation: the container's command invoking a chain of commands via /bin/sh -c, and the built-in ultraesb-x.sh invoking the ultimate java command. Hence I had to include exec at two places.
Once at the end of the command chain:
/bin/sh -c \
    <copy some files> && \
    <run some custom command> && \
    exec <basepath>/ultraesb-x.sh
And again at the end of ultraesb-x.sh
:
# do some magic to compose the classpath and other info for ESB startup
exec $JAVA_HOME/bin/java <classpath and other params>
Simple as it may seem, those two execs were enough to bring back graceful shutdown to ips-worker, and hence to our Integration Platform.
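A quick way to sanity-check such a fix from inside the container is to ask the JVM whether it actually became PID 1. This uses ProcessHandle, so it needs Java 9+ (unlike the openjdk:8 image above); PidCheck is a throwaway diagnostic, not part of any product:

```java
// Prints whether this JVM is PID 1 inside its container. With an exec'ed
// entrypoint it should be; behind a lingering /bin/sh wrapper it won't be,
// and SIGTERM from Docker/K8s will stop at the shell. (Java 9+ only.)
public class PidCheck {
    public static void main(String[] args) {
        long pid = ProcessHandle.current().pid();
        System.out.println("JVM pid = " + pid);
        System.out.println(pid == 1
                ? "java is PID 1: SIGTERM will reach the JVM directly"
                : "java is not PID 1: check what is wrapping it");
    }
}
```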
Published at DZone with permission of Janaka Bandara, DZone MVB. See the original article here.