
Zombie Processes: A Short Survival Guide


Zombie processes can cause resource leakage or be a sign of a bug. Learn how to find them and debug them in this quick guide.


In this article, we will talk about zombie processes, which are specific to Unix and Unix-like systems. Usually they are not dangerous, but in some cases zombie processes can cause resource leakage, and they can be a sign of a bug in a program or in the operating system. Let’s see what they are and what you can do about them.

What Is a Zombie Process?

A zombie process is a process that has completed but still has an entry in the process table. The process table in an operating system records process information such as ID, parent, status, etc. A child process is one that was created by another process (its parent). Each process might create many children, but each child has only one parent. If a process doesn’t have a parent, that usually means that it was created by the kernel.

When a child process terminates, the kernel keeps some information about it in the process table (including its exit status). The child’s entry is removed from the table only after the parent has read that exit status, which it does by calling wait() or one of its variants. Every child process is therefore a zombie from the moment it terminates until its parent collects its status; normally that period is very short.
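To make the idea of "collecting the exit status" concrete, here is a minimal sketch in shell. It assumes a POSIX-like shell; the child command and the exit code 7 are arbitrary choices for illustration. The shell's `wait` builtin plays the role of the parent reading the child's status, which is also the moment the child's process table entry can be released.

```shell
# Start a child in the background that exits with a known status.
sh -c 'exit 7' &
child=$!

# wait blocks until the child has terminated and returns its exit status.
# The "|| status=$?" form keeps the script going even under "set -e".
status=0
wait "$child" || status=$?

echo "child $child exited with status $status"
```

Interactive shells reap their children on their own, which is why zombies usually come from long-running programs that fork children and never wait for them.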

Are Zombies Bad?

When a process terminates, all resources associated with it are deallocated so that they can be reused by other processes. A zombie process does not use more memory than is required for keeping its entry in the process table, which is negligible. The problem occurs when you have too many zombies.

There is only one process table per system, and it has a limited number of unique process identifiers (PIDs). Every zombie keeps its PID occupied, so if the table fills up, the system won’t be able to create new processes.
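As a rough way to see how close a system is to that limit, the sketch below compares the PID ceiling with the number of processes and zombies currently listed. It assumes Linux, where the ceiling is exposed in `/proc/sys/kernel/pid_max`, and a `ps` that accepts `-e -o`; other limits (such as per-user process limits) may be reached first, so treat the numbers as an indication only.

```shell
# Upper bound for PIDs on Linux (assumption: /proc is mounted).
pid_max=$(cat /proc/sys/kernel/pid_max)

# Number of processes currently in the table.
in_use=$(ps -e -o pid= | awk 'END {print NR}')

# Number of those whose state starts with Z (zombie).
zombies=$(ps -e -o stat= | awk '/^Z/ {n++} END {print n + 0}')

echo "pid_max=$pid_max processes=$in_use zombies=$zombies"
```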

Where Is the Bug?

It is a parent’s duty to reap its dead children. If a parent process leaves too many zombies behind, that is a bug in the parent, and the best solution is to fix it.

If a parent dies before its child, init (PID 1) adopts the orphan and reaps it automatically. Killing the parent can therefore work around a bug in the parent. However, if killing the parent doesn’t remove the zombies, this implies that the bug is in the system.
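The behavior described here can be reproduced with a small experiment. This is a sketch under stated assumptions: a Linux system with a procps-style `ps`, and a PID 1 that really reaps adopted children (in some containers it does not). The subshell trick and the sleep durations are illustrative choices, not part of the original material.

```shell
# The subshell starts a short-lived child, then replaces itself (exec) with
# a sleep that never calls wait(), so the finished child stays a zombie.
(sleep 1 & exec sleep 30) >/dev/null 2>&1 &
parent=$!
sleep 2                       # by now the child has exited but is not reaped

# Find the child of $parent and print its PID and state (Z means zombie).
child_info=$(ps -e -o pid=,ppid=,stat= |
    awk -v p="$parent" '$2 == p {print $1, $3}')
echo "before killing the parent: $child_info"
zombie_pid=${child_info%% *}

# Terminate the parent; the orphaned zombie is handed to init for reaping.
kill "$parent"
wait "$parent" 2>/dev/null || true
sleep 1

if ps -p "$zombie_pid" >/dev/null 2>&1; then
    echo "zombie $zombie_pid is still listed (is PID 1 reaping children?)"
else
    echo "zombie $zombie_pid is gone"
fi
```

If the zombie is still listed after the parent is gone, that points at whatever runs as PID 1 rather than at the original parent.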

How to Find and Kill a Zombie Process

Although fixing the parent process is the most effective way to stop zombies from multiplying, it is sometimes useful or necessary to find existing zombies and get rid of them. There are several ways to find zombie processes, for example:

top -b -n 1 | grep zombie

or

ps aux | grep -w Z

or

ps -alx | awk '$10 ~ /STAT|Z/'

Now let’s find the zombie processes on our system:

$ ps aux | grep -w Z
git 2512 0.6 0.0 0 0 ? Z 08:31 1:08 [grunt] 
git 3574 0.0 0.0 0 0 ? Z кві13 0:48 [grunt] 
root 12523 0.0 0.0 112652 1036 pts/5 S+ 11:22 0:00 grep --color=auto -w Z
git 13855 0.3 0.0 0 0 ? Z 07:07 0:47 [grunt] 
git 14896 0.0 0.0 0 0 ? Z кві13 0:48 [grunt] 
git 16213 0.0 0.0 0 0 ? Z кві03 0:44 [grunt] 
git 24146 0.5 0.0 0 0 ? Z 07:49 1:11 [grunt] 
git 26321 0.0 0.0 0 0 ? Z кві13 0:47 [grunt] 
git 29765 0.5 0.0 0 0 ? Z 08:10 1:09 [grunt] 
git 32440 0.0 0.0 0 0 ? Z кві13 0:47 [grunt]
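Note that the grep process itself shows up in the listing above, because its own command line contains a standalone Z. A possible alternative, sketched here with a hypothetical helper name `list_zombies`, is to ask ps for specific columns and filter on the state column only. It assumes a `ps` that supports `-e -o pid,ppid,stat,comm`, as procps does.

```shell
# Print a header plus every process whose STAT column starts with Z.
# Filtering on one column avoids matching unrelated text such as the
# command line of the filter itself.
list_zombies() {
    ps -e -o pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'
}

list_zombies
```

The second column of the result is the parent PID, which is the process that should be reaping each zombie.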

Killing zombie processes is not that easy: a zombie is already dead, so signals sent to it have no effect. The proper way is to find its parent and kill or restart it. Rebooting the server would also help, but that is not our approach.

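Let’s find the parent processes. In the ps ajx output, the first column is the parent PID (PPID), the second is the zombie’s own PID, and the third is the process group ID (PGID), which is the PID of the process group leader: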

$ ps ajx | grep -w Z
 2475 2512 2427 2427 ? -1 Z 1004 1:08 [grunt] 
 3557 3574 3509 3509 ? -1 Z 1004 0:48 [grunt] 
13839 13855 13791 13791 ? -1 Z 1004 0:47 [grunt] 
14869 14896 14820 14820 ? -1 Z 1004 0:48 [grunt] 
16192 16213 16144 16144 ? -1 Z 1004 0:44 [grunt] 
24120 24146 24071 24071 ? -1 Z 1004 1:11 [grunt] 
26290 26321 26236 26236 ? -1 Z 1004 0:47 [grunt] 
29725 29765 29645 29645 ? -1 Z 1004 1:09 [grunt] 
32423 32440 32375 32375 ? -1 Z 1004 0:47 [grunt]
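When there are many zombies, it can help to group them by parent before deciding what to restart. The following is a sketch only: `zombies_per_parent` is a hypothetical helper name, and it assumes a `ps` that supports `-e -o ppid=,stat=` and `-o args= -p`, as procps does.

```shell
# Print "<count> <ppid>" for every parent that has zombie children,
# with the worst offender first.
zombies_per_parent() {
    ps -e -o ppid=,stat= |
        awk '$2 ~ /^Z/ {count[$1]++} END {for (p in count) print count[p], p}' |
        sort -rn
}

# Show the command line of each offending parent.
zombies_per_parent | while read -r count ppid; do
    printf '%s zombie(s) under parent %s: ' "$count" "$ppid"
    ps -o args= -p "$ppid" || echo "(parent not found)"
done
```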

It might be necessary to look up more details about the processes involved, for example the process group leader 32375 from the last row:

$ ps auxww | grep 32375
git 32375 0.0 0.0 211180 2892 ? Ss кві13 0:00 git-receive-pack

After that, we can simply kill this process:

kill -9 32375

And now everything is cleaned up:

$ ps ajx | grep -w Z
$

The output is empty: no zombies!

Summary

Finding zombie processes and killing their parents seems easy, but it is not always the best solution. The kill -9 command (that is, kill -SIGKILL) terminates a program immediately, without giving it a chance to react. That may be fine for simple processes, but most programs need to clean up temporary files and shut down properly before they exit. After kill -9, there is a risk of unexpected problems that are difficult to debug. That is why we highly recommend trying a plain kill (SIGTERM) first and using kill -9 only if you see no other way to solve the problem.
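One way to follow that advice is to ask the process to stop politely and fall back to SIGKILL only after a grace period. The helper below is a minimal sketch, not a definitive implementation; the name `stop_gracefully` and the default timeout of 10 seconds are assumptions made for illustration.

```shell
# Usage: stop_gracefully PID [TIMEOUT_SECONDS]
stop_gracefully() {
    pid=$1
    timeout=${2:-10}

    # Ask the process to terminate; if the signal cannot be sent,
    # the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second. kill -0 only checks that the PID still exists,
    # so a not yet reaped zombie also counts as existing.
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done

    # Last resort: the process ignored SIGTERM for the whole grace period.
    kill -KILL "$pid" 2>/dev/null || return 0
}
```

For the parent found earlier, this would be called as `stop_gracefully 32375`, which gives it time to clean up before anything forceful happens.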

Special thanks to Evgeny Lebed for his materials, which were used in the section “How to Find and Kill a Zombie Process.”


Opinions expressed by DZone contributors are their own.
