Chaos Engineering: Deadlock
A ‘deadlock’ is a situation where a set of processes are blocked because each process is holding a resource and waiting for another resource acquired by another process.
In this series of chaos engineering articles, we have been learning to simulate various performance problems. In this post, let’s discuss how to simulate deadlock.
What Is a Deadlock?
Deadlocks tend to happen in multi-threaded applications. The technical definition of a ‘deadlock’ is a situation where a set of processes are blocked because each process is holding a resource and waiting for another resource acquired by some other process. Here is a practical example that may help you understand deadlocks.
Fig1: Trains starting on the same track.
Fig2: Trains experiencing Deadlock.
Let’s say there is only one train track, divided into six parts (part-1 through part-6). Train-A starts at part-1 and Train-B starts at part-6 at the same time, and both trains travel toward each other at the same speed. Under this circumstance, the trains reach a deadlock state when Train-A is in part-3 and Train-B is in part-4. Train-A is stuck waiting for part-4 of the track, which Train-B holds. On the other hand, Train-B is stuck waiting for part-3, which Train-A holds. Thus, neither train can move forward. This is a classic deadlock situation. Once a deadlock happens in an application, the blocked threads cannot recover on their own. Typically, the only way to recover from a deadlock is to restart the application.
Java Deadlock Program
Here is a sample program from the open-source BuggyApp application, which generates deadlock between two threads.
Notice that the sample program contains the ‘DeadLockDemo’ class. This class has a ‘start()’ method. In this method, two threads with the names ‘ThreadA’ and ‘ThreadB’ are launched. The ‘run()’ method in ‘ThreadA’ invokes ‘CoolObject#method1()’. Similarly, the ‘run()’ method in ‘ThreadB’ invokes ‘HotObject#method2()’.
If you notice, both ‘CoolObject#method1()’ and ‘HotObject#method2()’ are synchronized methods. When a method is synchronized, only one thread at a time, the one holding the lock of that object, can execute the method. If another thread tries to execute the method, it goes into the BLOCKED state until the first thread finishes executing the method. After entering their respective methods, both threads sleep for 10 seconds and then invoke the other method, i.e., ‘CoolObject#method1()’ invokes ‘HotObject#method2()’ and ‘HotObject#method2()’ invokes ‘CoolObject#method1()’.
So let's visualize what happens when the above program is executed:
- ThreadA acquires CoolObject’s lock.
- ThreadB acquires HotObject’s lock.
- ThreadA waits for HotObject’s lock.
- ThreadB waits for CoolObject’s lock.
Thus, both threads end up in a classic deadlock.
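The four steps above can be sketched in a few lines of Java. This is a minimal reconstruction based only on the description in this post, not the actual BuggyApp source. The method parameters, the `pause()` helper, the configurable sleep time, and the daemon flag (set so that the JVM can still exit) are all assumptions made for illustration.

```java
// Hypothetical reconstruction of the behavior described above; not the BuggyApp source.
class DeadLockDemo {

    // Assumed helper: sleeps without forcing callers to handle InterruptedException.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Launches ThreadA and ThreadB, which take the two locks in opposite order.
    static Thread[] start(long pauseMillis) {
        CoolObject cool = new CoolObject();
        HotObject hot = new HotObject();
        Thread threadA = new Thread(() -> cool.method1(hot, pauseMillis), "ThreadA");
        Thread threadB = new Thread(() -> hot.method2(cool, pauseMillis), "ThreadB");
        // Assumption: daemon threads, so the JVM can exit even though they never finish.
        threadA.setDaemon(true);
        threadB.setDaemon(true);
        threadA.start();
        threadB.start();
        return new Thread[] {threadA, threadB};
    }

    public static void main(String[] args) {
        Thread[] threads = start(10_000); // 10-second sleep, as in the description
        pause(12_000);                    // give both threads time to block
        for (Thread t : threads) {
            System.out.println(t.getName() + " is " + t.getState());
        }
    }
}

class CoolObject {
    // synchronized: the caller must hold this CoolObject's lock to enter.
    synchronized void method1(HotObject hot, long pauseMillis) {
        DeadLockDemo.pause(pauseMillis);
        hot.method2(this, pauseMillis); // now needs HotObject's lock as well
    }
}

class HotObject {
    // synchronized: the caller must hold this HotObject's lock to enter.
    synchronized void method2(CoolObject cool, long pauseMillis) {
        DeadLockDemo.pause(pauseMillis);
        cool.method1(this, pauseMillis); // now needs CoolObject's lock as well
    }
}
```

The sleep is what makes the deadlock practically certain: it gives each thread enough time to acquire its first lock before either one asks for the second. When you run `main`, both threads are expected to be reported in the BLOCKED state.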
How to Troubleshoot Deadlocks?
You can diagnose a deadlock through either a manual or an automated approach.
In the manual approach, you need to capture a thread dump from the application suffering from the deadlock. A thread dump is a snapshot of all the threads that are running in the application. It contains thread names, thread IDs, thread states, code execution paths (stack traces), and lock-level details. Once the thread dump is captured, you need to copy it from your production servers to your local machine. You can then use thread dump analysis tools like fastThread or samurai to analyze the thread dump on your local machine.
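Thread dumps are usually captured from outside the application, for example with the JDK's jstack tool. If you would rather check for deadlocks from inside the JVM, the standard `ThreadMXBean` API exposes similar thread and lock details. Here is a minimal sketch of that idea. The `DeadlockReporter` class and its helper methods are hypothetical names made up for this example, and the two demo threads exist only to give the reporter something to find.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper class; only the ThreadMXBean calls are standard JDK API.
class DeadlockReporter {

    // Returns one line per deadlocked thread, or an empty list if none are found.
    static List<String> describeDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads(); // null when no deadlock is detected
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : bean.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.add(info.getThreadName() + " (" + info.getThreadState() + ") waits for "
                    + info.getLockName() + " held by " + info.getLockOwnerName());
        }
        return report;
    }

    // Demo only: two daemon threads lock the same two monitors in opposite order.
    static void startDeadlockedPair() {
        Object first = new Object();
        Object second = new Object();
        CountDownLatch bothHoldOuter = new CountDownLatch(2);
        startLocker("demo-1", first, second, bothHoldOuter);
        startLocker("demo-2", second, first, bothHoldOuter);
    }

    private static void startLocker(String name, Object outer, Object inner,
                                    CountDownLatch bothHoldOuter) {
        Thread t = new Thread(() -> {
            synchronized (outer) {
                bothHoldOuter.countDown();
                try {
                    bothHoldOuter.await(); // wait until the other thread holds its monitor
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (inner) {
                    // Not reached: the other thread never releases this monitor.
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit despite the deadlock
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        startDeadlockedPair();
        Thread.sleep(1_000); // give the threads time to block on each other
        describeDeadlocks().forEach(System.out::println);
    }
}
```

A check like this could run periodically and log what it finds, but it does not replace a full thread dump, which also shows the stack trace of every other thread in the application.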
In the automated approach, you can use root cause analysis tools like yCrash, which automatically captures application-level data (thread dump, heap dump, Garbage Collection log) and system-level data (netstat, vmstat, iostat, top, top -H, dmesg, …). Besides capturing the data automatically, it marries these two datasets and generates an instant root cause analysis report. Below is the report generated by the yCrash tool when the above sample program is executed:
Fig: yCrash tool pointing out the root cause of the Deadlock.
You can clearly see yCrash reporting that ‘Thread-16’ and ‘Thread-15’ are suffering from a deadlock. yCrash also reports the stack traces of ‘Thread-16’ and ‘Thread-15’. From the stack traces, you can see that ‘Thread-16’ has acquired the lock of ‘HotObject’ but is waiting for the ‘CoolObject’ lock. On the other hand, ‘Thread-15’ has acquired the lock of ‘CoolObject’ but is waiting for the ‘HotObject’ lock. Based on these stack traces, we know the exact code that is causing the problem.