
Detecting Visibility Bugs in Concurrent Java Applications

Learn what a visibility bug is, why it happens, and how to find elusive visibility bugs in concurrent Java applications.

The chance of detecting a visibility bug varies widely. The following visibility bug can, in the best case, be detected in 90 percent of all runs. In the worst case, the chance of detecting the bug is lower than one in a million.

But first, what are visibility bugs?

What Are Visibility Bugs?

A visibility bug happens when a thread reads a stale value. In the following example, a thread signals another thread to stop the processing of its while loop:

public class Termination {
    private int v;

    public void runTest() throws InterruptedException {
        Thread workerThread = new Thread(() -> {
            while (v == 0) {
                // spin
            }
        });
        workerThread.start();
        v = 1;
        workerThread.join(); // the test might hang here
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 1000; i++) {
            new Termination().runTest();
        }
    }
}

The bug is that the worker thread might never see the update of the variable v and may therefore run forever.
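Because the test above can hang forever, it is awkward to run repeatedly. The following is a minimal sketch of a bounded variant, assuming a timeout is an acceptable way to decide that a run hung. The class name BoundedTermination, the methods hangs and countHangs, and the timeout values are illustrative choices, not part of the original example.

```java
// Hypothetical bounded variant of the Termination example (names are illustrative).
class BoundedTermination {
    private int v; // deliberately NOT volatile: this is still the buggy variant

    // Returns true if the worker was still spinning after timeoutMillis.
    boolean hangs(long timeoutMillis) {
        Thread workerThread = new Thread(() -> {
            while (v == 0) {
                // spin
            }
        });
        workerThread.setDaemon(true); // a stuck worker must not keep the JVM alive
        workerThread.start();
        v = 1;
        try {
            workerThread.join(timeoutMillis); // wait at most timeoutMillis
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return workerThread.isAlive();
    }

    // Runs the test several times and counts the runs that did not terminate in time.
    static int countHangs(int runs, long timeoutMillis) {
        int hung = 0;
        for (int i = 0; i < runs; i++) {
            if (new BoundedTermination().hangs(timeoutMillis)) {
                hung++;
            }
        }
        return hung;
    }

    public static void main(String[] args) {
        int runs = 20;
        System.out.println(countHangs(runs, 200) + " of " + runs + " runs hung");
    }
}
```

The count depends on the machine, the JVM, and its flags, so it can easily be zero. A hung worker keeps spinning in the background until the JVM exits, which is why the sketch marks it as a daemon thread.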

One reason for reading stale values is the cache of the CPU cores. Each core of a modern CPU has its own cache. So if the reading and the writing thread run on different cores, the reading thread may see a cached value and not the value written by the writing thread. The following shows the cores and caches inside an Intel Pentium 4 CPU, from this superuser answer:

[Figure: cores and caches inside an Intel Pentium 4 CPU]

Each core of an Intel Pentium 4 CPU has its own level 1 and level 2 cache. All cores share a large level 3 cache. The reason for those caches is performance. The following numbers show the time needed to access the memory, from Computer Architecture, A Quantitative Approach, JL Hennessy, DA Patterson, 5th edition, page 72:

  • CPU Register ~ 300 picoseconds
  • Level 1 Cache ~ 1 nanosecond
  • Main Memory ~ 50 - 100 nanoseconds

Reading and writing to a normal field does not invalidate the cache, so if two threads on different cores read and write to the same variable, they may see stale values. Let us see if we can reproduce this bug.

How to Reproduce a Visibility Bug

If you have run the above example, chances are high that the test does not hang up. The test needs so few CPU cycles that both threads typically run on the same core, and when both threads run on the same core they read and write to the same cache. Luckily, the OpenJDK provides a tool, jcstress, which helps with this type of test. jcstress uses multiple tricks so that the threads of the tests run on different cores. Here the above example is rewritten as a jcstress test:

import org.openjdk.jcstress.annotations.*;

@JCStressTest(Mode.Termination)
@Outcome(id = "TERMINATED", expect = Expect.ACCEPTABLE, desc = "Gracefully finished.")
@Outcome(id = "STALE", expect = Expect.ACCEPTABLE_INTERESTING, desc = "Test hung up.")
@State
public class APISample_03_Termination {
    int v;

    // Runs in its own thread and spins until it observes the write from signal().
    @Actor
    public void actor1() {
        while (v == 0) {
            // spin
        }
    }

    // Runs in a second thread after the actor has started.
    @Signal
    public void signal() {
        v = 1;
    }
}

This test is from the jcstress examples. By annotating the class with the annotation @JCStressTest, we tell jcstress that this class is a jcstress test. jcstress runs the methods annotated with @Actor and @Signal in separate threads. jcstress first starts the actor thread and then runs the signal thread. If the test exits in a reasonable time, jcstress records the "TERMINATED" result; otherwise, it records the result "STALE".

jcstress runs the test case multiple times with different JVM parameters. Here are the results of this test on my development machine, a 4-core Intel i5 CPU, using the test mode stress:

[Figure: jcstress results for this test, per JVM parameter]

For the JVM parameter -XX:-TieredCompilation, the thread hangs in 90 percent of all cases, but for the JVM flags -XX:TieredStopAtLevel=1 and -Xint, the thread terminated in all runs.

After confirming that our example indeed contains a bug, how can we fix it?

How to Avoid Visibility Bugs

Java has specialized instructions which guarantee that a thread always sees the latest written value. One such instruction is the volatile field modifier. When reading a volatile field, a thread is guaranteed to see the last written value. The guarantee applies not only to the value of the field but also to all values written by the writing thread before the write to the volatile variable. Adding the field modifier volatile to the field v from the above example makes sure that the while loop always terminates, even if run in a test with jcstress.

public class Termination {
   volatile int v;
   // methods omitted
}
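To make both guarantees concrete, here is a minimal runnable sketch of the volatile variant. It assumes an extra plain field, payload, that is written before the volatile write. The class name VolatileTermination, the field payload, and the value 42 are illustrative and not part of the original example.

```java
// Hypothetical complete version of the volatile fix (names are illustrative).
class VolatileTermination {
    private int payload;    // plain field, written before the volatile write
    private volatile int v; // volatile flag read by the worker

    // Starts a worker that spins until v != 0 and returns the payload it observed.
    int runTest() {
        int[] seen = new int[1];
        Thread workerThread = new Thread(() -> {
            while (v == 0) {
                // spin; every iteration performs a volatile read
            }
            seen[0] = payload; // the volatile read of v makes the earlier write visible
        });
        workerThread.start();
        payload = 42; // ordinary write ...
        v = 1;        // ... published by the subsequent volatile write
        try {
            workerThread.join(); // the worker's write to seen[0] is visible after join()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            int observed = new VolatileTermination().runTest();
            if (observed != 42) {
                throw new AssertionError("unexpected payload: " + observed);
            }
        }
        System.out.println("all runs terminated");
    }
}
```

Only v needs to be volatile here. The write to payload comes before the volatile write in program order, and the worker reads payload only after it has read v as 1, so the worker is guaranteed to see 42.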

The volatile field modifier is not the only instruction which gives such visibility guarantees. For example, the synchronized statement and classes in the package java.util.concurrent give the same guarantees. A good read to learn about techniques to avoid visibility bugs is the book "Java Concurrency in Practice" by Brian Goetz et al.
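As a sketch of these alternatives, the following class repeats the termination test twice: once with a plain field that is only accessed inside synchronized methods, and once with an AtomicInteger from java.util.concurrent.atomic. The class and method names are illustrative assumptions and not taken from the original example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical alternatives to volatile (names are illustrative).
class OtherVisibilityGuards {
    private int guarded; // plain field, only accessed while holding the lock on this
    private final AtomicInteger atomic = new AtomicInteger();

    private synchronized int getGuarded() {
        return guarded;
    }

    private synchronized void setGuarded(int value) {
        guarded = value;
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Reader and writer synchronize on the same monitor.
    boolean runSynchronized() {
        Thread worker = new Thread(() -> {
            while (getGuarded() == 0) {
                // spin
            }
        });
        worker.start();
        setGuarded(1);
        joinQuietly(worker);
        return !worker.isAlive();
    }

    // AtomicInteger.get() and set() have the memory effects of a volatile read and write.
    boolean runAtomic() {
        Thread worker = new Thread(() -> {
            while (atomic.get() == 0) {
                // spin
            }
        });
        worker.start();
        atomic.set(1);
        joinQuietly(worker);
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        OtherVisibilityGuards guards = new OtherVisibilityGuards();
        System.out.println("synchronized terminated: " + guards.runSynchronized());
        System.out.println("atomic terminated: " + guards.runAtomic());
    }
}
```

In the synchronized variant, both the read and the write must use the same lock. Synchronizing only the writer would leave the read unordered with respect to the write.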

After seeing why visibility bugs happen and how to reproduce and avoid them, let us look at how to find them.

How to Find Visibility Bugs

Chapter 17 of the Java Language Specification, "Threads and Locks", formally defines the visibility guarantees of the Java instructions. The specification uses a so-called "happens-before" relationship to define these guarantees:

"Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second."

Reading from and writing to a volatile field creates such a happens-before relation:

"A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field."

Using this specification, we can check if a program contains visibility bugs, called "data races" in the specification.

"When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race. Two accesses to (reads of or writes to) the same variable are said to be conflicting if at least one of the accesses is a write."

Looking at our example, we see that there is no happens-before relation between the read and the write to the shared variable v, so this example contains a data race according to the specification.
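For contrast, here is a minimal sketch of a program that shares plain fields between two threads without a data race. It relies on two further happens-before rules from the same chapter of the specification: a call to start() on a thread happens-before any action in the started thread, and all actions in a thread happen-before another thread successfully returns from join() on that thread. The class name HappensBeforeEdges and its fields are illustrative.

```java
// Hypothetical example of race-free sharing through start() and join() (names are illustrative).
class HappensBeforeEdges {
    private int input;  // plain field, written before start()
    private int result; // plain field, read after join()

    int compute() {
        input = 20; // (1) ordered before start() by program order
        Thread worker = new Thread(() -> {
            result = input + 1; // (2) sees input == 20 because start() happens-before (2)
        });
        worker.start();
        try {
            worker.join(); // (2) happens-before the return from join()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result; // (3) ordered after (2), so this read is not part of a data race
    }

    public static void main(String[] args) {
        System.out.println(new HappensBeforeEdges().compute()); // prints 21
    }
}
```

Every conflicting access here is ordered by happens-before, so neither field needs to be volatile. In the Termination example, the write v = 1 comes after start() and the read in the loop comes before join(), so neither rule orders them.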

Of course, this reasoning can be automated. The following two tools use these rules to automatically detect visibility bugs:

  • ThreadSanitizer uses the rules of the C++ memory model to find visibility bugs in C++ applications. The C++ memory model consists of formal rules that specify the visibility guarantees of C++ instructions, similar to what the Java Language Specification does for the Java instructions. There is a draft for a Java enhancement proposal, JEP draft: Java Thread Sanitizer, to include ThreadSanitizer into the OpenJDK JVM. According to the draft, the use of ThreadSanitizer should be enabled through a command line flag.
  • vmlens, a tool I have written to test concurrent Java, uses the Java Language Specification to automatically check if a Java test run contains visibility bugs.