Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Golang Race Detection

DZone 's Guide to

Golang Race Detection

Time to catch up with those annoying race bugs using Golang and ThreadSanitizer.

· Open Source Zone ·
Free Resource

Race conditions are pretty nasty bugs. They're hard to trace, reproduce and isolate, often occurring under unusual circumstances and often lead to hard-to-debug issues. Thus it’s best to prevent and detect them early. Thankfully, there’s a production-ready algorithm to tackle the challenge — ThreadSanitizer v2, a battle-proven library for compiler support in Go, Rust, C, and C++.

First, let’s display a typical race condition. We have a simple global count variable:

var Cnt int


We count the number of events happening; whether we’re counting HTTP requests, the number of likes or tinder matches is irrelevant. What’s relevant it how we do it. We call function Run:

fun Run(amount int) {for i := 0; i < amount; i++ {Cnt++}}


But that’s not all. We’re calling it from multiple goroutines:

func main() {wg := &sync.WaitGroup{}for i := 0; i < 10; i++ {wg.Add(1)gofunc() {defer wg.Done()Run(1000)}()}wg.Wait();fmt.Println(Cnt)}

And we’d expect our result to be 10000. Yet it’s improbably this will happen on a multicore multithreaded system. You’re most likely get results like 5234 one run, 1243 second run, or even 1521 last run without any determinism or repeatability. This is a typical data race!

Let’s not stop on this small example. Instead, let’s evaluate some famous real-world race conditions with serious consequences.

Famous Examples

DirtyCow

This is a race condition in the Linux kernel. It involves a tricky memory pages interplay, mmap and madvise system call which allow for privilege escalation exploit. That is, you could mmap a root-owned file copy-on-write, which is a valid operation (every write to this mmap-ed region will not be written to an underlying file), yet under certain conditions, write would propagate to underlying file, despite we’re an unprivileged user.

Therac-25

Another famous example is the Therac-25 radiotherapy machine. It had a race condition between machine settings and display settings. If the user typed instructions too quickly and changed them quickly, the machine could end up in maximum radiation dosage while displaying false information to the operator. This led to multiple accidents and death cases.

Definitions

Before continuing let’s briefly iterate over required definitions:

Race condition — A race condition is an undesirable situation that occurs when a device or system attempts to perform two or more operations at the same time, but because of the nature of the device or system, the operations must be done in the proper sequence to be done correctly.

Due to general race condition scope, and requiring domain knowledge understanding what is correct behavior, for the rest of this post we’re focusing on data race, much more narrow and objective definition.

Data race — Concurrent access to a memory location, one of which is a write.

Concurrent access — is one where event ordering isn’t known. There’s no happens before relation.

A happens-before relation requires some further explaining. Each goroutine has its own logical clock. It is incremented on each operation. Within each goroutine there’s strict ordering, event happen sequentially (or at least we observe them as if they happened sequentially, various compiler optimization/Out-of-order execution might interfere). Between goroutines, there’s no order unless there’s synchronization between them.

In the following image, you can see it visually depicted. happens before relations are highlighted using arrows. I’d like to remind you that a happens before relation is transitive.


Common Mistakes

Let’s observe a few common mistakes in Go programs. The examples have a similar equivalent in other major languages; it’s just in Go since I quite like it.

Race on The Loop Counter

func main() {var wg sync.WaitGroupwg.Add(5)for i := 0; i < 5; i++ {gofunc() {fmt.Println(i) // Not the 'i' you are looking for.wg.Done()}()}wg.Wait()}


Try it out. You might get 0 4 4 4 4 as the output depending on the event ordering. Certainly not the output you’d expect. The fix is quite simple:

func main() {var wg sync.WaitGroupwg.Add(5)for i := 0; i < 5; i++ {gofunc(j int) {fmt.Println(j)wg.Done()}(i)}wg.Wait()}

Unprotected Global

This one is similar to the introductory example:

var Cnt intfunc Inc(amount int) {Cnt += amount}func main() {go Inc(10)go Int(100)}


Solutions are to sync access to global Cnt variable since increment operations aren’t atomic (unless from the sync/atomic package).

Thus we use a mutex:

var Cnt intvar m = &sync.Mutex{}func Inc(amount int) {m.Lock()defer m.Unlock()Cnt += amount}


or the atomic variable:

var Cnt int64func Inc(amount int64) {atomic.AddInt64(&Cnt, amount)}

Violating Go Memory Model

Go memory model specifies what guarantees you have from the compiler. If you breach that contract, you’re in for a bad time. The compiler is free to optimize away your code or do unpredictable things with it.

var a stringvar done boolfunc setup() {a = "hello, world"done = true}func main() {go setup()for !done {}fmt.Print(a)}


In this example, Go’s memory model doesn’t guarantee to write to done in one goroutine is visible to other goroutines since there was no synchronization between them. The compiler is also free to optimize away

for !done {}

into a simpler construct, not loading done variable each iteration. Furthermore, there are no guarantees memory buffers between CPU cores and L1 caches shall be flushed between concurrent reads for the done variable. This is all due to the nasty complexity of out-of-order execution and various memory guarantees across architectures. 

Synchronization Primitives

In Go we have following synchronization primitives:

  • channels, that is, sending and receiving is a synchronization point
  • sync package
  • Mutex
  • atomics

For further details, look those up, they are not a concept unique to Go nor race detection. The important point is they bring happens-before ordering into the program code.

Detecting Race Conditions

In Go, this is simply done with -race compiler flag. We can even run our tests with it and fail on any race condition:

go install -racego build -racego test -race


As said in the beginning it uses the ThreadSanitizer library under the hood. You should expect runtime slowdown of 2-20x and increased memory consumption 5-10x. Other requirements are CGO-enabled and 64-bit operating system. It detects race conditions reliably, without false positives. During its usage it uncovered 1,000+ bugs in Chrome, 100+ in Go’s standard library, and more in other projects.

What -race flag does is instrument your code with additional instructions. For example, this simple function:

func Inc(x *int) {*x++}


Get’s compiled into:

movq "".x+8(SP), AXincq (AX)


However, upon turning on -race you get a bunch of assembly code, interesting bits being:

...call runtime.raceread(SB)...call runtime.racewrite(SB)...


That is compiler adds read and write barriers for each concurrently reachable memory location. The compiler is smart enough not to instrument local variables since this incurs a quite performance penalty.

Algorithm

First the bad news:

Determining if an arbitrary program contains potential data races is NP-hard.

First, how and when do we collect our data? Our choices are either dynamic or static analysis. The main static analysis drawback is the time required for proper source code annotation. This dynamic approach is used since it requires no additional programmer intervention except turning it on. Data could be collected on the fly or dumped somewhere for post mortem analysis. ThreadSanitizer uses an on-the-fly approach due to performance considerations. Otherwise, we could pile up huge amounts of unnecessary data.

There are multiple approaches for dynamic race detection, pure happens before based, lockset based and hybrid models. ThreadSanitizer uses pure happens before. Now we’ll go over 3 algorithms, each pure happens before dynamic race detection, each improving upon the last. We’ll see how ThreadSanitizer evolved and understand how it works from its humble origins:

Vector Clocks

First, let’s explain the concept of vector clocks. Instead of each goroutine remembering only its logical clock, we remember the logical clock of the last time we hear from another goroutine.

vector-clocks

Vector clocks are partially ordered set. If two events have strictly greater or less than relations between them, there’s a happens before relation between them. Otherwise, they are concurrent.

DJIT+

DJIT+ is an application of vector clocks for pure happens before data race detector.

We remember vector clocks for:

  •   Each lock m release $L_m= (t_0, \ldots , t_n)$
  •      Last memory read on location x $R_x = (t_0,\ldots, t_n)$
  •    Last memory write on location x $W_x = (t_0, \ldots, t_n)$
  •    Goroutine vector clock, $C_x = (t_0, \ldots, t_n)$

Let’s see an example where there are no races:

Each row represents a single event as our race detector sees it. First, we write to location x from goroutine 0, and it’s remembered in W_x. Afterward, we release the lock m and goroutine 0 field is updated in L_m, that is our lock release vector clock. By acquiring the lock, we max by elements vectors clocks of L_m and C_1, because we know lock’s lock happens after lock’s release. We perform the write-in goroutine 1 and check whether our own vector clock is concurrent to W_x and R_x. It’s strictly ordered, thus everything is good.

And now the same example, but without goroutine synchronization. There are no lock’s lock and release. When comparing goroutine 1 vector clock, (0, 8) is concurrent to the last write.W_x = (4,0) Thus, we have detected the data race. This is the most important concept to understand, as the rest is optimizations.

FastTrack

The Fast track introduces the first optimization. For full details I recommend reading the original paper, as it’s quite well-written!

We observe the following property. If there are no data races on writes, each write is sequential. That is, it’s sufficient to remember the last write’s goroutine and logical clock for that goroutine. Thus, we create the shadow word, representing the last memory write, logical clock and goroutine id. Format <logical clock>@<goroutine id>

For reads, it’s a bit of a different story. It’s perfectly valid having multiple concurrent reads as long as reads and write are strictly ordered. Thus, we utilize the shadow word read representation as long as a single goroutine reads the data, and fallback expensive full vector clock in concurrent read scenario:

ThreadSanitizer v2

Thread sanitizer further improved on the FastTrack. Instead of having separate ones, we keep a fixed sized shadow word pool for each 8-byte quadword. That is, we approximate the full vector clock with partial shadow words. Upon full shadow word pool we randomly evict an entry.

By introducing this trade-off, we trace false negatives for speed and performance. Our fixed shadow word pool is memory mapped, thus allowing cheap access with additions and byte shifts compared to more expensive hashmaps or variable sized array access.

Let’s go over an example. Each shadow word is 8 bytes wide and consists of goroutine logical clock, goroutine id, write/read flag and position in the 8 byte word we’re reading/writing.

Everything’s good so far. Let’s make another operation.

Now let’s introduce a data race. The goroutine 3 vector clock for goroutine 1 is 5, that is smaller than 10@1 shadow word written in the pool. Thus we’re certain a data race happened. (We assume events for each goroutine arrive in order, otherwise, we couldn’t be sure whether a 10@1 entry for goroutine 3 is bigger or smaller than the current vector clock’s entry for goroutine 3. This is enforced by algorithm design and implementation.)

Summary

To summarize this article, we covered the basics of logical and vector clocks, how they are used in happens before data race detector and we build it up to the ThreadSanitizer v2 which is used in go’s -race.

We observed the tradeoffs in the algorithm design. It traded a higher false negative rate for speed, however, it forbids a false positive. This property builds trust, and with that trust, we know for certain that we have a race if it screams at us, not some weird edge case in the algorithm. No flaky race tests, only true positives are reported.

Though, keep in mind, this is only data race detector; it’s easy to circumvent it. For example, the following code shall pass the data race detector despite being horribly wrong in the concurrent setting:

var Cnt int64func Run(amount int) {for i := 0; i < amount; i++ {val := atomic.LoadInt64(&Cnt)val ++ // incrementing Cnt isn't atomic, we just load and store the value atomically.atomic.StoreInt64(&Cnt, val)}}

References


Topics:
golang ,race condition ,distributed systems ,concurrency ,bug detection ,open source

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}