DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Demystifying Java's Compare-and-Swap (CAS)
  • How To Build a Google Photos Clone - Part 1
  • Java and Low Latency
  • Zero-Downtime Deployments for Java Apps on Kubernetes

Trending

  • Building Production-Grade GenAI on GCP with Vertex AI Agent Builder
  • Product-Led Software Delivery: Intelligent Platforms for DevOps at Scale
  • Why Google Data Migration Gets Stuck at 99%: Causes and Proven Fixes
  • What Is Plagiarism? How to Avoid It and Cite Sources
  1. DZone
  2. Coding
  3. Java
  4. Lock-Free Programming: From Primitives to Working Structures

Lock-Free Programming: From Primitives to Working Structures

Locking is not the only way to deal with concurrency. Lock-free programming approaches are on the opposite side. Let's dive into them.

By 
Bartłomiej Żyliński user avatar
Bartłomiej Żyliński
DZone Core CORE ·
Jul. 22, 25 · Analysis
Likes (7)
Comment
Save
Tweet
Share
4.5K Views

Join the DZone community and get the full member experience.

Join For Free

Working with multiple threads is one of the most complex problems we may encounter in our daily work. When put against the wall of multithreading, most people right away reach out for blocking approaches. In Java, it takes the form of the synchronized keyword, or some other less painful mechanisms, like ReentrantLock. Locks are not the only option: Lock-free programming is also the way.

In this text, I will show problems, techniques, and best practices related to Lock-Free Programming. I will also provide a real-life example of how to implement a Lock-Free stack. Besides, I will share common patterns on moving from Lock-Free to Wait-Free.

Here you can find only the most interesting code samples. The full source code is available in my GitHub repository.

What Is Lock-Free Programming?

Guarantee: Lock-free programming algorithms guarantee system-wide progress. Within a finite number of steps, by all threads, at least one thread completes its operation.

Typical techniques: Atomic primitives CAS, LL/SC, FAA.

Pros:

  • Scales well under contention because there is no kernel blocking or context-switch overhead.
  • Eliminate deadlocks and livelocks

Cons:

  • Harder to design and reason about.
  • Higher cost under very low contention compared with a simple mutex.
  • Starvation is still possible — one unlucky thread might repeatedly fail while others succeed, but the program as a whole keeps moving forward.

What Is Wait-Free Programming?

Guarantee: Wait-free algorithm provides a per-thread progress bound. Every thread finishes its operation in a finite number of its own steps, regardless of what the other threads do.

Typical techniques: Per-thread operation descriptors and finite-step helper functions

Pros:

  • All from lock-free programming
  • Real-time friendly
  • Eliminate the starvation problem

Cons:

  • Significantly more memory and bookkeeping.
  • Often slower in the uncontended “happy path”
  • Harder (sometimes impossible) to create for complex data structures.

Blocking vs. Lock-Free vs. Wait-Free: The Progress Guarantees

Deadlock

Deadlock is probably the most common error in multithreading. Two (or more) threads each hold resources that the others need and wait forever for those resources to be released. Our threads will not be able to move forward with processing. Thus, we end with a standstill — a deadlock.

Shell
 
Thread A: lock(m1); lock(m2);
Thread B: lock(m2); lock(m1);


If A acquires m1 and B acquires m2, both block forever on the second lock().

Deadlock

Livelock

A livelock is motion without progress. Like two people in a hallway, sidestepping left and right in perfect sync and never getting past each other. Threads are active — spinning, retrying, sending messages, but the system never completes a useful operation. Usually caused by over-reactive collision‐avoidance or continual retries without an eventual terminal state. 

LivelockStarvation

Starvation means a particular thread never gets the resources or CPU time it needs. Even despite the fact that the system as a whole keeps making progress. If other threads hit the same data structure with high frequency. Then one of the threads may have to retry many times because its compare-and-swap keeps failing. This high contention can cause the Thread to starve. 

Starvation


Problem Lock-Free Programming Wait-Free Programming
Deadlock Impossible. No thread ever waits while holding a lock, so circular-wait conditions can’t arise. Impossible. Same lock-free property plus bounded completion time for every thread.
Livelock Prevented. At least one operation must complete in a finite number of steps, so the system can’t get stuck in endless mutual retries. Prevented. Every operation finishes in a bounded number of steps, so the whole system and each thread move forward.
Starvation Possible. System makes progress, but an unlucky thread may be perpetually overtaken by others. Impossible. Each thread completes within a fixed bound, so no one can be starved.


Atomic Primitives That Power Non-Blocking Algorithms

Compare-and-Swap (CAS)

Compare-and-Swap (CAS) is probably the single most famous atomic instruction. In one atomic step, the processor:

  • Reads a memory word,
  • Checks whether that word still holds an expected value
  • Only if the comparison succeeds, replace it with a new value

That simple “check-and-set” sequence lets a thread act as if it briefly owned the variable without ever locking it.

Most, if not all, non-blocking data structures rely on this instruction mix with a loop to work correctly. You will see one of them below, in the form of LockFreeStack. However, there is a catch or two hidden here. First, when contention is rising, the loop starts taking more and more time to complete. The other is known as an A-B-A problem.

In Java, we have all the Atomic classes that have compareAndSwap methods. Additionally, you can try to emulate CAS with the use of the volatile keyword.

Load-Link / Store-Conditional (LL/SC)

One of the ways to address the A-B-A problem from above is the LC/SC instruction pair. It is a pair of two steps, atomic with respect to other threads. Effectively creating a “split CAS.”

Step What happens
LL Load the word at addr and place the CPU into a reservation state for that cache line
SC If reservation valid, no intervening write by another core, stores new value, otherwise it does nothing and returns false


The main difference between CAS and LL/SC is the reservation mechanism. Thus, it dodges the ABA problem. Unfortunately, LL/SC is not a silver bullet and has drawbacks. The biggest one is that the Reservation can be lost for reasons other than contention. In such a case, we lose all the benefits of using this primitive.

In Java, the closest thing to LL/SC is AtomicStampedReference.

Fetch-and-Add (FAA)

Fetch-and-Add, or AtomicAdd, performs the following operations in a single atomic step:

  • Read the current value at addr
  • Adds a delta k
  • Stores the result back.
  • Returns the old value (some variants return the new one).

There are other variants of this primitive, like: Fetch-and-Subtract. FAA is used as a base in more complex techniques like Fetch-And-Store or Test-And-Set.

The main disadvantage of this primitive is what its name suggests. FAA only supports arithmetic transforms of the form old + k. To do anything more complex, you will still need CAS or LL/SC.

In Java, all Atomic classes have exposed getAndIncrement or incrementAndGet methods.

Double-Width CAS

This is an extension of classic CAS. Double Width CAS (or DCAS) extends the regular CAS idea to two adjacent machine words treated as a single 128-bit (or larger) unit. Both sub-words must match their expected values for the store to occur.

In pseudocode:

C++
 
cas2(expected_lo, expected_hi,
     new_lo, new


It allows you to store a pointer in the low word and a tag in the high word. Both words must match, so resurrecting an old pointer with the same address, but a different tag, fails. Thus solving the A-B-A problem.

DCAS enables part of lock-free data structures to swap pairs (pointer/counter or value/status) atomically. The main drawback is heavier micro-architectural cost than single-word CAS (locks 16-byte cache line).

Primitives Summary

Primitive General form Typical uses Common pitfalls
CAS old ⇒ new Universal lock-free building block ABA problem, retry cost under contention
DW-CAS (old_lo, old_hi) ⇒ (new_lo, new_hi) Pointer + tag pairs, ABA defense Heavier locks 16-byte cache line
LL/SC LL; …; SC split Portable lock-free ops when CAS not present Reservation loss → more retries
FAA new = old + k Counters, ticket locks, ref-counts Cache-line ping-pong, limited to arithmetic ops


Besides these four primitives, we have more complex ones like Atomic Exchange / Test-and-Set (TAS), Fetch-and-store (FAS).

Lock-Free Programming In Java: Lock-Free Stack

Base

Simplest Lock Free Stack, with single atomic top pointer. Push and pop CAS loops operate on that pointer. It is a singly linked Stack; thus, the Node class only contains a next reference.

Java
 
import java.util.concurrent.atomic.AtomicReference;

private record Node<E>(E value, Node<E> next) {

}

private final AtomicReference<Node<E>> top = new AtomicReference


Lock-Free Push

Here we have a lock-free push. You can see the CAS loop wrap around the top pointer.

Java
 
 /* ---------------- push ----------------------- */
public void push(E item) {
    Node<E> newNode;
    Node<E> oldTop;
    do {
        oldTop = top.get();
        newNode = new Node<>(item, oldTop);
    } while (!top.compareAndSet(oldTop, newNode));


What’s happening:

  1. Read the current head pointer.
  2. Build the replacement node – safe because nothing else can see newNode yet.
  3. CAS attempt cas(oldTop, newNode).
    • If no one has changed top in the meantime, the CAS succeeds, and the loop exits.
    • If another thread slipped in, the CAS fails, we loop to re-snapshot the new top.

Lock-Free Pop

Java
 
public E pop() {
    Node<E> oldTop;
    Node<E> newTop;
    do {
        oldTop = top.get();
        if (oldTop == null) return null;
        newTop = oldTop.next();
    } while (!top.compareAndSet(oldTop, newTop));
    return oldTop.value();
}


What’s happening:

  1. Read the current head pointer.
  2. Null-check nothing to pop → return immediately, no CAS needed.
  3. Assign a new top.
  4. CAS from oldTop ➜ newTop
    • If no one has changed top in the meantime, the CAS succeeds, we atomically detach oldTop.
    • If another thread slipped in, the CAS fails, we loop to re-snapshot the new top.
  5. Return — safe because oldTop is now private to this thread, GC will eventually reclaim it.

From Lock-Free Programming to Wait-Free Programming

I won't be presenting a complete Wait-Free implementation. It is long, complex, and explaining it correctly will probably double the size of this blog. However, there are a few good sources on building Wait-Free data structures.

  • Queues
    • Scalable Synchronous Queue
    • Wait-Free Queue
  • Stack
    • A Wait-Free Stack
    • Fetch-And-Add Wait-Free Stack
  • General

There are a couple of common points shared between all Wait-Free implementations:

  • Op Log — Each thread writes a small record describing what it wants to do.
  • Per-thread slot reservation → Thread grabs a unique index with FAA; writes its OpLogs there.
  • Phase (ticket) numbers → Every request gets a monotonically increasing “phase”.
  • Bounded Help loops → A helper will finish at most k foreign operations before returning to its own.

Summary

As you could see, both Lock-free and wait-free are not simple things. However, with a couple of good tools and techniques, they can be much easier.

Key Takeaways

Definitions

  • Lock-free programming → from all active threads, at least one makes progress in a finite number of steps.
  • Wait-free programming→ every operation by every thread completes in a bounded number of its own steps.

Atomic Primitives

  • CAS
  • LL/Sc
  • FAA
  • Double-width CAS

Best Practices in Implementing Lock-Free and Wait-Free Structures

  • Design first, code later → Draw the state diagram and identify every location that can change concurrently.
  • One atomic word to one logical invariant → Keep each CAS/LL-SC operating on the entire state you need to test.
  • Use DCAS → rather than splitting an invariant across two variables.
  • Write the fast path first → Attempt a cheap single-CAS path first; after N failures, publish a descriptor and switch to the helping (“slow”) path.
  • Bounded helping (Wait-Free) → Ensure a helper can finish at most k foreign operations before returning to its own.

Thank you for your time!

Central Authentication Service Requirements engineering Java (programming language) Lock (computer science)

Published at DZone with permission of Bartłomiej Żyliński. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Demystifying Java's Compare-and-Swap (CAS)
  • How To Build a Google Photos Clone - Part 1
  • Java and Low Latency
  • Zero-Downtime Deployments for Java Apps on Kubernetes

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook