Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Java and Branch Prediction

DZone's Guide to

Java and Branch Prediction

Get your microprocessing performance hat on because we're going to talk about branch prediction in Java and how it affects your code.

· Java Zone ·
Free Resource

How do you break a Monolith into Microservices at Scale? This ebook shows strategies and techniques for building scalable and resilient microservices.

branch prediction unit is a device that is part of microprocessors that have a pipeline architecture that predicts a conditional jump in an executable program. Branch prediction allows you to reduce the pipeline downtime by preloading and executing instructions that must be executed after the conditional branch instruction is executed. Forecasting branching plays a critical role (the accuracy of the prediction of transitions in modern processors exceeds 90%) because it allows you to optimally use the computing resources of the processor.  

Let's consider this situation: There are two arrays with the same length. The first array is sorted and the other array is unsorted. Processing of one of them is faster than the other. As you could guess from the topic, processing the sorted array is faster than the unsorted array. In fact, branch prediction for the unsorted array will fail.

To understand branch prediction, one must first understand instruction pipelining: A processor with an implementation of branch prediction that usually makes correct predictions can minimize the performance penalty from branching. However, if branches are predicted poorly, it may create more work for the processor, such as flushing the incorrect code path from the pipeline that has begun execution before resuming execution at the correct location.

In case the of the sorted array, the branch predictor always takes the true branch, since historically, all its predictions are correct. But with the random unsorted array, the prediction will need to discard the partially executed instructions and start over with the correct branch.

It can be shown using JMH:

package samples.branch_prediction;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(5)
@State(Scope.Benchmark)
public class BranchPredictionBenchmark {

    private static final int COUNT = 1024 * 1024;

    private byte[] sorted;
    private byte[] unsorted;

    @Setup
    public void setup() {
        sorted = new byte[COUNT];
        unsorted = new byte[COUNT];
        Random random = new Random(1234);
        random.nextBytes(sorted);
        random.nextBytes(unsorted);
        Arrays.sort(sorted);
    }

    @Benchmark
    @OperationsPerInvocation(COUNT)
    public void sorted(Blackhole bh1, Blackhole bh2) {
        for (byte v : sorted) {
            if (v > 0) {
                bh1.consume(v);
            } else {
                bh2.consume(v);
            }
        }
    }

    @Benchmark
    @OperationsPerInvocation(COUNT)
    public void unsorted(Blackhole bh1, Blackhole bh2) {
        for (byte v : unsorted) {
            if (v > 0) {
                bh1.consume(v);
            } else {
                bh2.consume(v);
            }
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(".*" + BranchPredictionBenchmark.class.getSimpleName() + ".*")
                .build();

        new Runner(opt).run();
    }

}


The output of benchmark is next:

# Run complete. Total time: 00:01:48

Benchmark                           Mode  Cnt  Score   Error  Units
BranchPredictionBenchmark.sorted    avgt   25  4.289 ± 0.754  ns/op
BranchPredictionBenchmark.unsorted  avgt   25  9.742 ± 0.466  ns/op


For this benchmark, we used AverageTime BenchmarkMode (see 14 line of the code snapshot). It means that the lower score value is better, and as you see, BranchPredictionBenchmark.sorted is more than two times faster. The full listing of source code is on my GitHub repository.

How do you break a Monolith into Microservices at Scale? This ebook shows strategies and techniques for building scalable and resilient microservices.

Topics:
java ,java performance ,branch prediction ,tutorial

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}