
Project Valhalla: Fast and Furious Java


Learn more about Project Valhalla Inline types and improved performance.


Performance Improvements of New Inline Types

Project Valhalla is one of the most interesting projects concerning the future of Java and of the JVM as a whole. It is still experimental, but in this post we will look at how to use it and write a small program to verify possible performance improvements.

Valhalla's aim is to bring new Inline Types (aka Value Types) that, as the project motto says, "Codes like a class, works like an int."


Currently, the JVM has eight primitive types, each associated with a "signature letter" of the alphabet:

B: byte, signed byte
C: char, Unicode character encoded with UTF-16
D: double, double-precision floating-point value
F: float, single-precision floating-point value
I: int, integer
J: long, long integer
S: short, signed short
Z: boolean, true or false

On top of these, we have objects:

L: ClassName, reference to an instance of class ClassName

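You may have noticed these signature letters if you ever printed the result of toString() on an array: an int[] prints as [I@ followed by a hash code, and a String[] as [Ljava.lang.String;@ followed by a hash code.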
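A quick way to see these letters on any standard JDK, no Valhalla build required, is to ask an array class for its name: Class.getName() returns the array's descriptor, with dots instead of the slashes used in bytecode. The class and helper names below (SignatureLetters, elementSignature) are only for illustration.

```java
class SignatureLetters {
    // For an array class, getName() returns "[" followed by the element's
    // signature: one letter for primitives, "L<name>;" for reference types
    // (using '.' separators instead of the '/' used in class files).
    static String elementSignature(Class<?> arrayClass) {
        if (!arrayClass.isArray()) {
            throw new IllegalArgumentException(arrayClass + " is not an array class");
        }
        return arrayClass.getName().substring(1);
    }

    public static void main(String[] args) {
        System.out.println(elementSignature(int[].class));     // I
        System.out.println(elementSignature(long[].class));    // J
        System.out.println(elementSignature(boolean[].class)); // Z
        System.out.println(elementSignature(String[].class));  // Ljava.lang.String;
        // Object.toString() on an array prints getClass().getName() + "@" + hex hash
        System.out.println(new int[3]);
    }
}
```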

Project Valhalla introduces a new letter for a new kind of type:

Q: ClassName, inline type

The long-term goal is for these new types to eventually replace the primitive types in our applications, removing the current boxed types (Integer and such) and bringing a new seamless world based only on reference and inline types.

This is still in the future. For the moment, let's enjoy what is already available in the current early-access build.

Getting Started

To simplify the set-up, the Project Valhalla team released an Early-Access Build on August 30, 2019.

You can download it from http://jdk.java.net/valhalla/ and configure it in JAVA_HOME like a normal JDK (version 14).

Unfortunately, at the moment (December 2019), IntelliJ and Gradle cannot compile sources that use the new language features, so we need to use the JDK command-line tools or Ant.

I have shared a repository on GitHub with several examples and the Ant build scripts needed to run them.

A Valhalla Point

Let's start with something very simple. Let's define a Point type with two fields for the coordinates.

```java
inline public class Point {
    public int x;
    public int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
```



The only change from a normal class is the inline keyword on the first line. Point is an Inline Type; let's look at how it differs from a normal class.

Code Like a Class

  • It can be created using new
  • As you can see, you can declare fields, constructors, and methods like a normal class
  • It can implement interfaces and override their methods.
  • It can be used with generics and can have type parameters of its own.

Work Like an Int

  • The type Point is immutable. The fields x and y are automatically treated as final. We can decide whether they are private or public, but we cannot change them.
  • Point automatically gets equals, toString, and hashCode methods. We can test equality with the == operator, just like with int and double.
  • Point has no null value. If you want to represent a null Point, you need to use the new "inline widening," which is a kind of boxing on steroids.
  • Point instances are not allocated separately from their references; they are represented directly in memory, like int now.
  • Point has a default value that corresponds to Point(0,0). It can be created with Point.default.
  • Point cannot inherit from another type and nothing can inherit from it.
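For contrast, here is a sketch of the closest thing a standard JDK offers today (Java 16 or later; this is not part of Valhalla): a record gets final fields and generated equals, hashCode, and toString, but it still has identity, so == compares references and an array of records still holds pointers. PointRecord and RecordVsInline are hypothetical names.

```java
// A standard Java record (Java 16+), NOT a Valhalla inline type.
record PointRecord(int x, int y) { }

class RecordVsInline {
    public static void main(String[] args) {
        PointRecord a = new PointRecord(1, 2);
        PointRecord b = new PointRecord(1, 2);
        System.out.println(a.equals(b));                  // true: generated equals compares fields
        System.out.println(a.hashCode() == b.hashCode()); // true: generated hashCode uses the fields
        System.out.println(a == b);                       // false: records still have identity
        System.out.println(a);                            // PointRecord[x=1, y=2]
        // There is no way to write a.x = 3: the fields are private and final,
        // exposed only through accessors such as a.x().
    }
}
```

An inline Point differs on the last two points: == compares the values, and a Point[] stores the coordinates themselves instead of references.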

Work in Progress

Some features are still unfinished: they may not work in the current build, or they may behave differently in future releases. In particular:

  • It is not possible to use synchronization operations on Inline Types, so no synchronized methods or blocks, locks, or wait/notify.
  • Reflection: it is not yet possible to tell them apart from reference objects. A special interface is planned for this.
  • Thread safety: it is currently not possible to get volatile behavior for them or to use them in atomic operations.

For a more comprehensive look at Project Valhalla features, see the articles linked at the end of this post. Below, we will concentrate on the performance aspect of these features.

Compact Arrays

One of the most exciting features of inline classes is how arrays of them are laid out in memory.

With primitives, each element of the array holds the value directly; for example, an array of long stores the 64 bits of each long.

With reference types, instead, the array only contains references to objects allocated separately on the heap.

This means that reading an object from an array involves first reading the reference and then fetching the object's memory from its actual location. If the object has reference fields, those also cause data to be fetched from other, possibly distant, memory locations.

Inline types, on the other hand, work like primitives: the array elements hold the values themselves.

To get a taste of how much this matters, let's write a simple test program that sums the amounts of all trades for a given account in a big array.

Here is a simple representation of a trade, with an amount, an account, and a traded security:

```java
import java.util.Objects;

public class TradeRef {
    public final double amount;
    public final String account;
    public final String security;

    public TradeRef(double amount, String account, String security) {
        this.amount = amount;
        this.account = account;
        this.security = security;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        TradeRef that = (TradeRef) o;  // cast to this class, not TradeRefEncoded
        // compare the Strings by value, not by reference
        return Double.compare(that.amount, amount) == 0 &&
                Objects.equals(account, that.account) &&
                Objects.equals(security, that.security);
    }

    @Override
    public int hashCode() {
        return Objects.hash(amount, account, security);
    }
}
```



Similarly, we define a TradeInline class with the same fields plus the inline modifier. We don't need to write equals and hashCode, because they are generated for inline types:

```java
inline public class TradeInline {
    final double amount;
    final String account;
    final String security;

    public TradeInline(double amount, String account, String security) {
        this.amount = amount;
        this.account = account;
        this.security = security;
    }
}
```



After compiling, we can use javap from the command line to print the members of the generated class together with their type descriptors:

```shell
javap -s antbuild/com/ubertob/ministring/TradeInline.class
```



And this is the output:

```
public final value class com.ubertob.ministring.TradeInline {
  final double amount;
    descriptor: D
  final java.lang.String account;
    descriptor: Ljava/lang/String;
  final java.lang.String security;
    descriptor: Ljava/lang/String;
  public final int hashCode();
    descriptor: ()I

  public final boolean equals(java.lang.Object);
    descriptor: (Ljava/lang/Object;)Z

  public final java.lang.String toString();
    descriptor: ()Ljava/lang/String;

  public static com.ubertob.ministring.TradeInline com.ubertob.ministring.TradeInline(double, java.lang.String, java.lang.String);
    descriptor: (DLjava/lang/String;Ljava/lang/String;)Qcom/ubertob/ministring/TradeInline;
}
```



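Note the value modifier on the first line, and the hashCode, equals, and toString methods, which were generated for us even though they are not in our source. The constructor has also become a static factory method whose descriptor returns a Q type.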

Encoding a String Into a Long

Our TradeInline class still has two String fields. Can we inline them too?

Unfortunately, at the moment it is not possible to flatten Strings and arrays inside an inline type. There are future plans for something called Array 2.0 that would allow for that.

What we can do now is squeeze a String into a long, although we need to accept some limitations.

The long type has 64 bits, so with a base-64 encoding (6 bits per character) we can store up to 10 characters in it.
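The 10-character limit follows from treating each character as one base-64 digit $x_i$, its index in the alphabet. The encoding loop in the MiniString code builds this positional value one digit at a time (Horner's method):

$$
\text{encoded} = \sum_{i=0}^{n-1} x_i \cdot 64^{\,n-1-i}, \qquad 0 \le x_i < 64 = 2^6
$$

With at most 10 characters, the value stays below the sign bit of a Java long:

$$
n \le 10 \;\Rightarrow\; \text{encoded} \le 64^{10} - 1 = 2^{60} - 1 < 2^{63} - 1 = \texttt{Long.MAX\_VALUE}
$$

An 11th character would push the range up to $64^{11} = 2^{66}$, which no longer fits in 64 bits.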

In our case, we assume that the Security and the Account have a maximum length of 10 characters and they are composed only of uppercase letters, numbers, and a limited number of special characters.

This is the code of our MiniString inline type, with the encoding and decoding static functions:

```java
inline public class MiniString {

    long raw;

    public MiniString(String str) {
        raw = encode(str);
    }

    public String get() {
        return decode(raw);
    }

    public static final int MAX_MINISTR_LEN = 10;
    public static final int MINI_STR_BASE = 64;
    // 64 symbols, one per 6-bit code. Note that ':' appears twice (codes 48 and 58),
    // so encode() never produces code 58.
    public static final String letters = "=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 _-!?.$&%@#:[]{}()*<>:;',/^";

    public static long encode(String str) {
        String prepared = prepareString(str);
        long encoded = 0;
        for (char c : prepared.toCharArray()) {
            int x = letters.indexOf(c);
            encoded = encoded * MINI_STR_BASE + x;  // append one base-64 digit
        }
        return encoded;
    }

    // Keeps only supported characters, at most MAX_MINISTR_LEN of them.
    private static String prepareString(String str) {
        StringBuilder prepared = new StringBuilder();
        for (char c : str.toCharArray()) {
            if (letters.indexOf(c) >= 0)
                prepared.append(c);
            // '>=' rather than '>': '>' would let an 11th character (66 bits) through
            if (prepared.length() >= MAX_MINISTR_LEN)
                break;
        }
        return prepared.toString();
    }

    public static String decode(long number) {
        StringBuilder decoded = new StringBuilder();
        long remaining = number;

        while (true) {
            int mod = (int) (remaining % MINI_STR_BASE);  // lowest base-64 digit
            char c = letters.charAt(mod);
            decoded.insert(0, c);
            if (remaining < MINI_STR_BASE)
                break;
            remaining = remaining / MINI_STR_BASE;
        }
        return decoded.toString();
    }
}
```
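Since the encoding is plain base-64 positional notation, the same logic can also be written with bit shifts and masks, which makes the "6 bits per character" layout explicit. The sketch below is a plain class (no inline modifier) so it compiles on a standard JDK; the name MiniStringShifts is hypothetical, it stops at 10 characters, and it assumes '+' in place of the second ':' of the original alphabet so that every symbol gets its own code.

```java
class MiniStringShifts {
    // Same 64-symbol alphabet as MiniString, except '+' replaces the second ':'
    // (an assumption of this sketch) so that every symbol has a unique 6-bit code.
    static final String LETTERS =
            "=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 _-!?.$&%@#:[]{}()*<>+;',/^";
    static final int BITS_PER_CHAR = 6;                  // 2^6 = 64 symbols
    static final int MAX_LEN = 10;                       // 10 * 6 = 60 bits, below the sign bit
    static final long MASK = (1L << BITS_PER_CHAR) - 1;  // 0b111111

    static long encode(String str) {
        long encoded = 0;
        int len = 0;
        for (char c : str.toCharArray()) {
            int code = LETTERS.indexOf(c);
            if (code < 0) continue;                       // skip unsupported characters
            encoded = (encoded << BITS_PER_CHAR) | code;  // same as encoded * 64 + code
            if (++len == MAX_LEN) break;                  // never use more than 60 bits
        }
        return encoded;
    }

    static String decode(long encoded) {
        StringBuilder decoded = new StringBuilder();
        long remaining = encoded;
        do {
            // lowest 6 bits hold the last character; same as remaining % 64
            decoded.insert(0, LETTERS.charAt((int) (remaining & MASK)));
            remaining >>>= BITS_PER_CHAR;                 // same as remaining / 64
        } while (remaining != 0);
        return decoded.toString();
    }

    public static void main(String[] args) {
        long raw = encode("AAPL.O");
        System.out.println(raw + " -> " + decode(raw));
    }
}
```

For non-negative values, shifting left by 6 and OR-ing the code equals multiplying by 64 and adding it, and masking with 63 and shifting right by 6 equal % 64 and / 64, so this version produces the same numbers as the multiply/divide one. Like the original, a leading '=' (code 0) is lost on decoding.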



Next, we define a TradeMiniString inline type that uses our new MiniString for the account and the security.

This is the javap output for the generated bytecode:

```
public final value class com.ubertob.ministring.TradeMiniString {
  final double amount;
    descriptor: D
  final com.ubertob.ministring.MiniString account;
    descriptor: Qcom/ubertob/ministring/MiniString;
  final com.ubertob.ministring.MiniString security;
    descriptor: Qcom/ubertob/ministring/MiniString;
  public final int hashCode();
    descriptor: ()I

  public final boolean equals(java.lang.Object);
    descriptor: (Ljava/lang/Object;)Z

  public final java.lang.String toString();
    descriptor: ()Ljava/lang/String;

  public static com.ubertob.ministring.TradeMiniString com.ubertob.ministring.TradeMiniString(double, java.lang.String, java.lang.String);
    descriptor: (DLjava/lang/String;Ljava/lang/String;)Qcom/ubertob/ministring/TradeMiniString;
}
```



Note how the descriptors of the MiniString fields account and security start with the Q of inline types rather than the L of reference types.

Finally, to have a fair comparison, we also define a TradeRefEncoded class using the same trick to encode an Account and Security in a field of type long.

You can find the complete code in my GitHub repository.

Before looking at the performance, let's look at how our Trade objects use memory.

First, the standard TradeRef object:

On the left, there is our big array. Each element points to a TradeRef object, which in turn holds two more pointers to the strings.

Depending on the order of creation, these can be quite far apart in our heap.

This is a performance problem because fetching memory from scattered locations, which usually means cache misses, is one of the slowest operations for a CPU.

This classic post by Martin Thompson is one of the best explanations:
https://mechanical-sympathy.blogspot.com/2012/08/memory-access-patterns-are-important.html

Now let's look at a simple diagram of the memory representation of a TradeInline value:

Here, the whole value is stored directly in the array, so the array itself is bigger than in the first case. If you don't fill all the elements, in other words if with a reference type your array would mostly contain null values, this can be a drawback, because every element takes up its full size anyway.

Finally, let's see how the TradeMiniString is allocated in the memory:

You can see that it stays completely inside the array element, with no external pointers.

This is possible only because we accept big limitations on what those strings can contain; still, it's an acceptable compromise when the strings only store database IDs or stock ticker symbols.
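A rough per-element estimate for the 5-million-element arrays used in the comparison, assuming 4-byte compressed references (the HotSpot default for heaps under 32 GB) and flattened fields packed without padding; the layout actually chosen by the early-access build may differ:

$$
\begin{aligned}
\texttt{TradeRef[]}&: \ 4\ \text{B} \times 5 \times 10^{6} = 20\ \text{MB} \quad (\text{plus the objects and Strings elsewhere on the heap})\\
\texttt{TradeInline[]}&: \ (8 + 4 + 4)\ \text{B} \times 5 \times 10^{6} = 80\ \text{MB} \quad (\text{plus the Strings elsewhere})\\
\texttt{TradeMiniString[]}&: \ (8 + 8 + 8)\ \text{B} \times 5 \times 10^{6} = 120\ \text{MB} \quad (\text{nothing elsewhere})
\end{aligned}
$$

The flattened arrays cost more memory up front, but a scan reads them sequentially instead of chasing one or more pointers per element.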

Performance Comparison

To compare the performance, we can create four arrays of 5 million elements, one for each of our trade types:

```java
static final int arraySize = 5_000_000;

public TradeRef[] tradeRefs = new TradeRef[arraySize];
public TradeRefEncoded[] tradesRefEncoded = new TradeRefEncoded[arraySize];
public TradeInline[] tradesInline = new TradeInline[arraySize];
public TradeMiniString[] tradesMiniString = new TradeMiniString[arraySize];
```



Then, we fill them with identical random values:

```java
public static void generatingTradesAndBrowing() {

    var tr = new TradeRepository();
    tr.fillWithRandomData();

    var searcherRef = new TradeRefBrowser(tr.tradeRefs);
    var searcherInline = new InlineTradeBrowser(tr.tradesInline);
    var searcherRefEncoded = new TradeRefEncodedBrowser(tr.tradesRefEncoded);
    var searcherMiniString = new MiniStringTradeBrowser(tr.tradesMiniString);

    var account = tr.tradeRefs[1000].account;

    while (true) {
        benchmarks(searcherRef, searcherInline, searcherRefEncoded, searcherMiniString, account);
    }
}
```



And finally, we do searches on each one repeatedly, printing out the time elapsed.

```java
private static void benchmarks(TradeRefBrowser searcherRef, InlineTradeBrowser searcherInline,
                               TradeRefEncodedBrowser searcherRefEncoded, MiniStringTradeBrowser searcherMiniString,
                               String account) {
    cronoSum(() -> searcherRef.sumByAccountFor(account), "Ref with for");
    cronoSum(() -> searcherRef.sumByAccountStream(account), "Ref with stream");

    cronoSum(() -> searcherRefEncoded.sumByAccountFor(account), "RefEncoded with for");
    cronoSum(() -> searcherRefEncoded.sumByAccountStream(account), "RefEncoded with stream");

    cronoSum(() -> searcherInline.sumByAccountFor(account), "Inline with for");
    cronoSum(() -> searcherMiniString.sumByAccountFor(account), "MiniString with for");
}
```



We don't care much about specific numbers here, so we just keep looping and printing the results. The timings tend to stabilize after a few minutes, once the HotSpot JIT compiler has optimized and inlined most of the methods.
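The cronoSum helper itself is not shown in the post. A minimal sketch of what such a helper could look like, assuming it takes a DoubleSupplier and a label and prints the elapsed time; the name comes from the benchmark code, but the signature, the return value, and the output format are guesses:

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;

class CronoSumSketch {
    // Hypothetical version of cronoSum: runs the computation once, prints the
    // label, the result and the elapsed wall-clock time, and returns the result.
    static double cronoSum(DoubleSupplier computation, String label) {
        long start = System.nanoTime();
        double result = computation.getAsDouble();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%s: %.2f in %d ms%n", label, result, elapsedMillis);
        return result;
    }

    public static void main(String[] args) {
        double[] amounts = new double[5_000_000];
        Arrays.fill(amounts, 1.0);
        // Repeat the measurement so that later rounds run JIT-compiled code
        for (int round = 0; round < 5; round++) {
            cronoSum(() -> {
                double sum = 0;
                for (double a : amounts) sum += a;
                return sum;
            }, "toy array");
        }
    }
}
```

Timing single calls with System.nanoTime is enough to spot large differences; for careful measurements, a harness such as JMH takes care of warm-up and of the JIT removing unused results.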

We can filter and sum in two ways: with a for loop, or with a nicer stream using filter, map, and reduce:

```java
public double sumByAccountStream(String account) {
    return Arrays.stream(repo)
            .filter(trade -> trade.account.equals(account))
            .map(trade -> trade.amount)
            .reduce(0.0, (a, b) -> a + b);
}

public double sumByAccountFor(String account) {
    double res = 0;
    for (int i = 0; i < repo.length; i++) {
        TradeRef tradeRef = repo[i];
        if (tradeRef.account.equals(account))
            res = res + tradeRef.amount;
    }
    return res;
}
```
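Before Valhalla, a common way to get a flat, primitive-only layout for this kind of scan is to keep each field in its own primitive array (a "structure of arrays"). The sketch below shows that older technique, not inline types: ParallelArraysSum is a hypothetical name, and the accounts are assumed to be pre-encoded to long, as MiniString does. Inline types aim to give a similar memory layout while keeping an ordinary-looking class.

```java
class ParallelArraysSum {
    // One primitive array per field: a scan reads contiguous memory
    // with no pointer chasing.
    private final double[] amounts;
    private final long[] accounts;   // account codes assumed pre-encoded to long

    ParallelArraysSum(double[] amounts, long[] accounts) {
        if (amounts.length != accounts.length) {
            throw new IllegalArgumentException("amounts and accounts must have the same length");
        }
        this.amounts = amounts;
        this.accounts = accounts;
    }

    // Same filter-and-sum as sumByAccountFor, but the filter is a primitive
    // long comparison instead of String.equals.
    double sumByAccountFor(long encodedAccount) {
        double res = 0;
        for (int i = 0; i < amounts.length; i++) {
            if (accounts[i] == encodedAccount) {
                res += amounts[i];
            }
        }
        return res;
    }

    public static void main(String[] args) {
        var trades = new ParallelArraysSum(new double[] {10.0, 5.5, 2.5}, new long[] {42L, 7L, 42L});
        System.out.println(trades.sumByAccountFor(42L)); // 12.5
    }
}
```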



Unfortunately, streams do not (yet) work with inline classes in this build; the lambda bootstrap fails:

```
Exception in thread "main" java.lang.ClassFormatError: Illegal class name "Qcom/ubertob/ministring/TradeMiniString;" in class file <Unknown>
        at java.base/jdk.internal.misc.Unsafe.defineAnonymousClass0(Native Method)
        at java.base/jdk.internal.misc.Unsafe.defineAnonymousClass(Unsafe.java:1345)
        at java.base/java.lang.invoke.InnerClassLambdaMetafactory.spinInnerClass(InnerClassLambdaMetafactory.java:324)
        at java.base/java.lang.invoke.InnerClassLambdaMetafactory.buildCallSite(InnerClassLambdaMetafactory.java:192)
        at java.base/java.lang.invoke.LambdaMetafactory.metafactory(LambdaMetafactory.java:329)
        at java.base/java.lang.invoke.BootstrapMethodInvoker.invoke(BootstrapMethodInvoker.java:127)
        at java.base/java.lang.invoke.CallSite.makeSite(CallSite.java:307)
        at java.base/java.lang.invoke.MethodHandleNatives.linkCallSiteImpl(MethodHandleNatives.java:259)
        at java.base/java.lang.invoke.MethodHandleNatives.linkCallSite(MethodHandleNatives.java:249)
        at com.ubertob.ministring.MiniStringTradeBrowser.sumByAccountStream(Unknown Source)
```



So, for the moment, we can only measure performance using for loops.

Fancy Graphs (Finally)

The actual numbers are not very important here, so I just grabbed a representative sample by running the application on my Linux laptop.

Rather than the numbers, we are interested in how fast they are relative to each other.

Elapsed time in milliseconds: the shorter, the better.

We can see how huge the improvement is in this case: inline TradeMiniString is more than 20x faster than TradeRef!

Even compared with TradeRefEncoded, it is still 6x faster!

This seems almost too good to be true, so let's use a more reliable method to measure the actual performance.

Brendan Gregg wrote some very useful tools for profiling Java programs with the Linux perf tool.

There are several blog posts on how to produce and read flame graphs, for example:
https://medium.com/netflix-techblog/java-in-flames-e763b3d32166

And here are our flames:

You can find the actual SVG file in the GitHub repository.

We can see four green blocks (including the one with the arrow) that represent how much CPU time each of the four methods used. In this graph, the green blocks are Java calls, while the red ones are kernel system calls.

The spikes are other methods, related to printing the results and to garbage collection. We can ignore them.

From the left, the first block is TradeInline, which uses normal strings. It is clearly smaller than the last two blocks, which represent TradeRef and TradeRefEncoded.

As you may have guessed, the block pointed to by the arrow is the TradeMiniString duration; it is so brief that you can read only a few characters of the name.

The relative duration of each method from the perf profile is reported in this graph:

Number of perf samples per method: the fewer, the better!

The relative timings are similar: Inline is 6 times faster than Ref, and Inline Encoded (TradeMiniString) is 5.5 times faster than Ref Encoded.

Conclusions

It is still too early for precise measurements, but it is already clear that Valhalla inline types have the potential to bring a massive performance boost to certain kinds of performance-critical applications.

There are still many rough edges and painful decisions to make, but the overall shape of the Valhalla project is very exciting.

Since the last early-access build (EAB), there have already been many improvements in the source code, so let's hope for another stable build with more features to test soon.

This post was originally published as part of the Java Advent series.

To Learn More...

The full code of this and other examples:
https://github.com/uberto/testValueTypes

A recent video about Valhalla at Devoxx BE:
https://devoxx.be/talk/?id=41660

Brian Goetz on the State of Valhalla:
http://cr.openjdk.java.net/~briangoetz/valhalla/sov/02-object-model.html

An in-depth explanation of Inline Types from Ben Evans:
https://www.infoq.com/articles/inline-classes-java

My post on the previous early-access build:
https://medium.com/@ramtop/a-taste-of-value-types-1a8a136fcfe2


Published at DZone with permission of Uberto Barbini, DZone MVB.

