
Scaling Benchmarks With More Robust UseNUMA Flag in OpenJDK


Learn how to scale benchmarks with the UseNUMA flag in OpenJDK.


What happens when you run a Java application without checking your hardware configuration? Often, performance suffers. For small applications this rarely matters, but for applications with large heaps (several gigabytes or more), you need to pay attention to the hardware configuration; otherwise, performance can degrade noticeably.

What Is NUMA?

Non-Uniform Memory Access (NUMA) is an arrangement of processors and memory in which each cluster of cores has its own local memory. A core reaches its local memory faster than memory attached to another cluster.

What Is a NUMA Node?

Each such cluster of processors and local memory is called a NUMA node. For example, the Linux machine below has four NUMA nodes:

numactl -H

available: 4 nodes (0-3)
node 0 cpus: 0 2 4 6 8 10 12 14
node 0 size: 65501 MB
node 0 free: 8971 MB
node 1 cpus: 16 18 20 22 24 26 28 30
node 1 size: 65536 MB
node 1 free: 34057 MB
node 2 cpus: 1 3 5 7 9 11 13 15
node 2 size: 65536 MB
node 2 free: 53761 MB
node 3 cpus: 17 19 21 23 25 27 29 31
node 3 size: 65519 MB
node 3 free: 30209 MB
node distances:
node 0 1 2 3
0: 10 16 16 16
1: 16 10 16 16
2: 16 16 10 16
3: 16 16 16 10
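If you want to consume this topology from a program, the "node N cpus:" lines are easy to parse. The following is a minimal sketch; the class name NumaTopology and the method cpusByNode are hypothetical names chosen for illustration, and the parser assumes the output has the same shape as the listing above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that parses the text printed by `numactl -H`. */
final class NumaTopology {

    /** Maps each node id to the CPU ids listed on its "node N cpus:" line. */
    static Map<Integer, List<Integer>> cpusByNode(String numactlOutput) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        for (String line : numactlOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            // Expected shape: node <id> cpus: <cpu> <cpu> ...
            // Size, free, and distance lines do not match and are skipped.
            if (parts.length >= 3 && parts[0].equals("node") && parts[2].equals("cpus:")) {
                List<Integer> cpus = new ArrayList<>();
                for (int i = 3; i < parts.length; i++) {
                    cpus.add(Integer.parseInt(parts[i]));
                }
                result.put(Integer.parseInt(parts[1]), cpus);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A shortened copy of the listing above, used as sample input.
        String sample = String.join("\n",
            "available: 4 nodes (0-3)",
            "node 0 cpus: 0 2 4 6 8 10 12 14",
            "node 0 size: 65501 MB",
            "node 1 cpus: 16 18 20 22 24 26 28 30",
            "node 2 cpus: 1 3 5 7 9 11 13 15",
            "node 3 cpus: 17 19 21 23 25 27 29 31");
        Map<Integer, List<Integer>> topology = cpusByNode(sample);
        System.out.println("NUMA nodes: " + topology.size());
        topology.forEach((node, cpus) -> System.out.println("node " + node + " -> " + cpus));
    }
}
```

On a real machine, the input string would come from running numactl -H and capturing its standard output.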


The JVM allocates memory on the assumption that most objects die young. It divides the heap into two regions, the Young Gen and the Old Gen. Objects are first created in the Young Gen, and those that survive a certain number of GCs are moved to the Old Gen. The Young Gen is further divided into Eden and two survivor spaces, called from and to. When the UseNUMA flag is enabled, lgrps (locality groups) are created inside the Eden space based on the number of NUMA nodes on your machine. If the Java application is not bound to any NUMA node, the JVM creates one lgrp for each NUMA node on the system.
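The heap regions described above can be observed from inside a running JVM through the standard java.lang.management API. The sketch below simply lists the heap memory pools; the class name HeapPools is a hypothetical name for illustration. Pool names depend on the collector in use. With -XX:+UseParallelGC, for example, HotSpot typically reports PS Eden Space, PS Survivor Space, and PS Old Gen. Note that this API reports Eden as a single pool and does not break it down into lgrps.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the heap pools of the running JVM. */
final class HeapPools {

    /** Returns the names of the heap memory pools the running JVM reports. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Skip non-heap pools such as Metaspace and the code cache.
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        heapPoolNames().forEach(System.out::println);
    }
}
```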

Recently, I fixed a JDK bug that I found while running a Java application on a NUMA-aware machine. The fix is available in JDK 11.

Bug-ID: https://bugs.openjdk.java.net/browse/JDK-8189922

Resolution: http://hg.openjdk.java.net/jdk/client/rev/50eb2c0f252b

GC logs collected at debug level make it clear how the heap is used across NUMA nodes and how many lgrps (heap spaces on a particular node) are created. Suppose you bind your application like this:

numactl --cpunodebind=0 --membind=0 <java-app>


Here, a JVM without the fix still creates as many lgrps as there are NUMA nodes on your system, which is not correct. When the application is bound to a single NUMA node, the JVM should not create any lgrps and should ideally disable the UseNUMA feature.

The below image explains the internal structure of the JVM heap:

A similar case applies when the application is bound to two NUMA nodes: lgrps should be created only for those nodes.
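The expected behaviour for the unbound, single-node, and two-node cases can be summarized in a few lines of Java. This is only a toy model of the rules described in this article, not HotSpot code; the class LgrpModel and the method expectedLgrps are hypothetical names.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the behaviour described in the article; not HotSpot code. */
final class LgrpModel {

    /**
     * @param systemNodes all NUMA node ids on the machine
     * @param boundNodes  node ids passed to --membind, or an empty set if unbound
     * @return node ids that should each get an lgrp; empty means no lgrps at all
     */
    static Set<Integer> expectedLgrps(Set<Integer> systemNodes, Set<Integer> boundNodes) {
        // Unbound applications may use every node on the system.
        Set<Integer> effective = new TreeSet<>(boundNodes.isEmpty() ? systemNodes : boundNodes);
        effective.retainAll(systemNodes); // ignore ids that do not exist on this machine
        if (effective.size() <= 1) {
            // Single node: no lgrps, UseNUMA should be switched off.
            return new TreeSet<>();
        }
        return effective;
    }

    public static void main(String[] args) {
        Set<Integer> system = new TreeSet<>(Arrays.asList(0, 1, 2, 3));
        Set<Integer> unbound = new TreeSet<>();
        Set<Integer> oneNode = new TreeSet<>(Arrays.asList(0));
        Set<Integer> twoNodes = new TreeSet<>(Arrays.asList(0, 1));

        System.out.println(expectedLgrps(system, unbound));  // prints [0, 1, 2, 3]
        System.out.println(expectedLgrps(system, oneNode));  // prints []
        System.out.println(expectedLgrps(system, twoNodes)); // prints [0, 1]
    }
}
```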

If you are running a single Java application on a multi-node NUMA system, enabling the UseNUMA flag is advised, as it helps the JVM assign heap space according to NUMA nodes.

It’s not good practice to use the UseNUMA flag on a single NUMA node, because it degrades application performance inside the JVM. From JDK 11 onwards, however, the flag is automatically disabled if it is used on a single NUMA node. The UseNUMA flag is designed for multi-node NUMA systems, but up to JDK 10, using it together with membind had issues. That issue has since been resolved. From JDK 11 onwards, you can use the UseNUMA flag while binding the application to as many nodes as you need, so it runs only on the NUMA nodes you choose.
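One way to check whether the JVM kept UseNUMA enabled is to read the flag's effective value at runtime. The sketch below uses com.sun.management.HotSpotDiagnosticMXBean, which is specific to HotSpot-based JVMs, so treat it as an assumption that your runtime provides it. The class name UseNumaCheck and the method flagValue are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that reads the effective value of a HotSpot flag. */
final class UseNumaCheck {

    /** Returns the current value of the named flag as a string, e.g. "true" or "false". */
    static String flagValue(String name) {
        // Assumes a HotSpot-based JVM; other JVMs may not expose this bean.
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = bean.getVMOption(name);
        return option.getValue();
    }

    public static void main(String[] args) {
        System.out.println("UseNUMA = " + flagValue("UseNUMA"));
    }
}
```

If the behaviour described above applies to your JDK build, running this class with -XX:+UseNUMA under the single-node numactl binding shown earlier should report false on JDK 11 and later, while an unbound run on a multi-node machine should report true.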
