Native Memory Allocation in Java
Native Memory Allocation in Java
In this article, we discuss how to better manage native memory within your containers to avoid common pitfalls.
Join the DZone community and get the full member experience.Join For Free
From time to time, we run into a memory problem that is not related to Java Heap but Native Memory. Let's imagine the situation where we have a running container that is restarted once per day. We look at Prometheus/Grafana to check the reason behind this issue. However, we don't spot any problem with Java Heap size (exported via JMX) and start blaming our container scheduler to do any inconvenient operations with our containers :). The real problem can be hidden a bit deeper — in Native Memory.
Do you use any native memory allocator in your code or do you use any library that stores its internal state outside the Java Heap? It could be whatever third-party code that tries to minimize GC, such as I/O libraries or databases/caches keeping data in memory inside your process.
In this article, I will go through some basics: how the native-memory is allocated in those libraries and mainly how to notice that something in your code started allocating native memory that could lead to killing your container.
How to Allocate the Native Memory
For a long time, the only efficient way to allocate memory from Java was using
sun.misc.Unsafe. From the very beginning, Java architects warn that this library is only for internal use-cases, and there is no intention to keep it backward compatible as an official API. However, developers wanted the best for their libraries and kept using Unsafe in their code. The big bang came with Java 9 and an introduction to the Java Module System. Unsafe was automatically put inside the
jdk.unsupported module, and it was announced that the module will become visible for external code depending on it for some time. But, the intention is to provide an official API, and the libraries will be forced to migrate to it.
Another way to allocate native memory is by ByteBuffer. There are two implementations:
HeapByteBuffer keeps data inside a byte array allocated on the heap,
DirectByteBuffer is backed by native memory and is primarily used for transferring data between JVM and Kernel. However, some libraries doing I/O implemented their own libraries for dealing with the native memory, such as Netty (ByteBuf — maven) or Aeron (subproject Agrona), especially because of two reasons: a custom collecting of unused memory (Netty uses an implementation of manual reference-counting collector) and a more convenient API.
You can imagine, there was no reason for the library's owners to migrate from Unsafe. Therefore, Project Panama, dealing with an Interconnection of JVM and Native Code, came with something new and shiny. The first output of this project was JEP 370: Foreign Memory Access API, which provided an alternative way to access native memory that was supported by a JIT Compiler to optimize it as much as possible to be close to the efficiency of Unsafe. As an addition, it also brought a new API for designing a native memory layout to avoid "counting" addresses and offsets manually and provide an explicit way of deallocating the memory in the source code (e.g. using try-with-resources). It's available as an incubator in Java 14. Feel free to try it out and give feedback!
Go Straight to the Example
After a short introduction, let's use the simplest way to allocate the native memory, and let's monitor what it actually does.
Let's discuss one stage after another to see what's going on there.
Stage 1: Accessing Unsafe
First, this is just a hack to access
Unsafe without throwing a
SecurityException when you try to access it via
Unsafe.getUnsafe(). Second, we printed Page Size. The Page is a fixed-length addressable block of virtual memory that can be mapped to physical memory (whatever it is, RAM or swapped out memory to disk).
Stage 2: Memory Allocation
Unsafe.allocateMemory, we allocate a 50MB block of memory and receive a long value representing the address in the virtual memory. Let's compile the code and try it out.
We can see that our current PAGE_SIZE value is 4kB. On Linux, we can check the value using:
That means that we are able to allocate memory using 4kB blocks, but if you want to adjust this value according to your needs, feel free to look for a term, Hugepages, that could help you to allocate bigger chunks of memory and reduce the overhead of Minor Page Faults that are thrown if the virtual memory page is not backed by physical memory. It's up to MMU (Memory Management Unit) to find a proper place in physical memory for our allocated block of memory. On the other hand, in case of inappropriate usage of Hugepages, it could lead to significant false sharing and wasting your memory.
Use your favorite tool to examine running processes (I can recommend htop) and look at the RES (Resident Set Size) value that reports the current size of taken physical memory by the process (anonymous pages, memory-mapped files using mmap, shared libraries). However, it does not contain the memory that was swapped out to disk.
You can notice that in terms of the resident memory, there was no change from the previous step. Basically, the memory allocation has no impact on physical memory, it's just virtual memory that is not backed by any physical memory.
Stage 3: Memory Touching
What happened when we pressed "enter" again and moved to Stage 3?
We touched all allocated virtual pages from Stage 2. We iterated from the obtained memory address using the current page size and put a single byte to every page. That means that every write triggered the Minor Page Fault, and MMU needed to find free slots in the physical memory and mapped 4kB slot from virtual memory to the 4kB slot from physical memory.
We can notice two significant changes in our memory report. RES belonging to our process finally grew up by ~50MB and we also increased MINFLT value which stands for Minor Pages Faults. MAJFLT stayed absolutely the same and equal to zero. That means that our process didn't access any memory address that would cause Major Page Fault (a memory block that is not available in the physical memory because has been already swapped out to disk and now needs to be swapped in back to RAM to make it available for our process again)
Stage 4: Memory De-allocation
Our circle is closed. We called
Unsafe.freeMemory with the original memory address and releases it from the physical mapping. We can notice that RES value went back by ~50MB.
That was a quick introduction to how Unsafe allocation and de-allocation works and what it does with our monitoring tools (top/htop in my case). There is one more trick worth noticing: Native Memory Tracking.
Help From Native Memory Tracking
Imagine that we monitor an application as a black box, and we don't know what the application actually does in detail (no source code). The previous example helped us to determine that some kind of memory is growing inside of our process, but we don't know what kind of memory. Is it a memory leak in our Java Heap? Is it some kind of bigger native memory block for our internal cache or memory-mapped file?
While examining Java Heap memory is pretty simple, we just need to have a tool that is able to consume JMX values exported from our process, as digging into the native memory is a bit complicated. Let's do an example of what we are able to extract from JVM. Trigger our application with an additional flag that will start the Native Memory Tracking.
Now, execute a very simple JCMD command:
We obtained all the information about current native memory allocation from all JVM areas. It's out of the scope of this article to go through all of this. But if you want to know more, I can recommend this article.
Let's make a baseline in Stage 1:
Then hit "enter", move to Stage 2 (Memory allocation), and create a diff between those two stages:
We can see the section Other containing allocated native memory by Unsafe and additional info:
- The current value of allocated bytes,
- A change between the current value and the baseline
- #1 number of allocations.
What Should We Expect From a Container
We're now at our last example, and it regards a Docker container. How does the native memory behaves inside our container? What is the impact if we try to allocate more than we configured during the container startup?
Let's modify our code a bit and make an aggressive killer from it. The application starts allocating 50MB every 500millis and automatically touches all the allocated pages to reflect that in RSS memory.
Execute the code inside the container using this simple command.
You can notice that we didn't limit the container with any parameters, and it results in the uncontrolled growth of the host's memory and swapping out the physical memory to a disk to withstand the allocation rate.
What if we limit the size of our container? What's going to be its behavior afterward?
We limited our container to 300MB and ran the same test again.
We can see that the memory is kept on 300MB RSS. However, according to a log statement, "ALLOCATED AND TOUCHED: 19", we did 19 iterations, which equates to ~950MB of allocated and touched memory. In the snippet, it's very clear that the OS hides all memory above 300MB in the swap memory.
Let's disable swapping and see how quickly we fail with our memory allocation program.
This is what we expected. We did 5 iterations, reached 300MB, and failed. Be aware of this behavior on production and design your Java process (native + heap memory) to respect this aspect. Always check the configuration passed to your container. Disabling swapping leads to a better separation of processes. The application behavior is more transparent, and we need to work with clearly defined space.
Your container scheduler (Kubernetes, Mesos, Nomad, ..) has very likely disabled swapping.
You can also check the Docker stats tool that monitors currently running containers on a host. You can also notice some differences between the Resident Set Size and memory reported by docker stats. The reason is that RSS contains all physical memory regarding this process, even shared libraries that could be counted multiple times (for multiple processes). Docker stats tool has exactly this description (according to official configuration):
"On Linux, the Docker CLI reports memory usage by subtracting page cache usage from the total memory usage. The API does not perform such a calculation but rather provides the total memory usage and the amount from the page cache so that clients can use the data as needed."
Play around, try out other use-cases, let us know about your findings and tools you usually use. Thank you for reading my article and please leave comments below. If you would like to be notified about new posts, then start following me on Twitter!
Opinions expressed by DZone contributors are their own.