What's Wrong With Java Boxed Numbers?
What's wrong with boxed numbers in Java? Check out this post to learn more about boxed numbers and how they can waste memory and create GC pressure for your applications.
In Java, boxed numbers are instances of classes, such as java.lang.Integer or java.lang.Double, that wrap or "box" the respective primitive types: int, double, etc. They were designed to allow Java apps to pass around numbers as objects and, more importantly, to store numbers in the common collections, such as java.util.ArrayList, java.util.HashMap, etc. The need to store numbers in lists and maps is very common. To satisfy it, the JDK developers had two choices:
1. Provide specialized collections, i.e. lists and maps, for every primitive type and their combinations. For example, this could include IntArrayList, ObjectToDoubleHashMap, IntToObjectLinkedHashMap, IntToLongConcurrentHashMap, etc.
2. Provide a way to reuse the existing collections for numbers.
Choosing the second option prevented the explosion of collection classes, saved JDK maintainers some extra work, and made the life of developers somewhat simpler with fewer APIs to remember. However, this also created one problem — can you guess what it is? It's the memory required to store boxed numbers. Compared with the respective primitive types, boxed numbers use much more memory.
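To make the second option concrete, here is a minimal sketch of what storing numbers in a standard collection involves. The class and method names are illustrative only. The point is that each add() on a List<Long> goes through autoboxing, so the list holds references to Long objects, while a primitive array holds the values themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; AutoboxingSketch and its methods are hypothetical names.
final class AutoboxingSketch {

    // Stores n numbers as boxed Long objects: each add() autoboxes its argument.
    static List<Long> boxed(int n) {
        List<Long> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            list.add((long) i); // equivalent to list.add(Long.valueOf((long) i))
        }
        return list;
    }

    // Stores the same n numbers directly in a primitive array: no per-number objects.
    static long[] primitive(int n) {
        long[] array = new long[n];
        for (int i = 0; i < n; i++) {
            array[i] = i;
        }
        return array;
    }

    public static void main(String[] args) {
        List<Long> list = boxed(1000);
        long[] array = primitive(1000);

        long sumBoxed = 0;
        for (long value : list) { // each element is unboxed on read
            sumBoxed += value;
        }
        long sumPrimitive = 0;
        for (long value : array) {
            sumPrimitive += value;
        }
        System.out.println(sumBoxed == sumPrimitive); // true
    }
}
```

Both containers hold the same values; they differ only in how much memory each stored number costs.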
To illustrate this point, let's run a quick experiment. The simple program below creates a large array of long numbers and then goes to sleep:
public class BoxedNumMemory {
    private static final int NUM_NUMS = 10 * 1000 * 1000;

    // Static field, so the array stays reachable while the program sleeps
    private static long[] nums = new long[NUM_NUMS];

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < NUM_NUMS; i++) {
            nums[i] = (long) i;
        }
        System.out.println("Initialized array; going to sleep...");
        // Keep the JVM alive long enough to inspect its heap from another console
        Thread.sleep(1000000000);
    }
}
Compile this program and start it. Then, in a separate console window, invoke the jps JDK utility to determine the PID of the JVM running this app, and then invoke:
jmap -histo:live <BoxedNumMemory JVM pid>
The above command will attach to our JVM, scan its heap and print a histogram of all live objects — that is, how much memory is taken by all instances of each class. If you use Oracle JDK and the HotSpot JVM, as most of us currently do, your output will look like this:
 num     #instances         #bytes  class name
----------------------------------------------
   1:             2       80000064  [J
.... long list of other classes, that take much less memory...
Total          4574       80285136
As you can see, most of the memory is taken by our single long[] array ([J is the JVM's internal type descriptor for an array of long, with J being the historical descriptor for the long type itself; the second small array comes from JVM internals). Each array element takes eight bytes, as expected.
Now, replace two characters in one line of this program so that instead of a primitive array, it creates an array of boxed numbers:
private static Long[] nums = new Long[NUM_NUMS];
Recompile the program, rerun it (notice that it now takes more time to initialize the array), and obtain the object histogram again. You will see something like this:
 num     #instances         #bytes  class name
----------------------------------------------
   1:      10000128      240003072  java.lang.Long
   2:             2       40001056  [Ljava.lang.Long;
.....
Total      10004710      280288880
It turns out that our program now uses 3.5 times more memory! There are now 10 million java.lang.Long objects, and they take up most of the heap. (The 128 extra objects again come from JVM and JDK internals; the count is consistent with the cache of Long values from -128 to 127 that autoboxing reuses.) To be fair, our big array now takes half the memory, because it became an array of object references and each reference takes four bytes (or eight bytes if your maximum heap size is above 32GB). However, the savings are small compared to the losses.
Simple division of the above numbers suggests that the size of one java.lang.Long object is 24 bytes. If it wraps a single eight-byte long number, why is it so big?
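The division can be spelled out as a short sketch. The figures are copied from the two histograms above; the class and method names are illustrative only.

```java
// Illustrative arithmetic on the histogram figures quoted above.
final class HistogramMath {

    // Average bytes per instance, from a histogram row's #bytes and #instances columns.
    static double bytesPerInstance(long bytes, long instances) {
        return (double) bytes / instances;
    }

    public static void main(String[] args) {
        long longObjectBytes = 240_003_072L; // java.lang.Long row, #bytes
        long longObjectCount = 10_000_128L;  // java.lang.Long row, #instances
        long boxedTotal = 280_288_880L;      // Total row, Long[] version
        long primitiveTotal = 80_285_136L;   // Total row, long[] version

        // Size of a single java.lang.Long object
        System.out.println(bytesPerInstance(longObjectBytes, longObjectCount)); // 24.0

        // Overall heap growth after switching from long[] to Long[], roughly 3.5
        System.out.println((double) boxedTotal / primitiveTotal);
    }
}
```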
The general answer is the fixed per-object memory overhead in the JVM. The HotSpot JVM (and most other JVMs) has to make tradeoffs to support virtual method invocation, garbage collection, and object locking (by the latter, we mean that the language specification allows every object to be used as an argument of the synchronized statement). Each of these mechanisms requires some extra information to be stored in memory for every Java object. Namely, a pointer from the object to its class is needed for virtual methods and GC, and some extra per-object bookkeeping bits are needed for GC and locking. To store all this information, the HotSpot VM uses the so-called object header. It takes 12 bytes per object when the maximum heap size is below 32GB, and 16 bytes otherwise.
12 bytes for an object header plus eight bytes for the primitive long number gives us 20 bytes. So why do java.lang.Long instances actually use 24 bytes? This is a consequence of another tradeoff, made by the HotSpot VM developers to allow Java applications to run with heaps bigger than 4GB while still using short, economical four-byte pointers for object references.
Here is how it works. Four bytes make 32 bits; 32 bits allow us to encode numbers in the range 0 ... (4*1024*1024*1024 - 1). That means that normally, with a four-byte pointer, we can only address ~4 billion bytes, or 4GB. However, HotSpot developers came up with a smart trick: the JVM, by default, multiplies each pointer value by 8. Thus, for pointers with values 0, 1, 2, etc., real memory addresses become 0, 8, 16, ... This is called eight-byte object alignment, and it means that with short ("narrow"), four-byte pointers, the JVM can now run with a big 32GB heap instead of 4GB! Overall, this is a very good solution, but it has one caveat: the effective size of each Java object becomes a multiple of eight bytes. For each object with a real size of 20 or 28 bytes, an extra four bytes of memory are simply wasted. For big objects, the relative amount of waste is small, but for boxed numbers, it's noticeable.
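The arithmetic in this paragraph can be sketched in a few lines. This is not HotSpot code: the names are illustrative, and the sketch assumes the default alignment of eight bytes and a heap that starts at address zero.

```java
// Sketch of the arithmetic behind narrow pointers and eight-byte alignment.
// Names are illustrative; a zero heap base is assumed for simplicity.
final class NarrowPointerMath {

    static final int ALIGNMENT = 8; // default object alignment in bytes

    // Real address that a 32-bit narrow pointer value stands for.
    static long decode(int narrowPointer) {
        return Integer.toUnsignedLong(narrowPointer) * ALIGNMENT;
    }

    // Rounds an object's raw size up to the next multiple of the alignment.
    static long alignUp(long rawSize) {
        return (rawSize + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        // 2^32 distinct pointer values, each standing for an 8-byte-aligned address
        long addressableBytes = (1L << 32) * ALIGNMENT;
        System.out.println(addressableBytes / (1L << 30) + " GB"); // 32 GB

        System.out.println(decode(2));   // 16
        System.out.println(alignUp(20)); // 24: a Long with a 12-byte header
        System.out.println(alignUp(28)); // 32
    }
}
```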
The bottom line is that the combined overhead of the object header, object alignment, and the pointer that every live boxed number object needs means that, depending on the type, a boxed number requires 3-5 times more memory than the respective primitive type. This, by the way, is true for other small objects as well. That's really bad news if your application heavily relies on boxed numbers.
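As a rough check of the 3-5 times figure, the sketch below adds up header, payload, alignment padding, and one reference for two common boxed types. The constants are the values discussed above for heaps below 32GB and are assumptions of this sketch; the class and method names are illustrative only.

```java
// Rough estimate of the memory cost of one boxed number, under stated assumptions.
final class BoxedFootprint {

    static final int HEADER = 12;   // assumed object header size, heap below 32GB
    static final int REFERENCE = 4; // assumed size of the reference to the object
    static final int ALIGNMENT = 8; // default object alignment

    // Bytes used by one boxed value plus the reference that keeps it alive.
    static int boxedBytes(int primitiveBytes) {
        int raw = HEADER + primitiveBytes;
        int aligned = (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
        return aligned + REFERENCE;
    }

    // How many times more memory the boxed form needs than the primitive.
    static double ratio(int primitiveBytes) {
        return (double) boxedBytes(primitiveBytes) / primitiveBytes;
    }

    public static void main(String[] args) {
        System.out.println("Integer: " + boxedBytes(Integer.BYTES) + " bytes, "
                + ratio(Integer.BYTES) + "x"); // Integer: 20 bytes, 5.0x
        System.out.println("Long: " + boxedBytes(Long.BYTES) + " bytes, "
                + ratio(Long.BYTES) + "x");    // Long: 28 bytes, 3.5x
    }
}
```

The Long result matches the 3.5 times growth measured in the experiment above.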
Fortunately, for many applications, this is not the case. If your app employs just a few hash maps with a few hundred keys or values that are boxed numbers, in most cases, you shouldn't worry. However, it's difficult to estimate the actual size and memory consumption of every data structure in every scenario. So, if you know that your app uses boxed numbers, but you are not sure how much memory it costs you, how can you find out?
The answer is: use a memory analysis tool. The simplest way to check how much memory in your app is consumed by boxed numbers is to use jmap -histo as shown above. However, the object histogram will not tell you where the boxed numbers "come from," i.e. what data structures store and manage them and how much memory is wasted by each individual structure. The best way to obtain this information is to take a heap dump and analyze it.
A heap dump is essentially a full snapshot of the running JVM's heap. It can either be taken at an arbitrary moment by invoking the jmap utility, or the JVM can be configured to produce it automatically if it fails with OutOfMemoryError. If you Google "JVM heap dump," you will immediately see a bunch of relevant articles on this subject.
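Besides those two routes, HotSpot-based JVMs also let an application dump its own heap through the HotSpotDiagnosticMXBean management interface. The article does not cover this, so treat the following as a minimal sketch: HeapDumper is a hypothetical helper name, and the code assumes a HotSpot JVM where this MXBean is available.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: HeapDumper is a hypothetical helper, assuming a HotSpot-based JVM.
final class HeapDumper {

    // Writes a heap dump of the current JVM to the given file.
    // The file must not already exist; recent JDKs also expect a ".hprof" suffix.
    static void dump(Path file, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), liveObjectsOnly);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dir.resolve("app.hprof");
        dump(file, true); // true: dump only objects that are still reachable
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

The resulting file is in the same format as a dump taken with jmap, so the same analysis tools can read it.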
A heap dump is a binary file of about the size of your JVM's heap, so it can only be read and analyzed with special tools. There are a number of such tools available, both open-source and commercial. The most popular open-source tool is Eclipse MAT; there is also VisualVM and some less-powerful, lesser-known tools. The commercial tools include the general-purpose Java profilers JProfiler and YourKit, as well as one tool built specifically for heap dump analysis called JXRay.
Unlike most other tools, JXRay analyzes a heap dump right away for a large number of common problems, such as duplicate strings and other objects, suboptimal data structures, and, yes, boxed numbers. The tool generates a report with all the collected information in the HTML format. The advantage of this approach is that you can view the results of analysis anywhere at any time and share it with others easily. It also means that you can run the tool on any machine, including big and powerful, but "headless" machines in a data center.
JXRay calculates the overhead (how much memory you would save if you get rid of a particular problem) in bytes and as a percentage of the used heap. For boxed numbers, the overhead is calculated as the amount of memory that you would save if you replaced every boxed number with a plain primitive number. JXRay groups together all objects that have the same problem and are reachable via the same reference chain, e.g. collections or arrays holding boxed numbers, and then objects referencing them all the way up to the GC root, as in the example below:
Knowing which data structures are responsible for the biggest portions of memory waste allows you to quickly and precisely pinpoint the code that causes the problem and then make the necessary changes.
If boxed numbers are kept in standard Java collections, such as java.util.ArrayList or java.util.HashMap, what is the best way to get rid of the associated memory waste? It turns out that some third-party libraries are available that provide a wide range of specialized collections for storing numbers directly. The author's favorite library is fastutil; others are GNU Trove and Koloboke. Sometimes, it is sufficient to simply replace a collection such as HashMap<String, Integer> with Object2IntOpenHashMap<String>, and recompile your source code. In other situations, like when a single Object[] array contains a mix of boxed number objects of different types, you may need to perform a more serious redesign of your application.
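To show why such specialized collections save memory, here is a deliberately minimal, JDK-only sketch of a map from String keys to primitive int values. StringToIntMap is a hypothetical class written for this illustration. It is not how fastutil or any other library is implemented, and it omits removal, iteration, and null keys. The point is only that the values live in a plain int[] rather than in separate Integer objects.

```java
// Hypothetical, minimal String -> int map using open addressing with linear probing.
// It illustrates the idea behind primitive-specialized collections only.
final class StringToIntMap {
    private String[] keys = new String[16]; // capacity is always a power of two
    private int[] values = new int[16];     // values are stored unboxed
    private int size;

    // Associates value with key, replacing any previous value. Null keys are not supported.
    void put(String key, int value) {
        if ((size + 1) * 2 > keys.length) {
            resize(); // keep the table at most half full so probing always finds a free slot
        }
        int slot = findSlot(keys, key);
        if (keys[slot] == null) {
            keys[slot] = key;
            size++;
        }
        values[slot] = value;
    }

    // Returns the value for key, or defaultValue if the key is absent.
    int getOrDefault(String key, int defaultValue) {
        int slot = findSlot(keys, key);
        return keys[slot] == null ? defaultValue : values[slot];
    }

    int size() {
        return size;
    }

    // Starts at the hashed slot and walks forward until the key or an empty slot is found.
    private static int findSlot(String[] table, String key) {
        int mask = table.length - 1;
        int hash = key.hashCode();
        int slot = (hash ^ (hash >>> 16)) & mask;
        while (table[slot] != null && !table[slot].equals(key)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    // Doubles the capacity and re-inserts every existing entry.
    private void resize() {
        String[] oldKeys = keys;
        int[] oldValues = values;
        keys = new String[oldKeys.length * 2];
        values = new int[oldValues.length * 2];
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != null) {
                int slot = findSlot(keys, oldKeys[i]);
                keys[slot] = oldKeys[i];
                values[slot] = oldValues[i];
            }
        }
    }

    public static void main(String[] args) {
        StringToIntMap counts = new StringToIntMap();
        for (String word : "a b a c b a".split(" ")) {
            counts.put(word, counts.getOrDefault(word, 0) + 1);
        }
        System.out.println(counts.getOrDefault("a", 0)); // 3
    }
}
```

In practice you would use a tested library class, such as the Object2IntOpenHashMap mentioned above, rather than hand-rolling a map like this.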
In summary, boxed numbers are okay if used in a few insignificant parts of an application, but they can waste memory and create GC pressure if big and important data structures rely on them. The best way to measure the impact of boxed numbers on your app's memory is to obtain a heap dump and use a tool like JXRay to analyze it. If you find that boxed numbers are a problem, it's often easy to get rid of them by switching from standard JDK collections to specialized third-party libraries. But, occasionally, you may need to make deeper changes to your code.