DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • How Java Apps Litter Beyond the Heap
  • Troubleshooting Memory Leaks With Heap Profilers
  • Java Memory Management
  • Java, Spring Boot, and MongoDB: Performance Analysis and Improvements

Trending

  • AI-Based Threat Detection in Cloud Security
  • Beyond Code Coverage: A Risk-Driven Revolution in Software Testing With Machine Learning
  • FIPS 140-3: The Security Standard That Protects Our Federal Data
  • Revolutionizing Financial Monitoring: Building a Team Dashboard With OpenObserve
  1. DZone
  2. Data Engineering
  3. Data
  4. How to Prevent Your Java Collections From Wasting Memory

How to Prevent Your Java Collections From Wasting Memory

Wondering about the memory impact of your Java collections? Here's how to think about your collections while keeping overhead in mind.

By 
Misha Dmitriev user avatar
Misha Dmitriev
·
Jun. 04, 18 · Tutorial
Likes (32)
Comment
Save
Tweet
Share
100.4K Views

Join the DZone community and get the full member experience.

Join For Free

JDK collections are the standard library implementations of lists and maps. If you look at a memory snapshot of a typical big Java app, you will see thousands or even millions of instances of java.util.ArrayList, java.util.HashMap, etc. Collections are indispensable for in-memory data storage and manipulation. But did you ever consider whether all of the collections in your app use memory optimally? To put it differently: if your Java application crashed with the infamous OutOfMemoryError or experienced long GC pauses — did you check its collections for memory waste? If your answer is "no" or "not sure", then read on.

First, note that the internals of JDK collections are not magic. They are written in Java. Their source code comes with the JDK, so you can open it in your IDE. It can also be easily found on the web. And, as it turns out, most of the collections are not very sophisticated when it comes to optimizing memory footprint.

Consider, for example, one of the simplest and most popular collection classes: java.util.ArrayList. Internally each ArrayList maintains an Object[] elementData array. That's where the elements of the list are stored. Let's see how this array is managed.

When you create an ArrayList with the default constructor, i.e. invoke new ArrayList(), elementData is set to point to a singleton shared zero-size array (elementData could as well be set to null, but a singleton array provides some minor implementation advantages). Once you add the first element to the list, a real, unique elementData array is created, and the provided object is inserted into it. To avoid resizing the array every time a new element is added, it is created with length 10 ("default capacity"). Here comes a catch: if you never add more elements to this ArrayList, 9 out of 10 slots in the elementData array will stay empty. And even if you clear this list later, the internal array will not shrink. The diagram below summarizes this lifecycle:

Internal memory management in java.util.ArrayList

How much memory is wasted here? In absolute terms, it's calculated as (object pointer size) * 9. If you use the HotSpot JVM (which comes with the Oracle JDK), pointer size depends on the maximum heap size (see https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/ for the details). Normally, if you specify -Xmx less than 32 gigabytes, pointer size is 4 bytes; for bigger heaps, it's 8 bytes. So an ArrayList initialized with the default constructor, with only one element added, wastes either 36 or 72 bytes.

In fact, an empty ArrayList wastes memory too, because it doesn't carry any workload, yet the size of an ArrayList object itself is non-zero and bigger than you probably think. That's because, for one thing, every object managed by the HotSpot JVM has a 12- or 16-byte header, that is used by the JVM for the internal purposes. Next, most collection objects contain the size field, a pointer to the internal array or another "workload carrier" object, a modCount field to keep track of contents modifications, etc. Thus even the smallest possible object representing an empty collection will likely need at least 32 bytes of memory. Some, like ConcurrentHashMap, take a lot more.

Consider another ubiquitous collection class: java.util.HashMap. Its lifecycle is similar to that of ArrayList and is summarized below:

HashMap internal memory management

As you can see, a HashMap containing only one key-value pair wastes 15 internal array slots, which translates into either 60 or 120 bytes. These numbers are small, but what matters is how much memory is lost by all collections in your app relative terms. And it turns out that some applications can waste a lot in this way. For example, several popular open-source Hadoop components that the author analyzed, lost around 20 percent of their heap in some scenarios! For products developed by less experienced engineers and not regularly scrutinized for performance, memory waste may be even higher. There are enough use cases where, for example, 90% of nodes in a huge tree contain only one or two children (or none at all), and other situations where the heap is full of 0-, 1- or 2-element collections.

If you discover unused or underutilized collections in your app, how do you fix them? Below are some common recipes. Here our problematic collection is assumed to be an ArrayList referenced by the Foo.list data field.

If most of the instances of list are never used, consider initializing it lazily. Thus the code that previously looked like...

void addToList(Object x) {
    list.add(x);
}


...should be refactored into something like

void addToList(Object x) {
  getOrCreateList().add(x);
}

private list getOrCreateList() {
    // To conserve memory, we don't create the list until the first use
    if (list == null) list = new ArrayList();
    return list;
}


Keep in mind that you will sometimes need to take additional measures to address possible races. For example, if you maintain a ConcurrentHashMap that may be updated by multiple threads concurrently, the code that initializes it lazily should not allow two threads to create two copies of this map accidentally:

private Map getOrCreateMap() {
    if (map == null) {
        // Make sure we aren't outpaced by another thread
        synchronized (this) {
            if (map == null) map = new ConcurrentHashMap();
        }
    }
    return map;
}


If most instances of your list or map contain just a handful of elements, consider initializing them with the more appropriate initial capacity, e.g.

list = new ArrayList(4);  // Internal array will start with length 4


If your collections are either empty or contain just one element (or key-value pair) in most cases, you may consider one extreme form of optimization. It works only if the collection is fully managed within the given class, i.e. other code cannot access it directly. The idea is that you change the type of your data field from e.g. List to a more generic Object, so that it can now point either to a real list, or directly to the only list element. Here is a brief sketch:

// *** Old code ***
private List<Foo> list = new ArrayList<>();

void addToList(Foo foo) { list.add(foo); }

// *** New code ***

// If list is empty, this is null. If list contains only one element,
// this points directly to that element. Otherwise, it points to a
// real ArrayList object.
private Object listOrSingleEl;

void addToList(Foo foo) {
    if (listOrSingleEl == null) { // Empty list
        listOrSingleEl = foo;
    } else if (listOrSingleEl instanceof Foo) { // Single-element 
        Foo firstEl = (Foo) listOrSingleEl;
        ArrayList<Foo> list = new ArrayList<>();
        listOrSingleEl = list;
        list.add(firstEl);
        list.add(foo);
    } else { // Real, multiple-element list
        ((ArrayList<Foo>) listOrSingleEl).add(foo);
    }
}


Obviously, such an optimization makes your code less readable and more difficult to maintain. But if you know that you will save a lot of memory in this way, or will get rid of long GC pauses, it may be worthwhile.

And this probably already made you think: how do I know which collections in my app waste memory, and how much?

The short answer is: this is hard to find out without proper tooling. Attempting to guess the amount of memory used or wasted by data structures in a big, complex application almost never works. And without knowing exactly where the memory goes, you may spend a lot of time chasing the wrong targets while your application stubbornly keeps failing with OutOfMemoryError.

Thus, you need to inspect your app's heap with a tool. From experience, the most optimal way to analyze the JVM memory (measured as amount of information available vs. the tool's impact on application performance) is to obtain a heap dump and then look at it offline. A heap dump is essentially a full snapshot of the heap. It can be either taken at an arbitrary moment by invoking the jmap utility, or the JVM can be configured to produce it automatically if it fails with OutOfMemoryError. If you Google for "JVM heap dump", you will immediately see a bunch of articles explaining in detail how to obtain a dump.

A heap dump is a binary file of about the size of your JVM's heap, so it can only be read and analyzed with special tools. There is a number of such tools available, both open-source and commercial. The most popular open-source tool is Eclipse MAT; there is also VisualVM and some less powerful and lesser-known tools. The commercial tools include the general-purpose Java profilers: JProfiler and YourKit, as well as one tool built specifically for heap dump analysis called JXRay (disclaimer: the author has developed the latter).

Unlike the other tools, JXRay analyzes a heap dump right away for large number of common problems such as duplicate strings and other objects, as well as suboptimal data structures. The problems with collections described above fall into the latter category. The tool generates a report with all the collected information in HTML format. The advantage of this approach is that you can view the results of analysis anywhere at any time and share it with others easily. It also means that you can run the tool on any machine, including big and powerful, but "headless" machines in a data center.

JXRay calculates the overhead (how much memory you would save if you get rid of a particular problem) in bytes and as a percentage of used heap. It groups together collections of the same class that have the same problem...

Summary of bad collections in the JXRay report

...and then groups problematic collections that are reachable from some GC root via the same reference chain, as in the example below

Image title

Knowing what reference chains and/or individual data fields (e.g. INodeDirectory.children above) point to collections that waste most memory, allows you to quickly and precisely pinpoint the code that is responsible for the problem, and then make the necessary changes.

In summary, suboptimally configured Java collections may waste a lot of memory. In many situations, this problem is easy to address, but sometimes, you may need to change your code in non-trivial ways to achieve a significant improvement. It is very difficult to guess which collections need to be optimized to make the biggest impact. To avoid wasting time optimizing wrong parts of the code, you need to obtain a JVM heap dump and analyze it with an appropriate tool.



If you enjoyed this article and want to learn more about Java Collections, check out this collection of tutorials and articles on all things Java Collections.

Memory (storage engine) Java (programming language) garbage collection Open source Data structure app Element application Data (computing)

Opinions expressed by DZone contributors are their own.

Related

  • How Java Apps Litter Beyond the Heap
  • Troubleshooting Memory Leaks With Heap Profilers
  • Java Memory Management
  • Java, Spring Boot, and MongoDB: Performance Analysis and Improvements

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!