Java Collection Overhead

In this article, we will concentrate on the overhead caused by lists that contain two or three elements, which can be easily overlooked.

By Eugene Kovko · Nov. 24, 23 · Tutorial

The Java Virtual Machine enables Java applications to be platform-independent while optimizing performance. One crucial component to understand when considering performance, especially memory utilization, is how the Java Collections Framework, specifically the ArrayList, handles size and capacity.

In this article, we will concentrate on the overhead caused by lists that contain two or three elements. We focus on these because they are a common case and the overhead is easy to overlook.

The Difference Between Size and Capacity

The size of a list refers to the number of elements currently stored in it. It can change as we add or remove elements. The method List.size() provides this number. If we have a list with ten items, its size is ten.

The capacity of a list pertains to the amount of memory allocated for storing elements, regardless of whether these memory locations are currently in use. Capacity is primarily a concern for lists backed by arrays, like ArrayList. Capacity represents the maximum number of elements the list can hold before resizing its internal storage array. The capacity is always greater than or equal to the size of the list.

If we initialize an ArrayList and add ten items to it, its size is ten. However, the underlying array might have a capacity for fifteen items. This means adding five more items to the list wouldn’t trigger an expansion of the underlying array.

Understanding the distinction between size and capacity is crucial. While the size determines the actual data count, the capacity impacts memory utilization and can influence performance due to the potential need for array resizing and data copying.
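ArrayList doesn’t expose its capacity directly, but it does expose two methods for managing it, ensureCapacity and trimToSize, which make the distinction concrete. A minimal sketch (the printed values show size only; the capacity changes happen internally and aren’t observable without reflection):

```java
import java.util.ArrayList;

public class SizeVsCapacityDemo {
    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }
        System.out.println(list.size()); // 10: the number of stored elements

        // Grow the backing array up front; no resizing until 1,000 elements are added.
        list.ensureCapacity(1_000);
        System.out.println(list.size()); // still 10: capacity changed, size did not

        // Shrink the backing array down to exactly size() slots.
        list.trimToSize();
        System.out.println(list.size()); // still 10
    }
}
```

Note that ensureCapacity and trimToSize are declared on ArrayList itself, not on the List interface, so they are unavailable behind a plain List reference.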

The Initial Capacity of a List

The ArrayList class has a default initial capacity. As of Java 17, this capacity is ten. If we know we’ll have more or fewer elements, it’s often a good idea to set an initial capacity to reduce the number of resizes.

The LinkedList, by contrast, has no concept of capacity. It’s a doubly-linked list: each element points to both its predecessor and successor, so there is no underlying array to resize.

When considering JVM performance, understanding the initial capacity of lists and how they grow is crucial. Setting appropriate initial capacities can reduce the need for list resizing, reduce memory churn, and improve performance.
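A common pattern that follows from this: when the final element count is known up front, pass it to the constructor. A short sketch (the count of 1,000 is an arbitrary example):

```java
import java.util.ArrayList;
import java.util.List;

public class PresizedListDemo {
    public static void main(String[] args) {
        int expected = 1_000;

        // Default capacity (10): the backing array is reallocated and copied
        // several times as the list grows to 1,000 elements.
        List<Integer> grown = new ArrayList<>();
        for (int i = 0; i < expected; i++) grown.add(i);

        // Pre-sized: one allocation, no resizing while filling.
        List<Integer> presized = new ArrayList<>(expected);
        for (int i = 0; i < expected; i++) presized.add(i);

        // The copy constructor sizes the new list to the source exactly.
        List<Integer> copy = new ArrayList<>(grown);

        System.out.println(grown.equals(presized) && presized.equals(copy)); // true
    }
}
```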

Effect on the Code

Let’s run two tests to compare the performance of the code:

Java
 
@Benchmark
@BenchmarkMode(Mode.Throughput)
@Fork(value = 2, jvmArgs = {"-Xlog:gc:file=list-creation-%t.txt,filecount=10,filesize=40g", "-Xmx6g", "-Xms6g"})
public void listCreationBenchmark(HeapDumperState heapDumperState, Blackhole blackhole) {
    final List<Integer> result = new ArrayList<>();
    result.add(1);
    result.add(2);
    result.add(3);
    blackhole.consume(result);
}

@Benchmark
@BenchmarkMode(Mode.Throughput)
@Fork(value = 2, jvmArgs = {"-Xlog:gc:file=limited-capacity-list-creation-%t.txt,filecount=10,filesize=40g", "-Xmx6g", "-Xms6g"})
public void limitedCapacityListCreationBenchmark(HeapDumperState heapDumperState, Blackhole blackhole) {
    final List<Integer> result = new ArrayList<>(3);
    result.add(1);
    result.add(2);
    result.add(3);
    blackhole.consume(result);
}


Note that HeapDumperState is a state object that triggers a heap dump after each iteration, so we get information about the created objects. Each test ran ten ten-minute iterations in two separate forks. The overall duration of each test was around one hour and forty minutes.

Overall, the tests didn’t show a significant difference; if anything, the runs suggest that the first option, with the default capacity, might even be slightly faster:

Benchmark Mode Cnt Score Error Units
OverheadBenchmark.limitedCapacityListCreationBenchmark thrpt 20 116365294.187 ± 4748264.227 ops/s
OverheadBenchmark.listCreationBenchmark thrpt 20 121014905.085 ± 188451.671 ops/s


The time for a single operation is so minuscule that it’s hard to tell the difference in these tests. Please bear in mind that we’re measuring the creation of an ArrayList with an initial capacity of three against one with the default capacity of ten.

Memory Footprint

When optimizing for performance, it’s important to understand the impact of memory allocation decisions on the JVM. We might think that the default capacity of an ArrayList is harmless. However, in some cases, there are implications, not only for large lists or high-frequency list operations but also for lists with a small number of elements.

Let’s look deeper into the garbage collection logs produced by these tests. We’ll be using HeapHero for this analysis. The initial guess would be that the ArrayList test with a default capacity would take more heap space, have more garbage collection cycles, and lower throughput.

The Increase in Memory Use

When initializing an ArrayList with the default constructor (i.e., without specifying a capacity), the list allocates memory for ten elements. If we add only three elements, we use 30% of the allocated memory, leaving 70% unused.

If we initialize the ArrayList with a capacity of three (new ArrayList<>(3)), it will allocate memory just for those three elements. Consequently, there’s less wastage.

This is clearly seen in the difference in average heap sizes. The ArrayList with explicitly declared capacity provided the following results:

Fig 1: Memory allocation for an ArrayList with explicit capacity

At the same time, the ArrayList with default capacity, as we expected, resulted in consuming more memory:

Fig 2: Memory allocation for an ArrayList with default capacity

In essence, using the default capacity while storing fewer elements than that capacity results in unnecessary memory allocation. This difference might seem insignificant for our test case, but imagine creating thousands or millions of such lists in a long-lived application; the memory wastage accumulates.
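The per-list slack is easy to estimate with back-of-the-envelope arithmetic. Assuming compressed oops, each unused slot in the backing Object[] costs 4 bytes (the reference size is an assumption; it varies with JVM settings):

```java
public class SlackEstimate {
    public static void main(String[] args) {
        int defaultCapacity = 10; // ArrayList default as of Java 17
        int elements = 3;
        int refBytes = 4;         // per-slot cost with compressed oops (assumption)

        // Seven empty slots per list.
        int slackPerList = (defaultCapacity - elements) * refBytes;
        System.out.println(slackPerList); // 28 bytes of slack per list

        // Across a million such lists, the slack alone adds up to roughly 28 MB.
        long lists = 1_000_000L;
        System.out.println(lists * slackPerList); // 28000000
    }
}
```

This ignores object headers and the Integer elements themselves; it counts only the unused array slots, which is exactly the waste an explicit capacity of three eliminates.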

The Effect on the Throughput

Due to the larger memory footprint, the JVM has to manage memory more aggressively, with more frequent garbage collection cycles. Let’s compare the KPIs for both cases:

Fig 3: Throughput for an ArrayList with explicit capacity
Fig 4: Throughput for an ArrayList with default capacity
Both of the previous problems lead to lower throughput, as the JVM has to spend more time on memory management. Again, the throughput isn’t dramatically different, but even in this simple test, we can see the gap. One of the main reasons is the object allocation rate, which is significantly higher in the tests that use default-capacity ArrayLists.

Bear in mind that in these tests the objects become unreachable almost instantaneously. On a busy web server, however, this allocation pattern might cause more serious issues.

yCrash Analysis

While the previous metrics identified the issue, they didn’t identify its source. To find it, we can use heap dump analysis to inspect the heap’s state. In particular, we’ll concentrate on the Inefficient Collection section.

This section provides a good overview of the wasted memory due to oversized collections. The main cause of this problem is the difference between the capacity and the size of the collections.

The heap dump was captured before a garbage collection cycle. This way, we could better see the collection objects in our heap:

Fig 5: Inefficient collection information in yCrash

From this, we can see that we waste memory: almost all of our collections take up more space than they need.

Conclusion

Constant monitoring and analysis of an application are crucial to keeping it healthy and performant. Issues like this are hard to spot with occasional heap dumps and garbage collection logs, which is why it’s important to have a system that analyzes the application continuously. A tool like yCrash can help with this monitoring, which not only produces a better user experience but can also give a service a competitive advantage in the market.


Published at DZone with permission of Eugene Kovko. See the original article here.

Opinions expressed by DZone contributors are their own.
