JDK 17 Memory Bloat in Containers: A Post-Mortem

Upgrading from JDK 8 to JDK 17 spiked container memory from ~50% to 100% due to excessive JVM threads, glibc malloc arenas, and G1GC native allocation.

By Saumya Tyagi · Dec. 02, 25 · Analysis

When engineering teams modernize Java applications, the shift from JDK 8 to a newer Long-Term Support (LTS) release, such as JDK 11, 17, or 21, might seem straightforward at first. Because Java maintains backward compatibility, it's easy to assume that runtime behavior will remain largely unchanged. That assumption is far from reality.

In 2025, our team completed a major modernization initiative to migrate all of our Java microservices from JDK 8 to JDK 17. The development and QA phases went smoothly, with no major issues arising. But within hours of deploying to production, we faced a complete system breakdown.

Memory usage, which had been stable for years, climbed until containers hit their limits, with native memory growing roughly fourfold. Containers that had previously operated without issue began to restart repeatedly. Our service level agreements (SLAs) degraded, and incident severity levels escalated. This prompted a multi-day diagnostic effort involving several teams, including platform experts, Java Virtual Machine (JVM) specialists, and service owners.

This post-mortem will cover the following:

  • Key differences between JDK 8 and JDK 17
  • How containerized environments amplify hidden JVM behaviors
  • The distinctions between native memory and heap memory
  • The reasons behind thread proliferation and its impact on memory
  • The specific commands, flags, and environment variables that resolved our issues
  • A validated checklist for anyone upgrading to JDK 17 (or 21)

The problems we faced were subtle and nearly invisible to standard Java monitoring tools. However, the lessons we learned reshaped our approach to upgrading JVM versions and transformed our understanding of memory usage in containerized environments.

The Incident

We deployed the JDK 17 version of our primary service to Kubernetes. The rollout was smooth, health checks turned out green, request latencies remained stable, and the logs showed no errors.

However, 2–3 hours later, our dashboards began lighting up.

Symptoms Observed

Metric                JDK 8 (Before)       JDK 17 (After)
Memory usage          ~50% of container    95–100% (frequent OOMKills)
Thread count          ~400                 1600+ threads
Total native memory   ~800 MB              3.4–3.6 GB
Container restarts    None                 Multiple/hour
GC behavior           Stable               G1GC overhead spikes


Services that had been stable for years suddenly began to fail unpredictably.

The Challenge: Heap Monitoring Misled Us

Every Java engineer knows to keep an eye on heap usage. Ours looked perfectly fine, holding steady within the configured -Xmx. It was native memory that was surging.

Native memory includes:

  • Thread stacks
  • glibc malloc arenas
  • Garbage collector (GC) auxiliary structures
  • JIT compiler buffers
  • Metaspace, Code Cache
  • NIO buffers
  • Internal JVM C++ structures

None of this is visible in heap dumps, and standard heap-focused Java monitoring does not capture it. Yet it is exactly what got our containers OOMKilled.
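
To make the heap-versus-native gap concrete, here is a minimal sketch that compares the JVM's committed heap with the process resident set size. It assumes a Linux host where /proc/self/status is readable; the class name HeapVersusRss and the "outside heap" subtraction are illustrative only, and the difference is approximate because not every committed heap page is necessarily resident.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class HeapVersusRss {

    /** Extracts the VmRSS value (in kB) from /proc/<pid>/status lines, or -1 if absent. */
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Line shape: "VmRSS:     123456 kB"
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long heapCommittedMb = heap.getCommitted() / (1024 * 1024);

        // Assumption: Linux procfs is available; elsewhere we only print the heap figure.
        Path status = Path.of("/proc/self/status");
        long rssKb = Files.isReadable(status) ? parseVmRssKb(Files.readAllLines(status)) : -1;

        System.out.println("Heap committed: " + heapCommittedMb + " MB");
        if (rssKb >= 0) {
            long rssMb = rssKb / 1024;
            System.out.println("Process RSS:    " + rssMb + " MB");
            System.out.println("Outside heap:   " + (rssMb - heapCommittedMb) + " MB (approximate)");
        } else {
            System.out.println("Process RSS:    unavailable on this platform");
        }
    }
}
```

A large and growing "outside heap" figure alongside a flat heap is the pattern described above.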

Root Cause Analysis

After three days of analysis (reviewing heap data, using Native Memory Tracking via jcmd VM.native_memory, sampling thread dumps, examining GC logs, and inspecting container cgroups), we found three independent JVM behaviors that, amplified by the container environment, combined into a “perfect memory storm.”

Root Cause #1: Thread Proliferation Due to CPU Mis-Detection

What Happened

Runtime.availableProcessors() drives much of the JVM's internal sizing. On the builds we ran (17.0.5 and later), the JVM did not honor the container's cgroup CPU limit and reported the host's physical CPU count instead.

Example:

Plain Text
 
Container CPU limit: 2 vCPUs
Host machine CPUs:   96
JVM detected:        96 CPUs ❌


This miscalculation caused various parts of the JVM to scale thread creation based on the inflated CPU count, including:

  • GC worker threads
  • JIT compiler threads
  • ForkJoin common pool
  • JVMTI threads
  • Async logging threads

So instead of:

Plain Text
 
~50–80 JVM system threads


the JVM spawned:

Plain Text
 
300–400+ threads


When factoring in application threads (async tasks, thread pools, I/O threads), the total count shot to:

Plain Text
 
1600+ threads
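
A quick way to see what the JVM believes about its CPU budget is to print it from inside the container. The sketch below is illustrative (the class name CpuAndThreadReport is made up); it reports the detected CPU count, the ForkJoin common pool parallelism derived from it, and the live Java thread count. Note that ThreadMXBean counts Java threads only, so JVM-internal threads such as GC workers appear in OS-level thread counts but not here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ForkJoinPool;

final class CpuAndThreadReport {

    /** Builds a one-line summary of the JVM's CPU view and Java thread counts. */
    static String report() {
        int cpus = Runtime.getRuntime().availableProcessors();
        int commonPool = ForkJoinPool.getCommonPoolParallelism();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return "availableProcessors=" + cpus
                + " commonPoolParallelism=" + commonPool
                + " liveThreads=" + threads.getThreadCount()
                + " peakThreads=" + threads.getPeakThreadCount();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

If availableProcessors matches the host rather than the container limit, every pool sized from it inherits the inflated value.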


Why Threads Matter for Memory

Every thread reserves a native stack, sized by -Xss (the HotSpot default is 1 MB on Linux x86-64 and 2 MB on Linux AArch64). At 2 MB per thread:

Plain Text
 
1600 threads × 2 MB = ~3.2 GB native stack memory
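
The same arithmetic as a small helper, so the estimate can be rerun for other thread counts or -Xss values. This is a sketch using the article's 2 MB per-thread figure; the class and method names are hypothetical, and the result is reserved address space, not resident memory.

```java
final class StackReservation {

    /** Upper bound on stack address space reserved for the given number of threads. */
    static long reservedStackMb(int threadCount, int stackSizeMb) {
        return (long) threadCount * stackSizeMb;
    }

    public static void main(String[] args) {
        // Thread counts from the symptoms table; 2 MB is the per-thread figure used above.
        System.out.println("Before: " + reservedStackMb(400, 2) + " MB");
        System.out.println("After:  " + reservedStackMb(1600, 2) + " MB");
    }
}
```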


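Even when those threads sit idle, their stacks stay reserved. Only the pages a thread actually touches count toward resident memory, which is why the resident thread cost in the combined model further down is far smaller than this reserved figure. Even so, the thread bloat ate into the headroom of our container.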

Root Cause #2: glibc malloc Arena Fragmentation

The thread explosion made things much worse. glibc manages native allocations through malloc arenas, which it creates on demand as threads contend for them. On 64-bit systems, the default cap is:

Plain Text
 
8 × CPU_COUNT arenas


glibc counts CPUs itself rather than asking the JVM, and inside the container it also saw all 96 host CPUs, so the cap became:

Plain Text
 
8 × 96 = 768 arenas


A typical arena can consume 10 to 30 MB, depending on fragmentation patterns. Even when arenas are sparsely used, they still occupy virtual memory and contribute to Resident Set Size (RSS). In our case, this resulted in:

Plain Text
 
~1.5–2.0 GB consumed by glibc arenas


This was invisible to Java monitoring tools and heap analysis.

Root Cause #3: G1GC Native Memory Overhead (800–1000 MB Higher)

Another factor is the garbage collector itself. The Garbage-First Garbage Collector (G1GC) has been the default since JDK 9, whereas JDK 8 defaulted to ParallelGC, so an upgrade without explicit GC flags changes the collector. G1GC is known for using significantly more native memory:

Component                       Approx. Native Memory
Remembered sets                 300–400 MB
Card tables                     100–200 MB
Region metadata                 200 MB
Marking bitmaps                 150+ MB
Concurrent refinement buffers   100 MB


Total for G1GC:

Plain Text
 
~800–1000 MB native memory


ParallelGC in JDK 8:

Plain Text
 
~150–200 MB


Difference:

Plain Text
 
+650–800 MB


This put us well beyond our container’s 4 GB memory limit.

Combined Memory Explosion Model

Let's look at the combined impact of the three root causes:

Under JDK 8 (~2.8 GB Total)

Plain Text
 
Heap:              2048 MB
Metaspace:          200 MB
Code Cache:         240 MB
Threads:             80 MB
Native GC:          150 MB
Other native:       100 MB
----------------------------------
Total:             ~2.8 GB


Under JDK 17 (~5.4 GB Total)

Plain Text
 
Heap:              2048 MB
Metaspace:          250 MB
Code Cache:         240 MB
Threads:            200 MB
G1GC:              1000 MB
glibc arenas:      1500 MB
Other native:       150 MB
----------------------------------
Total:             ~5.4 GB ❌


This puts us 1.4 GB over the container limit. No amount of heap tuning could have fixed this, because the heap itself was not the underlying problem.
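
The two budgets can be checked by summing the components. The sketch below uses only the figures from the tables above; the class name MemoryBudget is illustrative, and the 4000 MB limit is an assumption that mirrors the article's rounding of 4 GB.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MemoryBudget {

    /** Sums per-component memory figures, all in MB. */
    static long totalMb(Map<String, Long> componentsMb) {
        long total = 0;
        for (long mb : componentsMb.values()) {
            total += mb;
        }
        return total;
    }

    static Map<String, Long> jdk8Model() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("Heap", 2048L);
        m.put("Metaspace", 200L);
        m.put("Code Cache", 240L);
        m.put("Threads", 80L);
        m.put("Native GC", 150L);
        m.put("Other native", 100L);
        return m;
    }

    static Map<String, Long> jdk17Model() {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put("Heap", 2048L);
        m.put("Metaspace", 250L);
        m.put("Code Cache", 240L);
        m.put("Threads", 200L);
        m.put("G1GC", 1000L);
        m.put("glibc arenas", 1500L);
        m.put("Other native", 150L);
        return m;
    }

    public static void main(String[] args) {
        long limitMb = 4000; // assumption: 4 GB rounded to 4000 MB, as in the text
        long jdk8 = totalMb(jdk8Model());
        long jdk17 = totalMb(jdk17Model());
        System.out.println("JDK 8 total:  " + jdk8 + " MB");
        System.out.println("JDK 17 total: " + jdk17 + " MB");
        System.out.println("Over limit:   " + Math.max(0, jdk17 - limitMb) + " MB");
    }
}
```

Keeping a model like this per service makes it obvious which component moved when the total changes.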

The Fix: A Three-Part Solution

Fix #1: Explicitly Set CPU Count

Plain Text
 
-XX:ActiveProcessorCount=2


This is the most important setting for containerized Java on JDK 11 and above. It prevents the JVM from scaling threads based on the CPU count of the node.
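
One way to catch a mismatch early is a startup check that compares the JVM's CPU count with the cgroup quota. The following is a sketch under stated assumptions: it reads the cgroup v2 file /sys/fs/cgroup/cpu.max (cgroup v1 hosts use different files and are not handled), and the class name CpuLimitCheck is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;

final class CpuLimitCheck {

    /** Parses cgroup v2 cpu.max content: "<quota> <period>", or "max <period>" for no limit. */
    static OptionalInt cpuLimit(String cpuMax) {
        String[] parts = cpuMax.trim().split("\\s+");
        if (parts.length != 2 || parts[0].equals("max")) {
            return OptionalInt.empty();
        }
        long quota = Long.parseLong(parts[0]);
        long period = Long.parseLong(parts[1]);
        // Round up: a quota of 1.5 CPUs can still keep two threads busy.
        return OptionalInt.of((int) ((quota + period - 1) / period));
    }

    public static void main(String[] args) throws IOException {
        int jvmCpus = Runtime.getRuntime().availableProcessors();
        Path cpuMax = Path.of("/sys/fs/cgroup/cpu.max"); // assumption: cgroup v2 mount point
        if (!Files.isReadable(cpuMax)) {
            System.out.println("cpu.max not readable; JVM reports " + jvmCpus + " CPUs");
            return;
        }
        OptionalInt limit = cpuLimit(Files.readString(cpuMax));
        if (limit.isPresent() && jvmCpus > limit.getAsInt()) {
            System.out.println("WARNING: JVM sees " + jvmCpus
                    + " CPUs but the cgroup limit is " + limit.getAsInt());
        } else {
            System.out.println("JVM reports " + jvmCpus + " CPUs");
        }
    }
}
```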

Fix #2: Limit glibc Malloc Arenas

Set the environment variable:

Plain Text
 
export MALLOC_ARENA_MAX=2


This reduced native arena overhead from approximately 1.5 GB to below 200 MB. If you're dealing with very tight memory constraints, consider using:

Plain Text
 
export MALLOC_ARENA_MAX=1
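
Because the variable must be set in the process environment before the JVM starts, it is easy to lose in a Dockerfile or deployment manifest change. A small startup check, sketched below with a hypothetical class name, can log whether the setting actually reached the process.

```java
import java.util.Map;
import java.util.OptionalInt;

final class ArenaSettingCheck {

    /** Returns the configured arena cap, or empty if unset or not a positive integer. */
    static OptionalInt arenaMax(Map<String, String> env) {
        String raw = env.get("MALLOC_ARENA_MAX");
        if (raw == null) {
            return OptionalInt.empty();
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? OptionalInt.of(value) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt cap = arenaMax(System.getenv());
        if (cap.isPresent()) {
            System.out.println("MALLOC_ARENA_MAX=" + cap.getAsInt());
        } else {
            System.out.println("WARNING: MALLOC_ARENA_MAX is not set; glibc will use its default arena cap");
        }
    }
}
```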


Fix #3: Tune or Replace G1GC

You have two options here:

  1. Keep G1GC, but tune it, or
  2. Switch to ParallelGC, particularly for memory-sensitive workloads.

ParallelGC carries far less native bookkeeping than G1GC. (SerialGC is leaner still, but it is single-threaded.)

Our tuning:

Plain Text
 
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=16m


After implementing these fixes, we observed that memory usage stabilized in the range of 65% to 70%.
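
Since the default collector differs between JDK releases, it helps to confirm which one a running JVM actually selected. The sketch below classifies the collector from the GarbageCollectorMXBean names; the class name ActiveCollector and the three-way classification are illustrative simplifications, and collectors other than G1GC and ParallelGC are simply reported as "other".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollector {

    /** Maps GC MXBean names to a collector family based on their name prefixes. */
    static String classify(List<String> gcBeanNames) {
        for (String name : gcBeanNames) {
            if (name.startsWith("G1 ")) {
                return "G1GC";          // e.g. "G1 Young Generation"
            }
            if (name.startsWith("PS ")) {
                return "ParallelGC";    // e.g. "PS Scavenge", "PS MarkSweep"
            }
        }
        return "other";
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        System.out.println("GC beans: " + names);
        System.out.println("Collector: " + classify(names));
    }
}
```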

Additional Detection and Observability Improvements

The biggest operational takeaway is clear: relying solely on heap monitoring is not enough. JVM upgrades also require native memory monitoring.

Here's what we've implemented:

Native Memory Tracking (NMT)

We enabled NMT with the JVM flag:

Plain Text
 
-XX:NativeMemoryTracking=summary


From there, we used:

Plain Text
 
jcmd <pid> VM.native_memory summary


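This gave us a detailed breakdown of JVM-tracked native memory by category (Thread, GC, Compiler, Code, Class, and so on). NMT only accounts for memory the JVM allocates itself; glibc arena overhead is not itemized and shows up as the gap between the NMT total and the process RSS.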
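
The same summary can also be pulled from inside the process, which is handy for exposing it on an admin endpoint. This is a sketch, assuming a HotSpot JVM that registers the DiagnosticCommand MBean; the class name NmtSummary is hypothetical. If the JVM was started without -XX:NativeMemoryTracking, the returned text says that tracking is not enabled.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class NmtSummary {

    /** Invokes the in-process equivalent of "jcmd <pid> VM.native_memory summary". */
    static String nativeMemorySummary() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.sun.management:type=DiagnosticCommand");
            Object result = server.invoke(
                    name,
                    "vmNativeMemory",
                    new Object[] { new String[] { "summary" } },
                    new String[] { String[].class.getName() });
            return String.valueOf(result);
        } catch (Exception e) {
            // Non-HotSpot JVMs or restricted environments may not expose this MBean.
            return "NMT summary unavailable: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeMemorySummary());
    }
}
```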

Thread Count Alerts

We established the following:

  • Baseline thread counts per service
  • Alerts for any increase exceeding 50% 
  • Dashboards showing thread growth patterns

A rising thread count is often an early signal of native memory growth.
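
The 50% rule is simple enough to express directly. The sketch below is illustrative only: the class name, the default baseline of 400 (taken from the symptoms table), and reading the baseline from the first command-line argument are all assumptions, and a real alert would live in the monitoring system rather than in application code.

```java
import java.lang.management.ManagementFactory;

final class ThreadGrowthAlert {

    /** True when current exceeds baseline by more than allowedGrowth (0.5 means 50%). */
    static boolean exceedsBaseline(int current, int baseline, double allowedGrowth) {
        return current > baseline * (1.0 + allowedGrowth);
    }

    public static void main(String[] args) {
        int baseline = args.length > 0 ? Integer.parseInt(args[0]) : 400;
        int current = ManagementFactory.getThreadMXBean().getThreadCount();
        if (exceedsBaseline(current, baseline, 0.5)) {
            System.out.println("ALERT: " + current + " threads vs baseline " + baseline);
        } else {
            System.out.println("OK: " + current + " threads vs baseline " + baseline);
        }
    }
}
```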

Monitoring Container-Level Memory Metrics

We shifted our focus to monitoring container-level memory instead of pod-level memory, which aggregates data from multiple containers:

Plain Text
 
container_memory_working_set_bytes


By concentrating on container-level metrics, we were able to identify memory overshoots sooner and with greater accuracy.
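
For a view from inside the container, the cgroup files can be read directly. This sketch assumes cgroup v2 paths (/sys/fs/cgroup/memory.current and memory.max) and uses a hypothetical class name. It is not identical to container_memory_working_set_bytes, which subtracts inactive file cache from usage, so treat it as a rough cross-check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ContainerMemory {

    /** Parses memory.max content; returns -1 when the cgroup has no limit ("max"). */
    static long parseLimit(String memoryMax) {
        String value = memoryMax.trim();
        return value.equals("max") ? -1 : Long.parseLong(value);
    }

    static double usagePercent(long usedBytes, long limitBytes) {
        return 100.0 * usedBytes / limitBytes;
    }

    public static void main(String[] args) throws IOException {
        Path current = Path.of("/sys/fs/cgroup/memory.current"); // assumption: cgroup v2
        Path max = Path.of("/sys/fs/cgroup/memory.max");
        if (!Files.isReadable(current) || !Files.isReadable(max)) {
            System.out.println("cgroup v2 memory files not readable");
            return;
        }
        long used = Long.parseLong(Files.readString(current).trim());
        long limit = parseLimit(Files.readString(max));
        if (limit < 0) {
            System.out.println("Used " + used + " bytes, no memory limit set");
        } else {
            System.out.printf("Used %.1f%% of container limit%n", usagePercent(used, limit));
        }
    }
}
```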

How We Reproduced the Issue Locally

To validate that the issue was inherent to JDK 17, we set up a local environment that mirrored the original setup.

Step 1: Run the Application in Docker

Plain Text
 
docker run \
  --cpus=2 \
  --memory=4g \
  -e MALLOC_ARENA_MAX=2 \
  myservice:java17


Step 2: Inspect CPU Detection

Plain Text
 
docker exec -it <container> bash
java -Xlog:os+container=trace -version | grep -i active_processor_count


What We Found

Before the fix:

Plain Text
 
active_processor_count = 96


After the fix:

Plain Text
 
active_processor_count = 2


Step 3: Inspect Native Memory

Plain Text
 
jcmd <pid> VM.native_memory summary


The arena counts correlated exactly with the detected CPU count.

Why This Problem Is Becoming More Common

A number of companies migrating from Java 8 to Java 17 (or 21) are encountering similar challenges. The reasons for this include:

  1. Containerization exposes previously hidden JVM behaviors.
  2. Local development machines typically have plenty of RAM and CPU power, unlike Kubernetes containers.
  3. G1GC has been the default garbage collector since JDK 9, and its native overhead is greater than that of ParallelGC.
  4. Many servers are equipped with 64 to 128 CPUs, and JVM thread scaling explodes if mis-detected.
  5. Native memory usage in Java applications is rarely monitored, even in large organizations.
  6. The behavior of glibc malloc arenas is poorly understood outside the realm of low-level systems engineering.

This combination of factors creates a “trap” in which a JVM upgrade can pass every QA test and still fail within hours of reaching production.

What We Would Do Differently Next Time

JVM Version Soak Testing

Moving forward, we will implement the following requirements:

  • A 48-hour load soak
  • A 24-hour canary production soak
  • Monitoring of thread counts
  • Oversight of native memory 
  • Analysis of GC behavior logs

We've learned that a functional test suite alone is not sufficient.

JVM Upgrade Runbooks

We have developed a runbook that includes:

  • Required flags for containers
  • Required environment variables (MALLOC_ARENA_MAX)
  • Monitoring dashboards to check before promotion
  • A rollback decision tree

Rigorous Baseline Establishment

For each service, we will establish baselines for:

  • Heap usage 
  • Native memory 
  • Thread counts
  • GC overhead

Once these baselines are defined, comparing JDK 8 to JDK 17 will become straightforward.

Upgrade Checklist

Pre-Upgrade Steps

  • Set -XX:ActiveProcessorCount explicitly
  • Set MALLOC_ARENA_MAX=1 or 2
  • Choose your garbage collection method: G1GC or ParallelGC
  • Enable Native Memory Tracking
  • Establish memory baselines for both heap and native memory
  • Take note of thread count baselines
  • Enable container-level memory metrics
  • Conduct soak tests for 24 to 48 hours
  • Monitor and validate GC pause times while under load

Post-Deployment Actions

  • Observe thread counts for 2 to 6 hours
  • Compare native memory usage against your baseline
  • Check and validate arena counts
  • Ensure CPU detection is accurate
  • Roll back immediately if native memory rises more than 10–15% beyond the baseline

Conclusion

The upgrade to JDK 17 served as one of the most instructive incidents our team has encountered.
It highlighted several crucial points:

  • Native memory dominates JVM behavior in containers
  • CPU detection bugs can silently cripple services
  • GC changes between JDK releases can add 500+ MB of overhead
  • glibc malloc arenas can expand due to excessive thread proliferation
  • Monitoring heuristics from JDK 8 become less reliable when transitioning to JDK 17
  • Upgrading the JVM must be treated with the same caution as a major infrastructure overhaul, rather than simply a minor version update

The good news?

After applying the recommended fixes, our services now operate more efficiently on JDK 17 than they ever did on JDK 8. We're seeing improved GC throughput, reduced pause times, and improved overall performance.

However, this experience serves as a critical reminder:

Modern Java is fast and powerful, but only when configured with an understanding of how the JVM interacts with container runtimes, native memory systems, and Linux allocators.

If you are planning a JDK 17 upgrade, use this guide, validate your assumptions, and closely monitor native memory alongside heap memory.
