UseStringDeduplication: Pros and Cons

Why are there so many duplicate strings? And how can I fix this in my Java applications?

By Ram Lakshmanan
Updated Jul. 09, 19 · Presentation

Let me start this article with a few interesting statistics (based on research conducted by the JDK development team):

  • 25 percent of a Java application's memory is filled with strings.
  • 13.5 percent of memory is occupied by duplicate strings.
  • The average string length is 45 characters.

Yes, you read that right: 13.5 percent of memory is wasted on duplicate strings. That figure is an average across Java applications, so your application may waste more or less. To find out how much memory your application wastes on duplicate strings, you can use a tool such as HeapHero, which reports the memory wasted by duplicate strings and other inefficient programming practices.

What Are Duplicate Strings?

First, let’s understand what a duplicate string is. Look at the code snippet below:

String string1 = new String("Hello World");
String string2 = new String("Hello World");


In the code above, there are two string objects: string1 and string2. They have the same contents, i.e. “Hello World,” but they are stored in two different objects. Calling string1.equals(string2) returns ‘true’, but string1 == string2 evaluates to ‘false’. This is what we call duplicate strings.
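The distinction between equal contents and identical objects can be checked directly. The following is a minimal sketch; the class and method names (DuplicateStringDemo, sameContents, sameObject) are made up for illustration and are not part of any library.

```java
class DuplicateStringDemo {
    // True when both references hold the same sequence of characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    // True only when both references point to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        // new String(...) always allocates a fresh object, even for identical text.
        String string1 = new String("Hello World");
        String string2 = new String("Hello World");

        System.out.println(sameContents(string1, string2)); // true
        System.out.println(sameObject(string1, string2));   // false
    }
}
```

Two objects with equal contents but different identities are exactly the duplicates that the rest of this article is concerned with.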

Why Are There So Many Duplicate Strings?

There are several reasons why an application ends up having a lot of duplicate strings. In this section, let's review the two most common patterns:

# 1. Developers create new string objects for every request instead of referencing a shared ‘public static final’ string constant. The example below shows the optimal version, written using the string literal pattern:

public static final String HELLO_WORLD = "Hello World";

String string1 = HELLO_WORLD;
String string2 = HELLO_WORLD;


# 2. Suppose you are building a banking or e-commerce application, and you store the currency (i.e. ‘USD’, ‘EUR’, ‘INR’, ….) of every transaction record in the database. A customer logs in to your application and opens the transaction history page, so your application reads all of that customer's transactions from the database. Suppose this customer lives in the US, so most, if not all, of the transactions are in USD. Since every transaction record carries a currency, your application creates a new ‘USD’ string object for every record it reads. If this customer has thousands of transactions, you end up with thousands of duplicate ‘USD’ string objects in memory for this one customer alone.

Similarly, your application may read many columns (customer name, address, state, country, account number, IDs,…..) from the database many times over, and there can be duplicates among them. Your application may also exchange XML/JSON with external applications and manipulate a lot of strings along the way. All these operations can, and often will, create duplicate strings.
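The currency scenario above can be simulated without a database. This is only a sketch under stated assumptions: the Transaction class and the loadTransactions method are hypothetical stand-ins for a real data access layer, and the explicit string allocation imitates a driver that builds a fresh string for every column value it reads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class CurrencyDuplicatesDemo {
    // Hypothetical stand-in for one row of a transactions table.
    static final class Transaction {
        final long id;
        final String currency;

        Transaction(long id, String currency) {
            this.id = id;
            this.currency = currency;
        }
    }

    // Imitates reading rows from a database: every row gets its own String object.
    static List<Transaction> loadTransactions(int count) {
        List<Transaction> transactions = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String currency = new String(new char[] {'U', 'S', 'D'});
            transactions.add(new Transaction(i, currency));
        }
        return transactions;
    }

    // Counts String objects by identity, ignoring their contents.
    static int distinctCurrencyObjects(List<Transaction> transactions) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (Transaction t : transactions) {
            identities.add(t.currency);
        }
        return identities.size();
    }

    // Counts distinct currency values by contents.
    static int distinctCurrencyValues(List<Transaction> transactions) {
        Set<String> values = new HashSet<>();
        for (Transaction t : transactions) {
            values.add(t.currency);
        }
        return values.size();
    }

    public static void main(String[] args) {
        List<Transaction> transactions = loadTransactions(1000);
        System.out.println("String objects:  " + distinctCurrencyObjects(transactions)); // 1000
        System.out.println("Distinct values: " + distinctCurrencyValues(transactions));  // 1
    }
}
```

A thousand objects for a single distinct value means 999 of them are duplicates.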

The JDK team has recognized this problem since Java's origin (back in the mid-1990s) and has come up with multiple solutions over the years. The latest addition to this list of solutions is ‘-XX:+UseStringDeduplication’.

-XX:+UseStringDeduplication

The least-effort way to eliminate duplicate strings is to pass the -XX:+UseStringDeduplication JVM argument. When you pass this argument at application startup, the JVM tries to eliminate duplicate strings as part of the garbage collection process. During garbage collection, the JVM inspects the objects in memory anyway, so as part of that process it tries to identify duplicate strings among them and eliminate them.

Does that mean that just by passing the ‘-XX:+UseStringDeduplication’ JVM argument you will immediately save 13.5% of memory? Sounds pretty easy, right? We wish it were that easy, but there are some catches to the -XX:+UseStringDeduplication solution. Let’s discuss them.

(1). Works Only With the G1 GC Algorithm

There are several garbage collection algorithms (Serial, Parallel, CMS, G1,…). At the time of writing, -XX:+UseStringDeduplication works only if you are using the G1 GC algorithm. So, if you are using some other GC algorithm, you need to switch to the G1 GC algorithm to use -XX:+UseStringDeduplication.
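Both preconditions can be checked from inside a running application through the standard java.lang.management API. This is a rough sketch: the class name DeduplicationPreconditions and its helper methods are made up for illustration, the G1 check relies on HotSpot naming its G1 collector beans with a "G1" prefix (for example "G1 Young Generation"), and the flag check only sees arguments that the runtime reports as JVM input arguments.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

class DeduplicationPreconditions {
    // Assumption: HotSpot names its G1 collector beans with a "G1" prefix.
    static boolean isG1Name(String collectorName) {
        return collectorName.startsWith("G1");
    }

    // True if any registered garbage collector bean looks like a G1 collector.
    static boolean usesG1() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (isG1Name(gc.getName())) {
                return true;
            }
        }
        return false;
    }

    // True if the deduplication flag appears among the given JVM arguments.
    static boolean flagPassed(List<String> jvmArguments) {
        return jvmArguments.contains("-XX:+UseStringDeduplication");
    }

    public static void main(String[] args) {
        List<String> jvmArguments = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("G1 in use:   " + usesG1());
        System.out.println("Flag passed: " + flagPassed(jvmArguments));
    }
}
```

Running it with `java -XX:+UseG1GC -XX:+UseStringDeduplication DeduplicationPreconditions` on a HotSpot JVM should report both preconditions as met.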

(2). Works Only on Long-Lived Objects

-XX:+UseStringDeduplication eliminates duplicates only among strings that live for a longer period of time. It does not eliminate duplicates among short-lived string objects. Short-lived objects are going to die soon anyway, so there is little point in spending resources to deduplicate them. A real-life case study conducted on a major Java web application didn’t show any memory relief when -XX:+UseStringDeduplication was used. However, -XX:+UseStringDeduplication can be of value if your application has a lot of caches, since cached objects typically tend to be long-lived.

(3). -XX:StringDeduplicationAgeThreshold

By default, strings become eligible for deduplication once they have survived three GC runs. This threshold can be changed by passing the -XX:StringDeduplicationAgeThreshold argument.

Example:
-XX:StringDeduplicationAgeThreshold=6


(4). Impact on GC Pause Times

Since string deduplication is performed during garbage collection, it has the potential to affect GC pause time. However, the assumption is that a high enough deduplication success rate will balance out most or all of this impact, because deduplication can reduce the amount of work needed in other phases of a GC pause (for example, fewer objects to evacuate) and can reduce GC frequency (due to lower pressure on the heap). To analyze the impact on GC pause time, you may consider using tools like GCeasy.

(5). Only Underlying char[ ] Is Replaced

In Java 8, the java.lang.String class has two instance fields:

private final char[] value;
private int hash;


-XX:+UseStringDeduplication doesn’t eliminate the duplicate string object itself. It only replaces the underlying char[ ]. Deduplicating a string object is conceptually just a re-assignment of the value field, i.e. aString.value = anotherString.value.
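The re-assignment can be modelled in plain Java. This is a conceptual sketch only: the JVM performs deduplication internally during garbage collection, and ToyString, DeduplicationSketch, and the lookup table below are hypothetical names invented for this illustration, not JVM internals.

```java
import java.util.HashMap;
import java.util.Map;

class DeduplicationSketch {
    // Toy stand-in for java.lang.String: an object that refers to a character array.
    static final class ToyString {
        char[] value; // deliberately not final, so the sketch can re-point it

        ToyString(String contents) {
            this.value = contents.toCharArray();
        }
    }

    // Remembers the first array seen for each distinct contents.
    private final Map<String, char[]> canonicalArrays = new HashMap<>();

    // Re-points s.value at the canonical array for its contents.
    // Returns true only when a duplicate array was actually replaced.
    boolean deduplicate(ToyString s) {
        String contents = new String(s.value);
        char[] canonical = canonicalArrays.putIfAbsent(contents, s.value);
        if (canonical == null || canonical == s.value) {
            return false; // first occurrence, or already sharing the canonical array
        }
        s.value = canonical; // the duplicate array becomes garbage
        return true;
    }

    public static void main(String[] args) {
        DeduplicationSketch table = new DeduplicationSketch();
        ToyString a = new ToyString("USD");
        ToyString b = new ToyString("USD");

        table.deduplicate(a);
        table.deduplicate(b);

        System.out.println(a == b);             // false: still two objects
        System.out.println(a.value == b.value); // true: one shared array
    }
}
```

After deduplication there are still two ToyString objects, but only one character array between them.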

Each string object takes at least 24 bytes (the exact size depends on the JVM configuration, but 24 bytes is the minimum), and that part is never reclaimed by deduplication. Thus, this feature saves proportionally less memory if the duplicates are mostly short strings.
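A back-of-the-envelope calculation shows why short strings benefit less. The sketch below takes the 24-byte string object size from the text above and, as assumptions, uses a 16-byte array header, 2 bytes per char, and 8-byte object alignment (typical of a 64-bit HotSpot JVM with compressed references on Java 8). Actual sizes depend on the JVM version and configuration, and the class name is invented for this example.

```java
class DeduplicationSavingsEstimate {
    static final int STRING_OBJECT_BYTES = 24; // minimum quoted in the article
    static final int ARRAY_HEADER_BYTES = 16;  // assumption: 64-bit JVM, compressed references
    static final int BYTES_PER_CHAR = 2;       // char[] backing array, as in Java 8
    static final int ALIGNMENT_BYTES = 8;      // assumption: default object alignment

    // Size of the backing char[] for a string of the given length, rounded up to the alignment.
    static long charArrayBytes(int length) {
        long raw = ARRAY_HEADER_BYTES + (long) BYTES_PER_CHAR * length;
        return (raw + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    // Total footprint of one string: the String object plus its own char[].
    static long bytesPerString(int length) {
        return STRING_OBJECT_BYTES + charArrayBytes(length);
    }

    // Share of a duplicate's footprint reclaimed when only its char[] is shared (whole percent).
    static long percentSaved(int length) {
        return 100 * charArrayBytes(length) / bytesPerString(length);
    }

    public static void main(String[] args) {
        System.out.println("'USD' (3 chars): " + percentSaved(3) + "% saved per duplicate");
        System.out.println("45 chars:        " + percentSaved(45) + "% saved per duplicate");
    }
}
```

Under these assumptions, a duplicate of a 3-character string gives back about half of its footprint, while a duplicate of an average 45-character string gives back roughly four fifths.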

(6). Java 8 Update 20

The -XX:+UseStringDeduplication feature is supported only from Java 8 update 20 onward. If you are running an older version of Java, you will not be able to use this feature.

(7). -XX:+PrintStringDeduplicationStatistics

If you would like to see string deduplication statistics, such as how long deduplication took, how many duplicate strings were eliminated, and how much memory was saved, you may pass the -XX:+PrintStringDeduplicationStatistics JVM argument. The statistics will be printed in the error console.

Conclusion

If your application is using G1 GC and running on Java 8 update 20 or later, you may consider enabling -XX:+UseStringDeduplication. You might get fruitful results, especially if there are a lot of duplicate strings among long-lived objects. However, do thorough testing before enabling this argument in the production environment.


Published at DZone with permission of Ram Lakshmanan, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
