String Memory Internals

By Tomasz Nurkiewicz · Aug. 08, 12 · Tutorial
This article is based on my answer on StackOverflow. I try to explain how the String class stores text and how interning and the constant pool work.

The main point to understand here is the distinction between a String Java object and its contents: the char[] held in the private value field. String is basically a wrapper around a char[] array, encapsulating it and making it impossible to modify so that the String can remain immutable. The String class also remembers which part of this array is actually used (see below). All this means that you can have two different (and quite lightweight) String objects pointing to the same char[].

I will show you a few examples, each with the identity hash code of the String (System.identityHashCode()) and the hashCode() of its internal char[] value field (I will call it text to distinguish it from string). Finally, I'll show the javap -c -verbose output, together with the constant pool of my test class. Please do not confuse the class constant pool with the string literal pool; they are not quite the same. See also Understanding javap's output for the Constant Pool.

Prerequisites

For testing purposes I wrote a utility method that breaks String encapsulation:

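import java.lang.reflect.Field;

// Returns the hash code of the private char[] behind the given String
private int showInternalCharArrayHashCode(String s)
        throws NoSuchFieldException, IllegalAccessException {
    final Field value = String.class.getDeclaredField("value");
    value.setAccessible(true);  // bypass the private modifier
    return value.get(s).hashCode();
}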

It returns the hashCode() of the char[] value. Because arrays do not override hashCode(), comparing the results effectively tells us whether two String objects point to the same char[] text or not.

Two string literals in a class

Let's start from the simplest example.

Java code

String one = "abc";
String two = "abc";

BTW, if you simply write "ab" + "c", the Java compiler performs the concatenation at compile time and the generated code is exactly the same. This only works if all the strings are known at compile time.
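Here is a minimal sketch of that rule, using only the standard library (the class and method names are made up for illustration). A concatenation of literals, or of final variables initialised with literals, is a constant expression and ends up as the very same pooled object as "abc". Concatenation involving an ordinary variable is performed at run time and produces a new object.

```java
final class ConcatDemo {
    // "ab" + "c" is a constant expression: the compiler emits the single constant "abc".
    static boolean literalsFolded() {
        String folded = "ab" + "c";
        return folded == "abc";
    }

    // A final local variable initialised with a literal is a compile-time constant too.
    static boolean finalVariableFolded() {
        final String ab = "ab";
        String folded = ab + "c";
        return folded == "abc";
    }

    // A non-final variable is not a constant, so the concatenation happens at run time
    // and creates a new String object.
    static boolean plainVariableFolded() {
        String ab = "ab";
        String joined = ab + "c";
        return joined == "abc";
    }

    public static void main(String[] args) {
        System.out.println("literals:       " + literalsFolded());      // true
        System.out.println("final variable: " + finalVariableFolded()); // true
        System.out.println("plain variable: " + plainVariableFolded()); // false
    }
}
```

The == comparisons are deliberate here: they test object identity, which is exactly what distinguishes a compile-time constant from a string built at run time.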

Class constant pool

Each class has its own constant pool - a list of constant values that can be reused if they occur several times in the source code. It includes common strings, numbers, method names, etc.

Here are the contents of the constant pool in our example above:

const #2 = String   #38;    //  abc
//...
const #38 = Asciz   abc;

The important thing to note is the distinction between the String constant object (#2) and the encoded text "abc" (#38, stored in a modified UTF-8 form) that the string points to.

Byte code

Here is the generated byte code. Note that both the one and two references are assigned the same #2 constant, pointing to the "abc" string:

ldc #2; //String abc
astore_1    //one
ldc #2; //String abc
astore_2    //two 

Output

For each example I am printing the following values:

System.out.println("one.value: " + showInternalCharArrayHashCode(one));
System.out.println("two.value: " + showInternalCharArrayHashCode(two));
System.out.println("one: " + System.identityHashCode(one));
System.out.println("two: " + System.identityHashCode(two));

No surprise that both pairs are equal:

one.value: 23583040
two.value: 23583040
one: 8918249
two: 8918249

This means that both objects point to the same char[] (the same text underneath), so the equals() test will pass. Even more, one and two are the exact same reference, so one == two is true as well. Obviously, if one and two point to the same object, then one.value and two.value must be equal.
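The same holds across classes. Each class file carries its own constant pool, yet at run time equal literals resolve to a single String object from the shared string literal pool. A small sketch (FirstHolder, SecondHolder and the method names are hypothetical, chosen only for this illustration):

```java
final class SharedLiteralDemo {
    // Compares the objects produced by the same literal in two different classes.
    static boolean sameObjectAcrossClasses() {
        return FirstHolder.text() == SecondHolder.text();
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAcrossClasses()); // true
    }
}

// Each of these classes has "abc" in its own class constant pool.
final class FirstHolder {
    static String text() {
        return "abc";
    }
}

final class SecondHolder {
    static String text() {
        return "abc";
    }
}
```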

Literal and new String()

Java code

Now the example we have all been waiting for - one string literal and one new String created from the same literal. How will this work?

String one = "abc";
String two = new String("abc");

The fact that the "abc" constant is used twice in the source code should give you a hint...

Class constant pool

Same as above.

Byte code

ldc #2; //String abc
astore_1    //one

new #3; //class java/lang/String
dup
ldc #2; //String abc
invokespecial   #4; //Method java/lang/String."<init>":(Ljava/lang/String;)V
astore_2    //two

Look carefully! The first object is obtained the same way as above, no surprise: it just takes a reference to the already created String constant (#2) from the constant pool. The second object, however, is created via a normal constructor call. But the first String is passed as an argument! This can be decompiled to:

String two = new String(one); 

Output

The output is a bit surprising. The second pair, representing references to the String objects, is understandable: we created two String objects, one created for us from the constant pool and the other created manually for two. But why on earth does the first pair suggest that both String objects point to the same char[] value array?!

one.value: 41771
two.value: 41771
one: 8388097
two: 16585653

It becomes clear when you look at how the String(String) constructor works (greatly simplified here):

public String(String original) {
    this.offset = original.offset;
    this.count = original.count;
    this.value = original.value;
}

See? When you create a new String object based on an existing one, it reuses the char[] value. Strings are immutable, so there is no need to copy a data structure that is known never to be modified. Moreover, since new String(someString) creates an exact copy of an existing string and strings are immutable, there is clearly no reason for the two to exist at the same time.

I think this is the source of some misunderstandings: even if you have two String objects, they might still point to the same contents. And as you can see, the String object itself is quite small.
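The part of this that the language guarantees can be checked without reflection. The following sketch (class and method names are my own) shows that new String("abc") always yields a distinct object with equal contents. Whether the backing array is shared is an implementation detail, so it is deliberately not tested here.

```java
final class NewStringDemo {
    // new always allocates a fresh object, so the references differ.
    static boolean sameReference() {
        String one = "abc";
        String two = new String("abc");
        return one == two;
    }

    // The contents are equal regardless of how the String was created.
    static boolean sameContents() {
        String one = "abc";
        String two = new String("abc");
        return one.equals(two);
    }

    public static void main(String[] args) {
        System.out.println("one == two:      " + sameReference()); // false
        System.out.println("one.equals(two): " + sameContents());  // true
    }
}
```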

Runtime modification and intern()

Java code

Let's say you initially used two different strings, but after some modifications they hold the same text:

String one = "abc";
String two = "?abc".substring(1);  //also two = "abc"

The Java compiler (at least mine) is not clever enough to perform such an operation at compile time. Have a look:

Class constant pool

Suddenly we ended up with two constant strings pointing to two different constant texts:

const #2 = String   #44;    //  abc
const #3 = String   #45;    //  ?abc
const #44 = Asciz   abc;
const #45 = Asciz   ?abc;

Byte code

ldc #2; //String abc
astore_1    //one

ldc #3; //String ?abc
iconst_1
invokevirtual   #4; //Method String.substring:(I)Ljava/lang/String;
astore_2    //two

The first string is constructed as usual. The second is created by first loading the constant "?abc" string and then calling substring(1) on it.

Output

No surprise here - we have two different strings, pointing to two different char[] texts in memory:

one.value: 27379847
two.value: 7615385
one: 8388097
two: 16585653

Well, the texts aren't really different; the equals() method will still yield true. We have two unnecessary copies of the same text.

Now we should run two exercises. First, try running:

two = two.intern();

before printing the hash codes. Now one and two not only point to the same text, they are the same reference!

one.value: 11108810
two.value: 11108810
one: 15184449
two: 15184449

This means both the one.equals(two) and one == two tests will pass. We also saved some memory, because the "abc" text now appears only once in memory (the second copy will be garbage collected).
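The first exercise can be condensed into a small sketch that needs no reflection (InternDemo and its methods are names I made up). It compares the references before and after the intern() call:

```java
final class InternDemo {
    // substring(1) builds a new String at run time, so it is not the pooled literal.
    static boolean sameBeforeIntern() {
        String one = "abc";
        String two = "?abc".substring(1);
        return one == two;
    }

    // intern() returns the canonical pooled instance, which is the literal "abc".
    static boolean sameAfterIntern() {
        String one = "abc";
        String two = "?abc".substring(1);
        two = two.intern();
        return one == two;
    }

    public static void main(String[] args) {
        System.out.println("before intern(): " + sameBeforeIntern()); // false
        System.out.println("after intern():  " + sameAfterIntern());  // true
    }
}
```

Remember that intern() returns the pooled instance and does not change the object it is called on, so the result has to be assigned back to two.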

The second exercise is slightly different, check out this:

String one = "abc";
String two = "abc".substring(1);

Obviously one and two are two different objects, pointing to two different texts. But how come the output suggests that they both point to the same char[] array?!?

one.value: 23583040
two.value: 23583040
one: 11108810
two: 8918249

I'll leave the answer to you. It will teach you how substring() works, what the advantages of this approach are, and when it can lead to big trouble.

Lessons learnt

  • The String object itself is rather cheap. It's the text it points to that consumes most of the memory
  • String is just a thin wrapper around char[] to preserve immutability
  • new String("abc") isn't really that expensive, as the internal text representation is reused. But still, avoid such constructs.
  • When a String is concatenated from constant values known at compile time, the concatenation is done by the compiler, not by the JVM
  • substring() is tricky, but most importantly, in the JDK examined here it is very cheap, both in terms of memory used and run time (constant in both cases). JDK releases starting with Java 7 update 6 copy the characters instead.
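The last lesson can be illustrated with a toy model. This is a sketch under stated assumptions, not the JDK source: SharedText and all of its members are hypothetical names, and the layout simply mirrors the value, offset and count fields from the simplified constructor shown earlier.

```java
// Toy model, NOT the JDK source: an immutable text view over a shared char[].
final class SharedText {
    private final char[] value;  // backing array, never modified after construction
    private final int offset;    // index of the first character in use
    private final int count;     // number of characters in use

    SharedText(String text) {
        this(text.toCharArray(), 0, text.length());
    }

    private SharedText(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Constant time and memory: only a small wrapper is allocated, no characters are copied.
    SharedText substring(int beginIndex) {
        if (beginIndex < 0 || beginIndex > count) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        return new SharedText(value, offset + beginIndex, count - beginIndex);
    }

    // True when both views are backed by the very same array instance.
    boolean sharesArrayWith(SharedText other) {
        return value == other.value;
    }

    // Length of the whole backing array that this view keeps reachable.
    int retainedLength() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedText whole = new SharedText("?abc");
        SharedText tail = whole.substring(1);
        System.out.println(tail);                        // abc
        System.out.println(tail.sharesArrayWith(whole)); // true
        System.out.println(tail.retainedLength());       // 4
    }
}
```

Note that the three-character view still keeps the whole four-character array reachable. With a short substring of a very long text, that difference can become significant.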

Published at DZone with permission of Tomasz Nurkiewicz, DZone MVB. See the original article here.
