Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

The Right Way to Reverse a String in Java

DZone's Guide to

The Right Way to Reverse a String in Java

Learn the right way to reverse a String.

· Java Zone ·
Free Resource

Download Microservices for Java Developers: A hands-on introduction to frameworks and containers. Brought to you in partnership with Red Hat.

Facts and Terminology

As you probably know, Java uses UTF-16 to represent String. The char data type and the Character class are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. Therefore, in the UTF-16 representation, there are characters (Code Points) that are represented by one- and some other characters that are represented by two char values (Code Units).

Please check out the Java String length confusion article and the JavaDoc of the Character class for more details and a more-detailed explanation.

Example

Character: A
"UTF-16 representation" in Java: "\u0041"

Character: Mathematical double-struck capital A (Unfortunately, the DZone editor has the same issue as what is described below, it prints ?? instead of the real character after save: ��)
"UTF-16 representation" in Java: "\uD835\uDD38"

The first one is straightforward; the second one is a little bit more interesting; this single character (Code Point) is represented by two Unicode escapes. This means a couple of things:

  • This single character is represented by two char (or Character) values (Code Units)

  • The length() of this String is two (see: Java String length confusion)

  • The toCharArray() method returns a char array (char[]), which has two elements (0xD835 and 0xDD38 respectively)

  • Both charAt(0) and charAt(1) return something (no StringIndexOutOfBoundsException), but these values are not valid characters

  • If you do any character manipulation, you need to consider this case and handle these characters, which consist of two char (surrogates)

  • Therefore, most of the character manipulation code we ever wrote is probably broken.

This basically means that you probably do not want to do any character manipulation (see below).

Broken String Reverse

By this point, you might have a good guess what is wrong with this (very commonly used) solution to reverse a String:

static String reverse(String original) {
    String reversed = "";
    for (int i = original.length() - 1;  0 <= i; i--) {
        reversed += original.charAt(i);
    }

    return reversed;
}


Let's see it in action:

String str = "\uD835\uDD38BC"; // Three characters: A, B, C (4 chars)
System.out.println(str); // prints ABC (A is the double-struck A)
System.out.println(reverse(str)); // prints CB??


If you run the reverse method above, it will produce a String like this: "CB\uDD38\uD835". C and B are ok but \uDD38\uD835 is invalid, that's why you see ?? when you print it. The method should not have reversed them; the valid result would be "CB\uD835\uDD38" (CBA(double-struck A)).

Solution

Usually, not writing code to solve problems is a good idea:

static String reverse(String original) {
    return new StringBuilder(original).reverse().toString();
}


If you want to take a (small) step further, here's a Java 8+ one-liner:

Function<String, String> reverse = s -> new StringBuilder(s).reverse().toString();


If you are curious how this is implemented under the hood, check out what StringBuilder's reverse() method does (it is in theAbstractStringBuilder class).

So what broken String manipulations have you seen? Let us know in the comments below!

Download Building Reactive Microservices in Java: Asynchronous and Event-Based Application Design. Brought to you in partnership with Red Hat

Topics:
java ,programming ,strings ,unicode ,unicode characters ,utf ,utf-16 ,character

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}