5 More Hidden Secrets in Java

If you're a Java developer, these problems have probably given you a headache at some point. Read on to learn how to deal with these 5 tricky secrets.

Justin Albano

Mar. 27, 18 · Tutorial

Java is a large language with a long and storied history. Throughout more than two decades, numerous features have crept into the language, some contributing greatly to its improvement and others wreaking havoc on its simplicity. In the latter case, many of these features have stuck around and are here to stay, if for no other reason than backward compatibility. In the previous installment in this series, we explored some strange features of the language that should probably not be used in daily practice. In this article, we will take a look at some useful but often overlooked features of the language, as well as some interesting idiosyncrasies that can cause a serious commotion if ignored.

It is important to note that some of these secrets, such as underscores in numerals and cached autoboxing, may be useful in applications, but others, such as multiple classes in a single Java file, have been relegated to the back burner for a reason. Therefore, just because a feature is present in the language does not mean it should be used (even if it is not deprecated). Instead, judgment should be used on when, if ever, to apply these hidden features. Before delving into the good, the bad, and the ugly, we will start with a peculiarity of the language that can cause some serious bugs if ignored: instruction reordering.

1. Instructions Can Be Reordered

Since multiprocessing chips entered the computing scene decades ago, multithreading has become an indispensable part of most non-trivial Java applications. Whether in a multithreaded Hypertext Transfer Protocol (HTTP) server or a Big Data application, threads allow for work to be executed concurrently, utilizing the processing budget provided by powerful Central Processing Units (CPUs). Although threads are an essential part of CPU utilization, they can be tricky to tame and their incorrect use may inject some unseemly and difficult-to-debug errors in an application.

For example, if we create a single-threaded application that prints the value of a variable, we can assume that the lines of code (more precisely, each instruction) that we provide in our source code are executed sequentially, starting with the first line and ending with the last line. Following this assumption, it should come as no surprise that the following snippet results in 4 being printed to standard output:

int x = 1;
int y = 3;
System.out.println(x + y);

While it may appear that a value of 1 is assigned to the variable x first and then a value of 3 is assigned to y, this may not always be the case. Upon close inspection, the ordering of the first two lines does not affect the output of the application: If y were to be assigned first and then x, the behavior of the system would not change. We would still see 4 printed to standard output. Using this observation, the compiler can safely reorder these two instructions as needed, since their reordering does not change the overall behavior of the system. When we compile our code, the compiler is free to do just that: It may reorder the instructions above so long as it does not change the observable behavior of the system.

While it may appear a futile endeavor to reorder the lines above, in many cases, reordering may allow the compiler to make some very noticeable performance optimizations. For example, suppose that we have the following snippet, in which our x and y variables are incremented twice in an interleaving fashion:

int x = 1;
int y = 3;
x++;
y++;
x++;
y++;
System.out.println(x + y);

This snippet should print 8, regardless of any optimizations that the compiler performs, but leaving the instructions above in order forgoes a valid optimization. If the compiler instead reorders the increments into a non-interleaving fashion, they can be condensed and then folded into the initial assignments:

// Reordered instructions
int x = 1;
int y = 3;
x++;
x++;
y++;
y++;
System.out.println(x + y);

// Condensed instructions
int x = 1;
int y = 3;
x += 2;
y += 2;
System.out.println(x + y);

// Fully-condensed instructions
int x = 3;
int y = 5;
System.out.println(x + y);

In actuality, the compiler would likely go one step further and simply inline the values of x and y in the print statement to remove the overhead of storing each value in a variable, but for the purpose of demonstration, it suffices to say that reordering the instructions allows the compiler to make some serious improvements to performance. In a single-threaded environment, this reordering has no effect on the observable behavior of the application, since the current thread is the only one that can see x and y, but in a multithreaded environment, this is not the case. Maintaining a coherent view of memory across threads requires a great deal of overhead on the part of the CPU (see cache coherence) and is often unneeded, so CPUs usually forgo it unless instructed otherwise. Likewise, unless instructed otherwise, the Java compiler (in practice, largely the just-in-time compiler inside the JVM) is free to reorder instructions even when multiple threads read or write the same data.

In Java, the limits on reordering are expressed as a partial ordering of instructions, signified by the happens-before relationship, where hb(x, y) denotes that the instruction x happens-before y. In this context, happens-before does not mean that no reordering of instructions occurs, but rather that the effects of x are complete and visible before y executes. For example, in the snippets above, the variables x and y must reach their terminal values (the result of all computations performed on x and y, respectively) prior to the execution of the print statement. Within a single thread, each instruction happens-before the instructions that follow it in program order, so we never experience a reordering problem when data is not published from one thread to another. When publication occurs (such as sharing data between two threads), very subtle problems can arise.

For example, if we execute the following code (from Java Concurrency in Practice, p. 340), it should come as no surprise to developers with experience in concurrency that thread interleaving may result in (0,1), (1,0), or (1,1) being printed to standard output; however, it is also possible for (0,0) to be printed due to reordering.

public class ReorderedProgram {

    static int x = 0, y = 0;
    static int a = 0, b = 0;

    public static void main(String[] args) throws InterruptedException {

        Thread one = new Thread(() ->  {
            a = 1;
            x = b;
        });

        Thread other = new Thread(() ->  {
            b = 1;
            y = a;
        });

        one.start();
        other.start();
        one.join();
        other.join();
        System.out.println("(" + x + "," + y + ")");
    }
}

Since there is no happens-before relationship between the instructions of one thread and those of the other, the two assignments within each thread are free to be reordered as far as the other thread can observe. For example, the thread one may actually execute x = b before a = 1 and likewise, the thread other may execute y = a before b = 1 (since, within the context of each thread, the order of execution for these instructions does not matter). If these two reorderings occur, the result could be (0,0). Note that this is different from interleaving, where thread preemption and thread-execution ordering affect the output of an application. Interleaving can only result in (0,1), (1,0), or (1,1) being printed: (0,0) is solely the result of reordering.

In order to force a happens-before relationship between the two threads, we need to impose synchronization. For example, the following snippet removes the possibility of reordering causing a result of (0,0), since it imposes a happens-before relationship between the two threads. Note, though, that (0,1) and (1,0) are the only two possible outcomes from this snippet, depending on which thread enters its synchronized method first. For example, if thread one acquires the lock first, the result will be (0,1), but if thread other acquires it first, the result will be (1,0).

public class ReorderedProgram {

    private int x = 0, y = 0;
    private int a = 0, b = 0;

    public synchronized void setX() {
        a = 1;
        x = b;
    }

    public synchronized void setY() {
        b = 1;
        y = a;
    }

    public synchronized int getX() { return x; }
    public synchronized int getY() { return y; }

    public static void main(String[] args) throws InterruptedException {

        ReorderedProgram program = new ReorderedProgram();

        Thread one = new Thread(program::setX);
        Thread other = new Thread(program::setY);

        one.start();
        other.start();
        one.join();
        other.join();
        System.out.println("(" + program.getX() + "," + program.getY() + ")");
    }
}
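
Synchronized methods are not the only way to rule out (0,0). As a minimal sketch (the class name VolatileReordering and the helper runOnce are illustrative and not part of the original example), declaring a and b volatile prevents the write in each thread from appearing to move after the read that follows it. Unlike the synchronized version, (1,1) remains possible here, because the two threads can still interleave:

```java
class VolatileReordering {

    // Volatile reads and writes are totally ordered across threads, so the
    // write to a (or b) cannot appear to move after the read of b (or a).
    private volatile int a = 0, b = 0;

    // Plain fields suffice for the results: join() makes them visible
    // to the thread that reads them afterwards.
    private int x = 0, y = 0;

    /** Runs both threads once and returns the outcome formatted as "(x,y)". */
    static String runOnce() {
        VolatileReordering program = new VolatileReordering();

        Thread one = new Thread(() -> {
            program.a = 1;
            program.x = program.b;
        });
        Thread other = new Thread(() -> {
            program.b = 1;
            program.y = program.a;
        });

        one.start();
        other.start();
        try {
            one.join();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return "(" + program.x + "," + program.y + ")";
    }

    public static void main(String[] args) {
        // The outcome varies from run to run, but should never be (0,0)
        for (int i = 0; i < 5; i++) {
            System.out.println(runOnce());
        }
    }
}
```

Whether volatile or synchronized is the better fit depends on whether mutual exclusion is also needed; volatile only orders the individual reads and writes.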

In general, there are a few explicit ways to impose a happens-before relationship, including (quoted from the java.util.concurrent package documentation):

Each action in a thread happens-before every action in that thread that comes later in the program's order.
An unlock (synchronized block or method exit) of a monitor happens-before every subsequent lock (synchronized block or method entry) of that same monitor. And because the happens-before relation is transitive, all actions of a thread prior to unlocking happen-before all actions subsequent to any thread locking that monitor.
A write to a volatile field happens-before every subsequent read of that same field. Writes and reads of volatile fields have similar memory consistency effects as entering and exiting monitors but do not entail mutual exclusion locking.
A call to start on a thread happens-before any action in the started thread.
All actions in a thread happen-before any other thread successfully returns from a join on that thread.
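
Two of these rules can be sketched in a few lines. The class HappensBeforeDemo and its method names are hypothetical and chosen only for illustration; the first method relies on the volatile rule to publish a plain field, and the second relies on the join rule:

```java
class HappensBeforeDemo {

    private int payload = 0;                 // plain, non-volatile data
    private volatile boolean ready = false;  // volatile flag used to publish payload

    /** Volatile rule: once ready is read as true, the earlier write to payload is visible. */
    static int publishWithVolatile() {
        HappensBeforeDemo demo = new HappensBeforeDemo();
        Thread writer = new Thread(() -> {
            demo.payload = 42;  // ordinary write...
            demo.ready = true;  // ...made visible by the volatile write that follows it
        });
        writer.start();
        while (!demo.ready) {
            Thread.yield();     // spin until the volatile write is observed
        }
        return demo.payload;
    }

    /** Join rule: everything the worker did is visible after join() returns. */
    static int publishWithJoin() {
        int[] result = new int[1];
        Thread worker = new Thread(() -> result[0] = 42);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for worker", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(publishWithVolatile()); // prints 42
        System.out.println(publishWithJoin());     // prints 42
    }
}
```

If the ready flag were not volatile, the spinning thread would have no guarantee of ever seeing the update, nor of seeing payload as 42 when it did.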

The happens-before partial ordering relationship is an involved topic, but suffice it to say that interleaving is not the only dilemma that can cause sneaky bugs in a concurrent program. In any case where data or resources are shared between two or more threads, some synchronization mechanism (whether synchronized, locks, volatile, atomic variables, etc.) must be used to ensure that data is correctly shared. For more information, see section 17.4.5 of the Java Language Specification (JLS) and Java Concurrency in Practice.

2. Underscores Can Be Used in Numerals

Whether in computing or in paper-and-pencil mathematics, large numbers can be very difficult to read. For example, trying to discern that 1183548876845 is actually "1 trillion 183 billion 548 million 876 thousand 845" can be painstakingly tedious. Thankfully, written English uses the comma as a delimiter, which allows digits to be grouped three at a time. For example, it is much more obvious that 1,183,548,876,845 represents a number over one trillion (by counting the number of commas) than it was in the undelimited form.

Unfortunately, representing such large numbers in Java can often be a chore. For example, it is not uncommon to see such large numbers represented as constants in programs, as displayed in the following snippet that prints the number above:

public class Foo {
    public static final long LARGE_FOO = 1183548876845L;
}

System.out.println(Foo.LARGE_FOO);

While this suffices to accomplish our goal of printing our large number, it goes without saying that the constant we created is aesthetically lacking. Thankfully, Java Development Kit (JDK) 7 introduced an equivalent to the comma delimiter: the underscore. Underscores can be used in the same way as commas, separating groups of digits to increase the readability of large values. For example, we could rewrite the program above as follows:

public class Foo {
    public static final long LARGE_FOO = 1_183_548_876_845L;
}

System.out.println(Foo.LARGE_FOO);

In the same manner that commas made our original mathematical value easier to read, underscores make large values in Java programs much easier to read. Underscores can also be used in floating-point values, as demonstrated by the following constant:

public static final double LARGE_BAR = 1.546_674_876;

It should also be noted that underscores can be placed at any point between the digits of a number (not just to separate groups of three digits) so long as the underscore is not at the beginning or end of the number, adjacent to the decimal point in a floating-point value, immediately before a type suffix such as L or F, or adjacent to the x in a hexadecimal value. For example, all of the following are invalid numbers in Java:

3._1415
_3.1415
3.1415_
45787_l
_45787l
0x_1234
0_x1234

While this technique for separating numbers should not be overused (ideally, it should be used only in the same manner as a comma in written English or to separate groups of three digits after the decimal place in a floating-point value), it can make previously unreadable numbers legible. For more information on underscores in numbers, see the Underscores in Numeric Literals documentation by Oracle.

3. Autoboxed Integers Are Cached

Due to the inability of primitive values to be used as object references and as formal generic type parameters, Java introduced the concept of boxed counterparts for primitive values. These boxed values essentially wrap primitive values, creating objects out of primitives and allowing them to be used as object references and formal generic types. For example, we can box an integer value in the following manner:

Integer myInt = 500;

In actuality, the primitive int 500 is converted to an object of type Integer and stored in myInt. This process is called autoboxing, since a conversion is automatically performed to transform a primitive integer value of 500 into an object of type Integer. In practice, this conversion amounts to the following (see Autoboxing and Unboxing for more information on the equivalence of the original snippet and the snippet below):

Integer myInt = Integer.valueOf(500);

Since myInt is an object of type Integer, we would expect that comparing its equality to another Integer object containing 500 using the == operator should result in false, since the two objects are not the same object (as is the standard meaning of the == operator); but calling equals on the two objects should result in true, since the two Integer objects are value objects that represent the same integer (namely, 500):

Integer myInt = 500;
Integer otherInt = 500;

System.out.println(myInt == otherInt);         // false
System.out.println(myInt.equals(otherInt));    // true

At this point, autoboxing operates exactly how we would expect any value object to act. What happens if we try this with a smaller number? For example, 25:

Integer myInt = 25;
Integer otherInt = 25;

System.out.println(myInt == otherInt);         // true
System.out.println(myInt.equals(otherInt));    // true

Surprisingly, if we try this with 25, we see that the two Integer objects are equal, both by identity (==) and by value. This means that the two Integer objects are actually the same object. This strange behavior is not an oversight or bug, but rather a conscious decision, as described in section 5.1.7 of the JLS. Since many autoboxing operations are performed on small numbers, the JLS specifies that Integer values between -128 and 127, inclusive, are cached. This is reflected in the JDK 9 source code for Integer.valueOf, which includes this caching:

public static Integer valueOf(int i) {
    if (i >= IntegerCache.low && i <= IntegerCache.high)
        return IntegerCache.cache[i + (-IntegerCache.low)];
    return new Integer(i);
}

If we inspect the source code for IntegerCache, we glean some very interesting information:

private static class IntegerCache {
    static final int low = -128;
    static final int high;
    static final Integer cache[];

    static {
        // high value may be configured by property
        int h = 127;
        String integerCacheHighPropValue =
            VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
        if (integerCacheHighPropValue != null) {
            try {
                int i = parseInt(integerCacheHighPropValue);
                i = Math.max(i, 127);
                // Maximum array size is Integer.MAX_VALUE
                h = Math.min(i, Integer.MAX_VALUE - (-low) -1);
            } catch( NumberFormatException nfe) {
                // If the property cannot be parsed into an int, ignore it.
            }
        }
        high = h;

        cache = new Integer[(high - low) + 1];
        int j = low;
        for(int k = 0; k < cache.length; k++)
            cache[k] = new Integer(j++);

        // range [-128, 127] must be interned (JLS7 5.1.7)
        assert IntegerCache.high >= 127;
    }

    private IntegerCache() {}
}

Although this code may look complex, it is actually quite simple. The inclusive lower bound for the cached Integer values is always -128, as per section 5.1.7 of the JLS, but the inclusive upper bound is configurable through a Java Virtual Machine (JVM) property. By default, the upper bound is 127 (as per JLS 5.1.7), but it can be raised to any value from 127 up to a limit just below the maximum integer value, which is imposed by the maximum array size (requested values below 127 are ignored). Theoretically, we can set the upper bound to 500.

In practice, we can accomplish this using the java.lang.Integer.IntegerCache.high VM property. For example, if we rerun the original program based on the autoboxing of 500 with -Djava.lang.Integer.IntegerCache.high=500, the program behavior changes:

Integer myInt = 500;
Integer otherInt = 500;

System.out.println(myInt == otherInt);         // true
System.out.println(myInt.equals(otherInt));    // true
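
Because the result of == on boxed values depends on the cache range of the running JVM, code that means to compare values should not rely on identity. The following is a minimal sketch of the alternatives; the class BoxedComparison and the helper sameValue are illustrative names, not part of the JDK:

```java
class BoxedComparison {

    /** Null-safe comparison by value, independent of the Integer cache range. */
    static boolean sameValue(Integer left, Integer right) {
        return java.util.Objects.equals(left, right);
    }

    public static void main(String[] args) {
        Integer myInt = 500;
        Integer otherInt = 500;

        // Identity comparison: the result depends on the configured cache range,
        // so no particular output is assumed here
        System.out.println(myInt == otherInt);

        // Value comparisons: always true for two boxed 500s
        System.out.println(sameValue(myInt, otherInt));               // prints true
        System.out.println(myInt.intValue() == otherInt.intValue());  // prints true
    }
}
```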

In general, unless there is a serious performance need, the VM should not be tuned to a higher Integer cache value. Additionally, the boxed forms of boolean, char, short, and long (namely, Boolean, Character, Short, and Long) are also cached, but generally do not have VM settings to alter their cache upper bounds. Instead, these bounds are commonly fixed. For example, Short.valueOf is defined as follows in JDK 9:

public static Short valueOf(short s) {
    final int offset = 128;
    int sAsInt = s;
    if (sAsInt >= -128 && sAsInt <= 127) { // must cache
        return ShortCache.cache[sAsInt + offset];
    }
    return new Short(s);
}
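
The same identity behavior can be observed for the other wrappers. The sketch below (WrapperCaches and its method are hypothetical names) calls valueOf twice with a small sample value for each type and checks that the same instance comes back both times; the sample values are assumptions chosen to fall inside the fixed cache ranges:

```java
class WrapperCaches {

    /** Returns true when valueOf hands back the same instance twice for every sample value. */
    static boolean smallValuesShareInstances() {
        boolean sameBoolean = Boolean.valueOf(true) == Boolean.TRUE;
        boolean sameByte = Byte.valueOf((byte) 100) == Byte.valueOf((byte) 100);
        boolean sameChar = Character.valueOf('a') == Character.valueOf('a');
        boolean sameShort = Short.valueOf((short) 100) == Short.valueOf((short) 100);
        boolean sameLong = Long.valueOf(100L) == Long.valueOf(100L);
        return sameBoolean && sameByte && sameChar && sameShort && sameLong;
    }

    public static void main(String[] args) {
        System.out.println(smallValuesShareInstances()); // prints true
    }
}
```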

For more information on autoboxing and cached conversions, see section 5.1.7 of the JLS and Why does 128==128 return false but 127==127 return true when converting to Integer wrappers?

4. Java Files Can Contain Multiple Non-Nested Classes

It is a commonly accepted rule that a .java file must contain only one non-nested class and the name of that class must match the name of the file. For example, Foo.java must contain only a non-nested class named Foo. While this is an important practice and an accepted convention, it is not entirely true. In particular, it is actually left as an implementation decision for a Java compiler whether or not to enforce the restriction that the public class of a file must match the file name. According to section 7.6 of the JLS:

If and only if packages are stored in a file system (§7.2), the host system may choose to enforce the restriction that it is a compile-time error if a type is not found in a file under a name composed of the type name plus an extension (such as .java or .jav) if either of the following is true:

The type is referred to by code in other ordinary compilation units of the package in which the type is declared.

The type is declared public (and therefore is potentially accessible from code in other packages).

In practice, most compiler implementations enforce this restriction, but note the qualification in the definition: the type is declared public. Practically all Java compilers require that a public top-level class match the name of its file (disregarding the .java extension), which prevents a Java file from having more than one public top-level class, since only one of those classes could match the file name. Because the restriction is qualified for public classes only, multiple top-level classes can be placed in a single Java source file, so long as at most one of them is public.

For example, the following file (named Foo.java) is valid even though it contains more than one class (but only one class is public and the public class matches the name of the file):

public class Foo {

    private final Bar bar;

    public Foo(int x) {
        this.bar = new Bar(x);
    }

    public int getBarX() {
        return bar.getX();
    }

    public static void main(String[] args) {
        Foo foo = new Foo(10);
        System.out.println(foo.getBarX());
    }
}

class Bar {

    private final int x;

    public Bar(int x) {
        this.x = x;
    }

    public int getX() {
        return x;
    }
}

If we execute this file, we see the value 10 printed to standard output, demonstrating that we can instantiate and interact with the Bar class (the second but non-public class in our Foo.java file) just as we would any other class. We can also interact with the Bar class from another Java file (Baz.java) so long as it is contained in the same package since the Bar class is package-private. Thus, the following Baz.java file prints 20 to standard output:

public class Baz {

    private Bar bar;

    public Baz(int x) {
        this.bar = new Bar(x);
    }

    public int getBarX() {
        return bar.getX();
    }

    public static void main(String[] args) {
        Baz baz = new Baz(20);
        System.out.println(baz.getBarX());
    }
}

Although it is possible to have more than one class in a Java file, it is not a good idea. It is a common convention to have only a single class in each Java file and breaking this convention can cause some difficulties and frustration for other developers reading the file. If more than one class is needed in a file, nested classes should be used. For example, we could easily reduce the Foo.java file to a single top-level class by nesting the Bar class (static nesting is used since the Bar class does not depend on a particular instance of Foo):

public class Foo {

    /* ... */

    public static class Bar { /* ... */ }
}
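
Filled in, the nested version might look like the following sketch. The bodies mirror the earlier Foo and Bar classes; the outer class is left package-private here only so that the snippet compiles under any file name, whereas in the article's Foo.java it would be declared public:

```java
class Foo {

    private final Bar bar;

    Foo(int x) {
        this.bar = new Bar(x);
    }

    int getBarX() {
        return bar.getX();
    }

    // Static nesting: a Bar does not need an enclosing Foo instance
    static class Bar {

        private final int x;

        Bar(int x) {
            this.x = x;
        }

        int getX() {
            return x;
        }
    }

    public static void main(String[] args) {
        Foo foo = new Foo(10);
        System.out.println(foo.getBarX());        // prints 10

        // Code outside Foo refers to the nested class by its qualified name
        Foo.Bar standalone = new Foo.Bar(20);
        System.out.println(standalone.getX());    // prints 20
    }
}
```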

In general, multiple, non-nested classes in a single Java file should be avoided. For more information, see Can a Java file have more than one class?

5. StringBuilder Is Used for String Concatenation

String concatenation is a common part of nearly all programming languages, allowing multiple strings (or objects and primitives of dissimilar types) to be combined into a single string. One caveat that complicates string concatenation in Java is the immutability of Strings. This means that we cannot simply create a single String instance and continuously append to it. Instead, each append produces a new String object. For example, if we look at the concat(String) method of String, we see that a new String instance is produced:

public String concat(String str) {
    int olen = str.length();
    // ...
    int len = length();
    byte[] buf = StringUTF16.newBytesFor(len + olen);
    // ...
    return new String(buf, UTF16);
}

If such a technique were used for string concatenation, numerous intermediary String instances would be produced. For example, the following two lines would be functionally equivalent, with the second line producing two String instances just to perform the concatenation:

String myString = "Hello, " + "world!";
String myConcatString = new String("Hello, ").concat("world!");

While this may seem like a small price to pay for string concatenation, this technique becomes untenable when used on a larger scale. For example, the following would wastefully create 1,000 String instances that will never be used (1,001 total String instances are created and only the last one is used):

String myString = "";

for (int i = 0; i < 1000; i++) {
    myString += "a";
}

In order to reduce the number of wasted String instances created for concatenation, section 15.18.1 of the JLS provides a keen suggestion on how to implement string concatenation:

An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.

Instead of creating intermediary String instances, a StringBuilder is used as a buffer, allowing countless String values to be appended until the resultant String is needed. This ensures that only one String instance is created: the result. Additionally, a StringBuilder instance is created, but this combined overhead is much less than that of the individual String instances created otherwise. For example, the looped concatenation that we wrote above is semantically equivalent to the following:

StringBuilder builder = new StringBuilder();

for (int i = 0; i < 1000; i++) {
    builder.append("a"); 
}

String myString = builder.toString();

Instead of creating 1,000 wasted String instances, only one StringBuilder and one String instance are created. For a single concatenation expression, writing out the StringBuilder by hand provides no performance gain, since the compiler already applies an equivalent optimization, and the concatenation operator is syntactically easier and thus preferred. Inside a loop, however, compilers typically apply this optimization to each statement rather than across iterations (javac, for example, has historically created a new StringBuilder on every pass), so an explicit StringBuilder declared outside the loop, as above, can still reduce the number of intermediate objects. Regardless of the selection made, it is important to know that the Java compiler will attempt to make performance gains where possible; thus, trying to prematurely optimize simple concatenations (such as by using explicit StringBuilder instances) may provide little to no gain at the expense of readability.
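
Whichever form is chosen, the resulting text is the same. The sketch below places the two approaches side by side so that they can be compared or timed; ConcatenationDemo, withOperator, and withBuilder are illustrative names, and no claim is made here about how a particular compiler version translates the += form:

```java
class ConcatenationDemo {

    /** Builds a string of count 'a' characters using the += operator. */
    static String withOperator(int count) {
        String result = "";
        for (int i = 0; i < count; i++) {
            result += "a"; // each pass yields a new String, since Strings are immutable
        }
        return result;
    }

    /** Builds the same string with one explicit StringBuilder. */
    static String withBuilder(int count) {
        StringBuilder builder = new StringBuilder(count); // pre-sized buffer
        for (int i = 0; i < count; i++) {
            builder.append("a");
        }
        return builder.toString(); // the only String created by this method
    }

    public static void main(String[] args) {
        String viaOperator = withOperator(1000);
        String viaBuilder = withBuilder(1000);
        System.out.println(viaOperator.equals(viaBuilder)); // prints true
        System.out.println(viaBuilder.length());            // prints 1000
    }
}
```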

Conclusion

Java is a large language with a storied history. Over the years, there have been countless great additions to the language and possibly just as many faulty additions as well. The combination of these two factors has led to some pretty peculiar features of the language: some good and some bad. Some of these facets, such as underscores in numerals, cached autoboxing, and string concatenation optimizations, are important for any Java developer to understand, while features such as multiple classes in a single Java file have been relegated to the back burner. Others, such as the reordering of unsynchronized instructions, can lead to some seriously tedious debugging if not handled correctly. In all, over the more than two decades of Java's lifespan, a large number of features, some pretty and some ugly, have crept into the shadows, but the intricacies of these hidden features should be well understood and may even come in handy in a pinch.
