Going Beyond Java 8: Text Blocks
Learn how to master text blocks to define multi-line strings in Java and improve your coding skills in this in-depth tutorial.
String is undoubtedly the most used class in Java, and represents an exception among the classes of the standard library. Its objects are always immutable, and they can be instantiated with a simplified syntax that spares us the verbosity of the new operator and the constructor call required by almost all other classes. In addition, the memory management of String objects is characterized by the reuse of existing instances through an internally managed string pool.
In recent versions, further improvements have been made to this fundamental class to make it more efficient, simpler to use, and less verbose. The compact strings introduced in Java 9 have improved string performance. Then, with Java 13, a new feature called text blocks was introduced that allows us to use the String class in a more productive and easier way.
This feature allows strings to be defined on multiple lines using a new syntax. The formatting of multiline strings is more natural than in the past: it is no longer necessary to use string concatenation, escape characters such as \n, and complex management of quotes and spaces. In this way the verbosity of the code decreases, and readability and ease of writing are improved. In Java 13 and Java 14, text blocks could be used as a preview feature. Starting with Java 15, they have become a standard feature of the language.
What Are They for?
For example, suppose you want to use the following HTML code, in a Java program:
Before Java 13, to format HTML code like this inside a string, we were forced to use escape characters like
\n to go to the next line:
To make everything readable you also need to concatenate multiple strings with the + operator:
Formatting the code as in the previous example requires a lot of attention from the programmer. For example, if we want to use an attribute in the previous HTML tag that uses the double quotes symbol like this:
Then, we have to escape the double quotes (see the Stranger Things About Java Characters article) to avoid syntax errors, as follows:
From version 13, we can instead use a text block equivalently, which is similar to a normal
String literal, but spans multiple lines and is delimited by sequences of three double quotes:
We can see how the readability of the HTML code has improved, and how it is now easier with a copy-paste action to import the text block content from an HTML file or copy the text block content to an HTML file. Furthermore, there is no need to use escape characters for the HTML quotes. However, there are some points to clarify, as we will see starting from the next section.
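The comparison can be sketched as follows. The HTML fragment and the class and method names are hypothetical, chosen only for illustration; the point is that both forms produce the same String value.

```java
class HtmlLiteralDemo {

    // Traditional style: explicit \n, escaped quotes, and + concatenation.
    static String concatenated() {
        return "<html>\n" +
               "    <body>\n" +
               "        <p class=\"note\">Hello, text blocks!</p>\n" +
               "    </body>\n" +
               "</html>";
    }

    // Text block: the same content, with no \n and no escaped quotes.
    static String textBlock() {
        return """
               <html>
                   <body>
                       <p class="note">Hello, text blocks!</p>
                   </body>
               </html>""";
    }

    public static void main(String[] args) {
        System.out.println(textBlock());
        // Both forms produce the same String value.
        System.out.println(concatenated().equals(textBlock())); // prints true
    }
}
```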
As we saw in the previous example, a text block was defined inside an opening delimiter and a closing delimiter, represented by a sequence of three double quotes
""". Actually, the situation is a little more complex, so let's clarify by defining in detail the three parts that make up a text block: the opening delimiter, the text block content, and the closing delimiter.
The opening delimiter is defined by a sequence of three double quotes, followed by zero or more spaces and a line terminator. The content of the text block starts from the first character after the line terminator. Therefore, any white spaces between the three quotation marks and the line terminator are not taken into consideration.
By white space, we mean the non-visible characters for which the static method isWhitespace(int codePoint) of the Character class returns true.
The closing delimiter, on the other hand, consists only of a sequence of three double quotes. The content of the text block ends with the character preceding the first double quote of the closing delimiter.
Finally, the text block content is equivalent to an ordinary
String literal at runtime. Once compiled, a text block therefore becomes a full-fledged
String literal, and at runtime it is stored in the string pool as usual. At runtime, the JVM has no way to distinguish ordinary string literals from those created through a text block. Remember that the content of the text block starts from the first character after the line terminator; the next sections cover the remaining details needed to master text blocks.
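A minimal sketch of these rules, using hypothetical constant names: the content begins after the line terminator that follows the opening delimiter, and ends right before the closing delimiter. Note that writing content on the same line as the opening delimiter does not compile.

```java
class DelimiterDemo {

    // Closing delimiter on the same line as the last content character:
    // the content does not end with a line terminator.
    static final String SAME_LINE_CLOSE = """
        Hello""";

    // Closing delimiter on its own line: the content ends with a line terminator.
    static final String NEXT_LINE_CLOSE = """
        Hello
        """;

    public static void main(String[] args) {
        System.out.println(SAME_LINE_CLOSE.equals("Hello"));   // prints true
        System.out.println(NEXT_LINE_CLOSE.equals("Hello\n")); // prints true
        // A text block compiles to an ordinary, pooled String constant,
        // so it is the same instance as the equivalent string literal.
        System.out.println(SAME_LINE_CLOSE == "Hello");        // prints true
    }
}
```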
Compiling a Text Block
There are three phases that are performed during compile-time:
- Normalization of line terminators.
- Removal of white spaces that were introduced to align the text block with the Java code.
- Interpretation of escape characters.
Before talking about normalization, however, let's make a brief but fundamental premise. The content of a text block is usually made up of several lines formatted with a certain criterion. This involves managing both horizontal and vertical formatting.
Horizontal formatting is usually supported by the use of the space character and the horizontal tab character. The latter is obtained by pressing the TAB key on the keyboard, and can be represented by the escape character \t and by the Unicode code point \u0009.
To support vertical formatting instead, we need the characters that go to the next line, the so-called line terminators. In a text block these are no longer made explicit with the escape character \n, as is usually done in a normal String literal, but are implicitly defined within the source code simply by going to the next line. Unix-based platforms (for example Linux and macOS systems) go to the next line within a text file with the Line Feed character (which we abbreviate as LF), which can be represented in Java with the escape character \n and with the code point \u000A.
On the other hand, Windows systems use the Carriage Return and Line Feed character sequence as the line terminator. In particular, the Carriage Return (which we abbreviate as CR) can be represented in Java with the escape character \r and with the code point \u000D. We can therefore say that on Windows the line terminator is the combination CRLF (i.e., \r\n).
Normalization of Line Terminators
Normalization for text blocks always transforms all line terminators into LF, regardless of the platform on which the compiler runs. This process is essential because, in carrying a source file from one platform to another, the number of characters may change. Suppose we have two Java source files that define the same text block. Let's also assume that one of the two files has been edited on a Linux system (where the line terminator is LF), and the other on a Windows system (where the line terminator is CRLF). Without normalization, a check using the equals method between the two text blocks would return false, even if to the naked eye they seem identical. In fact, in the file edited on Windows there would be an extra character for each line (CR).
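The effect can be sketched with ordinary string literals. The normalize helper below is hypothetical and only mimics what the compiler does to the content of a text block; it is not an API of the JDK.

```java
class LineTerminatorDemo {

    // Whatever terminators the source file uses, the compiled content has LF only.
    static final String BLOCK = """
        first
        second
        """;

    // Hypothetical helper that mimics the normalization on arbitrary text:
    // CRLF and lone CR both become LF.
    static String normalize(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String windows = "first\r\nsecond\r\n";
        String unix = "first\nsecond\n";
        System.out.println(windows.equals(unix));            // prints false
        System.out.println(normalize(windows).equals(unix)); // prints true
        System.out.println(BLOCK.equals(unix));              // prints true
    }
}
```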
Removing Superfluous White Spaces
After the normalization process, a text block will clearly consist of one or more lines. The algorithm for removing superfluous white spaces (i.e., the spaces introduced to align the text block code with the Java code) includes:
- The removal of all of the white spaces that are at the end of each line.
- The removal of all of the white spaces that are at the beginning of each line, common to all lines.
As for the first point, the white spaces at the end of a line are removed by default, because they are usually useless for formatting purposes.
As for the second point, however, if all non-empty lines begin with one or more white spaces, they are all examined by the compiler, which determines the minimum number of leading white spaces common to all of them. Then, exactly this number of white spaces is removed from each line. This is because it is assumed that these white spaces have been introduced to align the text block with the Java code that defines it. For example, consider the following code:
In this case, the HTML code defined in the text block has clearly been written with different initial white spaces for each line, just for the purpose of aligning the content of the text block (the HTML code) with its opening delimiter (see figure 1).
Even the white spaces that precede the closing delimiter of the text block in the last line are removed by the compiler.
For this reason, all the white spaces that are common to each line will be removed by the compiler, and the output of the previous class will be:
If the text block closing delimiter had been placed on the next line, we would have had another line in the output. For example, the following text block:
would have printed an extra blank trailing line, because the content now ends with a line terminator, like this:
Consider this text block:
That text block would have produced the following output:
In fact, the last line would have had zero leading white spaces, and this number would have been considered by the compiler as the number of white spaces to remove for all lines.
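The role of the closing delimiter in this computation can be sketched as follows, assuming a hypothetical HTML list whose content is indented by eight spaces. In the second constant the delimiter is moved four columns to the left rather than all the way to column zero, so four spaces survive on every line.

```java
class IndentationDemo {

    // Closing delimiter aligned with the content (8 spaces):
    // all 8 common leading spaces are removed.
    static final String ALIGNED = """
        <ul>
            <li>item</li>
        </ul>
        """;

    // Closing delimiter indented by only 4 spaces: its line takes part in
    // the computation, so only 4 spaces are removed from every line.
    static final String SHIFTED = """
        <ul>
            <li>item</li>
        </ul>
    """;

    public static void main(String[] args) {
        System.out.print(ALIGNED);
        System.out.print(SHIFTED);
    }
}
```

Moving the closing delimiter is therefore a simple way to control how much indentation is kept in the resulting string.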
The algorithm described in this section is the same one applied by the stripIndent instance method of the String class, introduced with Java 13 (and a standard API since Java 15).
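As a minimal sketch, String.stripIndent() can apply the same indentation removal to text that does not come from a text block, such as an ordinary literal or text read at runtime. The class name and the sample content are hypothetical.

```java
class StripIndentDemo {

    static String stripped() {
        // An ordinary string literal: the compiler does not touch its indentation.
        String raw = "    <p>\n        Hello\n    </p>";
        // Removes the 4 leading spaces common to all lines.
        return raw.stripIndent();
    }

    public static void main(String[] args) {
        System.out.println(stripped());
    }
}
```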
Interpretation of Escape Characters
Within the text block, it is also possible to use escape characters (see the Stranger Things About Java Characters article on the Java char type). Technically it is also possible to use the escape characters \n and \", but it is useless and therefore not recommended. In fact,
\n is used to go to the next line within
String literals, but text blocks are multiline in nature.
We can directly use the character " instead of the escape character \", since the delimiter of a text block does not consist of a single " character. In practice, a " that belongs to the content cannot be mistaken for a delimiter of the text block itself.
There is only one case in which it is necessary to escape the double quotes character: when the last character of the content of a text block is itself ", which would then be attached to the closing delimiter of the text block, compromising its definition. In this case you need to use the escape character \".
However, there are other escape characters that can be used.
It is important that the interpretation of the escape characters takes place after the first two phases of normalization of the line terminators, and removal of superfluous white spaces. So in fact, escape characters like
\f will not be removed during the first phase, while
\b (BACKSPACE) and
\t (TAB), will certainly not be removed in the second phase.
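The points of this section can be sketched with three hypothetical constants: quotes that need no escaping, a final quote that does, and a \t escape that survives the removal of superfluous white spaces because it is interpreted only afterwards.

```java
class EscapeDemo {

    // Double quotes inside the content need no escaping.
    static final String LINK = """
        <a href="index.html">Home</a>
        """;

    // A double quote right before the closing delimiter must be escaped.
    static final String QUOTED = """
        She said "hello\"""";

    // While indentation is stripped, \t is still a backslash followed by t,
    // so the tab at the start of the second line is preserved.
    static final String TABBED = """
        name\tvalue
        \tindented
        """;

    public static void main(String[] args) {
        System.out.print(LINK);
        System.out.println(QUOTED);
        System.out.print(TABBED);
    }
}
```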
New Escape Characters
With Java 14, two new escape characters were introduced. The first coincides with the backslash
\ symbol and allows you to ignore line terminators following that character on the same line. In fact, if we have a string that is too long for which we don't want to go to the next line, we usually use string concatenation to improve its readability. For example:
Now, we can rewrite the same string with the following text block:
The \ escape character can only be used within text blocks, and not in string literals.
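A minimal sketch of the two styles, with a hypothetical long sentence: the concatenated literals and the text block that uses the \ escape at the end of each line produce the same single-line string.

```java
class LineContinuationDemo {

    // Classic approach: concatenation of string literals.
    static final String CONCATENATED =
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit, " +
        "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.";

    // Text block: each trailing backslash suppresses the line terminator after it.
    static final String CONTINUED = """
        Lorem ipsum dolor sit amet, consectetur adipiscing elit, \
        sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\
        """;

    public static void main(String[] args) {
        System.out.println(CONTINUED);
        System.out.println(CONCATENATED.equals(CONTINUED)); // prints true
    }
}
```

The space before the first backslash is kept, because it is no longer the last character of its line.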
The other escape character introduced with Java 14 is
\s, and unlike the escape character
\, it can now also be used in string literals. It is equivalent to the space character (identified with the code point
\u0020), but used within a text block, it prevents the removal of white spaces at the end of the line that we have described in the "Removing superfluous white spaces" section. For example, we can write a text block, where each line always consists of 4 spaces. Figure 2 shows the detail of the execution with EJE (a Java editor created by me) of a program that uses this text block, where the selection of the output highlights the spaces that have been kept at the end of the line.
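As a minimal sketch of the same idea, the hypothetical text block below pads every line to six characters. The \s at the end of each line becomes a space and also protects the spaces before it from being removed.

```java
class TrailingSpaceDemo {

    // Without the final \s, the trailing spaces of each line would be stripped.
    static final String COLORS = """
        red  \s
        green\s
        blue \s
        """;

    public static void main(String[] args) {
        // Print each line between brackets to make the trailing spaces visible.
        COLORS.lines().forEach(line -> System.out.println("[" + line + "]"));
    }
}
```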
Text Block Concatenation
It is technically possible to concatenate text blocks with other text blocks, string literals, variables, or method calls. Basically, text blocks can be used in all cases where String literals can be used.
Notice how we used concatenation to parameterize the function name.
The output will be as follows:
But the readability of the code is not very good! So, let's try to use text blocks to improve the readability, like so:
The output will be identical to the previous one, but the readability has become even worse! In fact, each text block spans at least two lines, given the way the opening delimiter must be defined.
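The two approaches can be sketched as follows, assuming a hypothetical JavaScript function whose name is passed as a parameter. The \s escape keeps the space after the word function, which would otherwise be stripped as trailing white space.

```java
class ConcatenationDemo {

    // String literals concatenated with a variable.
    static String withLiterals(String name) {
        return "function " + name + "() {\n" +
               "    console.log(\"Hello\");\n" +
               "}\n";
    }

    // The same result with two text blocks around the variable:
    // each block needs its own opening delimiter line.
    static String withTextBlocks(String name) {
        return """
               function\s""" + name + """
               () {
                   console.log("Hello");
               }
               """;
    }

    public static void main(String[] args) {
        System.out.print(withTextBlocks("greet"));
        System.out.println(withLiterals("greet").equals(withTextBlocks("greet"))); // prints true
    }
}
```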
In cases like this, it is preferable to use a single text block on which to call the
replace method of the
String class, for example, as in the following snippet:
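A minimal sketch of this approach, where the $name placeholder is a hypothetical convention chosen for this example and has no special meaning in Java:

```java
class ReplaceDemo {

    // A single text block holds the whole template.
    static final String TEMPLATE = """
        function $name() {
            console.log("Hello");
        }
        """;

    // Substitutes the placeholder with the actual function name.
    static String render(String name) {
        return TEMPLATE.replace("$name", name);
    }

    public static void main(String[] args) {
        System.out.print(render("greet"));
    }
}
```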
More simply, we can use the new
formatted method of the
String class introduced with Java 13, as follows:
We can see how the
formatted method has the same functionality on text blocks that the
format method has on
String literals. In fact, the above code is equivalent to the following snippet:
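The equivalence can be sketched as follows, using a hypothetical template with a single %s placeholder. Both methods are expected to return the same string.

```java
class FormattedDemo {

    static final String TEMPLATE = """
        function %s() {
            console.log("Hello");
        }
        """;

    // Instance method, convenient to chain directly on a text block.
    static String viaFormatted(String name) {
        return TEMPLATE.formatted(name);
    }

    // Classic static method, with the template passed as first argument.
    static String viaFormat(String name) {
        return String.format(TEMPLATE, name);
    }

    public static void main(String[] args) {
        System.out.print(viaFormatted("greet"));
        System.out.println(viaFormatted("greet").equals(viaFormat("greet"))); // prints true
    }
}
```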
Even ignoring the increased security offered by the latest versions of the JDK, there are plenty of reasons to upgrade your knowledge of Java, or at least your own Java runtime installations. My book "Java for Aliens", which inspired the "Going beyond Java 8" series, contains all the information you need to learn Java from scratch, and uses a teaching method refined over 20 years of experience, which makes learning simple and exciting. It is also structured to let you explore topics in depth and gain the kind of knowledge that can make a difference to your career.
This article is mainly inspired by section 13.5.5 of chapter 13 of my book Java for Aliens. You can freely download this section as a PDF file from the "Samples" section of the official website at https://www.javaforaliens.com. In that sample, you can also find a brief explanation of the new String methods that support text blocks.
Published at DZone with permission of Claudio De Sio Cesari. See the original article here.