Java JEP 400 Explained: Why UTF-8 Became the Default Charset
JEP 400 standardizes UTF-8 as Java’s default charset from JDK 18 onward, ensuring consistent file encoding across platforms and fewer cross-OS bugs.
A JDK Enhancement Proposal (JEP) is a formal document used to propose and track improvements to the Java Development Kit. The JEP process ensures that enhancements are thoughtfully planned, reviewed, and integrated, keeping the JDK modern, consistent, and sustainable over time. Since the process began, many JEPs have introduced significant language and runtime features that shape the evolution of Java. One such proposal, JEP 400, delivered in JDK 18 in 2022, makes UTF-8 the default charset of the standard Java APIs, addressing long-standing issues with platform-dependent encoding and improving Java’s cross-platform reliability.
Java’s character-stream I/O classes, introduced in JDK 1.1, include FileReader and FileWriter for reading and writing text files. These classes use a Charset to convert between bytes and characters. When a charset is explicitly passed to the constructor, as in:
public FileReader(File file, Charset charset) throws IOException
public FileWriter(String fileName, Charset charset) throws IOException
the classes use that charset when converting between bytes and characters. However, they also provide constructors that don’t take a charset:
public FileReader(String fileName) throws IOException
public FileWriter(String fileName) throws IOException
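For comparison, here is a minimal sketch that never consults the default charset, because it passes StandardCharsets.UTF_8 to the charset-aware constructors on both the writing and the reading side. These overloads were added in JDK 11; on older JDKs the equivalent is wrapping a stream in an OutputStreamWriter or InputStreamReader with an explicit charset. The class name ExplicitCharsetRoundTrip and the use of a temporary file are illustrative choices, not part of the original example.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class ExplicitCharsetRoundTrip {

    // Writes text to a temporary file and reads it back, naming UTF-8 explicitly
    // on both sides so the JVM's default charset is never used.
    static String roundTrip(String text) {
        try {
            File file = Files.createTempFile("jep400-", ".txt").toFile();
            file.deleteOnExit();
            try (FileWriter fw = new FileWriter(file, StandardCharsets.UTF_8)) {
                fw.write(text);
            }
            StringBuilder sb = new StringBuilder();
            try (FileReader fr = new FileReader(file, StandardCharsets.UTF_8)) {
                int c;
                while ((c = fr.read()) != -1) {
                    sb.append((char) c);
                }
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "résumé", spelled with Unicode escapes so the literal does not depend
        // on the encoding javac uses to read this source file
        System.out.println(roundTrip("r\u00e9sum\u00e9"));
    }
}
```

Because both constructors receive the same charset, the result should be identical whatever -Dfile.encoding or platform locale the JVM starts with.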
In these cases, Java falls back to the default charset, which before JDK 18 was platform-dependent. The JDK 17 documentation for Charset.defaultCharset() states:
"The default charset is determined during virtual-machine startup and typically depends upon the locale and charset of the underlying operating system."
This behavior can lead to bugs when a file is written with one charset and read with another, which is especially likely when the writing and reading programs run in different environments.
To address this inconsistency, JEP 400 proposed using UTF-8 as the default charset when none is explicitly provided. This change makes Java applications more predictable and less error-prone, especially in cross-platform environments.
The JDK 18 documentation for Charset.defaultCharset() now reads:
"The default charset is UTF-8, unless changed in an implementation-specific manner."
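Importantly, this update doesn’t remove the ability to control the charset. Developers can still pass one explicitly to constructors and methods, and the -Dfile.encoding system property can still override the default. In particular, -Dfile.encoding=COMPAT restores the pre-JDK 18, platform-dependent behavior.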
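To check which default a particular JVM actually picked up, a small diagnostic like the following sketch can help. It reports the default charset alongside the file.encoding property and native.encoding, a property added in JDK 17 that exposes the platform’s encoding independently of the default charset (it is absent, so null, on older JDKs). The class name CharsetInfo is hypothetical.

```java
import java.nio.charset.Charset;

public class CharsetInfo {

    // Collects the charset-related settings that influence text I/O when no
    // charset is passed explicitly.
    static String describe() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding")
                + ", native.encoding=" + System.getProperty("native.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run plainly on JDK 18 or later, it should report UTF-8 as the default charset; run with java -Dfile.encoding=COMPAT, the default should instead match the native.encoding value.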
Let’s see the problem in action with an example:
package com.jep400;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.Charset;
public class WritesFiles {
    public static void main(String[] args) {
        System.out.println("Current Encoding: " + Charset.defaultCharset().displayName());
        writeFile();
    }
    private static void writeFile() {
        // No charset argument: FileWriter encodes with the JVM's default charset
        try (FileWriter fw = new FileWriter("fw.txt")) {
            fw.write("résumé");
            System.out.println("Completed file writing.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In writeFile, we use a FileWriter constructor that does not take a charset parameter.
As a result, the JDK falls back on the default charset. On JDK 17 and earlier, that default is taken from the -Dfile.encoding JVM argument if one is given, and is otherwise derived from the platform’s locale.
The program writes the text "résumé", which contains the non-ASCII character é, to a file. To simulate a charset mismatch, we run the program with a specific encoding:
java -Dfile.encoding=ISO-8859-1 com.jep400.WritesFiles
Here, we’re explicitly setting the character set to ISO-8859-1 to mimic running the program on a system where the default charset is ISO-8859-1 and no charset is passed programmatically.
When executed, the program produces the following output:
Current Encoding: ISO-8859-1
Completed file writing.
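Because ISO-8859-1 stores é as a single byte while UTF-8 needs two, the bytes in fw.txt depend on the charset in effect when it was written. The following sketch encodes the same string both ways and prints the bytes in hex, without touching the file system. The class name EncodedBytes is illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedBytes {

    // Encodes text with the given charset and formats each byte as two
    // lowercase hex digits, separated by spaces.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "r\u00e9sum\u00e9"; // "résumé"
        System.out.println("ISO-8859-1: " + hex(text, StandardCharsets.ISO_8859_1)); // 72 e9 73 75 6d e9
        System.out.println("UTF-8:      " + hex(text, StandardCharsets.UTF_8));      // 72 c3 a9 73 75 6d c3 a9
    }
}
```

The ISO-8859-1 line corresponds to the bytes the first program writes when run with -Dfile.encoding=ISO-8859-1.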
After the program completes, the file fw.txt exists in the working directory.
Next, let’s look at a program that reads fw.txt using a different encoding.
package com.jep400;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
public class ReadsFiles {
    public static void main(String[] args) {
        System.out.println("Current Encoding: " + Charset.defaultCharset().displayName());
        readFile();
    }
    private static void readFile() {
        // No charset argument: FileReader decodes with the JVM's default charset
        try (FileReader fr = new FileReader("fw.txt")) {
            int character;
            while ((character = fr.read()) != -1) {
                System.out.print((char) character);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
In the readFile method, we use a FileReader constructor that does not specify a character set.
To simulate running the program on a platform with a different default character set, we pass a VM argument:
java -Dfile.encoding=UTF-8 com.jep400.ReadsFiles
The following output will be displayed when running this command:
Current Encoding: UTF-8
r�sum�
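The output does not match what the first program wrote. In ISO-8859-1, each é is stored as the single byte 0xE9. On its own, that byte is not a valid UTF-8 sequence, so the UTF-8 decoder behind FileReader replaces it with the Unicode replacement character U+FFFD, displayed as �.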
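The same corruption can be reproduced in memory, without files or JVM flags, by decoding bytes with the wrong charset. This sketch shows both directions: ISO-8859-1 bytes decoded as UTF-8 (the situation above) and UTF-8 bytes decoded as ISO-8859-1. The class name MismatchDemo is illustrative.

```java
import java.nio.charset.StandardCharsets;

public class MismatchDemo {

    static final String TEXT = "r\u00e9sum\u00e9"; // "résumé"

    // Written as ISO-8859-1, read as UTF-8: each lone 0xE9 byte is malformed
    // UTF-8, so the decoder substitutes U+FFFD for it.
    static String latin1ReadAsUtf8() {
        return new String(TEXT.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Written as UTF-8, read as ISO-8859-1: each two-byte sequence C3 A9
    // becomes two separate characters, Ã and ©.
    static String utf8ReadAsLatin1() {
        return new String(TEXT.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(latin1ReadAsUtf8());
        System.out.println(utf8ReadAsLatin1());
    }
}
```

The first method yields "r", U+FFFD, "sum", U+FFFD, matching the output above; the second yields the familiar mojibake "rÃ©sumÃ©".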
This highlights the risk of relying on the platform’s default character set instead of explicitly specifying one when reading and writing files.
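Because FileReader silently substitutes U+FFFD for malformed input, corruption like this can go unnoticed. One alternative, sketched below, is to decode through a CharsetDecoder configured with CodingErrorAction.REPORT, so that a charset mismatch raises an exception instead. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {

    // Decodes bytes as UTF-8, throwing instead of substituting U+FFFD
    // when the input is malformed or unmappable.
    static String decodeUtf8Strict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Convenience check: true if the bytes form valid UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeUtf8Strict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "résumé" encoded as ISO-8859-1, as the first program wrote it
        byte[] latin1Bytes = "r\u00e9sum\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        try {
            System.out.println(decodeUtf8Strict(latin1Bytes));
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}
```

To apply the same policy while reading a file, such a decoder can be passed to the InputStreamReader(InputStream, CharsetDecoder) constructor.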
The same incorrect output can appear even without any JVM flags, for example:
- When the programs run on different machines with different default character sets.
- When a file written on JDK 17 or earlier under a non-UTF-8 platform default (such as ISO-8859-1 or Windows-1252) is read on JDK 18 or later, where the default is UTF-8.
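For the upgrade scenario, one option on JDK 17 and later is to read files produced under the old platform default by naming that charset explicitly, using the native.encoding system property. This sketch assumes the legacy files were written with the current machine’s native encoding; that is not the case in the simulated example above, where the encoding was forced with -Dfile.encoding. The names LegacyCharset and legacyCharset are hypothetical.

```java
import java.nio.charset.Charset;

public class LegacyCharset {

    // Returns the platform's native charset, which was the default charset
    // before JDK 18. native.encoding exists since JDK 17; if it is missing or
    // unsupported, this falls back to Charset.defaultCharset(), which on
    // pre-JDK 18 runtimes is normally derived from the platform.
    static Charset legacyCharset() {
        String nativeEncoding = System.getProperty("native.encoding");
        if (nativeEncoding != null && Charset.isSupported(nativeEncoding)) {
            return Charset.forName(nativeEncoding);
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("Reading legacy files as: " + legacyCharset());
    }
}
```

A reader for such a file could then be opened with new FileReader("fw.txt", LegacyCharset.legacyCharset()), using the FileReader(String, Charset) constructor available since JDK 11.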
Now, let’s run the same two programs on JDK 18 or later, relying on the JDK’s default rather than passing -Dfile.encoding.
The first program prints:
Current Encoding: UTF-8
Completed file writing.
When the second program is run, the output appears as follows:
Current Encoding: UTF-8
résumé
Both programs now report UTF-8 as the default charset, so the text written by the first program is read back correctly by the second, even though neither program names a charset.
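Relying on the new default works, but naming the charset keeps the code correct even if someone starts the JVM with -Dfile.encoding=COMPAT. As a sketch, the two example programs could use java.nio.file.Files (JDK 11+), whose readString and writeString methods already used UTF-8 by default before JEP 400; here the charset is passed explicitly anyway. The class name FilesRoundTrip and the temporary file are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FilesRoundTrip {

    // Writes and reads text with an explicit UTF-8 charset via the NIO Files
    // API, deleting the temporary file afterwards.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("fw-", ".txt");
            try {
                Files.writeString(file, text, StandardCharsets.UTF_8);
                return Files.readString(file, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "résumé" should come back unchanged regardless of the default charset
        System.out.println(roundTrip("r\u00e9sum\u00e9"));
    }
}
```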
Conclusion
Since its introduction in JDK 18, JEP 400’s adoption of UTF-8 as the default charset has been a foundational improvement for Java applications. By standardizing on UTF-8, it removes a common source of charset-related bugs that appeared when the same code ran on platforms with different default encodings. Developers should still specify charsets explicitly where it matters, for example when reading files produced by older systems, but a predictable UTF-8 default improves cross-platform compatibility and helps future-proof applications as the Java ecosystem evolves.