Java JEP 400 Explained: Why UTF-8 Became the Default Charset

JEP 400 standardizes UTF-8 as Java’s default charset from JDK 18 onward, ensuring consistent file encoding across platforms and fewer cross-OS bugs.

By Ramana Singaperumal · Aug. 15, 25 · Analysis

A JDK Enhancement Proposal (JEP) is the formal mechanism for proposing and documenting improvements to the Java Development Kit. The process ensures that enhancements are planned, reviewed, and integrated in a way that keeps the JDK modern, consistent, and sustainable. Many JEPs have introduced significant language and runtime features. One of them, JEP 400, delivered in JDK 18 in 2022, makes UTF-8 the default charset, addressing long-standing problems with platform-dependent encoding and improving Java’s cross-platform reliability.

Java’s I/O API has included classes such as FileReader and FileWriter for reading and writing text files since JDK 1.1. These classes rely on a Charset to convert between bytes and characters. When a charset is passed explicitly to the constructor (an overload available since JDK 11), as in:

Java
 
public FileReader(File file, Charset charset) throws IOException  
public FileWriter(String fileName, Charset charset) throws IOException


the API uses that charset for file operations. However, these classes also provide constructors that don’t take a charset:

Java
 
public FileReader(String fileName) throws FileNotFoundException
public FileWriter(String fileName) throws IOException


In these cases, Java defaults to the platform’s character set. As per the JDK 17 documentation:

"The default charset is determined during virtual-machine startup and typically depends upon the locale and charset of the underlying operating system."

This behavior can lead to bugs when a file is written with one character set and read with another, especially when the two steps happen in different environments.

To address this inconsistency, JEP 400 proposed using UTF-8 as the default charset when none is explicitly provided. This change makes Java applications more predictable and less error-prone, especially in cross-platform environments.

As noted in the JDK 18 API:

"The default charset is UTF-8, unless changed in an implementation-specific manner."

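Importantly, this update doesn’t remove the ability to specify a charset. Developers can still pass one to the constructors, and the default itself can be controlled with the file.encoding system property: on JDK 18 and later, -Dfile.encoding=COMPAT restores the earlier platform-derived default, while -Dfile.encoding=UTF-8 selects UTF-8 explicitly.

Let’s look at the problem using an example: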
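As a minimal sketch of the explicit-charset option, the class below writes and reads a file with the charset-taking FileWriter and FileReader constructors, so the result does not depend on the JVM default. The class name ExplicitCharsetExample and its helper methods are illustrative and not part of the original example, and the text is written with Unicode escapes so the sketch does not depend on the encoding of the source file.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetExample {

    // Writes the text using the given charset instead of the JVM default.
    public static void write(Path file, String text, Charset charset) throws IOException {
        try (FileWriter fw = new FileWriter(file.toFile(), charset)) {
            fw.write(text);
        }
    }

    // Reads the whole file back using the given charset.
    public static String read(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (FileReader fr = new FileReader(file.toFile(), charset)) {
            int c;
            while ((c = fr.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "r\u00e9sum\u00e9"; // "résumé" written with Unicode escapes
        Path file = Files.createTempFile("explicit-charset", ".txt");
        try {
            write(file, text, StandardCharsets.UTF_8);
            String roundTrip = read(file, StandardCharsets.UTF_8);
            System.out.println("Round trip intact: " + roundTrip.equals(text));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because both sides name the same charset, the round trip should behave the same on any JDK from 11 onward, whatever -Dfile.encoding is set to.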

Java
 
package com.jep400;

import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.Charset;

public class WritesFiles {

    public static void main(String[] args) {
        System.out.println("Current Encoding: " + Charset.defaultCharset().displayName());
        writeFile();
    }

    private static void writeFile() {
        // No charset argument: the writer encodes with the JVM's default charset.
        try (FileWriter fw = new FileWriter("fw.txt")) {
            fw.write("résumé");
            System.out.println("Completed file writing.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}


In the method writeFile, we used a FileWriter constructor that does not take a character set as a parameter.

As a result, on JDK 17 and earlier the JDK falls back on the default character set, which is either taken from the -Dfile.encoding JVM argument or derived from the platform’s locale.

The program writes a file containing some text. To simulate a character set mismatch, we run the program with a specific encoding:

java -Dfile.encoding=ISO-8859-1 com.jep400.WritesFiles

Here, we’re explicitly setting the character set to ISO-8859-1 to mimic running the program on a system where the default charset is ISO-8859-1 and no charset is passed programmatically.

When executed, the program produces the following output:

Java
 
Current Encoding: ISO-8859-1
Completed file writing.


The program creates a file named fw.txt in the working directory.

Next, let’s look at a program that reads that file, this time under a different default encoding.

Java
 
package com.jep400;

import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;

public class ReadsFiles {

    public static void main(String[] args) {
        System.out.println("Current Encoding: " + Charset.defaultCharset().displayName());
        readFile();
    }

    private static void readFile() {
        // No charset argument: the reader decodes with the JVM's default charset.
        try (FileReader fr = new FileReader("fw.txt")) {
            int character;
            while ((character = fr.read()) != -1) {
                System.out.print((char) character);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


In the readFile method, we use a FileReader constructor that does not specify a character set.

To simulate running the program on a platform with a different default character set, we pass a VM argument:

java -Dfile.encoding=UTF-8 com.jep400.ReadsFiles

The following output will be displayed when running this command:

Java
 
Current Encoding: UTF-8
r�sum�


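The output does not match what the first program wrote. ISO-8859-1 stores é as the single byte 0xE9, which is not a valid UTF-8 sequence on its own, so the UTF-8 decoder substitutes the replacement character (�) for each é.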
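The same mismatch can be reproduced in memory, without files or JVM flags. The sketch below encodes the text as ISO-8859-1 and decodes the bytes as UTF-8, which is what the two programs above did between them. The class name MismatchDemo and its method are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class MismatchDemo {

    // Encodes the text as ISO-8859-1, then decodes those bytes as UTF-8.
    // Malformed UTF-8 input is replaced with U+FFFD rather than rejected.
    public static String writeLatin1ReadUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "r\u00e9sum\u00e9"; // "résumé" written with Unicode escapes

        // ISO-8859-1 uses one byte per character; UTF-8 needs two bytes for each é.
        System.out.println("ISO-8859-1 bytes: "
                + original.getBytes(StandardCharsets.ISO_8859_1).length); // 6
        System.out.println("UTF-8 bytes: "
                + original.getBytes(StandardCharsets.UTF_8).length);      // 8

        String garbled = writeLatin1ReadUtf8(original);
        // Each é comes back as the replacement character U+FFFD.
        System.out.println("Matches original: " + garbled.equals(original)); // false
        System.out.println("Equals r\\uFFFDsum\\uFFFD: "
                + garbled.equals("r\uFFFDsum\uFFFD")); // true
    }
}
```

The decode step raises no exception: the damage is silent, which is why this kind of bug often surfaces only when someone notices the garbled text.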

This highlights the problem of not explicitly specifying the character set when reading and writing files, instead relying on the platform’s default character set.

The same incorrect output can appear in the following scenarios:

  1. When the programs run on different machines with different default character sets.
  2. When an application is upgraded to JDK 18 or later and reads files that an earlier JDK wrote using a non-UTF-8 platform default.
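For the upgrade scenario, one possible approach is to read legacy files with the charset the operating system reports rather than the new UTF-8 default. The sketch below relies on the native.encoding system property, which is available from JDK 17 onward; the class name LegacyCharset and the fallback to Charset.defaultCharset() are assumptions made for illustration, not part of the original article.

```java
import java.nio.charset.Charset;

public class LegacyCharset {

    // Returns the charset derived from the operating system's locale settings,
    // which is what JDK 17 and earlier would typically have used as the default.
    public static Charset platformCharset() {
        String name = System.getProperty("native.encoding"); // null before JDK 17
        if (name != null) {
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("Default charset:  " + Charset.defaultCharset());
        System.out.println("Platform charset: " + platformCharset());
    }
}
```

A legacy file could then be opened with new FileReader(fileName, LegacyCharset.platformCharset()). This only helps when the file was written on a machine with the same platform encoding as the one reading it.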

Now, let’s see how the output looks when the same programs run in a JDK 18+ environment without any -Dfile.encoding argument.

When running the first program, this output is observed:

Java
 
Current Encoding: UTF-8
Completed file writing.


When the second program is run, the output appears as follows:

Java
 
Current Encoding: UTF-8
résumé


We can see that the data is written and read using the standard UTF-8 character set, effectively resolving the character-set issues encountered earlier.

Conclusion

Since its delivery in JDK 18, JEP 400’s adoption of UTF-8 as the default charset has removed a common source of charset-related bugs for code that runs across different platforms. Developers should still specify charsets explicitly when a file’s encoding is known or when code must also run on JDK 17 or earlier, but relying on UTF-8 as the default improves cross-platform consistency and reduces surprises as applications move between environments.
