Undocumented Java 16 Feature: The End-of-File Comment
A Unicode escape (\u001a) in Java 16 acts like an end-of-file comment, silently cutting off compilation—an undocumented quirk with real implications.
While working on some code where I wanted to obscure parts of it using Unicode escapes instead of the actual source, I accidentally stumbled upon an undocumented feature that’s been around since Java 16: what I call the end-of-file comment.
In Java, we typically have three types of comments:
- Single-line comment: starts with // and runs to the end of the line.
- Block comment: starts with /* and ends with */. It can span multiple lines.
- Documentation comment: starts with /** and ends with */. This is a special kind of block comment used by the javadoc tool to generate documentation for classes, methods, fields, constructors, and so on. It must appear right before the element it describes.
End-of-File Comment
The end-of-file comment makes use of the end-of-file character, which causes everything after it in the file to be ignored by the compiler.
The Unicode escape for the end-of-file character is \u001a.
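To get a feel for what kind of character this is, here is a minimal sketch that prints how the standard java.lang.Character methods classify it. The class name EofCharacter and the helper describe are made up for illustration. The character is written as the number 0x1A rather than as a Unicode escape, because an escape is translated wherever it appears in a source file, including comments.

```java
class EofCharacter {
    // Code point of the end-of-file (SUB, Ctrl-Z) control character.
    // Written as a number on purpose, so this file contains no escape for it.
    static final int EOF_CHAR = 0x1A;

    /** Summarizes how java.lang.Character classifies the given code point. */
    static String describe(int cp) {
        return String.format(
            "U+%04X control=%b whitespace=%b ignorable=%b idStart=%b idPart=%b",
            cp,
            Character.isISOControl(cp),
            Character.isWhitespace(cp),
            Character.isIdentifierIgnorable(cp),
            Character.isJavaIdentifierStart(cp),
            Character.isJavaIdentifierPart(cp));
    }

    public static void main(String[] args) {
        // prints: U+001A control=true whitespace=false ignorable=true idStart=false idPart=true
        System.out.println(describe(EOF_CHAR));
    }
}
```

Note that the end-of-file character is itself identifier-ignorable, which becomes relevant later in this article.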
Consider the following sample Hello.java file:
class HelloWorld {
    public static void main(String[] args) {
        System.out.printf("Hello world\n");
    }
}
interface HelloInterface {
    public static void main(String[] args) {
        System.out.println("Hello from interface");
    }
}
enum HelloEnum {
    HELLO,
    HI,
    ;
    public static void main(String[] args) {
        System.out.println("Hello from enum");
    }
}
record HelloRecord() {
    public static void main(String[] args) {
        System.out.println("Hello from record");
    }
}
@interface HelloAnnotation {
}
In the above Hello.java file, if we want to comment out the last two definitions (i.e., HelloRecord and HelloAnnotation), then we can use the end-of-file character before the HelloRecord definition, like so:
class HelloWorld {
    public static void main(String[] args) {
        System.out.printf("Hello world\n");
    }
}
interface HelloInterface {
    public static void main(String[] args) {
        System.out.println("Hello from interface");
    }
}
enum HelloEnum {
    HELLO,
    HI,
    ;
    public static void main(String[] args) {
        System.out.println("Hello from enum");
    }
}
\u001a The rest of the file content gets commented out
record HelloRecord() {
    public static void main(String[] args) {
        System.out.println("Hello from record");
    }
}
@interface HelloAnnotation {
}
This end-of-file character works as the start of a comment up to the end of the file, starting from Java 16.
Prior to Java 16, this character could only be used as the last character in a Java file—nothing was accepted beyond it.
In other words, in the above code, you’d get a compilation error after the usage of the end-of-file character if you were using any Java compiler prior to Java 16.
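If you want to try this on your own JDK, the following is a rough sketch that writes a small source file to a temporary directory and compiles it through the standard javax.tools API. The class name EofCommentDemo, the helper names, and the sample source are all made up for illustration. I deliberately don't state an expected result for the sample: based on the behavior described above, it should compile on Java 16 and later and fail on earlier compilers, but that depends on the JDK you run it with.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class EofCommentDemo {

    /** Hypothetical sample: a valid class, then the escape for U+001A, then text that is not Java. */
    static String sampleSource() {
        // The doubled backslash keeps the escape as plain text in this string,
        // so it only becomes a real escape inside the generated Hello.java.
        return "class Hello {}\n"
             + "\\u001a everything from here on is not valid Java\n";
    }

    /** Compiles the text as Hello.java in a temp directory; true means javac exited with 0. */
    static boolean compiles(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; run this on a JDK");
        }
        try {
            Path dir = Files.createTempDirectory("eof-comment");
            Path file = dir.resolve("Hello.java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));
            // Capture compiler messages instead of printing them.
            ByteArrayOutputStream messages = new ByteArrayOutputStream();
            int exitCode = compiler.run(null, messages, messages,
                                        "-d", dir.toString(), file.toString());
            return exitCode == 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Sample compiles: " + compiles(sampleSource()));
    }
}
```

Running main under different JDK versions is a quick way to see where the behavior changes.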
About Keywords and Identifiers
A few interesting observations about Java keywords:
- Prior to Java 9, all Java keywords were restricted identifiers (i.e., they follow the rules of identifiers but are restricted from being used as identifiers).
- From Java 9, with the introduction of module definitions in module-info.java, Java restricted usage of some additional identifiers in specific contexts, e.g., module in a module definition. Java preferred to call these restricted keywords.
- Up to Java 15, more identifiers were restricted in specific contexts. Java referred to these as restricted identifiers, though technically all keywords are also restricted identifiers (they’re restricted in all contexts).
Then an interesting thing happened in Java 16.
non-sealed: The Non-Identifier Keyword in Java 16
In Java 16, keywords were categorized into reserved keywords and contextual keywords.
- Reserved keywords are restricted from being used as identifiers everywhere.
- Contextual keywords are restricted in only certain contexts.
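As a small illustration of the difference, here is a sketch in which several contextual keywords are used as ordinary local variable names. The class and method names are made up. A reserved keyword such as class or int could not be used this way.

```java
class ContextualKeywords {
    /** Uses contextual keywords as plain local variable names. */
    static int sum() {
        int record = 1;   // contextual: only special where a record declaration is expected
        int sealed = 2;   // contextual: only special as a class or interface modifier
        int permits = 3;  // contextual: only special in a class or interface header
        int module = 4;   // contextual: only special inside module-info.java
        return record + sealed + permits + module;
    }

    public static void main(String[] args) {
        System.out.println(sum()); // prints 10
    }
}
```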
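One notable addition in Java 16 was the contextual keyword non-sealed. Unlike every other keyword, it isn’t a restricted identifier: because of the hyphen it doesn’t follow the rules of identifiers at all, and instead it looks like the subtraction expression non - sealed.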
Processing Identifier-Ignorable Characters
Another thing to note: when processing identifier-ignorable characters, the compiler treats all keywords as identifiers, and non-sealed is treated as two identifiers. In other words, identifier-ignorable characters are allowed inside keywords.
For example:
instance\u00adof
(\u00ad is the Unicode escape for the soft hyphen character, which is one of the identifier-ignorable characters). This is equivalent to writing instanceof.
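The equivalence can be modeled in plain Java by dropping every identifier-ignorable character from the text and comparing the result. This is only a sketch of the idea, not how javac is implemented, and the names IgnorableDemo and stripIgnorable are made up.

```java
class IgnorableDemo {
    /** Removes every identifier-ignorable code point from the given text. */
    static String stripIgnorable(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints()
            .filter(cp -> !Character.isIdentifierIgnorable(cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // The escape in this string literal becomes a real soft hyphen character.
        String spelled = "instance\u00adof";
        System.out.println(spelled.length());                             // prints 11
        System.out.println(stripIgnorable(spelled).equals("instanceof")); // prints true
    }
}
```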
The identifier-ignorable characters are discussed and listed in the article Charsets and Unicode Identifiers in Java.
All these characters are valid as java-identifier-part, but not as java-identifier-start.
This can be checked with the following code:
// requires: import java.util.stream.IntStream;
IntStream.rangeClosed(0, Character.MAX_CODE_POINT)
    .filter(Character::isIdentifierIgnorable)
    .allMatch(Character::isJavaIdentifierPart); // returns true
IntStream.rangeClosed(0, Character.MAX_CODE_POINT)
    .filter(Character::isIdentifierIgnorable)
    .anyMatch(Character::isJavaIdentifierStart); // returns false
So, in the case of non-sealed, an identifier-ignorable character is not allowed in two places:
- At the very beginning of the keyword (before non)
- At the beginning of sealed in non-sealed (right after the hyphen)
Acceptable:
no\u00adn-sealed
Not acceptable:
\u00adnon-sealed
non-\u00adsealed
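The rule above can be written down as a small checker. This is my own model of the observed behavior, not javac's actual logic, and the names NonSealedSpelling, isIdentifier, and looksLikeNonSealed are made up: each side of the hyphen must be a valid Java identifier on its own, and after dropping the ignorable characters the two sides must read non and sealed.

```java
class NonSealedSpelling {
    /** True if the text is a valid Java identifier (keywords are not excluded here). */
    static boolean isIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.codePointAt(0))) {
            return false;
        }
        return s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    /** Removes every identifier-ignorable code point from the given text. */
    static String stripIgnorable(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints()
            .filter(cp -> !Character.isIdentifierIgnorable(cp))
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    /** Models the rule: two identifiers around a hyphen that read "non" and "sealed". */
    static boolean looksLikeNonSealed(String s) {
        int hyphen = s.indexOf('-');
        if (hyphen < 0) {
            return false;
        }
        String left = s.substring(0, hyphen);
        String right = s.substring(hyphen + 1);
        return isIdentifier(left) && isIdentifier(right)
            && stripIgnorable(left).equals("non")
            && stripIgnorable(right).equals("sealed");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeNonSealed("no\u00adn-sealed")); // prints true
        System.out.println(looksLikeNonSealed("\u00adnon-sealed")); // prints false
        System.out.println(looksLikeNonSealed("non-\u00adsealed")); // prints false
    }
}
```

The two rejected spellings fail for the same reason: an identifier-ignorable character is a valid identifier part but never a valid identifier start.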
It seems this undocumented behavior, the end-of-file comment, may have been unintentionally introduced while Java was addressing the contextual keyword non-sealed, which is the only keyword that isn’t a restricted identifier.
Conclusion
Starting with Java 16, the Unicode end-of-file character (\u001a) began behaving in a way that causes the compiler to ignore the rest of the file after its occurrence. This isn't technically a comment, but it acts similarly by truncating further compilation. The change appears to be a side effect of internal updates to Java’s parsing and keyword handling, particularly related to contextual keywords like non-sealed. While useful in niche cases, this behavior remains undocumented and isn't compatible with earlier Java versions.
Because it allows developers to bypass compilation for any content following the character, it can affect code clarity, debugging, and various developer tools. Java maintainers may eventually need to either formalize this behavior by updating the Java Language Specification (JLS) or clarify its unintended status and consider restricting or deprecating it in future versions to avoid confusion or accidental misuse.