New Approach to Encoding in IntelliJ IDEA
The EAP of IntelliJ IDEA 8.0 came with numerous new features, and the new approach to encoding is one of them.
Formerly, encoding was a global setting that applied to all projects. Starting with the 8.0 EAP, IntelliJ IDEA recognizes the encoding of a file from its content (a BOM, an explicit encoding declaration, or UTF-encoded characters) and lets you configure encoding individually at the project level.
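To make content-based recognition more concrete, here is a minimal Python sketch of two of the checks mentioned above: looking for a BOM and testing whether non-ASCII bytes form valid UTF-8. It is only an illustration, not IntelliJ IDEA's actual algorithm; the function name `guess_encoding` and the `default` fallback are assumptions made for this example.

```python
import codecs

# Byte order marks, with UTF-32 first so that its little-endian BOM
# (FF FE 00 00) is not mistaken for the UTF-16 one (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes, default: str = "windows-1251") -> str:
    """Guess a charset from raw bytes; fall back to `default` (hypothetical helper)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Non-ASCII bytes that decode cleanly as UTF-8 are very likely UTF-8.
    if any(b >= 0x80 for b in data):
        try:
            data.decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            pass
    # Pure ASCII, or bytes that are not valid UTF-8: use the configured default.
    return default
```

Explicit declarations are a separate check and are not covered by this sketch.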
As a seasoned user of IntelliJ IDEA, you certainly have projects that you would like to reuse with the new version of the product. In previous versions of IntelliJ IDEA, the default encoding was used to open files that didn’t contain encoding information; now the files of a legacy project may get an encoding that is quite different from the old one. This is why thorough configuration of encoding becomes so important. Let’s deal with encoding hands-on.
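A short Python sketch shows why the charset used to open a legacy file matters. The sample word and the choice of charsets are assumptions for illustration only; the point is that the same bytes read differently, or fail to read at all, under a different encoding.

```python
text = "Привет"
raw = text.encode("windows-1251")  # bytes as a legacy project might have stored them

# Decoding with the charset the file was written in round-trips cleanly.
assert raw.decode("windows-1251") == text

# Decoding with a different single-byte charset silently yields wrong characters.
garbled = raw.decode("latin-1")
print(garbled)

# Strict UTF-8 decoding rejects these bytes outright.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print("valid UTF-8:", valid_utf8)
```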
First, let’s define the default encoding of a project in the File/Directory Options dialog (File | Settings - Project Settings). This default encoding is a character set that will be used to create new files, unless their templates contain embedded encoding information. For example, we’ll select windows-1251 as the default project encoding:
Moreover, an individual encoding can be defined for each directory, or even for a single file, as long as the file doesn’t contain encoding information of its own. Note that embedded encoding information overrides the project or directory settings.
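The precedence described above can be sketched as a small lookup. This is a hedged illustration, not the IDE's implementation: the function `effective_encoding`, the `overrides` dictionary, and the rule that the nearest directory setting wins are all assumptions made for this example.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def effective_encoding(
    path: str,
    embedded: Optional[str],
    overrides: Dict[str, str],
    project_default: str,
) -> str:
    """Pick the charset for `path` (hypothetical helper).

    embedded:  charset found inside the file (BOM or declaration), or None.
    overrides: per-file and per-directory charsets keyed by path.
    """
    # Embedded encoding information always wins.
    if embedded:
        return embedded
    p = PurePosixPath(path)
    # Check the file itself first, then each parent directory, nearest first
    # (assumed lookup order).
    for candidate in (p, *p.parents):
        charset = overrides.get(str(candidate))
        if charset:
            return charset
    return project_default
```

For example, with `overrides = {"src/docs": "UTF-8"}` and a project default of windows-1251, a plain text file under `src/docs` resolves to UTF-8, while a file elsewhere falls back to windows-1251.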
Now let’s see how IntelliJ IDEA treats the encoding of different files, and how you can modify it to best suit your purposes. In our sample project we’ll create a Java class, a plain text file, and HTML, XHTML, and JSP files: myClass.java, myTextFile.txt, myHtmlFile.html, myXhtmlFile.xhtml, and myJsp.jsp. All these files are created from templates, and their further behavior depends on whether encoding is defined in the file template.
Besides files of these types, there can be files whose encoding is defined by a BOM. It is rather unlikely that you will produce such files in IntelliJ IDEA, but they can come from third-party tools, or result from encoding operations in the file system, and end up as part of your project. In our example, this is the file myTextFileWithBOM.txt.
When you open these files for editing, the encoding of each one is shown in the Status bar, and you can see that changing the encoding is sometimes allowed and sometimes prohibited. Let us study our examples in detail:
- myClass.java, myHtmlFile.html, and myTextFile.txt have encoding windows-1251, which is the project default, and is editable both in the Status bar and in the File/Directory Options dialog.
The templates for Java, HTML, and text files do not contain embedded encoding information, so the files created from these templates take the project or directory default, which in our case is windows-1251. If for some reason you are not happy with this charset, just click it in the Status bar and select the desired one:
- myXhtmlFile.xhtml and myJsp.jsp have encoding UTF-8, and it is editable neither in the Status bar nor in the File/Directory Options dialog! What does this mean?
XHTML and JSP files contain an explicit encoding declaration that overrides the encoding settings defined at the directory or project level.
In this case, the only way to change the encoding is through the editor. Delete the encoding name in the source file, press Ctrl+Space, and enjoy IntelliJ IDEA’s powerful code completion.
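For reference, the explicit declarations in question are typically the XML prolog in XHTML and the page directive in JSP. The Python sketch below extracts the charset name from either form. It is an illustration only: the helper `declared_encoding` and its regular expressions are assumptions made for this example and are far simpler than a real parser.

```python
import re
from typing import Optional

# XML/XHTML prolog, e.g. <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']"""
)
# JSP page directive, e.g. <%@ page contentType="text/html;charset=UTF-8" %>
# or <%@ page pageEncoding="UTF-8" %>
JSP_DECL = re.compile(
    rb"""<%@\s*page[^>]*?(?:charset\s*=\s*|pageEncoding\s*=\s*["'])([A-Za-z0-9._-]+)"""
)


def declared_encoding(head: bytes) -> Optional[str]:
    """Return the charset named in the first bytes of a file, or None (hypothetical helper)."""
    for pattern in (XML_DECL, JSP_DECL):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii")
    return None
```

A Java or plain text file has no such declaration, so the function returns `None` for it and the project or directory setting would apply.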
- What happens when a file contains a BOM? You cannot change the encoding of such a file in the Status bar, in the Settings dialog, or in the editor.
However, if such a file is modified, you can save it in a different encoding. To do that, just open the file for editing and start typing. The encoding icon in the Status bar becomes enabled, and you can easily select the desired charset.
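Outside the IDE, saving a BOM-marked file in a different encoding amounts to decoding it, dropping the BOM, and re-encoding the text. The sketch below assumes the file is UTF-8 with a BOM; the helper `reencode` and the sample content are hypothetical, and this is not a description of what IntelliJ IDEA does internally.

```python
import codecs


def reencode(data: bytes, target: str) -> bytes:
    """Convert UTF-8-with-BOM bytes to `target`, dropping the BOM (hypothetical helper)."""
    # "utf-8-sig" decodes UTF-8 and strips a leading BOM if one is present.
    text = data.decode("utf-8-sig")
    # encode() raises UnicodeEncodeError if `target` cannot represent a character.
    return text.encode(target)


# Sample content standing in for myTextFileWithBOM.txt.
original = codecs.BOM_UTF8 + "Привет".encode("utf-8")
converted = reencode(original, "windows-1251")
```

Because the target charset may not cover every character in the file, a conversion like this can fail, which is worth checking before overwriting the original.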