To avoid Java/Tomcat Unicode issues after moving to a new environment, verify the locale settings, especially LC_ALL.
After migrating a complete Tomcat-based site as a cPanel tarball to another host, we lost the ability to download files containing Unicode characters in their names. These were mostly static resources (images).
Any attempt to access such a file resulted in a 404 Not Found error and a related entry in localhost_access_log:
GET /images/396596_%E5%BC%A0%E6%97%A5%E6%B4%B2%E5%8C%97%E9%A9%AC.jpg HTTP/1.1" 404 1174
Access to a file in the same directory whose name contained no non-ASCII characters was successful.
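To see which file name the browser was actually asking for, the percent-encoded path from the access log can be decoded by hand. The following is only a minimal sketch: the class name DecodeRequestPath is made up for illustration, and it assumes the request bytes are UTF-8, which is what the log entry suggests. It prints non-ASCII characters as escaped code points so the result stays readable even on a terminal with a broken locale.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of Tomcat or of the original site.
class DecodeRequestPath {

    // Decodes a percent-encoded path segment, interpreting the bytes as UTF-8.
    // Note that URLDecoder also turns '+' into a space; the sample has none.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = "396596_%E5%BC%A0%E6%97%A5%E6%B4%B2%E5%8C%97%E9%A9%AC.jpg";
        String decoded = decode(encoded);

        // Show ASCII as-is and everything else as a four-digit hex code point.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < decoded.length(); i++) {
            char c = decoded.charAt(i);
            sb.append(c < 128 ? String.valueOf(c) : String.format("\\u%04x", (int) c));
        }
        System.out.println(sb);
    }
}
```

The fifteen percent-encoded bytes decode to five Chinese characters (three UTF-8 bytes each), so the request itself was well formed.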
As everything was working on the old host and Tomcat was just copied (not freshly set up), it was not the common issue of a missing URIEncoding="UTF-8" connector attribute in Tomcat's server.xml, which gives similar effects. Finally, we resorted to comparing all (sorted) Java system properties from both hosts (diff -Nu oldhost newhost) and discovered a mismatch in file.encoding and sun.jnu.encoding. As these properties are set based on the LC_ALL variable from the system environment, we checked locale on the problematic account:
user@tomcat [~]# locale
LANG=
LC_CTYPE="POSIX"
...
LC_IDENTIFICATION="POSIX"
LC_ALL=
Well, that's not what we expected. Here is a quick JSP test (list.jsp, see code below) that lists all files in the ROOT/images directory and links to each of them so that we could quickly test accessibility. It also displays file.encoding and sun.jnu.encoding. The bad values showed up here:
file.encoding=ANSI_X3.4-1968
sun.jnu.encoding=ANSI_X3.4-1968
sun.io.unicode.encoding=UnicodeLittle
file.encoding.pkg=sun.io
396596_.jpg /home/tomcat/tomcat/webapps/ROOT/images/396596_.jpg --> exists? false
Accessing the image link results in Tomcat's message below. Note the series of %EF%BF%BD sequences, which is the UTF-8 encoding of the Unicode replacement character U+FFFD:
HTTP Status 404 - /396596_%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD.jpg
type Status report
description The requested resource (/396596_%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD.jpg) is not available.
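One plausible explanation for the fifteen %EF%BF%BD groups is that every byte of the UTF-8 file name was decoded with an ASCII-only charset, turned into a replacement character, and then percent-encoded again as UTF-8 by the browser. The sketch below reproduces that effect in isolation. The class name ReplacementCharDemo is hypothetical, and this is an illustration of the mechanism under that assumption, not a trace of what Tomcat does internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the original site.
class ReplacementCharDemo {

    // Decodes the UTF-8 bytes of a name with the wrong charset (US-ASCII),
    // then percent-encodes the damaged string as UTF-8.
    static String mangle(String name) {
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);
        // Each byte above 0x7F is malformed for US-ASCII and becomes one U+FFFD.
        String damaged = new String(utf8Bytes, StandardCharsets.US_ASCII);
        try {
            return URLEncoder.encode(damaged, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The five characters from the requested file name, written as escapes
        // so that this source file compiles regardless of its own encoding.
        String name = "\u5f20\u65e5\u6d32\u5317\u9a6c";
        System.out.println(mangle(name));
    }
}
```

Five characters of three UTF-8 bytes each give fifteen replacement characters, which matches the count in the 404 message.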
Appending -Dsun.jnu.encoding=UTF-8 -Dfile.encoding=UTF-8 to JAVA_OPTS does not help. You can also verify this with java ShowSystemProperties | grep encoding (code below). The working solution is to add export LC_ALL="en_US.UTF-8" to the environment (e.g. in ~/.bashrc), log in again or reload the environment, check the locale output and restart Tomcat. Setting LANG instead of LC_ALL seems to work fine too. Now list.jsp reports:
file.encoding=UTF-8
sun.jnu.encoding=UTF-8
sun.io.unicode.encoding=UnicodeLittle
file.encoding.pkg=sun.io
396596_.jpg /home/tomcat/tomcat/webapps/ROOT/images/396596_.jpg --> exists? true
and the image displays correctly when the link is clicked.
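Since the problem only shows up when a non-ASCII file name is requested, it may be worth checking the two properties right after a migration. The following is a minimal sketch of such a check; the class EncodingCheck and its method isUtf8 are hypothetical names introduced here. It compares charsets rather than strings, so aliases of UTF-8 are accepted as well.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sanity check, not part of Tomcat.
class EncodingCheck {

    // Returns true only if the given charset name resolves to UTF-8.
    static boolean isUtf8(String charsetName) {
        if (charsetName == null) {
            return false;
        }
        try {
            return Charset.forName(charsetName).equals(StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] keys = {"file.encoding", "sun.jnu.encoding"};
        for (String key : keys) {
            String value = System.getProperty(key);
            String verdict = isUtf8(value) ? "OK" : "NOT UTF-8";
            System.out.println(key + "=" + value + " (" + verdict + ")");
        }
    }
}
```

Run it as the same user and with the same environment that starts Tomcat; a different login shell may carry a different locale.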
Contents of list.jsp:
<%@page import="java.io.*" %>
<%@page contentType="text/html;charset=UTF-8"%>
<%
    // Resolve the on-disk location of the web application's images directory.
    ServletContext servletContext = getServletContext();
    String contextPath = servletContext.getRealPath("/");
    contextPath = contextPath + "/images";

    // Show the encoding-related system properties the JVM picked up at startup.
    out.println("file.encoding=" + System.getProperty("file.encoding") + "<br/>");
    out.println("sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding") + "<br/>");
    out.println("sun.io.unicode.encoding=" + System.getProperty("sun.io.unicode.encoding") + "<br/>");
    out.println("file.encoding.pkg=" + System.getProperty("file.encoding.pkg") + "<br/>");

    // List every file, link to it, and check whether Java can find it again by name.
    File f = new File(contextPath);
    String[] children = f.list();
    if (children != null) {
        for (int i = 0; i < children.length; i++) {
            String filename = children[i];
            out.print("<a href='" + filename + "'>" + filename + "</a> ");
            File cf = new File(contextPath + "/" + filename);
            out.println(cf.getAbsolutePath() + " --> <b>exists?</b> " + cf.exists() + "<br/>");
        }
    }
%>
Contents of ShowSystemProperties.java:
import java.util.Properties;

public class ShowSystemProperties {
    public static void main(String[] args) {
        // Get all system properties and print them to standard output
        Properties props = System.getProperties();
        props.list(System.out);
    }
}
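The comparison of system properties between the two hosts, done above with diff on sorted dumps, can also be sketched in Java. This is only an illustration: the class PropertyDiff is a hypothetical helper, and the sample values in main are assumed for the demo (the old host's actual values are not quoted in this post). It reports every key whose value differs, sorted by key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical helper for comparing two sets of system properties.
class PropertyDiff {

    // Returns one "key: old -> new" line per differing key, sorted by key.
    // A key missing on one side is reported with the value null.
    static List<String> diff(Map<String, String> oldHost, Map<String, String> newHost) {
        TreeSet<String> keys = new TreeSet<>(oldHost.keySet());
        keys.addAll(newHost.keySet());

        List<String> changes = new ArrayList<>();
        for (String key : keys) {
            String before = oldHost.get(key);
            String after = newHost.get(key);
            if (!Objects.equals(before, after)) {
                changes.add(key + ": " + before + " -> " + after);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Assumed sample data; in practice these maps would be filled from
        // the properties collected on each host.
        Map<String, String> oldHost = new HashMap<>();
        oldHost.put("file.encoding", "UTF-8");
        oldHost.put("sun.jnu.encoding", "UTF-8");

        Map<String, String> newHost = new HashMap<>();
        newHost.put("file.encoding", "ANSI_X3.4-1968");
        newHost.put("sun.jnu.encoding", "ANSI_X3.4-1968");

        for (String line : diff(oldHost, newHost)) {
            System.out.println(line);
        }
    }
}
```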