Tomcat Character Sets and Fixing Garbled Chinese Text

If you do Java web development, you have surely run into garbled text (mojibake)!

Is this how you usually handle Chinese parameters?

Front end:

url=encodeURI(url);

Back end:

String name = new String(request.getParameter("name").getBytes("ISO-8859-1"),"UTF-8");

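Pages: a round of setting the encoding to UTF-8 or GB2312.

Tomcat: UTF-8 everywhere.

Then you deploy, test, and find: garbled text, why are you still clinging to me? I am not Tang Seng, and there is no meat on me worth chasing!

Garbled text is a bug, and a really nasty one! I have hit it many times. Each time I agonized over it, went through a round of conversions, and then confidently told myself and my colleagues that I had solved it. Then one day a bug report from the testing team landed in my inbox, asking why the logs were full of garbled text. My colleagues all turned to look at me, and all I could do was sit there red-faced.

Recently I had some free time, read a few articles, and pulled together some notes. I still favor the material pasted below (most of it is other people's work, so calling this a repost is more accurate). I am not sure whether the problem has been eradicated, but I do know that through two rounds of testing there has been no trace of garbled text. Perhaps it really is gone? I do not know.

Right, on to the content:

=============================================================================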

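Anyone who uses Tomcat will run into garbled Chinese text. The typical symptoms are:
1) Chinese data read from a form comes back garbled
2) Chinese data submitted from a page is received garbled on the server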

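1. Basic solution
After some searching, many people settle on the following approach: re-encode the received string into ISO-8859-1 bytes to get back the raw bytes, then decode those bytes with the encoding the client actually used (GB2312 or UTF-8), which yields the correct content. Sample code:
Page passes the parameter: http://xxx.do?ptname='我是中国人'
Server-side conversion: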

String strPtname = request.getParameter("ptname");
strPtname = new String(strPtname.getBytes("ISO-8859-1"), "UTF-8");
String para = new String( request.getParameter("para").getBytes("iso8859-1"), "gb2312");

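The underlying cause is that Tomcat's authors made ISO-8859-1 the default encoding.
However, our servlets and JSP pages pass a large number of parameters, and converting each one this way means a great deal of conversion code, which is very inconvenient.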
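
To make the round trip above concrete, here is a minimal sketch that needs no servlet container. It assumes the client sent UTF-8 bytes and that the server decoded them as ISO-8859-1; the class and method names (MojibakeRoundTrip, misdecode, recover) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRoundTrip {
    // Simulates a container that decodes UTF-8 request bytes as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The manual fix: get the raw bytes back, then decode them as UTF-8.
    static String recover(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // the two characters of "Chinese text"
        String garbled = misdecode(original);
        System.out.println(garbled.length());                  // 6: one char per UTF-8 byte
        System.out.println(recover(garbled).equals(original)); // true
    }
}
```

The recovery works because ISO-8859-1 maps every byte value to exactly one character, so the wrong decoding loses no information. It only works if the second charset matches what the client really sent.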

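2. Entry-level solution
Later, people started writing a filter that sets the request encoding to GB2312 before any parameter is read, so that getParameter returns the correct value directly. Tomcat's sample application jsp-examples contains a detailed example of such a filter. Its web.xml configuration is shown below; the sample uses a Japanese encoding, and we only need to change it to GB2312.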

<filter>
    <filter-name>Set Character Encoding</filter-name>
    <filter-class>filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>EUC_JP</param-value>
    </init-param>
</filter>

The filter code is as follows:

package filters;

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class SetCharacterEncodingFilter implements Filter {
    // Encoding to apply to requests
    protected String encoding = null;
    // Filter configuration
    protected FilterConfig filterConfig = null;
    // Whether to override an encoding specified by the client
    protected boolean ignore = true;

    // Destroy the filter
    public void destroy() {
        this.encoding = null;
        this.filterConfig = null;
    }

    // Filter method
    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain)
            throws IOException, ServletException {
        // Apply the configured encoding if the client's encoding is ignored
        // or the client did not specify one
        if (ignore || (request.getCharacterEncoding() == null)) {
            String encoding = selectEncoding(request);
            if (encoding != null)
                request.setCharacterEncoding(encoding);
        }
        // Pass the request on to the next filter
        chain.doFilter(request, response);
    }

    // Initialize the filter
    public void init(FilterConfig filterConfig) throws ServletException {
        this.filterConfig = filterConfig;
        this.encoding = filterConfig.getInitParameter("encoding");
        String value = filterConfig.getInitParameter("ignore");
        if (value == null)
            this.ignore = true;
        else if (value.equalsIgnoreCase("true"))
            this.ignore = true;
        else if (value.equalsIgnoreCase("yes"))
            this.ignore = true;
        else
            this.ignore = false;
    }

    // Return the encoding configured for this filter
    protected String selectEncoding(ServletRequest request) {
        return (this.encoding);
    }
}

Yet in Tomcat 5 you can still get garbled text even with the filter in place. Why?

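3. Advanced solution
It turns out that Tomcat 4 and Tomcat 5 handle parameters differently!
In Tomcat 4, GET and POST parameters use the same encoding, so setting it once in the filter with request.setCharacterEncoding fixes both GET and POST.
In Tomcat 5, however, GET and POST are processed separately!
The Tomcat 5 authors put a lot of effort into the encoding problem. In particular, they added the following attributes to the Connector element in Tomcat's configuration file server.xml, specifically so that encodings can be configured directly.
URIEncoding sets the encoding of content passed through the URI. Tomcat uses the encoding given here to decode what the client sends.
What is a URI?
The Javadoc explains it as follows: a URI is a uniform resource identifier, while a URL is a uniform resource locator. Loosely speaking, then, every URL is a URI, but not every URI is a URL. This is because URIs include another subcategory, uniform resource names (URNs), which name resources but do not specify how to locate them.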

In other words, parameters submitted with the GET method travel in the URI and are governed by this attribute. If it is not set, Tomcat decodes the client's content with its default, ISO-8859-1!

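useBodyEncodingForURI tells Tomcat to decode the URI with the same encoding as the request body. This setting exists for compatibility with Tomcat 4. In Tomcat 5, GET parameters are handled through the URIEncoding attribute described above, while POST content is still handled through request.setCharacterEncoding; this setting was added to bridge the two.
With useBodyEncodingForURI set to true, request.setCharacterEncoding fixes garbled text in both GET and POST requests.
So we can set URIEncoding in server.xml to fix parameters sent with GET, and use the filter to fix those sent with POST.
Alternatively, we can set useBodyEncodingForURI to true in server.xml and rely on the filter for both.
I strongly recommend using UTF-8 throughout the whole site so that the garbled-text problem is solved for good.
The steps are:
1. Save page content as UTF-8 and add <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> to each page
2. Set useBodyEncodingForURI="true" in server.xml on the server
3. Use the filter, with its encoding set to UTF-8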

4. If some text still cannot be converted, try opening Tomcat's server.xml and finding

<Connector acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" port="80" redirectPort="8443" />

then append useBodyEncodingForURI="true" URIEncoding="UTF-8" to it, as follows

<Connector acceptCount="100" connectionTimeout="20000" disableUploadTimeout="true" port="80" redirectPort="8443" useBodyEncodingForURI="true" URIEncoding="UTF-8" />

5.
If you use JSTL, you can write your own EL function that calls URLEncoder.encode to do the encoding.
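
A minimal sketch of the Java side of such an EL function is shown below. The class and method names (ElUrlFunctions, encode) are hypothetical, and the Charset overload of URLEncoder.encode assumes Java 10 or later. To be callable from a page, the method would still have to be declared in a tag library descriptor (TLD), which is not shown here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// When registered in a TLD, the class must be public and live in its own file.
final class ElUrlFunctions {
    private ElUrlFunctions() {
    }

    // Percent-encodes a single parameter value as UTF-8.
    // Note that URLEncoder follows form encoding, so a space becomes '+'.
    public static String encode(String value) {
        if (value == null) {
            return "";
        }
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("\u4e2d\u6587")); // %E4%B8%AD%E6%96%87
    }
}
```

This should be applied to individual parameter values, not to a whole URL, because URLEncoder also escapes characters such as '?', '&' and '='.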

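By default, IE sends the parameters after the URL without encoding them, while Tomcat by default decodes URLs as ISO-8859-1; that mismatch is what causes the error above. The better practice is:

1. Make sure URL parameters are encoded as UTF-8, either with the JavaScript function encodeURI() or with a custom EL function;
2. Set the Connector attribute URIEncoding="UTF-8" in server.xml so that decoding matches encoding;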

Another method:

<script type="text/javascript"><!--
for(var i=0;i<document.links.length;i++){
    document.links[i].href=encodeURI(document.links[i].href);
}
// --></script>

In the action:

String s=request.getParameter("s");
s=new String(s.getBytes("iso-8859-1"),"gbk");

6. Fixing garbled text from JavaScript

1. Client:

url=encodeURI(url);

Server:

String linename = new String(request.getParameter("name").getBytes("ISO-8859-1"),"UTF-8");

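2. Client:

url=encodeURI(encodeURI(url)); // encodeURI is applied twice

This is the approach I recommend. There are reasons for doing it this way, which I will write up and post later.

Server:

String linename = request.getParameter("name");

// Java: decode the characters

linename = java.net.URLDecoder.decode(linename, "UTF-8");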
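
One likely reason the double encoding helps: after two rounds of encoding, the value on the wire is pure ASCII, so the container's own decoding step gives the same result whatever charset it is configured with. The application then does the second, charset-sensitive decoding itself, explicitly as UTF-8. The sketch below simulates this without a container. It uses URLEncoder as a stand-in for encodeURI on a single parameter value (the two differ for spaces and reserved characters), assumes Java 10 or later, and all class and method names are illustrative.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DoubleEncodingDemo {
    // Stand-in for encodeURI(encodeURI(value)) on the client.
    static String encodeTwice(String value) {
        String once = URLEncoder.encode(value, StandardCharsets.UTF_8);
        return URLEncoder.encode(once, StandardCharsets.UTF_8);
    }

    // Stand-in for the container's automatic decoding of the query string.
    static String containerDecode(String wire, Charset containerCharset) {
        return URLDecoder.decode(wire, containerCharset);
    }

    // The explicit second decode done by the application.
    static String applicationDecode(String parameter) {
        return URLDecoder.decode(parameter, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String wire = encodeTwice("\u4e2d\u6587");
        System.out.println(wire); // %25E4%25B8%25AD%25E6%2596%2587

        // The container's charset no longer matters: both yield %E4%B8%AD%E6%96%87.
        String viaLatin1 = containerDecode(wire, StandardCharsets.ISO_8859_1);
        String viaUtf8 = containerDecode(wire, StandardCharsets.UTF_8);
        System.out.println(viaLatin1.equals(viaUtf8)); // true

        System.out.println(applicationDecode(viaLatin1).equals("\u4e2d\u6587")); // true
    }
}
```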
