Recommended method for escaping HTML in Java

Is there a recommended way to escape <, >, " and & characters when outputting HTML in plain Java code? (Other than manually doing the following, that is).

String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = source.replace("&", "&amp;").replace("<", "&lt;"); // & must be replaced first, otherwise the & in &lt; is escaped again ...

10 Solutions

#1

233

StringEscapeUtils from Apache Commons Lang:

import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;
// ...
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = escapeHtml(source);

For version 3:

import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;
// ...
String escaped = escapeHtml4(source);

#2

111

An alternative to Apache Commons: Use Spring's HtmlUtils.htmlEscape(String input) method.

#3

Nice short method:

public static String escapeHTML(String s) {
    StringBuilder out = new StringBuilder(Math.max(16, s.length()));
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c > 127 || c == '"' || c == '<' || c == '>' || c == '&') {
            out.append("&#");
            out.append((int) c);
            out.append(';');
        } else {
            out.append(c);
        }
    }
    return out.toString();
}

Based on https://*.com/a/8838023/1199155 (the & check is missing there). According to http://www.w3.org/TR/html4/sgml/entities.html, the four characters checked in the if clause are the only ones below 128 that have an HTML 4 character entity.
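One caveat with the method above: it walks the string one char at a time, so a character outside the Basic Multilingual Plane (an emoji, for example) is emitted as two surrogate references rather than one. A minimal sketch of a code-point-based variant follows, assuming that is a concern for your input. The class name CodePointHtmlEscaper is made up for illustration and is not part of the original answer.

```java
// Hypothetical variant of the escapeHTML method above; the class name is illustrative.
final class CodePointHtmlEscaper {
    private CodePointHtmlEscaper() {}

    /** Escapes ", <, >, & and every non-ASCII code point as a decimal numeric reference. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        for (int i = 0; i < s.length(); ) {
            // codePointAt joins a surrogate pair into a single code point
            int cp = s.codePointAt(i);
            if (cp > 127 || cp == '"' || cp == '<' || cp == '>' || cp == '&') {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // advance by 1 char for BMP characters, 2 for supplementary ones
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, CodePointHtmlEscaper.escapeHtml("<a & b>") yields &#60;a &#38; b&#62;, and a single U+1F600 character yields &#128512;.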

#4

There is a newer version of the Apache Commons Lang library, and it uses a different package name (org.apache.commons.lang3). StringEscapeUtils now has different static methods for escaping different types of documents (http://commons.apache.org/proper/commons-lang/javadocs/api-3.0/index.html). So to escape a string for HTML 4.0:

import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;

String output = escapeHtml4("The less than sign (<) and ampersand (&) must be escaped before using them in HTML");

#5

On Android (API 16 or greater) you can use:

Html.escapeHtml(textToEscape);

or, for lower API levels:

TextUtils.htmlEncode(textToEscape);

#6

Be careful with this. There are a number of different "contexts" within an HTML document: inside an element, a quoted attribute value, an unquoted attribute value, a URL attribute, JavaScript, CSS, and so on. You'll need to use a different encoding method for each of these to prevent Cross-Site Scripting (XSS). Check the OWASP XSS Prevention Cheat Sheet for details on each of these contexts: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet. You can find escaping methods for each of these contexts in the OWASP ESAPI library: https://github.com/ESAPI/esapi-java-legacy.
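To make the point about contexts concrete, here is a rough JDK-only sketch that treats three of them differently: element content, a quoted attribute value, and a URL query parameter. This is not the ESAPI API. The class ContextEncoders, its method names, and the /search path are all hypothetical, and the JavaScript, CSS, and unquoted-attribute contexts are deliberately left out because they need their own rules.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative only: names are assumptions, not taken from ESAPI or any other library.
final class ContextEncoders {
    private ContextEncoders() {}

    /** Text between tags: only &, < and > need escaping. */
    static String forElementContent(String s) {
        return escape(s, false);
    }

    /** Value placed inside a quoted attribute: also escape both quote characters. */
    static String forQuotedAttribute(String s) {
        return escape(s, true);
    }

    /** Value placed in a URL query string: percent-encode it as UTF-8. */
    static String forUrlParameter(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Combines the contexts: URL-encode the parameter, then attribute-escape the whole href. */
    static String searchLink(String query, String label) {
        String href = "/search?q=" + forUrlParameter(query); // hypothetical path
        return "<a href=\"" + forQuotedAttribute(href) + "\">"
                + forElementContent(label) + "</a>";
    }

    private static String escape(String s, boolean inAttribute) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append(inAttribute ? "&quot;" : "\""); break;
                case '\'': out.append(inAttribute ? "&#39;" : "'"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Note that the same input can need two layers of encoding: the query value is percent-encoded first because it sits inside a URL, and the resulting URL is then attribute-escaped because it sits inside href="...".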

#7

For those who use Google Guava:

import com.google.common.html.HtmlEscapers;
// ...
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = HtmlEscapers.htmlEscaper().escape(source);

#8

For some purposes, Spring's HtmlUtils:

import org.springframework.web.util.HtmlUtils;
// ...
HtmlUtils.htmlEscapeDecimal("&"); // gives &#38;
HtmlUtils.htmlEscape("&"); // gives &amp;

#9

While @dfa's answer recommending org.apache.commons.lang.StringEscapeUtils.escapeHtml is nice, and I have used it in the past, it should not be used for escaping HTML (or XML) attributes; otherwise the whitespace will be normalized (meaning all adjacent whitespace characters become a single space).

I know this because I have had bugs filed against my library (JATL) for attributes where whitespace was not preserved. Thus I have a drop-in (copy-and-paste) class (parts of which I stole from JDOM) that differentiates the escaping of attributes and element content.

While proper attribute escaping may not have mattered as much in the past, it is becoming increasingly important given the use of HTML5's data- attributes.
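The JATL class itself is not shown here, so the following is only a guess at the general idea, not the author's code: escape tab, line feed, and carriage return in attribute values as numeric character references, so that a parser that normalizes literal whitespace in attributes still recovers the original characters, and leave whitespace alone in element content. The class and method names are hypothetical.

```java
// Sketch only: not the JATL/JDOM implementation; names are illustrative.
final class AttributeEscaper {
    private AttributeEscaper() {}

    /** Escapes markup characters plus tab/LF/CR so whitespace survives attribute normalization. */
    static String escapeAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\t': out.append("&#9;"); break;
                case '\n': out.append("&#10;"); break;
                case '\r': out.append("&#13;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Element content: whitespace and quotes are left untouched. */
    static String escapeElementContent(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this split, a data- attribute holding multi-line text can be written with escapeAttribute, while the same text inside an element goes through escapeElementContent unchanged apart from &, < and >.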

#10

org.apache.commons.lang3.StringEscapeUtils is now deprecated. You must now use org.apache.commons.text.StringEscapeUtils, which is available by adding the following Maven dependency:

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-text</artifactId>
        <version>${commons.text.version}</version>
    </dependency>
