I need to escape special characters in a String.
Guava provides the Escaper class, which does exactly this:
Escaper escaper = Escapers.builder()
    .addEscape('[', "\\[")
    .addEscape(']', "\\]")
    .build();
String escapedStr = escaper.escape("This is a [test]");
System.out.println(escapedStr);
// -> prints "This is a \[test\]"
Now that I have an escaped String, I need to unescape it, and I can't find anything in Guava to do this.
I was expecting Escaper to have an unescape() method, but that isn't the case.
Edit: I'm aware that unescaping can be tricky, even impossible in some nonsensical cases.
For example, this Escaper usage can lead to ambiguities:
Escaper escaper = Escapers.builder()
    .addEscape('@', " at ")
    .addEscape('.', " dot ")
    .build();
Unless the escaped data contains only email addresses and nothing more, you can't safely get your data back by unescaping it.
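To make the ambiguity concrete, here is a minimal sketch. It uses plain String.replace as a stand-in for the " at " / " dot " Escaper so that it runs without Guava; the class and method names (AmbiguousEscapeDemo, escape, naiveUnescape) are made up for illustration.

```java
// Stand-in for the " at " / " dot " Escaper, written with String.replace
// so the sketch runs without Guava on the classpath.
final class AmbiguousEscapeDemo {

    static String escape(String s) {
        return s.replace("@", " at ").replace(".", " dot ");
    }

    // Naive inverse: turns every " at " / " dot " back into a symbol,
    // whether or not it was produced by escape().
    static String naiveUnescape(String s) {
        return s.replace(" at ", "@").replace(" dot ", ".");
    }

    public static void main(String[] args) {
        String email = "john@example.com";
        String sentence = "meet me at noon";

        // An email address survives the round trip.
        System.out.println(naiveUnescape(escape(email)));    // john@example.com

        // Ordinary text that already contains " at " does not.
        System.out.println(naiveUnescape(escape(sentence))); // meet me@noon
    }
}
```

The second call shows the problem: the escaped form does not say whether " at " came from an '@' or was already in the text.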
A good example of a safe usage of the Escaper is HTML entities:
Escaper escaper = Escapers.builder()
    .addEscape('&', "&amp;")
    .addEscape('<', "&lt;")
    .addEscape('>', "&gt;")
    .build();
Here, you can safely escape any text, incorporate it in an HTML page and unescape it at any time to display it, because you covered every possible ambiguity.
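As a rough sketch of why this case round-trips, here is the same idea written with plain String.replace for the three HTML entity replacements (&amp;amp;, &amp;lt;, &amp;gt;). This is not Guava code, and HtmlRoundTripDemo is a made-up name. The point is the ordering: '&' is escaped first and unescaped last.

```java
final class HtmlRoundTripDemo {

    // '&' must be escaped first; otherwise the '&' introduced by "&lt;"
    // and "&gt;" would be escaped a second time.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // "&amp;" must be unescaped last; otherwise "&amp;lt;" would first
    // become "&lt;" and then wrongly turn into "<".
    static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String original = "a < b && c > d";
        String escaped = escape(original);
        System.out.println(escaped);           // a &lt; b &amp;&amp; c &gt; d
        System.out.println(unescape(escaped)); // a < b && c > d

        // Text that already looks like an entity also survives.
        System.out.println(unescape(escape("&lt;"))); // &lt;
    }
}
```

Because every '&' in the escaped output starts one of the three entities, the unescaping step never has to guess.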
In conclusion, I don't see why unescaping is so controversial. I think it is the developer's responsibility to use this class properly, knowing their data and avoiding ambiguities. Escaping, by definition, means you will eventually need to unescape. Otherwise, it's obfuscation or some other concept.
3 solutions
#1
5
No, it does not. And apparently, this is intentional. Quoting from this discussion where Chris Povirk answered:
The use case for unescaping is less clear to me. It's generally not possible to even identify the escaped source text without a parser that understands the language. For example, if I have the following input:
String s = "foo\n\"bar\"\n\\";
Then my parser has to already understand \n, \", and \\ in order to identify that...
foo\n\"bar\"\n\\
...is the text to be "unescaped." In other words, it has to do the unescaping already. The situation is similar with HTML and other formats: We don't need an unescaper so much as we need a parser.
So it looks like you'll have to do it yourself.
#2
3
If you just need to unescape HTML entities, Unicode characters and control characters like \n or \t, you can simply use the StringEscapeUtils class from Apache Commons Lang.
#3
0
In case anybody ever needs a single-char unescaper, here's a dead simple implementation:
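import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import javax.annotation.Nonnull;

// Assumes `escaped` is a char field of the enclosing class that holds the escape character, e.g. '\\'.
@Nonnull
public String unescape(@Nonnull String text) {
    CharacterIterator i = new StringCharacterIterator(text);
    StringBuilder result = new StringBuilder(text.length());
    for (char c = i.first(); c != CharacterIterator.DONE; c = i.next()) {
        if (c == escaped) {
            // Drop the escape character and keep the character after it verbatim.
            char next = i.next();
            if (next == CharacterIterator.DONE) {
                break; // dangling escape character at the end of the input
            }
            result.append(next);
        } else {
            result.append(c);
        }
    }
    return result.toString();
}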
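For completeness, here is a self-contained variant of the same idea that can be run directly against the bracket example from the question. It is only a sketch: the class name SingleCharUnescaperDemo and the escapeChar parameter are made up for illustration, and it assumes every escape sequence is the escape character followed by exactly one literal character.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

final class SingleCharUnescaperDemo {

    // Removes each occurrence of escapeChar and keeps the character that
    // follows it verbatim. A dangling escapeChar at the end is dropped.
    static String unescape(String text, char escapeChar) {
        CharacterIterator i = new StringCharacterIterator(text);
        StringBuilder result = new StringBuilder(text.length());
        for (char c = i.first(); c != CharacterIterator.DONE; c = i.next()) {
            if (c == escapeChar) {
                char next = i.next();
                if (next == CharacterIterator.DONE) {
                    break;
                }
                result.append(next);
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The escaped string from the question: This is a \[test\]
        System.out.println(unescape("This is a \\[test\\]", '\\')); // This is a [test]

        // An escaped backslash comes back as a single backslash.
        System.out.println(unescape("C:\\\\temp", '\\'));           // C:\temp
    }
}
```

Note that this only round-trips safely if the escaper also escapes the escape character itself, for example by adding .addEscape('\\', "\\\\") to the builder. Otherwise a backslash that was already in the input would be swallowed on the way back.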