How to escape characters in a Java regular expression

Time: 2022-09-02 00:14:03

I have a regex that removes all non-alphanumeric characters. It works for every special character except ^. Below is the regex I am using.

String strRefernce = strReference.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}]", "").toUpperCase();

I tried modifying it to

String strRefernce = strReference.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}]\\^", "").toUpperCase();

and

String strRefernce = strReference.replaceAll("[^\\p{IsAlphabetic}^\\p{IsDigit}\\^]", "").toUpperCase();

Neither of these removes the ^ symbol either. Can someone please help me with this?

2 answers

#1


The first ^ inside [^...] is the negation mark: it makes the character class a negated one, matching any character other than those listed inside.

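The second ^ inside the brackets is treated as a literal caret, so it is one of the characters the negated class excludes; that is why ^ is never matched and never removed. Escaping it as \\^ does not help, because the escaped caret is still listed inside the negated class, and appending \\^ after the class only matches a symbol that is immediately followed by a caret. Remove the second ^, and the caret will be matched: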

"[^\\p{IsAlphabetic}\\p{IsDigit}]"
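
A minimal sketch, assuming a made-up input string "ab^c-1#2" and an illustrative class name, that runs the question's three patterns and the corrected one side by side so the effect of the literal caret is visible:

```java
// Runs the patterns from the question and the corrected pattern on the
// same (made-up) input, to show which of them remove the caret.
class CaretInClassDemo {

    // Original pattern: the second ^ is a literal member of the negated
    // class, so ^ is excluded from the match and survives.
    static final String ORIGINAL = "[^\\p{IsAlphabetic}^\\p{IsDigit}]";

    // First attempt: only matches a character that is neither alphanumeric
    // nor a caret when it is immediately followed by a caret.
    static final String ATTEMPT_1 = "[^\\p{IsAlphabetic}^\\p{IsDigit}]\\^";

    // Second attempt: the escaped caret is still inside the negated class.
    static final String ATTEMPT_2 = "[^\\p{IsAlphabetic}^\\p{IsDigit}\\^]";

    // Corrected pattern: only letters and digits are excluded from the match.
    static final String FIXED = "[^\\p{IsAlphabetic}\\p{IsDigit}]";

    static String clean(String input, String regex) {
        return input.replaceAll(regex, "").toUpperCase();
    }

    public static void main(String[] args) {
        String input = "ab^c-1#2";
        System.out.println(clean(input, ORIGINAL));  // AB^C12
        System.out.println(clean(input, ATTEMPT_1)); // AB^C-1#2
        System.out.println(clean(input, ATTEMPT_2)); // AB^C12
        System.out.println(clean(input, FIXED));     // ABC12
    }
}
```

Only the corrected pattern removes the caret; both the plain and the escaped caret inside the negated class keep it out of the match.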

or even shorter:

"(?U)\\P{Alnum}"

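The \P{Alnum} class stands for any character other than an alphanumeric character, that is, anything outside [\p{Alpha}\p{Digit}] (see the Java regex reference). By default these POSIX classes are US-ASCII only, so \P{Alnum} on its own would also remove letters such as Ñ. Passing (?U), the inline form of Pattern.UNICODE_CHARACTER_CLASS, makes them Unicode-aware, so \P{Alnum} no longer matches Unicode letters. See this IDEONE demo.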
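
A small sketch contrasting \P{Alnum} with and without (?U), plus the equivalent Pattern.UNICODE_CHARACTER_CLASS flag and the longer explicit class for comparison. The class name and sample input are illustrative; the input uses Unicode escapes for Ñ (U+00D1) and ú (U+00FA) so the source file encoding does not matter:

```java
import java.util.regex.Pattern;

// Compares ASCII-only and Unicode-aware versions of \P{Alnum}.
class AlnumUnicodeDemo {

    // Without (?U): \p{Alnum} is US-ASCII [a-zA-Z0-9], so accented letters are removed too.
    static String asciiOnly(String s) {
        return s.replaceAll("\\P{Alnum}", "");
    }

    // With (?U): \p{Alnum} becomes [\p{IsAlphabetic}\p{IsDigit}].
    static String unicodeAware(String s) {
        return s.replaceAll("(?U)\\P{Alnum}", "");
    }

    // Same as unicodeAware, but using the compile flag instead of inline (?U).
    static String withFlag(String s) {
        return Pattern.compile("\\P{Alnum}", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(s)
                .replaceAll("");
    }

    // The longer explicit negated class from the answer.
    static String explicitClass(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");
    }

    public static void main(String[] args) {
        String input = "\u00D1and\u00FA ^42!";
        System.out.println(asciiOnly(input));     // and42 (both accented letters removed)
        System.out.println(unicodeAware(input));  // accented letters kept
        System.out.println(withFlag(input));      // same as unicodeAware
        System.out.println(explicitClass(input)); // same as unicodeAware
    }
}
```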

Add a + at the end if you want to match whole runs of characters other than \\p{IsAlphabetic} and \\p{IsDigit} at once; this makes a visible difference only when the replacement string is not empty.
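
A sketch (illustrative class name and input) comparing one-character and whole-run matches, first with an empty replacement and then with a space:

```java
// Shows the effect of the trailing + on (?U)\P{Alnum}.
class PlusQuantifierDemo {

    static final String SINGLE = "(?U)\\P{Alnum}";  // one character per match
    static final String RUN = "(?U)\\P{Alnum}+";    // a whole run per match

    public static void main(String[] args) {
        String input = "a, b!!c";
        // Removing: both patterns give the same result.
        System.out.println(input.replaceAll(SINGLE, ""));  // abc
        System.out.println(input.replaceAll(RUN, ""));     // abc
        // Replacing with a space: without +, every symbol becomes its own space.
        System.out.println(input.replaceAll(SINGLE, " ")); // a  b  c
        System.out.println(input.replaceAll(RUN, " "));    // a b c
    }
}
```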

#2


This works as well.

System.out.println("Text 尖酸[刻薄 ^, More _0As text °ÑÑ".replaceAll("(?U)[^[\\W_]]+", " "));

Output (Java 8):

Text 尖酸 刻薄 More 0As text ÑÑ

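I'm not sure, but the word-character class \w may be a more comprehensive definition of alphanumeric characters: with (?U) it covers letters and digits plus combining marks and connector punctuation. The underscore is connector punctuation, which is why it has to be excluded separately.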
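
To illustrate where the two approaches differ under (?U), here is a sketch using a decomposed é (e followed by the combining acute accent U+0301) and an underscore. \p{Alnum} counts only alphabetic characters and digits, while \w also covers combining marks, so the patterns disagree on the accent. The class name and sample string are illustrative:

```java
// Compares (?U)\P{Alnum} with (?U)[\W_] on a combining accent and an underscore.
class WordVsAlnumDemo {

    // Letters and digits only: the combining accent is neither, so it is removed.
    static String keepAlnum(String s) {
        return s.replaceAll("(?U)\\P{Alnum}", "");
    }

    // Word characters minus the underscore: with (?U), \w includes combining
    // marks (Mn, Mc, Me), so the accent survives; [\W_] adds the underscore.
    static String keepWordNoUnderscore(String s) {
        return s.replaceAll("(?U)[\\W_]", "");
    }

    public static void main(String[] args) {
        String input = "Cafe\u0301_1";
        System.out.println(keepAlnum(input).length());            // 5 (accent and underscore removed)
        System.out.println(keepWordNoUnderscore(input).length()); // 6 (only the underscore removed)
    }
}
```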

[\\W_] is a class containing every non-word character plus the underscore.

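When it is nested inside a negated Java character class, it becomes

[^[\\W_]], a negated class whose own member list is empty, unioned with a nested class of non-word characters and the underscore.

Note that this depends on the Java version. In Java 8 and earlier, a ^ followed directly by a nested class was effectively ignored, so [^[\\W_]] behaved exactly like [\\W_] and the code above replaces only the symbols. Since Java 9, the ^ negates the whole class, nested classes included, so [^[\\W_]] matches letters and digits instead. Writing the pattern as (?U)[\\W_]+ gives the intended result on every version.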
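
A version-independent sketch, assuming the goal is to replace runs of non-word characters and underscores with a space: it uses [\\W_] directly instead of nesting it in a negated class, and it prints how the running JVM parses [^[\\W_]], since that changed between Java 8 and Java 9. The class name and sample input (based on the answer's string, with Unicode escapes for ° and Ñ) are illustrative:

```java
import java.util.regex.Pattern;

// Replaces runs of non-word characters and underscores without relying on
// how nested negated classes are parsed.
class NonWordCleanupDemo {

    // Runs of non-word characters and underscores, Unicode-aware.
    static final String PORTABLE = "(?U)[\\W_]+";

    static String clean(String s) {
        return s.replaceAll(PORTABLE, " ");
    }

    public static void main(String[] args) {
        String input = "Text [x] ^, More _0As text \u00B0\u00D1\u00D1";
        // Prints "Text x More 0As text " followed by the two N-tilde letters.
        System.out.println(clean(input));

        // Expected false on Java 8 (the leading ^ is ignored, so the class acts
        // like [\W_]) and true on Java 9+ (the whole class is negated).
        boolean nestedMatchesLetter = Pattern.matches("(?U)[^[\\W_]]", "a");
        System.out.println(System.getProperty("java.version") + ": " + nestedMatchesLetter);
    }
}
```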
