How do I deal with special characters like \^$.?*|+()[{ in my regex?

Time: 2021-07-17 11:04:09

I want to match a regular expression special character, \^$.?*|+()[{. I tried:

x <- "a[b"
grepl("[", x)
## Error: invalid regular expression '[', reason 'Missing ']''

(Equivalently stringr::str_detect(x, "[") or stringi::stri_detect_regex(x, "[").)

Doubling the value to escape it doesn't work:

grepl("[[", x)
## Error: invalid regular expression '[[', reason 'Missing ']''

Neither does using a backslash:

grepl("\[", x)
## Error: '\[' is an unrecognized escape in character string starting ""\["

How do I match special characters?

Some special cases of this question are old and well written enough that it would be cheeky to close them as duplicates of this one:
Escaped Periods In R Regular Expressions
How to escape a question mark in R?
escaping pipe ("|") in a regex

2 solutions

#1


63  

Escape with a double backslash

R treats backslashes as escape characters in character constants, and so do regular expressions. Hence the need for two backslashes when supplying a character argument for a pattern: the first one isn't actually a character in the string, but rather it turns the second one into a literal backslash. You can see how escapes are processed using cat.

y <- "double quote: \", tab: \t, newline: \n, unicode point: \u20AC"
print(y)
## [1] "double quote: \", tab: \t, newline: \n, unicode point: €"
cat(y)
## double quote: ", tab:    , newline: 
## , unicode point: €

Further reading: Escaping a backslash with a backslash in R produces 2 backslashes in a string, not 1

To use special characters in a regular expression the simplest method is usually to escape them with a backslash, but as noted above, the backslash itself needs to be escaped.

grepl("\\[", "a[b")
## [1] TRUE
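When the text to match comes from a variable, the escaping can be automated. The following is a minimal sketch, not part of the original answer: escape_regex is a hypothetical helper name, and it uses perl = TRUE so that the character class is parsed by PCRE rules.

```r
# Hypothetical helper (not in base R): put a backslash in front of every
# regex metacharacter in `x`, so the result matches `x` literally.
escape_regex <- function(x) {
  # Regex seen by the engine: ([\\^$.?*|+()\[\]{}])
  # Replacement "\\\\\\1" means: a literal backslash, then the matched char.
  gsub("([\\\\^$.?*|+()\\[\\]{}])", "\\\\\\1", x, perl = TRUE)
}

escape_regex("a[b")
## [1] "a\\[b"
grepl(escape_regex(".*"), c("a.*c", "abc"))
## [1]  TRUE FALSE
```

The printed result shows two backslashes because print re-escapes the string; the string itself contains only one.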

To match backslashes, you need to double escape, resulting in four backslashes.

grepl("\\\\", c("a\\b", "a\nb"))
## [1]  TRUE FALSE
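To make the two layers of escaping visible, here is a short sketch that counts the characters R actually stores. It also assumes R 4.0.0 or later for the raw-string syntax r"(...)", which removes R's own layer of escaping.

```r
# The R literal "\\\\" stores two characters; the regex engine then reads
# those two backslashes as one escaped (literal) backslash.
pattern <- "\\\\"
nchar(pattern)
## [1] 2

# Raw strings (R >= 4.0.0) skip R's string escaping, so only the
# regex-level escape has to be written.
raw_pattern <- r"(\\)"
identical(pattern, raw_pattern)
## [1] TRUE

grepl(r"(\[)", "a[b")
## [1] TRUE
```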

The rebus package contains constants for each of the special characters to save you mistyping slashes.

library(rebus)
OPEN_BRACKET
## [1] "\\["
BACKSLASH
## [1] "\\\\"

For more examples see:

?SpecialCharacters

Your problem can be solved this way:

library(rebus)
grepl(OPEN_BRACKET, "a[b")

Form a character class

You can also wrap the special characters in square brackets to form a character class.

grepl("[?]", "a?b")
## [1] TRUE

Two of the special characters have special meaning inside character classes: \ and ^.

Backslash still needs to be escaped even if it is inside a character class.

grepl("[\\\\]", c("a\\b", "a\nb"))
## [1]  TRUE FALSE

Caret only needs to be escaped if it is directly after the opening square bracket.

grepl("[ ^]", "a^b")  # matches spaces as well.
## [1] TRUE
grepl("[\\^]", "a^b") 
## [1] TRUE
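As a further illustration of the caret rule, and of putting several special characters in one class, here is a small sketch using made-up test strings:

```r
# Caret directly after the opening bracket negates the class:
# "any character except ?", so a lone "?" does not match.
grepl("[^?]", "?")
## [1] FALSE

# Caret anywhere else in the class is a literal caret.
grepl("[?^]", "^")
## [1] TRUE

# Most other special characters can sit in a class unescaped.
grepl("[?.*+$|()]", c("a.b", "a+b", "a(b", "ab"))
## [1]  TRUE  TRUE  TRUE FALSE
```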

rebus also lets you form a character class.

char_class("?")
## <regex> [?]

Use a pre-existing character class

If you want to match all punctuation, you can use the [:punct:] character class.

grepl("[[:punct:]]", c("//", "[", "(", "{", "?", "^", "$"))
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE

stringi maps this to the Unicode General Category for punctuation, so its behaviour is slightly different.

library(stringi)
stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "[[:punct:]]")
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

You can also use the cross-platform syntax for accessing a UGC.

stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "\\p{P}")
## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

Use \Q \E escapes

Placing characters between \\Q and \\E makes the regular expression engine treat them literally rather than as regular expressions.

grepl("\\Q.\\E", "a.b")
## [1] TRUE
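The same idea can be wrapped in a function when the literal text is not known in advance. This is only a sketch: quote_literal is a hypothetical helper name, it passes perl = TRUE to use the PCRE engine explicitly, and it assumes the input does not itself contain the sequence \E.

```r
# Hypothetical helper: wrap arbitrary text in \Q ... \E so the regex
# engine reads it literally. Assumes `x` does not contain "\\E".
quote_literal <- function(x) paste0("\\Q", x, "\\E")

# Quoted: "+" is an ordinary character.
grepl(quote_literal("1+1=2"), c("1+1=2", "11=2"), perl = TRUE)
## [1]  TRUE FALSE

# Unquoted: "1+" means "one or more 1s", so the results flip.
grepl("1+1=2", c("1+1=2", "11=2"), perl = TRUE)
## [1] FALSE  TRUE
```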

rebus lets you write literal blocks of regular expressions.

literal(".")
## <regex> \Q.\E

Don't use regular expressions

Regular expressions are not always the answer. If you want to match a fixed string then you can do, for example:

grepl("[", "a[b", fixed = TRUE)
stringr::str_detect("a[b", stringr::fixed("["))
stringi::stri_detect_fixed("a[b", "[")
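The fixed = TRUE argument is also accepted by the other base R pattern functions. A brief sketch with made-up inputs:

```r
# Replace every literal dot; without fixed = TRUE, "." would match
# any character.
gsub(".", "-", "a.b.c", fixed = TRUE)
## [1] "a-b-c"

# Split on a literal pipe rather than treating "|" as alternation.
strsplit("a|b|c", "|", fixed = TRUE)[[1]]
## [1] "a" "b" "c"

# Position of the first literal opening parenthesis.
as.integer(regexpr("(", "f(x)", fixed = TRUE))
## [1] 2
```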

#2


1  

I think the easiest way to match the characters like

\^$.?*|+()[

is to use character classes from within R. Consider the following to clean column headers from a data file, which could contain spaces and punctuation characters:

> library(stringr)
> colnames(order_table) <- str_replace_all(colnames(order_table),"[:punct:]|[:space:]","")

This approach lets us combine character classes to match punctuation characters as well as whitespace characters, which you would otherwise have to escape individually with \\ to detect. You can learn more about the character classes in the cheatsheet below, and you can also type ?regexp to see more information.

https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
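For comparison, the header-cleaning idea above can be sketched in base R, where the named classes must sit inside an outer pair of square brackets. The headers vector is hypothetical and stands in for colnames(order_table).

```r
# Hypothetical column names standing in for colnames(order_table).
headers <- c("Order ID", "unit.price", "qty (kg)")

# In base R, [:punct:] and [:space:] are only valid inside a bracket
# expression, so both go in one outer character class.
clean <- gsub("[[:punct:][:space:]]", "", headers)
clean
## [1] "OrderID"   "unitprice" "qtykg"
```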
