Escaping a special character in an R regular expression?

Time: 2022-02-26 16:52:37

Despite reading the help page of R regex

Finally, to include a literal -, place it first or last (or, for perl = TRUE only, precede it by a backslash).

I can't understand the difference between

grepl(pattern=paste("^thing1\\-",sep=""),x="thing1-thing2")

and

grepl(pattern=paste("^thing1-",sep=""),x="thing1-thing2")

Both return TRUE. Should I escape or not here? What is the best practice?

3 solutions

#1


10  

The hyphen is mostly a normal character in regular expressions.

You do not need to escape the hyphen outside of a character class; it has no special meaning.

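Within a character class [ ] you can place a hyphen as the first or last character to match it literally. Anywhere else it defines a range, so you need to escape it in order to add it to your class (per the help page quoted in the question, a backslash escape is honored only with perl = TRUE).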

Examples:

grepl('^thing1-', x='thing1-thing2')
# [1] TRUE
grepl('[-a-z]+', 'foo-bar')
# [1] TRUE
grepl('[a-z-]+', 'foo-bar')
# [1] TRUE
# escaping the hyphen inside a class is documented for perl = TRUE only
grepl('[a-z\\-\\d]+', 'foo-bar', perl = TRUE)
# [1] TRUE

Note: It is more common to find a hyphen placed first or last within a character class.
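
A minimal sketch of the placement rules above, assuming the default regex engine except where perl = TRUE is given. The classes and test strings are made up for illustration and do not come from the help page.

```r
checks <- c(
  # Hyphen first or last in the class: treated as a literal hyphen.
  first  = grepl("[-ab]", "-"),
  last   = grepl("[ab-]", "-"),
  # Hyphen between two characters: defines the range a through c,
  # so "b" matches and a literal "-" does not.
  range_b      = grepl("[a-c]", "b"),
  range_hyphen = grepl("[a-c]", "-"),
  # Escaped hyphen in the middle (perl = TRUE): the class holds a, -, and c,
  # so "-" matches and "b" does not.
  escaped_hyphen = grepl("[a\\-c]", "-", perl = TRUE),
  escaped_b      = grepl("[a\\-c]", "b", perl = TRUE)
)
print(checks)
```

Putting the hyphen first or last avoids depending on which engine is in use, which is one reason that placement is the more common one.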

#2


6  

To see what it means for - to have a special meaning inside of a character class (and how putting it last gives it its literal meaning), try the following:

grepl("[w-y]", "x")
# [1] TRUE
grepl("[w-y]", "-")
# [1] FALSE
grepl("[wy-]", "-")
# [1] TRUE
grepl("[wy-]", "x")
# [1] FALSE

#3


1  

They are both matching the exact same text in these instances. I.e.:

x <- "thing1-thing2"
regmatches(x,regexpr("^thing1\\-",x))
#[1] "thing1-"
regmatches(x,regexpr("^thing1-",x))
#[1] "thing1-"

A - is a special character in certain situations though: inside [] it specifies a range of values, such as the characters between a and z, e.g.:

regmatches(x,regexpr("[a-z]+",x))
#[1] "thing"
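
As a hedged illustration of the same point, the sketch below extends the class so that the hyphen is a literal member rather than a range operator. The pattern `[a-z0-9-]+` and the variable names are assumptions added for demonstration; they are not part of the original answer.

```r
x <- "thing1-thing2"

# Range only: matching stops at the first character outside a-z, the digit "1".
m_range <- regmatches(x, regexpr("[a-z]+", x))
print(m_range)    # "thing"

# Two ranges plus a trailing literal hyphen: letters, digits, and "-" all match,
# so the match covers the whole string.
m_literal <- regmatches(x, regexpr("[a-z0-9-]+", x))
print(m_literal)  # "thing1-thing2"
```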
