R: How does "gsub" handle spaces?

Time: 2021-10-09 23:06:42

I have a character string "ab b cde", i.e. "ab[space]b[space]cde". I want to replace "space-b" and "space-c" with blank spaces, so that the output string is "ab[space][space][space][space]de". I can't figure out how to get rid of the second "b" without deleting the first one. I have tried:

gsub("[\\sb,\\sc]", " ", "ab b cde", perl=T)

but this is giving me "a[spaces]de". Any pointers? Thanks.

Edit: Consider a more complicated problem: I want to convert the string "akui i ii", i.e. "akui[space]i[space]ii", to "akui[spaces]" by removing the "space-i" and "space-ii".

3 solutions

#1


2  

You can use lookbehind matching like this:

gsub("(?<=\\s)i+", " ", "akui i ii", perl=TRUE)

Edit: lookbehind is still the way to go, as demonstrated here with the other example from your post. Hope this helps.
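
To see what the lookbehind changes, here is a minimal sketch that compares it with a pattern that consumes the space. The variable names are only illustrative. Because (?<=\\s) checks for the preceding whitespace without making it part of the match, the original spaces survive and only the run of i's is replaced.

```r
s <- "akui i ii"

# Lookbehind: the whitespace is asserted but not consumed, so it stays in the result.
with_lookbehind <- gsub("(?<=\\s)i+", " ", s, perl = TRUE)

# Consuming match: the whitespace is part of the match and is replaced too.
consuming <- gsub("\\si+", " ", s, perl = TRUE)

nchar(with_lookbehind)  # 8: "akui" followed by four spaces
nchar(consuming)        # 6: "akui" followed by two spaces
```

Which one is right depends on whether the result should keep the width of the removed pieces.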

#2


6  

[\sb,\sc] is a character class, so it matches one character that is whitespace, b, a comma, or c (listing \s twice changes nothing). You probably want (\sb|\sc), which means "whitespace followed by b, or whitespace followed by c", or the shorter \s[bc], which means "whitespace followed by b or c".

s <- "ab b cde"
gsub( "(\\sb|\\sc)",     "  ", s, perl=TRUE )
gsub( "\\s[bc]",         "  ", s, perl=TRUE )
gsub( "[[:space:]][bc]", "  ", s, perl=TRUE )  # No backslashes
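
As a quick check of the difference, the sketch below runs the pattern from the question next to two of the alternatives above. The variable names are made up for illustration. The character class replaces every whitespace, b, comma and c one at a time, while the other patterns only touch a whitespace character that is followed by b or c.

```r
s <- "ab b cde"

# Character class from the question: each listed character is replaced on its own.
char_class <- gsub("[\\sb,\\sc]", " ", s, perl = TRUE)

# Alternation and whitespace-plus-class: only "whitespace then b or c" is replaced.
alternation <- gsub("(\\sb|\\sc)", "  ", s, perl = TRUE)
ws_then_class <- gsub("\\s[bc]", "  ", s, perl = TRUE)

print(char_class)   # "a     de" (five spaces)
print(alternation)  # "ab    de" (four spaces)
identical(alternation, ws_then_class)  # TRUE

# The comma inside the brackets is a literal character, not a separator.
comma_demo <- gsub("[\\sb,\\sc]", " ", "x,y", perl = TRUE)
print(comma_demo)   # "x y"
```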

To remove a run of repeated letters (as in the second example), add a + after the letter to be removed.

s2 <- "akui i ii"
gsub("\\si+", " ", s2)
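
A small sketch of why the + matters, assuming the same input string (perl = TRUE is used here only to match the earlier examples). Without the quantifier, only the first i after each space is matched, so the second i of "ii" is left behind.

```r
s2 <- "akui i ii"

# Without +: " i" is matched twice, and the trailing "i" of " ii" survives.
without_plus <- gsub("\\si", " ", s2, perl = TRUE)

# With +: the whole run of i's after the whitespace is matched.
with_plus <- gsub("\\si+", " ", s2, perl = TRUE)

print(without_plus)  # "akui  i"
print(with_plus)     # "akui  "
```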

#3


5  

There is a simple solution to this.

gsub("\\s[bc]", " ", "ab b cde", perl=TRUE)

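This replaces each "space-b" and "space-c" with a single space, giving "ab[space][space]de". To get the four spaces asked for in the question, use a two-space replacement string instead.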
