In R: grab all alnum characters before the first punctuation mark

Time: 2022-06-08 20:22:50

I have a vector s of strings (or NAs) and would like a vector of the same length containing everything before the first occurrence of punctuation (.) in each element.

s <- c("ABC1.2", "22A.2", NA)

I would like a result like:

[1] "ABC1" "22A"  NA 

1 solution

#1


You can remove all characters (including newlines) from the first dot onward with the following PCRE (Perl-compatible) regex:

s <- c("ABC1.2", "22A.2", NA)
# perl = TRUE selects the PCRE engine, which supports \s and \S inside a bracket expression
gsub("[.][\\s\\S]*$", "", s, perl = TRUE)
## => [1] "ABC1" "22A"  NA  

The regex matches

  • [.] - a literal dot
  • [\\s\\S]* - zero or more of any character, including a newline
  • $ - end of string.

Every match is replaced with the empty string "". Because the regex engine scans the string from left to right, [.] matches the first dot, and the greedy * quantifier on [\\s\\S] then consumes everything up to the end of the string.
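To make the left-to-right, greedy behaviour concrete, here is a small sketch on a hypothetical vector s2 (not from the question) that contains several dots, an embedded newline, an NA, and a string with no dot at all. Since the match always runs to the end of the string, there can be at most one match per element, so sub() should give the same result as gsub() here.

```r
# Hypothetical test input: multiple dots, an embedded newline, NA, and no dot
s2 <- c("A.B.C", "X1.2\n3.4", NA, "nodot")

# Same pattern as above; perl = TRUE is needed for \s and \S inside [...]
res_gsub <- gsub("[.][\\s\\S]*$", "", s2, perl = TRUE)
res_sub  <- sub("[.][\\s\\S]*$", "", s2, perl = TRUE)

# Each element is cut at its FIRST dot; NA stays NA; "nodot" is left unchanged
print(res_gsub)

# One match per string at most, so sub() and gsub() agree
print(identical(res_gsub, res_sub))
```

Elements without a dot are returned untouched because the pattern requires a literal dot before anything is removed.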

If the strings contain no newlines, a simpler regex, [.].*$, will do:

gsub("[.].*$", "", s)
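The question title asks about punctuation in general, while the patterns above only look for a dot. As a hedged sketch, two variants using POSIX character classes might cover that case: one cuts at the first [[:punct:]] character, the other keeps only the leading run of [[:alnum:]] characters. The vector s3 is a hypothetical input, and both variants assume the strings contain no newlines.

```r
# Hypothetical input with different punctuation marks
s3 <- c("ABC1.2", "22A-2", "Z9!x", NA)

# Variant 1: delete from the first punctuation character to the end
cut_at_punct <- sub("[[:punct:]].*$", "", s3)

# Variant 2: keep only the leading run of alphanumeric characters
leading_alnum <- sub("^([[:alnum:]]*).*$", "\\1", s3)

print(cut_at_punct)
print(leading_alnum)
```

The two variants differ when a string contains a character that is neither alphanumeric nor punctuation (a space, for example): variant 1 keeps it, variant 2 stops there.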

