I would like to be able to take a dataframe df containing a column df$col that has entries like:
I?m tired
You?re tired
You?re tired?
Are you tired?
?I am tired
and replace question marks that occur between letters with apostrophes and question marks that occur at the beginnings of strings with nothing:
I'm tired
You're tired
You're tired?
Are you tired?
I am tired
2 solutions
#1
2
I would use sub for the question mark at the beginning and gsub for the others, because there could be several question marks between letters in a string but only one at the beginning.
gsub("(\\w)\\?(\\w)", "\\1'\\2", sub("^\\?", "", df$col))
[1] "I'm tired" "You're tired" "You're tired?" "Are you tired?"
[5] "I am tired"
See https://regex101.com/r/jClVPg/1 for some explanation.
Some explanation:
- 1st Capturing Group (\\w): \\w matches any word character (equal to [a-zA-Z0-9_])
- \\? matches the character ? literally
- 2nd Capturing Group (\\w): \\w matches any word character (equal to [a-zA-Z0-9_])
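One detail of the pattern explained above: both capturing groups consume the letters around the question mark, so a letter that sits between two question marks cannot start the next match. The sketch below compares the pattern with a lookaround variant, which is an alternative and not part of the original answer. The input vector x is hypothetical, made up only to show the difference.

```r
# Hypothetical input: the second string has question marks on both sides of "b".
x <- c("You?re tired?", "?a?b?c")

# Step 1 (as in the answer): drop a single leading question mark.
no_lead <- sub("^\\?", "", x)

# Pattern from the answer: the letters on both sides are consumed by the match,
# so after matching "a?b" the engine resumes at "?c" and finds no letter before it.
consuming <- gsub("(\\w)\\?(\\w)", "\\1'\\2", no_lead)

# Lookaround variant (assumption: PCRE via perl = TRUE): the neighbouring letters
# are only checked, not consumed, so every question mark is tested on its own.
lookaround <- gsub("(?<=\\w)\\?(?=\\w)", "'", no_lead, perl = TRUE)

print(consuming)   # "You're tired?" "a'b?c"
print(lookaround)  # "You're tired?" "a'b'c"
```

For the strings in the question both versions give the same result; the difference only shows up when a single letter sits between two question marks.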
#2
0
We can use sub
df$col <- sub("^'", "", sub("[?](?!$)", "'", df$col, perl = TRUE))
df$col
#[1] "I'm tired" "You're tired" "You're tired?" "Are you tired?" "I am tired"
Here we assume that there will be a single ? to replace in each string, as shown in the example. Otherwise, just replace the inner sub with gsub.
data
df <- structure(list(col = c("I?m tired", "You?re tired", "You?re tired?",
"Are you tired?", "?I am tired")), .Names = "col",
class = "data.frame", row.names = c(NA, -5L))
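As a rough end-to-end sketch of the gsub variant mentioned in this answer, the snippet below rebuilds the data with data.frame (equivalent to the structure() call above, assuming character columns) and applies the replacement. The name extra and its string are hypothetical, added only to show a string with more than one question mark to replace.

```r
# Same data as above, built with data.frame(); stringsAsFactors = FALSE keeps
# the column as character on older R versions too.
df <- data.frame(col = c("I?m tired", "You?re tired", "You?re tired?",
                         "Are you tired?", "?I am tired"),
                 stringsAsFactors = FALSE)

# Inner gsub: every question mark that is not at the end of the string becomes
# an apostrophe. Outer sub: a leading apostrophe (from a leading ?) is dropped.
fix_marks <- function(x) {
  sub("^'", "", gsub("[?](?!$)", "'", x, perl = TRUE))
}

res <- fix_marks(df$col)
print(res)

# Hypothetical string with a leading ? and two ? between letters.
extra <- fix_marks("?You?re sure you?re tired?")
print(extra)  # "You're sure you're tired?"
```

Note that this variant treats every non-final question mark as an apostrophe, so a genuine question mark in the middle of a string (for example before a space) would also be replaced, and a string that legitimately starts with an apostrophe would lose it.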