I have a dataset that looks something like this:
ID  Type                      Count
1   **Radisson**              8
2   **Renaissance**           9
3   **Hilton** New York Only  8
4   **Radisson** East Cost    8
I want to get a dataset that looks like this:
ID  Type             Count
1   **Radisson**     8
2   **Renaissance**  9
3   **Hilton**       8
4   **Radisson**     8
Or even without the asterisks (*), if at all possible.
Any solutions?
3 Solutions
#1
3
You could use gsub to strip out everything that follows the pair of double asterisks at the start of each value.
df <- data.frame(Type = c("**Radisson**", "**Renaissance**",
                          "**Hilton** New York Only", "**Radisson** East Cost"),
                 Count = c(8, 9, 8, 8))
# Keep only the leading "**...**" token and drop whatever follows it
gsub("^(\\*{2}.*\\*{2}).*", "\\1", df$Type, perl = TRUE)
# [1] "**Radisson**"    "**Renaissance**" "**Hilton**"      "**Radisson**"
So, to update the column in place:
df$Type <- gsub("^(\\*{2}.*\\*{2}).*", "\\1", df$Type, perl = TRUE)
df
#              Type Count
# 1    **Radisson**     8
# 2 **Renaissance**     9
# 3      **Hilton**     8
# 4    **Radisson**     8
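The question also asks whether the asterisks can be dropped entirely. As a minimal sketch (the names type and bare are only illustrative and not from the original answer), moving the capture group inside the asterisks and making the quantifier lazy keeps just the name:

```r
# Sample values copied from the question
type <- c("**Radisson**", "**Renaissance**",
          "**Hilton** New York Only", "**Radisson** East Cost")

# Capture only the text between the leading "**" and the first closing "**";
# the lazy ".*?" stops at the first closing pair, and ".*$" consumes the rest
bare <- sub("^\\*{2}(.*?)\\*{2}.*$", "\\1", type, perl = TRUE)
print(bare)
```

Assigning the result back to df$Type would update the column the same way as the gsub call above.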
#2
0
One solution is to use strsplit on ** and pick the second element. Because each value starts with **, the first piece of the split is an empty string and the name is the second piece:
# as.character() guards against Type being a factor, which strsplit() rejects
df$Type <- sapply(strsplit(as.character(df$Type), split = "\\*{2}"), function(x) x[2])
df
#   ID        Type Count
# 1  1    Radisson     8
# 2  2 Renaissance     9
# 3  3      Hilton     8
# 4  4    Radisson     8
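To see why the second element is the one to pick, it may help to look at what strsplit returns for a single value. This is only an illustrative sketch; the variable names x, parts and name are made up for the example:

```r
# One value from the question, split on the literal "**" delimiter
x <- "**Hilton** New York Only"
parts <- strsplit(x, split = "\\*{2}")[[1]]
print(parts)

# The string starts with the delimiter, so the first piece is empty
# and the hotel name sits in position 2
name <- parts[2]
print(name)
```

Note that a value containing no ** at all would give NA with this approach, since the split would then have no second piece.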
#3
0
Here is an option with str_extract from the stringr package:
library(stringr)
library(dplyr)

# Match the leading asterisks, the run of non-asterisk characters, and the closing asterisks
df %>%
  mutate(Type = str_extract(Type, "[*]*[^*]*[*]*"))
#              Type Count
# 1    **Radisson**     8
# 2 **Renaissance**     9
# 3      **Hilton**     8
# 4    **Radisson**     8
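If stringr is not available, roughly the same extraction can be sketched in base R with regexpr and regmatches. This is offered as an assumed equivalent rather than as part of the original answer, and the names type, m and out are illustrative:

```r
# Sample values copied from the question
type <- c("**Radisson**", "**Renaissance**",
          "**Hilton** New York Only", "**Radisson** East Cost")

# regexpr() locates the first match of the same pattern in each string;
# regmatches() then pulls out the matched substrings
m <- regexpr("[*]*[^*]*[*]*", type)
out <- regmatches(type, m)
print(out)
```

Like the str_extract call, this keeps the asterisks around the name.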